Re: Parallel Full Hash Join - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Parallel Full Hash Join
Date
Msg-id CAAKRu_Yvpd08MvXO3yeevmXeqykw+Fbh02VVefWuvqKyg6j8iQ@mail.gmail.com
Whole thread Raw
In response to Re: Parallel Full Hash Join  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Parallel Full Hash Join
List pgsql-hackers
On Sat, Apr 8, 2023 at 1:30 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Sat, Apr 8, 2023 at 12:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Thomas Munro <thomas.munro@gmail.com> writes:
> > > I committed the main patch.
> >
> > BTW, it was easy to miss in all the buildfarm noise from
> > last-possible-minute patches, but chimaera just showed something
> > that looks like a bug in this code [1]:
> >
> > 2023-04-08 12:25:28.709 UTC [18027:321] pg_regress/join_hash LOG:  statement: savepoint settings;
> > 2023-04-08 12:25:28.709 UTC [18027:322] pg_regress/join_hash LOG:  statement: set local
max_parallel_workers_per_gather= 2; 
> > 2023-04-08 12:25:28.710 UTC [18027:323] pg_regress/join_hash LOG:  statement: explain (costs off)
> >              select  count(*) from simple r full outer join simple s on (r.id = 0 - s.id);
> > 2023-04-08 12:25:28.710 UTC [18027:324] pg_regress/join_hash LOG:  statement: select  count(*) from simple r full
outerjoin simple s on (r.id = 0 - s.id); 
> > TRAP: failed Assert("BarrierParticipants(&batch->batch_barrier) == 1"), File: "nodeHash.c", Line: 2118, PID: 19147

So, after staring at this for awhile, I suspect this assertion is just
plain wrong. BarrierArriveAndDetachExceptLast() contains this code:

    if (barrier->participants > 1)
    {
        --barrier->participants;
        SpinLockRelease(&barrier->mutex);

        return false;
    }
    Assert(barrier->participants == 1);

So in between this assertion and the one we tripped,

    if (!BarrierArriveAndDetachExceptLast(&batch->batch_barrier))
    {
    ...
        return false;
    }

    /* Now we are alone with this batch. */
    Assert(BarrierPhase(&batch->batch_barrier) == PHJ_BATCH_SCAN);
    Assert(BarrierParticipants(&batch->batch_barrier) == 1);

Another worker attached to the batch barrier, saw that it was in
PHJ_BATCH_SCAN, marked it done and detached. This is fine.
BarrierArriveAndDetachExceptLast() is meant to ensure no one waits
(deadlock hazard) and that at least one worker stays to do the unmatched
scan. It doesn't hurt anything for another worker to join and find out
there is no work to do.

We should simply delete this assertion.

- Melanie



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Direct I/O
Next
From: Tom Lane
Date:
Subject: Re: longfin missing gssapi_ext.h