Thread: wrong query result with jit_above_cost= 0

wrong query result with jit_above_cost= 0

From

Rushabh Lathia

Date:

26 June 2018, 21:20:32

Hi,

I found the below query which returns the wrong output
when jit_above_cost= 0 is set.

Steps to reproduce:

CREATE TABLE emp (
    epno           NUMERIC(4),
    ename           VARCHAR(10),
    job             VARCHAR(9),
    mgr             NUMERIC(4),
    hiredate        DATE,
    sal             NUMERIC(7,2),
    comm            NUMERIC(7,2),
    deptno          NUMERIC(2)
);

INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20);
INSERT INTO emp VALUES (7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30);

set jit_above_cost= 0;

select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;

without the ROLLUP, I don't see any problem with results.

Thanks,

Rushabh Lathia

www.EnterpriseDB.com

Re: wrong query result with jit_above_cost= 0

From

Andres Freund

Date:

26 June 2018, 21:23:06

Hi,

On 2018-06-26 23:50:32 +0530, Rushabh Lathia wrote:
> I found the below query which returns the wrong output
> when jit_above_cost= 0 is set.
> 
> Steps to reproduce:
> 
> CREATE TABLE emp (
>     epno           NUMERIC(4),
>     ename           VARCHAR(10),
>     job             VARCHAR(9),
>     mgr             NUMERIC(4),
>     hiredate        DATE,
>     sal             NUMERIC(7,2),
>     comm            NUMERIC(7,2),
>     deptno          NUMERIC(2)
> );
> 
> INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20);
> INSERT INTO emp VALUES
> (7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30);
> 
> set jit_above_cost= 0;
> 
> select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
> 
> without the ROLLUP, I don't see any problem with results.

Interesting.  I've opened an open item referencing this.

Greetings,

Andres Freund

Re: wrong query result with jit_above_cost= 0

From

Dmitry Dolgov

Date:

26 June 2018, 23:09:10

> On 26 June 2018 at 20:23, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2018-06-26 23:50:32 +0530, Rushabh Lathia wrote:
>> I found the below query which returns the wrong output
>> when jit_above_cost= 0 is set.
>>
>> Steps to reproduce:
>>
>> CREATE TABLE emp (
>>     epno           NUMERIC(4),
>>     ename           VARCHAR(10),
>>     job             VARCHAR(9),
>>     mgr             NUMERIC(4),
>>     hiredate        DATE,
>>     sal             NUMERIC(7,2),
>>     comm            NUMERIC(7,2),
>>     deptno          NUMERIC(2)
>> );
>>
>> INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20);
>> INSERT INTO emp VALUES (7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30);
>>
>> set jit_above_cost= 0;
>>
>> select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
>>
>> without the ROLLUP, I don't see any problem with results.
>
> Interesting.  I've opened an open item referencing this.

Hi,

Just out of curiosity, what exactly is wrong with the output of this query? I
see the same results with jit_above_cost = 0 and with the default value:

=# show jit_above_cost;
 jit_above_cost
----------------
 100000
(1 row)

=# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
 max
------
 7369
 7499
 7499
(3 rows)

=# set jit_above_cost = 0;
SET
=# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
 max
------
 7369
 7499
 7499
(3 rows)

And as far as I understand it's totally correct, since we do rollup by just two
values and have one more row as a total (with NULLs):

=# select max(epno), deptno, epno
   from emp group by rollup((deptno,epno)) order by 1 asc;

 max  | deptno | epno
------+--------+------
 7369 |     20 | 7369
 7499 |   NULL | NULL
 7499 |     30 | 7499
(3 rows)

Re: wrong query result with jit_above_cost= 0

From

Andres Freund

Date:

26 June 2018, 23:11:37

On 2018-06-26 22:09:10 +0200, Dmitry Dolgov wrote:
> > On 26 June 2018 at 20:23, Andres Freund <andres@anarazel.de> wrote:
> > Hi,
> >
> > On 2018-06-26 23:50:32 +0530, Rushabh Lathia wrote:
> >> I found the below query which returns the wrong output
> >> when jit_above_cost= 0 is set.
> >>
> >> Steps to reproduce:
> >>
> >> CREATE TABLE emp (
> >>     epno           NUMERIC(4),
> >>     ename           VARCHAR(10),
> >>     job             VARCHAR(9),
> >>     mgr             NUMERIC(4),
> >>     hiredate        DATE,
> >>     sal             NUMERIC(7,2),
> >>     comm            NUMERIC(7,2),
> >>     deptno          NUMERIC(2)
> >> );
> >>
> >> INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20);
> >> INSERT INTO emp VALUES (7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30);
> >>
> >> set jit_above_cost= 0;
> >>
> >> select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
> >>
> >> without the ROLLUP, I don't see any problem with results.
> >
> > Interesting.  I've opened an open item referencing this.
> 
> Hi,
> 
> Just out of curiosity, what exactly is wrong with the output of this query? I
> see the same results with jit_above_cost = 0 and with the default value:
> 
> =# show jit_above_cost;
>  jit_above_cost
> ----------------
>  100000
> (1 row)
> 
> =# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
>  max
> ------
>  7369
>  7499
>  7499
> (3 rows)
> 
> =# set jit_above_cost = 0;
> SET
> =# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
>  max
> ------
>  7369
>  7499
>  7499
> (3 rows)
> 
> And as far as I understand it's totally correct, since we do rollup by just two
> values and have one more row as a total (with NULLs):
> 
> =# select max(epno), deptno, epno
>    from emp group by rollup((deptno,epno)) order by 1 asc;
> 
>  max  | deptno | epno
> ------+--------+------
>  7369 |     20 | 7369
>  7499 |   NULL | NULL
>  7499 |     30 | 7499
> (3 rows)

I've not reproduced the problem yet (I'm deep in a review / edit of
another patchset). Could it be that you've not compiled with JIT
support and thus don't see the problem Rushab was complaining about?
SELECT pg_jit_available();

Greetings,

Andres Freund

Re: wrong query result with jit_above_cost= 0

From

Dmitry Dolgov

Date:

26 June 2018, 23:21:19

> On 26 June 2018 at 22:11, Andres Freund <andres@anarazel.de> wrote:
> On 2018-06-26 22:09:10 +0200, Dmitry Dolgov wrote:
>> > On 26 June 2018 at 20:23, Andres Freund <andres@anarazel.de> wrote:
>> > Hi,
>> >
>> > On 2018-06-26 23:50:32 +0530, Rushabh Lathia wrote:
>> >> I found the below query which returns the wrong output
>> >> when jit_above_cost= 0 is set.
>> >>
>> >> Steps to reproduce:
>> >>
>> >> CREATE TABLE emp (
>> >>     epno           NUMERIC(4),
>> >>     ename           VARCHAR(10),
>> >>     job             VARCHAR(9),
>> >>     mgr             NUMERIC(4),
>> >>     hiredate        DATE,
>> >>     sal             NUMERIC(7,2),
>> >>     comm            NUMERIC(7,2),
>> >>     deptno          NUMERIC(2)
>> >> );
>> >>
>> >> INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20);
>> >> INSERT INTO emp VALUES (7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30);
>> >>
>> >> set jit_above_cost= 0;
>> >>
>> >> select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
>> >>
>> >> without the ROLLUP, I don't see any problem with results.
>> >
>> > Interesting.  I've opened an open item referencing this.
>>
>> Hi,
>>
>> Just out of curiosity, what exactly is wrong with the output of this query? I
>> see the same results with jit_above_cost = 0 and with the default value:
>>
>> =# show jit_above_cost;
>>  jit_above_cost
>> ----------------
>>  100000
>> (1 row)
>>
>> =# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
>>  max
>> ------
>>  7369
>>  7499
>>  7499
>> (3 rows)
>>
>> =# set jit_above_cost = 0;
>> SET
>> =# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
>>  max
>> ------
>>  7369
>>  7499
>>  7499
>> (3 rows)
>>
>> And as far as I understand it's totally correct, since we do rollup by just two
>> values and have one more row as a total (with NULLs):
>>
>> =# select max(epno), deptno, epno
>>    from emp group by rollup((deptno,epno)) order by 1 asc;
>>
>>  max  | deptno | epno
>> ------+--------+------
>>  7369 |     20 | 7369
>>  7499 |   NULL | NULL
>>  7499 |     30 | 7499
>> (3 rows)
>
> I've not reproduced the problem yet (I'm deep in a review / edit of
> another patchset). Could it be that you've not compiled with JIT
> support and thus don't see the problem Rushab was complaining about?
> SELECT pg_jit_available();

Yep, my bad, forgot to turn it on. Now I see what's the problem, one of the
null fields is screwed up, will try to figure out why is that.

Re: wrong query result with jit_above_cost= 0

From

Andrew Gierth

Date:

26 June 2018, 23:55:07

>>>>> "Dmitry" == Dmitry Dolgov <9erthalion6@gmail.com> writes:

 Dmitry> Yep, my bad, forgot to turn it on. Now I see what's the
 Dmitry> problem, one of the null fields is screwed up, will try to
 Dmitry> figure out why is that.

The handling of nulls in grouping set results is a bit icky, see
prepare_projection_slot in nodeAgg.c. The comment there lists a number
of assumptions which may or may not hold true under JIT which might give
a starting point to look for problems. (Unfortunately I'm not currently
in a position to test on a JIT build)

(This wasn't my first choice of approach; originally I had a GroupedVar
node that evaluated either as the Var would or to NULL instead, and put
that into the projection. However there was a lot of resistance to this
approach at the time, and the new node idea was abandoned in favour of
poking directly into the slot.)

-- 
Andrew (irc:RhodiumToad)

Re: wrong query result with jit_above_cost= 0

From

Andres Freund

Date:

26 June 2018, 23:56:43

On 2018-06-26 21:55:07 +0100, Andrew Gierth wrote:
> >>>>> "Dmitry" == Dmitry Dolgov <9erthalion6@gmail.com> writes:
> 
>  Dmitry> Yep, my bad, forgot to turn it on. Now I see what's the
>  Dmitry> problem, one of the null fields is screwed up, will try to
>  Dmitry> figure out why is that.
> 
> The handling of nulls in grouping set results is a bit icky, see
> prepare_projection_slot in nodeAgg.c. The comment there lists a number
> of assumptions which may or may not hold true under JIT which might give
> a starting point to look for problems. (Unfortunately I'm not currently
> in a position to test on a JIT build)

I probably just screwed up a bit of code generation. I can't see any of
the more fundamental assumptions being changed by the way JITing is
done.

Greetings,

Andres Freund

Re: wrong query result with jit_above_cost= 0

From

Dmitry Dolgov

Date:

27 June 2018, 18:49:39

> On 26 June 2018 at 22:56, Andres Freund <andres@anarazel.de> wrote:
> On 2018-06-26 21:55:07 +0100, Andrew Gierth wrote:
>> >>>>> "Dmitry" == Dmitry Dolgov <9erthalion6@gmail.com> writes:
>>
>>  Dmitry> Yep, my bad, forgot to turn it on. Now I see what's the
>>  Dmitry> problem, one of the null fields is screwed up, will try to
>>  Dmitry> figure out why is that.
>>
>> The handling of nulls in grouping set results is a bit icky, see
>> prepare_projection_slot in nodeAgg.c. The comment there lists a number
>> of assumptions which may or may not hold true under JIT which might give
>> a starting point to look for problems. (Unfortunately I'm not currently
>> in a position to test on a JIT build)
>
> I probably just screwed up a bit of code generation. I can't see any of
> the more fundamental assumptions being changed by the way JITing is
> done.

So far I found out that in agg_retrieve_hash_table, when there is a scan for
TupleHashEntryData, that contains AggStatePerGroup structure in the field
"additional", it's possible to get some garbage data (or at least transValue is
lost). It happens when we do:

ReScanExprContext(aggstate->aggcontexts[i]);

in agg_retrieve_direct before that. Apparently, the reason is that in the jit
code there is a store operation for curaggcontext into aggcontext:

v_aggcontext = l_ptr_const(op->d.agg_trans.aggcontext,
                           l_ptr(StructExprContext));

/* set aggstate globals */
LLVMBuildStore(b, v_aggcontext, v_curaggcontext);

I haven't found anything similar in the original code or in the other branches
for aggregation logic. I can't say that I fully understand the idea behind it,
but at least it was suspicious for me. When I removed this operation, the
problem has disappeared.

Re: wrong query result with jit_above_cost= 0

From

Dmitry Dolgov

Date:

07 July 2018, 22:41:05

> On Wed, 27 Jun 2018 at 17:49, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> > On 26 June 2018 at 22:56, Andres Freund <andres@anarazel.de> wrote:
> > On 2018-06-26 21:55:07 +0100, Andrew Gierth wrote:
> >> >>>>> "Dmitry" == Dmitry Dolgov <9erthalion6@gmail.com> writes:
> >>
> >>  Dmitry> Yep, my bad, forgot to turn it on. Now I see what's the
> >>  Dmitry> problem, one of the null fields is screwed up, will try to
> >>  Dmitry> figure out why is that.
> >>
> >> The handling of nulls in grouping set results is a bit icky, see
> >> prepare_projection_slot in nodeAgg.c. The comment there lists a number
> >> of assumptions which may or may not hold true under JIT which might give
> >> a starting point to look for problems. (Unfortunately I'm not currently
> >> in a position to test on a JIT build)
> >
> > I probably just screwed up a bit of code generation. I can't see any of
> > the more fundamental assumptions being changed by the way JITing is
> > done.
>
> So far I found out that in agg_retrieve_hash_table, when there is a scan for
> TupleHashEntryData, that contains AggStatePerGroup structure in the field
> "additional", it's possible to get some garbage data (or at least transValue is
> lost). It happens when we do:
>
> ReScanExprContext(aggstate->aggcontexts[i]);
>
> in agg_retrieve_direct before that. Apparently, the reason is that in the jit
> code there is a store operation for curaggcontext into aggcontext:
>
> v_aggcontext = l_ptr_const(op->d.agg_trans.aggcontext,
>                            l_ptr(StructExprContext));
>
> /* set aggstate globals */
> LLVMBuildStore(b, v_aggcontext, v_curaggcontext);
>
> I haven't found anything similar in the original code or in the other branches
> for aggregation logic. I can't say that I fully understand the idea behind it,
> but at least it was suspicious for me. When I removed this operation, the
> problem has disappeared.

Ok, looks like I found the issue. v_aggcontext & v_curaggcontext have nothing
to do here (this change was just masking a problem by changing a memory context
so that the wrong one will never be used). The problem was that in the
llvmjit_expr in AGG_INIT_TRANS section we need to assign a current memory
context from op->d.agg_init_trans.aggcontext (see the attached patch),
otherwise we'll get in the situation when current memory context is hashcontext
instead of aggcontext.

Also, I found one suspicious thing, in AGG_PLAIN_TRANS section we don't
switch the memory context back in the branch with ExecAggTransReparent. I
never found any consequences of that, but just in case I believe it makes sense
to do so.

And the last thing - where can I find a documentation about how to properly
apply patches for GDB & perf support to llvm? I remember they were posted here,
and found some of them here [1] from Andres, but apparently part of them was
already applied on top of llvm. Looks like for the gdb support I need to apply
0006-ORC-JIT-event-listener-support (since there is a gdb listener mentioned
there), but with this patch I have an error:

    error: ‘ObjHandleT’ was not declared in this scope

So I'm confused how should it be?

[1]: https://www.postgresql.org/message-id/20171005065739.dgsplipwkpmrkspg%40alap3.anarazel.de

Attachment

Re: wrong query result with jit_above_cost= 0

From

Andres Freund

Date:

23 July 2018, 03:09:37

Hi,

On 2018-07-07 21:41:05 +0200, Dmitry Dolgov wrote:
> Ok, looks like I found the issue. v_aggcontext & v_curaggcontext have nothing
> to do here (this change was just masking a problem by changing a memory context
> so that the wrong one will never be used). The problem was that in the
> llvmjit_expr in AGG_INIT_TRANS section we need to assign a current memory
> context from op->d.agg_init_trans.aggcontext (see the attached patch),
> otherwise we'll get in the situation when current memory context is hashcontext
> instead of aggcontext.

Nice catch! I pushed your fix, but I also made it set current_set.


> Also, I found one suspicious thing, in AGG_PLAIN_TRANS section we don't
> switch the memory context back in the branch with ExecAggTransReparent. I
> never found any consequences of that, but just in case I believe it makes sense
> to do so.

I'll look at that next.


> And the last thing - where can I find a documentation about how to properly
> apply patches for GDB & perf support to llvm? I remember they were posted here,
> and found some of them here [1] from Andres, but apparently part of them was
> already applied on top of llvm. Looks like for the gdb support I need to apply
> 0006-ORC-JIT-event-listener-support (since there is a gdb listener mentioned
> there), but with this patch I have an error:
> 
>     error: ‘ObjHandleT’ was not declared in this scope
> 
> So I'm confused how should it be?

I've merged the GDB part into LLVM, and am about to merge the perf part
too. I plan to push a fix to PG adapting it to use the agreed upon /
merged LLVM APIs. Then you'll just need a recent LLVM
checkout. Unfortunately the relevant LLVM internal APIs have changed
quite rapidly over the last few releases (and a lot within individual
releases), so it's not easy to provide a patch for the individual versions.

Greetings,

Andres Freund

Re: wrong query result with jit_above_cost= 0

From

Andres Freund

Date:

23 July 2018, 07:24:27

Hi,

On 2018-07-22 17:09:37 -0700, Andres Freund wrote:
> > Also, I found one suspicious thing, in AGG_PLAIN_TRANS section we don't
> > switch the memory context back in the branch with ExecAggTransReparent. I
> > never found any consequences of that, but just in case I believe it makes sense
> > to do so.
> 
> I'll look at that next.

That's indeed necessary. Pushed. Thanks!

> 
> > And the last thing - where can I find a documentation about how to properly
> > apply patches for GDB & perf support to llvm? I remember they were posted here,
> > and found some of them here [1] from Andres, but apparently part of them was
> > already applied on top of llvm. Looks like for the gdb support I need to apply
> > 0006-ORC-JIT-event-listener-support (since there is a gdb listener mentioned
> > there), but with this patch I have an error:
> > 
> >     error: ‘ObjHandleT’ was not declared in this scope
> > 
> > So I'm confused how should it be?
> 
> I've merged the GDB part into LLVM, and am about to merge the perf part
> too. I plan to push a fix to PG adapting it to use the agreed upon /
> merged LLVM APIs. Then you'll just need a recent LLVM
> checkout. Unfortunately the relevant LLVM internal APIs have changed
> quite rapidly over the last few releases (and a lot within individual
> releases), so it's not easy to provide a patch for the individual versions.

Adapted the PG bits.

Greetings,

Andres Freund