Thread: wrong query result with jit_above_cost= 0
Hi,
I found the below query which returns the wrong outputwhen jit_above_cost= 0 is set.
CREATE TABLE emp (
epno NUMERIC(4),
ename VARCHAR(10),
job VARCHAR(9),
mgr NUMERIC(4),
hiredate DATE,
sal NUMERIC(7,2),
comm NUMERIC(7,2),
deptno NUMERIC(2)
);
INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20);
INSERT INTO emp VALUES (7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30);
set jit_above_cost= 0;
select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
Hi, On 2018-06-26 23:50:32 +0530, Rushabh Lathia wrote: > I found the below query which returns the wrong output > when jit_above_cost= 0 is set. > > Steps to reproduce: > > CREATE TABLE emp ( > epno NUMERIC(4), > ename VARCHAR(10), > job VARCHAR(9), > mgr NUMERIC(4), > hiredate DATE, > sal NUMERIC(7,2), > comm NUMERIC(7,2), > deptno NUMERIC(2) > ); > > INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20); > INSERT INTO emp VALUES > (7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30); > > set jit_above_cost= 0; > > select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc; > > without the ROLLUP, I don't see any problem with results. Interesting. I've opened an open item referencing this. Greetings, Andres Freund
> On 26 June 2018 at 20:23, Andres Freund <andres@anarazel.de> wrote: > Hi, > > On 2018-06-26 23:50:32 +0530, Rushabh Lathia wrote: >> I found the below query which returns the wrong output >> when jit_above_cost= 0 is set. >> >> Steps to reproduce: >> >> CREATE TABLE emp ( >> epno NUMERIC(4), >> ename VARCHAR(10), >> job VARCHAR(9), >> mgr NUMERIC(4), >> hiredate DATE, >> sal NUMERIC(7,2), >> comm NUMERIC(7,2), >> deptno NUMERIC(2) >> ); >> >> INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20); >> INSERT INTO emp VALUES (7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30); >> >> set jit_above_cost= 0; >> >> select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc; >> >> without the ROLLUP, I don't see any problem with results. > > Interesting. I've opened an open item referencing this. Hi, Just out of curiosity, what exactly is wrong with the output of this query? I see the same results with jit_above_cost = 0 and with the default value: =# show jit_above_cost; jit_above_cost ---------------- 100000 (1 row) =# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc; max ------ 7369 7499 7499 (3 rows) =# set jit_above_cost = 0; SET =# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc; max ------ 7369 7499 7499 (3 rows) And as far as I understand it's totally correct, since we do rollup by just two values and have one more row as a total (with NULLs): =# select max(epno), deptno, epno from emp group by rollup((deptno,epno)) order by 1 asc; max | deptno | epno ------+--------+------ 7369 | 20 | 7369 7499 | NULL | NULL 7499 | 30 | 7499 (3 rows)
On 2018-06-26 22:09:10 +0200, Dmitry Dolgov wrote: > > On 26 June 2018 at 20:23, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > > > On 2018-06-26 23:50:32 +0530, Rushabh Lathia wrote: > >> I found the below query which returns the wrong output > >> when jit_above_cost= 0 is set. > >> > >> Steps to reproduce: > >> > >> CREATE TABLE emp ( > >> epno NUMERIC(4), > >> ename VARCHAR(10), > >> job VARCHAR(9), > >> mgr NUMERIC(4), > >> hiredate DATE, > >> sal NUMERIC(7,2), > >> comm NUMERIC(7,2), > >> deptno NUMERIC(2) > >> ); > >> > >> INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20); > >> INSERT INTO emp VALUES (7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30); > >> > >> set jit_above_cost= 0; > >> > >> select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc; > >> > >> without the ROLLUP, I don't see any problem with results. > > > > Interesting. I've opened an open item referencing this. > > Hi, > > Just out of curiosity, what exactly is wrong with the output of this query? I > see the same results with jit_above_cost = 0 and with the default value: > > =# show jit_above_cost; > jit_above_cost > ---------------- > 100000 > (1 row) > > =# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc; > max > ------ > 7369 > 7499 > 7499 > (3 rows) > > =# set jit_above_cost = 0; > SET > =# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc; > max > ------ > 7369 > 7499 > 7499 > (3 rows) > > And as far as I understand it's totally correct, since we do rollup by just two > values and have one more row as a total (with NULLs): > > =# select max(epno), deptno, epno > from emp group by rollup((deptno,epno)) order by 1 asc; > > max | deptno | epno > ------+--------+------ > 7369 | 20 | 7369 > 7499 | NULL | NULL > 7499 | 30 | 7499 > (3 rows) I've not reproduced the problem yet (I'm deep in a review / edit of another patchset). Could it be that you've not compiled with JIT support and thus don't see the problem Rushab was complaining about? SELECT pg_jit_available(); Greetings, Andres Freund
> On 26 June 2018 at 22:11, Andres Freund <andres@anarazel.de> wrote: > On 2018-06-26 22:09:10 +0200, Dmitry Dolgov wrote: >> > On 26 June 2018 at 20:23, Andres Freund <andres@anarazel.de> wrote: >> > Hi, >> > >> > On 2018-06-26 23:50:32 +0530, Rushabh Lathia wrote: >> >> I found the below query which returns the wrong output >> >> when jit_above_cost= 0 is set. >> >> >> >> Steps to reproduce: >> >> >> >> CREATE TABLE emp ( >> >> epno NUMERIC(4), >> >> ename VARCHAR(10), >> >> job VARCHAR(9), >> >> mgr NUMERIC(4), >> >> hiredate DATE, >> >> sal NUMERIC(7,2), >> >> comm NUMERIC(7,2), >> >> deptno NUMERIC(2) >> >> ); >> >> >> >> INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20); >> >> INSERT INTO emp VALUES (7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30); >> >> >> >> set jit_above_cost= 0; >> >> >> >> select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc; >> >> >> >> without the ROLLUP, I don't see any problem with results. >> > >> > Interesting. I've opened an open item referencing this. >> >> Hi, >> >> Just out of curiosity, what exactly is wrong with the output of this query? I >> see the same results with jit_above_cost = 0 and with the default value: >> >> =# show jit_above_cost; >> jit_above_cost >> ---------------- >> 100000 >> (1 row) >> >> =# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc; >> max >> ------ >> 7369 >> 7499 >> 7499 >> (3 rows) >> >> =# set jit_above_cost = 0; >> SET >> =# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc; >> max >> ------ >> 7369 >> 7499 >> 7499 >> (3 rows) >> >> And as far as I understand it's totally correct, since we do rollup by just two >> values and have one more row as a total (with NULLs): >> >> =# select max(epno), deptno, epno >> from emp group by rollup((deptno,epno)) order by 1 asc; >> >> max | deptno | epno >> ------+--------+------ >> 7369 | 20 | 7369 >> 7499 | NULL | NULL >> 7499 | 30 | 7499 >> (3 rows) > > I've not reproduced the problem yet (I'm deep in a review / edit of > another patchset). Could it be that you've not compiled with JIT > support and thus don't see the problem Rushab was complaining about? > SELECT pg_jit_available(); Yep, my bad, forgot to turn it on. Now I see what's the problem, one of the null fields is screwed up, will try to figure out why is that.
>>>>> "Dmitry" == Dmitry Dolgov <9erthalion6@gmail.com> writes: Dmitry> Yep, my bad, forgot to turn it on. Now I see what's the Dmitry> problem, one of the null fields is screwed up, will try to Dmitry> figure out why is that. The handling of nulls in grouping set results is a bit icky, see prepare_projection_slot in nodeAgg.c. The comment there lists a number of assumptions which may or may not hold true under JIT which might give a starting point to look for problems. (Unfortunately I'm not currently in a position to test on a JIT build) (This wasn't my first choice of approach; originally I had a GroupedVar node that evaluated either as the Var would or to NULL instead, and put that into the projection. However there was a lot of resistance to this approach at the time, and the new node idea was abandoned in favour of poking directly into the slot.) -- Andrew (irc:RhodiumToad)
On 2018-06-26 21:55:07 +0100, Andrew Gierth wrote: > >>>>> "Dmitry" == Dmitry Dolgov <9erthalion6@gmail.com> writes: > > Dmitry> Yep, my bad, forgot to turn it on. Now I see what's the > Dmitry> problem, one of the null fields is screwed up, will try to > Dmitry> figure out why is that. > > The handling of nulls in grouping set results is a bit icky, see > prepare_projection_slot in nodeAgg.c. The comment there lists a number > of assumptions which may or may not hold true under JIT which might give > a starting point to look for problems. (Unfortunately I'm not currently > in a position to test on a JIT build) I probably just screwed up a bit of code generation. I can't see any of the more fundamental assumptions being changed by the way JITing is done. Greetings, Andres Freund
> On 26 June 2018 at 22:56, Andres Freund <andres@anarazel.de> wrote: > On 2018-06-26 21:55:07 +0100, Andrew Gierth wrote: >> >>>>> "Dmitry" == Dmitry Dolgov <9erthalion6@gmail.com> writes: >> >> Dmitry> Yep, my bad, forgot to turn it on. Now I see what's the >> Dmitry> problem, one of the null fields is screwed up, will try to >> Dmitry> figure out why is that. >> >> The handling of nulls in grouping set results is a bit icky, see >> prepare_projection_slot in nodeAgg.c. The comment there lists a number >> of assumptions which may or may not hold true under JIT which might give >> a starting point to look for problems. (Unfortunately I'm not currently >> in a position to test on a JIT build) > > I probably just screwed up a bit of code generation. I can't see any of > the more fundamental assumptions being changed by the way JITing is > done. So far I found out that in agg_retrieve_hash_table, when there is a scan for TupleHashEntryData, that contains AggStatePerGroup structure in the field "additional", it's possible to get some garbage data (or at least transValue is lost). It happens when we do: ReScanExprContext(aggstate->aggcontexts[i]); in agg_retrieve_direct before that. Apparently, the reason is that in the jit code there is a store operation for curaggcontext into aggcontext: v_aggcontext = l_ptr_const(op->d.agg_trans.aggcontext, l_ptr(StructExprContext)); /* set aggstate globals */ LLVMBuildStore(b, v_aggcontext, v_curaggcontext); I haven't found anything similar in the original code or in the other branches for aggregation logic. I can't say that I fully understand the idea behind it, but at least it was suspicious for me. When I removed this operation, the problem has disappeared.
> On Wed, 27 Jun 2018 at 17:49, Dmitry Dolgov <9erthalion6@gmail.com> wrote: > > > On 26 June 2018 at 22:56, Andres Freund <andres@anarazel.de> wrote: > > On 2018-06-26 21:55:07 +0100, Andrew Gierth wrote: > >> >>>>> "Dmitry" == Dmitry Dolgov <9erthalion6@gmail.com> writes: > >> > >> Dmitry> Yep, my bad, forgot to turn it on. Now I see what's the > >> Dmitry> problem, one of the null fields is screwed up, will try to > >> Dmitry> figure out why is that. > >> > >> The handling of nulls in grouping set results is a bit icky, see > >> prepare_projection_slot in nodeAgg.c. The comment there lists a number > >> of assumptions which may or may not hold true under JIT which might give > >> a starting point to look for problems. (Unfortunately I'm not currently > >> in a position to test on a JIT build) > > > > I probably just screwed up a bit of code generation. I can't see any of > > the more fundamental assumptions being changed by the way JITing is > > done. > > So far I found out that in agg_retrieve_hash_table, when there is a scan for > TupleHashEntryData, that contains AggStatePerGroup structure in the field > "additional", it's possible to get some garbage data (or at least transValue is > lost). It happens when we do: > > ReScanExprContext(aggstate->aggcontexts[i]); > > in agg_retrieve_direct before that. Apparently, the reason is that in the jit > code there is a store operation for curaggcontext into aggcontext: > > v_aggcontext = l_ptr_const(op->d.agg_trans.aggcontext, > l_ptr(StructExprContext)); > > /* set aggstate globals */ > LLVMBuildStore(b, v_aggcontext, v_curaggcontext); > > I haven't found anything similar in the original code or in the other branches > for aggregation logic. I can't say that I fully understand the idea behind it, > but at least it was suspicious for me. When I removed this operation, the > problem has disappeared. Ok, looks like I found the issue. v_aggcontext & v_curaggcontext have nothing to do here (this change was just masking a problem by changing a memory context so that the wrong one will never be used). The problem was that in the llvmjit_expr in AGG_INIT_TRANS section we need to assign a current memory context from op->d.agg_init_trans.aggcontext (see the attached patch), otherwise we'll get in the situation when current memory context is hashcontext instead of aggcontext. Also, I found one suspicious thing, in AGG_PLAIN_TRANS section we don't switch the memory context back in the branch with ExecAggTransReparent. I never found any consequences of that, but just in case I believe it makes sense to do so. And the last thing - where can I find a documentation about how to properly apply patches for GDB & perf support to llvm? I remember they were posted here, and found some of them here [1] from Andres, but apparently part of them was already applied on top of llvm. Looks like for the gdb support I need to apply 0006-ORC-JIT-event-listener-support (since there is a gdb listener mentioned there), but with this patch I have an error: error: ‘ObjHandleT’ was not declared in this scope So I'm confused how should it be? [1]: https://www.postgresql.org/message-id/20171005065739.dgsplipwkpmrkspg%40alap3.anarazel.de
Attachment
Hi, On 2018-07-07 21:41:05 +0200, Dmitry Dolgov wrote: > Ok, looks like I found the issue. v_aggcontext & v_curaggcontext have nothing > to do here (this change was just masking a problem by changing a memory context > so that the wrong one will never be used). The problem was that in the > llvmjit_expr in AGG_INIT_TRANS section we need to assign a current memory > context from op->d.agg_init_trans.aggcontext (see the attached patch), > otherwise we'll get in the situation when current memory context is hashcontext > instead of aggcontext. Nice catch! I pushed your fix, but I also made it set current_set. > Also, I found one suspicious thing, in AGG_PLAIN_TRANS section we don't > switch the memory context back in the branch with ExecAggTransReparent. I > never found any consequences of that, but just in case I believe it makes sense > to do so. I'll look at that next. > And the last thing - where can I find a documentation about how to properly > apply patches for GDB & perf support to llvm? I remember they were posted here, > and found some of them here [1] from Andres, but apparently part of them was > already applied on top of llvm. Looks like for the gdb support I need to apply > 0006-ORC-JIT-event-listener-support (since there is a gdb listener mentioned > there), but with this patch I have an error: > > error: ‘ObjHandleT’ was not declared in this scope > > So I'm confused how should it be? I've merged the GDB part into LLVM, and am about to merge the perf part too. I plan to push a fix to PG adapting it to use the agreed upon / merged LLVM APIs. Then you'll just need a recent LLVM checkout. Unfortunately the relevant LLVM internal APIs have changed quite rapidly over the last few releases (and a lot within individual releases), so it's not easy to provide a patch for the individual versions. Greetings, Andres Freund
Hi, On 2018-07-22 17:09:37 -0700, Andres Freund wrote: > > Also, I found one suspicious thing, in AGG_PLAIN_TRANS section we don't > > switch the memory context back in the branch with ExecAggTransReparent. I > > never found any consequences of that, but just in case I believe it makes sense > > to do so. > > I'll look at that next. That's indeed necessary. Pushed. Thanks! > > > And the last thing - where can I find a documentation about how to properly > > apply patches for GDB & perf support to llvm? I remember they were posted here, > > and found some of them here [1] from Andres, but apparently part of them was > > already applied on top of llvm. Looks like for the gdb support I need to apply > > 0006-ORC-JIT-event-listener-support (since there is a gdb listener mentioned > > there), but with this patch I have an error: > > > > error: ‘ObjHandleT’ was not declared in this scope > > > > So I'm confused how should it be? > > I've merged the GDB part into LLVM, and am about to merge the perf part > too. I plan to push a fix to PG adapting it to use the agreed upon / > merged LLVM APIs. Then you'll just need a recent LLVM > checkout. Unfortunately the relevant LLVM internal APIs have changed > quite rapidly over the last few releases (and a lot within individual > releases), so it's not easy to provide a patch for the individual versions. Adapted the PG bits. Greetings, Andres Freund