Re: Questionable result from lead(0) IGNORE NULLS - Mailing list pgsql-hackers
From | Tatsuo Ishii |
---|---|
Subject | Re: Questionable result from lead(0) IGNORE NULLS |
Date | |
Msg-id | 20251008.170007.1614484426305151178.ishii@postgresql.org Whole thread Raw |
In response to | Questionable result from lead(0) IGNORE NULLS (Tatsuo Ishii <ishii@postgresql.org>) |
List | pgsql-hackers |
Hi Oliver, After studying the standard more, it seems I was totally wrong. In summary, current code is correct. Sorry for noise. From the standard explaining lead(): > B) If OFFSET = 0 (zero), then the value of <window function> is the > value of VE1 evaluated for the current row. (here VE1 referes to the first argument of lead()). So it seems we need to treat offset==0 case specially. i.e. eval VE1 and do not through away even if the result is NULL. > I noticed a questionable result from "lead(0) IGNORE NULLS". > > CREATE TEMP TABLE g(x INT, y INT); > CREATE TABLE > INSERT INTO g (VALUES(NULL,1),(NULL,2),(1,3)); > INSERT 0 3 > SELECT * FROM g; > x | y > ---+--- > | 1 > | 2 > 1 | 3 > (3 rows) > > SELECT x, y, lead(x, 0) RESPECT NULLS OVER w FROM g > WINDOW w AS (ORDER BY y); > x | y | lead > ---+---+------ > | 1 | > | 2 | > 1 | 3 | 1 > (3 rows) > > SELECT x, y, lead(x, 0) IGNORE NULLS OVER w FROM g > WINDOW w AS (ORDER BY y); > x | y | lead > ---+---+------ > | 1 | > | 2 | > 1 | 3 | 1 > (3 rows) > > As you can see, "lead(x, 0) IGNORE NULLS" shows the same result as > "lead(x, 0) RESPECT NULLS". IMO "lead(x, 0) IGNORE NULLS" should show > something like: No. The result above is perfectly correct. > if (winobj->ignore_nulls == IGNORE_NULLS && relpos != 0) Now I understand why "relpos != 0" part is necessary. Also, my email below was wrong. > While working on this, I found some of window function regression > tests using lead/lag are not quite correct. Below is some of them. > > -- lead > SELECT name, > orbit, > lead(orbit) OVER w AS lead, > lead(orbit) RESPECT NULLS OVER w AS lead_respect, > lead(orbit) IGNORE NULLS OVER w AS lead_ignore > FROM planets > WINDOW w AS (ORDER BY name) > ; > name | orbit | lead | lead_respect | lead_ignore > ---------+-------+-------+--------------+------------- > earth | | 4332 | 4332 | 4332 > jupiter | 4332 | | | 88 > mars | | 88 | 88 | 88 > mercury | 88 | 60182 | 60182 | 60182 > neptune | 60182 | 90560 | 90560 | 90560 > pluto | 90560 | 24491 | 24491 | 24491 > saturn | 24491 | | | 224 > uranus | | 224 | 224 | 224 > venus | 224 | | | > xyzzy | | | | > (10 rows) > > Why lead_ignore shows "4332" on the first row? Since "orbit"'s second > non null row is orbit==88, I think lead(orbit) should return 88, > rather than 4332 if my understanding of the SQL standard is correct. According to the standard if OFFSET is not 0, then: > B) Otherwise, let TX be the sequence of values that is the result of applying VE1 to each > row of T that follows the current row and eliminating null values, ordered according > to the window ordering of WDX. > C) Otherwise, the value of <window function> is the m-th value of > TX, where m = OFFSET. Thus TX does not include current row and for the first row, and lead(orbit) returns the second row, that is orbit==4332 because m==1. So, current regression test expected file is correct. Again, sorry for noise. -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
pgsql-hackers by date: