Re: Questionable result from lead(0) IGNORE NULLS - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: Questionable result from lead(0) IGNORE NULLS
Date
Msg-id 20251008.170007.1614484426305151178.ishii@postgresql.org
Whole thread Raw
In response to Questionable result from lead(0) IGNORE NULLS  (Tatsuo Ishii <ishii@postgresql.org>)
List pgsql-hackers
Hi Oliver,

After studying the standard more, it seems I was totally wrong.
In summary, current code is correct. Sorry for noise.

From the standard explaining lead():

> B) If OFFSET = 0 (zero), then the value of <window function> is the
> value of VE1 evaluated for the current row.
(here VE1 referes to the first argument of lead()).

So it seems we need to treat offset==0 case specially. i.e. eval VE1
and do not through away even if the result is NULL.

> I noticed a questionable result from "lead(0) IGNORE NULLS".
> 
> CREATE TEMP TABLE g(x INT, y INT);
> CREATE TABLE
> INSERT INTO g (VALUES(NULL,1),(NULL,2),(1,3));
> INSERT 0 3
> SELECT * FROM g;
>  x | y 
> ---+---
>    | 1
>    | 2
>  1 | 3
> (3 rows)
> 
> SELECT x, y, lead(x, 0) RESPECT NULLS OVER w FROM g
> WINDOW w AS (ORDER BY y);
>  x | y | lead 
> ---+---+------
>    | 1 |     
>    | 2 |     
>  1 | 3 |    1
> (3 rows)
> 
> SELECT x, y, lead(x, 0) IGNORE NULLS OVER w FROM g
> WINDOW w AS (ORDER BY y);
>  x | y | lead 
> ---+---+------
>    | 1 |     
>    | 2 |     
>  1 | 3 |    1
> (3 rows)
> 
> As you can see, "lead(x, 0) IGNORE NULLS" shows the same result as
> "lead(x, 0) RESPECT NULLS". IMO "lead(x, 0) IGNORE NULLS" should show
> something like:

No. The result above is perfectly correct.

>    if (winobj->ignore_nulls == IGNORE_NULLS && relpos != 0)

Now I understand why "relpos != 0" part is necessary.

Also, my email below was wrong.

> While working on this, I found some of window function regression
> tests using lead/lag are not quite correct. Below is some of them.
> 
> -- lead
> SELECT name,
>        orbit,
>        lead(orbit) OVER w AS lead,
>        lead(orbit) RESPECT NULLS OVER w AS lead_respect,
>        lead(orbit) IGNORE NULLS OVER w AS lead_ignore
> FROM planets
> WINDOW w AS (ORDER BY name)
> ;
>   name   | orbit | lead  | lead_respect | lead_ignore 
> ---------+-------+-------+--------------+-------------
>  earth   |       |  4332 |         4332 |        4332
>  jupiter |  4332 |       |              |          88
>  mars    |       |    88 |           88 |          88
>  mercury |    88 | 60182 |        60182 |       60182
>  neptune | 60182 | 90560 |        90560 |       90560
>  pluto   | 90560 | 24491 |        24491 |       24491
>  saturn  | 24491 |       |              |         224
>  uranus  |       |   224 |          224 |         224
>  venus   |   224 |       |              |            
>  xyzzy   |       |       |              |            
> (10 rows)
> 
> Why lead_ignore shows "4332" on the first row? Since "orbit"'s second
> non null row is orbit==88, I think lead(orbit) should return 88,
> rather than 4332 if my understanding of the SQL standard is correct.

According to the standard if OFFSET is not 0, then:
> B) Otherwise, let TX be the sequence of values that is the result of applying VE1 to each
> row of T that follows the current row and eliminating null values, ordered according
> to the window ordering of WDX.

> C) Otherwise, the value of <window function> is the m-th value of
> TX, where m = OFFSET.

Thus TX does not include current row and for the first row, and
lead(orbit) returns the second row, that is orbit==4332 because m==1.
So, current regression test expected file is correct.

Again, sorry for noise.
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



pgsql-hackers by date:

Previous
From: Jeremy Schneider
Date:
Subject: Re: sync_standbys_defined and pg_stat_replication
Next
From: "v@viktorh.net"
Date:
Subject: Re: Allow ON CONFLICT DO UPDATE to return EXCLUDED values