Thread: [Question] Window Function Results without ORDER BY Clause
Hi,
I am reaching out to discuss the behavior of window functions in Postgres, specifically regarding the use of the
In our recent tests, we observed that the results can be unstable.
For example, when executing the following query:
SELECT sum(unique1) OVER (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING),
unique1, four
FROM tenk1
WHERE unique1 < 10;
The case is in window.sql of regression.
explain(costs off) SELECT sum(unique1) over (rows between current row and unbounded following),
unique1, four
FROM tenk1 WHERE unique1 < 10;
QUERY PLAN
--------------------------------------------------------------------
WindowAgg
Window: w1 AS (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
-> Index Scan using tenk1_unique1 on tenk1
Index Cond: (unique1 < 10)
(4 rows)
regression=# SELECT sum(unique1) over (rows between current row and unbounded following),
unique1, four
FROM tenk1 WHERE unique1 < 10;
sum | unique1 | four
-----+---------+------
45 | 0 | 0
45 | 1 | 1
44 | 2 | 2
42 | 3 | 3
39 | 4 | 0
35 | 5 | 1
30 | 6 | 2
24 | 7 | 3
17 | 8 | 0
9 | 9 | 1
(10 rows)
However, after setting
regression=# set enable_indexscan = off;
SET
regression=# SELECT sum(unique1) over (rows between current row and unbounded following),
unique1, four
FROM tenk1 WHERE unique1 < 10;
sum | unique1 | four
-----+---------+------
45 | 4 | 0
41 | 2 | 2
39 | 1 | 1
38 | 6 | 2
32 | 9 | 1
23 | 8 | 0
15 | 5 | 1
10 | 3 | 3
7 | 7 | 3
0 | 0 | 0
(10 rows)
regression=# explain(costs off) SELECT sum(unique1) over (rows between current row and unbounded following),
unique1, four
FROM tenk1 WHERE unique1 < 10;
QUERY PLAN
--------------------------------------------------------------------
WindowAgg
Window: w1 AS (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
-> Seq Scan on tenk1
Filter: (unique1 < 10)
(4 rows)
Referring to the SQL 2011 standard, it states that if
While using a window function without
So, are both result sets technically correct given the absence of an
--
Zhang Mingli
HashData
I am reaching out to discuss the behavior of window functions in Postgres, specifically regarding the use of the
OVER()
clause without an ORDER BY
specification.In our recent tests, we observed that the results can be unstable.
For example, when executing the following query:
SELECT sum(unique1) OVER (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING),
unique1, four
FROM tenk1
WHERE unique1 < 10;
The case is in window.sql of regression.
explain(costs off) SELECT sum(unique1) over (rows between current row and unbounded following),
unique1, four
FROM tenk1 WHERE unique1 < 10;
QUERY PLAN
--------------------------------------------------------------------
WindowAgg
Window: w1 AS (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
-> Index Scan using tenk1_unique1 on tenk1
Index Cond: (unique1 < 10)
(4 rows)
regression=# SELECT sum(unique1) over (rows between current row and unbounded following),
unique1, four
FROM tenk1 WHERE unique1 < 10;
sum | unique1 | four
-----+---------+------
45 | 0 | 0
45 | 1 | 1
44 | 2 | 2
42 | 3 | 3
39 | 4 | 0
35 | 5 | 1
30 | 6 | 2
24 | 7 | 3
17 | 8 | 0
9 | 9 | 1
(10 rows)
However, after setting
enable_indexscan = off
, the results changed:regression=# set enable_indexscan = off;
SET
regression=# SELECT sum(unique1) over (rows between current row and unbounded following),
unique1, four
FROM tenk1 WHERE unique1 < 10;
sum | unique1 | four
-----+---------+------
45 | 4 | 0
41 | 2 | 2
39 | 1 | 1
38 | 6 | 2
32 | 9 | 1
23 | 8 | 0
15 | 5 | 1
10 | 3 | 3
7 | 7 | 3
0 | 0 | 0
(10 rows)
regression=# explain(costs off) SELECT sum(unique1) over (rows between current row and unbounded following),
unique1, four
FROM tenk1 WHERE unique1 < 10;
QUERY PLAN
--------------------------------------------------------------------
WindowAgg
Window: w1 AS (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
-> Seq Scan on tenk1
Filter: (unique1 < 10)
(4 rows)
Referring to the SQL 2011 standard, it states that if
ORDER BY
is omitted, the order of rows in the partition is undefined. While using a window function without
ORDER BY
is valid, the resulting output seems unpredictable.So, are both result sets technically correct given the absence of an
ORDER BY
clause?--
Zhang Mingli
HashData
On Friday, July 11, 2025, Zhang Mingli <zmlpostgres@gmail.com> wrote:
Referring to the SQL 2011 standard, it states that ifORDER BY
is omitted, the order of rows in the partition is undefined.
While using a window function withoutORDER BY
is valid, the resulting output seems unpredictable.
So, are both result sets technically correct given the absence of anORDER BY
clause?
The system is behaving within the requirements of the specification. The query itself is bugged code that the query author should fix.
David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes: > On Friday, July 11, 2025, Zhang Mingli <zmlpostgres@gmail.com> wrote: >> So, are both result sets technically correct given the absence of an ORDER >> BY clause? > The system is behaving within the requirements of the specification. The > query itself is bugged code that the query author should fix. Well, it's our own regression-test query. I think the actual question being asked here is "do our regression tests need to pass under random non-default GUC settings?". I'd say no; it'd be next door to impossible to guarantee that. If this query gave unstable results in practice, we'd have noticed by now (it's been there since 2010). regards, tom lane
Hi,
David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Friday, July 11, 2025, Zhang Mingli <zmlpostgres@gmail.com> wrote:
>> So, are both result sets technically correct given the absence of an ORDER
>> BY clause?
> The system is behaving within the requirements of the specification. The
> query itself is bugged code that the query author should fix.
Well, it's our own regression-test query. I think the actual question
being asked here is "do our regression tests need to pass under random
non-default GUC settings?". I'd say no; it'd be next door to
impossible to guarantee that. If this query gave unstable results
in practice, we'd have noticed by now (it's been there since 2010).
regards, tom lane
>> So, are both result sets technically correct given the absence of an ORDER
>> BY clause?
> The system is behaving within the requirements of the specification. The
> query itself is bugged code that the query author should fix.
Well, it's our own regression-test query. I think the actual question
being asked here is "do our regression tests need to pass under random
non-default GUC settings?". I'd say no; it'd be next door to
impossible to guarantee that. If this query gave unstable results
in practice, we'd have noticed by now (it's been there since 2010).
regards, tom lane
Would like to know ..
- Any particular reason why ORDER BY clause was ignored/removed from windows function
- if by applying the
ORDER BY
clause on windows function, were the regression test results become deterministic.
Thanks in advance
Dinesh
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Friday, July 11, 2025 9:27 PM
To: David G. Johnston <david.g.johnston@gmail.com>
Cc: Zhang Mingli <zmlpostgres@gmail.com>; PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: [Question] Window Function Results without ORDER BY Clause
Sent: Friday, July 11, 2025 9:27 PM
To: David G. Johnston <david.g.johnston@gmail.com>
Cc: Zhang Mingli <zmlpostgres@gmail.com>; PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: [Question] Window Function Results without ORDER BY Clause
Caution: This email was sent from an external source. Please verify the sender’s identity before clicking links or opening attachments.
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Friday, July 11, 2025, Zhang Mingli <zmlpostgres@gmail.com> wrote:
>> So, are both result sets technically correct given the absence of an ORDER
>> BY clause?
> The system is behaving within the requirements of the specification. The
> query itself is bugged code that the query author should fix.
Well, it's our own regression-test query. I think the actual question
being asked here is "do our regression tests need to pass under random
non-default GUC settings?". I'd say no; it'd be next door to
impossible to guarantee that. If this query gave unstable results
in practice, we'd have noticed by now (it's been there since 2010).
regards, tom lane
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Friday, July 11, 2025, Zhang Mingli <zmlpostgres@gmail.com> wrote:
>> So, are both result sets technically correct given the absence of an ORDER
>> BY clause?
> The system is behaving within the requirements of the specification. The
> query itself is bugged code that the query author should fix.
Well, it's our own regression-test query. I think the actual question
being asked here is "do our regression tests need to pass under random
non-default GUC settings?". I'd say no; it'd be next door to
impossible to guarantee that. If this query gave unstable results
in practice, we'd have noticed by now (it's been there since 2010).
regards, tom lane