Re: Unclear guarantees about sort order on https://www.postgresql.org/docs/current/queries-order.html - Mailing list pgsql-docs

From David G. Johnston
Subject Re: Unclear guarantees about sort order on https://www.postgresql.org/docs/current/queries-order.html
Date
Msg-id CAKFQuwYzD3S3VKx_6fM6tqw88NQnJJaPMnNiM90mnCFqDxJYCg@mail.gmail.com
Whole thread Raw
In response to Re: Unclear guarantees about sort order on https://www.postgresql.org/docs/current/queries-order.html  (Erik Wienhold <ewie@ewie.name>)
List pgsql-docs
On Wed, Oct 4, 2023 at 6:37 PM Erik Wienhold <ewie@ewie.name> wrote:
On 2023-10-04 16:24 +0200, PG Doc comments form write:
> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/16/queries-order.html
> Description:
>
> The document only says this about unsorted queries:
>
> > After a query has produced an output table (after the select list has been
> > processed) it can optionally be sorted. If sorting is not chosen, the rows
> > will be returned in an unspecified order. The actual order in that case will
> > depend on the scan and join plan types and the order on disk, but it must
> > not be relied on. A particular output ordering can only be guaranteed if the
> > sort step is explicitly chosen.
>
> It mentions "If sorting is not chosen". This sort of implies that if you
> pick a sort the output order is predictable. However I believe that the only
> actual guarantee is if the sort columns selected produce a unique value.
>
> For example if you do `ORDER BY name` and have two rows with the same name I
> don't think the order of those rows is predictable.

"The relative ordering of two rows that are not distinct with respect to
 the <sort specification> is implementation-dependent."

The OP is assuming a promise of a deterministic ordering of all output rows and such a promise is only possible if the order by clause columns uniquely identify every row in the output.  This is because all the order by promises is that output ordering will conform to the order by specification, and indeed if it is under-specified such that multiple rows match a given bin, then there is no deterministic relative ordering among those rows.

I don't feel that the wording makes any such inference regarding determinism of row output due to the mere presence of an order by clause.  Nor doesn't such determinism in the face of an under-specific clause even make logical sense.  I'm mostly inclined to leave the wording alone given this single report.  My only complaints are style-istic at this point.

That said, maybe a final sentence:

Assuming every output row can be uniquely identified by some subset of the output columns, that subset must all be listed within the order by clause if you wish to ensure a fully deterministic ordering.

David J.

pgsql-docs by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: This is too implicit that recovery.signal will be removed
Next
From: "David G. Johnston"
Date:
Subject: Re: It is not clear from documentation when and how I should restore base backup