Re: MergeAppend could consider sorting cheapest child path - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: MergeAppend could consider sorting cheapest child path
Date
Msg-id CAPpHfdvHzZqTGtVi-0mwfs8sug0qOsrv3axNQux4j2M8NFuFkQ@mail.gmail.com
In response to Re: MergeAppend could consider sorting cheapest child path  (Andrei Lepikhov <lepihov@gmail.com>)
Responses Re: MergeAppend could consider sorting cheapest child path
List pgsql-hackers
On Tue, Jun 3, 2025 at 4:53 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
> On 3/6/2025 15:38, Alexander Korotkov wrote:
> > On Tue, Jun 3, 2025 at 4:23 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
> >> To establish a stable foundation for discussion, I conducted simple
> >> tests - see, for example, a couple of queries in the attachment. As I
> >> see it, Sort->Append works faster: in my test bench, it takes 1250ms on
> >> average versus 1430ms, and it also has lower costs - the same for data
> >> with and without massive numbers of duplicates. Playing with sizes of
> >> inputs, I see the same behaviour.
> >
> > I ran your tests.  For the Sort(Append()) case I got actual
> > time=811.047..842.473.  For the MergeAppend case I got
> > actual time=723.678..967.004.  That looks interesting.  At some point
> > we probably should teach our Sort node to start returning tuples before
> > finishing the last merge stage.
> >
> > However, I think the costs do not match the timing.  Our cost model
> > predicts that the startup cost of MergeAppend is less than the startup
> > cost of Sort(Append()).  And that's correct.  However, in fact the total
> > time of MergeAppend is bigger than the total time of Sort(Append()).  The
> > differences in these two cases are comparable.  I think we need to
> > adjust our cost_sort() to reflect that.
> Could you explain your idea? As I see it (and have shown in the previous
> message), the total cost of the Sort->Append is lower than
> MergeAppend->Sort.
> Additionally, as I mentioned earlier, the primary reason for choosing
> MergeAppend in the regression test was a slight total cost difference
> that triggered the startup cost comparison.
> Could you show the query, and its EXPLAIN output, that is a concern for
> you?

My point is that the difference in total cost is very small.  For small
datasets it could even be within the fuzzy limit.  However, in
practice the difference in total time is as big as the difference in
startup time.  So, it would be good to make the total cost difference bigger.
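
To illustrate why a small total cost difference matters, here is a minimal
Python sketch of a fuzzy cost comparison, loosely modeled on the planner's
compare_path_costs_fuzzily() with a 1% fuzz factor.  It is a simplification:
it ignores pathkeys, row counts and parallel safety, and it always takes
startup cost into account.  The function name and all cost numbers are
made up for illustration and do not come from the test queries.

```python
# Assumed to mirror the 1% fuzz factor used when comparing path costs.
STD_FUZZ_FACTOR = 1.01


def compare_costs_fuzzily(first, second, fuzz=STD_FUZZ_FACTOR):
    """Compare two (startup_cost, total_cost) pairs.

    Returns "first" or "second" if one pair dominates, "different" if
    each wins on one dimension, and "equal" if both are within the fuzz.
    """
    first_startup, first_total = first
    second_startup, second_total = second

    if first_total > second_total * fuzz:
        # first is fuzzily worse on total cost
        if second_startup > first_startup * fuzz:
            return "different"  # but first is fuzzily better on startup
        return "second"
    if second_total > first_total * fuzz:
        # second is fuzzily worse on total cost
        if first_startup > second_startup * fuzz:
            return "different"  # but second is fuzzily better on startup
        return "first"

    # Total costs are fuzzily the same, so startup cost decides.
    if first_startup > second_startup * fuzz:
        return "second"
    if second_startup > first_startup * fuzz:
        return "first"
    return "equal"


# Hypothetical (startup_cost, total_cost) pairs, for illustration only.
sort_append = (1000.0, 1100.0)
merge_append_close = (200.0, 1105.0)  # total within 1% of sort_append
merge_append_far = (200.0, 1300.0)    # total clearly above sort_append

print(compare_costs_fuzzily(sort_append, merge_append_close))
print(compare_costs_fuzzily(sort_append, merge_append_far))
```

With the close totals, the comparison falls through to startup cost and
MergeAppend wins outright even though its total cost is slightly higher.
With the larger total cost gap, neither path dominates, so both would
survive and the choice would depend on whether total or startup cost
matters to the caller.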

------
Regards,
Alexander Korotkov
Supabase


