On 3/6/2025 15:38, Alexander Korotkov wrote:
> On Tue, Jun 3, 2025 at 4:23 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>> To establish a stable foundation for discussion, I conducted simple
>> tests - see, for example, a couple of queries in the attachment. As I
>> see it, Sort->Append works faster: in my test bench, it takes 1250ms on
>> average versus 1430ms, and it also has lower costs - the same for data
>> with and without massive numbers of duplicates. Playing with sizes of
>> inputs, I see the same behaviour.
>
> I ran your tests. For the Sort(Append()) case I got actual
> time=811.047..842.473. For the MergeAppend case I got actual
> time=723.678..967.004. That looks interesting. At some point
> we probably should teach our Sort node to start returning tuples before
> finishing the last merge stage.
>
> However, I think the costs do not match the timing. Our cost model
> predicts that the startup cost of MergeAppend is less than the startup
> cost of Sort(Append()). And that's correct. However, in fact the total
> time of MergeAppend is bigger than the total time of Sort(Append()). The
> differences in these two cases are comparable. I think we need to
> adjust our cost_sort() to reflect that.
Could you explain your idea? As I see it (and as I showed in the previous
message), the total cost of Sort->Append is lower than that of
MergeAppend->Sort.
Additionally, as I mentioned earlier, the primary reason for choosing
MergeAppend in the regression test was a slight total cost difference
that triggered the startup cost comparison.
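
A minimal sketch of that tie-breaking behaviour, assuming a 1% fuzz
factor on cost comparisons. It is a simplified illustration, not the
planner's actual code; the names PathCost and pick_cheaper and all cost
numbers are hypothetical:

```python
from dataclasses import dataclass

# Assumed tolerance: costs within 1% of each other are treated as equal.
FUZZ_FACTOR = 1.01


@dataclass
class PathCost:
    name: str
    startup: float
    total: float


def pick_cheaper(a: PathCost, b: PathCost, fuzz: float = FUZZ_FACTOR) -> PathCost:
    """Pick a path by total cost; fall back to startup cost on a near-tie."""
    # A clear difference in total cost decides immediately.
    if a.total > b.total * fuzz:
        return b
    if b.total > a.total * fuzz:
        return a
    # Total costs are fuzzily equal, so startup cost becomes the tie-breaker.
    if a.startup > b.startup * fuzz:
        return b
    if b.startup > a.startup * fuzz:
        return a
    # Both costs are fuzzily equal: keep the one with the lower exact total.
    return a if a.total <= b.total else b


# Hypothetical costs: Sort(Append) is slightly cheaper in total, but the
# difference is inside the fuzz window, so the lower startup cost wins.
sort_append = PathCost("Sort(Append)", startup=900.0, total=1000.0)
merge_append = PathCost("MergeAppend", startup=50.0, total=1005.0)
winner = pick_cheaper(sort_append, merge_append)
print(winner.name)
```

With a total cost gap larger than the fuzz window, the same function
picks the path with the lower total cost regardless of startup cost.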
Could you show the query and its EXPLAIN output that concern you?
--
regards, Andrei Lepikhov