Re: LIMIT/SORT optimization - Mailing list pgsql-patches
From | Heikki Linnakangas |
---|---|
Subject | Re: LIMIT/SORT optimization |
Date | |
Msg-id | 460A2E9F.7020508@enterprisedb.com Whole thread Raw |
In response to | Re: LIMIT/SORT optimization (Gregory Stark <stark@enterprisedb.com>) |
Responses |
Re: LIMIT/SORT optimization
|
List | pgsql-patches |
Some comments on the patch below. Gregory Stark wrote: > + /* tuplesort_set_bound - External API to set a bound on a tuplesort > + * > + * Must be called before inserting any tuples. > + > + * Sets a maximum number of tuples the caller is interested in. The first > + * <bound> tuples are maintained using a simple insertion sort and returned > + * normally. Any tuples that lie after those in the sorted result set are > + * simply thrown out > + */ The "Must be called before inserting any tuples" is in contradiction with the comment in the header file: > + /* This can be called at any time before performsort to advise tuplesort that > + * only this many tuples are interesting. If that many tuples fit in memory and > + * we haven't already overflowed to disk then tuplesort will switch to a simple > + * insertion sort or heap sort and throw away the uninteresting tuples. > + */ The latter seems to be correct. > ! /* > ! * Convert the existing unordered list of sorttuples to a heap in either order. > ! * This used to be inline but now there are three separate places we heap sort > ! * (initializing the tapes, if we have a bounded output, and any time the user > ! * says he doesn't want to use glibc's qsort). > ! * > ! * NOTE heapify passes false for checkIndex (and stores a constant tupindex > ! * passed as a parameter) even though we use heaps for multi-run sources > ! * because we only heapify when we're doing in-memory sorts or in inittapes > ! * before there's any point in comparing tupindexes. > ! */ > ! > ! static void > ! tuplesort_heapify(Tuplesortstate *state, int tupindex, HeapOrder heaporder) > ! { The comment claims that we use heap sort when the user says he doesn't want to use glibc's qsort. I recall that we always use our own qsort implementation nowadays. And we never used the heap sort as a qsort replacement, did we? In performsort, you convert the in-memory heap to a sorted list in one go. I wonder if it would be better to switch to a new TSS_ALLINHEAP state that means "all tuples are now in the in-memory heap", and call tuplesort_heap_siftup in gettuple. It probably doesn't make much difference in most cases, but if there's another limit node in the plan with a smaller limit or the client only fetches a few top rows with a cursor you'd avoid unheapifying tuples that are just thrown away later. There's a few blocks of code surrounded with "#if 0 - #endif". Are those just leftovers that should be removed, or are things that still need to finished and enabled? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
pgsql-patches by date: