Re: GiST kNN search queue (Re: KNN-GiST with recheck) - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: GiST kNN search queue (Re: KNN-GiST with recheck) |
Date | |
Msg-id | 5497ED67.2090109@vmware.com Whole thread Raw |
In response to | Re: GiST kNN search queue (Re: KNN-GiST with recheck) (Peter Geoghegan <pg@heroku.com>) |
Responses |
Re: GiST kNN search queue (Re: KNN-GiST with recheck)
|
List | pgsql-hackers |
On 12/20/2014 10:50 PM, Peter Geoghegan wrote: > On Wed, Dec 17, 2014 at 6:07 AM, Heikki Linnakangas > <hlinnakangas@vmware.com> wrote: >> How about adding a src/backend/lib/README for that, per attached? > > I took a quick look at this. Some observations: > > * I like the idea of adding a .../lib README. However, the README > fails to note that ilist.c implements an *integrated* list, unlike the > much more prevalent cell-based List structure. It should note that, > since that's the whole point of ilist.c. Added a sentence on that. > * pairingheap_remove() is technically dead code. It still makes sense > that you'd have it in this patch, but I think there's an argument for > not including it at all on the theory that if you need to use it you > should use a different data structure. After all, the actual > (non-amortized) complexity of that operation is O(n) [1], and if > remove operations are infrequent as we might expect, that might be the > more important consideration. As long as you are including > pairingheap_remove(), though, why is the local variable "prev_ptr" a > pointer to a pointer to a pairingheap_node, rather than just a pointer > to a pairingheap_node? > > * Similarly, the function-like macro pairingheap_reset() doesn't seem > to pull its weight. Why does it exist alongside pairingheap_free()? > I'm not seeing a need to re-use a heap like that. pairingheap_remove and pairingheap_reset are both unused in this patch, but they were needed for the other use case, tracking snapshots to advance xmin more aggressively, discussed here: http://www.postgresql.org/message-id/5488ACF0.8050901@vmware.com. In fact, without the pairingheap_remove() operation, the prev_or_parent pointer wouldn't be necessarily at all. We could've added them as a separate patch, but that seems like unnecessary code churn. The prev_ptr variable is used to set the parent's first_child pointer, or the previous sibling's next_sibling pointer, depending on whether the removed node is the parent's first child or not. I'll add more comments in pairingheap_remove to explain that. > * "Assert(!pairingheap_is_empty(heap))" appears in a few places. > You're basically asserting that a pointer isn't null, often > immediately before dereferencing the pointer. This seems to be of > questionable value. I copied that from binaryheap.c. It has some documentation value; they make it easy to see that the functions require the heap to not be empty. It's also explained in comments, but still. > * I think that the indentation of code could use some tweaking. > > * More comments, please. In particular, comment the struct fields in > pairingheap_node. There are various blocks of code that could use at > least an additional terse comment, too. Added some comments, hope it's better now. > * You talked about tuplesort.c integration. In order for that to > happen, I think the comparator logic should know less about min-heaps. > This should formally be a max-heap, with the ability to provide > customizations only encapsulated in the comparator (like inverting the > comparison logic to get a min-heap, or like custom NULLs first/last > behavior). So IMV this comment should be more generic/anticipatory: > > + /* > + * For a max-heap, the comparator must return <0 iff a < b, 0 iff a == b, > + * and >0 iff a > b. For a min-heap, the conditions are reversed. > + */ > + typedef int (*pairingheap_comparator) (const pairingheap_node *a, > const pairingheap_node *b, void *arg); > > I think the functions should be called pairing_max_heap* for this > reason, too. Although that isn't consistent with binaryheap.c, so I > guess this whole argument is a non-starter. I don't see what the problem is. The pairingheap.c (and binaryheap.c) code works the same for min and max-heaps. The comments assume a max-heap in a few places, but that seems OK to me in the context. > * We should just move rbtree.c to .../lib. We're not using CVS anymore > -- the history will be magically preserved. Yeah, I tend to agree. Tom Lane has not liked moving things, because it breaks back-patching. That's true in general, even though git has some smarts to follow renames. I think it would work in this case, though. Anyway, let's discuss and do that as a separate patch, so that we don't get stuck on that. > Anyway, to get to the heart of the matter: in general, I think the > argument for the patch is sound. It's not a stellar improvement, but > it's worthwhile. That's all I have for now... Ok, thanks for the review! I have committed this, with some cleanup and more comments added. - Heikki
pgsql-hackers by date: