Re: Handling GIN incomplete splits - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: Handling GIN incomplete splits |
Date | |
Msg-id | CAB7nPqShw2nHRkM+P41UWfMp3o8j2=3s2bteoiDx8ZEa30po2w@mail.gmail.com Whole thread Raw |
In response to | Re: Handling GIN incomplete splits (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Responses |
Re: Handling GIN incomplete splits
Re: Handling GIN incomplete splits |
List | pgsql-hackers |
On Thu, Nov 21, 2013 at 9:44 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > Here's a new version. To ease the review, I split the remaining patch again > into two, where the first patch is just yet more refactoring. > > I also fixed some bugs in WAL logging and replay that I bumped into while > testing. Cool. Here is the review of the two remaining patches. 1) More refactoring, general remarks: - Code compiles without warnings - Passes make check - If I got it correctly... this patch separates the part managing data to be inserted from ginbtree. I can understand the meaning behind that if we consider that GinBtree is used only to define methods and search conditions (flags and keys). However I am wondering if this does not make the code more complicated... Particularly the flag isDelete that is only passed to ginxlogInsert meritates its comment IMO. Note that I haven't read the 2nd patch when writing that :) - With this patch, previous SELECT example takes 5.236 ms in average, runtime does not change. 1-1) This block of code is repeated several times and should be refactored into a single function: /* During index build, count the new page */ if (buildStats) { if (btree->isData) buildStats->nDataPages++; else buildStats->nEntryPages++; } Something with a function like that perhaps: static void ginCountNewPage(GinBtree btree, GinStatsData *buildStats); 1-2) Could it be possible to change the variable name of "GinBtreeEntryInsertData *entry" in entryIsEnoughSpace? entry->entry is kind of hard to apprehend... Renaming it to either insertEntry. Another idea would be to rename entry in GinBtreeEntryInsertData to entryData or entryTuple. 1-3) This block of code is present two times: + if (!isleaf) + { + PostingItem *pitem = GinDataPageGetPostingItem(lpage, off); + PostingItemSetBlockNumber(pitem, updateblkno); + } Should the update of a downlink to point to the next page be a separate function? 2) post recovery cleanup: - OK, so roughly the soul of this patch is to change the update mechanism for a left child gin page so as the parent split is always done first before any new data is inserted in this child. And this ensures that we can remove the xlog cleanup mechanism for gin page splits in the same fashion as gist... xlog redo mechanism is then adapted according to that. - Compilation fails becausze the flags GIN_SPLIT_ISLEAF and GIN_SPLIT_ISDATA are missing in gin_private.h. The patch attached fixes that though. - With my additional patch, it passes make check, compilation shows no warnings. - I did some tests with the patch: -- Index creation time vanilla: 3266.834 with the two patches: 3412.473 ms -- Tried the new redo mechanism by simulating some server failures a couple of times and saw no failures - I am seeing similar run times for queries like the example used in previous emails of this thread. So no problem on this side. - And... Here are some comments about the code: 2-1) In ginFindParents, is the case where the stack has no parent possible (aka the stack is the root itself)? Shouldn't this code path check if root is NULL or not? 2-2) Not sure that this structure is in-line with the project policy: struct { BlockNumber left; BlockNumber right; } children; Why not adding a complementary structure in gin_private.h doing that? It could be used as well in ginxlogSplit to specify a left/right family of block numbers. 2-3) s/kepy/kept (comments of ginFinishSplit) Other than that, the patch looks great. I particularly like the new redo logic in ginxlog.c, the code is more easily understandable with the split redo removed. Even if I am just a noob for this code, it is nicely built and structured. Regards, -- Michael
Attachment
pgsql-hackers by date: