Re: bad estimation together with large work_mem generates terrible slow hash joins - Mailing list pgsql-hackers
From | Tomas Vondra
---|---
Subject | Re: bad estimation together with large work_mem generates terrible slow hash joins
Msg-id | 540F8401.6040808@fuzzy.cz
In response to | Re: bad estimation together with large work_mem generates terrible slow hash joins (Robert Haas <robertmhaas@gmail.com>)
Responses | Re: bad estimation together with large work_mem generates terrible slow hash joins
List | pgsql-hackers

On 9.9.2014 16:09, Robert Haas wrote:
> On Mon, Sep 8, 2014 at 5:53 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>> So I only posted the separate patch for those who want to do a review,
>> and then a "complete patch" with both parts combined. But it sure may be
>> a bit confusing.
>
> Let's do this: post each new version of the patches only on this
> thread, as a series of patches that can be applied one after another.

OK, attached. Apply in this order:

1) dense-alloc-v5.patch
2) hashjoin-nbuckets-v12.patch

>>> In ExecChooseHashTableSize(), if buckets_bytes is independent of
>>> nbatch, could we just compute it before working out dbatch, and just
>>> deduct it from hash_table_bytes? That seems like it'd do the same
>>> thing for less work.
>>
>> Well, we can do that. But I really don't think that's going to make
>> measurable difference. And I think the code is more clear this way.
>
> Really? It seems to me that you're doing more or less the same
> calculation that's already been done a second time. It took me 20
> minutes of staring to figure out what it was really doing. Now
> admittedly I was a little low on caffeine, but...

I reworked this part a bit, to make it easier to understand, mostly
following your recommendations (a rough sketch of the reworked flow is
near the end of this mail).

>>> Either way, what happens if buckets_bytes is already bigger than
>>> hash_table_bytes?
>>
>> Hm, I don't see how that could happen.
>
> How about an Assert() and a comment, then?

Done. According to the comment, buckets_bytes should never really exceed
25% of hash_table_bytes, except maybe for strange work_mem values,
because we keep nbuckets as 2^N. But even then it should not reach 50%,
so I added this assert:

    Assert(buckets_bytes <= hash_table_bytes / 2);

We then subtract buckets_bytes and continue with the loop.

>>> A few more stylistic issues that I see:
>>>
>>> +    if ((hashtable->nbatch == 1) &&
>>> +    if (hashtable->nbuckets_optimal <= (INT_MAX/2))
>>> +    if (size > (HASH_CHUNK_SIZE/8))
>>>
>>> While I'm all in favor of using parentheses generously where it's
>>> needed to add clarity, these and similar usages seem over the top to
>>> me.
>>
>> It seemed more readable for me at the time of coding it, and it still
>> seems better this way, but ...
>>
>> https://www.youtube.com/watch?v=CxK_nA2iVXw
>
> Heh. Well, at any rate, I think PostgreSQL style is to not use
> parentheses in that situation.

FWIW, I added a HASH_CHUNK_THRESHOLD macro specifying the threshold. I
also simplified the conditions a bit and removed some of the
parentheses.

>>> +char * chunk_alloc(HashJoinTable hashtable, int size) {
>>>
>>> Eh... no.
>>>
>>> +    /* XXX maybe we should use MAXALIGN(size) here ... */
>>>
>>> +1.
>>
>> I'm not sure what's the 'no' pointing at? Maybe that the parenthesis
>> should be on the next line? And is the +1 about doing MAXALIGN?
>
> The "no" is about the fact that you have the return type, the function
> name, and the opening brace on one line instead of three lines as is
> project style. The +1 is for applying MAXALIGN to the size.

OK, fixed. I also did a few 'minor' changes to the dense allocation
patch, most notably:

* renamed HashChunk/HashChunkData to MemoryChunk/MemoryChunkData

  The original naming seemed a bit awkward.

* renamed the function to 'dense_alloc' (instead of 'chunk_alloc')

  Seems like a better fit to what the function does.

* the chunk size is 32kB (instead of 16kB), and we're using a 1/4
  threshold for 'oversized' items

  We need the threshold to be >= 8kB, to trigger the special case
  within AllocSet. The 1/4 rule is consistent with ALLOC_CHUNK_FRACTION.
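To make that a bit more concrete, here is a rough sketch of what the
dense allocator ends up looking like after these changes. This is an
illustration only, not the exact patch code -- the chunk field names
(next, maxlen, used, data) and the 'chunks' / 'batchCxt' members of
HashJoinTable are assumptions made for the sketch:

    #define HASH_CHUNK_SIZE       (32 * 1024)
    #define HASH_CHUNK_THRESHOLD  (HASH_CHUNK_SIZE / 4)

    typedef struct MemoryChunkData
    {
        struct MemoryChunkData *next;   /* chunks form a linked list */
        Size        maxlen;             /* size of the data[] buffer */
        Size        used;               /* bytes already handed out */
        char        data[1];            /* buffer holding the tuples */
    } MemoryChunkData;

    typedef struct MemoryChunkData *MemoryChunk;

    static char *
    dense_alloc(HashJoinTable hashtable, Size size)
    {
        MemoryChunk newChunk;
        char       *ptr;

        /* keep the allocations aligned */
        size = MAXALIGN(size);

        /* oversized items (> 8kB here) get a chunk of their own */
        if (size > HASH_CHUNK_THRESHOLD)
        {
            newChunk = (MemoryChunk) MemoryContextAlloc(hashtable->batchCxt,
                                offsetof(MemoryChunkData, data) + size);
            newChunk->maxlen = size;
            newChunk->used = size;

            /*
             * Keep the current (partially filled) chunk at the head of
             * the list, so we can keep adding small tuples to it.
             */
            if (hashtable->chunks != NULL)
            {
                newChunk->next = hashtable->chunks->next;
                hashtable->chunks->next = newChunk;
            }
            else
            {
                newChunk->next = NULL;
                hashtable->chunks = newChunk;
            }

            return newChunk->data;
        }

        /* if the current chunk is full (or missing), start a new one */
        if (hashtable->chunks == NULL ||
            hashtable->chunks->maxlen - hashtable->chunks->used < size)
        {
            newChunk = (MemoryChunk) MemoryContextAlloc(hashtable->batchCxt,
                        offsetof(MemoryChunkData, data) + HASH_CHUNK_SIZE);
            newChunk->maxlen = HASH_CHUNK_SIZE;
            newChunk->used = 0;
            newChunk->next = hashtable->chunks;
            hashtable->chunks = newChunk;
        }

        /* carve the requested space out of the current chunk */
        ptr = hashtable->chunks->data + hashtable->chunks->used;
        hashtable->chunks->used += size;

        return ptr;
    }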
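And for completeness, the reworked part of ExecChooseHashTableSize()
mentioned above boils down to roughly this (again just an illustration
of the flow, not the exact code):

    /* space needed for the bucket array (nbuckets is kept as 2^N) */
    buckets_bytes = sizeof(HashJoinTuple) * nbuckets;

    /*
     * The buckets are sized to ~25% of hash_table_bytes, so even with
     * odd work_mem values they cannot use up half of it.
     */
    Assert(buckets_bytes <= hash_table_bytes / 2);

    /* reserve the bucket space, the rest is available for the tuples */
    hash_table_bytes -= buckets_bytes;

    /* ... and the nbatch loop then works with this reduced budget ... */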
I also considered Heikki's suggestion that while rebatching, we can
reuse chunks with a single large tuple, instead of copying it
needlessly. My attempt, however, made the code much uglier (I almost
used a GOTO, but my hands refused to type it ...). I'll look into that.

Tomas
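PS: Just to illustrate what I mean by reusing the chunks -- a purely
hypothetical sketch with made-up names ('oldChunk' is the chunk being
walked, batchno/curbatch stand for the keep-or-spill decision); the
real loop in ExecHashIncreaseNumBatches is more involved, which is
exactly where it got ugly:

    if (oldChunk->used > HASH_CHUNK_THRESHOLD && batchno == curbatch)
    {
        /*
         * The chunk holds a single oversized tuple that stays in the
         * current batch -- relink the whole chunk into the new list
         * instead of copying the tuple into a fresh chunk.
         */
        oldChunk->next = hashtable->chunks;
        hashtable->chunks = oldChunk;
    }
    else
    {
        /* otherwise copy or spill the tuple(s) as the code does now */
    }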