Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets |
Date | |
Msg-id | 603c8f070903061128w7bdac4e0u6612f5a1d5f407df@mail.gmail.com Whole thread Raw |
In response to | Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets
|
List | pgsql-hackers |
On Fri, Mar 6, 2009 at 1:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Bryce Cutt <pandasuit@gmail.com> writes: >> Here is the new patch. >> Our experiments show no noticeable performance issue when using the >> patch for cases where the optimization is not used because the number >> of extra statements executed when the optimization is disabled is >> insignificant. > >> We have updated the patch to remove a couple of if statements, but >> this is really minor. The biggest change was to MultiExecHash that >> avoids an if check per tuple by duplicating the hashing loop. > > I think you missed the point of the performance questions. It wasn't > about avoiding extra simple if-tests in the per-tuple loops; a few of > those are certainly not going to add measurable cost given how complex > the code is already. (I really don't think you should be duplicating > hunks of code to avoid adding such tests.) Rather, the concern was that Well, at one point we were still trying to verify that (1) the patch actually had a benefit and (2) blowing out the IM hashtable wasn't too horribly nasty. A great deal of improvement has been made in those areas since this was first reviewed. But your questions are completely valid, too. (I don't think anyone ever expressed a concern about the simple if-tests, either.) > if we are dedicating a fraction of available work_mem to this purpose, > that reduces the overall efficiency of the regular non-IM code path, > principally by forcing the creation of more batches than would otherwise > be needed. It's not clear whether the savings for IM tuples always > exceeds this additional cost. > > After looking over the code a bit, there are two points that > particularly concern me in this connection: > > * The IM hashtable is only needed during the first-batch processing; > once we've completed the first pass over the outer relation there is > no longer any need for it, unless I'm misunderstanding things > completely. Therefore it really only competes for space with the > regular first batch. However the damage to nbatches will already have > been done; in effect, we can expect that each subsequent batch will > probably only use (100 - IM_WORK_MEM_PERCENT)% of work_mem. The patch > seems to try to deal with this by keeping IM_WORK_MEM_PERCENT negligibly > small, but surely that's mostly equivalent to fighting with one hand > tied behind your back. I wonder if it'd be better to dedicate all of > work_mem to the MCV hash values during the first pass, rather than > allowing them to compete with the first regular batch. The IM hash table doesn't need to be very large in order to produce a substantial benefit, because there are only going to be ~100 MCVs in the probe table and each of those may well be unique in the build table. But no matter what size you choose for it, there's some danger that it will push us over the edge into more batches, and if the skew doesn't turn out to be enough to make up for that, you lose. I'm not sure there's any way to completely eliminate that unpleasant possibility. > * The IM hashtable creates an additional reason why nbatch might > increase during the initial scan of the inner relation; in fact, since > it's an effect not modeled in the initial choice of nbatch, it's > probably going to be a major reason for that to happen. Increasing > nbatch on the fly isn't good because it results in extra I/O for tuples > that were previously assigned to what is now the wrong batch. Again, > the only answer the patch has for this is to try not to use enough > of work_mem for it to make a difference. Seems like instead the initial > nbatch estimate needs to account for that. ...Robert
pgsql-hackers by date: