fixed tuple descs (was JIT compiling expressions/deform) - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | fixed tuple descs (was JIT compiling expressions/deform) |
Date | |
Msg-id | 20171206093717.vqdxe5icqttpxs3p@alap3.anarazel.de |
In response to | [HACKERS] JIT compiling expressions/deform + inlining prototype v2.0 (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
Hi,

One part of the work to make JITing worth its while is JITing tuple deforming. That's currently often the biggest consumer of time, and if not, it is most often among the top entries.

My experimentation shows that JITing tuple deforming is primarily beneficial when it happens as *part* of JIT compiling expressions. I'd originally tried to JIT compile deforming inside heaptuple.c, and cache the deforming program inside the tuple slot. That turns out to not work very well, because a lot of tuple descriptors are very short-lived, computed during ExecInitNode(). Even if that were not the case, compiling for each deforming on demand has significant downsides:

- it requires emitting code in smaller increments (whenever something new is deformed)
- because the generated code has to be generic for all potential deformers, the number of branches needed to check for that is significant. If instead the deforming code is generated for a specific callsite, no branches for the number of to-be-deformed columns have to be generated. The primary remaining branches then are the ones checking for NULLs and the number of attributes in the tuple, and those can often be optimized away if there are NOT NULL columns present.
- the call overhead is still noticeable
- the memory / function lifetime management is awkward.

If the JITing of expressions is instead done as part of expression evaluation, we can emit all the necessary code for the whole plan tree during executor startup, in one go. And, more importantly, LLVM's optimizer is free to inline the deforming code into the expression code, often yielding noticeable improvements (although that still could use some improvements).

To allow doing JITing at ExecReadyExpr() time, we need to know the tuple descriptor an EEOP_{INNER,OUTER,SCAN}_FETCHSOME step refers to. There are currently three major impediments to that.

1) At a lot of ExecInitExpr() callsites the tupledescs for inner, outer, scan aren't yet known.
Therefore that code needs to be reordered so we (if applicable):

a) initialize subsidiary nodes, thereby determining the left/right (inner/outer) tupledescs
b) initialize the scan tuple desc; often that refers to a)
c) determine the result tuple desc, required to build the projection
d) build projections
e) build expressions

Attached is a patch doing so. Currently it only applies with a few preliminary patches applied, but that could be easily reordered.

The patch is relatively large, as I decided to try to get the different ExecInitNode functions to look a bit more similar. There are some judgement calls involved, but I think the result looks a good bit better, regardless of the later need.

I'm not really happy with the preexisting split of functions between execScan.c, execTuples.c, execUtils.c. I wonder if the majority, except the low-level slot ones, shouldn't be moved to execUtils.c; I think that'd be clearer. There seems to be no justification for execScan.c to contain ExecAssignScanProjectionInfo[WithVarno].

2) TupleSlots need to describe whether they'll contain a fixed tupledesc for all their lifetime, or whether they can change their nature. Most places don't need to ever change a slot's identity, but in a few places it's quite convenient.

I've introduced the notion that a slot can be marked as "fixed", by passing a tupledesc at its creation. That also gains a bit of efficiency (less memory management overhead, higher cache hit ratio) because the slot, tts_values, tts_isnull can be allocated in one chunk.

3) At expression initialization time we need to figure out what slots (or just descs) INNER/OUTER/SCAN refer to. I've solved that by looking up inner/outer/scan via the provided parent node, which required adding a new field to store the scan slot.

Currently no expressions initialized with a parent node have an INNER/OUTER/SCAN slot + desc that doesn't refer to the relevant node, but I'm not sure I like that as a requirement.
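
As an illustration of the single-chunk allocation mentioned in 2), here is a minimal standalone sketch. It is not the code from the patch: `DemoSlot`, `DemoDatum`, `demo_align` and `demo_make_fixed_slot` are hypothetical names invented for this example, and it uses plain `calloc` rather than PostgreSQL's memory contexts. The idea it shows is that once the number of attributes is known at slot creation, the header, the values array and the isnull array can all live in one allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for Datum; not PostgreSQL's definition. */
typedef uintptr_t DemoDatum;

/* Hypothetical stand-in for a tuple slot with a fixed descriptor. */
typedef struct DemoSlot
{
    bool        fixed;      /* descriptor never changes after creation */
    int         natts;      /* number of attributes, known up front */
    DemoDatum  *values;     /* points into the same allocation */
    bool       *isnull;     /* points into the same allocation */
} DemoSlot;

/* Round len up to a multiple of align (align must be a power of two). */
static size_t
demo_align(size_t len, size_t align)
{
    return (len + align - 1) & ~(align - 1);
}

/*
 * Create a slot for a known number of attributes with one allocation:
 * the header first, then the values array, then the isnull array.
 * Returns NULL if the allocation fails; release with free().
 */
static DemoSlot *
demo_make_fixed_slot(int natts)
{
    size_t      values_off;
    size_t      isnull_off;
    size_t      total;
    DemoSlot   *slot;

    assert(natts >= 0);

    values_off = demo_align(sizeof(DemoSlot), sizeof(DemoDatum));
    isnull_off = values_off + (size_t) natts * sizeof(DemoDatum);
    total = isnull_off + (size_t) natts * sizeof(bool);

    slot = calloc(1, total);    /* zeroed: all values 0, no NULL flags set */
    if (slot == NULL)
        return NULL;

    slot->fixed = true;
    slot->natts = natts;
    slot->values = (DemoDatum *) ((char *) slot + values_off);
    slot->isnull = (bool *) ((char *) slot + isnull_off);
    return slot;
}
```

A slot whose descriptor can change later cannot be laid out this way, since the arrays would have to be reallocated whenever the attribute count changes; that is the trade-off behind distinguishing the two kinds of slot.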
Attached is a patch that implements 1 + 2. I'd welcome a quick look through it. It currently only applies on top of a few other recently submitted patches, but it'd just be an hour's work or so to reorder that.

Comments about either the outline above or the patch?

Regards,

Andres