Reducing expression evaluation overhead - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Reducing expression evaluation overhead |
Date | |
Msg-id | 8715.1079393997@sss.pgh.pa.us |
Responses | Re: Reducing expression evaluation overhead |
List | pgsql-hackers |
I've been looking at an example provided by Arjen van der Meijden in which practically all the runtime goes into evaluating very trivial comparison expressions (it's basically a CASE statement with a huge number of arms). Whether or not you think that a CASE with a huge number of arms is a particularly sane thing to do, it does seem that the overhead is uncomfortably large. I get profiles like this:

  %   cumulative   self              self     total
 time   seconds   seconds     calls  s/call  s/call  name
 38.15     41.92    41.92    229646    0.00    0.00  nconc
 21.76     65.84    23.92 199054454    0.00    0.00  ExecEvalExpr
 11.38     78.34    12.50     10000    0.00    0.00  ExecEvalCase
  8.43     87.61     9.27  66348151    0.00    0.00  ExecEvalFuncArgs
  8.12     96.54     8.93  66348151    0.00    0.00  ExecMakeFunctionResult
  2.96     99.78     3.25  66348151    0.00    0.00  ExecEvalVar
  1.23    101.14     1.36     10058    0.00    0.00  AllocSetCheck
  1.23    102.49     1.35  66348151    0.00    0.00  ExecEvalOper
  1.12    103.72     1.24     76537    0.00    0.00  OpernameGetCandidates
  0.85    104.66     0.94  66424693    0.00    0.00  int4eq

  %   cumulative   self              self     total
 time   seconds   seconds     calls ms/call ms/call  name
 25.24    161.92   161.92     51488    3.14    3.14  nconc
 20.99    296.57   134.65 172163502    0.00    0.00  ExecEvalExpr
 14.90    392.18    95.61                            _mcount
  9.66    454.16    61.98  57381167    0.00    0.00  ExecMakeFunctionResult
  7.32    501.12    46.96     10000    4.70    4.70  ExecEvalCase
  7.24    547.60    46.48  57381167    0.00    0.00  ExecEvalFuncArgs
  6.47    589.09    41.49  57381167    0.00    0.00  ExecEvalVar
  2.90    607.69    18.60  57381167    0.00    0.00  ExecEvalOper
  1.29    615.94     8.25                            int4eq

The above are from different-size test cases on different hardware, just to make sure that the data is reasonably trustworthy. The nconc entry may be ignored; it's lappend() overhead from the parser, and will drop into the noise once Neil gets his list rewrite done.

The thing that jumps out at me is that ExecEvalExpr is taking way more cycles than it has any right to. The test conditions in the CASE are of the form "var = constant", and so for each tested condition, ExecEvalExpr is invoked three times: one call is passed off to ExecEvalVar, one to ExecEvalOper, and the constant is handled inline by fairly trivial code, which is surely a good bit cheaper than ExecEvalVar. So it seems that a very large fraction of the runtime is being spent just in ExecEvalExpr's switching function, plus the overhead of an extra level of function call to reach the routines that actually do the work.

I am thinking about attacking this by expanding ExprState nodes to include a function pointer to the appropriate evaluation subroutine (ExecEvalVar, ExecEvalOper, etc). ExecEvalExpr would be replaced by a macro like this:

#define ExecEvalExpr(expr, econtext, isNull, isDone) \
	((*(expr)->evalfunc) (expr, econtext, isNull, isDone))

This would eliminate the overhead of the switch() in ExecEvalExpr, and reduce two levels of function call to one for most node types, at the cost of whatever the delta is between a direct and an indirect function call. In my experience that delta is pretty small, and we should come out ahead.

The main downside I can see is adding another four or eight bytes to each ExprState node, which might be a noticeable memory burden with complex expressions. A lesser objection is that unless we wanted to complicate and slow down the ExecEvalExpr macro, we'd have to assume that the macro is never passed a null expression pointer. Right now ExecEvalExpr treats a NULL input as if it were a null constant node. However, I'm fairly sure that that test is useless, because no one ever tries to evaluate a null expression tree.
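Roughly, I'm imagining something along these lines (purely a sketch, not worked-out code; the field names, signatures, and exactly which node types get their own routine are all up for grabs, and the inline Const case would need to be split out into a small routine of its own):

/* Sketch only: a possible shape for the ExprState change. */
typedef struct ExprState ExprState;

typedef Datum (*ExprStateEvalFunc) (ExprState *expression,
                                    ExprContext *econtext,
                                    bool *isNull,
                                    ExprDoneCond *isDone);

struct ExprState
{
	NodeTag				type;
	Expr			   *expr;		/* associated expression node */
	ExprStateEvalFunc	evalfunc;	/* routine to run to evaluate node */
};

/* ExecInitExpr would pay the node-type dispatch cost once, at startup: */
switch (nodeTag(node))
{
	case T_Var:
		state->evalfunc = ExecEvalVar;
		break;
	case T_Const:
		/* the currently-inline Const case, split out as its own routine */
		state->evalfunc = ExecEvalConst;
		break;
	case T_OpExpr:
		state->evalfunc = ExecEvalOper;
		break;
	/* ... and so on for the other expression node types ... */
	default:
		elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));
		break;
}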
We could make some other marginal hacks too once we did this. For instance, ExecEvalOper just hands off control to ExecMakeFunctionResult once it's made sure the fcache has been initialized. We could have it do that initialization and then change the evalfunc pointer to point directly to ExecMakeFunctionResult, thereby eliminating another level of function-call overhead on all subsequent calls. There are other tests that could be reduced from every-time to one-time overhead with similar tricks.

I'm not sure that this would let us catch up to what Arjen reports as MySQL's expression evaluation speed, but it should at least speed things up a bit with only fairly localized changes.

Comments?

			regards, tom lane
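P.S. A minimal sketch of that repointing trick, assuming the existing FuncExprState/init_fcache arrangement stays roughly as it is today; this is illustrative only, not tested code:

/*
 * Sketch only: ExecEvalOper's first call does the fcache setup, then
 * repoints evalfunc so that later calls go straight to
 * ExecMakeFunctionResult, bypassing this wrapper.
 */
static Datum
ExecEvalOper(FuncExprState *fcache, ExprContext *econtext,
			 bool *isNull, ExprDoneCond *isDone)
{
	OpExpr	   *op = (OpExpr *) fcache->xprstate.expr;

	/* one-time initialization of the function call cache */
	init_fcache(op->opfuncid, fcache, econtext->ecxt_per_query_memory);

	/* subsequent calls skip straight to the real work */
	fcache->xprstate.evalfunc = (ExprStateEvalFunc) ExecMakeFunctionResult;

	return ExecMakeFunctionResult(fcache, econtext, isNull, isDone);
}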