Reducing expression evaluation overhead - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Reducing expression evaluation overhead |
Date | |
Msg-id | 8715.1079393997@sss.pgh.pa.us |
Responses | Re: Reducing expression evaluation overhead |
List | pgsql-hackers |
I've been looking at an example provided by Arjen van der Meijden in which practically all the runtime goes into evaluating very trivial comparison expressions (it's basically a CASE statement with a huge number of arms). Whether or not you think that a CASE with a huge number of arms is a particularly sane thing to do, it does seem that the overhead is uncomfortably large. I get profiles like this:

  %   cumulative   self              self     total
 time   seconds   seconds     calls  s/call  s/call  name
 38.15     41.92    41.92    229646    0.00    0.00  nconc
 21.76     65.84    23.92 199054454    0.00    0.00  ExecEvalExpr
 11.38     78.34    12.50     10000    0.00    0.00  ExecEvalCase
  8.43     87.61     9.27  66348151    0.00    0.00  ExecEvalFuncArgs
  8.12     96.54     8.93  66348151    0.00    0.00  ExecMakeFunctionResult
  2.96     99.78     3.25  66348151    0.00    0.00  ExecEvalVar
  1.23    101.14     1.36     10058    0.00    0.00  AllocSetCheck
  1.23    102.49     1.35  66348151    0.00    0.00  ExecEvalOper
  1.12    103.72     1.24     76537    0.00    0.00  OpernameGetCandidates
  0.85    104.66     0.94  66424693    0.00    0.00  int4eq

  %   cumulative   self              self     total
 time   seconds   seconds     calls ms/call ms/call  name
 25.24    161.92   161.92     51488    3.14    3.14  nconc
 20.99    296.57   134.65 172163502    0.00    0.00  ExecEvalExpr
 14.90    392.18    95.61                            _mcount
  9.66    454.16    61.98  57381167    0.00    0.00  ExecMakeFunctionResult
  7.32    501.12    46.96     10000    4.70    4.70  ExecEvalCase
  7.24    547.60    46.48  57381167    0.00    0.00  ExecEvalFuncArgs
  6.47    589.09    41.49  57381167    0.00    0.00  ExecEvalVar
  2.90    607.69    18.60  57381167    0.00    0.00  ExecEvalOper
  1.29    615.94     8.25                            int4eq

The above are from different-size test cases on different hardware, just to make sure that the data is reasonably trustworthy. The nconc entry may be ignored; it's lappend() overhead from the parser, and will drop into the noise once Neil gets his list rewrite done.

The thing that jumps out at me is that ExecEvalExpr is taking way more cycles than it has any right to. The test conditions in the CASE are of the form "var = constant", and so for each tested condition, ExecEvalExpr is invoked three times: one call is passed off to ExecEvalVar, one to ExecEvalOper, and the constant is handled inline by fairly trivial code, which is surely a good bit cheaper than ExecEvalVar. So it seems that a very large fraction of the runtime is being spent just in ExecEvalExpr's switching function, plus the overhead of an extra level of function call to reach the routines that actually do the work.

I am thinking about attacking this by expanding ExprState nodes to include a function pointer to the appropriate evaluation subroutine (ExecEvalVar, ExecEvalOper, etc). ExecEvalExpr would be replaced by a macro like this:

#define ExecEvalExpr(expr, econtext, isNull, isDone) \
	((*(expr)->evalfunc) (expr, econtext, isNull, isDone))

This would eliminate the overhead of the switch() in ExecEvalExpr, and reduce two levels of function call to one for most node types, at the cost of whatever the delta is between a direct and an indirect function call. In my experience that delta is pretty small, and we should come out ahead.

The main downside I can see is adding another four or eight bytes to each ExprState node, which might be a noticeable memory burden with complex expressions. A lesser objection is that unless we wanted to complicate and slow down the ExecEvalExpr macro, we'd have to assume that the macro is never passed a null expression pointer. Right now ExecEvalExpr treats a NULL input as if it were a null constant node. However, I'm fairly sure that that test is useless, because no one ever tries to evaluate a null expression tree.
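Roughly, I'm imagining something along these lines (purely a sketch, not worked-out code; the field names, signatures, and exactly which node types get their own routine are all up for grabs, and the inline Const case would need to be split out into a small routine of its own):

/* Sketch only: a possible shape for the ExprState change. */
typedef struct ExprState ExprState;

typedef Datum (*ExprStateEvalFunc) (ExprState *expression,
                                    ExprContext *econtext,
                                    bool *isNull,
                                    ExprDoneCond *isDone);

struct ExprState
{
	NodeTag				type;
	Expr			   *expr;		/* associated expression node */
	ExprStateEvalFunc	evalfunc;	/* routine to run to evaluate node */
};

/* ExecInitExpr would pay the node-type dispatch cost once, at startup: */
switch (nodeTag(node))
{
	case T_Var:
		state->evalfunc = ExecEvalVar;
		break;
	case T_Const:
		/* the currently-inline Const case, split out as its own routine */
		state->evalfunc = ExecEvalConst;
		break;
	case T_OpExpr:
		state->evalfunc = ExecEvalOper;
		break;
	/* ... and so on for the other expression node types ... */
	default:
		elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));
		break;
}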
We could make some other marginal hacks too once we did this. For instance, ExecEvalOper just hands off control to ExecMakeFunctionResult once it's made sure the fcache has been initialized. We could have it do that initialization and then change the evalfunc pointer to point directly to ExecMakeFunctionResult, thereby eliminating another level of function-call overhead on all subsequent calls. There are other tests that could be reduced from every-time to one-time overhead with similar tricks.

I'm not sure that this would let us catch up to what Arjen reports as MySQL's expression evaluation speed, but it should at least speed things up a bit with only fairly localized changes.

Comments?

			regards, tom lane
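P.S. A minimal sketch of that repointing trick, assuming the existing FuncExprState/init_fcache arrangement stays roughly as it is today; this is illustrative only, not tested code:

/*
 * Sketch only: ExecEvalOper's first call does the fcache setup, then
 * repoints evalfunc so that later calls go straight to
 * ExecMakeFunctionResult, bypassing this wrapper.
 */
static Datum
ExecEvalOper(FuncExprState *fcache, ExprContext *econtext,
			 bool *isNull, ExprDoneCond *isDone)
{
	OpExpr	   *op = (OpExpr *) fcache->xprstate.expr;

	/* one-time initialization of the function call cache */
	init_fcache(op->opfuncid, fcache, econtext->ecxt_per_query_memory);

	/* subsequent calls skip straight to the real work */
	fcache->xprstate.evalfunc = (ExprStateEvalFunc) ExecMakeFunctionResult;

	return ExecMakeFunctionResult(fcache, econtext, isNull, isDone);
}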