track needed attributes in plan nodes for executor use - Mailing list pgsql-hackers

From Amit Langote
Subject track needed attributes in plan nodes for executor use
Msg-id CA+HiwqHXDY6TxegR2Cr_4sRa_LY1QJnoL8XRmOqdfrx21pZ6cw@mail.gmail.com
List pgsql-hackers
Hi,

I’ve been experimenting with an optimization that reduces executor
overhead by avoiding unnecessary attribute deformation. Specifically,
if the executor knows which attributes are actually needed by a plan
node’s targetlist and qual, it can skip deforming unused columns
entirely.

In a proof-of-concept patch, I initially computed the needed
attributes during ExecInitSeqScan by walking the plan’s qual and
targetlist to support deforming only what’s needed when evaluating
expressions in ExecSeqScan() or the variant thereof (I started with
SeqScan to keep the initial patch minimal). However, adding more work
to ExecInit* adds to executor startup cost, which we should generally
try to reduce. It also makes it harder to apply the optimization
uniformly across plan types.

I’d now like to propose computing the needed attributes at planning
time instead. This can be done at the bottom of create_plan_recurse,
after the plan node has been constructed. A small helper like
record_needed_attrs(plan) can walk the node’s targetlist and qual
using pull_varattnos() and store the result in a new Bitmapset
*attr_used field in the Plan struct. System attributes returned by
pull_varattnos() can be filtered out during this step, since they're
either not relevant to deformation or not performance sensitive.
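
To make the filtering step concrete, here is a minimal standalone
sketch, not the actual patch. It models the Bitmapset as a 64-bit mask
and assumes the pull_varattnos() convention of storing each attno
offset by FirstLowInvalidHeapAttributeNumber (taken to be -7 here,
which is an assumption about the server version). The toy_* names and
the handling of whole-row references are hypothetical and only for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of FirstLowInvalidHeapAttributeNumber; illustrative only. */
#define TOY_FIRST_LOW_INVALID_ATTNO (-7)

/* Bit for attno in the offset-encoded set, as pull_varattnos() would store it. */
static uint64_t
toy_offset_bit(int attno)
{
    int bit = attno - TOY_FIRST_LOW_INVALID_ATTNO;

    assert(bit >= 1 && bit < 64);
    return UINT64_C(1) << bit;
}

/*
 * Hypothetical stand-in for record_needed_attrs(): merge the offset-encoded
 * sets from the targetlist and qual, drop system attributes (attno < 0), and
 * return a plain set where bit n means user attribute n is needed.
 * Assumption: a whole-row reference (attno 0) marks every column as needed.
 */
static uint64_t
toy_record_needed_attrs(uint64_t tlist_attnos, uint64_t qual_attnos, int natts)
{
    uint64_t all = tlist_attnos | qual_attnos;
    uint64_t needed = 0;

    assert(natts >= 0 && natts < 64);
    for (int bit = 0; bit < 64; bit++)
    {
        int attno = bit + TOY_FIRST_LOW_INVALID_ATTNO;

        if ((all & (UINT64_C(1) << bit)) == 0)
            continue;
        if (attno > 0)
            needed |= UINT64_C(1) << attno;     /* ordinary user column */
        else if (attno == 0)
            for (int a = 1; a <= natts; a++)    /* whole-row reference */
                needed |= UINT64_C(1) << a;
        /* attno < 0: system attribute, filtered out */
    }
    return needed;
}
```

For a targetlist referencing attribute 1 and a qual referencing
attribute 10 plus tableoid (-6), the result contains only attributes 1
and 10.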

This also lays the groundwork for a related executor-side optimization
that David Rowley suggested to me off-list. The idea is to remember,
in the TupleDesc, either the attribute number or the byte offset of
the first variable-length attribute. Then, if the minimum required
attribute (as provided by attr_used) lies before that, the executor
can safely jump directly to it using the cached offset, rather than
starting deformation from attno 0 as it currently does. That avoids
walking through fixed-length attributes that aren't needed --
specifically, skipping per-attribute alignment, null checking, and
offset tracking for unused columns -- which reduces CPU work and
avoids loading irrelevant tuple bytes into cache.
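
The idea can be sketched roughly as follows. This is a toy model under
stated assumptions, not PostgreSQL's TupleDesc or deforming code:
ToyAttr and all function names are hypothetical, attribute indexes are
0-based, and NULLs are ignored entirely (a real implementation would
have to account for them before trusting any cached offset).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-attribute metadata; not PostgreSQL's real structs. */
typedef struct ToyAttr
{
    int len;    /* byte width, or -1 for variable-length */
    int align;  /* required alignment in bytes; must be a power of two */
} ToyAttr;

/* Round off up to the next multiple of align. */
static size_t
toy_align(size_t off, int align)
{
    assert(align > 0 && (align & (align - 1)) == 0);
    return (off + (size_t) align - 1) & ~((size_t) align - 1);
}

/* Index of the first variable-length attribute, or natts if there is none. */
static int
first_varlen_attr(const ToyAttr *attrs, int natts)
{
    for (int i = 0; i < natts; i++)
        if (attrs[i].len < 0)
            return i;
    return natts;
}

/*
 * Precompute byte offsets for the fixed-width attributes that precede the
 * first variable-length one. Returns how many offsets were cached.
 */
static int
cache_fixed_offsets(const ToyAttr *attrs, int natts, size_t *offs)
{
    int     limit = first_varlen_attr(attrs, natts);
    size_t  off = 0;

    for (int i = 0; i < limit; i++)
    {
        off = toy_align(off, attrs[i].align);
        offs[i] = off;
        off += (size_t) attrs[i].len;
    }
    return limit;
}

/*
 * Where deforming may start: jump straight to the minimum needed attribute
 * if its offset is cached, otherwise fall back to the first attribute.
 */
static int
deform_start(int min_needed, int ncached)
{
    return (min_needed < ncached) ? min_needed : 0;
}
```

With attributes of widths 4, 8, and 2 bytes followed by a
variable-length one, the cached offsets are 0, 8, and 16. A plan whose
minimum needed attribute is the third column can start deforming there
directly, while one that first needs a column after the
variable-length attribute still has to walk from the beginning.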

With both patches in place, heap tuple deforming can skip over unused
attributes entirely. For example, on a 30-column table where the first
15 columns are fixed-width, the query:

select sum(a_1) from foo where a_10 = $1;

which references only two fixed-width columns, ran nearly 2x faster
with the optimization in place (with heap pages prewarmed into
shared_buffers).

In more complex plans, for example those involving a Sort or Join
between the scan and aggregation, the CPU cost of the intermediate
node may dominate, making deforming-related savings at the top less
visible in overall performance. Still, I don't think that's a reason
to avoid enabling this optimization more broadly across plan nodes.

I'll post the PoC patches and performance measurements. I'm posting
this in advance to get feedback on the proposed direction and on where
best to place attr_used.

--
Thanks,
Amit Langote


