Re: Batching in executor - Mailing list pgsql-hackers

From Amit Langote
Subject Re: Batching in executor
Msg-id CA+HiwqEZja5rJ78p3FBDZNvynWsHwanxyt6h0YaK_r84NemXng@mail.gmail.com
In response to Re: Batching in executor  (Amit Langote <amitlangote09@gmail.com>)
List pgsql-hackers
On Fri, Dec 5, 2025 at 12:54 AM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Oct 29, 2025 at 3:37 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Tue, Oct 28, 2025 at 10:40 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > That would be nice to see if you have the time, but maybe after I post
> > > a new version.
> >
> > I’ve created a CF entry marked WoA for this in the next CF under the
> > title “Batching in executor, part 1: add batch variant of table AM
> > scan API.” The idea is to track this piece separately so that later
> > parts can have their own entries and we don’t end up with a single
> > long-lived entry that never gets marked done. :-)
>
> I intend to continue working on this, so have just moved it into the
> next fest.  I will post a new patch version next week that addresses
> Daniil's comments and implements a few other things I mentioned I will
> in my reply to Tomas on Oct 28; sorry for the delay.

Before I go on vacation for a couple of weeks, here's an updated patch set.  As before, I am only including the patches that add the batch TAM interface, add the TupleBatch executor wrapper for TAM batches, and use it in SeqScan.  There is also a new patch that adds a BATCHES option to EXPLAIN, and I have renamed the testing GUC from the boolean executor_batching to the integer executor_batch_rows.  Example of EXPLAIN (BATCHES) output:

+-- Basic batch stats output
+select explain_filter('explain (analyze, batches, buffers off, costs off) select * from batch_test');
+                         explain_filter                        
+----------------------------------------------------------------
+ Seq Scan on batch_test (actual time=N.N..N.N rows=N.N loops=N)
+   Batches: N  Avg Rows: N.N  Max: N  Min: N
+ Planning Time: N.N ms
+ Execution Time: N.N ms
+(4 rows)
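
For readers who want a concrete picture of what the "Batches / Avg Rows / Max / Min" line summarizes, here is a minimal sketch of counters that could produce those four numbers. It is not code from the patch set: the struct and function names (BatchStatsSketch, batch_stats_record, and so on) are hypothetical, and the actual instrumentation may be organized quite differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-node counters; names are illustrative, not from the patch. */
typedef struct BatchStatsSketch
{
    uint64_t    nbatches;       /* number of batches returned so far */
    uint64_t    total_rows;     /* rows summed over all batches */
    uint32_t    max_rows;       /* largest batch seen */
    uint32_t    min_rows;       /* smallest batch seen; valid if nbatches > 0 */
} BatchStatsSketch;

static void
batch_stats_init(BatchStatsSketch *s)
{
    s->nbatches = 0;
    s->total_rows = 0;
    s->max_rows = 0;
    s->min_rows = UINT32_MAX;   /* any real batch size will be smaller */
}

/* Call once per batch handed back by the scan. */
static void
batch_stats_record(BatchStatsSketch *s, uint32_t nrows)
{
    s->nbatches++;
    s->total_rows += nrows;
    if (nrows > s->max_rows)
        s->max_rows = nrows;
    if (nrows < s->min_rows)
        s->min_rows = nrows;
}

/* Average rows per batch; 0.0 when no batch has been recorded. */
static double
batch_stats_avg(const BatchStatsSketch *s)
{
    if (s->nbatches == 0)
        return 0.0;
    return (double) s->total_rows / (double) s->nbatches;
}
```

With a batch size of 64, a scan that returns two full batches and a final partial batch of 22 rows would record three batches with a maximum of 64 and a minimum of 22.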

Not included in this set are the patches that add ExecProcNodeBatch(), which lets a TupleBatch be passed from one plan node to its parent, and the ExprEvalOps (EEOPs) for batched expression evaluation (quals and aggregate transitions).  For now I would like to focus on the patches that allow reading batches from the TAM into Scan nodes (currently only SeqScan).

After I'm back from vacation, I will post the patches for batched evaluation of SeqScan filter quals, once their bugs are fixed and they are polished.  Batching in the Agg node can wait for now.

In the meantime, I would like to hear thoughts on the following:

* the shape of the TAM APIs -- should I add a TAMBatch or something similar that is created, populated, and destroyed by the TAM, instead of the current void pointer and TupleBatchOps that are initialized in the executor like this (excerpt from 0002):

+    /* Lazily create the AM batch payload. */
+    if (node->ss.ps.ps_Batch->am_payload == NULL)
+    {
+        const TableAmRoutine *tam PG_USED_FOR_ASSERTS_ONLY = scandesc->rs_rd->rd_tableam;
+
+        Assert(tam && tam->scan_begin_batch);
+        node->ss.ps.ps_Batch->am_payload =
+            table_scan_begin_batch(scandesc, node->ss.ps.ps_Batch->maxslots);
+        node->ss.ps.ps_Batch->ops = table_batch_callbacks(node->ss.ss_currentRelation);
+    }
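
To make the TAM-owned alternative easier to discuss, here is a rough, self-contained sketch of what a batch object created, populated, and destroyed by the AM might look like. Everything in it is an assumption for illustration: TAMBatchSketch, TAMBatchOpsSketch, and the toy_* functions are hypothetical names, not PostgreSQL or patch APIs, and the "AM" is a toy that only counts rows.

```c
#include <assert.h>
#include <stdlib.h>

/* All names below are hypothetical stand-ins, not PostgreSQL or patch APIs. */
typedef struct TAMBatchSketch TAMBatchSketch;

typedef struct TAMBatchOpsSketch
{
    int         (*fill) (TAMBatchSketch *batch);    /* rows loaded, 0 at end */
    void        (*destroy) (TAMBatchSketch *batch); /* frees batch and AM state */
} TAMBatchOpsSketch;

struct TAMBatchSketch
{
    const TAMBatchOpsSketch *ops;   /* installed by the AM at creation */
    int         maxrows;            /* capacity requested by the executor */
    int         nrows;              /* rows currently in the batch */
    void       *am_private;         /* AM-owned state, opaque to the executor */
};

/* Toy AM state: just the number of rows left to "scan". */
typedef struct ToyScanState
{
    int         remaining;
} ToyScanState;

static int
toy_fill(TAMBatchSketch *batch)
{
    ToyScanState *st = batch->am_private;
    int         n = st->remaining < batch->maxrows ? st->remaining : batch->maxrows;

    st->remaining -= n;
    batch->nrows = n;
    return n;
}

static void
toy_destroy(TAMBatchSketch *batch)
{
    free(batch->am_private);
    free(batch);
}

static const TAMBatchOpsSketch toy_ops = {toy_fill, toy_destroy};

/* AM-side constructor: the AM allocates the batch and wires up its own ops. */
static TAMBatchSketch *
toy_batch_create(int total_rows, int maxrows)
{
    TAMBatchSketch *batch = malloc(sizeof(*batch));
    ToyScanState *st = malloc(sizeof(*st));

    if (batch == NULL || st == NULL)
    {
        free(batch);
        free(st);
        return NULL;
    }
    st->remaining = total_rows;
    batch->ops = &toy_ops;
    batch->maxrows = maxrows;
    batch->nrows = 0;
    batch->am_private = st;
    return batch;
}

/* Executor-side loop: touches the batch only through its ops. */
static int
count_rows_via_batches(TAMBatchSketch *batch, int *nbatches)
{
    int         total = 0;

    *nbatches = 0;
    while (batch->ops->fill(batch) > 0)
    {
        total += batch->nrows;
        (*nbatches)++;
    }
    return total;
}
```

The difference from the excerpt above is where the wiring happens: in this sketch the executor never assigns the payload or the callbacks itself, it only receives a finished object from the AM and calls through it.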

* the shape of TupleBatch itself -- its contents and operations defined in execBatch.c/h.

* any other thoughts you might have on the project or the patches.

Benchmark:

Scripts attached if you want to try them.

(Negative % = faster than master)

SELECT * FROM table LIMIT 1 OFFSET N:
Rows      Master    batch=0   vs master   batch=64   vs master
--------------------------------------------------------------
1M          11ms       11ms        -0%        8ms       -23%
2M          23ms       22ms        -1%       18ms       -23%
3M          36ms       34ms        -5%       27ms       -25%
4M          51ms       50ms        -2%       38ms       -26%
5M          64ms       64ms        -1%       48ms       -26%
10M        147ms      145ms        -1%      114ms       -22%

SELECT * FROM table WHERE a > 0 LIMIT 1 OFFSET N:
Rows      Master    batch=0   vs master   batch=64   vs master
--------------------------------------------------------------
1M          31ms       31ms        +0%       16ms       -48%
2M          64ms       64ms        -0%       34ms       -47%
3M          67ms       66ms        -1%       50ms       -25%
4M          91ms       90ms        -1%       71ms       -22%
5M         119ms      113ms        -5%       88ms       -26%
10M        262ms      261ms        -0%      205ms       -21%

SELECT * FROM table WHERE o > 0 LIMIT 1 OFFSET N (last column - deform-heavy):
Rows      Master    batch=0   vs master   batch=64   vs master
--------------------------------------------------------------
1M          38ms       37ms        -2%       38ms        +0%
2M          79ms       75ms        -6%       77ms        -4%
3M         182ms      186ms        +2%      160ms       -12%
4M         250ms      252ms        +1%      219ms       -12%
5M         314ms      316ms        +1%      273ms       -13%
10M        647ms      651ms        +1%      604ms        -7%

The smaller improvement with WHERE o > 0 is expected since accessing the last column requires deforming most of the tuple, which dominates the execution time. Future work on batched tuple deformation could help here.
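
For anyone re-running the scripts, the "vs master" columns follow the usual relative-change convention, where a negative value means the patched build was faster. A minimal sketch of that arithmetic follows; pct_vs_master is a hypothetical helper name, not something from the attached scripts. I am assuming the percentages in the tables were computed from unrounded timings, so recomputing them from the rounded millisecond values shown may differ by a few points.

```c
#include <assert.h>

/*
 * Relative change of the patched timing versus master, in percent.
 * Negative means the patched build was faster than master.
 * Hypothetical helper for illustration; not part of the benchmark scripts.
 */
static double
pct_vs_master(double master_ms, double patched_ms)
{
    return (patched_ms - master_ms) / master_ms * 100.0;
}
```

For example, the 10M row of the first table (147ms on master, 114ms with batch=64) gives (114 - 147) / 147 * 100, which is about -22.4 and is shown as -22%.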

Note on regressions with executor_batch_rows = 0 vs master:

I am no longer seeing the regressions with executor_batch_rows = 0 vs master that I saw before.  I think part of the reason is that I removed some stray fields from HeapScanData that were accidentally left there in the earlier patches.  Also, the regressions I observed earlier seem to have had more to do with compiling the master tree with gcc and the patched tree with clang, which led to code layout differences that appear to have made the patched binary slower.  It would be nice if others could verify these numbers.

--
Thanks, Amit Langote