Re: Batching in executor - Mailing list pgsql-hackers
| From | Amit Langote |
|---|---|
| Subject | Re: Batching in executor |
| Date | |
| Msg-id | CA+HiwqEZja5rJ78p3FBDZNvynWsHwanxyt6h0YaK_r84NemXng@mail.gmail.com |
| In response to | Re: Batching in executor (Amit Langote <amitlangote09@gmail.com>) |
| List | pgsql-hackers |
On Fri, Dec 5, 2025 at 12:54 AM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Oct 29, 2025 at 3:37 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Tue, Oct 28, 2025 at 10:40 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > That would be nice to see if you have the time, but maybe after I post
> > > a new version.
> >
> > I’ve created a CF entry marked WoA for this in the next CF under the
> > title “Batching in executor, part 1: add batch variant of table AM
> > scan API.” The idea is to track this piece separately so that later
> > parts can have their own entries and we don’t end up with a single
> > long-lived entry that never gets marked done. :-)
>
> I intend to continue working on this, so have just moved it into the
> next fest. I will post a new patch version next week that addresses
> Daniil's comments and implements a few other things I mentioned in my
> reply to Tomas on Oct 28; sorry for the delay.
Before I go on vacation for a couple of weeks, here's an updated patch set. I am only including the patches that add the TAM interface, add the TupleBatch executor wrapper for TAM batches, and use it in SeqScan, as I had posted before. There is a new patch to add a BATCHES option to EXPLAIN. I also renamed the testing GUC from the boolean executor_batching to the integer executor_batch_rows.

EXPLAIN (BATCHES) example:
+-- Basic batch stats output
+select explain_filter('explain (analyze, batches, buffers off, costs off) select * from batch_test');
+ explain_filter
+----------------------------------------------------------------
+ Seq Scan on batch_test (actual time=N.N..N.N rows=N.N loops=N)
+ Batches: N Avg Rows: N.N Max: N Min: N
+ Planning Time: N.N ms
+ Execution Time: N.N ms
+(4 rows)
What I have not included in this set are the patches that add ExecProcNodeBatch(), so that a TupleBatch can be passed from one plan node to its parent, and the ExprEvalOps (EEOPs) for batched expression evaluation (qual and aggregate transition). I would like to focus on the patches that allow reading batches from the TAM into Scan nodes (only SeqScan for now).
After I'm back from vacation, I will post the patches for batched evaluation of SeqScan filter quals (once the bugs are fixed and the patches polished). Batching in the Agg node can wait for now.
In the meantime, what I would like thoughts on:
* the shape of the TAM APIs -- should I add a TAMBatch struct or similar that is created, populated, and destroyed by the TAM, instead of the current void pointer and TupleBatchOps that are initialized in the executor like this (excerpt from 0002):
+ /* Lazily create the AM batch payload. */
+ if (node->ss.ps.ps_Batch->am_payload == NULL)
+ {
+ const TableAmRoutine *tam PG_USED_FOR_ASSERTS_ONLY = scandesc->rs_rd->rd_tableam;
+
+ Assert(tam && tam->scan_begin_batch);
+ node->ss.ps.ps_Batch->am_payload =
+ table_scan_begin_batch(scandesc, node->ss.ps.ps_Batch->maxslots);
+ node->ss.ps.ps_Batch->ops = table_batch_callbacks(node->ss.ss_currentRelation);
+ }
* the shape of TupleBatch itself -- its contents and operations defined in execBatch.c/h.
* any other thoughts you might have on the project or the patches.
Benchmark:
Scripts attached if you want to try them.
(Negative % = faster than master)
SELECT * FROM table LIMIT 1 OFFSET N:
Rows Master batch=0 vs master batch=64 vs master
--------------------------------------------------------------
1M 11ms 11ms -0% 8ms -23%
2M 23ms 22ms -1% 18ms -23%
3M 36ms 34ms -5% 27ms -25%
4M 51ms 50ms -2% 38ms -26%
5M 64ms 64ms -1% 48ms -26%
10M 147ms 145ms -1% 114ms -22%
SELECT * FROM table WHERE a > 0 LIMIT 1 OFFSET N:
Rows Master batch=0 vs master batch=64 vs master
--------------------------------------------------------------
1M 31ms 31ms +0% 16ms -48%
2M 64ms 64ms -0% 34ms -47%
3M 67ms 66ms -1% 50ms -25%
4M 91ms 90ms -1% 71ms -22%
5M 119ms 113ms -5% 88ms -26%
10M 262ms 261ms -0% 205ms -21%
SELECT * FROM table WHERE o > 0 LIMIT 1 OFFSET N (last column - deform-heavy):
Rows Master batch=0 vs master batch=64 vs master
--------------------------------------------------------------
1M 38ms 37ms -2% 38ms +0%
2M 79ms 75ms -6% 77ms -4%
3M 182ms 186ms +2% 160ms -12%
4M 250ms 252ms +1% 219ms -12%
5M 314ms 316ms +1% 273ms -13%
10M 647ms 651ms +1% 604ms -7%
The smaller improvement with WHERE o > 0 is expected since accessing the last column requires deforming most of the tuple, which dominates the execution time. Future work on batched tuple deformation could help here.
Note on regressions with executor_batch_rows = 0 vs master:
I am no longer seeing the regressions with executor_batch_rows=0 vs master that I saw before. I think some of that may be due to my removing some stray fields from HeapScanData that were accidentally left behind in the earlier patches. Also, the regressions I observed earlier seemed to have more to do with compiling the master tree with gcc and the patched tree with clang, which resulted in code layout changes that appeared to make the patched binary regress. It would be nice if others could verify these numbers.
--
Thanks, Amit Langote