Re: New Table Access Methods for Multi and Single Inserts - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: New Table Access Methods for Multi and Single Inserts |
Date | |
Msg-id | CALj2ACUcv5pZoB0=gRrz54M9+YT9JCmo6FYyo5WUS6wnS+em=A@mail.gmail.com Whole thread Raw |
In response to | Re: New Table Access Methods for Multi and Single Inserts (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: New Table Access Methods for Multi and Single Inserts
|
List | pgsql-hackers |
On Tue, Mar 19, 2024 at 10:40 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > I've not reviewed the patches in depth yet, but run performance tests > for CREATE MATERIALIZED VIEW. The test scenarios is: Thanks for looking into this. > Is there any reason why we copy a buffer-heap tuple to another > buffer-heap tuple? Which results in that we increments the buffer > refcount and register it to ResourceOwner for every tuples. I guess > that the destination tuple slot is not necessarily a buffer-heap, and > we could use VirtualTupleTableSlot instead. It would in turn require > copying a heap tuple. I might be missing something but it improved the > performance at least in my env. The change I made was: > > - dstslot = table_slot_create(state->rel, NULL); > + //dstslot = table_slot_create(state->rel, NULL); > + dstslot = MakeTupleTableSlot(RelationGetDescr(state->rel), > + &TTSOpsVirtual); > + > > And the execution times are: > 1588.984 ms > 1591.618 ms > 1582.519 ms Yes, usingVirtualTupleTableSlot helps improve the performance a lot. Below are results from my testing. Note that CMV, RMV, CTAS stand for CREATE MATERIALIZED VIEW, REFRESH MATERIALIZED VIEW, CREATE TABLE AS respectively. These commands got faster by 62.54%, 68.87%, 74.31% or 2.67, 3.21, 3.89 times respectively. I've used the test case specified at [1]. HEAD: CMV: Time: 6276.468 ms (00:06.276) CTAS: Time: 8141.632 ms (00:08.142) RMV: Time: 14747.139 ms (00:14.747) PATCHED: CMV: Time: 2350.282 ms (00:02.350) CTAS: Time: 2091.427 ms (00:02.091) RMV: Time: 4590.180 ms (00:04.590) I quickly looked at the description of what a "virtual" tuple is from src/include/executor/tuptable.h [2]. IIUC, it is invented for minimizing data copying, but it also says that it's the responsibility of the generating plan node to be sure these resources are not released for as long as the virtual tuple needs to be valid or is materialized. While it says this, as far as this patch is concerned, the virtual slot gets materialized when we copy the tuples from source slot (can be any type of slot) to destination slot (which is virtual slot). See ExecCopySlot-> tts_virtual_copyslot->tts_virtual_materialize. This way, tts_virtual_copyslot ensures the tuples storage doesn't depend on external memory because all the datums that aren't passed by value are copied into the slot's memory context. With the above understanding, it looks safe to use virtual slots for the multi insert buffered slots. I'm not so sure if I'm missing anything here. [1] cd $PWD/pg17/bin rm -rf data logfile ./initdb -D data ./pg_ctl -D data -l logfile start ./psql -d postgres \timing drop table test cascade; create unlogged table test (c int); insert into test select generate_series(1, 10000000); create materialized view test_mv as select * from test; create table test_copy as select * from test; insert into test select generate_series(1, 10000000); refresh materialized view test_mv; [2] * A "virtual" tuple is an optimization used to minimize physical data copying * in a nest of plan nodes. Until materialized pass-by-reference Datums in * the slot point to storage that is not directly associated with the * TupleTableSlot; generally they will point to part of a tuple stored in a * lower plan node's output TupleTableSlot, or to a function result * constructed in a plan node's per-tuple econtext. It is the responsibility * of the generating plan node to be sure these resources are not released for * as long as the virtual tuple needs to be valid or is materialized. Note * also that a virtual tuple does not have any "system columns". -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: