Re: New Table Access Methods for Multi and Single Inserts - Mailing list pgsql-hackers

From: Jeff Davis
Subject: Re: New Table Access Methods for Multi and Single Inserts
Msg-id: 6be6f58815dc0844fbe058edf56b4e735a6efc1c.camel@j-davis.com
In response to: Re: New Table Access Methods for Multi and Single Inserts (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses: Re: New Table Access Methods for Multi and Single Inserts
List: pgsql-hackers
On Thu, 2024-03-21 at 13:10 +0530, Bharath Rupireddy wrote:
> I'm attaching the v13 patches using virtual tuple slots for buffered
> tuples for multi inserts.

Comments:

* Do I understand correctly that CMV, RMV, and CTAS see a performance
benefit, but COPY FROM does not? And is that because COPY already used
table_multi_insert, whereas CMV and RMV did not?

* In the COPY FROM code, it looks like the decision to flush is based
on MAX_BUFFERED_TUPLES, but the slot array is allocated with
MAX_BUFFERED_SLOTS (they happen to be the same for heap, but perhaps
not for other AMs). The COPY code shouldn't be relying on internal
knowledge of the multi-insert code; the API should tell it somehow
when the right time to flush is.

* How is the memory management expected to work? It looks like COPY
FROM is using the ExprContext when running the input functions, but we
really want to be using a memory context owned by the table AM, right?

* What's the point of the table_multi_insert_slots() "num_slots"
argument? The only caller simply discards it.

* table_tuple_insert_v2 isn't called anywhere; what's it for?

* The "v2" naming is inconsistent -- it seems you only added it where
there's a name conflict, which makes it hard to tell which API methods
go together. I'm not sure how widely table_multi_insert* is used
outside of core, so we may even be able to just change those APIs and
update the few extensions that call them.

* Memory usage tracking should be done in the AM by allocating
everything in a single context, so that checking the size is easy;
don't manually add up memory. (A sketch of what I mean follows this
list.)

* I don't understand: "Caller may have got the slot using
heap_multi_insert_next_free_slot, filled it and passed. So, skip
copying in such a case." If the COPY FROM had a WHERE clause and
skipped a tuple after filling the slot, doesn't that mean the slot
holds bogus data from the last tuple?

* We'd like this to work for insert-into-select (IIS) and logical
replication, too. Do you see any problem there, or is it just a matter
of code?

* Andres had some comments[1] that don't seem entirely addressed:

  - You are still allocating the AM-specific part of TableModifyState
    as a separate chunk.
  - It's still called TableInsertState rather than TableModifyState,
    as he suggested. If you change that, you should also change to
    table_modify_begin/end.
  - CID: I suppose Andres is considering the use case of "BEGIN; ...
    ten thousand inserts ... COMMIT;". I don't think this problem is
    really solvable (discussed below), but we should have some
    response/consensus on that point.
  - He mentioned that we only need one new method implemented by the
    AM. I don't know whether one is enough, but seven does seem
    excessive. I have some simplification ideas below.
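On the two memory-related points above, here is a minimal sketch of
what I have in mind. Every name in it is invented for illustration;
none of this is patch code:

    /*
     * Sketch only: the AM owns one memory context that holds its
     * state and all buffered tuple data, so "how much is buffered?"
     * becomes one cheap check instead of manual accounting.
     */
    typedef struct HeapModifyState
    {
        MemoryContext mctx;     /* all buffered tuple data lives here */
        /* ... slot/buffer bookkeeping ... */
    } HeapModifyState;

    static HeapModifyState *
    heap_modify_begin(void)
    {
        MemoryContext   mctx;
        HeapModifyState *state;

        mctx = AllocSetContextCreate(CurrentMemoryContext,
                                     "heap modify buffer",
                                     ALLOCSET_DEFAULT_SIZES);
        state = MemoryContextAllocZero(mctx, sizeof(HeapModifyState));
        state->mctx = mctx;
        return state;
    }

    static bool
    heap_modify_needs_flush(HeapModifyState *state)
    {
        /* no adding-up of per-tuple sizes; threshold is arbitrary */
        return MemoryContextMemAllocated(state->mctx, true) >= 64 * 1024;
    }

    static void
    heap_modify_end(HeapModifyState *state)
    {
        /* ... flush any remaining buffered tuples ... */
        MemoryContextDelete(state->mctx);   /* frees state itself, too */
    }

Something like this would also let COPY run the input functions with
the AM-owned context current, rather than the ExprContext, so the
tuple data is released wholesale at flush time.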
Overall, if I understand correctly, there are two ways to use the API:

1. Used by CTAS and MV:

    tistate = table_insert_begin(...);
    table_multi_insert_v2(tistate, tup1);
    ...
    table_multi_insert_v2(tistate, tupN);
    table_insert_end(tistate);

2. Used by COPY ... FROM:

    tistate = table_insert_begin(..., SKIP_FLUSH);
    if (multi_insert_slot_array_is_full())
        table_multi_insert_flush(tistate);
    slot = table_insert_next_free_slot(tistate);
    ... fill slot with tup1
    table_multi_insert_v2(tistate, tup1);
    ...
    slot = table_insert_next_free_slot(tistate);
    ... fill slot with tupN
    table_multi_insert_v2(tistate, tupN);
    table_insert_end(tistate);

Those two uses need comments explaining what's going on. It appears
the SKIP_FLUSH flag is used to indicate which use the caller intends.

Use #2 is not enforced well by either the API or runtime checks. If
the caller neglects to check for a full buffer, it appears that it
will just overrun the slots array. Also, for use #2,
table_multi_insert_v2() doesn't do much beyond incrementing the memory
used: the slot will never be NULL, because it was obtained with
table_multi_insert_next_free_slot(), and the other two branches don't
happen when SKIP_FLUSH is true.

The real benefit of your new API for COPY is that the AM can manage
the slots itself, including how many tuples to track (which might be a
lot higher for non-heap AMs). I agree with Luc Vlaming's comment[2]
that more should be left to the table AM. Your patch tries too hard to
work with the copyfrom.c slot array, somehow sharing it with the table
AM; that adds complexity to the API and feels like a layering
violation. We also shouldn't mandate a slot array in the API: each
slot is 64 bytes, which is a lot of overhead for small tuples. For a
non-heap AM, it's much better to store the tuple data in a big
contiguous chunk with minimal overhead.

Let's just have a simple API like:

    tmstate = table_modify_begin(...);
    table_modify_save_insert(tmstate, tup1);
    ...
    table_modify_save_insert(tmstate, tupN);
    table_modify_end(tmstate);

and leave it up to the AM to do all the buffering and flushing work
(as Luc Vlaming suggested[2]).

That leaves one problem: how do we update the indexes and call AR
triggers while flushing? I think the best way is to just have a
callback in the TableModifyState that is called during flush (see the
sketch at the end of this mail). I don't think that would affect
performance, but it's worth double-checking.

We have to disable this whole multi-insert mechanism if there are
volatile BR/AR triggers, because those are supposed to see
already-inserted tuples. That's not a problem with your patch, but it
is a bit unfortunate: triggers can be costly already, and this
increases the penalty. There may be some theoretical ways to avoid the
problem, like reading tuples out of the unflushed buffer during a
SELECT, which sounds a little too clever (though perhaps not
completely crazy if the AM is in control of both?).

For potentially working with multi-updates/deletes, it might be as
simple as tracking the old TIDs along with the slots and having new
_save_update and _save_delete methods. I haven't thought deeply about
that, and I'm not sure we have a good example AM to work with, but it
seems plausible that we could make something useful here.

Batching multiple different INSERT statements within a transaction
just seems like a really hard problem: it could mean different CIDs,
and also different subtransaction IDs. Constraint violation errors
would happen at flush time, which could be many commands later than
the one that actually violated the constraint. And if someone issues a
SELECT in the middle of the transaction, how does it see the
already-inserted-but-not-flushed tuples? If that's not hard enough
already, you would also need to extend low-level APIs to accept
arbitrary CIDs and subxact IDs when storing tuples during a flush. The
only way I can imagine solving all of these problems is to somehow
declare that your transaction won't do any of these complicated
things, and that you don't mind getting constraint violations at the
wrong time. So I recommend that you punt on this problem.
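To make the callback idea concrete, here is a rough sketch of the
state shape I'm imagining. Again, all names are made up for
illustration, not taken from the patch:

    /*
     * Sketch only. The caller registers a callback at begin time; the
     * AM invokes it with the flushed tuples whenever it decides to
     * flush, so the caller can update indexes and fire AR triggers
     * for exactly those tuples at that point.
     */
    typedef void (*TableModifyFlushCb) (TupleTableSlot **slots,
                                        int nslots, void *cb_arg);

    typedef struct TableModifyState
    {
        Relation            rel;
        TableModifyFlushCb  flush_cb;   /* called on each flush */
        void               *cb_arg;     /* e.g. COPY's per-stmt state */
        /* AM-private buffering follows, opaque to the caller */
    } TableModifyState;

    /* caller side (e.g. COPY FROM), schematically: */
    tmstate = table_modify_begin(rel, copy_flush_cb, cstate);
    while (next_tuple_into(slot))           /* hypothetical loop */
        table_modify_save_insert(tmstate, slot);
    table_modify_end(tmstate);              /* flushes the remainder */

The AM decides when to flush (slot count, memory, whatever suits its
storage), and the caller never touches the buffer directly.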
Regards,
	Jeff Davis

[1] https://www.postgresql.org/message-id/20230603223824.o7iyochli2dwwi7k%40alap3.anarazel.de
[2] https://www.postgresql.org/message-id/508af801-6356-d36b-1867-011ac6df8f55%40swarm64.com