[HACKERS] UPDATE of partition key - Mailing list pgsql-hackers
From | Amit Khandekar |
---|---|
Subject | [HACKERS] UPDATE of partition key |
Date | |
Msg-id | CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com Whole thread Raw |
Responses |
Re: [HACKERS] UPDATE of partition key
Re: [HACKERS] UPDATE of partition key Re: [HACKERS] UPDATE of partition key Re: [HACKERS] UPDATE of partition key |
List | pgsql-hackers |
Currently, an update of a partition key of a partition is not allowed, since it requires to move the row(s) into the applicable partition. Attached is a WIP patch (update-partition-key.patch) that removes this restriction. When an UPDATE causes the row of a partition to violate its partition constraint, then a partition is searched in that subtree that can accommodate this row, and if found, the row is deleted from the old partition and inserted in the new partition. If not found, an error is reported. There are a few things that can be discussed about : 1. We can run an UPDATE using a child partition at any level in a nested partition tree. In such case, we should move the row only within that child subtree. For e.g. , in a tree such as : tab -> t1 -> t1_1 t1_2 t2 -> t2_1 t2_2 For "UPDATE t2 set col1 = 'AAA' " , if the modified tuple does not fit in t2_1 but can fit in t1_1, it should not be moved to t1_1, because the UPDATE is fired using t2. 2. In the patch, as part of the row movement, ExecDelete() is called followed by ExecInsert(). This is done that way, because we want to have the ROW triggers on that (sub)partition executed. If a user has explicitly created DELETE and INSERT BR triggers for this partition, I think we should run those. While at the same time, another question is, what about UPDATE trigger on the same table ? Here again, one can argue that because this UPDATE has been transformed into a DELETE-INSERT, we should not run UPDATE trigger for row-movement. But there can be a counter-argument. For e.g. if a user needs to make sure about logging updates of particular columns of a row, he will expect the logging to happen even when that row was transparently moved. In the patch, I have retained the firing of UPDATE BR trigger. 3. In case of a concurrent update/delete, suppose session A has locked the row for deleting it. Now a session B has decided to update this row and that is going to cause row movement, which means it will delete it first. But when session A is finished deleting it, session B finds that it is already deleted. In such case, it should not go ahead with inserting a new row as part of the row movement. For that, I have added a new parameter 'already_delete' for ExecDelete(). Of course, this still won't completely solve the concurrency anomaly. In the above case, the UPDATE of Session B gets lost. May be, for a user that does not tolerate this, we can have a table-level option that disallows row movement, or will cause an error to be thrown for one of the concurrent session. 4. The ExecSetupPartitionTupleRouting() is re-used for routing the row that is to be moved. So in ExecInitModifyTable(), we call ExecSetupPartitionTupleRouting() even for UPDATE. We can also do this only during execution time for the very first time we find that we need to do a row movement. I will think over that, but I am thinking it might complicate things, as compared to always doing the setup for UPDATE. WIll check on that. 5. Regarding performance testing, I have compared the results of row-movement with partition versus row-movement with inheritance tree using triggers. Below are the details : Schema : CREATE TABLE ptab (a date, b int, c int); CREATE TABLE ptab (a date, b int, c int) PARTITION BY RANGE (a, b); CREATE TABLE ptab_1_1 PARTITION OF ptab for values from ('1900-01-01', 1) to ('1900-01-01', 101) PARTITION BY range (c); CREATE TABLE ptab_1_1_1 PARTITION OF ptab_1_1 for values from (1) to (51); CREATE TABLE ptab_1_1_2 PARTITION OF ptab_1_1 for values from (51) to (101); ..... ..... CREATE TABLE ptab_1_1_n PARTITION OF ptab_1_1 for values from (n) to (n+m); ...... ...... CREATE TABLE ptab_5_n PARTITION OF ptab for values from ('1905-01-01', 101) to ('1905-01-01', 201) PARTITION BY range (c); CREATE TABLE ptab_1_2_1 PARTITION OF ptab_1_2 for values from (1) to (51); CREATE TABLE ptab_1_2_2 PARTITION OF ptab_1_2 for values from (51) to (101); ..... ..... CREATE TABLE ptab_1_2_n PARTITION OF ptab_1_2 for values from (n) to (n+m); ..... ..... Similarly for inheritance : CREATE TABLE ptab_1_1 (constraint check_ptab_1_1 check (a = '1900-01-01' and b >= 1 and b < 8)) inherits (ptab); create trigger brutrig_ptab_1_1 before update on ptab_1_1 for each row execute procedure ptab_upd_trig(); CREATE TABLE ptab_1_1_1 (constraint check_ptab_1_1_1 check (c >= 1 and c < 51)) inherits (ptab_1_1); create trigger brutrig_ptab_1_1_1 before update on ptab_1_1_1 for each row execute procedure ptab_upd_trig(); CREATE TABLE ptab_1_1_2 (constraint check_ptab_1_1_2 check (c >= 51 and c < 101)) inherits (ptab_1_1); create trigger brutrig_ptab_1_1_2 before update on ptab_1_1_2 for each row execute procedure ptab_upd_trig(); I had to have a BR UPDATE trigger on each of the leaf tables. Attached is the BR trigger function update_trigger.sql. There it generates the table name assuming a fixed pattern of distribution of data over the partitions. It first deletes the row and then inserts a new one. I also skipped the deletion part, and it did not show any significant change in results. parts partitioned inheritance no. of rows subpartitions ===== =========== =========== =========== ============= 500 10 sec 3 min 02 sec 1,000,000 0 1000 10 sec 3 min 05 sec 1,000,000 0 1000 1 min 38sec 30min 50 sec 10,000,000 0 4000 28 sec 5 min 41 sec 1,000,000 10 part : total number of partitions including subparitions if any. partitioned : Partitions created using declarative syntax. inheritence : Partitions created using inheritence , check constraints and insert,update triggers. subpartitions : Number of subpartitions for each partition (in a 2-level tree) Overall the UPDATE in partitions is faster by 10-20 times compared with inheritance with triggers. The UPDATE query moved all of the rows into another partition. It was something like this : update ptab set a = '1949-01-1' where a <= '1924-01-01' For a plain table with 1,000,000 rows, the UPDATE took 8 seconds, and with 10,000,000 rows, it took 1min 32sec. In general, for both partitioned and inheritence tables, the time taken linearly rose with the number of rows. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: