Thread: BUG #15832: COPY into a partitioned table breaks its indexes
The following bug has been logged on the website: Bug reference: 15832 Logged by: TAKATSUKA Haruka Email address: harukat@sraoss.co.jp PostgreSQL version: 12beta1 Operating system: CentOS 7.4.1708 Description: When I tested the performance improvement of 12beta1 at COPY into a partitioned table, I found the indexes broken. It doesn't happen in version 11.3 or INSERT as long as I tested. Reproduce steps: db1=# CREATE TABLE oya (id int primary key, v text) PARTITION BY RANGE (id); db1=# SELECT 'CREATE TABLE ko' || g || ' PARTITION OF oya FOR VALUES FROM (' || g * 10 - 10 || ') TO (' || g * 10 || ');' FROM generate_series(0, 3000) as g; db1=# \gexec db1=# COPY oya FROM '/home/postgres/dat.csv' CSV; COPY 30000 db1=# SELECT * FROM oya LIMIT 3; id | v ----+---------------------------------- 0 | cfcd208495d565ef66e7dff9f98764da 1 | c4ca4238a0b923820dcc509a6f75849b 2 | c81e728d9d4c2f636f067f89cc14862c (3 rows) db1=# SELECT * FROM oya WHERE id = 1; id | v ----+--- (0 rows) db1=# REINDEX TABLE ko1; REINDEX db1=# SELECT * FROM oya WHERE id = 1; id | v ----+---------------------------------- 1 | c4ca4238a0b923820dcc509a6f75849b (1 row) db1=# \d ko2 Table "public.ko2" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- id | integer | | not null | v | text | | | Partition of: oya FOR VALUES FROM (10) TO (20) Indexes: "ko2_pkey" PRIMARY KEY, btree (id) db1=# SELECT * FROM oya WHERE id = 15; id | v ----+--- (0 rows) db1=# REINDEX TABLE ko2; REINDEX db1=# SELECT * FROM oya WHERE id = 15; id | v ----+---------------------------------- 15 | 9bf31c7ff062936a96d3c8bd1f8f2ff3 (1 row) db1=# DELETE FROM oya; DELETE 30000 db1=# INSERT INTO oya SELECT g, md5(g::text) FROM generate_series(0, 30000 - 1) as g; INSERT 0 30000 db1=# SELECT * FROM oya WHERE id = 25; id | v ----+---------------------------------- 25 | 8e296a067a37563370ded05f5a3bf3ec (1 row) db1=# SELECT * FROM oya WHERE id = 35; id | v ----+---------------------------------- 35 | 1c383cd30b7c298ab50293adfecb7b18 (1 row) db1=# DELETE FROM oya; DELETE 30000 db1=# COPY oya FROM '/home/postgres/dat.csv' CSV; COPY 30000 db1=# SELECT * FROM oya WHERE id = 45; id | v ----+--- (0 rows) db1=# REINDEX TABLE ko5; REINDEX db1=# SELECT * FROM oya WHERE id = 45; id | v ----+---------------------------------- 45 | 6c8349cc7260ae62e3b1396831a8398f (1 row) (btw, "oya" and "ko" means "parent" and "child" in Japanese.)
Hi,
Thanks for finding the bug.
The issue here is that in case of partitioned table, "estate->es_result_relation_info" is not pointing to the correct resultRelInfo. It is actually pointing to the last partition rather than the partition whose buffer is being flushed. For e.g. consider the following case.
create table part_tab (a int primary key, b text) partition by range (a);
create table part_tab_1 partition of part_tab for values from (1) to (2);
create table part_tab_2 partition of part_tab for values from (2) to (3);
create table part_tab_3 partition of part_tab for values from (3) to (4);
insert into part_tab values (1, 'str1'), (2, 'str2'), (3, 'str3');
The issue here is that in case of partitioned table, "estate->es_result_relation_info" is not pointing to the correct resultRelInfo. It is actually pointing to the last partition rather than the partition whose buffer is being flushed. For e.g. consider the following case.
create table part_tab (a int primary key, b text) partition by range (a);
create table part_tab_1 partition of part_tab for values from (1) to (2);
create table part_tab_2 partition of part_tab for values from (2) to (3);
create table part_tab_3 partition of part_tab for values from (3) to (4);
insert into part_tab values (1, 'str1'), (2, 'str2'), (3, 'str3');
copy (select * from part_tab) to '/tmp/multi_insert_part_tab.csv' csv;
truncate table part_tab;
copy part_tab from '/tmp/multi_insert_part_tab.csv' csv;
When above COPY FROM command is executed into the partitioned table (part_tab), for the first record i.e. (1, 'str1') 'estate->es_result_relation_info' gets updated with the resultRelInfo of partition-1 as the first record fits into partition 1. Similarly, for the second record, 'estate->es_result_relation_info' gets updated with the resultRelInfo of partition-2 and finally for the last record (i.e. (3, 'str3')), 'estate->es_result_relation_info' gets updated with the resultRelInfo of partition-3. Eventually, when all the records are read and the buffers are flushed one by one, we also do the index insertion (as there exists an index on the partitons) but during index insertion, we refer to the resultRelInfo in estate which is actually pointing to the last partition i.e. partition-3 in our case.
During heap insertion we actually refer to buffer->resultRelInfo which is always updated and that's the reason heap insertion works fine but not the index insertion.
Attached patch fixes the issue. It basically updates estate->es_result_relation_info with the correct resultRelInfo in CopyMultiInsertBufferFlush(). I've also added the test-case for it.
Andres, David, do you all agree with above analysis and the proposed fix ?
truncate table part_tab;
copy part_tab from '/tmp/multi_insert_part_tab.csv' csv;
When above COPY FROM command is executed into the partitioned table (part_tab), for the first record i.e. (1, 'str1') 'estate->es_result_relation_info' gets updated with the resultRelInfo of partition-1 as the first record fits into partition 1. Similarly, for the second record, 'estate->es_result_relation_info' gets updated with the resultRelInfo of partition-2 and finally for the last record (i.e. (3, 'str3')), 'estate->es_result_relation_info' gets updated with the resultRelInfo of partition-3. Eventually, when all the records are read and the buffers are flushed one by one, we also do the index insertion (as there exists an index on the partitons) but during index insertion, we refer to the resultRelInfo in estate which is actually pointing to the last partition i.e. partition-3 in our case.
During heap insertion we actually refer to buffer->resultRelInfo which is always updated and that's the reason heap insertion works fine but not the index insertion.
Attached patch fixes the issue. It basically updates estate->es_result_relation_info with the correct resultRelInfo in CopyMultiInsertBufferFlush(). I've also added the test-case for it.
Andres, David, do you all agree with above analysis and the proposed fix ?
Attachment
On Tue, 4 Jun 2019 at 21:20, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > Attached patch fixes the issue. It basically updates estate->es_result_relation_info with the correct resultRelInfo inCopyMultiInsertBufferFlush(). I've also added the test-case for it. > > Andres, David, do you all agree with above analysis and the proposed fix ? Thanks for the report Haruka and for the patch Ashutosh. I've pushed this after changing the tests a little to reuse the existing table. I also added an Assert into ExecInsertIndexTuples to ensure the slot and the ResultRelInfo belong to the same relation. If that had existing when this was being developed then I'd have noticed the problem when testing it during dev. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jun 5, 2019 at 12:11 PM David Rowley <david.rowley@2ndquadrant.com> wrote:
On Tue, 4 Jun 2019 at 21:20, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> Attached patch fixes the issue. It basically updates estate->es_result_relation_info with the correct resultRelInfo in CopyMultiInsertBufferFlush(). I've also added the test-case for it.
>
> Andres, David, do you all agree with above analysis and the proposed fix ?
Thanks for the report Haruka and for the patch Ashutosh.
I've pushed this after changing the tests a little to reuse the
existing table. I also added an Assert into ExecInsertIndexTuples to
ensure the slot and the ResultRelInfo belong to the same relation.
Thank you.