Thread: pg_dump losing index column collations for unique and primary keys
Hello, I found that pg_dump fails to note down the collations used primary and unique keys supporting indexes. To reproduce: create table f(a text); create unique index f_pkey on f(a collate "C.UTF-8"); alter table f add primary key using index f_pkey; \d f pg_dump -t f The only way to dump it correctly is to create indexes explicitly and then use them in constraint definitions. Please could someone have a look at the patch attached. Thank you. Best, Alex
Attachment
Alexey Bashtanov <bashtanov@imap.cc> writes: > I found that pg_dump fails to note down the collations used primary and > unique keys supporting indexes. > To reproduce: > create table f(a text); > create unique index f_pkey on f(a collate "C.UTF-8"); > alter table f add primary key using index f_pkey; Hm. I kinda think that we should reject the above. The point of ADD PRIMARY KEY USING INDEX ought to be to allow you to build the index concurrently; it should not be something you can use to create a DDL state that is impossible to arrive at with plain ADD PRIMARY KEY. As an example of the sort of problem that I'm worried about, consider a situation where the index's collation has a different notion of equality than the column's default collation has. (Which is entirely possible now that we have nondeterministic collations.) That's going to lead to weird restrictions on whether the index really satisfies query WHERE conditions, and I'd bet considerable money that we'd have bugs due to failing to account for that. In short, I'd say the bug here is not pg_dump's fault at all, but failure to insist on collation match in ADD PRIMARY KEY USING INDEX. regards, tom lane
I wrote: > In short, I'd say the bug here is not pg_dump's fault at all, > but failure to insist on collation match in ADD PRIMARY KEY > USING INDEX. Concretely, I think we should do the attached. I'm not quite sure whether we should back-patch this, though. It's been wrong since we added collations, but the main impact of a back-patch might be to break cases that were working more or less okay for people. A compromise idea is to back-patch only into v12, where the issue became quite a lot more important due to nondeterministic collations. Thoughts? regards, tom lane diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c index ee47547..b761fdf 100644 --- a/src/backend/parser/parse_utilcmd.c +++ b/src/backend/parser/parse_utilcmd.c @@ -2147,15 +2147,17 @@ transformIndexConstraint(Constraint *constraint, CreateStmtContext *cxt) if (i < index_form->indnkeyatts) { /* - * Insist on default opclass and sort options. While the - * index would still work as a constraint with non-default - * settings, it might not provide exactly the same uniqueness - * semantics as you'd get from a normally-created constraint; - * and there's also the dump/reload problem mentioned above. + * Insist on default opclass, collation, and sort options. + * While the index would still work as a constraint with + * non-default settings, it might not provide exactly the same + * uniqueness semantics as you'd get from a normally-created + * constraint; and there's also the dump/reload problem + * mentioned above. */ defopclass = GetDefaultOpClass(attform->atttypid, index_rel->rd_rel->relam); if (indclass->values[i] != defopclass || + attform->attcollation != index_rel->rd_indcollation[i] || index_rel->rd_indoption[i] != 0) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out index 1cdb7a9..645ae2c 100644 --- a/src/test/regress/expected/create_index.out +++ b/src/test/regress/expected/create_index.out @@ -1479,6 +1479,19 @@ primary key, btree, for table "public.cwi_test" DROP INDEX cwi_replaced_pkey; -- Should fail; a constraint depends on it ERROR: cannot drop index cwi_replaced_pkey because constraint cwi_replaced_pkey on table cwi_test requires it HINT: You can drop constraint cwi_replaced_pkey on table cwi_test instead. +-- Check that non-default index options are rejected +CREATE UNIQUE INDEX cwi_uniq3_idx ON cwi_test(a desc); +ALTER TABLE cwi_test ADD UNIQUE USING INDEX cwi_uniq3_idx; -- fail +ERROR: index "cwi_uniq3_idx" column number 1 does not have default sorting behavior +LINE 1: ALTER TABLE cwi_test ADD UNIQUE USING INDEX cwi_uniq3_idx; + ^ +DETAIL: Cannot create a primary key or unique constraint using such an index. +CREATE UNIQUE INDEX cwi_uniq4_idx ON cwi_test(b collate "POSIX"); +ALTER TABLE cwi_test ADD UNIQUE USING INDEX cwi_uniq4_idx; -- fail +ERROR: index "cwi_uniq4_idx" column number 1 does not have default sorting behavior +LINE 1: ALTER TABLE cwi_test ADD UNIQUE USING INDEX cwi_uniq4_idx; + ^ +DETAIL: Cannot create a primary key or unique constraint using such an index. DROP TABLE cwi_test; -- ADD CONSTRAINT USING INDEX is forbidden on partitioned tables CREATE TABLE cwi_test(a int) PARTITION BY hash (a); diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql index 7659808..73a55ea 100644 --- a/src/test/regress/sql/create_index.sql +++ b/src/test/regress/sql/create_index.sql @@ -538,6 +538,12 @@ ALTER TABLE cwi_test DROP CONSTRAINT cwi_uniq_idx, DROP INDEX cwi_replaced_pkey; -- Should fail; a constraint depends on it +-- Check that non-default index options are rejected +CREATE UNIQUE INDEX cwi_uniq3_idx ON cwi_test(a desc); +ALTER TABLE cwi_test ADD UNIQUE USING INDEX cwi_uniq3_idx; -- fail +CREATE UNIQUE INDEX cwi_uniq4_idx ON cwi_test(b collate "POSIX"); +ALTER TABLE cwi_test ADD UNIQUE USING INDEX cwi_uniq4_idx; -- fail + DROP TABLE cwi_test; -- ADD CONSTRAINT USING INDEX is forbidden on partitioned tables
Hi Tom, >> In short, I'd say the bug here is not pg_dump's fault at all, >> but failure to insist on collation match in ADD PRIMARY KEY >> USING INDEX. While being entirely reasonable especially with the non-deterministic collations, this breaks down an important facility of changing a column collation without rewriting it even if it is a part of a unique constraint: first change the PK then the column itself. > Concretely, I think we should do the attached. While being entirely reasonable especially with the non-deterministic collations, this breaks down an important facility of changing a column collation without rewriting it even if it is a part of a unique constraint: first change the PK then the column itself. Disallowing changing the direction (ASC/DESC) also looks cruel to me. BTW with playing with this stuff I came across another issue, which seems unrelated to collations: 2019-12-03 19:08:26 alexey@[local]/alexey=# \d g Table "public.g" ┌────────┬──────┬────────────┬──────────┬─────────┐ │ Column │ Type │ Collation │ Nullable │ Default │ ├────────┼──────┼────────────┼──────────┼─────────┤ │ a │ text │ en_US.utf8 │ not null │ │ └────────┴──────┴────────────┴──────────┴─────────┘ Indexes: "g_pkey" PRIMARY KEY, btree (a) "g_a_idx" UNIQUE, btree (a) 2019-12-03 19:08:27 alexey@[local]/alexey=# begin; alter table g drop constraint g_pkey, add primary key using index g_a_idx, alter column a type text; BEGIN Time: 0.421 ms ERROR: could not open relation with OID 16417 Time: 9.990 ms Is it something known? From user's perspective I think an ideal solution would be to fix the above and to enforce the PK/UK and column collation to be the same only by the end of ALTER TABLE command (and document this, as it isn't very obvious). I haven't yet looked in the code though to see how comfortable would that be to implement it. Best, Alex
Alexey Bashtanov <bashtanov@imap.cc> writes: >> In short, I'd say the bug here is not pg_dump's fault at all, >> but failure to insist on collation match in ADD PRIMARY KEY >> USING INDEX. > While being entirely reasonable especially with the non-deterministic > collations, > this breaks down an important facility of changing a column collation > without > rewriting it even if it is a part of a unique constraint: first change > the PK then the > column itself. Well, I disagree that that's a high-priority use case, but it seems to me that you can still do it. You just can't call the index the pkey until the column collation agrees. So roughly it'd be * CREATE UNIQUE INDEX ... (col COLLATE new-collation) * Drop old pkey (the new index is still enforcing uniqueness) * ALTER TABLE ... ALTER COLUMN ... COLLATE new-collation * ALTER TABLE ADD PRIMARY KEY USING INDEX These seem to me to be pretty much exactly the same steps you'd need today, although maybe the current code is more forgiving about their ordering. > Disallowing changing the direction (ASC/DESC) also looks cruel to me. That restriction has been there since day one, and we have had a grand total of zero complaints about it. > BTW with playing with this stuff I came across another issue, > which seems unrelated to collations: Yeah, see https://www.postgresql.org/message-id/flat/10365.1558909428@sss.pgh.pa.us If you'd like to help move things along by reviewing that patch, it'd be great. regards, tom lane
> Well, I disagree that that's a high-priority use case, but it seems to me > that you can still do it. You just can't call the index the pkey until > the column collation agrees. So roughly it'd be > > * CREATE UNIQUE INDEX ... (col COLLATE new-collation) > * Drop old pkey (the new index is still enforcing uniqueness) > * ALTER TABLE ... ALTER COLUMN ... COLLATE new-collation > * ALTER TABLE ADD PRIMARY KEY USING INDEX > > These seem to me to be pretty much exactly the same steps you'd need > today, although maybe the current code is more forgiving about > their ordering. Agreed. >> Disallowing changing the direction (ASC/DESC) also looks cruel to me. > That restriction has been there since day one, and we have had a grand > total of zero complaints about it. True, my bad. What do you think of making pg_dump warn the user if they are trying to dump a weird PK/UK which has collations in index and column not matching? And maybe even throw an error in case of --binary-upgrade? >> BTW with playing with this stuff I came across another issue, >> which seems unrelated to collations: > Yeah, see > https://www.postgresql.org/message-id/flat/10365.1558909428@sss.pgh.pa.us > If you'd like to help move things along by reviewing that patch, > it'd be great. I'll have a look. Best, Alex
Alexey Bashtanov <bashtanov@imap.cc> writes: > What do you think of making pg_dump warn the user if they are trying > to dump a weird PK/UK which has collations in index and column not matching? > And maybe even throw an error in case of --binary-upgrade? I can't get excited about spending any additional effort on this issue. If I thought it were actually happening in the field to any significant extent, I'd be more interested in supporting the case instead of just disallowing it. But I think the actual occurrences are epsilon, so there are better places to be spending our time. regards, tom lane