Thread: pg_dump losing index column collations for unique and primary keys

pg_dump losing index column collations for unique and primary keys

From

Alexey Bashtanov

Date:

03 December 2019, 16:35:20

Hello,

I found that pg_dump fails to note down the collations used primary and 
unique keys supporting indexes.
To reproduce:

create table f(a text);
create unique index f_pkey on f(a collate "C.UTF-8");
alter table f add primary key using index f_pkey;
\d f

pg_dump -t f

The only way to dump it correctly is to create indexes explicitly and 
then use them in constraint definitions.
Please could someone have a look at the patch attached. Thank you.

Best,
   Alex

Attachment

pg_dump_pk_collation_v001.patch

Re: pg_dump losing index column collations for unique and primary keys

From

Tom Lane

Date:

03 December 2019, 16:57:57

Alexey Bashtanov <bashtanov@imap.cc> writes:
> I found that pg_dump fails to note down the collations used primary and 
> unique keys supporting indexes.
> To reproduce:

> create table f(a text);
> create unique index f_pkey on f(a collate "C.UTF-8");
> alter table f add primary key using index f_pkey;

Hm.  I kinda think that we should reject the above.  The point of
ADD PRIMARY KEY USING INDEX ought to be to allow you to build
the index concurrently; it should not be something you can use to
create a DDL state that is impossible to arrive at with plain
ADD PRIMARY KEY.

As an example of the sort of problem that I'm worried about,
consider a situation where the index's collation has a different
notion of equality than the column's default collation has.
(Which is entirely possible now that we have nondeterministic
collations.)  That's going to lead to weird restrictions on
whether the index really satisfies query WHERE conditions, and
I'd bet considerable money that we'd have bugs due to failing
to account for that.

In short, I'd say the bug here is not pg_dump's fault at all,
but failure to insist on collation match in ADD PRIMARY KEY
USING INDEX.

            regards, tom lane

Re: pg_dump losing index column collations for unique and primary keys

From

Tom Lane

Date:

03 December 2019, 18:24:38

I wrote:
> In short, I'd say the bug here is not pg_dump's fault at all,
> but failure to insist on collation match in ADD PRIMARY KEY
> USING INDEX.

Concretely, I think we should do the attached.

I'm not quite sure whether we should back-patch this, though.
It's been wrong since we added collations, but the main impact
of a back-patch might be to break cases that were working more
or less okay for people.

A compromise idea is to back-patch only into v12, where the issue
became quite a lot more important due to nondeterministic
collations.

Thoughts?

            regards, tom lane

diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index ee47547..b761fdf 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -2147,15 +2147,17 @@ transformIndexConstraint(Constraint *constraint, CreateStmtContext *cxt)
             if (i < index_form->indnkeyatts)
             {
                 /*
-                 * Insist on default opclass and sort options.  While the
-                 * index would still work as a constraint with non-default
-                 * settings, it might not provide exactly the same uniqueness
-                 * semantics as you'd get from a normally-created constraint;
-                 * and there's also the dump/reload problem mentioned above.
+                 * Insist on default opclass, collation, and sort options.
+                 * While the index would still work as a constraint with
+                 * non-default settings, it might not provide exactly the same
+                 * uniqueness semantics as you'd get from a normally-created
+                 * constraint; and there's also the dump/reload problem
+                 * mentioned above.
                  */
                 defopclass = GetDefaultOpClass(attform->atttypid,
                                                index_rel->rd_rel->relam);
                 if (indclass->values[i] != defopclass ||
+                    attform->attcollation != index_rel->rd_indcollation[i] ||
                     index_rel->rd_indoption[i] != 0)
                     ereport(ERROR,
                             (errcode(ERRCODE_WRONG_OBJECT_TYPE),
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 1cdb7a9..645ae2c 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1479,6 +1479,19 @@ primary key, btree, for table "public.cwi_test"
 DROP INDEX cwi_replaced_pkey;    -- Should fail; a constraint depends on it
 ERROR:  cannot drop index cwi_replaced_pkey because constraint cwi_replaced_pkey on table cwi_test requires it
 HINT:  You can drop constraint cwi_replaced_pkey on table cwi_test instead.
+-- Check that non-default index options are rejected
+CREATE UNIQUE INDEX cwi_uniq3_idx ON cwi_test(a desc);
+ALTER TABLE cwi_test ADD UNIQUE USING INDEX cwi_uniq3_idx;  -- fail
+ERROR:  index "cwi_uniq3_idx" column number 1 does not have default sorting behavior
+LINE 1: ALTER TABLE cwi_test ADD UNIQUE USING INDEX cwi_uniq3_idx;
+                                 ^
+DETAIL:  Cannot create a primary key or unique constraint using such an index.
+CREATE UNIQUE INDEX cwi_uniq4_idx ON cwi_test(b collate "POSIX");
+ALTER TABLE cwi_test ADD UNIQUE USING INDEX cwi_uniq4_idx;  -- fail
+ERROR:  index "cwi_uniq4_idx" column number 1 does not have default sorting behavior
+LINE 1: ALTER TABLE cwi_test ADD UNIQUE USING INDEX cwi_uniq4_idx;
+                                 ^
+DETAIL:  Cannot create a primary key or unique constraint using such an index.
 DROP TABLE cwi_test;
 -- ADD CONSTRAINT USING INDEX is forbidden on partitioned tables
 CREATE TABLE cwi_test(a int) PARTITION BY hash (a);
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 7659808..73a55ea 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -538,6 +538,12 @@ ALTER TABLE cwi_test DROP CONSTRAINT cwi_uniq_idx,

 DROP INDEX cwi_replaced_pkey;    -- Should fail; a constraint depends on it

+-- Check that non-default index options are rejected
+CREATE UNIQUE INDEX cwi_uniq3_idx ON cwi_test(a desc);
+ALTER TABLE cwi_test ADD UNIQUE USING INDEX cwi_uniq3_idx;  -- fail
+CREATE UNIQUE INDEX cwi_uniq4_idx ON cwi_test(b collate "POSIX");
+ALTER TABLE cwi_test ADD UNIQUE USING INDEX cwi_uniq4_idx;  -- fail
+
 DROP TABLE cwi_test;

 -- ADD CONSTRAINT USING INDEX is forbidden on partitioned tables

Re: pg_dump losing index column collations for unique and primarykeys

From

Alexey Bashtanov

Date:

03 December 2019, 19:30:49

Hi Tom,

>> In short, I'd say the bug here is not pg_dump's fault at all,
>> but failure to insist on collation match in ADD PRIMARY KEY
>> USING INDEX.
While being entirely reasonable especially with the non-deterministic 
collations,
this breaks down an important facility of changing a column collation 
without
rewriting it even if it is a part of a unique constraint: first change 
the PK then the
column itself.

> Concretely, I think we should do the attached.

While being entirely reasonable especially with the non-deterministic 
collations,
this breaks down an important facility of changing a column collation 
without
rewriting it even if it is a part of a unique constraint: first change 
the PK then the
column itself.

Disallowing changing the direction (ASC/DESC) also looks cruel to me.

BTW with playing with this stuff I came across another issue,
which seems unrelated to collations:

2019-12-03 19:08:26 alexey@[local]/alexey=# \d g
                  Table "public.g"
┌────────┬──────┬────────────┬──────────┬─────────┐
│ Column │ Type │ Collation  │ Nullable │ Default │
├────────┼──────┼────────────┼──────────┼─────────┤
│ a      │ text │ en_US.utf8 │ not null │         │
└────────┴──────┴────────────┴──────────┴─────────┘
Indexes:
     "g_pkey" PRIMARY KEY, btree (a)
     "g_a_idx" UNIQUE, btree (a)

2019-12-03 19:08:27 alexey@[local]/alexey=# begin; alter table g drop 
constraint g_pkey, add primary key using index g_a_idx, alter column a 
type text;
BEGIN
Time: 0.421 ms
ERROR:  could not open relation with OID 16417
Time: 9.990 ms

Is it something known?

 From user's perspective I think an ideal solution would be to fix the above
and to enforce the PK/UK and column collation to be the same only
by the end of ALTER TABLE command (and document this, as it isn't very 
obvious).
I haven't yet looked in the code though to see how comfortable would 
that be to implement it.

Best, Alex

Re: pg_dump losing index column collations for unique and primary keys

From

Tom Lane

Date:

03 December 2019, 19:52:13

Alexey Bashtanov <bashtanov@imap.cc> writes:
>> In short, I'd say the bug here is not pg_dump's fault at all,
>> but failure to insist on collation match in ADD PRIMARY KEY
>> USING INDEX.

> While being entirely reasonable especially with the non-deterministic 
> collations,
> this breaks down an important facility of changing a column collation 
> without
> rewriting it even if it is a part of a unique constraint: first change 
> the PK then the
> column itself.

Well, I disagree that that's a high-priority use case, but it seems to me
that you can still do it.  You just can't call the index the pkey until
the column collation agrees.  So roughly it'd be

* CREATE UNIQUE INDEX ... (col COLLATE new-collation)
* Drop old pkey (the new index is still enforcing uniqueness)
* ALTER TABLE ... ALTER COLUMN ... COLLATE new-collation
* ALTER TABLE ADD PRIMARY KEY USING INDEX

These seem to me to be pretty much exactly the same steps you'd need
today, although maybe the current code is more forgiving about
their ordering.

> Disallowing changing the direction (ASC/DESC) also looks cruel to me.

That restriction has been there since day one, and we have had a grand
total of zero complaints about it.

> BTW with playing with this stuff I came across another issue,
> which seems unrelated to collations:

Yeah, see
https://www.postgresql.org/message-id/flat/10365.1558909428@sss.pgh.pa.us
If you'd like to help move things along by reviewing that patch,
it'd be great.

            regards, tom lane

Re: pg_dump losing index column collations for unique and primarykeys

From

Alexey Bashtanov

Date:

04 December 2019, 13:51:32

> Well, I disagree that that's a high-priority use case, but it seems to me
> that you can still do it.  You just can't call the index the pkey until
> the column collation agrees.  So roughly it'd be
>
> * CREATE UNIQUE INDEX ... (col COLLATE new-collation)
> * Drop old pkey (the new index is still enforcing uniqueness)
> * ALTER TABLE ... ALTER COLUMN ... COLLATE new-collation
> * ALTER TABLE ADD PRIMARY KEY USING INDEX
>
> These seem to me to be pretty much exactly the same steps you'd need
> today, although maybe the current code is more forgiving about
> their ordering.
Agreed.
>> Disallowing changing the direction (ASC/DESC) also looks cruel to me.
> That restriction has been there since day one, and we have had a grand
> total of zero complaints about it.
True, my bad.

What do you think of making pg_dump warn the user if they are trying
to dump a weird PK/UK which has collations in index and column not matching?
And maybe even throw an error in case of --binary-upgrade?

>> BTW with playing with this stuff I came across another issue,
>> which seems unrelated to collations:
> Yeah, see
> https://www.postgresql.org/message-id/flat/10365.1558909428@sss.pgh.pa.us
> If you'd like to help move things along by reviewing that patch,
> it'd be great.
I'll have a look.

Best, Alex

Re: pg_dump losing index column collations for unique and primary keys

From

Tom Lane

Date:

04 December 2019, 15:48:58

Alexey Bashtanov <bashtanov@imap.cc> writes:
> What do you think of making pg_dump warn the user if they are trying
> to dump a weird PK/UK which has collations in index and column not matching?
> And maybe even throw an error in case of --binary-upgrade?

I can't get excited about spending any additional effort on this issue.
If I thought it were actually happening in the field to any significant
extent, I'd be more interested in supporting the case instead of just
disallowing it.  But I think the actual occurrences are epsilon, so
there are better places to be spending our time.

            regards, tom lane