Re: BUG #19098: Can't create unique gist index, where pg_indexes says that WITHOUT OVERLAPS does exacly that - Mailing list pgsql-bugs
| From | Matthias van de Meent |
|---|---|
| Subject | Re: BUG #19098: Can't create unique gist index, where pg_indexes says that WITHOUT OVERLAPS does exacly that |
| Date | |
| Msg-id | CAEze2WgyjPx7aaxr_d=DT1f1RUrzHRiPh8DppnszvcnihfE4Dw@mail.gmail.com Whole thread Raw |
| In response to | Re: BUG #19098: Can't create unique gist index, where pg_indexes says that WITHOUT OVERLAPS does exacly that (Paul A Jungwirth <pj@illuminatedcomputing.com>) |
| List | pgsql-bugs |
Hi,
Thanks for the CC, and sorry for the delayed response. I've been
travelling, and will be for another week or so.
On Wed, 29 Oct 2025, 19:51 Paul A Jungwirth,
<pj@illuminatedcomputing.com> wrote:
>
> On Wed, Oct 29, 2025 at 8:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I wrote:
> > > I have not looked at the WITHOUT OVERLAPS patch, but if the mechanism
> > > underlying that is just to set pg_index.indisunique, then it seems
> > > like a reasonable answer here is to allow this syntax.
> >
> > On second thought, not really, because it'd preclude ever supporting
> > "normal" unique indexes with GiST. Really the only thing I can
> > think of that isn't a complete violation of pg_get_indexdef's contract
> > to produce a correct representation of the index is for it to emit
> > an ALTER TABLE ADD CONSTRAINT command to represent these indexes.
> >
> > Which seems like kind of a mess, not only because it will likely
> > require a deal of extra code in ruleutils, but because it will likely
> > break calling applications that aren't expecting such syntax.
> >
> > I wonder how hard it would be to extend CREATE INDEX so that it
> > could produce a non-phony representation of such indexes, with the
> > &&-semantics columns clearly distinguished from the =-semantics ones.
> > Is including an opclass name sufficient, or is there some additional
> > secret sauce for the temporal columns?
>
> This came up a few times in hallway conversations at PgCon{,f} as well
> as on the mailing list. I would really like to clean it up. I think we
> have the pieces we need, but I'd appreciate some design discussion,
> because there a few overlapping issues.
>
> As you say, the basic problem is that you can create a unique
> constraint (which creates an index), but you can't create just the
> unique index. Matthias (cc'd) ran into this when he wanted to first
> create the index concurrently, then create the constraint USING INDEX
> (to avoid long locks).[0]
>
> Also as he points out, our error message is confusing: "access method
> "gist" does not support unique indexes". It does support them, sort
> of. Or it can *sometimes* support them.
Well, strictly speaking, GIST itself doesn't do unique indexes. There
are hacked unique constraints which rely on the GIST AM and that
marked those GIST indexes as unique, but GIST itself doesn't know
about or validate any of the uniqueness - the uniqueness is instead
guaranteed by higher level systems. Which is also my main issue with
the new code - it is secretly an exclusion constraint under the hood,
so instead of leveraging the AM to guarantee the uniqueness constraint
it instead relies on repeated calls into the AM to validate the
constraint.
I'm not a fan of this disjoint uniqueness handling. It's the main
reason why we can't build these indexes concurrently, and if we want
the *index* to be considered unique, we should also make the *index*
guarantee that uniqueness, not some higher code.
> I would really like to support this workflow, but there are some
> bigger pieces missing.
>
> Today we can't concurrently create an exclusion constraint (which is
> what temporal PKs really are). There have been attempts in 2012 and
> 2019 to at least concurrently REINDEX, but neither was successful (for
> exclusion constraints).[1, 2]
I don't see why we would need to support concurrent creation of
exclusion constraints to be able to rebuild indexes concurrently.
CREATE CONSTRAINT USING INDEX ... NOT VALID + VALIDATE CONSTRAINT
works for that, right?
> Could we at least support the non-concurrent case? That is within
> reach if we address a few things:
>
> First, an exclusion constraint's index alone won't enforce the
> constraint, unlike a unique index. Similarly a temporal unique index
> is strictly more restrictive than a regular unique index, so it has to
> know that it's a temporal index.
I really dislike the term "temporal index", as there is nothing
temporal about the index. Presumably it's unique, with WITHOUT
OVERLAPS behaviour, but none of that has anything to do with temporal
- only uniqueness and non-unitary data types.
> To support arbitrary exclusion
> constraints, we would have to move pg_constraint.conexclop over to
> pg_index (or something like that). That seems messy.
Yes, that'd be messy. But I'm not sure why we would need to move over
pg_constraint.conexclop -- we're working with something called unique
indexes here, right?
> On the other hand, to enforce just temporal indexes, we don't need any
> new columns on pg_index. We can identify temporal indexes already
> because they have both indisunique and indisexclusion. So GiST could
> do the right thing here. Perhaps we would have a new can_temporal
> property for index AMs.
In my opinion, if an index is marked unique, then that index ('s AM)
should be responsible for fully validating that uniqueness; this would
include the WITHOUT OVERLAPS support that was added to UNIQUE
constraints in PG 18 (though apparently it isn't).
AFAIK, the pg_index data should be sufficient to support this, and
that should allow us to remove the hack that made indexes exist that a
user can't create. We can validate whether the index's opclasses
support this check and whether the index can actually validate
uniqueness, but I find it weird that I can interact (in C) with an
index marked unique without that index making sure the checks
happened. Btree does it, so why not GIST?
> Also we need syntax on CREATE INDEX to say what you want. The SQL
> standard doesn't offer any help here. It gives syntax for WITHOUT
> OVERLAPS in primary key and unique constraints, but it says nothing
> about WITHOUT OVERLAPS on indexes.
>
> One option is to add WITHOUT OVERLAPS syntax to the index column list.
> But those are full expressions (unlike with constraints), and you can
> give optional opclasses and collations. There might be parser
> conflicts.
I don't think there would be a parser conflict there: the only
production it might conflict with is the opclass' name, but that's
either a single word or a quoted identifier and followed by a
parenthesized list (or the end of the index attribute syntax); whilst
WITHOUT OVERLAPS is always exactly two unquoted words. I don't think
there is a way to confuse these two; at least not yet.
> Maybe we should use the existing opclass_parameter syntax instead.
I don't like that. Opclasses shouldn't have to deal with determining
whether WITHOUT OVERLAPS is applied, they only should provide the
operator used for those checks and/or determining whether the value is
considered "empty" i.e. whether the value can overlap with another
(same) value. Both of which should be unique (no alternative
implementations) for any given opclass.
> These get stored in pg_attribute.attoptions (for the index rel). That
> seems safer and more extensible. It avoids future conflicts with the
> standard.
I'd rather have this added to pg_index as either a bit in e.g.
indoption (the new bit indicating the attribute should use WITHOUT
OVERLAPS), or as new array indicating the operators used for checking
uniqueness.
I don't think this issue warrants new additions to pg_attribute, which
is already a very wide structure.
> (Just the opclass *name* is not sufficient, since the point is you
> have to know to use overlaps semantics, not equality, and forbid empty
> values.)
Shouldn't that be implemented by the AM and delegated to the opclass
with a single support function call? I don't see why we need to
determine at runtime which operator is responsible for exclude
constraints -- do you have an example where a single opclass (not:
opfamily) needs to support multiple WITHOUT OVERLAPS -backing
operators?
> Once we have this syntax, pg_get_indexdef can use it. Likewise we
> could allow USING INDEX for a WITHOUT OVERLAPS constraint (as long as
> the index was enforcing temporal semantics).
>
> Whatever we do here, we should also consider how it affects
> pg_indexam_has_property, pg_index_has_property, and
> pg_index_column_has_property. What does the false mean from
> `pg_indexam_has_property((select oid from pg_am where amname =
> 'gist'), 'can_unique')`? We really want a "maybe" or "sometimes"
> answer. Should we return NULL and support 'can_unique' in the other
> more-specific columns?
Why? Btree has can_unique, but that doesn't mean every index is a
unique index. Surely we can add an IndexAM callback that validates
that a given index is actually something that this index AM can
support; or make ambuild/ambuildempty raise if it can't actually
construct that empty index with the given index definition.
> Note that GiST theoretically can support unique indexes,[3] but we
> would need a way to know which operator (or strategy number) the
> opclass wanted for equality.
That's only a support function away, though. AFAIK, there is only one
natural operator for equality for any opclass, and only one natural
WITHOUT OVERLAPS behaviour. If the user wants different equality or
WITHOUT OVERLAPS behaviour, then they can use a different opclass. See
e.g. btree's text_ops vs text_pattern_ops.
> I don't think there are any other
> blockers. But we've solved that now with the CompareType enum.[4] So
> that gives us a path to make GiST be unequivocally true for
> can_unique. Then the has_property functions don't need big changes.
> And it would solve the misleading error message.
>
> You still couldn't pre-create the index *concurrently*, but all the
> other pieces would be in place for the non-locking workflow.
>
> So here is a proposed sequence of work:
>
> - Add an opclass_parameter so you can say without_overlaps = true.
> Only the last column of the index allows that (at least for now), and
> there must be an overlaps operator.
I've argued this before, but any passing of knowledge about which
attribute of the index is being compared/operated on _in the opclass'
code_ should be considered a layering violation. So how would an
opclass know that it was given the final key attribute of the index
and error out if it wasn't?
> - If an index has that property, enforce the exclusion constraint
> rules and forbid empty ranges/multiranges.
How do you propose to pass this information on to the index AM?
> - Update pg_get_indexdef to output the right syntax to create an
> independent temporal index.
> - Allow USING INDEX for WITHOUT OVERLAPS. It must be a temporal index.
What's a "temporal index" in this context? Do you mean "unique index,
with some attributes using the more strict WITHOUT OVERLAPS
behaviour"?
> - Support can_unique in GiST indexes using CompareType. (Perhaps this
> is of low practical value, but the discrepancies bug me.)
What do you mean by this? Do you have a reference to discussions that
touched this topic?
> - Add can_temporal to index AMs. Make GiST report true. Check this
> property instead of hard-coding GiST in our WITHOUT OVERLAPS code.
+1; though IMO that should be implemented regardless of earlier operations.
> - Support reindexing these indexes concurrently. (I'm not signing up
> for this just yet, but maybe someday.)
> - Support creating these indexes concurrently. (I'm not signing up for
> this just yet, but maybe someday.)
+1
Kind regards,
Matthias van de Meent
Databricks (https://www.databricks.com)
pgsql-bugs by date: