Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} |
Date | |
Msg-id | CAM3SWZQCN6JZ_OKi1GJr8dR_J+BaDiW8xkm235Ww3oaqxhmj1A@mail.gmail.com Whole thread Raw |
In response to | Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} (Peter Geoghegan <pg@heroku.com>) |
Responses |
Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
|
List | pgsql-hackers |
On Fri, Dec 19, 2014 at 5:32 PM, Peter Geoghegan <pg@heroku.com> wrote: >> Most people would list the columns, but if there is a really bizarre >> constraint, with non-default opclasses, or an exclusion constraint, it's >> probably been given a name that you could use. > > What I find curious about the opclass thing is: when do you ever have > an opclass that has a different idea of equality than the default > opclass for the type? In other words, when is B-Tree strategy number 3 > not actually '=' in practice, for *any* B-Tree opclass? Certainly, it > doesn't appear to be the case that it isn't so with any shipped > opclasses - the shipped non-default B-Tree opclasses only serve to > provide alternative notions of sort order, and never "equals". > > I think that with B-Tree (which is particularly relevant for the > UPDATE variant), it ought to be defined to work with the type's > default opclass "equals" operator, just like GROUP BY and DISTINCT. > Non-default opclass unique indexes work just as well in practice, > unless someone somewhere happens to create an oddball one that doesn't > use '=' as its "equals" operator (while also having '=' as the default > opclass "equals" operator). I am not aware that that leaves any > actually shipped opclass out (and I include our external extension > ecosystem here, although I might be wrong about that part). So looking at the way the system deals with its dependence on default operator classes, I have a hard time justifying all this extra overhead for the common case. The optimizer will refuse to use an index with a non-default opclass even when AFAICT there is no *real* semantic dependence on anything other than the "equals" operator, which seems to always match across a type's opclasses anyway. e.g., DISTINCT will only use a non-default opclass B-Tree index, even though in practice the "equals" operator always matches for shipped non-default opclasses; DISTINCT will not work with a text_pattern_ops index, while it will work with a default text B-Tree opclass index, *even though no corresponding "ORDER BY" was given*. Someone recently pointed out in a dedicated thread that the system isn't all that bright about exploiting the fact that group aggregates don't necessarily need to care about facets of sort ordering like collations, which have additional overhead [1]. That might be a useful special case to target (to make underlying sorts faster), but the big picture is that the system doesn't know when it only needs to care about an "equals" operator matching some particular B-Tree-opclass-defined notion of sorting, rather than caring about a variety of operators matching. Sometimes, having a matching "equals" operator of some non-default opclass is good enough to make an index (or sort scheme) of that opclass usable for some purpose that only involves equality, and not sort order (like DISTINCT, with no ORDER BY, executed using a GroupAggregate, for example). I thought we should formalize the idea that a non-default opclass must have the same notion of equality (the same "equals" operator) as its corresponding default opclass, if any. That way, presumably the optimizer has license to be clever about only caring about "DISTINCTness"/equality. That also gives my implementation license to not care about which operator class a unique index uses -- it must not matter. Heikki pointed out that there is one shipped opclass that has an "equals" operator that happens to not be spelt "=" [2] (and furthermore, does not match that of the default opclass). That's the record_image_ops opclass, which unusually has an "equals" operator of "*=". So as Heikki pointed out, it looks like there is some limited precedent for having to worry about B-Tree opclasses that introduce alternative notions of "equals", rather than merely alternative notions of sort order. So so much for formalizing that all of a type's B-Tree opclass "equals" operators must match... ...having thought about it for a while more, though, I think we should *still* ignore opclass for the purposes of unique index inference. The implementation doesn't care about the fact that you used a non-default opclass. Sure, in theory that could lead to inconsistencies, if there was multiple unique indexes of multiple opclasses that just so happened to have incompatible ideas about equality, but that seems ludicrous...we have only one extremely narrow example of how that could happen. Plus there'd have to be *both* unique indexes defined and available for us to infer as appropriate, before the inference logic could accidentally infer the wrong idea of equality. That seems like an extremely implausible scenario. Even if we allow for the idea that alternative notions of equality are something that will happen in the wild, obviously the user cares about the definition of equality that they actually used for the unique index in question. We can document that unique index inference doesn't care about opclasses (recall that I still only plan on letting users infer a B-Tree unique index), which is thought to almost certainly not matter. I think that ought to be fine. In the next revision of UPSERT, the implementation formally won't care about the opclass of an index when inferring a unique index to use as an arbiter of whether to take the alternative IGNORE/UPDATE path. That's formally left undefined. As already discussed before, I will still proceed with allowing the user to pick a partial unique index when writing a unique index inference specification. [1] http://www.postgresql.org/message-id/CAFjtmHU3Obf5aSpWY7i18diapvjg-418hYySdqUuYhXZtjChhg@mail.gmail.com [2] http://www.postgresql.org/message-id/54988BF5.9000405@vmware.com -- Peter Geoghegan
pgsql-hackers by date: