Home > mailing lists

Inaccurate description of UNION/CASE/etc type selection - Mailing list pgsql-docs

From	Tom Lane
Subject	Inaccurate description of UNION/CASE/etc type selection
Date	August 16, 2020 21:26:40
Msg-id	1019930.1597613200@sss.pgh.pa.us Whole thread Raw
Responses	Re: Inaccurate description of UNION/CASE/etc type selection
List	pgsql-docs

Tree view

We had a question about why an ARRAY[] construct was resolving the
array's type strangely [1].  The documentation about this [2] says
that the relevant resolution rules are:

  5. Choose the first non-unknown input type which is a preferred type in
  that category, if there is one.

  6. Otherwise, choose the last non-unknown input type that allows all the
  preceding non-unknown inputs to be implicitly converted to it.  (There
  always is such a type, since at least the first type in the list must
  satisfy this condition.)

But what select_common_type() actually does is:

            else if (!pispreferred &&
                     can_coerce_type(1, &ptype, &ntype, COERCION_IMPLICIT) &&
                     !can_coerce_type(1, &ntype, &ptype, COERCION_IMPLICIT))
            {
                /*
                 * take new type if can coerce to it implicitly but not the
                 * other way; but if we have a preferred type, stay on it.
                 */
                pexpr = nexpr;
                ptype = ntype;
                pcategory = ncategory;
                pispreferred = nispreferred;
            }

(ptype is the currently selected common type, ntype is the next
input type to consider, and we've already eliminated cases involving
UNKNOWN.)

In the reported case, we have ptype = "name", ntype = "text", and there
are implicit coercions in both directions so we stay with "name" even
though it's not preferred.

So, the step-5 claim that we always choose a preferred type over other
types is just wrong.  Step 6 is much short of truthful as well, since
it fails to describe the check about coercion in the other direction.
(Also, we're not really checking that *every* earlier argument can be
promoted to ntype, only the currently best one.  Typically, if there's
an implicit coercion from A to B and also one from B to C, there'd be
one from A to C too; but there are lots of counterexamples.)

Now, this code is old enough to vote, so I think changing its behavior
is probably a really bad idea.  I did experiment with giving preferred
types fractionally more preference, like this:

            else if (can_coerce_type(1, &ptype, &ntype, COERCION_IMPLICIT) &&
                     (nispreferred > pispreferred ||
                      (!pispreferred &&
                       !can_coerce_type(1, &ntype, &ptype, COERCION_IMPLICIT))))

but this broke a couple of regression test cases, so I'm sure it'd
break real-world queries too.  So I think we need to leave the code
alone and fix the docs to describe it more accurately.

However, I'm having a hard time coming up with wording that describes
this accurately without being a verbatim statement of the algorithm.
(I see that I already made one attempt at improving the description,
back in 07daff63c, but it's clearly still not good enough.)

Any ideas?

            regards, tom lane

[1] https://www.postgresql.org/message-id/CAOwYNKYfKPfAL4rgP0AO_w0Mn7h8yiXd_Qi9swPdAc4CAUXeAQ%40mail.gmail.com
[2] https://www.postgresql.org/docs/current/typeconv-union-case.html

pgsql-docs by date:

From: Tom Lane
Date: 15 August 2020, 16:38:47
Subject: Re: PostgreSQL 12: Cryptic documentation

From: "David G. Johnston"
Date: 17 August 2020, 00:05:19
Subject: Re: Inaccurate description of UNION/CASE/etc type selection

Inaccurate description of UNION/CASE/etc type selection - Mailing list pgsql-docs

Previous

Next