Thread: BUG #18594: CASE WHEN ELSE failing to return the expected output when the same colum is used in WHEN and ELSE

The following bug has been logged on the website:

Bug reference:      18594
Logged by:          Francisco Javier Ossandon
Email address:      fco.j.ossandon@gmail.com
PostgreSQL version: 15.5
Operating system:   Linux
Description:

Dear developers:
I have been using Postgres for some years now, and I just found what looks
like a bug, or at least I did not see anything in the documentation that
could explain it.

* version():
PostgreSQL 15.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712
(Red Hat 7.3.1-12), 64-bit

The main issue is that "CASE WHEN THEN ELSE" is giving an expected result in
what seems like simple logic. I have a series of WHENs with conditions over
the value of a column and an ELSE where I return the same column value if
the CASEs above don't apply, and the query always returns the value of the
ELSE condition like if the WHENs above would not exist. The following query
exemplifies this, where the first CASE column gives an unexpected result
when using the same column in the ELSE, while the other CASE columns provide
the expected result when removing the ELSE or using a different column in
the ELSE or a constant string:
```
SELECT oid,
       typname,
       typcategory,
       CASE
          WHEN typcategory = 'N' THEN 'Numeric types'
          WHEN typcategory = 'S' THEN 'String types'
          WHEN typcategory = 'E' THEN 'Enum types'
          ELSE typcategory
       END AS test_case_fails,
       CASE
          WHEN typcategory = 'N' THEN 'Numeric types'
          WHEN typcategory = 'S' THEN 'String types'
          WHEN typcategory = 'E' THEN 'Enum types'
       END AS test_case_works_no_else,
       CASE
          WHEN typcategory = 'N' THEN 'Numeric types'
          WHEN typcategory = 'S' THEN 'String types'
          WHEN typcategory = 'E' THEN 'Enum types'
          ELSE typname
       END AS test_case_works_other_col,
       CASE
          WHEN typcategory = 'N' THEN 'Numeric types'
          WHEN typcategory = 'S' THEN 'String types'
          WHEN typcategory = 'E' THEN 'Enum types'
          ELSE 'ELSE'
       END AS test_case_works_constant
FROM pg_type
ORDER BY oid
LIMIT 10
;
```
This returns the following:
```
oid|typname
|typcategory|test_case_fails|test_case_works_no_else|test_case_works_other_col|test_case_works_constant|
---+----------+-----------+---------------+-----------------------+-------------------------+------------------------+
 16|bool      |B          |B              |                       |bool
               |ELSE                    |
 17|bytea     |U          |U              |                       |bytea
               |ELSE                    |
 18|char      |Z          |Z              |                       |char
               |ELSE                    |
 19|name      |S          |S              |String types           |String
types             |String types            |
 20|int8      |N          |N              |Numeric types          |Numeric
types            |Numeric types           |
 21|int2      |N          |N              |Numeric types          |Numeric
types            |Numeric types           |
 22|int2vector|A          |A              |
|int2vector               |ELSE                    |
 23|int4      |N          |N              |Numeric types          |Numeric
types            |Numeric types           |
 24|regproc   |N          |N              |Numeric types          |Numeric
types            |Numeric types           |
 25|text      |S          |S              |String types           |String
types             |String types            |
```
It looks weird, so I'm reporting this to you for review.

Best regards,


On 2024-08-28 We 12:17 PM, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference:      18594
> Logged by:          Francisco Javier Ossandon
> Email address:      fco.j.ossandon@gmail.com
> PostgreSQL version: 15.5
> Operating system:   Linux
> Description:
>
> Dear developers:
> I have been using Postgres for some years now, and I just found what looks
> like a bug, or at least I did not see anything in the documentation that
> could explain it.
>
> * version():
> PostgreSQL 15.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712
> (Red Hat 7.3.1-12), 64-bit
>
> The main issue is that "CASE WHEN THEN ELSE" is giving an expected result in
> what seems like simple logic. I have a series of WHENs with conditions over
> the value of a column and an ELSE where I return the same column value if
> the CASEs above don't apply, and the query always returns the value of the
> ELSE condition like if the WHENs above would not exist. The following query
> exemplifies this, where the first CASE column gives an unexpected result
> when using the same column in the ELSE, while the other CASE columns provide
> the expected result when removing the ELSE or using a different column in
> the ELSE or a constant string:
> ```
> SELECT oid,
>         typname,
>         typcategory,
>         CASE
>            WHEN typcategory = 'N' THEN 'Numeric types'
>            WHEN typcategory = 'S' THEN 'String types'
>            WHEN typcategory = 'E' THEN 'Enum types'
>            ELSE typcategory
>         END AS test_case_fails,
>         CASE
>            WHEN typcategory = 'N' THEN 'Numeric types'
>            WHEN typcategory = 'S' THEN 'String types'
>            WHEN typcategory = 'E' THEN 'Enum types'
>         END AS test_case_works_no_else,
>         CASE
>            WHEN typcategory = 'N' THEN 'Numeric types'
>            WHEN typcategory = 'S' THEN 'String types'
>            WHEN typcategory = 'E' THEN 'Enum types'
>            ELSE typname
>         END AS test_case_works_other_col,
>         CASE
>            WHEN typcategory = 'N' THEN 'Numeric types'
>            WHEN typcategory = 'S' THEN 'String types'
>            WHEN typcategory = 'E' THEN 'Enum types'
>            ELSE 'ELSE'
>         END AS test_case_works_constant
> FROM pg_type
> ORDER BY oid
> LIMIT 10
> ;
> ```
> This returns the following:
> ```
> oid|typname
> |typcategory|test_case_fails|test_case_works_no_else|test_case_works_other_col|test_case_works_constant|
>
---+----------+-----------+---------------+-----------------------+-------------------------+------------------------+
>   16|bool      |B          |B              |                       |bool
>                 |ELSE                    |
>   17|bytea     |U          |U              |                       |bytea
>                 |ELSE                    |
>   18|char      |Z          |Z              |                       |char
>                 |ELSE                    |
>   19|name      |S          |S              |String types           |String
> types             |String types            |
>   20|int8      |N          |N              |Numeric types          |Numeric
> types            |Numeric types           |
>   21|int2      |N          |N              |Numeric types          |Numeric
> types            |Numeric types           |
>   22|int2vector|A          |A              |
> |int2vector               |ELSE                    |
>   23|int4      |N          |N              |Numeric types          |Numeric
> types            |Numeric types           |
>   24|regproc   |N          |N              |Numeric types          |Numeric
> types            |Numeric types           |
>   25|text      |S          |S              |String types           |String
> types             |String types            |
> ```
> It looks weird, so I'm reporting this to you for review.
>

The bug is in the query, not in Postgres. Try casting typcategory to 
text in your failing case to see the difference. As it is the literals 
are being cast to "char" because that's what typcategory is.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Hello Andrew,
I thought about that before writing and tried casting to TEXT in the WHEN comparison bits but it still failed.
       CASE
          WHEN typcategory::TEXT = 'N' THEN 'Numeric types'
          WHEN typcategory::TEXT = 'S' THEN 'String types'
          WHEN typcategory::TEXT = 'E' THEN 'Enum types'
          ELSE typcategory
       END AS test_case_fails,

After your answer, I tried again but this time adding the casting to the ELSE too, and then it worked:
       CASE
          WHEN typcategory::TEXT = 'N' THEN 'Numeric types'
          WHEN typcategory::TEXT = 'S' THEN 'String types'
          WHEN typcategory::TEXT = 'E' THEN 'Enum types'
          ELSE typcategory::TEXT

So is the ELSE column hijacking the data type of the whole expression?

Cheers,

Francisco



El mié, 28 ago 2024 a las 13:57, Andrew Dunstan (<andrew@dunslane.net>) escribió:

On 2024-08-28 We 12:17 PM, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference:      18594
> Logged by:          Francisco Javier Ossandon
> Email address:      fco.j.ossandon@gmail.com
> PostgreSQL version: 15.5
> Operating system:   Linux
> Description:
>
> Dear developers:
> I have been using Postgres for some years now, and I just found what looks
> like a bug, or at least I did not see anything in the documentation that
> could explain it.
>
> * version():
> PostgreSQL 15.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712
> (Red Hat 7.3.1-12), 64-bit
>
> The main issue is that "CASE WHEN THEN ELSE" is giving an expected result in
> what seems like simple logic. I have a series of WHENs with conditions over
> the value of a column and an ELSE where I return the same column value if
> the CASEs above don't apply, and the query always returns the value of the
> ELSE condition like if the WHENs above would not exist. The following query
> exemplifies this, where the first CASE column gives an unexpected result
> when using the same column in the ELSE, while the other CASE columns provide
> the expected result when removing the ELSE or using a different column in
> the ELSE or a constant string:
> ```
> SELECT oid,
>         typname,
>         typcategory,
>         CASE
>            WHEN typcategory = 'N' THEN 'Numeric types'
>            WHEN typcategory = 'S' THEN 'String types'
>            WHEN typcategory = 'E' THEN 'Enum types'
>            ELSE typcategory
>         END AS test_case_fails,
>         CASE
>            WHEN typcategory = 'N' THEN 'Numeric types'
>            WHEN typcategory = 'S' THEN 'String types'
>            WHEN typcategory = 'E' THEN 'Enum types'
>         END AS test_case_works_no_else,
>         CASE
>            WHEN typcategory = 'N' THEN 'Numeric types'
>            WHEN typcategory = 'S' THEN 'String types'
>            WHEN typcategory = 'E' THEN 'Enum types'
>            ELSE typname
>         END AS test_case_works_other_col,
>         CASE
>            WHEN typcategory = 'N' THEN 'Numeric types'
>            WHEN typcategory = 'S' THEN 'String types'
>            WHEN typcategory = 'E' THEN 'Enum types'
>            ELSE 'ELSE'
>         END AS test_case_works_constant
> FROM pg_type
> ORDER BY oid
> LIMIT 10
> ;
> ```
> This returns the following:
> ```
> oid|typname
> |typcategory|test_case_fails|test_case_works_no_else|test_case_works_other_col|test_case_works_constant|
> ---+----------+-----------+---------------+-----------------------+-------------------------+------------------------+
>   16|bool      |B          |B              |                       |bool
>                 |ELSE                    |
>   17|bytea     |U          |U              |                       |bytea
>                 |ELSE                    |
>   18|char      |Z          |Z              |                       |char
>                 |ELSE                    |
>   19|name      |S          |S              |String types           |String
> types             |String types            |
>   20|int8      |N          |N              |Numeric types          |Numeric
> types            |Numeric types           |
>   21|int2      |N          |N              |Numeric types          |Numeric
> types            |Numeric types           |
>   22|int2vector|A          |A              |
> |int2vector               |ELSE                    |
>   23|int4      |N          |N              |Numeric types          |Numeric
> types            |Numeric types           |
>   24|regproc   |N          |N              |Numeric types          |Numeric
> types            |Numeric types           |
>   25|text      |S          |S              |String types           |String
> types             |String types            |
> ```
> It looks weird, so I'm reporting this to you for review.
>

The bug is in the query, not in Postgres. Try casting typcategory to
text in your failing case to see the difference. As it is the literals
are being cast to "char" because that's what typcategory is.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com



--
Francisco J. Ossandon
Bioinformatician
Ph.D. in Biotechnology
=?UTF-8?Q?Francisco_J=2E_Ossand=C3=B3n?= <fco.j.ossandon@gmail.com> writes:
> So is the ELSE column hijacking the data type of the whole expression?

It's the only CASE result that is supplying a definite type at all.
But see

https://www.postgresql.org/docs/current/typeconv-union-case.html

particularly the footnote to the bit about "Select the first
non-unknown input type as the candidate type, then consider
each other non-unknown input type, left to right."

The WHEN clauses have exactly nothing to do with the result type
of the CASE: it's the THEN and ELSE clauses that supply the result.

            regards, tom lane



Hello Tom,
Thanks for the explanation and the link to the documentation. I understand now what happened.
So it was a mistake on my side.
Thanks again for the patience and replies.

Best regards,

Francisco


El mié, 28 ago 2024 a las 20:11, Tom Lane (<tgl@sss.pgh.pa.us>) escribió:
Francisco J. Ossandón <fco.j.ossandon@gmail.com> writes:
> So is the ELSE column hijacking the data type of the whole expression?

It's the only CASE result that is supplying a definite type at all.
But see

https://www.postgresql.org/docs/current/typeconv-union-case.html

particularly the footnote to the bit about "Select the first
non-unknown input type as the candidate type, then consider
each other non-unknown input type, left to right."

The WHEN clauses have exactly nothing to do with the result type
of the CASE: it's the THEN and ELSE clauses that supply the result.

                        regards, tom lane


--
Francisco J. Ossandon
Bioinformatician
Ph.D. in Biotechnology
On Wed, Jan 22, 2025 at 3:00 AM Chris BSomething <xpusostomos@gmail.com> wrote:
In the 6 rules for type resolution, rule 5 says "select the first non-unknown input type as candidate, and consider if other non-unknown types can be implicitly converted." ... should there not be a new rule, something like considering if the unknown types can be converted, then notice that a long string can't be converted to a char, because it won't fit? And then fail with an interesting error?

Do you have an example demonstrating this new rule?  I'm doubting this resolution logic goes into that much depth regarding the semantics of each type.  It is implied that even if an implicit cast exists from type A to type B that actual values of type A may fail during the casting process to make them type B.  That would be the interesting error you'd end up seeing - but from the type casting/input system, not the type resolution one.

David J.



On Tue, Feb 18, 2025, 6:31 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Wed, Jan 22, 2025 at 3:00 AM Chris BSomething <xpusostomos@gmail.com> wrote:
In the 6 rules for type resolution, rule 5 says "select the first non-unknown input type as candidate, and consider if other non-unknown types can be implicitly converted." ... should there not be a new rule, something like considering if the unknown types can be converted, then notice that a long string can't be converted to a char, because it won't fit? And then fail with an interesting error?

Do you have an example demonstrating this new rule?  I'm doubting this resolution logic goes into that much depth regarding the semantics of each type.  It is implied that even if an implicit cast exists from type A to type B that actual values of type A may fail during the casting process to make them type B.  That would be the interesting error you'd end up seeing - but from the type casting/input system, not the type resolution one.

David J.


That sounds like a lot of words to justify an error that could be caught at parse time, that is actually never caught at all.  It seems to me that whatever type is inferred from a long constant string, it won't fit into a char... At least not without casting it. And therefore an error should come out.






On Mon, Feb 17, 2025 at 3:49 PM Chris BSomething <xpusostomos@gmail.com> wrote:
That sounds like a lot of words to justify an error that could be caught at parse time, that is actually never caught at all.  It seems to me that whatever type is inferred from a long constant string, it won't fit into a char... At least not without casting it. And therefore an error should come out.

I thought you were trying to convince people to modify the documentation?  This reads more like wanting to hack the parser to teach it forbidden knowledge.

In any case I think this email report is closed and burying any discussion about this material here just makes it harder for people to find.

David J.



On Tue, Feb 18, 2025, 7:13 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Mon, Feb 17, 2025 at 3:49 PM Chris BSomething <xpusostomos@gmail.com> wrote:
That sounds like a lot of words to justify an error that could be caught at parse time, that is actually never caught at all.  It seems to me that whatever type is inferred from a long constant string, it won't fit into a char... At least not without casting it. And therefore an error should come out.

I thought you were trying to convince people to modify the documentation?  This reads more like wanting to hack the parser to teach it forbidden knowledge.

Hack? Forbidden knowledge? What an odd comment.




"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Wed, Jan 22, 2025 at 3:00 AM Chris BSomething <xpusostomos@gmail.com>
> wrote:
>> ... should there not be a new rule,
>> something like considering if the unknown types can be converted, then
>> notice that a long string can't be converted to a char, because it won't
>> fit? And then fail with an interesting error?

> Do you have an example demonstrating this new rule?  I'm doubting this
> resolution logic goes into that much depth regarding the semantics of each
> type.  It is implied that even if an implicit cast exists from type A to
> type B that actual values of type A may fail during the casting process to
> make them type B.  That would be the interesting error you'd end up seeing
> - but from the type casting/input system, not the type resolution one.

Yeah.  We document somewhere that the contents of a literal string are
*not* considered while deciding what type it should be taken as; thus,
the possibility of a casting failure is not part of the rules.

I think such a rule would be pretty dangerous, even if we could
implement it easily.  As an example, if you entered something you
meant as a timestamp, but you fat-fingered the punctuation or
something, it'd likely fall back to being considered just "text",
potentially silently changing the semantics of the expression.  We had
a lot of trouble with that sort of behavior back when the system had a
lot of implicit casts to text.  We got rid of (most of) those in 8.3
or thereabouts, and it made the semantics generally a lot safer.

            regards, tom lane




On Tue, 18 Feb 2025 at 10:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Yeah.  We document somewhere that the contents of a literal string are
*not* considered while deciding what type it should be taken as; thus,
the possibility of a casting failure is not part of the rules.

The length of a literal is not strictly speaking its "contents". One could for example, have its type as varchar(x) where x is its length. And then not have implicit typecasts from varchar(x) to varchar( < x).
 
I think such a rule would be pretty dangerous, even if we could
implement it easily.  As an example, if you entered something you
meant as a timestamp, but you fat-fingered the punctuation or
something, it'd likely fall back to being considered just "text",

I don't get that, I thought timestamps have to be preceded by "TIMESTAMP".

 potentially silently changing the semantics of the expression.  We had
a lot of trouble with that sort of behavior back when the system had a
lot of implicit casts to text.  We got rid of (most of) those in 8.3
or thereabouts, and it made the semantics generally a lot safer.

So... you solved that, it sounds like.

And yet this super dangerous behavior which could sit dormant and undiscovered in a view in a large computer system, doesn't bother anyone? OK, whatever.

Chris





 
On Wednesday, February 19, 2025, Chris BSomething <xpusostomos@gmail.com> wrote:

And yet this super dangerous behavior which could sit dormant and undiscovered in a view in a large computer system, doesn't bother anyone? OK, whatever.


 There a quite a few of those in SQL…testing queries is a requirement.  And being explicit with types is strongly suggested.

To be clear, the complaint is that we allow input functions such as varchar(n) and char to truncate their inputs as opposed to erroring out if the input length is too long. Right?  If not I’m still confused as to what exact behavior you believe is dangerous.

David J.

On 19/02/2025 13:50, Chris BSomething wrote:
>
> On Tue, 18 Feb 2025 at 10:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>     I think such a rule would be pretty dangerous, even if we could
>     implement it easily.  As an example, if you entered something you
>     meant as a timestamp, but you fat-fingered the punctuation or
>     something, it'd likely fall back to being considered just "text",
>
>
> I don't get that, I thought timestamps have to be preceded by "TIMESTAMP".


The SQL standard requires you to do that, but PostgreSQL is a bit more 
lax about it (and in my humble opinion, thankfully so). If you did 
prefix your literals with their type, you wouldn't be having these 
problems in the first place. The problem stems from postgres treating 
your unmarked literal as "unknown" and then trying to figure out what 
the heck it is.

-- 

Vik Fearing