Re: Explicit deterministic COLLATE fails with pattern matching operations on column with non-deterministic collation - Mailing list pgsql-bugs
From | Tom Lane |
---|---|
Subject | Re: Explicit deterministic COLLATE fails with pattern matching operations on column with non-deterministic collation |
Date | |
Msg-id | 5075.1590625286@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Explicit deterministic COLLATE fails with pattern matchingoperations on column with non-deterministic collation (James Lucas <jlucasdba@gmail.com>) |
Responses |
Re: Explicit deterministic COLLATE fails with pattern matchingoperations on column with non-deterministic collation
Re: Explicit deterministic COLLATE fails with pattern matching operations on column with non-deterministic collation |
List | pgsql-bugs |
James Lucas <jlucasdba@gmail.com> writes: > explain select * from ctestnd where t like 'a%' collate "C"; > ERROR: nondeterministic collations are not supported for LIKE Yeah. I traced through this, and the place where it's failing is where the planner tries to apply the LIKE operator to the stored MCV values (to see how many of them pass the condition, which gives us a big clue about the selectivity). Unfortunately, per the comments in selfuncs.c, * For both oprrest and oprjoin functions, the operator's input collation OID * (if any) is passed using the standard fmgr mechanism, so that the estimator * function can fetch it with PG_GET_COLLATION(). Note, however, that all * statistics in pg_statistic are currently built using the relevant column's * collation. Thus, in most cases where we are looking at statistics, we * should ignore the operator collation and use the stats entry's collation. * We expect that the error induced by doing this is usually not large enough * to justify complicating matters. In any case, doing otherwise would yield * entirely garbage results for ordered stats data such as histograms. mcv_selectivity is following this advice and applying LIKE with the ctestnd.t column's declared collation ... and then the operator throws an error. The idea that using the "wrong" collation might actually cause an error was not factored into this design, obviously. I'm not sure offhand what to do about it. If we go over to using the query's collation then we avoid that issue, but instead we have the problem noted in this comment about the histogram sort order not matching what the operator expects. (In the case of mcv_selectivity the sort order isn't really an issue, but it is an issue for sibling functions such as ineq_histogram_selectivity.) This issue only dates back to commit 5e0928005; before that, we just blindly passed DEFAULT_COLLATION_OID to operators being evaluated for estimation purposes. (I suppose if you made the database's default collation nondeterministic, you could still get into trouble; but that case may not be reachable right now.) On the other hand, the actual breakage is even newer, because nondeterministic collations weren't added until 5e1963fb7, several months later. Both of those are v12 cycle, so it's academic from a user's standpoint which one we blame; but the upshot is that this case doesn't work. Ideally, no operator would ever throw an error about unsupported collations, but I suppose that day is far away. I guess the path of least resistance is to change the selectivity functions to use the query's collation; then, if you get an error here you would have done so at runtime anyway. The problem of inconsistency with the histogram collation will be real for ineq_histogram_selectivity; but we had a variant of that before, in that always using DEFAULT_COLLATION_OID would give answers that were wrong for a query using a different collation. Peter, any other thoughts? regards, tom lane
pgsql-bugs by date: