examine_variable ignored CollateExpr - Mailing list pgsql-hackers

From jian he
Subject examine_variable ignored CollateExpr
Date
Msg-id CACJufxGLCiyhM+P0gxesg2x--PTrMY3PszqSqOq_H4QS_oq3Jg@mail.gmail.com
Whole thread Raw
List pgsql-hackers
hi.

ComputeIndexAttrs:
src/backend/commands/indexcmds.c, line 1995:
            /*
             * Strip any top-level COLLATE clause.  This ensures that we treat
             * "x COLLATE y" and "(x COLLATE y)" alike.
             */
            while (IsA(expr, CollateExpr))
                expr = (Node *) ((CollateExpr *) expr)->arg;

That means in function examine_variable:
``foreach(ilist, onerel->indexlist)``
we don't need to worry about RelabelType transformed from COLLATE clause.

CreateStatistics does not have special handling for CollateExpr. Query like:
CREATE STATISTICS t_stx1 ON (a collate "C" collate "C" collate "C"
collate "POSIX") FROM t;
the stored statistics expression node will be a nested CollateExpr.

For the above example,
get_relation_info->get_relation_statistics->eval_const_expressions will
transform the nested CollateExpr into one RelabelType node if column "a"
collation is not POSIX.

RelabelType can be from the CollateExpr transformation, as mentioned above.  For
statistics, say ndistinct, ``GROUP BY A COLLATE "C"`` can have different results
from ``GROUP BY A COLLATE case_insensitive``.

Therefore in examine_variable, we can not just simple use:
``
       while (IsA(basenode, RelabelType))
               basenode = (Node *) ((RelabelType *) basenode)->arg;
``

demo:
CREATE TABLE test_stats_ext_coll (a text, b text, c int);
INSERT INTO test_stats_ext_coll SELECT chr(65 + g % 52) FROM
generate_series(1, 500) g;
CREATE STATISTICS test_stats_ext_coll_stx ON (a COLLATE
case_insensitive) FROM test_stats_ext_coll;
ANALYZE test_stats_ext_coll;

SELECT * FROM check_estimated_rows ($$SELECT count(*) FROM
test_stats_ext_coll GROUP BY a COLLATE case_insensitive $$);

with HEAD:
 estimated | actual
-----------+--------
        52 |     32

with attached PATCH:
 estimated | actual
-----------+--------
        32 |     32

function check_estimated_rows is in src/test/regress/sql/stats_ext.sql.

While working on this, I wondered whether we could move check_estimated_rows
from stats_ext.sql to test_setup.sql so that it can be reused more broadly.



--
jian
https://www.enterprisedb.com/

Attachment

pgsql-hackers by date:

Previous
From: Steven Niu
Date:
Subject: Re: str_casefold: fix typo in error message
Next
From: David Rowley
Date:
Subject: Re: Patch: dumping tables data in multiple chunks in pg_dump