Re: BUG #18568: BUG: Result wrong when do group by on partition table! - Mailing list pgsql-bugs

From Tender Wang
Subject Re: BUG #18568: BUG: Result wrong when do group by on partition table!
Date
Msg-id CAHewXNnyWUEmdHrRK3yg4k2TzSbb5WnkKLWxyO+OVZPhPFX7ew@mail.gmail.com
Whole thread Raw
In response to Re: BUG #18568: BUG: Result wrong when do group by on partition table!  (Amit Langote <amitlangote09@gmail.com>)
Responses Re: BUG #18568: BUG: Result wrong when do group by on partition table!
List pgsql-bugs


Amit Langote <amitlangote09@gmail.com> 于2024年10月22日周二 17:25写道:
On Tue, Oct 22, 2024 at 5:30 PM Tender Wang <tndrwang@gmail.com> wrote:
> Amit Langote <amitlangote09@gmail.com> 于2024年10月22日周二 15:33写道:
>> Not really.  As the documentation says, collation can be specified per
>> column or per operation:
>>
>> https://www.postgresql.org/docs/current/collation.html
>>
>> In this case, the operation is partitioning.  When you specify the
>> COLLATE clause for a partition key, it means that the partitioning
>> logic, such as partition tuple routing, will use that collation
>> instead of the column-specified or the column type's collation.
>
>
> Since you said partition key had its own collation, and but we used column type's collation in
>  set_baserel_partition_key_exprs() as below:
>
> partexpr = (Expr *) makeVar(varno, attno,
> partkey->parttypid[cnt],
> partkey->parttypmod[cnt],
> partkey->parttypcoll[cnt], 0);
>
> I think why not we directly use the partition key collation(e.g.  partcollation).

That's a good question but I don't immediately know the answer.

It seems like it has been like this since the beginning or since the
commit that added the RelOptInfo.partexprs field (9140cf8269).

Hi Amit, 

I have another case that can confirm what I suspect.  I now think in set_baserel_partition_key_exprs(),
we should use partkey->partcollation[cnt] instead of partkey->parttypcoll[cnt].
I find a wrong result when enable partitionwise_join, as below:

postgres=# set enable_partitionwise_join = on;
SET
postgres=# set max_parallel_workers_per_gather = 0;
SET
postgres=# explain select count(*) from pagg_tab t1 join pagg_tab t2 on t1.c = t2.c;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Aggregate  (cost=17010.00..17010.01 rows=1 width=8)
   ->  Append  (cost=16.50..14760.00 rows=900000 width=0)
         ->  Hash Join  (cost=16.50..2052.00 rows=180000 width=0)
               Hash Cond: (t1_1.c = t2_1.c)
               ->  Seq Scan on pagg_tab_p2 t1_1  (cost=0.00..9.00 rows=600 width=2)
               ->  Hash  (cost=9.00..9.00 rows=600 width=2)
                     ->  Seq Scan on pagg_tab_p2 t2_1  (cost=0.00..9.00 rows=600 width=2)
         ->  Hash Join  (cost=33.00..4104.00 rows=360000 width=0)
               Hash Cond: (t1_2.c = t2_2.c)
               ->  Seq Scan on pagg_tab_p3 t1_2  (cost=0.00..18.00 rows=1200 width=2)
               ->  Hash  (cost=18.00..18.00 rows=1200 width=2)
                     ->  Seq Scan on pagg_tab_p3 t2_2  (cost=0.00..18.00 rows=1200 width=2)
         ->  Hash Join  (cost=33.00..4104.00 rows=360000 width=0)
               Hash Cond: (t1_3.c = t2_3.c)
               ->  Seq Scan on pagg_tab_p1 t1_3  (cost=0.00..18.00 rows=1200 width=2)
               ->  Hash  (cost=18.00..18.00 rows=1200 width=2)
                     ->  Seq Scan on pagg_tab_p1 t2_3  (cost=0.00..18.00 rows=1200 width=2)
(17 rows)

postgres=#  select count(*) from pagg_tab t1 join pagg_tab t2 on t1.c = t2.c;
 count
--------
 900000
(1 row)

postgres=# set enable_partitionwise_join = off;
SET
postgres=# explain select count(*) from pagg_tab t1 join pagg_tab t2 on t1.c = t2.c;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Aggregate  (cost=1269.02..1269.03 rows=1 width=8)
   ->  Merge Join  (cost=466.52..1156.52 rows=45000 width=0)
         Merge Cond: (t1.c = t2.c)
         ->  Sort  (cost=233.26..240.76 rows=3000 width=2)
               Sort Key: t1.c COLLATE case_insensitive
               ->  Append  (cost=0.00..60.00 rows=3000 width=2)
                     ->  Seq Scan on pagg_tab_p2 t1_1  (cost=0.00..9.00 rows=600 width=2)
                     ->  Seq Scan on pagg_tab_p3 t1_2  (cost=0.00..18.00 rows=1200 width=2)
                     ->  Seq Scan on pagg_tab_p1 t1_3  (cost=0.00..18.00 rows=1200 width=2)
         ->  Sort  (cost=233.26..240.76 rows=3000 width=2)
               Sort Key: t2.c COLLATE case_insensitive
               ->  Append  (cost=0.00..60.00 rows=3000 width=2)
                     ->  Seq Scan on pagg_tab_p2 t2_1  (cost=0.00..9.00 rows=600 width=2)
                     ->  Seq Scan on pagg_tab_p3 t2_2  (cost=0.00..18.00 rows=1200 width=2)
                     ->  Seq Scan on pagg_tab_p1 t2_3  (cost=0.00..18.00 rows=1200 width=2)
(15 rows)

postgres=#  select count(*) from pagg_tab t1 join pagg_tab t2 on t1.c = t2.c;
  count
---------
 1800000
(1 row)

I gave a try that use partkey->partcollation[cnt] then the query result is correct when enable_partitionwise_join is on.

And I found some codes only use equal not check the collation, for example, in match_expr_to_partition_keys():
if (equal(lfirst(lc), expr))
lfirst(lc) is partkeys.
I thinks other codes assume that the collation of partexprs in RelOptInfo is same with pg_partitioned_table. But it is not.

--
Thanks,
Tender Wang

pgsql-bugs by date:

Previous
From: Amit Langote
Date:
Subject: Re: BUG #18568: BUG: Result wrong when do group by on partition table!
Next
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: BUG #18644: ALTER PUBLICATION ... SET (publish_via_partition_root) wrong/undocumented behavior.