BUG #17084: Wrong results of distinct node. - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #17084: Wrong results of distinct node. |
Date | |
Msg-id | 17084-9f83b67ab464cc69@postgresql.org Whole thread Raw |
Responses |
Re: BUG #17084: Wrong results of distinct node.
Re: BUG #17084: Wrong results of distinct node. |
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 17084 Logged by: Maxim Boguk Email address: maxim.boguk@gmail.com PostgreSQL version: 13.3 Operating system: Linux Description: During investigation of weird app behaviour I found very strange results with sorting/dising of some dataset. Issue was reproduced on 12.7 server and 13.3 my local laptop. Dataset table dump (25MB file) available per request. QUERY: WITH t1 AS ( SELECT ad_id, advertiser_id, campaign_id, campaign_type_id, ad_group_id, target_id, device_type_id, country_code, block_format_type_id, category_group_id, array_agg(geo_id) AS geo_ids FROM test GROUP BY ad_id, advertiser_id, campaign_id, campaign_type_id, ad_group_id, target_id, device_type_id, country_code, block_format_type_id, category_group_id ), t2 AS ( SELECT DISTINCT ad_group_id, country_code, device_type_id, target_id, block_format_type_id, category_group_id, geo_ids FROM t1 ) SELECT COUNT(*) FROM t2; Test: set jit to off; set max_parallel_workers_per_gather to 0; set work_mem to 4MB; count ------- 83921 set work_mem to '64MB'; count ------- 55634 Plan with 4MB work_mem: Aggregate (cost=3258730.52..3258730.53 rows=1 width=8) (actual time=16607.679..16607.680 rows=1 loops=1) -> HashAggregate (cost=3232649.49..3257461.37 rows=101532 width=67) (actual time=16126.315..16604.394 rows=83700 loops=1) Group Key: test.ad_group_id, test.country_code, test.device_type_id, test.target_id, test.block_format_type_id, test.category_group_id, array_agg(test.geo_id) Planned Partitions: 8 Batches: 77 Memory Usage: 4313kB Disk Usage: 139784kB -> GroupAggregate (cost=2771948.04..3089235.54 rows=1015320 width=99) (actual time=12097.536..15675.807 rows=705836 loops=1) Group Key: test.ad_id, test.advertiser_id, test.campaign_id, test.campaign_type_id, test.ad_group_id, test.target_id, test.device_type_id, test.country_code, test.block_format_type_id, test.category_group_id -> Sort (cost=2771948.04..2797331.04 rows=10153200 width=75) (actual time=12097.524..13354.731 rows=10153197 loops=1) Sort Key: test.ad_id, test.advertiser_id, test.campaign_id, test.campaign_type_id, test.ad_group_id, test.target_id, test.device_type_id, test.country_code, test.block_format_type_id, test.category_group_id Sort Method: external merge Disk: 894184kB -> Seq Scan on test (cost=0.00..236908.00 rows=10153200 width=75) (actual time=0.015..779.722 rows=10153197 loops=1) Planning Time: 0.344 ms Execution Time: 16730.304 ms QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=2051062.74..2051062.75 rows=1 width=8) (actual time=8404.880..8404.882 rows=1 loops=1) -> HashAggregate (cost=2048778.27..2049793.59 rows=101532 width=67) (actual time=8390.952..8402.777 rows=55634 loops=1) Group Key: test.ad_group_id, test.country_code, test.device_type_id, test.target_id, test.block_format_type_id, test.category_group_id, array_agg(test.geo_id) Batches: 1 Memory Usage: 19473kB -> HashAggregate (cost=1750369.38..2020856.97 rows=1015320 width=99) (actual time=4478.394..7879.103 rows=705836 loops=1) Group Key: test.ad_id, test.advertiser_id, test.campaign_id, test.campaign_type_id, test.ad_group_id, test.target_id, test.device_type_id, test.country_code, test.block_format_type_id, test.category_group_id Planned Partitions: 256 Batches: 257 Memory Usage: 6553kB Disk Usage: 1102856kB -> Seq Scan on test (cost=0.00..236908.00 rows=10153200 width=75) (actual time=0.020..788.698 rows=10153197 loops=1) Planning Time: 0.301 ms Execution Time: 8538.236 ms Correnct answers provided with larger work_mem (55634) What make situatio even more curious that disable hash_agg doesn't make issue gone: set enable_hashagg to 0; set work_mem to '4MB'; ... count ------- 83700 Aggregate (cost=3305553.24..3305553.25 rows=1 width=8) (actual time=18227.164..18227.167 rows=1 loops=1) -> Unique (cost=3283977.69..3304284.09 rows=101532 width=67) (actual time=17595.189..18223.935 rows=83700 loops=1) -> Sort (cost=3283977.69..3286515.99 rows=1015320 width=67) (actual time=17595.188..18042.457 rows=705836 loops=1) Sort Key: t1.ad_group_id, t1.country_code, t1.device_type_id, t1.target_id, t1.block_format_type_id, t1.category_group_id, t1.geo_ids Sort Method: external merge Disk: 128680kB -> Subquery Scan on t1 (cost=2771948.04..3099388.74 rows=1015320 width=67) (actual time=11965.394..15750.759 rows=705836 loops=1) -> GroupAggregate (cost=2771948.04..3089235.54 rows=1015320 width=99) (actual time=11965.392..15695.102 rows=705836 loops=1) Group Key: test.ad_id, test.advertiser_id, test.campaign_id, test.campaign_type_id, test.ad_group_id, test.target_id, test.device_type_id, test.country_code, test.block_format_type_id, test.category_group_id -> Sort (cost=2771948.04..2797331.04 rows=10153200 width=75) (actual time=11965.378..13283.143 rows=10153197 loops=1) Sort Key: test.ad_id, test.advertiser_id, test.campaign_id, test.campaign_type_id, test.ad_group_id, test.target_id, test.device_type_id, test.country_code, test.block_format_type_id, test.category_group_id Sort Method: external merge Disk: 894184kB -> Seq Scan on test (cost=0.00..236908.00 rows=10153200 width=75) (actual time=0.014..770.040 rows=10153197 loops=1) Planning Time: 0.315 ms Execution Time: 18503.307 ms --Maxim Boguk
pgsql-bugs by date: