RE: BUG #16031: Group by returns duplicate groups - Mailing list pgsql-bugs
From | David Raymond |
---|---|
Subject | RE: BUG #16031: Group by returns duplicate groups |
Date | |
Msg-id | VI1PR07MB60298C48FF9982EACF0B3F39879B0@VI1PR07MB6029.eurprd07.prod.outlook.com Whole thread Raw |
In response to | Re: BUG #16031: Group by returns duplicate groups (Andrew Gierth <andrew@tao11.riddles.org.uk>) |
Responses |
Re: BUG #16031: Group by returns duplicate groups
|
List | pgsql-bugs |
Downloaded and installed 12.0, created a nice shiny new cluster, and confirmed that it's still doing it. Now in 12 you haveto force it to materialize the CTE, which was why I had used a CTE in 11 in the first place. testing=> select version(); version ------------------------------------------------------------ PostgreSQL 12.0, compiled by Visual C++ build 1914, 64-bit (1 row) Time: 0.148 ms testing=> \d+ weird_grouping Table "name_stuff.weird_grouping" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+------------------------+-----------+----------+---------+----------+--------------+------------- name | character varying(254) | | not null | | extended | | Access method: heap testing=> analyze verbose weird_grouping; INFO: analyzing "name_stuff.weird_grouping" INFO: "weird_grouping": scanned 8316 of 8316 pages, containing 1297265 live rows and 0 dead rows; 30000 rows in sample,1297265 estimated total rows ANALYZE Time: 206.577 ms testing=> select count(*), count(distinct name) from weird_grouping; count | count -----------+----------- 1,297,265 | 1,176,103 (1 row) Time: 6729.011 ms (00:06.729) testing=> with foo as (select name from weird_grouping group by name) select name from foo group by name having count(*)> 1; name ------ (0 rows) Time: 7289.128 ms (00:07.289) testing=> explain (analyze, verbose, costs, settings, buffers, timing, summary) with foo as (select name from weird_groupinggroup by name) select name from foo group by name having count(*) > 1; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------- GroupAggregate (cost=179613.72..200268.04 rows=67 width=20) (actual time=6203.868..6203.868 rows=0 loops=1) Output: weird_grouping.name Group Key: weird_grouping.name Filter: (count(*) > 1) Rows Removed by Filter: 1176103 Buffers: shared hit=2464 read=5852, temp read=7800 written=7830 -> Group (cost=179613.72..186100.05 rows=944366 width=20) (actual time=4769.781..5985.111 rows=1176103 loops=1) Output: weird_grouping.name Group Key: weird_grouping.name Buffers: shared hit=2464 read=5852, temp read=7800 written=7830 -> Sort (cost=179613.72..182856.89 rows=1297265 width=20) (actual time=4769.779..5844.350 rows=1297265 loops=1) Output: weird_grouping.name Sort Key: weird_grouping.name Sort Method: external merge Disk: 39048kB Buffers: shared hit=2464 read=5852, temp read=7800 written=7830 -> Seq Scan on name_stuff.weird_grouping (cost=0.00..21288.65 rows=1297265 width=20) (actual time=0.059..102.772rows=1297265 loops=1) Output: weird_grouping.name Buffers: shared hit=2464 read=5852 Settings: search_path = 'name_stuff' Planning Time: 0.048 ms Execution Time: 7115.761 ms (21 rows) Time: 7116.170 ms (00:07.116) testing=> with foo as materialized (select name from weird_grouping group by name) select name from foo group by name havingcount(*) > 1; name ------- DCT DELTA (2 rows) Time: 8850.833 ms (00:08.851) testing=> explain (analyze, verbose, costs, settings, buffers, timing, summary) with foo as materialized (select name fromweird_grouping group by name) select name from foo group by name having count(*) > 1; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=209709.20..209711.70 rows=67 width=516) (actual time=6676.811..6768.094 rows=2 loops=1) Output: foo.name Group Key: foo.name Filter: (count(*) > 1) Rows Removed by Filter: 1176099 Buffers: shared hit=2528 read=5788, temp read=7800 written=12363 CTE foo -> Group (cost=179613.72..186100.05 rows=944366 width=20) (actual time=4774.681..6004.725 rows=1176103 loops=1) Output: weird_grouping.name Group Key: weird_grouping.name Buffers: shared hit=2528 read=5788, temp read=7800 written=7830 -> Sort (cost=179613.72..182856.89 rows=1297265 width=20) (actual time=4774.678..5860.270 rows=1297265 loops=1) Output: weird_grouping.name Sort Key: weird_grouping.name Sort Method: external merge Disk: 39048kB Buffers: shared hit=2528 read=5788, temp read=7800 written=7830 -> Seq Scan on name_stuff.weird_grouping (cost=0.00..21288.65 rows=1297265 width=20) (actual time=0.065..101.141rows=1297265 loops=1) Output: weird_grouping.name Buffers: shared hit=2528 read=5788 -> CTE Scan on foo (cost=0.00..18887.32 rows=944366 width=516) (actual time=4774.683..6228.002 rows=1176103 loops=1) Output: foo.name Buffers: shared hit=2528 read=5788, temp read=7800 written=12363 Settings: search_path = 'name_stuff' Planning Time: 0.054 ms Execution Time: 8786.597 ms (25 rows) Time: 8787.011 ms (00:08.787) testing=> -----Original Message----- From: Andrew Gierth <andrew@tao11.riddles.org.uk> Sent: Sunday, October 6, 2019 10:29 AM To: David Raymond <David.Raymond@tomtom.com> Cc: pgsql-bugs@lists.postgresql.org Subject: Re: BUG #16031: Group by returns duplicate groups >>>>> "David" == David Raymond <David.Raymond@tomtom.com> writes: David> As an update, I've currently got a dump that consistently shows David> weirdness when loaded. It's just the "name" field, has 1.3 David> million records, is 15 MB zipped, and has things garbled enough David> that I don't mind sending it. David> How small does it need to be before it's good to send to David> someone? That's small enough for me, though since I don't use Windows all I'll be able to do is check if you're exposing some general PG bug. If not I'll see if I can find someone to test on Windows. -- Andrew (irc:RhodiumToad)
pgsql-bugs by date: