Home > mailing lists

RE: BUG #16031: Group by returns duplicate groups - Mailing list pgsql-bugs

From	David Raymond
Subject	RE: BUG #16031: Group by returns duplicate groups
Date	October 7, 2019 18:29:04
Msg-id	VI1PR07MB60298C48FF9982EACF0B3F39879B0@VI1PR07MB6029.eurprd07.prod.outlook.com Whole thread Raw
In response to	Re: BUG #16031: Group by returns duplicate groups (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Responses	Re: BUG #16031: Group by returns duplicate groups
List	pgsql-bugs

Tree view

Downloaded and installed 12.0, created a nice shiny new cluster, and confirmed that it's still doing it. Now in 12 you
haveto force it to materialize the CTE, which was why I had used a CTE in 11 in the first place.
 


testing=> select version();
                          version
------------------------------------------------------------
 PostgreSQL 12.0, compiled by Visual C++ build 1914, 64-bit
(1 row)

Time: 0.148 ms
testing=> \d+ weird_grouping
                                    Table "name_stuff.weird_grouping"
 Column |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description
--------+------------------------+-----------+----------+---------+----------+--------------+-------------
 name   | character varying(254) |           | not null |         | extended |              |
Access method: heap

testing=> analyze verbose weird_grouping;
INFO:  analyzing "name_stuff.weird_grouping"
INFO:  "weird_grouping": scanned 8316 of 8316 pages, containing 1297265 live rows and 0 dead rows; 30000 rows in
sample,1297265 estimated total rows
 
ANALYZE
Time: 206.577 ms
testing=> select count(*), count(distinct name) from weird_grouping;
   count   |   count
-----------+-----------
 1,297,265 | 1,176,103
(1 row)

Time: 6729.011 ms (00:06.729)
testing=> with foo as (select name from weird_grouping group by name) select name from foo group by name having
count(*)> 1;
 
 name
------
(0 rows)

Time: 7289.128 ms (00:07.289)
testing=> explain (analyze, verbose, costs, settings, buffers, timing, summary) with foo as (select name from
weird_groupinggroup by name) select name from foo group by name having count(*) > 1;
 
                                                                       QUERY PLAN
           
 

--------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=179613.72..200268.04 rows=67 width=20) (actual time=6203.868..6203.868 rows=0 loops=1)
   Output: weird_grouping.name
   Group Key: weird_grouping.name
   Filter: (count(*) > 1)
   Rows Removed by Filter: 1176103
   Buffers: shared hit=2464 read=5852, temp read=7800 written=7830
   ->  Group  (cost=179613.72..186100.05 rows=944366 width=20) (actual time=4769.781..5985.111 rows=1176103 loops=1)
         Output: weird_grouping.name
         Group Key: weird_grouping.name
         Buffers: shared hit=2464 read=5852, temp read=7800 written=7830
         ->  Sort  (cost=179613.72..182856.89 rows=1297265 width=20) (actual time=4769.779..5844.350 rows=1297265
loops=1)
               Output: weird_grouping.name
               Sort Key: weird_grouping.name
               Sort Method: external merge  Disk: 39048kB
               Buffers: shared hit=2464 read=5852, temp read=7800 written=7830
               ->  Seq Scan on name_stuff.weird_grouping  (cost=0.00..21288.65 rows=1297265 width=20) (actual
time=0.059..102.772rows=1297265 loops=1)
 
                     Output: weird_grouping.name
                     Buffers: shared hit=2464 read=5852
 Settings: search_path = 'name_stuff'
 Planning Time: 0.048 ms
 Execution Time: 7115.761 ms
(21 rows)

Time: 7116.170 ms (00:07.116)
testing=> with foo as materialized (select name from weird_grouping group by name) select name from foo group by name
havingcount(*) > 1;
 
 name
-------
 DCT
 DELTA
(2 rows)

Time: 8850.833 ms (00:08.851)
testing=> explain (analyze, verbose, costs, settings, buffers, timing, summary) with foo as materialized (select name
fromweird_grouping group by name) select name from foo group by name having count(*) > 1;
 
                                                                        QUERY PLAN
           
 

----------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=209709.20..209711.70 rows=67 width=516) (actual time=6676.811..6768.094 rows=2 loops=1)
   Output: foo.name
   Group Key: foo.name
   Filter: (count(*) > 1)
   Rows Removed by Filter: 1176099
   Buffers: shared hit=2528 read=5788, temp read=7800 written=12363
   CTE foo
     ->  Group  (cost=179613.72..186100.05 rows=944366 width=20) (actual time=4774.681..6004.725 rows=1176103 loops=1)
           Output: weird_grouping.name
           Group Key: weird_grouping.name
           Buffers: shared hit=2528 read=5788, temp read=7800 written=7830
           ->  Sort  (cost=179613.72..182856.89 rows=1297265 width=20) (actual time=4774.678..5860.270 rows=1297265
loops=1)
                 Output: weird_grouping.name
                 Sort Key: weird_grouping.name
                 Sort Method: external merge  Disk: 39048kB
                 Buffers: shared hit=2528 read=5788, temp read=7800 written=7830
                 ->  Seq Scan on name_stuff.weird_grouping  (cost=0.00..21288.65 rows=1297265 width=20) (actual
time=0.065..101.141rows=1297265 loops=1)
 
                       Output: weird_grouping.name
                       Buffers: shared hit=2528 read=5788
   ->  CTE Scan on foo  (cost=0.00..18887.32 rows=944366 width=516) (actual time=4774.683..6228.002 rows=1176103
loops=1)
         Output: foo.name
         Buffers: shared hit=2528 read=5788, temp read=7800 written=12363
 Settings: search_path = 'name_stuff'
 Planning Time: 0.054 ms
 Execution Time: 8786.597 ms
(25 rows)

Time: 8787.011 ms (00:08.787)
testing=>


-----Original Message-----
From: Andrew Gierth <andrew@tao11.riddles.org.uk> 
Sent: Sunday, October 6, 2019 10:29 AM
To: David Raymond <David.Raymond@tomtom.com>
Cc: pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #16031: Group by returns duplicate groups

>>>>> "David" == David Raymond <David.Raymond@tomtom.com> writes:

 David> As an update, I've currently got a dump that consistently shows
 David> weirdness when loaded. It's just the "name" field, has 1.3
 David> million records, is 15 MB zipped, and has things garbled enough
 David> that I don't mind sending it.
 
 David> How small does it need to be before it's good to send to
 David> someone?

That's small enough for me, though since I don't use Windows all I'll be
able to do is check if you're exposing some general PG bug. If not I'll
see if I can find someone to test on Windows.

-- 
Andrew (irc:RhodiumToad)

pgsql-bugs by date:

From: Rob Emery
Date: 07 October 2019, 15:52:41
Subject: Re: [PATCH] Re: BUG #16032: pg_basebackup when running on Windows doesn't clean up on failure correctly

From: Andrew Gierth
Date: 07 October 2019, 18:29:11
Subject: Re: BUG #16031: Group by returns duplicate groups

RE: BUG #16031: Group by returns duplicate groups - Mailing list pgsql-bugs

Previous

Next