BUG #16869: GROUP BY on primary key unnecessarily triggers a full table scan - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #16869: GROUP BY on primary key unnecessarily triggers a full table scan |
Date | |
Msg-id | 16869-26346b77d6ccaeec@postgresql.org Whole thread Raw |
Responses |
Re: BUG #16869: GROUP BY on primary key unnecessarily triggers a full table scan
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 16869 Logged by: Carroll Wainwright Email address: carroll@clwainwright.net PostgreSQL version: 13.2 Operating system: macos 10.14.6 Description: I'm trying to do a very simple query with a GROUP BY and an indexed column, but if I group by the primary key I end up having to do a full table scan and the indexed column does not get used. Consider the following simple set up: ``` CREATE TABLE foo ( id serial NOT NULL PRIMARY KEY, name varchar(100) NOT NULL, data real NOT NULL ); INSERT INTO foo (name, data) SELECT md5(random()::text), random() FROM generate_series(1, 100000); CREATE INDEX foo_idx ON foo ("data"); ``` Then the following query causes a full table scan (about 100ms on my computer): ``` SELECT id, name, data FROM foo GROUP BY id ORDER BY data LIMIT 10; ``` The EXPLAIN ANALYZE output is: ``` Limit (cost=8755.26..8755.28 rows=10 width=226) (actual time=103.486..103.495 rows=10 loops=1) -> Sort (cost=8755.26..9005.26 rows=100000 width=226) (actual time=103.484..103.489 rows=10 loops=1) Sort Key: data Sort Method: top-N heapsort Memory: 26kB -> Group (cost=0.29..6594.29 rows=100000 width=226) (actual time=0.037..66.625 rows=100000 loops=1) Group Key: id -> Index Scan using foo_pkey on foo (cost=0.29..6344.29 rows=100000 width=226) (actual time=0.033..31.239 rows=100000 loops=1) Planning Time: 0.158 ms Execution Time: 103.589 ms ``` Note the index scan on "foo_pkey", which should not be necessary. Obviously, the grouping doesn't do anything here, but it would if I had joined with some secondary table on which I wanted to aggregate (which is how I found this bug in the first place). If I drop the GROUP BY the query executes very quickly (< 1ms). Interestingly, the following queries *also* execute very quickly using the index: ``` SELECT name, data FROM foo GROUP BY name, data ORDER BY data LIMIT 10; SELECT COALESCE(id) as foo_id, name, data FROM foo GROUP BY foo_id, name, data ORDER BY data LIMIT 10; ``` This last query, which ought to be functionally equivalent to `GROUP BY id` yields the following execution plan: ``` Limit (cost=67.83..69.38 rows=10 width=226) (actual time=0.139..0.158 rows=10 loops=1) -> Group (cost=67.83..15636.35 rows=100000 width=226) (actual time=0.137..0.152 rows=10 loops=1) Group Key: data, (COALESCE(id)), name -> Incremental Sort (cost=67.83..14886.35 rows=100000 width=226) (actual time=0.134..0.137 rows=10 loops=1) Sort Key: data, (COALESCE(id)), name Presorted Key: data Full-sort Groups: 1 Sort Method: quicksort Average Memory: 27kB Peak Memory: 27kB -> Index Scan using foo_idx on foo (cost=0.29..6344.29 rows=100000 width=226) (actual time=0.028..0.093 rows=33 loops=1) Planning Time: 0.212 ms Execution Time: 0.205 ms ``` Note that it's properly using the "foo_idx" index here, and executes very fast. So something is going on when grouping by the primary key. This seems like a bug, or, at the very least, very unintuitive behavior. I didn't see anything in the documentation implying that this should be expected. SELECT version() yields: ``` PostgreSQL 13.2 on x86_64-apple-darwin19.6.0, compiled by Apple clang version 11.0.3 (clang-1103.0.32.62), 64-bit ``` but I first saw this behavior on a PostgreSQL 10 installation, so it's not new.
pgsql-bugs by date: