BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field |
Date | |
Msg-id | 15922-969d62a9b4d95ee1@postgresql.org Whole thread Raw |
Responses |
Re: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 15922 Logged by: David Raymond Email address: david.raymond@tomtom.com PostgreSQL version: 11.4 Operating system: Windows 7 Description: Running version 11.4 on Windows 7, EnterpriseDB installer. I have a case where I'm doing a simple select from a table but I'm getting returned duplicates from its primary key field when I have two different exists statements in the where clause. I simplified the tables down to the bare bones, dumped them out with pg_dump and re-loaded them. Immediately after loading the data the query returns the correct, consistent answer. Then after running analyze it starts returning the duplicates. The query is this: select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f'); If either one of the 2 exists statements is not included then the results don't have duplicates, it's only when they are both included that there's a problem. The three tables involved have 2 million rows in them, so I can't include the full dump in the form here. Below is output from psql starting immediately after the fresh table load and before analyze is done. Shows the table layouts, the record counts, and the explain results for the same query before and after analyze is done. Please let me know what the next bits you need from me are. Thank you, -David Raymond <david.raymond@tomtom.com> mnr=> set search_path to bug_test; SET Time: 1.427 ms mnr=> \d+ List of relations Schema | Name | Type | Owner | Size | Description ----------+------+-------+-------+---------+------------- bug_test | n | table | mnr | 16 MB | bug_test | n2a | table | mnr | 89 MB | bug_test | n2h | table | mnr | 5800 kB | (3 rows) mnr=> \d+ n Table "bug_test.n" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+------+-----------+----------+---------+---------+--------------+------------- id | uuid | | not null | | plain | | Indexes: "n_pkey" PRIMARY KEY, btree (id) Referenced by: TABLE "n2a" CONSTRAINT "n2a_n_id_fkey" FOREIGN KEY (n_id) REFERENCES n(id) TABLE "n2h" CONSTRAINT "n2h_n_id_fkey" FOREIGN KEY (n_id) REFERENCES n(id) mnr=> \d+ n2a Table "bug_test.n2a" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+------+-----------+----------+---------+---------+--------------+------------- n_id | uuid | | not null | | plain | | a_id | uuid | | not null | | plain | | Indexes: "n2a_pkey" PRIMARY KEY, btree (n_id, a_id) "n2a_a_id_n_id_idx" btree (a_id, n_id) Foreign-key constraints: "n2a_n_id_fkey" FOREIGN KEY (n_id) REFERENCES n(id) mnr=> \d+ n2h Table "bug_test.n2h" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+------+-----------+----------+---------+---------+--------------+------------- n_id | uuid | | not null | | plain | | h_id | uuid | | not null | | plain | | Indexes: "n2h_pkey" PRIMARY KEY, btree (n_id, h_id) "n2h_h_id_n_id_idx" btree (h_id, n_id) Foreign-key constraints: "n2h_n_id_fkey" FOREIGN KEY (n_id) REFERENCES n(id) mnr=> select 'n' as table_name, count(*) as record_count from n union all select 'n2a', count(*) from n2a union all select 'n2h', count(*) from n2h; table_name | record_count ------------+-------------- n | 366,869 n2a | 1,546,626 n2h | 98,180 (3 rows) Time: 342.223 ms mnr=> select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f'); count | count -------+------- 6,531 | 6,531 (1 row) Time: 426.043 ms mnr=> explain analyze verbose select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f'); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=10609.94..10609.95 rows=1 width=16) (actual time=300.783..300.783 rows=1 loops=1) Output: count(*), count(DISTINCT n.id) -> Nested Loop (cost=1950.10..10590.61 rows=3866 width=16) (actual time=30.074..295.967 rows=6531 loops=1) Output: n.id Inner Unique: true Join Filter: (n2h.n_id = n.id) -> Nested Loop (cost=1949.68..2044.12 rows=3866 width=32) (actual time=30.064..270.423 rows=6531 loops=1) Output: n2a.n_id, n2h.n_id Inner Unique: true -> HashAggregate (cost=1949.25..1951.25 rows=200 width=16) (actual time=29.997..46.915 rows=61325 loops=1) Output: n2h.n_id Group Key: n2h.n_id -> Seq Scan on bug_test.n2h (cost=0.00..1703.80 rows=98180 width=16) (actual time=0.023..6.345 rows=98180 loops=1) Output: n2h.n_id, n2h.h_id -> Index Only Scan using n2a_pkey on bug_test.n2a (cost=0.43..1.67 rows=39 width=16) (actual time=0.003..0.003 rows=0 loops=61325) Output: n2a.n_id, n2a.a_id Index Cond: ((n2a.n_id = n2h.n_id) AND (n2a.a_id = '00005831-4900-1200-0000-0000773ae45f'::uuid)) Heap Fetches: 6531 -> Index Only Scan using n_pkey on bug_test.n (cost=0.42..2.20 rows=1 width=16) (actual time=0.003..0.003 rows=1 loops=6531) Output: n.id Index Cond: (n.id = n2a.n_id) Heap Fetches: 6531 Planning Time: 0.601 ms Execution Time: 301.614 ms (24 rows) Time: 302.919 ms mnr=> analyze n, n2a, n2h; ANALYZE Time: 252.862 ms mnr=> select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f'); count | count -------+------- 8,858 | 6,531 (1 row) Time: 170.372 ms mnr=> explain analyze verbose select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f'); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=21318.14..21318.15 rows=1 width=16) (actual time=148.410..148.410 rows=1 loops=1) Output: count(*), count(DISTINCT n.id) -> Gather (cost=7592.81..21291.01 rows=5426 width=16) (actual time=65.294..146.754 rows=8858 loops=1) Output: n.id Workers Planned: 2 Workers Launched: 2 -> Nested Loop (cost=6592.81..19748.41 rows=2261 width=16) (actual time=21.688..48.404 rows=2953 loops=3) Output: n.id Inner Unique: true Worker 0: actual time=1.096..4.734 rows=24 loops=1 Worker 1: actual time=0.470..6.426 rows=225 loops=1 -> Parallel Hash Join (cost=6592.38..18216.45 rows=2263 width=32) (actual time=21.544..31.739 rows=2953 loops=3) Output: n2a.n_id, n2h.n_id Hash Cond: (n2a.n_id = n2h.n_id) Worker 0: actual time=0.828..3.801 rows=24 loops=1 Worker 1: actual time=0.333..4.385 rows=225 loops=1 -> Parallel Bitmap Heap Scan on bug_test.n2a (cost=1113.32..12667.03 rows=14457 width=16) (actual time=1.187..5.693 rows=10193 loops=3) Output: n2a.n_id, n2a.a_id Recheck Cond: (n2a.a_id = '00005831-4900-1200-0000-0000773ae45f'::uuid) Heap Blocks: exact=2173 Worker 0: actual time=0.196..2.202 rows=246 loops=1 Worker 1: actual time=0.105..2.785 rows=1394 loops=1 -> Bitmap Index Scan on n2a_a_id_n_id_idx (cost=0.00..1104.65 rows=34696 width=0) (actual time=2.861..2.861 rows=30578 loops=1) Index Cond: (n2a.a_id = '00005831-4900-1200-0000-0000773ae45f'::uuid) -> Parallel Hash (cost=4967.71..4967.71 rows=40908 width=16) (actual time=19.568..19.569 rows=32727 loops=3) Output: n2h.n_id Buckets: 131072 Batches: 1 Memory Usage: 5632kB Worker 0: actual time=0.077..0.077 rows=0 loops=1 Worker 1: actual time=0.050..0.050 rows=0 loops=1 -> Parallel Index Only Scan using n2h_pkey on bug_test.n2h (cost=0.42..4967.71 rows=40908 width=16) (actual time=0.024..32.132 rows=98180 loops=1) Output: n2h.n_id Heap Fetches: 98180 -> Index Only Scan using n_pkey on bug_test.n (cost=0.42..0.68 rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=8858) Output: n.id Index Cond: (n.id = n2h.n_id) Heap Fetches: 8858 Worker 0: actual time=0.035..0.035 rows=1 loops=24 Worker 1: actual time=0.007..0.007 rows=1 loops=225 Planning Time: 0.755 ms Execution Time: 155.490 ms (40 rows) Time: 156.952 ms mnr=> select count(*), count(distinct id) from n where exists (select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f'); count | count --------+-------- 30,578 | 30,578 (1 row) Time: 95.252 ms mnr=> select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id); count | count --------+-------- 61,325 | 61,325 (1 row) Time: 574.642 ms mnr=> select version(); version ------------------------------------------------------------ PostgreSQL 11.4, compiled by Visual C++ build 1914, 64-bit (1 row) Time: 0.305 ms mnr=>
pgsql-bugs by date: