Re: BUG #13908: Query returns too few rows - Mailing list pgsql-bugs

From: "David G. Johnston" <david.g.johnston@gmail.com>
Subject: Re: BUG #13908: Query returns too few rows
Msg-id: CAKFQuwaj9Q+UEEtLmyA032=CMKUFAhtWU4tuQ4dY3wUJ_5JDuQ@mail.gmail.com
In response to: Re: BUG #13908: Query returns too few rows ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses: Re: BUG #13908: Query returns too few rows
List: pgsql-bugs
On Thu, Feb 4, 2016 at 8:57 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:

> On Thu, Feb 4, 2016 at 4:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> "David G. Johnston" <david.g.johnston@gmail.com> writes:
>> > *The second-level Hash Join combines/sees, in both cases, 7,993 and
>> > 1,275,138 records but depending on the LEFT/RIGHT order of the sub-nodes
>> > appears to provide a different result.*
>>
>> Hmm, that's pretty fishy; and given all the hacking that's gone on lately
>> on hash join, 9.5 introducing a bug there wouldn't be all that astonishing.
>> But I decline to try to find it without a self-contained test case.
>
> So, one of the tables involved has 70M-ish rows, of which only maybe
> 2M-ish are necessary to fulfill the query in terms of getting the right
> answer. Realizing this, I deleted the other 68M records and then
> immediately (in the same transaction) ran the counting query, and found
> that I still got an incorrect result. I ran a count instead of an explain
> analyze, so I'm not sure exactly what it did behind the scenes.
> Regardless, I got the under-counting behavior. I then committed the
> transaction, ran vacuum analyze, and re-ran the counting query; this time
> I got the correct count.
>
> Do you expect that some portion of the 68M records that were deleted would
> be required to diagnose the problem, or can knobs be twiddled in some way
> to get similar behavior without them actually being present? The fact that
> the records were deleted, but not vacuumed, and a wrong result was still
> returned makes me think it should be possible, but I don't know enough to
> say for sure, or to make informed decisions as to which knobs to turn (if
> indeed such knobs are exposed, since I do not have source-editing or
> debugging capabilities).

Some more observations...

On this node's detail (the row counts appear consistent regardless of node ordering in the plan...):
Hash  (cost=1440572.59..1440572.59 rows=872393 width=24) (actual time=151981.919..151981.919 rows=*1275138* loops=1)

Bad Plan:  Buckets: 131072 (originally 131072)  Batches: 32 (originally 16)  Memory Usage: 3073kB
Bad Plan:  Buckets: 65536 (originally 65536)  Batches: 32 (originally 16)  Memory Usage: 3585kB
Good Plan: Buckets: 1024  Batches: 1  Memory Usage: 42kB
Good Plan: Buckets: 8192  Batches: 1  Memory Usage: 439kB  (see below)
9.3 Plan:  Buckets: 4096  Batches: 64  Memory Usage: 889kB
9.5 Made Up Good Plan (see below): Buckets: 131072 (originally 1024)  Batches: 16 (originally 1)  Memory Usage: 3227kB

*The difference on this last one is that the original bucket count is considerably smaller than the seemingly important 2^16-1 boundary even though the final count is greater.*

I was able to force a Hash Join between the 1,275,138 node and the 7,993 node using CTEs:

BEGIN;
SET LOCAL enable_mergejoin = off;
EXPLAIN ANALYZE
WITH first_join AS (
    SELECT *
    FROM rrr
    JOIN fff ON fff.idx = rrr.fff_idx
    WHERE fff.file_name_model_ver = '100'
      AND fff.file_name_date BETWEEN '2005-03-01' AND '2005-07-30'
), second_join AS (
    SELECT *
    FROM uuu
    WHERE universe_hash = 5188205190738336870
), combined AS (
    SELECT *
    FROM first_join
    JOIN second_join USING (bababa)
)
SELECT count(*) FROM combined;

The result was the following explain analyze and a good count.
QUERY PLAN

Aggregate  (cost=320276.69..320276.70 rows=1 width=0) (actual time=23126.200..23126.201 rows=1 loops=1)
  CTE first_join
    ->  Hash Join  (cost=820.88..228772.40 rows=38783 width=68) (actual time=8573.804..15931.271 rows=1275138 loops=1)
          Hash Cond: (rrr.fff_idx = fff.idx)
          ->  Seq Scan on rrr  (cost=0.00..216302.86 rows=3002886 width=24) (actual time=7499.001..10979.666 rows=3007454 loops=1)
          ->  Hash  (cost=808.37..808.37 rows=1001 width=44) (actual time=2.616..2.616 rows=964 loops=1)
                Buckets: 1024  Batches: 1  Memory Usage: 87kB
                ->  Bitmap Heap Scan on fff  (cost=35.75..808.37 rows=1001 width=44) (actual time=0.122..1.438 rows=964 loops=1)
                      Recheck Cond: ((file_name_date >= '2005-03-01'::date) AND (file_name_date <= '2005-07-30'::date))
                      Filter: ((file_name_model_ver)::text = '100'::text)
                      Rows Removed by Filter: 540
                      Heap Blocks: exact=30
                      ->  Bitmap Index Scan on ix_fff_file_name_date  (cost=0.00..35.50 rows=1521 width=0) (actual time=0.100..0.100 rows=1504 loops=1)
                            Index Cond: ((file_name_date >= '2005-03-01'::date) AND (file_name_date <= '2005-07-30'::date))
  CTE second_join
    ->  Index Only Scan using pk_uuu on uuu  (cost=0.42..269.84 rows=8081 width=16) (actual time=0.019..9.372 rows=7993 loops=1)
          Index Cond: (universe_hash = '5188205190738336870'::bigint)
          Heap Fetches: 0
  CTE combined
    ->  Hash Join  (cost=262.63..55976.35 rows=1567027 width=228) (actual time=8612.493..21535.488 rows=416075 loops=1)
          Hash Cond: ((first_join.bababa)::text = (second_join.bababa)::text)
          ->  CTE Scan on first_join  (cost=0.00..775.66 rows=38783 width=220) (actual time=8573.810..19343.262 rows=1275138 loops=1)
          ->  Hash  (cost=161.62..161.62 rows=8081 width=42) (actual time=37.346..37.346 rows=7993 loops=1)
                *Buckets: 8192  Batches: 1  Memory Usage: 439kB*
                ->  CTE Scan on second_join  (cost=0.00..161.62 rows=8081 width=42) (actual time=0.023..27.642 rows=7993 loops=1)
  ->  CTE Scan on combined  (cost=0.00..31340.54 rows=1567027 width=0) (actual time=8612.498..22664.354 rows=416075 loops=1)
Planning time: 0.137 ms
Execution time: 23145.303 ms

And below is the 9.3 non-Distinct plan:

QUERY PLAN

Aggregate  (cost=1467123.37..1467123.38 rows=1 width=0) (actual time=165954.407..165954.409 rows=1 loops=1)
  ->  Hash Join  (cost=1456877.26..1466434.87 rows=275399 width=0) (actual time=164885.552..165504.698 rows=416075 loops=1)
        Hash Cond: ((uuu.bababa)::text = (rrr.bababa)::text)
        ->  Index Only Scan using pk_uuu on uuu  (cost=0.42..260.48 rows=7775 width=8) (actual time=0.011..9.630 rows=7993 loops=1)
              Index Cond: (universe_hash = 5188205190738336870::bigint)
              Heap Fetches: 0
        ->  Hash  (cost=1441329.64..1441329.64 rows=947616 width=8) (actual time=164884.922..164884.922 rows=1275138 loops=1)
              *Buckets: 4096  Batches: 64  Memory Usage: 889kB*
              ->  Hash Join  (cost=822.75..1441329.64 rows=947616 width=8) (actual time=60770.116..163253.376 rows=1275138 loops=1)
                    Hash Cond: (rrr.fff_idx = fff.idx)
                    ->  Seq Scan on rrr  (cost=0.00..1164410.44 rows=71098744 width=12) (actual time=0.036..79440.276 rows=71098547 loops=1)
                    ->  Hash  (cost=809.84..809.84 rows=1033 width=4) (actual time=2.577..2.577 rows=964 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 34kB
                          ->  Bitmap Heap Scan on fff  (cost=36.29..809.84 rows=1033 width=4) (actual time=0.110..1.418 rows=964 loops=1)
                                Recheck Cond: ((file_name_date >= '2005-03-01'::date) AND (file_name_date <= '2005-07-30'::date))
                                Filter: ((file_name_model_ver)::text = '100'::text)
                                Rows Removed by Filter: 540
                                ->  Bitmap Index Scan on ix_fff_file_name_date  (cost=0.00..36.03 rows=1574 width=0) (actual time=0.098..0.098 rows=1504 loops=1)
                                      Index Cond: ((file_name_date >= '2005-03-01'::date) AND (file_name_date <= '2005-07-30'::date))
Total runtime: 165954.463 ms

Though it doesn't seem to be simply a function of bucket count, since:

EXPLAIN ANALYZE
WITH first_join AS (
    SELECT * FROM generate_series(1, 1000000) gs (i)
), second_join AS (
    SELECT * FROM generate_series(1, 1000000) gs (i)
), combined AS (
    SELECT * FROM first_join JOIN second_join USING (i)
)
SELECT count(*) FROM combined;

QUERY PLAN

Aggregate  (cost=362.50..362.51 rows=1 width=0) (actual time=13842.652..13842.653 rows=1 loops=1)
  CTE first_join
    ->  Function Scan on generate_series gs  (cost=0.00..10.00 rows=1000 width=4) (actual time=88.464..1186.923 rows=1000000 loops=1)
  CTE second_join
    ->  Function Scan on generate_series gs_1  (cost=0.00..10.00 rows=1000 width=4) (actual time=88.287..1187.215 rows=1000000 loops=1)
  CTE combined
    ->  Hash Join  (cost=32.50..230.00 rows=5000 width=4) (actual time=4668.961..10530.299 rows=*1000000* loops=1)
          Hash Cond: (first_join.i = second_join.i)
          ->  CTE Scan on first_join  (cost=0.00..20.00 rows=1000 width=4) (actual time=88.468..3333.798 rows=1000000 loops=1)
          ->  Hash  (cost=20.00..20.00 rows=1000 width=4) (actual time=4580.247..4580.247 rows=1000000 loops=1)
                *Buckets: 131072 (originally 1024)  Batches: 16 (originally 1)  Memory Usage: 3227kB*
                ->  CTE Scan on second_join  (cost=0.00..20.00 rows=1000 width=4) (actual time=88.294..3320.363 rows=1000000 loops=1)
  ->  CTE Scan on combined  (cost=0.00..100.00 rows=5000 width=0) (actual time=4668.966..12787.652 rows=1000000 loops=1)
Planning time: 0.071 ms
Execution time: 13853.714 ms

Though maybe that's because the original bucket count here was only 1024, while the Bad Plans above had original sizes greater than 2^16-1...

David J.
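For what it's worth, the general failure class being suspected here (rows going missing when a batched hash join grows its batch count mid-build without consistently repartitioning) can be sketched in a toy model outside the database. This is emphatically not the PostgreSQL implementation, just an illustration under assumed behavior; the helper names (`buggy_hash_join`, `batch_of`) are made up:

```python
# Toy model of a batched hash join that under-counts when the number of
# batches grows mid-build but rows already stored are not re-partitioned.
# Illustrative only -- not how PostgreSQL's nodeHash.c actually works.

def batch_of(key, nbatches):
    """Assign a key to a batch based on the *current* batch count."""
    return hash(key) % nbatches

def buggy_hash_join(build_rows, probe_rows, initial_nbatches=1, grow_at=None):
    nbatches = initial_nbatches
    batches = [dict() for _ in range(nbatches)]
    for i, key in enumerate(build_rows):
        if grow_at is not None and i == grow_at:
            # Double the batch count partway through the build...
            nbatches *= 2
            batches.extend(dict() for _ in range(nbatches // 2))
            # BUG: rows already placed are NOT re-hashed into the new layout.
        b = batch_of(key, nbatches)
        batches[b][key] = batches[b].get(key, 0) + 1
    # Probe phase uses the *final* batch count, so probes for keys stored
    # under the old layout can look in the wrong batch and find nothing.
    return sum(batches[batch_of(key, nbatches)].get(key, 0)
               for key in probe_rows)

keys = list(range(1000))
print(buggy_hash_join(keys, keys))            # no growth: 1000 (correct)
print(buggy_hash_join(keys, keys, 1, 500))    # growth mid-build: fewer rows
```

The point of the sketch is only that a join which silently drops matches after a mid-build resize produces exactly the "too few rows" symptom, with the loss depending on when the resize happens, which matches how the result here changes with bucket/batch counts.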