Re: BUG #13908: Query returns too few rows - Mailing list pgsql-bugs

From: "David G. Johnston" <david.g.johnston@gmail.com>
Subject: Re: BUG #13908: Query returns too few rows
Msg-id: CAKFQuwaj9Q+UEEtLmyA032=CMKUFAhtWU4tuQ4dY3wUJ_5JDuQ@mail.gmail.com
In response to: Re: BUG #13908: Query returns too few rows ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses: Re: BUG #13908: Query returns too few rows
List: pgsql-bugs
On Thu, Feb 4, 2016 at 8:57 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:

> On Thu, Feb 4, 2016 at 4:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> "David G. Johnston" <david.g.johnston@gmail.com> writes:
>> > *The second-level Hash Join combines/sees, in both cases, 7,993 and
>> > 1,275,138 records but depending on the LEFT/RIGHT order of the sub-nodes
>> > appears to provide a different result.*
>>
>> Hmm, that's pretty fishy; and given all the hacking that's gone on lately
>> on hash join, 9.5 introducing a bug there wouldn't be all that astonishing.
>> But I decline to try to find it without a self-contained test case.
>
> So, one of the tables involved has 70M-ish rows, of which only maybe
> 2M-ish are necessary to fulfill the query in terms of getting the right
> answer. Realizing this, I deleted the other 68M records and then
> immediately (in the same transaction) ran the counting query, and found
> that I still got an incorrect result. I ran a count instead of an explain
> analyze, so I'm not sure exactly what it did behind the scenes.
> Regardless, I got the under-counting behavior. I then committed the
> transaction, ran vacuum analyze, and re-ran the counting query; this time
> I got the correct count.
>
> Do you expect that some portion of the 68M records that were deleted would
> be required to diagnose the problem, or can knobs be twiddled in some way
> to get similar behavior without them actually being present? The fact that
> the records were deleted, but not vacuumed, and a wrong result was still
> returned makes me think it should be possible, but I don't know enough to
> say for sure, or to make informed decisions as to which knobs to turn (if
> indeed such knobs are exposed, since I do not have source-editing or
> debugging capabilities).

Some more observations...

On this node's detail (the row counts appear consistent regardless of node ordering in the plan...):
Hash  (cost=1440572.59..1440572.59 rows=872393 width=24) (actual time=151981.919..151981.919 rows=*1275138* loops=1)

Bad Plan:  Buckets: 131072 (originally 131072)  Batches: 32 (originally 16)  Memory Usage: 3073kB
Bad Plan:  Buckets: 65536 (originally 65536)  Batches: 32 (originally 16)  Memory Usage: 3585kB
Good Plan: Buckets: 1024  Batches: 1  Memory Usage: 42kB
Good Plan: Buckets: 8192  Batches: 1  Memory Usage: 439kB  (see below)
9.3 Plan:  Buckets: 4096  Batches: 64  Memory Usage: 889kB
9.5 Made Up Good Plan (see below): Buckets: 131072 (originally 1024)  Batches: 16 (originally 1)  Memory Usage: 3227kB

*The difference on this last one is that the original bucket count is considerably smaller than the seemingly important 2^16-1 boundary even though the final count is greater.*

I was able to force a Hash Join between the 1,275,138 node and the 7,993 node using CTEs:

BEGIN;
SET LOCAL enable_mergejoin = off;
EXPLAIN ANALYZE
WITH first_join AS (
    SELECT *
    FROM rrr
    JOIN fff ON fff.idx = rrr.fff_idx
    WHERE fff.file_name_model_ver = '100'
      AND fff.file_name_date BETWEEN '2005-03-01' AND '2005-07-30'
), second_join AS (
    SELECT *
    FROM uuu
    WHERE universe_hash = 5188205190738336870
), combined AS (
    SELECT *
    FROM first_join
    JOIN second_join USING (bababa)
)
SELECT count(*) FROM combined;

The result was the following explain analyze and a good count.
QUERY PLAN

Aggregate  (cost=320276.69..320276.70 rows=1 width=0) (actual time=23126.200..23126.201 rows=1 loops=1)
  CTE first_join
    ->  Hash Join  (cost=820.88..228772.40 rows=38783 width=68) (actual time=8573.804..15931.271 rows=1275138 loops=1)
          Hash Cond: (rrr.fff_idx = fff.idx)
          ->  Seq Scan on rrr  (cost=0.00..216302.86 rows=3002886 width=24) (actual time=7499.001..10979.666 rows=3007454 loops=1)
          ->  Hash  (cost=808.37..808.37 rows=1001 width=44) (actual time=2.616..2.616 rows=964 loops=1)
                Buckets: 1024  Batches: 1  Memory Usage: 87kB
                ->  Bitmap Heap Scan on fff  (cost=35.75..808.37 rows=1001 width=44) (actual time=0.122..1.438 rows=964 loops=1)
                      Recheck Cond: ((file_name_date >= '2005-03-01'::date) AND (file_name_date <= '2005-07-30'::date))
                      Filter: ((file_name_model_ver)::text = '100'::text)
                      Rows Removed by Filter: 540
                      Heap Blocks: exact=30
                      ->  Bitmap Index Scan on ix_fff_file_name_date  (cost=0.00..35.50 rows=1521 width=0) (actual time=0.100..0.100 rows=1504 loops=1)
                            Index Cond: ((file_name_date >= '2005-03-01'::date) AND (file_name_date <= '2005-07-30'::date))
  CTE second_join
    ->  Index Only Scan using pk_uuu on uuu  (cost=0.42..269.84 rows=8081 width=16) (actual time=0.019..9.372 rows=7993 loops=1)
          Index Cond: (universe_hash = '5188205190738336870'::bigint)
          Heap Fetches: 0
  CTE combined
    ->  Hash Join  (cost=262.63..55976.35 rows=1567027 width=228) (actual time=8612.493..21535.488 rows=416075 loops=1)
          Hash Cond: ((first_join.bababa)::text = (second_join.bababa)::text)
          ->  CTE Scan on first_join  (cost=0.00..775.66 rows=38783 width=220) (actual time=8573.810..19343.262 rows=1275138 loops=1)
          ->  Hash  (cost=161.62..161.62 rows=8081 width=42) (actual time=37.346..37.346 rows=7993 loops=1)
                *Buckets: 8192  Batches: 1  Memory Usage: 439kB*
                ->  CTE Scan on second_join  (cost=0.00..161.62 rows=8081 width=42) (actual time=0.023..27.642 rows=7993 loops=1)
  ->  CTE Scan on combined  (cost=0.00..31340.54 rows=1567027 width=0) (actual time=8612.498..22664.354 rows=416075 loops=1)
Planning time: 0.137 ms
Execution time: 23145.303 ms

And below is the 9.3 non-Distinct plan:

QUERY PLAN

Aggregate  (cost=1467123.37..1467123.38 rows=1 width=0) (actual time=165954.407..165954.409 rows=1 loops=1)
  ->  Hash Join  (cost=1456877.26..1466434.87 rows=275399 width=0) (actual time=164885.552..165504.698 rows=416075 loops=1)
        Hash Cond: ((uuu.bababa)::text = (rrr.bababa)::text)
        ->  Index Only Scan using pk_uuu on uuu  (cost=0.42..260.48 rows=7775 width=8) (actual time=0.011..9.630 rows=7993 loops=1)
              Index Cond: (universe_hash = 5188205190738336870::bigint)
              Heap Fetches: 0
        ->  Hash  (cost=1441329.64..1441329.64 rows=947616 width=8) (actual time=164884.922..164884.922 rows=1275138 loops=1)
              *Buckets: 4096  Batches: 64  Memory Usage: 889kB*
              ->  Hash Join  (cost=822.75..1441329.64 rows=947616 width=8) (actual time=60770.116..163253.376 rows=1275138 loops=1)
                    Hash Cond: (rrr.fff_idx = fff.idx)
                    ->  Seq Scan on rrr  (cost=0.00..1164410.44 rows=71098744 width=12) (actual time=0.036..79440.276 rows=71098547 loops=1)
                    ->  Hash  (cost=809.84..809.84 rows=1033 width=4) (actual time=2.577..2.577 rows=964 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 34kB
                          ->  Bitmap Heap Scan on fff  (cost=36.29..809.84 rows=1033 width=4) (actual time=0.110..1.418 rows=964 loops=1)
                                Recheck Cond: ((file_name_date >= '2005-03-01'::date) AND (file_name_date <= '2005-07-30'::date))
                                Filter: ((file_name_model_ver)::text = '100'::text)
                                Rows Removed by Filter: 540
                                ->  Bitmap Index Scan on ix_fff_file_name_date  (cost=0.00..36.03 rows=1574 width=0) (actual time=0.098..0.098 rows=1504 loops=1)
                                      Index Cond: ((file_name_date >= '2005-03-01'::date) AND (file_name_date <= '2005-07-30'::date))
Total runtime: 165954.463 ms

Though it doesn't seem to be simply a function of bucket count, since:

EXPLAIN ANALYZE
WITH first_join AS (
    SELECT * FROM generate_series(1, 1000000) gs (i)
), second_join AS (
    SELECT * FROM generate_series(1, 1000000) gs (i)
), combined AS (
    SELECT * FROM first_join JOIN second_join USING (i)
)
SELECT count(*) FROM combined;

QUERY PLAN

Aggregate  (cost=362.50..362.51 rows=1 width=0) (actual time=13842.652..13842.653 rows=1 loops=1)
  CTE first_join
    ->  Function Scan on generate_series gs  (cost=0.00..10.00 rows=1000 width=4) (actual time=88.464..1186.923 rows=1000000 loops=1)
  CTE second_join
    ->  Function Scan on generate_series gs_1  (cost=0.00..10.00 rows=1000 width=4) (actual time=88.287..1187.215 rows=1000000 loops=1)
  CTE combined
    ->  Hash Join  (cost=32.50..230.00 rows=5000 width=4) (actual time=4668.961..10530.299 rows=*1000000* loops=1)
          Hash Cond: (first_join.i = second_join.i)
          ->  CTE Scan on first_join  (cost=0.00..20.00 rows=1000 width=4) (actual time=88.468..3333.798 rows=1000000 loops=1)
          ->  Hash  (cost=20.00..20.00 rows=1000 width=4) (actual time=4580.247..4580.247 rows=1000000 loops=1)
                *Buckets: 131072 (originally 1024)  Batches: 16 (originally 1)  Memory Usage: 3227kB*
                ->  CTE Scan on second_join  (cost=0.00..20.00 rows=1000 width=4) (actual time=88.294..3320.363 rows=1000000 loops=1)
  ->  CTE Scan on combined  (cost=0.00..100.00 rows=5000 width=0) (actual time=4668.966..12787.652 rows=1000000 loops=1)
Planning time: 0.071 ms
Execution time: 13853.714 ms

Though maybe that's because the original bucket count here was only 1024, while the Bad Plans above had original sizes greater than 2^16-1...

David J.
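For what it's worth, the general failure class being suspected here (rows going missing when a batched hash join grows its batch count mid-build without consistently repartitioning) can be sketched in a toy model outside the database. This is emphatically not the PostgreSQL implementation, just an illustration under assumed behavior; the helper names (`buggy_hash_join`, `batch_of`) are made up:

```python
# Toy model of a batched hash join that under-counts when the number of
# batches grows mid-build but rows already stored are not re-partitioned.
# Illustrative only -- not how PostgreSQL's nodeHash.c actually works.

def batch_of(key, nbatches):
    """Assign a key to a batch based on the *current* batch count."""
    return hash(key) % nbatches

def buggy_hash_join(build_rows, probe_rows, initial_nbatches=1, grow_at=None):
    nbatches = initial_nbatches
    batches = [dict() for _ in range(nbatches)]
    for i, key in enumerate(build_rows):
        if grow_at is not None and i == grow_at:
            # Double the batch count partway through the build...
            nbatches *= 2
            batches.extend(dict() for _ in range(nbatches // 2))
            # BUG: rows already placed are NOT re-hashed into the new layout.
        b = batch_of(key, nbatches)
        batches[b][key] = batches[b].get(key, 0) + 1
    # Probe phase uses the *final* batch count, so probes for keys stored
    # under the old layout can look in the wrong batch and find nothing.
    return sum(batches[batch_of(key, nbatches)].get(key, 0)
               for key in probe_rows)

keys = list(range(1000))
print(buggy_hash_join(keys, keys))            # no growth: 1000 (correct)
print(buggy_hash_join(keys, keys, 1, 500))    # growth mid-build: fewer rows
```

The point of the sketch is only that a join which silently drops matches after a mid-build resize produces exactly the "too few rows" symptom, with the loss depending on when the resize happens, which matches how the result here changes with bucket/batch counts.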