Re: [BUGS] BUG #14729: Between operator is slow when same value used for low and high margin - Mailing list pgsql-bugs
From | Pavel Tavoda
---|---
Subject | Re: [BUGS] BUG #14729: Between operator is slow when same value used for low and high margin
Date |
Msg-id | CAMRASb+dXq9+sw3dVLjSp=080OC0v3pS0mMO+ZEoMqc5W_OfGA@mail.gmail.com
In response to | Re: [BUGS] BUG #14729: Between operator is slow when same value used for low and high margin (Tom Lane <tgl@sss.pgh.pa.us>)
List | pgsql-bugs
Hello, even after 'VACUUM FULL VERBOSE ANALYZE' of both tables I still get very strange results:
VACUUM FULL VERBOSE ANALYZE contractportfolio;
VACUUM FULL VERBOSE ANALYZE contract;
EXPLAIN ANALYZE SELECT count(*) FROM contractportfolio cp JOIN contract co ON cp.contract = co.id WHERE validfor = '2017-05-31';
Aggregate (cost=87255.63..87255.64 rows=1 width=0) (actual time=488.634..488.635 rows=1 loops=1)
-> Hash Join (cost=424.96..87208.66 rows=18789 width=0) (actual time=6.160..482.780 rows=20426 loops=1)
Hash Cond: (cp.contract = co.id)
-> Seq Scan on contractportfolio cp (cost=0.00..86384.44 rows=18789 width=8) (actual time=0.025..456.431 rows=20426 loops=1)
Filter: (validfor = '2017-05-31'::date)
Rows Removed by Filter: 2966971
-> Hash (cost=334.98..334.98 rows=7198 width=8) (actual time=6.125..6.125 rows=7198 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 197kB
-> Seq Scan on contract co (cost=0.00..334.98 rows=7198 width=8) (actual time=0.003..2.931 rows=7198 loops=1)
Planning time: 0.329 ms
Execution time: 488.674 ms
EXPLAIN ANALYZE SELECT count(*) FROM contractportfolio cp JOIN contract co ON cp.contract = co.id WHERE validfor between '2017-05-30' AND '2017-05-31';
Aggregate (cost=94304.35..94304.36 rows=1 width=0) (actual time=528.667..528.667 rows=1 loops=1)
-> Hash Join (cost=424.96..94301.58 rows=1106 width=0) (actual time=6.198..517.042 rows=40844 loops=1)
Hash Cond: (cp.contract = co.id)
-> Seq Scan on contractportfolio cp (cost=0.00..93853.12 rows=1106 width=8) (actual time=0.022..474.856 rows=40844 loops=1)
Filter: ((validfor >= '2017-05-30'::date) AND (validfor <= '2017-05-31'::date))
Rows Removed by Filter: 2946553
-> Hash (cost=334.98..334.98 rows=7198 width=8) (actual time=6.166..6.166 rows=7198 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 197kB
-> Seq Scan on contract co (cost=0.00..334.98 rows=7198 width=8) (actual time=0.004..2.960 rows=7198 loops=1)
Planning time: 0.340 ms
Execution time: 528.708 ms
EXPLAIN ANALYZE SELECT count(*) FROM contractportfolio cp JOIN contract co ON cp.contract = co.id WHERE validfor between '2017-05-31' AND '2017-05-31'
Aggregate (cost=94278.08..94278.09 rows=1 width=0) (actual time=93832.610..93832.610 rows=1 loops=1)
-> Nested Loop (cost=0.00..94278.08 rows=1 width=0) (actual time=2.042..93822.069 rows=20426 loops=1)
Join Filter: (cp.contract = co.id)
Rows Removed by Join Filter: 147005922
-> Seq Scan on contractportfolio cp (cost=0.00..93853.12 rows=1 width=8) (actual time=0.023..487.343 rows=20426 loops=1)
Filter: ((validfor >= '2017-05-31'::date) AND (validfor <= '2017-05-31'::date))
Rows Removed by Filter: 2966971
-> Seq Scan on contract co (cost=0.00..334.98 rows=7198 width=8) (actual time=0.001..2.165 rows=7198 loops=20426)
Planning time: 0.333 ms
Execution time: 93832.656 ms
Same problem... there is something wrong with the optimization.
The problem appears whenever both dates are the same, regardless of which date is used. Same result for 2017-05-12.
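For reference, a quick way to see what the planner has to work with is the column's entry in pg_stats. This is only a minimal diagnostic sketch; the table and column names (contractportfolio.validfor) are taken from the plans above:

-- Sketch: check whether '2017-05-31' appears in the column's
-- most-common-values list and how many distinct values are estimated.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'contractportfolio'
  AND attname = 'validfor';

If '2017-05-31' is not in most_common_vals, the planner has to fall back on histogram-based range estimation for the BETWEEN form, which appears to be the narrow-range case Tom describes below.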
Regards
/Pavel
On Jul 4, 2017 07:15, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
Pavel Tavoda <pavel.tavoda@gmail.com> writes:
> -> Seq Scan on contractportfolio cp (cost=0.00..109171.90 rows=1 width=8) (actual time=0.009..486.918 rows=20426 loops=1)
> Filter: ((validfor >= '2017-05-31'::date) AND (validfor <= '2017-05-31'::date))
> STUNNING!!!!
Yup, it's certainly that factor-of-20K rowcount misestimate that is
killing you here. Given that this estimate isn't too bad:
> -> Seq Scan on contractportfolio cp (cost=0.00..101496.91 rows=20197 width=8) (actual time=0.009..463.063 rows=20426 loops=1)
> Filter: (validfor = '2017-05-31'::date)
I do not think your problem is one of out-of-date statistics. Rather,
the issue is just that we're bad at narrow range estimates. I did
some work on that today[1] but it won't show up in a released PG version
before next year. In the meantime, I believe that the existing code would
arrive at a plausible answer if the value being checked were present in
the column's pg_stats.most_common_vals list. Maybe you could fix this
by increasing the statistics target for the column or the whole table
(and re-analyzing it, of course).
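[A minimal sketch of that suggestion; the statistics target of 1000 is only an example value, and the table/column names come from the plans above.]

-- Raise the per-column statistics target so frequent dates are more likely
-- to land in most_common_vals, then re-analyze to rebuild the statistics.
-- 1000 is just an example; the default is 100 and the maximum is 10000.
ALTER TABLE contractportfolio ALTER COLUMN validfor SET STATISTICS 1000;
ANALYZE contractportfolio;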
regards, tom lane
[1] https://www.postgresql.org/message-id/12232.1499140410%40sss.pgh.pa.us