Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2 - Mailing list pgsql-performance
From | Mladen Gogala |
---|---|
Subject | Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2 |
Date | |
Msg-id | 4D35A30D.4030404@vmsinfo.com Whole thread Raw |
In response to | Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2 (Achilleas Mantzios <achill@matrix.gatewaynet.com>) |
Responses |
Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
|
List | pgsql-performance |
Achilleas Mantzios wrote: > Thanx, > > Στις Monday 17 January 2011 18:52:27 ο/η Ing. Marcos Ortiz Valmaseda έγραψε: > >> Well, on the Release Notes on the PostgreSQL-8.4 Documentation, the developers recommend to use NOT EXISTS >> instead NOT IN, because the first clause has a better performance. So, you can use it on that way. >> >> > You mean this? > (from 8.4 changes) > "Create explicit concepts of semi-joins and anti-joins (Tom) > This work formalizes our previous ad-hoc treatment of IN (SELECT > ...) clauses, and extends it to EXISTS and NOT EXISTS clauses. It > should result in significantly better planning of EXISTS and NOT > EXISTS queries. In general, logically equivalent IN and EXISTS > clauses should now have similar performance, whereas previously IN > often won." > > I haven't found any other recent reference to this issue. And this is far from what you suggest. > Here the entry talks about "similar" performance. > > Also a similar issue was hot back in 7.4 days : > "IN / NOT IN subqueries are now much more efficient > In previous releases, IN/NOT IN subqueries were joined to the > upper query by sequentially scanning the subquery looking for a > match. The 7.4 code uses the same sophisticated techniques used > by ordinary joins and so is much faster. An IN will now usually > be as fast as or faster than an equivalent EXISTS subquery; this > reverses the conventional wisdom that applied to previous > releases." > > >> Other questions? >> - Do you have a partial index on marinerstates.marinerid where this condition is accomplished? >> > No, but i just tried it (on state='Active') with no impact. > >> - Do you have a index on mariner.id? >> > Yes, It is the primary key. > >> - Can you provide a explain of these queries on the PostgreSQL-9.0 machines? >> > Sure, first i'll post the table definitions and then some stats and then the epxlain analyze(s) > > mariner > ===== > id | integer | not null default nextval(('public.mariner_id_seq'::text)::regclass) > givenname | character varying(200) | > midname | character varying(100) | > surname | character varying(200) | not null > ... > Indexes: > "mariner_pkey" PRIMARY KEY, btree (id) > "mariner_smauid" UNIQUE, btree (smauid) > "mariner_username_key" UNIQUE, btree (username) > "mariner_nationalityid" btree (nationalityid) > "mariner_parentid" btree (parentid) > "mariner_surname" btree (surname) > > marinerstates > ======== > id | integer | not null default nextval(('public.marinerstates_id_seq'::text)::regclass) > marinerid | integer | not null > state | character varying(20) | not null > vslid | integer | > leave_period_days | integer | > comment | text | > starttime | timestamp with time zone | not null > endtime | timestamp with time zone | > trid | integer | > sal_bw | real | not null default 0.0 > sal_ot | real | not null default 0.0 > sal_lp | real | not null default 0.0 > sal_misc | real | not null default 0.0 > rankid | integer | > system_vslid | integer | > startport | text | > endport | text | > ..... > Indexes: > "marinerstates_pkey" PRIMARY KEY, btree (id) > "marinerstates_mariner_cur_state" UNIQUE, btree (marinerid) WHERE endtime IS NULL > "marinerstates_system_vslid" UNIQUE, btree (marinerid, system_vslid) > "marinerstates__system_vslid" btree (system_vslid) > "marinerstates_cur_mariners_states" btree (endtime) WHERE endtime IS NULL > "marinerstates_mariner_past_state" btree (marinerid, starttime, endtime) WHERE endtime IS NOT NULL > "marinerstates_marinerid" btree (marinerid) > "marinerstates_marinerid_starttime" btree (marinerid, starttime) > "marinerstates_rankid" btree (rankid) > "marinerstates_rankid_cur_mariners" btree (rankid) WHERE endtime IS NULL > "marinerstates_rankid_past_state" btree (rankid, starttime, endtime) WHERE endtime IS NOT NULL > "marinerstates_state" btree (state) > "marinerstates_state_cur_mariners" btree (state) WHERE endtime IS NULL > "marinerstates_state_past_state" btree (state, starttime, endtime) WHERE endtime IS NOT NULL > "marinerstates_vslid" btree (vslid) > "marinerstates_vslid_cur_mariners" btree (vslid) WHERE endtime IS NULL > "marinerstates_vslid_past_state" btree (vslid, starttime, endtime) WHERE endtime IS NOT NULL > > vessels > ===== > name | character varying(200) | not null > id | integer | not null default nextval(('public.vessels_id_seq'::text)::regclass) > ... > Indexes: > "vessels_pkey" PRIMARY KEY, btree (id) > "vessels_name_key" UNIQUE, btree (name) > "idx_name" btree (name) > "vessels_flag" btree (flag) > "vessels_groupno" btree (groupno) > "vessels_vslstatus_idx" btree (vslstatus) > > dynacom=# SELECT count(*) from mariner; > count > ------- > 14447 > > dynacom=# SELECT count(*) from marinerstates; > count > ------- > 51013 > > dynacom=# SELECT avg(marqry.cnt),stddev(marqry.cnt) FROM (SELECT m.id,count(ms.id) as cnt from mariner m, marinerstatesms WHERE m.id=ms.marinerid group by m.id) AS marqry; > avg | stddev > --------------------+-------------------- > 3.5665944207508914 | 4.4416879361829170 > > (vessels do not play any impact in the query, so i'll leave them out) > > Slow plan in 9.0.2 : > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Unique (cost=11525.09..11571.55 rows=3717 width=23) (actual time=10462.561..10462.937 rows=603 loops=1) > -> Sort (cost=11525.09..11534.38 rows=3717 width=23) (actual time=10462.560..10462.664 rows=603 loops=1) > Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character varying)),m.id > Sort Method: quicksort Memory: 71kB > -> Hash Join (cost=8281.98..11304.67 rows=3717 width=23) (actual time=10425.261..10461.621 rows=603 loops=1) > Hash Cond: (ms.marinerid = m.id) > -> Hash Join (cost=20.12..2963.83 rows=3717 width=4) (actual time=0.228..34.993 rows=2625 loops=1) > Hash Cond: (ms.vslid = vsl.id) > -> Seq Scan on marinerstates ms (cost=0.00..2889.32 rows=4590 width=8) (actual time=0.011..33.494rows=2625 loops=1) > Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND((COALESCE(endtime, now()))::date >= '2006-07-15'::date)) > -> Hash (cost=16.72..16.72 rows=272 width=4) (actual time=0.207..0.207 rows=272 loops=1) > Buckets: 1024 Batches: 1 Memory Usage: 10kB > -> Seq Scan on vessels vsl (cost=0.00..16.72 rows=272 width=4) (actual time=0.004..0.118 rows=272loops=1) > -> Hash (cost=8172.57..8172.57 rows=7143 width=23) (actual time=10424.994..10424.994 rows=12832 loops=1) > Buckets: 1024 Batches: 1 Memory Usage: 702kB > -> Seq Scan on mariner m (cost=7614.86..8172.57 rows=7143 width=23) (actual time=10409.498..10419.971rows=12832 loops=1) > Filter: ((NOT (hashed SubPlan 1)) AND ((marinertype)::text = 'Mariner'::text)) > SubPlan 1 > -> Unique (cost=2768.00..7614.86 rows=1 width=4) (actual time=87.495..10408.446 rows=1454loops=1) > -> Nested Loop (cost=2768.00..7614.86 rows=1 width=4) (actual time=87.493..10407.517rows=1835 loops=1) > Join Filter: (msold.marinerid = mold.id) > -> Index Scan using mariner_pkey on mariner mold (cost=0.00..1728.60 rows=14286width=4) (actual time=0.007..13.931 rows=14286 loops=1) > Filter: ((marinertype)::text = 'Mariner'::text) > -> Materialize (cost=2768.00..5671.97 rows=1 width=8) (actual time=0.003..0.330rows=1876 loops=14286) > -> Nested Loop (cost=2768.00..5671.96 rows=1 width=8) (actual time=39.723..85.401rows=1876 loops=1) > -> Hash Semi Join (cost=2768.00..5671.67 rows=1 width=12) (actualtime=39.708..81.501 rows=1876 loops=1) > Hash Cond: (msold.marinerid = msold2.marinerid) > Join Filter: ((msold2.id <> msold.id) AND (msold2.starttime< msold.starttime) AND ((msold.starttime - msold2.endtime) <= '1 year 6 mons'::interval)) > -> Seq Scan on marinerstates msold (cost=0.00..2889.32 rows=4590width=20) (actual time=0.003..33.952 rows=2625 loops=1) > Filter: (((state)::text = 'Active'::text) AND ((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date)) > -> Hash (cost=2251.66..2251.66 rows=41307 width=24) (actualtime=39.613..39.613 rows=41250 loops=1) > Buckets: 8192 Batches: 1 Memory Usage: 2246kB > -> Seq Scan on marinerstates msold2 (cost=0.00..2251.66rows=41307 width=24) (actual time=0.002..24.882 rows=41250 loops=1) > Filter: ((state)::text = 'Active'::text) > -> Index Scan using vessels_pkey on vessels vslold (cost=0.00..0.28rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1876) > Index Cond: (vslold.id = msold.vslid) > Total runtime: 10463.619 ms > (37 rows) > > Fast plan in 8.3.13 : > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Unique (cost=633677.56..633700.48 rows=1834 width=23) (actual time=543.684..551.003 rows=603 loops=1) > -> Sort (cost=633677.56..633682.14 rows=1834 width=23) (actual time=543.676..546.070 rows=603 loops=1) > Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character varying)),m.id > Sort Method: quicksort Memory: 53kB > -> Hash Join (cost=630601.65..633578.15 rows=1834 width=23) (actual time=439.969..540.573 rows=603 loops=1) > Hash Cond: (ms.vslid = vsl.id) > -> Hash Join (cost=630580.33..633530.01 rows=2261 width=27) (actual time=437.459..532.847 rows=603 loops=1) > Hash Cond: (ms.marinerid = m.id) > -> Seq Scan on marinerstates ms (cost=0.00..2875.32 rows=4599 width=8) (actual time=0.017..80.153rows=2625 loops=1) > Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND((COALESCE(endtime, now()))::date >= '2006-07-15'::date)) > -> Hash (cost=630491.54..630491.54 rows=7103 width=23) (actual time=437.307..437.307 rows=12832loops=1) > -> Index Scan using mariner_pkey on mariner m (cost=628776.89..630491.54 rows=7103 width=23)(actual time=311.023..380.168 rows=12832 loops=1) > Filter: ((NOT (hashed subplan)) AND ((marinertype)::text = 'Mariner'::text)) > SubPlan > -> Unique (cost=0.00..628772.30 rows=1834 width=4) (actual time=0.129..303.981 rows=1454loops=1) > -> Nested Loop (cost=0.00..628767.72 rows=1834 width=4) (actual time=0.120..289.961rows=1835 loops=1) > -> Nested Loop (cost=0.00..627027.98 rows=1865 width=4) (actual time=0.099..237.128rows=1876 loops=1) > -> Index Scan using marinerstates_marinerid on marinerstates msold (cost=0.00..626316.07 rows=2299 width=8) (actual time=0.079..186.150 rows=1876 loops=1) > Filter: (((state)::text = 'Active'::text) AND ((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date) AND (subplan)) > SubPlan > -> Bitmap Heap Scan on marinerstates msold2 (cost=4.28..12.11rows=1 width=0) (actual time=0.020..0.020 rows=1 loops=2625) > Recheck Cond: ((marinerid = $0) AND (starttime < $2)) > Filter: ((id <> $1) AND ((state)::text = 'Active'::text)AND (($2 - endtime) <= '1 year 6 mons'::interval)) > -> Bitmap Index Scan on marinerstates_marinerid_starttime (cost=0.00..4.28 rows=2 width=0) (actual time=0.009..0.009 rows=6 loops=2625) > Index Cond: ((marinerid = $0) AND (starttime <$2)) > -> Index Scan using vessels_pkey on vessels vslold (cost=0.00..0.30rows=1 width=4) (actual time=0.006..0.010 rows=1 loops=1876) > Index Cond: (vslold.id = msold.vslid) > -> Index Scan using mariner_pkey on mariner mold (cost=0.00..0.92 rows=1width=4) (actual time=0.007..0.012 rows=1 loops=1876) > Index Cond: (mold.id = msold.marinerid) > Filter: ((mold.marinertype)::text = 'Mariner'::text) > -> Hash (cost=17.81..17.81 rows=281 width=4) (actual time=2.491..2.491 rows=273 loops=1) > -> Seq Scan on vessels vsl (cost=0.00..17.81 rows=281 width=4) (actual time=0.012..1.306 rows=273loops=1) > Total runtime: 553.601 ms > (33 rows) > > Is there any other data i could post (pg_stat,...) that would help? > > thanx a lot. > > >> Regards >> >> >> Ing. Marcos Luís Ortíz Valmaseda >> Linux User # 418229 && PostgreSQL DBA >> Centro de Tecnologías Gestión de Datos (DATEC) >> http://postgresql.uci.cu >> http://www.postgresql.org >> http://it.toolbox.com/blogs/sql-apprentice >> >> > > > > -- > Achilleas Mantzios > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > Achilleas, here is the slow part from 9.02: -> Hash Semi Join (cost=2768.00..5671.67 rows=1 width=12) (actualtime=39.708..81.501 rows=1876 loops=1) Hash Cond: (msold.marinerid = msold2.marinerid) Join Filter: ((msold2.id <> msold.id) AND (msold2.starttime <msold.starttime) AND ((msold.starttime - msold2.endtime) <= '1 year 6 mons'::interval)) -> Seq Scan on marinerstates msold (cost=0.00..2889.32 rows=4590width=20) (actual time=0.003..33.952 rows=2625 loops=1) Filter: (((state)::text = 'Active'::text) AND ((starttime)::date<= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date)) -> Hash (cost=2251.66..2251.66 rows=41307 width=24) (actualtime=39.613..39.613 rows=41250 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 2246kB -> Seq Scan on marinerstates msold2 (cost=0.00..2251.66rows=41307 width=24) (actual time=0.002..24.882 The same part from 8.3.3 looks like this: Seq Scan on marinerstates ms (cost=0.00..2875.32 rows=4599 width=8) (actual time=0.017..80.153 rows=2625 loops=1) Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND ((COALESCE(endtime,now()))::date >= '2006-07-15'::date)) -> Hash (cost=630491.54..630491.54 rows=7103 width=23) (actual time=437.307..437.307 rows=12832 loops=1) -> Index Scan using mariner_pkey on mariner m (cost=628776.89..630491.54 rows=7103 width=23)(actual time=311.023..380.168 rows=12832 loops=1) Filter: ((NOT (hashed subplan)) AND ((marinertype)::text = 'Mariner'::text)) SubPlan -> Unique (cost=0.00..628772.30 rows=1834 width=4) (actual time=0.129..303.981 rows=1454loops=1) -> Nested Loop (cost=0.00..628767.72 rows=1834 width=4) (actual time=0.120..289.961rows=1835 loops=1) This leads me to the conclusion that the queries differ significantly. 8.3.3 mentions NOT hashed plan, I don't see it in 9.02 and the filtering conditions look differently. Are you sure that the plans are from the same query? -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 www.vmsinfo.com
pgsql-performance by date: