7.4 vs 7.3 ( hash join issue ) - Mailing list pgsql-performance
From | Gaetano Mendola |
---|---|
Subject | 7.4 vs 7.3 ( hash join issue ) |
Date | |
Msg-id | 4150B096.1050904@bigfoot.com Whole thread Raw |
Responses |
Re: 7.4 vs 7.3 ( hash join issue )
|
List | pgsql-performance |
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi all, I'm having performance degradation with a view upgrading from 7.3 to 7.4, the view is a not so complex, one of his field is the result from a function. If I remove the function ( or I use a void function ) the 7.4 out perform the 7.3: On 7.4 I get: xxxxx=# explain analyze select * from v_ivan_2; ~ QUERY PLAN - ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ ~ Hash Left Join (cost=7028.36..16780.89 rows=65613 width=288) (actual time=2059.923..9340.043 rows=79815 loops=1) ~ Hash Cond: ("outer".id_baa_loc = "inner".id_baa_loc) ~ -> Hash Left Join (cost=6350.62..15134.25 rows=65613 width=258) (actual time=1816.013..7245.085 rows=65609 loops=1) ~ Hash Cond: ("outer".id_localita = "inner".id_localita) ~ -> Hash Left Join (cost=6252.93..14786.74 rows=65613 width=247) (actual time=1777.072..6533.316 rows=65609 loops=1) ~ Hash Cond: ("outer".id_frazione = "inner".id_frazione) ~ -> Hash Left Join (cost=6226.61..14362.74 rows=65613 width=235) (actual time=1768.273..5837.104 rows=65609loops=1) ~ Hash Cond: ("outer".id_baa = "inner".id_baa) ~ -> Hash Left Join (cost=5092.24..12342.65 rows=65594 width=197) (actual time=1354.059..4562.398 rows=65592loops=1) ~ Hash Cond: ("outer".id_pratica = "inner".id_pratica) ~ -> Hash Left Join (cost=3597.52..10010.84 rows=65594 width=173) (actual time=785.775..3278.372rows=65592 loops=1) ~ Hash Cond: ("outer".id_pratica = "inner".id_pratica) ~ -> Hash Join (cost=1044.77..6605.97 rows=65594 width=149) (actual time=274.316..2070.788rows=65592 loops=1) ~ Hash Cond: ("outer".id_stato_pratica = "inner".id_stato_pratica) ~ -> Hash Join (cost=1043.72..5850.59 rows=65593 width=141) (actual time=273.478..1421.274rows=65592 loops=1) ~ Hash Cond: ("outer".id_pratica = "inner".id_pratica) ~ -> Seq Scan on t_pratica p (cost=0.00..3854.27 rows=65927 width=137) (actualtime=7.275..533.281 rows=65927 loops=1) ~ -> Hash (cost=1010.92..1010.92 rows=65592 width=8) (actual time=265.615..265.615rows=0 loops=1) ~ -> Seq Scan on t_baa_pratica bp (cost=0.00..1010.92 rows=65592 width=8)(actual time=0.209..164.761 rows=65592 loops=1) ~ -> Hash (cost=1.05..1.05 rows=5 width=22) (actual time=0.254..0.254 rows=0 loops=1) ~ -> Seq Scan on lookup_stato_pratica s (cost=0.00..1.05 rows=5 width=22) (actualtime=0.190..0.210 rows=5 loops=1) ~ -> Hash (cost=2519.82..2519.82 rows=65865 width=28) (actual time=511.104..511.104 rows=0loops=1) ~ -> Seq Scan on t_persona (cost=0.00..2519.82 rows=65865 width=28) (actual time=0.068..381.586rows=65864 loops=1) ~ Filter: (is_rich = true) ~ -> Hash (cost=1462.53..1462.53 rows=64356 width=28) (actual time=567.919..567.919 rows=0 loops=1) ~ -> Index Scan using idx_t_persona_is_inte on t_persona (cost=0.00..1462.53 rows=64356width=28) (actual time=12.953..432.697 rows=64356 loops=1) ~ Index Cond: (is_inte = true) ~ -> Hash (cost=1113.65..1113.65 rows=41444 width=46) (actual time=413.782..413.782 rows=0 loops=1) ~ -> Hash Join (cost=4.33..1113.65 rows=41444 width=46) (actual time=2.687..333.746 rows=41444loops=1) ~ Hash Cond: ("outer".id_comune = "inner".id_comune) ~ -> Seq Scan on t_baa_loc bl (cost=0.00..653.44 rows=41444 width=20) (actual time=0.422..94.803rows=41444 loops=1) ~ -> Hash (cost=4.22..4.22 rows=222 width=34) (actual time=1.735..1.735 rows=0 loops=1) ~ -> Seq Scan on t_comune co (cost=0.00..4.22 rows=222 width=34) (actual time=0.521..1.277rows=222 loops=1) ~ -> Hash (cost=25.59..25.59 rows=1459 width=20) (actual time=8.343..8.343 rows=0 loops=1) ~ -> Seq Scan on t_frazione f (cost=0.00..25.59 rows=1459 width=20) (actual time=0.554..5.603 rows=1459loops=1) ~ -> Hash (cost=94.94..94.94 rows=5494 width=19) (actual time=38.504..38.504 rows=0 loops=1) ~ -> Seq Scan on t_localita l (cost=0.00..94.94 rows=5494 width=19) (actual time=8.499..28.216 rows=5494loops=1) ~ -> Hash (cost=660.61..660.61 rows=34261 width=38) (actual time=198.663..198.663 rows=0 loops=1) ~ -> Seq Scan on t_affaccio af (cost=0.00..660.61 rows=34261 width=38) (actual time=5.875..133.336 rows=34261 loops=1) ~ Total runtime: 9445.263 ms (40 rows) On 7.3 I get: xxxxx=# explain analyze select * from v_ivan_2; ~ QUERY PLAN - ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ~ Hash Join (cost=5597.02..15593.91 rows=65610 width=354) (actual time=2169.37..13102.64 rows=79815 loops=1) ~ Hash Cond: ("outer".id_baa_loc = "inner".id_baa_loc) ~ -> Hash Join (cost=4919.28..13953.00 rows=65610 width=316) (actual time=1966.38..10568.69 rows=65609 loops=1) ~ Hash Cond: ("outer".id_localita = "inner".id_localita) ~ -> Hash Join (cost=4821.59..13596.30 rows=65610 width=297) (actual time=1934.29..9151.45 rows=65609 loops=1) ~ Hash Cond: ("outer".id_frazione = "inner".id_frazione) ~ -> Hash Join (cost=4795.27..13157.36 rows=65610 width=277) (actual time=1925.29..7795.71 rows=65609 loops=1) ~ Hash Cond: ("outer".id_baa = "inner".id_baa) ~ -> Hash Join (cost=3640.17..11149.38 rows=65592 width=223) (actual time=1375.66..5870.74 rows=65592loops=1) ~ Hash Cond: ("outer".id_pratica = "inner".id_pratica) ~ -> Hash Join (cost=3597.53..10237.66 rows=65592 width=195) (actual time=835.95..4332.46 rows=65592loops=1) ~ Hash Cond: ("outer".id_pratica = "inner".id_pratica) ~ -> Hash Join (cost=1044.78..6800.07 rows=65592 width=167) (actual time=307.55..2903.04rows=65592 loops=1) ~ Hash Cond: ("outer".id_pratica = "inner".id_pratica) ~ -> Merge Join (cost=1.06..4770.96 rows=65927 width=159) (actual time=1.41..1898.12rows=65927 loops=1) ~ Merge Cond: ("outer".id_stato_pratica = "inner".id_stato_pratica) ~ -> Index Scan using idx_t_pratica on t_pratica p (cost=0.00..4044.70 rows=65927width=137) (actual time=0.58..894.95 rows=65927 loops=1) ~ -> Sort (cost=1.06..1.06 rows=5 width=22) (actual time=0.78..58.49 rows=63528loops=1) ~ Sort Key: s.id_stato_pratica ~ -> Seq Scan on lookup_stato_pratica s (cost=0.00..1.05 rows=5 width=22)(actual time=0.11..0.13 rows=5 loops=1) ~ -> Hash (cost=1010.92..1010.92 rows=65592 width=8) (actual time=305.40..305.40rows=0 loops=1) ~ -> Seq Scan on t_baa_pratica bp (cost=0.00..1010.92 rows=65592 width=8) (actualtime=0.23..192.88 rows=65592 loops=1) ~ -> Hash (cost=2519.82..2519.82 rows=65864 width=28) (actual time=527.88..527.88 rows=0loops=1) ~ -> Seq Scan on t_persona (cost=0.00..2519.82 rows=65864 width=28) (actual time=0.07..394.51rows=65864 loops=1) ~ Filter: (is_rich = true) ~ -> Hash (cost=10.46..10.46 rows=64356 width=28) (actual time=539.27..539.27 rows=0 loops=1) ~ -> Index Scan using idx_t_persona_is_inte on t_persona (cost=0.00..10.46 rows=64356 width=28)(actual time=0.61..403.48 rows=64356 loops=1) ~ Index Cond: (is_inte = true) ~ -> Hash (cost=1134.38..1134.38 rows=41444 width=54) (actual time=549.25..549.25 rows=0 loops=1) ~ -> Hash Join (cost=4.33..1134.38 rows=41444 width=54) (actual time=2.19..470.20 rows=41444loops=1) ~ Hash Cond: ("outer".id_comune = "inner".id_comune) ~ -> Seq Scan on t_baa_loc bl (cost=0.00..653.44 rows=41444 width=20) (actual time=0.15..179.24rows=41444 loops=1) ~ -> Hash (cost=4.22..4.22 rows=222 width=34) (actual time=1.55..1.55 rows=0 loops=1) ~ -> Seq Scan on t_comune co (cost=0.00..4.22 rows=222 width=34) (actual time=0.22..1.08rows=222 loops=1) ~ -> Hash (cost=25.59..25.59 rows=1459 width=20) (actual time=8.37..8.37 rows=0 loops=1) ~ -> Seq Scan on t_frazione f (cost=0.00..25.59 rows=1459 width=20) (actual time=0.22..5.46 rows=1459loops=1) ~ -> Hash (cost=94.94..94.94 rows=5494 width=19) (actual time=31.46..31.46 rows=0 loops=1) ~ -> Seq Scan on t_localita l (cost=0.00..94.94 rows=5494 width=19) (actual time=0.22..20.41 rows=5494 loops=1) ~ -> Hash (cost=660.61..660.61 rows=34261 width=38) (actual time=199.96..199.96 rows=0 loops=1) ~ -> Seq Scan on t_affaccio af (cost=0.00..660.61 rows=34261 width=38) (actual time=0.21..130.67 rows=34261 loops=1) ~ Total runtime: 13190.70 msec (41 rows) As you can see the 7.3 do a index scan on the table t_pratica and the 7.4 perform a sequential scan, the plans however are very close to each other. So I identify the performance issue on the function call, indeed: 7.4: xxxxx=# explain analyze select sp_foo(id_pratica) from t_pratica; ~ QUERY PLAN - ------------------------------------------------------------------------------------------------------------------ ~ Seq Scan on t_pratica (cost=0.00..3887.23 rows=65927 width=4) (actual time=4.013..45240.015 rows=65927 loops=1) ~ Total runtime: 45499.123 ms (2 rows) 7.3: xxxxx=# explain analyze select sp_foo(id_pratica) from t_pratica; ~ QUERY PLAN - ---------------------------------------------------------------------------------------------------------------- ~ Seq Scan on t_pratica (cost=0.00..3854.27 rows=65927 width=4) (actual time=0.58..18446.99 rows=65927 loops=1) ~ Total runtime: 18534.41 msec (2 rows) This is the sp_foo: CREATE FUNCTION sp_foo (integer) RETURNS text ~ AS ' DECLARE ~ a_id_pratica ALIAS FOR $1; ~ my_parere TEXT; BEGIN ~ a_id_pratica := $1; ~ SELECT INTO my_parere le.nome ~ FROM t_evento e, ~ lookup_tipo_evento le ~ WHERE e.id_tipo_evento = le.id_tipo_evento AND ~ e.id_pratica = a_id_pratica AND ~ e.id_tipo_evento in (5,6,7,8 ) ~ ORDER by e.id_evento desc ~ LIMIT 1; ~ RETURN my_parere; END; ' LANGUAGE plpgsql; Preparing a statement this is the plan used by 7.4: xxxxx=# explain analyze execute foo_body( 5 ); ~ QUERY PLAN - -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ~ Limit (cost=10.30..10.30 rows=1 width=24) (actual time=0.538..0.538 rows=0 loops=1) ~ -> Sort (cost=10.30..10.30 rows=1 width=24) (actual time=0.534..0.534 rows=0 loops=1) ~ Sort Key: e.id_evento ~ -> Hash Join (cost=9.11..10.30 rows=1 width=24) (actual time=0.512..0.512 rows=0 loops=1) ~ Hash Cond: ("outer".id_tipo_evento = "inner".id_tipo_evento) ~ -> Seq Scan on lookup_tipo_evento le (cost=0.00..1.16 rows=16 width=32) (actual time=0.010..0.041 rows=16loops=1) ~ -> Hash (cost=9.11..9.11 rows=1 width=16) (actual time=0.144..0.144 rows=0 loops=1) ~ -> Index Scan using t_evento_id_pratica_key on t_evento e (cost=0.00..9.11 rows=1 width=16) (actualtime=0.140..0.140 rows=0 loops=1) ~ Index Cond: (id_pratica = $1) ~ Filter: (((id_tipo_evento)::text = '5'::text) OR ((id_tipo_evento)::text = '6'::text) OR ((id_tipo_evento)::text= '7'::text) OR ((id_tipo_evento)::text = '8'::text)) ~ Total runtime: 0.824 ms (11 rows) The table t_pratica have 65927 rows so 0.824 ms * 65927 is almost the total time execution for each t_pratica row ~ 45000 ms Unfortunately I can not see the plan used by the 7.3 engine due the lack of explain execute, however I did an explain analyze on the select: xxxxx=# explain analyze SELECT le.nome xxxxx-# FROM t_evento e,lookup_tipo_evento le xxxxx-# WHERE e.id_tipo_evento = le.id_tipo_evento xxxxx-# AND e.id_pratica = 5 xxxxx-# AND e.id_tipo_evento in (5,6,7,8 ) xxxxx-# ORDER by e.id_evento desc xxxxx-# LIMIT 1; ~ QUERY PLAN - -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ~ Limit (cost=10.27..10.27 rows=1 width=48) (actual time=0.19..0.19 rows=0 loops=1) ~ -> Sort (cost=10.27..10.27 rows=1 width=48) (actual time=0.18..0.18 rows=0 loops=1) ~ Sort Key: e.id_evento ~ -> Merge Join (cost=10.24..10.27 rows=1 width=48) (actual time=0.09..0.09 rows=0 loops=1) ~ Merge Cond: ("outer".id_tipo_evento = "inner".id_tipo_evento) ~ -> Sort (cost=9.02..9.02 rows=1 width=16) (actual time=0.09..0.09 rows=0 loops=1) ~ Sort Key: e.id_tipo_evento ~ -> Index Scan using t_evento_id_pratica_key on t_evento e (cost=0.00..9.02 rows=1 width=16) (actualtime=0.06..0.06 rows=0 loops=1) ~ Index Cond: (id_pratica = 5) ~ Filter: (((id_tipo_evento)::text = '5'::text) OR ((id_tipo_evento)::text = '6'::text) OR ((id_tipo_evento)::text= '7'::text) OR ((id_tipo_evento)::text = '8'::text)) ~ -> Sort (cost=1.22..1.23 rows=16 width=32) (never executed) ~ Sort Key: le.id_tipo_evento ~ -> Seq Scan on lookup_tipo_evento le (cost=0.00..1.16 rows=16 width=32) (never executed) ~ Total runtime: 0.31 msec (14 rows) Disabling the hashjoin on the 7.4 I got best performance that 7.3: xxxxx=# set enable_hashjoin = off; SET xxxxx=# explain analyze select sp_get_ultimo_parere(id_pratica) from t_pratica; ~ QUERY PLAN - ------------------------------------------------------------------------------------------------------------------- ~ Seq Scan on t_pratica (cost=0.00..3887.23 rows=65927 width=4) (actual time=12.384..12396.136 rows=65927 loops=1) ~ Total runtime: 12485.548 ms (2 rows) Now my question is why the 7.4 choose the hash join ? :-( I can provide further details if you ask BTW with the hash_join = off the 7.4 choose the same 7.3 plan for this function body. Of course both engines are running on the same machine with the same settings. Regards Gaetano Mendola -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (MingW32) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFBULCU7UpzwH2SGd4RAt2ZAKC9FjAKiljRqgaZSZa+p/7N65Cl7ACePWBV TaR2VH1kDSBS7b+kNK4deFo= =X+th -----END PGP SIGNATURE-----
pgsql-performance by date: