Re: BUG #16241: Degraded hash join performance - Mailing list pgsql-bugs
From | Thomas Butz |
---|---|
Subject | Re: BUG #16241: Degraded hash join performance |
Date | |
Msg-id | 87490169.275301.1580827750088.JavaMail.zimbra@optitool.de Whole thread Raw |
In response to | Re: BUG #16241: Degraded hash join performance (Andres Freund <andres@anarazel.de>) |
Responses |
Re: BUG #16241: Degraded hash join performance
|
List | pgsql-bugs |
I was able to reduce it to a minimal example: SELECT localized_streetname FROM planet_osm_line WHERE "way" && ST_SetSRID('BOX3D(1222380.956336539 6339381.37785938,1233387.888409604 6350388.309932444)'::box3d, 3857) Your diagnosis with osml10n_get_streetname_from_tags was spot on. I'm going to report it to the author of the osml10n extension. Thank you for your help! PS: I've created the execution plans using EXPLAIN (ANALYZE, VERBOSE, COSTS, BUFFERS, TIMING, SUMMARY, FORMAT json) but iwasn't aware that i might have to check the "Output" section of a node to find potential performance problems. How do youfind such slow outputs without using perf? > Von: "Andres Freund" <andres@anarazel.de> > An: "Thomas Butz" <tbutz@optitool.de>, "Tom Lane" <tgl@sss.pgh.pa.us> > CC: "pgsql-bugs" <pgsql-bugs@lists.postgresql.org> > Gesendet: Dienstag, 4. Februar 2020 14:44:08 > Betreff: Re: BUG #16241: Degraded hash join performance > Hi, > > > On 2020-02-04 10:02:45 +0100, Thomas Butz wrote: >> I've executed the query twice and attached the generated reports. > > Interesting! The no-children one clearly shows that a lot of the the > time is spent evaluating regular expressions (there's other regex > functions in the profile too): > > 23.36% postgres postgres [.] subcolor > > and that this is inside a stack of plpgsql functions: > > | | |--18.95%--exec_stmt_block > | | | exec_stmt > | | | plpgsql_exec_function > | | | plpgsql_call_handler > | | | ExecInterpExpr > | | | ExecEvalExpr (inlined) > | | | exec_eval_simple_expr > | | | (inlined) > [...] > | | | plpgsql_exec_function > | | | plpgsql_call_handler > | | | ExecInterpExpr > | | | ExecEvalExpr (inlined) > | | | exec_eval_simple_expr (inlined) > [...] > | | | plpgsql_exec_function > | | | plpgsql_call_handler > | | | ExecInterpExpr > | | | ExecEvalExpr (inlined) > [...] > > > I'm not aware of any relevant regular expression evaluation changes > between 11 and 12. Tom, does this trigger anything? > > Thomas, could you provide a EXPLAIN (ANALYZE, VERBOSE) for both > versions? Clearly the hash join is where the time is spent: > > -> Hash Join (cost=1.000..6436.030 rows=44 width=517) (actual > time=5.532..3383.213 rows=2779 loops=1) > Buffers: shared hit=9077 > -> Index Scan using planet_osm_hstore_line_way_idx on planet_osm_hstore_line > planet_osm_hstore_line (cost=0.550..6419.380 rows=491 width=352) (actual > time=0.423..19.147 rows=2779 loops=1) > Index Cond: (planet_osm_hstore_line.way && > '01030000A0110F00000100000005000000AF78D2F4ECA632411CD92E58CD2E58410000000000000000AF78D2F4ECA63241E4EED5138D3958410000000000000000D3CF6EE3EBD13241E4EED5138D3958410000000000000000D3CF6EE3EBD132411CD92E58CD2E58410000000000000000AF78D2F4ECA632411CD92E58CD2E58410000000000000000'::geometry) > Filter: ((((planet_osm_hstore_line.tags -> 'name'::text) IS NOT NULL) OR > ((planet_osm_hstore_line.tags -> 'oneway'::text) = ANY ('{yes,-1}'::text[])) OR > ((planet_osm_hstore_line.tags -> 'junction'::text) = 'roundabout'::text)) AND > ((planet_osm_hstore_line.tags -> 'highway'::text) = ANY > ('{motorway,motorway_link,trunk,trunk_link,primary,primary_link,secondary,secondary_link,tertiary,tertiary_link,residential,unclassified,road,service,pedestrian,raceway,living_street,construction}'::text[]))) > Buffers: shared hit=9077 > -> Hash (cost=0.230..0.230 rows=18 width=32) (actual time=0.029..0.030 rows=18 > loops=1) > -> Values Scan (cost=0.000..0.230 rows=18 width=32) (actual time=0.003..0.017 > rows=18 loops=1) > > but I don't think it's the hash condition itself - there's no plpgsql > functions necessary for the join as far as I can tell. A VERBOSE EXPLAIN > might help up with knowing what the function being called actually is. > > It seems quite possible that all that time is spent inside plpgsql > functions that are part of postgis. And that the postgis version > difference between 2.5 and 3.0 is the relevant part here, not the > postgres version difference. Or it's just in the application code: > > The sort step shows among others the following column to be sorted on: > osml10n_get_streetname_from_tags(planet_osm_hstore_line.tags, true, false, ' - > '::text, 'de'::text, planet_osm_hstore_line.way, NULL::text)) DESC > which is actually likely computed as the output of the HashJoin > node. And is the only obvious candidate for a plpgsql function > potentially taking a lot of the time. > > Is that function defined exactly the same between the version? Is the > underlying dataset roughly the same size? I suggest running the query > without it, and checking how long it takes. > > Regards, > > Andres -- Thomas Butz
pgsql-bugs by date: