Re: Performance improvement for joins where outer side is unique - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: Performance improvement for joins where outer side is unique |
Date | |
Msg-id | 20150324.211120.120654019.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw |
In response to | Re: Performance improvement for joins where outer side is unique (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Performance improvement for joins where outer side is unique
|
List | pgsql-hackers |
Hi, thanks for the new patch. I made an additional shrink from your last one. Do you have a look on the attached? At Sun, 22 Mar 2015 19:42:21 +1300, David Rowley <dgrowleyml@gmail.com> wrote in <CAApHDvrKwMmTwkXfn4uazYZA9jQL1c7UwBjBtuwFR69rqLVKfA@mail.gmail.com> > On 20 March 2015 at 21:11, David Rowley <dgrowleyml@gmail.com> wrote: > > > > I can continue working on your patch if you like? Or are you planning to > > go further with it? It's fine that you continue to work on this. # Sorry for the hardly baked patch which had left many things alone:( > I've been working on this more over the weekend and I've re-factored things > to allow LEFT JOINs to be properly marked as unique. > I've also made changes to re-add support for detecting the uniqueness of > sub-queries. I don't see the point of calling mark_unique_joins for every iteration on join_info_list in remove_useless_joins. The loop already iteraltes on whole the join_info_list so mark_unique_join as an individual function is needless. Finally, simply marking uniqueness of join in join_is_removable seems to be enough, inhibiting early bailing out by the checks on attribute usage and placeholder let it work as expected. Reducing changes to this extent, I can easily see what is added to planning computations. It consists of mainly two factors. - Unique-join chekcs for every candidate inner joins in add_paths_to_joinrel. - Uniqueness check of mergejoinable clause in join-removability check for every left join, some of which would be skippedby other checks before. > Also, I've added modified the costing for hash and nested loop joins to > reduce the cost for unique inner joins to cost the join the same as it does > for SEMI joins. This has tipped the scales on a few plans in the regression > tests. I've forgotten it, but quite important. > Also, please see attached unijoin_analysis.patch. This just adds some code > which spouts out notices when join nodes are initialised which states if > the join is unique or not. Running the regression tests with this patch in > places gives: > > Unique Inner: Yes == 753 hits > Unique Inner: No == 1430 hits > > So it seems we can increase the speed of about 1 third of joins by about > 10%. > A quick scan of the "No"s seems to show quite a few cases which do not look > that real world like. e.g cartesian join. I don't have an idea how many queries in the reality hit this but I suppose it's not a few. > It would be great if someone could run some PostgreSQL application with > these 2 patches applied, and then grep the logs for the Unique Inner > results... Just to get a better idea of how many joins in a real world case > will benefit from this patch. Wow. I think the second patch should be DEBUGx, not NOTICE:) regards, -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: