Re: improving a badly optimized query - Mailing list pgsql-general
From | Brandon Craig Rhodes |
---|---|
Subject | Re: improving a badly optimized query |
Date | |
Msg-id | w67kf5eh73.fsf@guinness.ts.gatech.edu Whole thread Raw |
In response to | Re: improving a badly optimized query (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: improving a badly optimized query
|
List | pgsql-general |
Tom Lane <tgl@sss.pgh.pa.us> writes: > Brandon Craig Rhodes <brandon@oit.gatech.edu> writes: > > (a) (slow) > > SELECT * FROM role_keys NATURAL LEFT JOIN role_person > > WHERE person = 28389; > > > (b) (fast) > > SELECT * FROM role_keys NATURAL JOIN role_person > > WHERE person = 28389; > > > ... the rows created for unmatched role_keys rows by the LEFT JOIN > > are guaranteed to be thrown out by the WHERE clause ... [so] the > > LEFT JOIN could be reduced to a JOIN > > It seems like a useful optimization, but I have an uncomfortable > feeling that there's something wrong with it. Can you point to a > rigorous proof that this is okay in complicated contexts such as > nested outer joins? We can optimize the above query simply by observing that the result of a LEFT JOIN includes both the rows that would have been produced by a simple JOIN, and those rows of the left table that did not match any from the right. That is: SELECT * FROM role_keys NATURAL LEFT JOIN role_person WHERE person = 28389; is the equivalent of: SELECT * FROM role_keys NATURAL JOIN role_person WHERE person = 28389 UNION SELECT * FROM (SELECT role, CAST(NULL AS integer) AS person FROM role_keys) AS rp WHERE person = 28389 EXCEPT SELECT role, CAST(NULL AS integer) AS person FROM role_person; Since the parenthesized sub-select generates only NULL person fields, the WHERE condition will yield an empty result, so the entire query can be reduced to: SELECT * FROM role_keys NATURAL JOIN role_person WHERE person = 28389; Similar technique will yield transforms for RIGHT and OUTER JOINs that can reveal similar optimizations. I should state our original intention: We wanted to gain simplicity and save space by inserting rows into role_person only for those roles associated with people, leaving all other roles unmentioned; but obviously a view that contains all roles and mentions their people requires a LEFT JOIN, which kills query performance. There are three alternatives: (1) Mention every role in the role_person table, using NULLs for those roles without associated people, and then our view can be implemented with a simple JOIN rather than an expensive LEFT JOIN. (This wastes space and we find it logically unattractive, as well as more difficult to maintain.) (2) Build queries on top of the actual tables, instead of the view, choosing for each query the least sufficient JOIN. (This destroys the usefulness of views, and makes our queries more complex since we must hand-craft each JOIN to fit the query.) (3) Include the above optimization into at least our own copy of PostgreSQL. (This will be our first experience dealing directly with the query optimizer, and we are not sure where to begin.) Please advise, -- Brandon Craig Rhodes http://www.rhodesmill.org/brandon Georgia Tech brandon@oit.gatech.edu
pgsql-general by date: