query optimization scenarios 17,701 times faster!!! - Mailing list pgsql-hackers
From | Robert Dyas |
---|---|
Subject | query optimization scenarios 17,701 times faster!!! |
Date | |
Msg-id | MGEFJOBFIEAIADIKAMEKAEIKCIAA.rdyas@adelphia.net Whole thread Raw |
Responses |
Re: query optimization scenarios 17,701 times faster!!!
Re: query optimization scenarios 17,701 times faster!!! Re: query optimization scenarios 17,701 times faster!!! |
List | pgsql-hackers |
Hi Everyone - this is my first post (but I have been lurking on and off for a couple of years). Congratulations on a steadily improving product! This is not a question, just some observations on performance that I thought might trigger someone's thinking on ways to improve query optimization. The following is a list of query pairs (one fast, one slow) that must produce identical results by definition (and do), but have very different execution times. Especially the last example. I did some development work on a commercial SQL database product 7 years ago (names will not be used), so although I am clueless about PostgreSQL internals, I (think) I have a grip on some of the query optimization issues (though not necessarily a grip on life). The data set used for all of these queries was very small - most tables had a few hundred records or less. No, its not very scientific, but I believe its illustrative none-the-less. I'll just make a couple of observations on the last query and leave everyone else to reach their own conclusions. 1) the two versions of the last query must produce identical results by definition (and they do) 2) it appears that the optimizer is doing all of the join work before ever considering the impact of where clause restrictions. (this may not be the case, but it appears so) 3) It could have said, hey, I have a where clause restriction on the primary key that is equal to a fixed value. So I have a single row from that table to deal with, all of the columns come from that table too, and further its left joined to the rest of the crap so I can safely ignore it. I always think its illustrative to look at extreme examples like this to point out optimizations that may be overlooked. If I ever get some free time, I look forward to contributing to this wonderful project! NOTES: 1) I didn't include the schema to keep this post reasonable. send email to rdyas@adelphia.net if you want the schema to look into this further. 2) the primary keys for the following tables are org_milestones.id tasks.task_id contacts.contact_id organizations.org_id EXPLAIN ANALYZE SELECT DISTINCT org_milestones.completed_on, org_milestones.id, org_milestones.milestone_id, org_milestones.notes, org_milestones.org_id FROM org_milestones RIGHT OUTER JOIN organizations ON (org_milestones.org_id = organizations.org_id) LEFT OUTER JOIN contacts ON (contacts.org_id = organizations.org_id) LEFT OUTER JOIN tasks ON (tasks.org_id = organizations.org_id) WHERE (organizations.org_id = 71) ORDER BY org_milestones.completed_on, org_milestones.id 79.20 msec EXPLAIN ANALYZE SELECT org_milestones.completed_on, org_milestones.id, org_milestones.milestone_id, org_milestones.notes, org_milestones.org_id FROM org_milestones RIGHT OUTER JOIN organizations ON (org_milestones.org_id = organizations.org_id) WHERE (organizations.org_id = 71) ORDER BY org_milestones.completed_on, org_milestones.id 6.44 msec = 12 times faster --------------- EXPLAIN ANALYZE SELECT DISTINCT tasks.completed, tasks.contact_id, tasks.created_by, tasks.notes, tasks.objective, tasks.org_id, tasks.outcome, tasks.priority, tasks.start_date, tasks.start_time, tasks.task_id, tasks.task_type FROM tasks RIGHT OUTER organizations ON (tasks.org_id = organizations.org_id) LEFT OUTER JOIN org_milestones ON (org_milestones.org_id = organizations.org_id) LEFT OUTER JOIN contacts ON (contacts.org_id = organizations.org_id) WHERE (organizations.org_id = 71) ORDER BY tasks.start_date, tasks.start_time 2,548.71 msec EXPLAIN ANALYZE SELECT tasks.completed, tasks.contact_id, tasks.created_by, tasks.notes, tasks.objective, tasks.org_id, tasks.outcome, tasks.priority, tasks.start_date, tasks.start_time, tasks.task_id, tasks.task_type FROM tasks RIGHT OUTER JOIN organizations ON (tasks.org_id = organizations.org_id) WHERE (organizations.org_id = 71) ORDER BY tasks.start_date, tasks.start_time 29.21 msec = 87 times faster ----------- EXPLAIN ANALYZE SELECT DISTINCT contacts.address_1, contacts.address_2, contacts.assistant_email, contacts.assistant_name, contacts.assistant_phone, contacts.city, contacts.contact_id, contacts.email_address, contacts.first_name, contacts.functional_role, contacts.last_name, contacts.needs, contacts.notes, contacts.org_id, contacts.pain, contacts.phone, contacts.reasons, contacts.reports_to, contacts.state, contacts.title, contacts.zip_code, (contacts.first_name || ' ' || contacts.last_name) AS full_name FROM contacts RIGHT OUTER JOIN organizations ON (contacts.org_id = organizations.org_id) LEFT OUTER JOIN org_milestones ON (org_milestones.org_id = organizations.org_id) LEFT OUTER JOIN tasks ON (tasks.org_id = organizations.org_id) WHERE (organizations.org_id = 71) 2056.83 msec EXPLAIN ANALYZE SELECT DISTINCT contacts.address_1, contacts.address_2, contacts.assistant_email, contacts.assistant_name, contacts.assistant_phone, contacts.city, contacts.contact_id, contacts.email_address, contacts.first_name, contacts.functional_role, contacts.last_name, contacts.needs, contacts.notes, contacts.org_id, contacts.pain, contacts.phone, contacts.reasons, contacts.reports_to, contacts.state, contacts.title, contacts.zip_code, (contacts.first_name || ' ' || contacts.last_name) AS full_name FROM contacts RIGHT OUTER JOIN organizations ON (contacts.org_id = organizations.org_id) WHERE (organizations.org_id = 71) 27.41 msec = 75 times faster ----------------- EXPLAIN ANALYZE SELECT DISTINCT organizations.city, organizations.inactive, organizations.java_developers, organizations.name, organizations.org_id, organizations.overview, organizations.phone, organizations.salesperson, organizations.state, organizations.time_zone FROM organizations LEFT OUTER JOIN org_milestones ON (org_milestones.org_id = organizations.org_id) LEFT OUTER JOIN contacts ON (contacts.org_id = organizations.org_id) LEFT OUTER JOIN tasks ON (tasks.org_id = organizations.org_id) WHERE (organizations.org_id = 71) ORDER BY organizations.name 12,567.87 msec EXPLAIN ANALYZE SELECT organizations.city, organizations.inactive, organizations.java_developers, organizations.name, organizations.org_id, organizations.overview, organizations.phone, organizations.salesperson, organizations.state, organizations.time_zone FROM organizations WHERE (organizations.org_id = 71) 0.71 msec = 17,701 times faster -----------------------
pgsql-hackers by date: