Re: BUG #5294: Sorts on more than just the order-by clause - Mailing list pgsql-bugs
From | Allen Johnson |
---|---|
Subject | Re: BUG #5294: Sorts on more than just the order-by clause |
Date | |
Msg-id | 6786ed4f1001221307g3f34f41cn7a1aa8113886dbd2@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #5294: Sorts on more than just the order-by clause (Greg Stark <gsstark@mit.edu>) |
Responses |
Re: BUG #5294: Sorts on more than just the order-by clause
|
List | pgsql-bugs |
Ok, I've generated a test database with: * 20,000 users * 250,000 contacts * 1,124,700 attachments The summary of the results is that the normal query takes about 32sec on my machine. The hack query takes about 13sec. Below are the queries and their `explain analyze` outputs. I've attached the test layout as well as a zip file containing the ruby scripts that generate the data in the default format that the 'copy' command expects. If anyone else wants to give it a try this is the procedure. I wrote these scripts in a hurry so I'm sure there could be many improvements :) 1. Apply the tables.sql file to a test database (might want to run the create index commands after the data load) 2. Create the data files using scripts in the create-scripts.zip like this: ruby create-user.rb ; creates /tmp/users ruby create-contacts.rb ; creates /tmp/contacts ruby create-attachments.rb ; creates /tmp/attachments 3. Load data into the test database copy users from '/tmp/users'; copy contacts from '/tmp/contacts'; copy attachments from '/tmp/attachments'; 4. Run the `create index` statements in the tables.sql file I also have a pg_dump version if anyone wants it (~6MB gzipped). /* NORMAL QUERY */ select users.id, users.full_name, users.username, ct.name as type_name, c.lname, c.fname, c.mname, c.email, c.address1, c.city, c.state, c.zip_code, c.created_date, count(a.id) as attachment_count from contacts c inner join users on (users.id = c.user_id) inner join contact_types ct on (ct.code = c.contact_type_code) left join attachments a on (a.contact_id = c.id) where users.company_id = 1 and c.contact_type_code in ('BOSS', 'EMP', 'WORK') group by users.id, users.full_name, users.username, ct.name, c.lname, c.fname, c.mname, c.email, c.address1, c.city, c.state, c.zip_code, c.created_date order by users.full_name, ct.name, c.lname, c.fname, c.mname, c.created_date EXPLAIN ANALYZE OUTPUT: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- GroupAggregate (cost=102724.80..122894.80 rows=403400 width=124) (actual time=30683.912..32431.153 rows=75228 loops=1) -> Sort (cost=102724.80..103733.30 rows=403400 width=124) (actual time=30683.869..31236.760 rows=345543 loops=1) Sort Key: users.full_name, ct.name, c.lname, c.fname, c.mname, c.created_date, users.id, users.username, c.email, c.address1, c.city, c.state, c.zip_code Sort Method: quicksort Memory: 96578kB -> Merge Right Join (cost=16571.91..65164.53 rows=403400 width=124) (actual time=1946.609..7523.831 rows=345543 loops=1) Merge Cond: (a.contact_id = c.id) -> Index Scan using attachments_contact on attachments a (cost=0.00..39729.87 rows=1124700 width=12) (actual time=0.082..2150.242 rows=1124662 loops=1) -> Sort (cost=16571.91..16732.30 rows=64157 width=124) (actual time=1946.463..2384.232 rows=345537 loops=1) Sort Key: c.id Sort Method: quicksort Memory: 21439kB -> Hash Join (cost=553.92..11449.19 rows=64157 width=124) (actual time=81.120..1584.615 rows=75228 loops=1) Hash Cond: ((c.contact_type_code)::text = (ct.code)::text) -> Hash Join (cost=552.81..10565.92 rows=64157 width=121) (actual time=81.063..1285.727 rows=75228 loops=1) Hash Cond: (c.user_id = users.id) -> Seq Scan on contacts c (cost=0.00..7534.50 rows=122469 width=96) (actual time=0.023..526.259 rows=150077 loops=1) Filter: ((contact_type_code)::text = ANY ('{BOSS,EMP,WORK}'::text[])) -> Hash (cost=427.00..427.00 rows=10065 width=33) (actual time=80.974..80.974 rows=10065 loops=1) -> Seq Scan on users (cost=0.00..427.00 rows=10065 width=33) (actual time=0.022..37.797 rows=10065 loops=1) Filter: (company_id = 1) -> Hash (cost=1.05..1.05 rows=5 width=12) (actual time=0.037..0.037 rows=5 loops=1) -> Seq Scan on contact_types ct (cost=0.00..1.05 rows=5 width=12) (actual time=0.018..0.024 rows=5 loops=1) Total runtime: 32551.132 ms (22 rows) /* HACK QUERY */ select * from ( select users.id, users.full_name, users.username, ct.name as type_name, c.lname, c.fname, c.mname, c.email, c.address1, c.city, c.state, c.zip_code, c.created_date, count(a.id) as attachment_count from contacts c inner join users on (users.id = c.user_id) inner join contact_types ct on (ct.code = c.contact_type_code) left join attachments a on (a.contact_id = c.id) where users.company_id = 1 and c.contact_type_code in ('BOSS', 'EMP', 'WORK') group by users.id, users.full_name, users.username, ct.name, c.lname, c.fname, c.mname, c.email, c.address1, c.city, c.state, c.zip_code, c.created_date ) as results order by full_name, type_name, lname, fname, mname, created_date /* EXPLAIN ANALYZE OUTPUT */ QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=420987.80..421996.30 rows=403400 width=1688) (actual time=12579.266..12663.383 rows=75228 loops=1) Sort Key: users.full_name, ct.name, c.lname, c.fname, c.mname, c.created_date Sort Method: quicksort Memory: 21435kB -> HashAggregate (cost=79283.53..84326.03 rows=403400 width=124) (actual time=9546.773..9721.322 rows=75228 loops=1) -> Merge Right Join (cost=16571.91..65164.53 rows=403400 width=124) (actual time=1857.597..7428.718 rows=345543 loops=1) Merge Cond: (a.contact_id = c.id) -> Index Scan using attachments_contact on attachments a (cost=0.00..39729.87 rows=1124700 width=12) (actual time=0.862..2298.099 rows=1124662 loops=1) -> Sort (cost=16571.91..16732.30 rows=64157 width=124) (actual time=1856.666..2279.646 rows=345537 loops=1) Sort Key: c.id Sort Method: quicksort Memory: 21439kB -> Hash Join (cost=553.92..11449.19 rows=64157 width=124) (actual time=54.465..1500.539 rows=75228 loops=1) Hash Cond: ((c.contact_type_code)::text = (ct.code)::text) -> Hash Join (cost=552.81..10565.92 rows=64157 width=121) (actual time=54.375..1210.697 rows=75228 loops=1) Hash Cond: (c.user_id = users.id) -> Seq Scan on contacts c (cost=0.00..7534.50 rows=122469 width=96) (actual time=0.035..492.531 rows=150077 loops=1) Filter: ((contact_type_code)::text = ANY ('{BOSS,EMP,WORK}'::text[])) -> Hash (cost=427.00..427.00 rows=10065 width=33) (actual time=54.277..54.277 rows=10065 loops=1) -> Seq Scan on users (cost=0.00..427.00 rows=10065 width=33) (actual time=0.032..27.676 rows=10065 loops=1) Filter: (company_id = 1) -> Hash (cost=1.05..1.05 rows=5 width=12) (actual time=0.055..0.055 rows=5 loops=1) -> Seq Scan on contact_types ct (cost=0.00..1.05 rows=5 width=12) (actual time=0.026..0.035 rows=5 loops=1) Total runtime: 12752.540 ms (22 rows)
Attachment
pgsql-bugs by date: