Re: Slow query on OS X box - Mailing list pgsql-performance
From: Patrick Hatcher
Subject: Re: Slow query on OS X box
Msg-id: OFD9C08201.63194703-ON88256CB6.006EA449@fds.com
In response to: Slow query on OS X box ("Patrick Hatcher" <PHatcher@macys.com>)
Responses: Re: Slow query on OS X box
List: pgsql-performance
Sorry, I'm being really dense today. I didn't even notice the 3.2 million rows being returned. :( To answer your question: no, all fields would not have data. The data we receive is from a Web log file. It's parsed and then uploaded to this table.

I guess the bigger issue is that when trying to do aggregates, grouping by the wizard field, it takes just as long. Ex:

mdc_oz=# explain analyze select wizard,count(wizard) from search_log
         where sdate between '2002-12-01' and '2003-01-15' group by wizard;
                                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1083300.43..1112411.55 rows=388148 width=10) (actual time=229503.85..302617.75 rows=14 loops=1)
   ->  Group  (cost=1083300.43..1102707.84 rows=3881482 width=10) (actual time=229503.60..286014.83 rows=3717161 loops=1)
         ->  Sort  (cost=1083300.43..1093004.14 rows=3881482 width=10) (actual time=229503.57..248415.81 rows=3717161 loops=1)
               Sort Key: wizard
               ->  Seq Scan on search_log  (cost=0.00..575217.57 rows=3881482 width=10) (actual time=91235.76..157559.58 rows=3717161 loops=1)
                     Filter: ((sdate >= '2002-12-01'::date) AND (sdate <= '2003-01-15'::date))
 Total runtime: 302712.48 msec
(7 rows)

Thanks again for the help
Patrick Hatcher

Rod Taylor <rbt@rbt.ca>
To: Patrick Hatcher <PHatcher@macys.com>
cc: Postgresql Performance <pgsql-performance@postgresql.org>
01/22/2003 12:02 PM
Subject: Re: [PERFORM] Slow query on OS X box

On Wed, 2003-01-22 at 13:26, Patrick Hatcher wrote:
> I have a table that contains over 13 million rows. This query takes an
> extremely long time to return. I've vacuum full'd, analyzed, and
> re-indexed the table. Still the results are the same. Any ideas?

Yeah, you're pulling out 3.2 million rows from (possibly) a wide table,
bytewise. Do all of those fields actually have data?
That's always going to take a while -- and I find it hard to believe
you're actually doing something with all of those rows that runs
regularly. If every one of those rows was maxed out (ignoring the text
field at the end) you could have ~15 GB of data to pull out. Without
knowing the type of data actually in the table, I'm going to bet it's a
hard-drive limitation.

The index on 'wizard' is next to useless, as at least 1/4 of the data in
the table is under the same key. You might try a partial index on
'wizard' (skip the value 'Keyword'). It won't help this query, but it'll
help ones looking for values other than 'Keyword'.

Anyway, you might try a CURSOR. Fetch rows out 5000 at a time, do some
work with them, then grab some more. This -- more or less -- will allow
you to process the rows received while awaiting the remaining rows to be
produced by the database. Depending on what you're doing with them, it'll
give the disk drive a chance to catch up. If the kernel's smart, it'll
read ahead of the scan. This doesn't remove read time, but hides it while
you're transferring the data out (from the db to your client) or
processing it.

> mdc_oz=# explain analyze select wizard from search_log where wizard
> ='Keyword' and sdate between '2002-12-01' and '2003-01-15';
>                                QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on search_log  (cost=0.00..609015.34 rows=3305729 width=10)
>    (actual time=99833.83..162951.25 rows=3280573 loops=1)
>    Filter: ((wizard = 'Keyword'::character varying) AND (sdate >=
>      '2002-12-01'::date) AND (sdate <= '2003-01-15'::date))
>  Total runtime: 174713.25 msec
> (3 rows)

--
Rod Taylor <rbt@rbt.ca>
PGP Key: http://www.rbt.ca/rbtpub.asc
(See attached file: signature.asc)
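[Editor's note: the partial-index idea above can be sketched in runnable form. PostgreSQL spells it `CREATE INDEX ... ON search_log (wizard) WHERE wizard <> 'Keyword';`. The sketch below uses Python's sqlite3 module as a stand-in (SQLite also supports partial indexes); the index name and the 'other_wizard' value are illustrative, not from the thread.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE search_log (wizard TEXT, sdate TEXT)")

# Simulate a skewed distribution: 'Keyword' dominates the column,
# which is why a full index on wizard is "next to useless".
rows = [("Keyword", "2002-12-15")] * 100 + [("other_wizard", "2002-12-20")] * 5
cur.executemany("INSERT INTO search_log VALUES (?, ?)", rows)

# Partial index that skips the dominant 'Keyword' value, so it stays
# small and only covers the selective lookups it can actually help.
# (PostgreSQL: CREATE INDEX ... ON search_log (wizard) WHERE wizard <> 'Keyword';)
cur.execute(
    "CREATE INDEX search_log_wizard_part ON search_log (wizard) "
    "WHERE wizard <> 'Keyword'"
)

# Queries whose WHERE clause implies the index predicate can use it;
# a query for wizard = 'Keyword' cannot, matching Rod's caveat.
n = cur.execute(
    "SELECT count(*) FROM search_log WHERE wizard <> 'Keyword'"
).fetchone()[0]
print(n)  # 5
```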
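[Editor's note: the CURSOR suggestion, in PostgreSQL, would be `DECLARE c CURSOR FOR SELECT ...` followed by repeated `FETCH FORWARD 5000 FROM c;`. A minimal client-side sketch of the same batching pattern, again using Python's sqlite3 as a stand-in; the 5000-row batch size is Rod's figure.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_log (wizard TEXT)")
conn.executemany("INSERT INTO search_log VALUES (?)", [("Keyword",)] * 12000)

BATCH = 5000  # fetch rows out 5000 at a time, per the suggestion
cur = conn.execute("SELECT wizard FROM search_log")

total = 0
batches = 0
while True:
    chunk = cur.fetchmany(BATCH)  # next batch; empty list when exhausted
    if not chunk:
        break
    batches += 1
    total += len(chunk)           # ...do the real per-batch work here...
print(total, batches)
```

The point is overlap: the client works on one batch while the server (and disk) produce the next, hiding read time rather than removing it.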