Thread: Query running slow but was running fine before
PG: PostgreSQL 8.1.4
OS: RHEL 4.x
I have a set of queries on a production server that had been running fine for the past few months but as of last Friday started performing poorly. I have isolated the problem to a part that is common to all of the queries involved and provide an example of the issue here. There are multiple tables where this is a problem, not just one.
An analyze is run on the tables every day (even several times a day because they are updated very frequently) and a vacuum analyze is run on the weekends. I also tried to run an analyze specifically on the customer_id column and then the product_id column but that didn’t help.
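For reference, the column-specific analyze runs were along these lines (same weekly table as in the example below):

ANALYZE kda_log_info_2008w24 (customer_id);
ANALYZE kda_log_info_2008w24 (product_id);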
The table listed here is one of ten separate tables combined in a UNION ALL; the "real" query runs against the view, kda_log_info. I am working with just one of the tables within the view to help narrow down where the problem might be.
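To give an idea of the structure, the view is essentially a UNION ALL over the weekly tables, something like this (only the 2008w24 name is real; the earlier week is made up to show the pattern):

create view kda_log_info as
select * from kda_log_info_2008w23   -- hypothetical earlier week
union all
select * from kda_log_info_2008w24;
-- ...with ten weekly tables combined this way in the real view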
There may be an occasion when the product_id value is null, which is why the query is written this way. The planner uses the index on customer_id and applies a filter on product_id. The estimated cost is high, and counting the 964 matching rows takes about one minute on average (where before it would return in sub-second time):
kadams@hostserver> more t4.sql
select count(*)
from kda_log_info_2008w24
where customer_id = 7767664
AND (created >= '2008-06-01 00:00:00-06'::timestamp with time zone)
AND (created < '2008-07-01 00:00:00-06'::timestamp with time zone)
and ((product_id IS NULL) OR (product_id = 2070101833));
kadams@hostserver> psql -Upostgres -dkda_log -ft4.sql
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=426537.34..426537.35 rows=1 width=0)
-> Index Scan using kda_log_info_cid_cre_dom_2008w24_idx on kda_log_info_2008w24 (cost=0.00..426535.75 rows=635 width=0)
Index Cond: ((customer_id = 7767664) AND (created >= '2008-06-01 00:00:00-06'::timestamp with time zone) AND (created < '2008-07-01 00:00:00-06'::timestamp with time zone))
Filter: ((product_id IS NULL) OR (product_id = 2070101833))
(4 rows)
kadams@hostserver> time psql -Upostgres -dkda_log -ft4.sql
count
-------
964
(1 row)
real 0m54.810s
user 0m0.002s
sys 0m0.001s
kda_log=# select count(*) from kda_log_info_2008w24;
count
----------
16356303
(1 row)
Index:
kda_log_info_cid_cre_dom_2008w24_idx btree (customer_id, created, "domain")
On the same data set, if I eliminate the IS NULL and look for just the specific product_id, the optimizer picks the index that covers product_id, filters on customer_id, and scans the same 964 rows in about 2 milliseconds:
select count(*)
from kda_log_info_2008w24
where customer_id = 7767664
AND (created >= '2008-06-01 00:00:00-06'::timestamp with time zone)
AND (created < '2008-07-01 00:00:00-06'::timestamp with time zone)
--and ((product_id IS NULL) OR (product_id = 2070101833))
and product_id = 2070101833;
kadams@hostserver> psql -Upostgres -dkda_log -ft4.sql
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1649.41..1649.42 rows=1 width=0)
-> Index Scan using kda_log_info_did_cre_2008w24_idx on kda_log_info_2008w24 (cost=0.00..1647.82 rows=635 width=0)
Index Cond: ((product_id = 2070101833) AND (created >= '2008-06-01 00:00:00-06'::timestamp with time zone) AND (created < '2008-07-01 00:00:00-06'::timestamp with time zone))
Filter: (customer_id = 7767664)
(4 rows)
kadams@hostserver> time psql -Upostgres -dkda_log -ft4.sql
count
-------
964
(1 row)
real 0m0.207s
user 0m0.004s
sys 0m0.002s
There are currently no rows in the table where product_id is NULL:
kda_log=# select count(*) from kda_log_info_2008w24 where product_id IS NULL;
 count
-------
     0
(1 row)
Index:
kda_log_info_did_cre_2008w24_idx btree (product_id, created)
As another comparison, I ran this same query on a different database server with the same database layout (different data set, of course). It returns approximately the same number of rows for the given customer_id / product_id, and it comes back in sub-second time, even though this table has more rows in it than the one on the server where performance has tanked:
kadams@hostserver2> more t2.sql
select count(*)
from kda_log_info_2008w24
where customer_id = 907
AND (created >= '2008-06-01 00:00:00-06'::timestamp with time zone)
AND (created < '2008-07-01 00:00:00-06'::timestamp with time zone)
and ((product_id IS NULL) OR (product_id = 573351));
kadams@hostserver2> psql -Upostgres -dkda_log -ft2.sql
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2626.36..2626.37 rows=1 width=0)
-> Index Scan using kda_log_info_cid_cre_dom_2008w24_idx on kda_log_info_2008w24 (cost=0.00..2626.35 rows=1 width=0)
Index Cond: ((customer_id = 907) AND (created >= '2008-06-01 00:00:00-06'::timestamp with time zone) AND (created < '2008-07-01 00:00:00-06'::timestamp with time zone))
Filter: ((product_id IS NULL) OR (product_id = 573351))
(4 rows)
kadams@hostserver2> time psql -Upostgres -dkda_log -ft2.sql
count
-------
992
(1 row)
real 0m0.011s
user 0m0.002s
sys 0m0.002s
kda_log=# select count(*) from kda_log_info_2008w24;
count
----------
21777364
(1 row)
I compared postgresql.conf file settings between the servers and they are identical.
The physical servers are also the same configuration (8 CPU, 8 GB RAM, local fast-SCSI disk).
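For what it's worth, the live settings could also be diffed with something like this, which would catch anything set per-database or per-user rather than in postgresql.conf:

select name, setting from pg_settings order by name;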
So, I have no idea why the query at the top of this email has gone from sub-second response to over a minute on average on this one database server. And since this is only one part of the overall query, total run times have gone from a few seconds to 15-20 minutes each, which is causing major problems for our users.
Any idea what else I can look at or do to resolve this problem? Any additional information I can provide to help you guys figure this out?
PS: Yes, I know..... 8.1 is old..... We are migrating to 8.3.x in the fall.
Thanks for your help,
Keaton
On Tue, Jul 8, 2008 at 12:06 AM, Keaton Adams <kadams@mxlogic.com> wrote:
> An analyze is run on the tables every day (even several times a day because
> they are updated very frequently) and a vacuum analyze is run on the
> weekends. I also tried to run an analyze specifically on the customer_id
> column and then the product_id column but that didn't help.

I'm no expert, so if I'm talking nonsense here, someone please correct me. From what I read on this list, I believe you need to run VACUUM frequently if there are many updates: at least daily, but in extreme cases it could be needed every n minutes; an ANALYZE won't cut it. (UPDATE leaves dead rows in the table, which count(*) must filter out; VACUUM gets rid of them.) Try a VACUUM FULL at a time when there isn't much load on the server; if this solves your problem, you need to decrease your VACUUM interval.
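For the weekly table in your example, that would be something like:

VACUUM FULL VERBOSE kda_log_info_2008w24;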
Good point. But....
This is a database used to capture logged information, such as the success/failure of an operation. Daily tables are rolled up to weekly tables, and weekly tables are rolled up to monthly tables. All logging is done by inserts, never updates or deletes, specifically to avoid having to run VACUUM FULL. So rows are added to the weekly table (kda_log_info_2008w24) for seven days, and that's it. Once a new week begins, the data is static until enough weeks pass that the weekly information is summarized at a higher level into a monthly table, and then the weekly table is dropped.
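For illustration only, the weekly-to-monthly rollup amounts to something like the following; the monthly table name and the exact summary columns here are made up:

insert into kda_log_info_2008m06   -- hypothetical monthly table
select customer_id, product_id, date_trunc('month', created), count(*)
from kda_log_info_2008w24
group by customer_id, product_id, date_trunc('month', created);
drop table kda_log_info_2008w24;   -- weekly table is dropped after rollup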
I’ll send out a full EXPLAIN from the original query, which was against the view, so you can see the scope of the issue.
Thanks for the reply,
Keaton
In response to Keaton Adams <kadams@mxlogic.com>:

> I'll send out a full EXPLAIN from the original query, which was against
> the view, so you can see the scope of the issue.

Please also send the output of:

SELECT relname, n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup
FROM pg_stat_user_tables;

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
It looks like the indexes are blown.... which may indicate some corruption in the pg_indexes system catalog?
Is there a way to check the integrity of pg_indexes for possible page corruption? Nothing is being reported in the PostgreSQL log file that would indicate there is a corruption issue.
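If it does turn out to be the index itself rather than the catalog, I could presumably rebuild it from the heap with something like:

REINDEX INDEX kda_log_info_cid_cre_dom_2008w24_idx;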
Thanks,
Keaton