Home > mailing lists
Re: INDEX suggestion needed - Mailing list pgsql-general

From	Thomas Beutin
Subject	Re: INDEX suggestion needed
Date	December 13, 2002 10:50:04
Msg-id	20021213164138.C17113@laokoon.bug.net Whole thread Raw
In response to	Re: INDEX suggestion needed (Manfred Koizar <mkoi-pg@aon.at>)
Responses	Re: INDEX suggestion needed Re: INDEX suggestion needed
List	pgsql-general
Tree view
Hi,

at first thanks to all people for help!

On Thu, Dec 12, 2002 at 10:00:48PM +0100, Manfred Koizar wrote:
> On Thu, 12 Dec 2002 20:13:24 +0100, Thomas Beutin
> <tyrone@laokoon.IN-Berlin.DE> wrote:
[...]

> >tb=# EXPLAIN ANALYZE SELECT COUNT(DISTINCT a_id) FROM stat_pages WHERE m_id = 35::smallint AND (visit >=
'2002-06-01'AND visit <= '2002-12-11'); 
>
> This selects (almost) all rows.  An index cannot help.
>
> >i got the following index/sequence scans by date ranges:
> >(visit >= '2002-06-01' AND visit <= '2002-06-30')    index scan
>
> 0 rows satisfy this condition, index scan is ok
>
> >(visit >= '2002-06-01' AND visit <= '2002-07-31')    index scan
>
> Ca. 10000 rows, 3% of the whole table, index scan ok
Ahh, now i get a clear look into index usability ;-)

> >(visit >= '2002-06-01' AND visit <= '2002-08-31')    sequence scan
>
> 32%, sequence scan is expected to be faster, unless tuples are almost
> perfectly ordered by visit or most of the table (~ 8000 pages) fits
> into the cache.  How much physical memory is installed?  What are your
> shared_buffers and effective_cache_size settings?  You might want to
> experiment with
>     SET enable_seqscan = off;
Ok, this queries are running on my developer workstation (notebook):
550 MHz mobile celeron, 128 MB RAM, 512MB swap space, IDE
running a apache/php, postgres 7.2.3, X11/gnome and mozilla 1.2.1

This are my entries in postgresql.conf:
#max_connections = 32
#shared_buffers = 64        # 2*max_connections, min 16
#effective_cache_size = 1000  # default in 8k pages
so i'm guessing i run the default values. Playing around with
enable_seqscan = on/off shows the planner's right choices.

Actually i cannot connect to the production server for stats, but the
hardware is:
2x iPIII 850MHz, 2GB RAM, 1GB swap, SCSI RAID5 for database
running a apache/php and postgres 7.2.?
Should be more reasonable...


[...]

> >(visit >= '2002-10-01' AND visit <= '2002-10-31')    index scan, but long (>5sec)
>
> Ca. 3%; interesting that it takes so long; I'd like to see EXPLAIN
> ANALYZE output for enable_seqscan on and off.
tb=# set enable_seqscan=on;
SET VARIABLE
itb=# EXPLAIN ANALYZE SELECT COUNT(DISTINCT a_id) FROM stat_pages WHERE m_id = 35::smallint AND (visit >= '2002-10-01'
ANDvisit <= '2002-10-31'); 
NOTICE:  QUERY PLAN:

Aggregate  (cost=8788.75..8788.75 rows=1 width=34) (actual time=4663.69..4663.70 rows=1 loops=1)
  ->  Index Scan using tb5 on stat_pages  (cost=0.00..8712.60 rows=30459 width=34) (actual time=0.41..468.35 rows=29937
loops=1)
Total runtime: 4663.99 msec

EXPLAIN

tb=# set enable_seqscan=off;
SET VARIABLE
tb=# EXPLAIN ANALYZE SELECT COUNT(DISTINCT a_id) FROM stat_pages WHERE m_id = 35::smallint AND (visit >= '2002-10-01'
ANDvisit <= '2002-10-31'); 
NOTICE:  QUERY PLAN:

Aggregate  (cost=8788.75..8788.75 rows=1 width=34) (actual time=4788.35..4788.35 rows=1 loops=1)
  ->  Index Scan using tb5 on stat_pages  (cost=0.00..8712.60 rows=30459 width=34) (actual time=0.41..439.44 rows=29937
loops=1)
Total runtime: 4788.65 msec

EXPLAIN

There is no difference in cost.

> >And: The date range in my table is from 2002-07-10 11:36:53+02 up to
> >2002-10-29 23:31:47+01.
>
> Yes, this is approximately reflected by the histogram bounds.
>
> > attname | null_frac | avg_wi | n_distinct | correlation
> >---------+-----------+--------+------------+-------------
> > visit   |         0 |      8 |  -0.543682 |   -0.972118
>
> The negative correlation looks strange.  How did you insert your data?
It is a dump from the production system, and the production system gets
the data once a day from webserver logs line by line.

> > m_id    |         0 |      2 |          1 |           1
>                                          ^^^
> Only one distinct value in m_id?  This explains why your m_id index is
> never used.
Yes, because i copied only a part of the data, it's still a lot for
my developer system ;-)
I will playing around on the production system next week, may be i give
combined indicees (m_id, visit) with enable_seqscan=on/off a try.

> > attname | most_common_vals
> >---------+--------------------------
> > visit   | "2002-08-21 10:29:10+02", ...
> > m_id    | 35
>
> > attname | histogram_bounds
> >---------+--------------------------
> > visit   | "2002-07-25 16:37:12+02"
> >           "2002-08-15 12:36:18+02"
> >           "2002-08-23 12:36:15+02"
> >           "2002-08-29 17:30:54+02"
> >           "2002-09-05 12:54:31+02"
> >           "2002-09-10 18:03:54+02"
> >           "2002-09-16 15:44:56+02"
> >           "2002-09-20 14:34:40+02"
> >           "2002-09-24 13:59:29+02"
> >           "2002-09-29 09:09:31+02"
> >           "2002-10-29 23:25:13+01"
> > m_id    |
> >
>
> >??? Is this output ok?
>
> Almost.  I forgot to ask for most_common_freqs (cut'n'paste error).
Here is the result:
tb=# SELECT attname, null_frac, avg_width, n_distinct, most_common_vals, most_common_freqs, histogram_bounds,
correlationFROM pg_stats WHERE tablename = 'stat_pages' AND attname IN ('m_id', 'visit'); 
 attname | null_frac | avg_width | n_distinct |
                                            most_common_vals
                                                          |
most_common_freqs                                                    |
                                                                                histogram_bounds
                                                                                                          | correlation


---------+-----------+-----------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------
 visit   |         0 |         8 |  -0.465972 | {"2002-08-21 10:31:18+02","2002-08-28 15:28:04+02","2002-09-02
08:50:08+02","2002-09-0213:48:49+02","2002-09-04 13:00:03+02","2002-09-06 18:55:19+02","2002-09-12
15:24:14+02","2002-09-1313:12:39+02","2002-09-18 12:55:07+02","2002-09-18 15:01:52+02"} |
{0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667}
|{"2002-07-25 07:04:05+02","2002-08-15 07:52:47+02","2002-08-22 11:00:35+02","2002-08-29 11:59:47+02","2002-09-05
13:56:08+02","2002-09-1108:08:52+02","2002-09-16 10:48:37+02","2002-09-20 11:50:46+02","2002-09-23
22:25:32+02","2002-09-2713:01:03+02","2002-10-29 23:31:18+01"} |   -0.964541 
 m_id    |         0 |         2 |          1 | {35}

                                                          | {1}
                                                             |

                                                                                                  |           1 
(2 rows)


> But I don't expect any value of visit to occur much more than twice,
> so these values should be irrelevant to our estimations.
The probability isn't high, but it could happen to see the same value multiple
times. These are web server log data in a one second grid.



Regards,
-tb
--
Thomas Beutin                             tb@laokoon.IN-Berlin.DE
Beam me up, Scotty. There is no intelligent live down in Redmond.
pgsql-general by date:
From: Jan Poslusny
Date: 13 December 2002, 10:46:49
Subject: Re: Copy/foreign key contraints
From: Alvaro Herrera
Date: 13 December 2002, 11:00:22
Subject: Re: INDEX suggestion needed
Re: INDEX suggestion needed - Mailing list pgsql-general

Previous

Next