Home > mailing lists

Re: INDEX suggestion needed - Mailing list pgsql-general

From	Manfred Koizar
Subject	Re: INDEX suggestion needed
Date	December 12, 2002 16:00:54
Msg-id	pgrhvus3d1nr8pp8311lopqd5uk4ffuir5@4ax.com Whole thread Raw
In response to	Re: INDEX suggestion needed (Thomas Beutin <tyrone@laokoon.IN-Berlin.DE>)
Responses	Re: INDEX suggestion needed
List	pgsql-general

Tree view

On Thu, 12 Dec 2002 20:13:24 +0100, Thomas Beutin
<tyrone@laokoon.IN-Berlin.DE> wrote:
>tb=# VACUUM VERBOSE ANALYZE stat_pages;
>NOTICE:  Pages 7809: Changed 0, Empty 0; Tup 343554: Vac 0, Keep 0, UnUsed 0.

>Aggregate  (cost=3.19..3.19 rows=1 width=34) (actual time=52.89..52.89 rows=1 loops=1)
>  ->  Index Scan using tb5 on stat_pages  (cost=0.00..3.18 rows=1 width=34) (actual time=52.74..52.74 rows=0 loops=1)
>Total runtime: 53.11 msec

>tb=# EXPLAIN ANALYZE SELECT COUNT(DISTINCT a_id) FROM stat_pages WHERE m_id = 35::smallint AND (visit >= '2002-06-01'
ANDvisit <= '2002-12-11'); 

This selects (almost) all rows.  An index cannot help.

>i got the following index/sequence scans by date ranges:
>(visit >= '2002-06-01' AND visit <= '2002-06-30')    index scan

0 rows satisfy this condition, index scan is ok

>(visit >= '2002-06-01' AND visit <= '2002-07-31')    index scan

Ca. 10000 rows, 3% of the whole table, index scan ok

>(visit >= '2002-06-01' AND visit <= '2002-08-31')    sequence scan

32%, sequence scan is expected to be faster, unless tuples are almost
perfectly ordered by visit or most of the table (~ 8000 pages) fits
into the cache.  How much physical memory is installed?  What are your
shared_buffers and effective_cache_size settings?  You might want to
experiment with
    SET enable_seqscan = off;

>(visit >= '2002-07-01' AND visit <= '2002-07-31')    index scan

Same as 06-01 to 07-31.

>(visit >= '2002-08-01' AND visit <= '2002-08-31')    sequence scan

Ca. 29%

>(visit >= '2002-09-01' AND visit <= '2002-09-30')    sequence scan

More than 50%

>(visit >= '2002-10-01' AND visit <= '2002-10-31')    index scan, but long (>5sec)

Ca. 3%; interesting that it takes so long; I'd like to see EXPLAIN
ANALYZE output for enable_seqscan on and off.

>And: The date range in my table is from 2002-07-10 11:36:53+02 up to
>2002-10-29 23:31:47+01.

Yes, this is approximately reflected by the histogram bounds.

> attname | null_frac | avg_wi | n_distinct | correlation
>---------+-----------+--------+------------+-------------
> visit   |         0 |      8 |  -0.543682 |   -0.972118

The negative correlation looks strange.  How did you insert your data?

> m_id    |         0 |      2 |          1 |           1
                                         ^^^
Only one distinct value in m_id?  This explains why your m_id index is
never used.

> attname | most_common_vals
>---------+--------------------------
> visit   | "2002-08-21 10:29:10+02", ...
> m_id    | 35

> attname | histogram_bounds
>---------+--------------------------
> visit   | "2002-07-25 16:37:12+02"
>           "2002-08-15 12:36:18+02"
>           "2002-08-23 12:36:15+02"
>           "2002-08-29 17:30:54+02"
>           "2002-09-05 12:54:31+02"
>           "2002-09-10 18:03:54+02"
>           "2002-09-16 15:44:56+02"
>           "2002-09-20 14:34:40+02"
>           "2002-09-24 13:59:29+02"
>           "2002-09-29 09:09:31+02"
>           "2002-10-29 23:25:13+01"
> m_id    |
>

>??? Is this output ok?

Almost.  I forgot to ask for most_common_freqs (cut'n'paste error).
But I don't expect any value of visit to occur much more than twice,
so these values should be irrelevant to our estimations.

Servus
 Manfred

pgsql-general by date:

From: Lincoln Yeoh
Date: 12 December 2002, 15:46:11
Subject: Re: Docs: GIST

From: Manfred Koizar
Date: 12 December 2002, 16:25:06
Subject: Re: PostgreSQL idocs

Re: INDEX suggestion needed - Mailing list pgsql-general

Previous

Next