Re: Subselect query enhancement - Mailing list pgsql-performance

From Richard Huxton
Subject Re: Subselect query enhancement
Date
Msg-id 45C23950.4070803@archonet.com
Whole thread Raw
In response to Re: Subselect query enhancement  ("Michael Artz" <mlartz@gmail.com>)
List pgsql-performance
Michael Artz wrote:
>>  > I have primary table that holds ip information
>> > and two other tables that hold event data for the specific IP in with
>> > a one-to-many mapping between them, ie:
>> [snip]
>> > There is quite a bit of commonality between the network_events and
>> > host_events schemas, but they do not currently share an ancestor.
>> > ip_info has about 13 million rows, the network_events table has about
>> > 30 million rows, and the host_events table has about 7 million rows.
>> > There are indexes on all the rows.
>>
>> What indexes though. Do you have (name,ip) on the two event tables?
>
> All the columns are indexed individually.  The tables are completely
> static, as I reload the whole DB with new data every day.

The point of a (name,ip) index would be to let you read off ip numbers
in order easily.

>> How selective is "name" - are there many different values or just a few?
>> If lots, it might be worth increasing the statistics gathered on that
>> column (ALTER COLUMN ... SET STATISTICS).
>> http://www.postgresql.org/docs/8.2/static/sql-altertable.html
>
> I guess that is the heart of my question.  "name" is not very
> selective (there are only 20 or so choices) however other columns are
> fairly selective for certain cases, such as 'port'.  When querying on
> and unusual port, the query is very fast, and the single UNIONed
> subselect returns quickly.  When 'port' is not very selective (like
> port = '80', which is roughly 1/2 of the rows in the DB), the dual
> subselect query wins, hands-down.
>
> And I have altered the statistics via the config file:
>  default_statistics_target = 100
> Perhaps this should be even higher for certain columns?

You're probably better off leaving it at 10 and upping it for the vital
columns. 25 for names should be a good choice.

You could try partial indexes for those cases where you have
particularly common values of name/port:

CREATE INDEX idx1 ON host_events (ip) WHERE port=80;

--
   Richard Huxton
   Archonet Ltd

pgsql-performance by date:

Previous
From: "Michael Artz"
Date:
Subject: Re: Subselect query enhancement
Next
From: "Michael Artz"
Date:
Subject: Re: Subselect query enhancement