Re: How to force Nested Loop plan? - Mailing list pgsql-performance
From: Rob Nagler
Subject: Re: How to force Nested Loop plan?
Date:
Msg-id: 16209.3105.94000.204936@gargle.gargle.HOWL
In response to: Re: How to force Nested Loop plan? (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: How to force Nested Loop plan?
           Re: How to force Nested Loop plan?
List: pgsql-performance
Tom Lane writes:
> That doesn't really tell me anything.  What's the proportion of 21
> records out of the total table?

We currently have about 15 servers, so roughly 6% of the data carries the value 21, uniformly distributed.

> create index fooi on foo (min_date_time) where server_id = 21;
>
> This reduces the cost of maintaining the index but also makes it useful
> *only* for id = 21 queries.  On the plus side, you don't need to hack
> the ORDER BY clause to get your queries to use it.  Your choice...

I like that better, thanks. After testing I found the elbow was at 1610 records with this index, but this clause still yields better performance at 1654 records:

    AND aa_t.server_id IN (21, 0)

This is independent of the existence of the new index. Interestingly, when I drop the aa_t5 index, the elbow goes up to 1729 records with the IN (21, 0) query.

You might ask: why do I have an index at all? That's from my Oracle experience. server_id is a foreign key into the server table. If you don't create an index on a foreign key column, Oracle locks the entire referenced table when you modify the local table; with an index, it locks only a row in the referenced table. That can be a major bottleneck, but in this case the server table is static, so the index is superfluous. And since there are only 16 distinct values, the index should really be a bitmap index (Oracle speak, sorry; I don't know the PG term). Dropping the index probably won't change any of the other queries, I think.
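For readers following along, here is a sketch of the two approaches discussed above. The table and column names (aa_t, server_id, min_date_time) come from the thread's EXPLAIN output; the exact query shape is a simplified stand-in, not the thread's actual join:

```sql
-- Partial index as Tom suggested: cheap to maintain, but usable only
-- by queries that filter on server_id = 21.
CREATE INDEX fooi ON aa_t (min_date_time) WHERE server_id = 21;

-- The alternative rewrite: padding the predicate with a dummy value
-- (0) in an IN list, which shifted the planner's "elbow" in testing.
SELECT aa_t.min_date_time
FROM aa_t
WHERE aa_t.server_id IN (21, 0)
ORDER BY aa_t.min_date_time
LIMIT 1;
```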
Without the aa_t5 index and after the elbow, the Index Scan is replaced with a Seq Scan, which is just about as fast, but still 50 times slower than before the elbow:

 Limit  (cost=34071.30..34071.31 rows=1 width=84) (actual time=5111.14..5111.15 rows=1 loops=1)
   ->  Sort  (cost=34066.98..34075.61 rows=3454 width=84) (actual time=5108.74..5109.96 rows=1733 loops=1)
         Sort Key: aa_t.min_date_time
         ->  Merge Join  (cost=33801.26..33863.98 rows=3454 width=84) (actual time=4868.62..5020.58 rows=41879 loops=1)
               Merge Cond: ("outer".realm_id = "inner".realm_id)
               ->  Sort  (cost=31.64..32.78 rows=455 width=19) (actual time=3.06..3.38 rows=415 loops=1)
                     Sort Key: cc_t.realm_id
                     ->  Seq Scan on cc_t  (cost=0.00..11.55 rows=455 width=19) (actual time=0.05..0.99 rows=455 loops=1)
               ->  Sort  (cost=33769.63..33778.26 rows=3454 width=65) (actual time=4865.20..4895.28 rows=41879 loops=1)
                     Sort Key: aa_t.realm_id
                     ->  Merge Join  (cost=33296.79..33566.63 rows=3454 width=65) (actual time=4232.52..4541.24 rows=41879 loops=1)
                           Merge Cond: ("outer".bb_id = "inner".bb_id)
                           ->  Sort  (cost=25216.97..25225.60 rows=3454 width=46) (actual time=3213.53..3243.65 rows=41879 loops=1)
                                 Sort Key: aa_t.bb_id
                                 ->  Seq Scan on aa_t  (cost=0.00..25013.97 rows=3454 width=46) (actual time=20.07..2986.11 rows=41879 loops=1)
                                       Filter: (server_id = 21::numeric)
                           ->  Sort  (cost=8079.83..8184.53 rows=41879 width=19) (actual time=1018.95..1049.37 rows=41879 loops=1)
                                 Sort Key: bb_t.bb_id
                                 ->  Seq Scan on bb_t  (cost=0.00..4864.79 rows=41879 width=19) (actual time=0.04..810.88 rows=41879 loops=1)
 Total runtime: 5141.22 msec

What I'm not sure about is why it decides to switch modes so "early", i.e. at about 5% of the table size or less. It seems an Index Scan would give better mileage than a Seq Scan for possibly up to 50% of the table in this case. I clearly don't understand the internals, but the elbow seems rather sharp to me.

> > What if the ORDER BY was:
> >     ORDER BY aa_t.server_id DESC, cc_t.name ASC
> > Would the planner do the right thing?
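One way to probe where the planner's elbow really belongs is to disable sequential scans for a session and compare the EXPLAIN ANALYZE timings of the forced index-scan plan against the default plan. enable_seqscan is a real PostgreSQL planner setting; the query shape below is a simplified stand-in for the thread's actual three-way join:

```sql
-- Discourage seq scans so the planner picks the index plan past the
-- elbow, then compare actual times against the default plan.
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT aa_t.min_date_time
FROM aa_t
WHERE aa_t.server_id = 21
ORDER BY aa_t.min_date_time
LIMIT 1;

-- Restore the default for the rest of the session.
SET enable_seqscan = on;
```

If the forced index plan is consistently faster than what the planner chooses, that usually points at cost parameters (such as random_page_cost) being poorly matched to the hardware, rather than at the plan shapes themselves.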
> What do you consider the right thing?  cc_t.name doesn't seem connected
> to this table at all --- or did I miss something?

Sorry, this is a red herring. Please ignore.

> If you've been generically using NUMERIC(n) where you could be using
> integer or bigint, then I think you've probably paid a high price
> without knowing it.  I don't know what Oracle's cost tradeoffs are for
> these datatypes, but I can tell you that Postgres's integer types are
> way faster (and more compact) than our NUMERIC.

I'll try to figure out what the price is in our case. I think Oracle does a pretty good job of compressing NUMERIC data. I haven't dealt with a large Postgres database until this one, so I guess it's time to learn. :)

We've actually been quite pleased with Postgres's performance without paying much attention to it before this. When we first set up Oracle, we got into all of its parameters pretty heavily. With Postgres, we just tried it and it worked. This is the first query where we ran out of ideas to try.

BTW, everybody's help on this list is fantastic. Usually I can find the answer to my question on this list without asking (and have been doing so for three years). Thanks,

Rob
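For anyone weighing Tom's NUMERIC-vs-integer advice: on PostgreSQL servers of this thread's vintage there was no ALTER TABLE ... TYPE, so converting a column meant adding a replacement, copying the data, and swapping names. A hypothetical sketch, reusing the thread's aa_t.server_id column:

```sql
-- Hypothetical NUMERIC -> integer migration, done the pre-8.0 way
-- (add a new column, copy, drop the old one, rename).
ALTER TABLE aa_t ADD COLUMN server_id_int integer;
UPDATE aa_t SET server_id_int = server_id;
ALTER TABLE aa_t DROP COLUMN server_id;
ALTER TABLE aa_t RENAME COLUMN server_id_int TO server_id;
```

Any indexes or foreign-key constraints on the old column would need to be recreated afterwards, and the copy rewrites every row, so this is best done in a maintenance window.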