Re: How to force Nested Loop plan? - Mailing list pgsql-performance
From: Rob Nagler
Subject: Re: How to force Nested Loop plan?
Date:
Msg-id: 16209.3105.94000.204936@gargle.gargle.HOWL
In response to: Re: How to force Nested Loop plan? (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: How to force Nested Loop plan?
           Re: How to force Nested Loop plan?
List: pgsql-performance
Tom Lane writes:
> That doesn't really tell me anything.  What's the proportion of 21
> records out of the total table?

We currently have about 15 servers, so roughly 6% of the data carries the value 21, uniformly distributed.

> create index fooi on foo (min_date_time) where server_id = 21;
>
> This reduces the cost of maintaining the index but also makes it useful
> *only* for id = 21 queries.  On the plus side, you don't need to hack
> the ORDER BY clause to get your queries to use it.  Your choice...

I like that better, thanks. After testing I found the elbow was at 1610 records with this index, but this clause still yields better performance at 1654 records:

    AND aa_t.server_id IN (21, 0)

This is independent of the existence of the new index. Interestingly, when I drop the aa_t5 index, the elbow goes up to 1729 records with the IN (21, 0) query.

You might ask: why do I have an index at all? That's from my Oracle experience. server_id is a foreign key into the server table. If you don't create an index on a foreign key column, Oracle locks the entire referenced table when you modify the local table; with an index, it locks only a row in the referenced table. That can be a major bottleneck, but in this case the server table is static, so the index is superfluous. And since there are only 16 distinct values, the index should really be a bitmap index (Oracle speak, sorry; I don't know the PG term). Dropping the index probably won't change any of the other queries, I think.
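For readers following along, here is a sketch of the two approaches discussed above. The table and column names (aa_t, server_id, min_date_time) come from the thread's EXPLAIN output; the exact query shape is a simplified stand-in, not the thread's actual join:

```sql
-- Partial index as Tom suggested: cheap to maintain, but usable only
-- by queries that filter on server_id = 21.
CREATE INDEX fooi ON aa_t (min_date_time) WHERE server_id = 21;

-- The alternative rewrite: padding the predicate with a dummy value
-- (0) in an IN list, which shifted the planner's "elbow" in testing.
SELECT aa_t.min_date_time
FROM aa_t
WHERE aa_t.server_id IN (21, 0)
ORDER BY aa_t.min_date_time
LIMIT 1;
```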
Without the aa_t5 index and after the elbow, the Index Scan is replaced with a Seq Scan, which is just about as fast, but still 50 times slower than before the elbow:

 Limit  (cost=34071.30..34071.31 rows=1 width=84) (actual time=5111.14..5111.15 rows=1 loops=1)
   ->  Sort  (cost=34066.98..34075.61 rows=3454 width=84) (actual time=5108.74..5109.96 rows=1733 loops=1)
         Sort Key: aa_t.min_date_time
         ->  Merge Join  (cost=33801.26..33863.98 rows=3454 width=84) (actual time=4868.62..5020.58 rows=41879 loops=1)
               Merge Cond: ("outer".realm_id = "inner".realm_id)
               ->  Sort  (cost=31.64..32.78 rows=455 width=19) (actual time=3.06..3.38 rows=415 loops=1)
                     Sort Key: cc_t.realm_id
                     ->  Seq Scan on cc_t  (cost=0.00..11.55 rows=455 width=19) (actual time=0.05..0.99 rows=455 loops=1)
               ->  Sort  (cost=33769.63..33778.26 rows=3454 width=65) (actual time=4865.20..4895.28 rows=41879 loops=1)
                     Sort Key: aa_t.realm_id
                     ->  Merge Join  (cost=33296.79..33566.63 rows=3454 width=65) (actual time=4232.52..4541.24 rows=41879 loops=1)
                           Merge Cond: ("outer".bb_id = "inner".bb_id)
                           ->  Sort  (cost=25216.97..25225.60 rows=3454 width=46) (actual time=3213.53..3243.65 rows=41879 loops=1)
                                 Sort Key: aa_t.bb_id
                                 ->  Seq Scan on aa_t  (cost=0.00..25013.97 rows=3454 width=46) (actual time=20.07..2986.11 rows=41879 loops=1)
                                       Filter: (server_id = 21::numeric)
                           ->  Sort  (cost=8079.83..8184.53 rows=41879 width=19) (actual time=1018.95..1049.37 rows=41879 loops=1)
                                 Sort Key: bb_t.bb_id
                                 ->  Seq Scan on bb_t  (cost=0.00..4864.79 rows=41879 width=19) (actual time=0.04..810.88 rows=41879 loops=1)
 Total runtime: 5141.22 msec

What I'm not sure about is why it decides to switch modes so "early", i.e. at about 5% of the table size or less. It seems an Index Scan would give better mileage than a Seq Scan for possibly up to 50% of the table in this case. I clearly don't understand the internals, but the elbow seems rather sharp to me.

> > What if the ORDER BY was:
> >     ORDER BY aa_t.server_id DESC, cc_t.name ASC
> > Would the planner do the right thing?
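One way to probe where the planner's elbow really belongs is to disable sequential scans for a session and compare the EXPLAIN ANALYZE timings of the forced index-scan plan against the default plan. enable_seqscan is a real PostgreSQL planner setting; the query shape below is a simplified stand-in for the thread's actual three-way join:

```sql
-- Discourage seq scans so the planner picks the index plan past the
-- elbow, then compare actual times against the default plan.
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT aa_t.min_date_time
FROM aa_t
WHERE aa_t.server_id = 21
ORDER BY aa_t.min_date_time
LIMIT 1;

-- Restore the default for the rest of the session.
SET enable_seqscan = on;
```

If the forced index plan is consistently faster than what the planner chooses, that usually points at cost parameters (such as random_page_cost) being poorly matched to the hardware, rather than at the plan shapes themselves.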
> What do you consider the right thing?  cc_t.name doesn't seem connected
> to this table at all --- or did I miss something?

Sorry, this is a red herring. Please ignore.

> If you've been generically using NUMERIC(n) where you could be using
> integer or bigint, then I think you've probably paid a high price
> without knowing it.  I don't know what Oracle's cost tradeoffs are for
> these datatypes, but I can tell you that Postgres's integer types are
> way faster (and more compact) than our NUMERIC.

I'll try to figure out what the price is in our case. I think Oracle does a pretty good job of compressing NUMERIC data. I haven't dealt with a large Postgres database until this one, so I guess it's time to learn. :)

We've actually been quite pleased with Postgres's performance without paying much attention to it before this. When we first set up Oracle, we got into all of its parameters pretty heavily. With Postgres, we just tried it and it worked. This is the first query where we ran out of ideas to try.

BTW, everybody's help on this list is fantastic. Usually I can find the answer to my question on this list without asking (and have been doing so for three years). Thanks,

Rob
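For anyone weighing Tom's NUMERIC-vs-integer advice: on PostgreSQL servers of this thread's vintage there was no ALTER TABLE ... TYPE, so converting a column meant adding a replacement, copying the data, and swapping names. A hypothetical sketch, reusing the thread's aa_t.server_id column:

```sql
-- Hypothetical NUMERIC -> integer migration, done the pre-8.0 way
-- (add a new column, copy, drop the old one, rename).
ALTER TABLE aa_t ADD COLUMN server_id_int integer;
UPDATE aa_t SET server_id_int = server_id;
ALTER TABLE aa_t DROP COLUMN server_id;
ALTER TABLE aa_t RENAME COLUMN server_id_int TO server_id;
```

Any indexes or foreign-key constraints on the old column would need to be recreated afterwards, and the copy rewrites every row, so this is best done in a maintenance window.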