Re: Query is slow when order by and limit clause are used in the query - Mailing list pgsql-bugs
From: Tomas Vondra
Subject: Re: Query is slow when order by and limit clause are used in the query
Date:
Msg-id: a8615d56-7e9d-8362-e61e-114fa92f1c02@enterprisedb.com
In response to: Re: Query is slow when order by and limit clause are used in the query (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses: Re: Query is slow when order by and limit clause are used in the query
List: pgsql-bugs
On 5/24/21 2:28 PM, Bharath Rupireddy wrote:
> On Mon, May 24, 2021 at 5:01 PM sreekanth vajrapu
> <sreekanthvajrapu@gmail.com> wrote:
>>
>> Hi Bharath,
>>
>> Thanks for the quick reply. I have attached the execution plans for
>> the 3 scenarios below. Our application is using the 1st scenario.
>>
>> 1) WITH ORDER BY AND LIMIT 30  -- Very slow (3 to 5 seconds)
>> 2) WITH ORDER BY WITHOUT LIMIT -- Very fast (160 ms)
>> 3) WITH ORDER BY AND LIMIT 100 -- Very fast (160 ms)
>>
>> Kindly let me know if you need any more details on this.
>
> I see that there is a huge number of Heap Fetches: 599354 with the
> LIMIT 30 clause vs Heap Fetches: 11897 without the LIMIT clause; maybe
> that could be the reason for the slowness. I'm not sure why this is
> happening with the LIMIT 30 clause only. Does this issue happen every
> time? Say, if you run with LIMIT 30, the query finishes in 3-5 sec,
> and if you immediately run without a LIMIT clause, the query completes
> in 160 ms. Is vacuum running successfully on the tables and indexes
> for which there's a huge number of heap fetches?

But the heap fetches are in the index-only part of the plan, and their
number matches the number of loops for that node. There are 521996 loops
and 599354 heap fetches, so roughly 1:1, i.e. roughly the same ratio as
for the faster plans. (If you want to double-check the vacuum /
visibility-map angle anyway, there's a quick sketch at the end of this
message.)

It's hard to give advice when we only have the plans, not the original
queries and index definitions, and when the plans are "anonymized" in a
rather inconsistent way (hard to say how the tables/indexes match
between the queries).

If I had to guess, I'd say this is a case of the usual LIMIT problem,
where the optimizer assumes the matching rows are uniformly distributed
in the input relation, when in reality they're "concentrated" at the
end. For example, imagine you have a table with two random columns:

  CREATE TABLE t (a int, b int);

  INSERT INTO t SELECT 100 * random(), 100 * random()
    FROM generate_series(1,1000000) g;

  CREATE INDEX ON t (a);

Now imagine you do this:

  SELECT * FROM t WHERE a = 10 AND b = 20 LIMIT 1;

In this case there are ~10000 rows with a=10, and we can fetch those
from the index. And we know there are ~100 groups with different b
values, distributed uniformly in the 10k rows. So if we start scanning
the index, after ~100 rows we should get a value with b=20, in which
case LIMIT 1 is done.

Now imagine the values are not distributed uniformly like this, but
instead we have this:

  INSERT INTO t SELECT i/10000, i/10000
    FROM generate_series(1,1000000) s(i);

In this case the DB will still believe it'll only scan ~100 rows, but
there are no rows with a=10 and b=20, so it'll end up scanning all 10k
rows with a=10. Or maybe all the "b=20" rows are at the very end of the
index, or something like that. Perhaps the "deleted" column makes it
behave like this.

This is only an issue for queries with LIMIT, because that pushes the
planner to pick a plan with very low startup cost. But those plans often
have very high total cost, and degrade poorly when the uniformity
assumption is incorrect. (An EXPLAIN ANALYZE sketch showing this on the
example above is at the end of this message.)

Hard to say, though; confirming it would require looking at the data
more closely. The one thing I'd suggest is changing the xxxx_index to
also include the "deleted" column (sketched at the end of this message),
but it's a stab in the dark.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
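To double-check the vacuum / visibility-map angle Bharath raised — a
rough sketch, with "my_table" standing in for the real (anonymized)
table name:

  -- heap fetches in an index-only scan happen for pages not marked
  -- all-visible; VACUUM is what sets those visibility-map bits
  VACUUM (VERBOSE) my_table;

  -- compare total pages vs. pages marked all-visible
  SELECT relpages, relallvisible
    FROM pg_class
   WHERE relname = 'my_table';

If relallvisible is far below relpages, the index-only scan has to visit
the heap for most rows.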
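To see the estimation problem on the toy example above, a minimal
EXPLAIN ANALYZE sketch (exact estimates and timings will vary):

  -- build the skewed variant of the example table
  CREATE TABLE t (a int, b int);

  INSERT INTO t SELECT i/10000, i/10000
    FROM generate_series(1,1000000) s(i);

  CREATE INDEX ON t (a);
  ANALYZE t;

  -- the planner expects ~100 matching rows spread uniformly, but every
  -- row with a=10 has b=10, so the scan walks all ~10k of them
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT * FROM t WHERE a = 10 AND b = 20 LIMIT 1;

Comparing the estimated and actual row counts in the output shows where
the uniformity assumption breaks down.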
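And the index suggestion as a sketch — the real table and index
definitions are anonymized, so the names here are placeholders, and
this assumes "deleted" is a boolean:

  -- extend the index so the "deleted" condition can be checked in the
  -- index itself, instead of fetching and then discarding heap rows
  CREATE INDEX CONCURRENTLY my_table_a_deleted_idx
    ON my_table (a, deleted);

  -- or, if queries only ever touch non-deleted rows, a partial index
  CREATE INDEX CONCURRENTLY my_table_a_live_idx
    ON my_table (a) WHERE NOT deleted;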