Re: Inconsistent query performance based on relation hit frequency - Mailing list pgsql-performance
From | Andrei Lepikhov |
---|---|
Subject | Re: Inconsistent query performance based on relation hit frequency |
Date | |
Msg-id | a67e439f-15f9-4ecc-85f0-30679e3d35bb@gmail.com Whole thread Raw |
In response to | Inconsistent query performance based on relation hit frequency (Laura Hausmann <laura@hausmann.dev>) |
Responses |
Re: Inconsistent query performance based on relation hit frequency
|
List | pgsql-performance |
On 6/27/24 07:50, Laura Hausmann wrote: > I'd appreciate any and all input on the situation. If I've left out any > information that would be useful in figuring this out, please tell me. Thanks for this curious case, I like it! At first, you can try to avoid "OR" expressions - PostgreSQL has quite limited set of optimisation/prediction tricks on such expressions. Second - I see, postgres predicts wrong number of tuples. But using my typical tool [1] and getting more precise estimations i don't see significant profit: Limit (cost=10832.85..10838.69 rows=50 width=21) -> Gather Merge (cost=10832.85..10838.92 rows=52 width=21) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=9832.83..9832.90 rows=26 width=21) Sort Key: objects.id DESC Sort Method: top-N heapsort Memory: 32kB Worker 0: Sort Method: quicksort Memory: 32kB Worker 1: Sort Method: quicksort Memory: 32kB -> Parallel Seq Scan on objects Filter: ((hashed SubPlan 1) OR ("userId" = 1)) Rows Removed by Filter: 183372 SubPlan 1 -> Nested Loop -> Index Only Scan using users_pkey on Index Cond: (id = 1) Heap Fetches: 0 -> Index Only Scan using "relationships_followerId_followeeId_idx" on relationships Index Cond: ("followerId" = 1) Heap Fetches: 0 Planning Time: 0.762 ms Execution Time: 43.816 ms Limit (cost=10818.83..10819.07 rows=2 width=21) -> Gather Merge (cost=10818.83..10819.07 rows=2 width=21) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=9818.81..9818.81 rows=1 width=21) Sort Key: objects.id DESC Sort Method: quicksort Memory: 25kB Worker 0: Sort Method: quicksort Memory: 25kB Worker 1: Sort Method: quicksort Memory: 25kB -> Parallel Seq Scan on objects Filter: ((hashed SubPlan 1) OR ("userId" = 4)) Rows Removed by Filter: 183477 SubPlan 1 -> Nested Loop (cost=0.56..8.61 rows=1 width=4) -> Index Only Scan using "relationships_followerId_followeeId_idx" on relationships Index Cond: ("followerId" = 4) Heap Fetches: 0 -> Index Only Scan using users_pkey Index Cond: (id = 4) Heap Fetches: 0 Planning Time: 0.646 ms Execution Time: 30.824 ms But this was achieved just because of parallel workers utilisation. Disabling them we get: Limit (cost=14635.07..14635.08 rows=2 width=21) (actual time=75.941..75.943 rows=0 loops=1) -> Sort (cost=14635.07..14635.08 rows=2 width=21) (actual time=75.939..75.940 rows=0 loops=1) Sort Key: objects.id DESC Sort Method: quicksort Memory: 25kB -> Seq Scan on objects (cost=8.61..14635.06 rows=2 width=21) (actual time=75.931..75.932 rows=0 loops=1) Filter: ((hashed SubPlan 1) OR ("userId" = 4)) Rows Removed by Filter: 550430 SubPlan 1 -> Nested Loop (cost=0.56..8.61 rows=1 width=4) (actual time=0.039..0.040 rows=0 loops=1) -> Index Only Scan using "relationships_followerId_followeeId_idx" on relationships (cost=0.28..4.29 rows=1 width=8) (actual time=0.038..0.038 rows=0 loops=1) Index Cond: ("followerId" = 4) Heap Fetches: 0 -> Index Only Scan using users_pkey on users (cost=0.29..4.31 rows=1 width=4) (never executed) Index Cond: (id = 4) Heap Fetches: 0 Planning Time: 0.945 ms Execution Time: 76.123 ms So, from the optimiser's point of view, it has done the best it could. Theoretically, if you have a big table with indexes and must select a small number of tuples, the ideal query plan will include parameterised NestLoop JOINs. Unfortunately, parameterisation in PostgreSQL can't pass inside a subquery. It could be a reason for new development because MSSQL can do such a trick, but it is a long way. You can try to rewrite your schema and query to avoid subqueries in expressions at all. I hope this message gave you some insights. [1] https://github.com/postgrespro/aqo -- regards, Andrei Lepikhov
pgsql-performance by date: