Thread: Why is a sort required for this query? (IS NULL predicate on leading key column)
Why is a sort required for this query? (IS NULL predicate on leading key column)
From
Jerry Brenner
Date:
We are on 13.9.
SELECT /* ISNULL:pc_job.CloseDate:, KeyTable:pc_job; */ gRoot.ID col0, gRoot.Subtype col1, gRoot.CreateTime col2
I'm wondering why a sort is required for this query, as the index should be providing the required ordering to satisfy the ORDER BY clause. Does it have to do with the IS NULL predicate on the leading key column in the index?
There's an index, job_u_closedate_g9cdc6ghupib, on pc_job(CloseDate, Retired, Subtype, CreateTime, ID). All columns have ASC sort order and NULLs LAST.
- pc_job is the probe table in a hash join
- There are IS NULL and equality predicates on the 3 leading columns in the index and the last 2 key columns (CreateTime, ID) are the ordering columns in the query
- So, the Index Scan of job_u_closedate_g9cdc6ghupib is returning the rows in the sorted order
- NOTE: The sort is cheap, but I'm investigating this because "CloseDate IS NULL" is very selective and without forcing the index the optimizer is choosing a different sort avert index that does not include CloseDate and hence a lot of time is spent filtering out rows on that predicate against the heap.
Here's the query
FROM pc_job gRoot INNER JOIN pc_policy policy_0
ON policy_0.ID = gRoot.PolicyID
WHERE gRoot.Subtype = 7 AND gRoot.CloseDate IS NULL
ON policy_0.ID = gRoot.PolicyID
WHERE gRoot.Subtype = 7 AND gRoot.CloseDate IS NULL
AND gRoot.Retired = 0
AND policy_0.ProducerCodeOfServiceID IN
AND policy_0.ProducerCodeOfServiceID IN
(248,1092,1848,74101,103158,103159,117402,122618,129215,132420,135261,137719)
AND policy_0.Retired = 0
AND policy_0.Retired = 0
ORDER BY col2 ASC, col0 ASC LIMIT 10
Here's the query plan:
Limit (cost=107826.77..107826.79 rows=10 width=20) (actual time=13149.872..13149.877 rows=10 loops=1)
Buffers: shared hit=2756 read=40121
I/O Timings: read=105917.908
-> Sort (cost=107826.77..107827.72 rows=381 width=20) (actual time=13149.871..13149.874 rows=10 loops=1)
Sort Key: groot.createtime, groot.id
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=2756 read=40121
I/O Timings: read=105917.908
-> Hash Join (cost=15632.51..107818.53 rows=381 width=20) (actual time=578.511..13149.658 rows=144 loops=1)
Buffers: shared hit=2750 read=40121
I/O Timings: read=105917.908
-> Index Scan using job_u_closedate_g9cdc6ghupib on pc_job groot (cost=0.56..91783.14 rows=153696 width=28) (actual time=3.864..12562.568 rows=75558 loops=1)
Index Cond: ((groot.closedate IS NULL) AND (groot.retired = 0) AND (groot.subtype = 7))
Buffers: shared hit=2721 read=27934
I/O Timings: read=58781.220
-> Hash (cost=15427.92..15427.92 rows=16322 width=8) (actual time=543.298..543.299 rows=13016 loops=1)
Buffers: shared hit=29 read=12187
I/O Timings: read=47136.688
-> Index Scan using policy_n_producerco_3e8i0ojsyckhx on pc_policy policy_0 (cost=0.43..15427.92 rows=16322 width=8) (actual time=6.149..540.501 rows=13016 loops=1)
Index Cond: ((policy_0.producercodeofserviceid = ANY ('{248,1092,1848,74101,103158,103159,117402,122618,129215,132420,135261,137719}'::bigint[])) AND (policy_0.retired = 0))
Buffers: shared hit=29 read=12187
I/O Timings: read=47136.688
Planning time: 0.538 ms
Execution time: 13150.301 ms
Thanks,
Jerry
Re: Why is a sort required for this query? (IS NULL predicate on leading key column)
From
Jerry Brenner
Date:
Apologies for not including this in the original email. The other index, job_u_createtime_2cy0wgyqpani8, is on pc_job(CreateTime, Retired, Subtype, ID). The optimizer chooses Nested Loop when choosing that index, vs Hash Join when choosing the index in the first plan that I posted. It seems like the choice of the Hash Join in the 1st plan that I posted is collateral damage from the seemingly unnecessary need to do the sort.
Here's the plan without forcing the index:
Limit (cost=1.00..52692.73 rows=10 width=20) (actual time=55219.289..87704.704 rows=10 loops=1)
Buffers: shared hit=9579294 read=328583
I/O Timings: read=1157740.299
-> Nested Loop (cost=1.00..2007555.82 rows=381 width=20) (actual time=55219.288..87704.695 rows=10 loops=1)
Buffers: shared hit=9579294 read=328583
I/O Timings: read=1157740.299
-> Index Scan using job_u_createtime_2cy0wgyqpani8 on pc_job groot (cost=0.56..1800117.94 rows=153696 width=28) (actual time=102.075..79470.670 rows=5650 loops=1)
Index Cond: ((groot.retired = 0) AND (groot.subtype = 7))
Filter: (groot.closedate IS NULL)
Rows Removed by Filter: 14994857
Buffers: shared hit=9563981 read=321566
I/O Timings: read=1149579.949
-> Index Scan using pc_policy_pk on pc_policy policy_0 (cost=0.43..1.35 rows=1 width=8) (actual time=1.456..1.456 rows=0 loops=5650)
Index Cond: (policy_0.id = groot.policyid)
Filter: ((policy_0.retired = 0) AND (policy_0.producercodeofserviceid = ANY ('{248,1092,1848,74101,103158,103159,117402,122618,129215,132420,135261,137719}'::bigint[])))
Rows Removed by Filter: 1
Buffers: shared hit=15313 read=7017
I/O Timings: read=8160.350
Planning time: 2.209 ms
Execution time: 87705.116 ms
Buffers: shared hit=9579294 read=328583
I/O Timings: read=1157740.299
-> Nested Loop (cost=1.00..2007555.82 rows=381 width=20) (actual time=55219.288..87704.695 rows=10 loops=1)
Buffers: shared hit=9579294 read=328583
I/O Timings: read=1157740.299
-> Index Scan using job_u_createtime_2cy0wgyqpani8 on pc_job groot (cost=0.56..1800117.94 rows=153696 width=28) (actual time=102.075..79470.670 rows=5650 loops=1)
Index Cond: ((groot.retired = 0) AND (groot.subtype = 7))
Filter: (groot.closedate IS NULL)
Rows Removed by Filter: 14994857
Buffers: shared hit=9563981 read=321566
I/O Timings: read=1149579.949
-> Index Scan using pc_policy_pk on pc_policy policy_0 (cost=0.43..1.35 rows=1 width=8) (actual time=1.456..1.456 rows=0 loops=5650)
Index Cond: (policy_0.id = groot.policyid)
Filter: ((policy_0.retired = 0) AND (policy_0.producercodeofserviceid = ANY ('{248,1092,1848,74101,103158,103159,117402,122618,129215,132420,135261,137719}'::bigint[])))
Rows Removed by Filter: 1
Buffers: shared hit=15313 read=7017
I/O Timings: read=8160.350
Planning time: 2.209 ms
Execution time: 87705.116 ms
Thanks,
Jerry
On Wed, Jan 17, 2024 at 6:39 AM Jerry Brenner <jbrenner@guidewire.com> wrote:
We are on 13.9.I'm wondering why a sort is required for this query, as the index should be providing the required ordering to satisfy the ORDER BY clause. Does it have to do with the IS NULL predicate on the leading key column in the index?There's an index, job_u_closedate_g9cdc6ghupib, on pc_job(CloseDate, Retired, Subtype, CreateTime, ID). All columns have ASC sort order and NULLs LAST.
- pc_job is the probe table in a hash join
- There are IS NULL and equality predicates on the 3 leading columns in the index and the last 2 key columns (CreateTime, ID) are the ordering columns in the query
- So, the Index Scan of job_u_closedate_g9cdc6ghupib is returning the rows in the sorted order
- NOTE: The sort is cheap, but I'm investigating this because "CloseDate IS NULL" is very selective and without forcing the index the optimizer is choosing a different sort avert index that does not include CloseDate and hence a lot of time is spent filtering out rows on that predicate against the heap.
SELECT /* ISNULL:pc_job.CloseDate:, KeyTable:pc_job; */ gRoot.ID col0, gRoot.Subtype col1, gRoot.CreateTime col2Here's the queryFROM pc_job gRoot INNER JOIN pc_policy policy_0
ON policy_0.ID = gRoot.PolicyID
WHERE gRoot.Subtype = 7 AND gRoot.CloseDate IS NULLAND gRoot.Retired = 0
AND policy_0.ProducerCodeOfServiceID IN(248,1092,1848,74101,103158,103159,117402,122618,129215,132420,135261,137719)
AND policy_0.Retired = 0ORDER BY col2 ASC, col0 ASC LIMIT 10Here's the query plan:Limit (cost=107826.77..107826.79 rows=10 width=20) (actual time=13149.872..13149.877 rows=10 loops=1)
Buffers: shared hit=2756 read=40121
I/O Timings: read=105917.908
-> Sort (cost=107826.77..107827.72 rows=381 width=20) (actual time=13149.871..13149.874 rows=10 loops=1)
Sort Key: groot.createtime, groot.id
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=2756 read=40121
I/O Timings: read=105917.908
-> Hash Join (cost=15632.51..107818.53 rows=381 width=20) (actual time=578.511..13149.658 rows=144 loops=1)
Buffers: shared hit=2750 read=40121
I/O Timings: read=105917.908
-> Index Scan using job_u_closedate_g9cdc6ghupib on pc_job groot (cost=0.56..91783.14 rows=153696 width=28) (actual time=3.864..12562.568 rows=75558 loops=1)
Index Cond: ((groot.closedate IS NULL) AND (groot.retired = 0) AND (groot.subtype = 7))
Buffers: shared hit=2721 read=27934
I/O Timings: read=58781.220
-> Hash (cost=15427.92..15427.92 rows=16322 width=8) (actual time=543.298..543.299 rows=13016 loops=1)
Buffers: shared hit=29 read=12187
I/O Timings: read=47136.688
-> Index Scan using policy_n_producerco_3e8i0ojsyckhx on pc_policy policy_0 (cost=0.43..15427.92 rows=16322 width=8) (actual time=6.149..540.501 rows=13016 loops=1)
Index Cond: ((policy_0.producercodeofserviceid = ANY ('{248,1092,1848,74101,103158,103159,117402,122618,129215,132420,135261,137719}'::bigint[])) AND (policy_0.retired = 0))
Buffers: shared hit=29 read=12187
I/O Timings: read=47136.688
Planning time: 0.538 ms
Execution time: 13150.301 msThanks,Jerry
Re: Why is a sort required for this query? (IS NULL predicate on leading key column)
From
Tom Lane
Date:
Jerry Brenner <jbrenner@guidewire.com> writes: > I'm wondering why a sort is required for this query, as the index should be > providing the required ordering to satisfy the ORDER BY clause. Does it > have to do with the IS NULL predicate on the leading key column in the > index? IS NULL is not seen as an equality condition, no. It's pretty much of a hack that makes it an indexable condition at all, and we don't really do any advanced optimization with it. regards, tom lane