Re: PoC: Partial sort - Mailing list pgsql-hackers
From | Marti Raudsepp |
---|---|
Subject | Re: PoC: Partial sort |
Date | |
Msg-id | CABRT9RCLLUyJ=bkeB132aVA_mVNx5==LvVvQMvUqDguFZtW+cg@mail.gmail.com Whole thread Raw |
In response to | Re: PoC: Partial sort (Alexander Korotkov <aekorotkov@gmail.com>) |
Responses |
Re: PoC: Partial sort
|
List | pgsql-hackers |
On Tue, Jan 14, 2014 at 5:49 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>> I implemented a new
>> enable_partialsort GUC to make it easier to turn on/off
> I though about such option. Generally not because of testing convenience,
> but because of overhead of planning. This way you implement it is quite
> naive :) For instance, merge join rely on partial sort which will be
> replaced with simple sort.
Oh, this actually highlights a performance regression with the partial sort patch. I assumed the planner will discard the full sort because of higher costs, but it looks like the new code always assumes that a Partial sort will be cheaper than a Join Filter without considering costs. When doing a join USING (unique_indexed_value, something), the new plan is significantly worse.
Unpatched:
marti=# explain analyze select * from release a join release b using (id, name);
Merge Join (cost=0.85..179810.75 rows=12 width=158) (actual time=0.011..1279.596 rows=1232610 loops=1)
Merge Cond: (a.id = b.id)
Join Filter: ((a.name)::text = (b.name)::text)
-> Index Scan using release_id_idx on release a (cost=0.43..79120.04 rows=1232610 width=92) (actual time=0.005..211.928 rows=1232610 loops=1)
-> Index Scan using release_id_idx on release b (cost=0.43..79120.04 rows=1232610 width=92) (actual time=0.004..371.592 rows=1232610 loops=1)
Total runtime: 1309.049 ms
Patched:>> I implemented a new
>> enable_partialsort GUC to make it easier to turn on/off
> I though about such option. Generally not because of testing convenience,
> but because of overhead of planning. This way you implement it is quite
> naive :) For instance, merge join rely on partial sort which will be
> replaced with simple sort.
Oh, this actually highlights a performance regression with the partial sort patch. I assumed the planner will discard the full sort because of higher costs, but it looks like the new code always assumes that a Partial sort will be cheaper than a Join Filter without considering costs. When doing a join USING (unique_indexed_value, something), the new plan is significantly worse.
Unpatched:
marti=# explain analyze select * from release a join release b using (id, name);
Merge Join (cost=0.85..179810.75 rows=12 width=158) (actual time=0.011..1279.596 rows=1232610 loops=1)
Merge Cond: (a.id = b.id)
Join Filter: ((a.name)::text = (b.name)::text)
-> Index Scan using release_id_idx on release a (cost=0.43..79120.04 rows=1232610 width=92) (actual time=0.005..211.928 rows=1232610 loops=1)
-> Index Scan using release_id_idx on release b (cost=0.43..79120.04 rows=1232610 width=92) (actual time=0.004..371.592 rows=1232610 loops=1)
Total runtime: 1309.049 ms
Merge Join (cost=0.98..179810.87 rows=12 width=158) (actual time=0.037..5034.158 rows=1232610 loops=1)
Merge Cond: ((a.id = b.id) AND ((a.name)::text = (b.name)::text))
-> Partial sort (cost=0.49..82201.56 rows=1232610 width=92) (actual time=0.013..955.938 rows=1232610 loops=1)
Sort Key: a.id, a.name
Presorted Key: a.id
Sort Method: quicksort Memory: 25kB
-> Index Scan using release_id_idx on release a (cost=0.43..79120.04 rows=1232610 width=92) (actual time=0.007..449.332 rows=1232610 loops=1)
-> Materialize (cost=0.49..85283.09 rows=1232610 width=92) (actual time=0.019..1352.377 rows=1232610 loops=1)
-> Partial sort (cost=0.49..82201.56 rows=1232610 width=92) (actual time=0.018..1223.251 rows=1232610 loops=1)
Sort Key: b.id, b.name
Presorted Key: b.id
Sort Method: quicksort Memory: 25kB
-> Index Scan using release_id_idx on release b (cost=0.43..79120.04 rows=1232610 width=92) (actual time=0.004..597.258 rows=1232610 loops=1)
Total runtime: 5166.906 ms
----
Also, I find the name "partial sort" a bit confusing; this feature is not actually sorting *partially*, it's finishing the sort of partially-sorted data. Perhaps "batched sort" would explain the feature better? Because it does the sort in multiple batches instead of all at once. But maybe that's just me.
Regards,
Marti
Marti
pgsql-hackers by date: