Re: DISTINCT with btree skip scan - Mailing list pgsql-hackers
From: Thomas Munro
Subject: Re: DISTINCT with btree skip scan
Msg-id: CADLWmXWALK8NPZqdnRQiPnrzAnic7NxYKynrkzO_vxYr8enWww@mail.gmail.com
In response to: Re: DISTINCT with btree skip scan (David Rowley <dgrowleyml@gmail.com>)
Responses: Re: DISTINCT with btree skip scan
List: pgsql-hackers
On 27 October 2014 20:24, David Rowley <dgrowleyml@gmail.com> wrote:
> I've had a quick look at this and it seems like a great win! I'm quite
> surprised that we've not got this already. I think this technology could
> also really help performance of queries such as SELECT * from bigtable bt
> WHERE EXISTS(SELECT 1 FROM otherbigtable obt WHERE bt.somecol =
> obt.someIndexedButNonUniqueColumn); I know you're not proposing to improve
> that first off, but it could be done later once the benefits of this are
> more commonly realised.
>
> I think our shortfalls in this area have not gone unnoticed. I was reading
> this post
> https://www.periscope.io/blog/count-distinct-in-mysql-postgres-sql-server-and-oracle.html
> about comparing performance of COUNT(DISTINCT col). I think this would give
> a big win for query 3 in that post. I'm trying to find some time to make
> some changes to transform queries to allow the group by to happen before
> the joins when possible, so between that and your patch we'd be 2 steps
> closer to making query 1 in the link above perform a little better on
> PostgreSQL.
>
> Do you think you'll manage to get time to look at this a bit more? I'd be
> keen to look into the costing side of it if you think that'll help you any?

Hi David,

Sorry for the silence; I have been busy moving countries. I am definitely
interested in collaborating on a series of patches to implement various
kinds of skip-based plans, as seen in other RDBMSs, if others think they
could be useful. I see skip-based DISTINCT as a good place to start. (I
suspect the semi-related skip scan plan (for the case when your WHERE
clause doesn't have a qual for the leading column(s)) is much harder to
plan and execute, and I also suspect it's less generally useful.)

Here is a rebased version of that patch, which fixes a crashing
off-by-one error and, I think, handles backward scans properly. This is
still just a quick hack to play with the general idea at this stage.

It works by adding a new index operation, 'skip', which the executor code
can use during a scan to advance to the next distinct value (for some
prefix of the index's columns). That may be a terrible idea and totally
unnecessary... but let me explain my reasoning:

1. Perhaps some index implementations can do something better than a
search for the next key value from the root. Is it possible or desirable
to use the current position as a starting point for a btree traversal?
I don't know.

2. It seemed that I'd need to create a new search ScanKey to use the
'rescan' interface for skipping to the next value, but I already had an
insertion ScanKey, so I wanted a way to just reuse that. But maybe there
is some other way to reuse the existing index interfaces, or maybe there
is an easy way to make a new search ScanKey from the existing insertion
ScanKey?

Currently the patch uses the existing IndexOnlyScan plan node, adding an
extra member. I briefly tried making a different plan node called
DistinctIndexScan, but it seemed I was going to need to duplicate or
refactor/share a lot of IndexOnlyScan code (that might be the right thing
to do, but at this stage I'm just trying to demo the general idea with
minimal changes).

As for costing, I haven't looked at that part of PG at all yet. If you
*can* do a distinct index scan, would you *always* want to? How well
could we estimate the cost using existing statistics? I suppose the
expected number of page fetches is proportional to the expected number of
distinct values (ie skip operations) times btree depth, and the cost may
need to be adjusted in some way for cache effects (?), but how well can
we estimate the number of distinct values of (a), (a, b) or (a, b, c) in
some range for an index on (a, b, c, d)?
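To make that guess concrete, a first cut might look something like the
sketch below. This is only an illustration of the idea, not code from the
patch: the function and parameter names are hypothetical, and it assumes
we can somehow get an ndistinct estimate for the key prefix.

typedef double Cost;    /* PostgreSQL defines Cost as double in nodes/nodes.h */

/*
 * Hypothetical sketch, not from the patch: cost a distinct index scan
 * as one root-to-leaf descent per distinct value of the key prefix.
 * Deliberately ignores cache effects, even though the upper btree
 * levels would almost always be in memory in practice.
 */
static Cost
distinct_index_scan_cost(double ndistinct,      /* distinct prefix values */
                         int btree_depth,       /* levels, root to leaf */
                         Cost random_page_cost,
                         Cost cpu_index_tuple_cost)
{
    double  pages_fetched = ndistinct * btree_depth;

    return pages_fetched * random_page_cost +
           ndistinct * cpu_index_tuple_cost;
}

The interesting input is ndistinct over a range of a prefix of the index
columns, which is exactly the estimate I'm unsure about above.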
As before, you still need the kludge of disabling hashagg to reach the
new plan:

postgres=# set enable_hashagg = true;
SET
Time: 0.213 ms
postgres=# select distinct a from foo;
┌───┐
│ a │
├───┤
│ 6 │
│ 8 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
│ 9 │
│ 7 │
│ 0 │
└───┘
(10 rows)

Time: 890.166 ms
postgres=# set enable_hashagg = false;
SET
Time: 0.271 ms
postgres=# select distinct a from foo;
┌───┐
│ a │
├───┤
│ 0 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
│ 6 │
│ 7 │
│ 8 │
│ 9 │
└───┘
(10 rows)

Time: 0.573 ms

Best regards,
Thomas Munro
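PS: For anyone who wants the shape of the idea without reading the patch,
here is roughly how a scan loop could drive the 'skip' operation. All of
the names here (IndexCursor, cursor_next, cursor_skip, emit) are
illustrative stand-ins, not the real executor or index AM interfaces.

#include <stdbool.h>

/* Hypothetical stand-ins for the real scan interfaces. */
typedef struct IndexCursor IndexCursor;
extern bool cursor_next(IndexCursor *cur);              /* advance one tuple */
extern bool cursor_skip(IndexCursor *cur, int prefix);  /* past current prefix */
extern void emit(IndexCursor *cur);                     /* return current tuple */

/*
 * Sketch of a DISTINCT scan: emit the first tuple of each group of
 * duplicates, then ask the index to reposition past the rest of the
 * group in one operation instead of fetching and discarding them.
 */
static void
distinct_scan(IndexCursor *cur, int prefix_cols)
{
    if (!cursor_next(cur))
        return;                 /* empty index */
    for (;;)
    {
        emit(cur);
        if (!cursor_skip(cur, prefix_cols))
            break;              /* no more distinct prefixes */
    }
}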