Re: Speeding up parts of the planner using a binary search treestructure for nodes - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Speeding up parts of the planner using a binary search treestructure for nodes |
Date | |
Msg-id | CAApHDvqDvcpkmfWQLn42ACRjG-Kxcn1tvqeNyQ3ej3Xf_5Bp=w@mail.gmail.com Whole thread Raw |
In response to | Re: Speeding up parts of the planner using a binary search treestructure for nodes (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>) |
Responses |
Re: Speeding up parts of the planner using a binary search treestructure for nodes
|
List | pgsql-hackers |
On Sat, 30 May 2020 at 01:52, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Fri, May 29, 2020 at 10:47 AM David Rowley <dgrowleyml@gmail.com> wrote: > > In [1] I mentioned that I'd like to look into coding a data structure > > to allow Node types to be looked up more efficiently than what List > > allows. There are many places in the planner where we must determine > > if a given Node is in some List of Nodes. This is an O(n) operation. > > We could reduce these to as low as O(log n) with a binary search tree. > > > > A few places in the code that does this which can become a problem > > when the lists being searched are large: > > > > get_eclass_for_sort_expr() > > eclass members have relids as their key. Can we use hash table instead > like how we manage join relations? I know that there are other cases > where we search for subset of relids instead of exact match. So the > hash table has to account for that. If we could somehow create a hash > value out of an expression node (we almost hash anything so that > should be possible) we should be able to use hash table for searching > expression as well. This certainly could be done with hash tables. However, I feel it raises the bar a little as it means we either need to maintain a hash function for each node type or do some pre-processor magic in equalfuncs.c to adjust the comparison macros into hashing macros to allow it to be compiled again to implement hash functions. If everyone feels that's an okay thing to go and do then perhaps hash tables are a better option. I was just trying to keep the bar to some level that I thought might be acceptable to the community. It does seem likely to me that hash tables would be a more efficient structure, but the gains might not be as big as you think. To perform a lookup in the table we need to hash the entire node to get the hash value, then perform at least one equal comparison. With the Binary Search Lists, we'd need more comparisons, but those can be partial comparisons as they can abort early when we find the first mismatching field in the Node. When the number of items in the list is fairly small that might actually be less work, especially when the Node type varies (since that's the first field we compare). Hash tables are likely to become more efficient when the number of items is larger. The general case is that we'll just have a small number of items. I'd like to improve the less common situation when the number of items is large with minimal impact for when the number of items is small. > > tlist_member() > > hash table by expression as key here as well? The order of the tlist does matter in many cases. I'm unsure how we could track the order that items were added to the hash table and then obtain the items back in that order in an efficient way. I imagine we'd need to store the item in some subsequent data structure, such as a List. There's obviously going to be some overhead to doing that. The idea with the Binary Search Lists was that items would be stored in an array, similar to List, but the cells of the list would contain the offset index to its parent and left and right child. New items would always take the next free slot in the list, just like List does so we'd easily be able to get the insert order by looping over the array of elements in order. David > > [1] https://www.postgresql.org/message-id/CAKJS1f8v-fUG8YpaAGj309ZuALo3aEk7f6cqMHr_AVwz1fKXug%40mail.gmail.com
pgsql-hackers by date: