Home > mailing lists

Re: Speeding up parts of the planner using a binary search treestructure for nodes - Mailing list pgsql-hackers

From	David Rowley
Subject	Re: Speeding up parts of the planner using a binary search treestructure for nodes
Date	June 1, 2020 21:42:51
Msg-id	CAApHDvqDvcpkmfWQLn42ACRjG-Kxcn1tvqeNyQ3ej3Xf_5Bp=w@mail.gmail.com Whole thread Raw
In response to	Re: Speeding up parts of the planner using a binary search treestructure for nodes (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Responses	Re: Speeding up parts of the planner using a binary search treestructure for nodes
List	pgsql-hackers

Tree view

On Sat, 30 May 2020 at 01:52, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Fri, May 29, 2020 at 10:47 AM David Rowley <dgrowleyml@gmail.com> wrote:
> > In [1] I mentioned that I'd like to look into coding a data structure
> > to allow Node types to be looked up more efficiently than what List
> > allows.  There are many places in the planner where we must determine
> > if a given Node is in some List of Nodes. This is an O(n) operation.
> > We could reduce these to as low as O(log n) with a binary search tree.
> >
> > A few places in the code that does this which can become a problem
> > when the lists being searched are large:
> >
> > get_eclass_for_sort_expr()
>
> eclass members have relids as their key. Can we use hash table instead
> like how we manage join relations? I know that there are other cases
> where we search for subset of relids instead of exact match. So the
> hash table has to account for that. If we could somehow create a hash
> value out of an expression node (we almost hash anything so that
> should be possible) we should be able to use hash table for searching
> expression as well.

This certainly could be done with hash tables. However, I feel it
raises the bar a little as it means we either need to maintain a hash
function for each node type or do some pre-processor magic in
equalfuncs.c to adjust the comparison macros into hashing macros to
allow it to be compiled again to implement hash functions.  If
everyone feels that's an okay thing to go and do then perhaps hash
tables are a better option.  I was just trying to keep the bar to some
level that I thought might be acceptable to the community.

It does seem likely to me that hash tables would be a more efficient
structure, but the gains might not be as big as you think. To perform
a lookup in the table we need to hash the entire node to get the hash
value, then perform at least one equal comparison.  With the Binary
Search Lists, we'd need more comparisons, but those can be partial
comparisons as they can abort early when we find the first mismatching
field in the Node. When the number of items in the list is fairly
small that might actually be less work, especially when the Node type
varies (since that's the first field we compare). Hash tables are
likely to become more efficient when the number of items is larger.
The general case is that we'll just have a small number of items. I'd
like to improve the less common situation when the number of items is
large with minimal impact for when the number of items is small.

> > tlist_member()
>
> hash table by expression as key here as well?

The order of the tlist does matter in many cases. I'm unsure how we
could track the order that items were added to the hash table and then
obtain the items back in that order in an efficient way. I imagine
we'd need to store the item in some subsequent data structure, such as
a List. There's obviously going to be some overhead to doing that.
The idea with the Binary Search Lists was that items would be stored
in an array, similar to List, but the cells of the list would contain
the offset index to its parent and left and right child.  New items
would always take the next free slot in the list, just like List does
so we'd easily be able to get the insert order by looping over the
array of elements in order.

David

> > [1] https://www.postgresql.org/message-id/CAKJS1f8v-fUG8YpaAGj309ZuALo3aEk7f6cqMHr_AVwz1fKXug%40mail.gmail.com

pgsql-hackers by date:

From: Tom Lane
Date: 01 June 2020, 19:20:21
Subject: Re: Just for fun: Postgres 20?

From: Etsuro Fujita
Date: 02 June 2020, 00:02:10
Subject: Re: [POC] Fast COPY FROM command for the table with foreign partitions

Re: Speeding up parts of the planner using a binary search treestructure for nodes - Mailing list pgsql-hackers

Previous

Next