Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) - Mailing list pgsql-hackers
| From | Andres Freund |
|---|---|
| Subject | Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) |
| Date | 2018-12-26 |
| Msg-id | 20181226180217.qad6ahkkzk4g5poy@alap3.anarazel.de |
| In response to | Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) (Robert Haas <robertmhaas@gmail.com>) |
| Responses | Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) |
| List | pgsql-hackers |
Hi,

On 2018-12-26 11:50:18 -0500, Robert Haas wrote:
> On Wed, Dec 26, 2018 at 11:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I think there's a lot of goalpost-moving going on here. The original
> > idea was to trim the physical size of the data structure, as stated
> > in the thread subject, and just reap whatever cache benefits we got
> > along the way from that. I am dubious that we actually have any
> > performance problem in this code that needs a big dollop of added
> > complexity to fix.
>
> I have seen ScanKeywordLookup show up in profiles quite often and
> fairly prominently -- like several percent of total runtime. I'm not
> trying to impose requirements on John's patch, and I agree that
> reducing the physical size of the structure is a good step whether
> anything else is done or not. However, I don't see that as a reason to
> shut down further discussion of other possible improvements. If his
> patch makes this disappear from profiles, cool, but if it doesn't,
> then sooner or later somebody's going to want to do more.

I agree. And most of the patch would be a pre-requisite for anything
more elaborate anyway.

> FWIW, my bet is this helps but isn't enough to get rid of the problem
> completely. A 9-step binary search has got to be slower than a really
> well-optimized hash table lookup.

Yea, at least with a non-optimized layout. If we'd used a binary search
optimized lookup order it might be different, but probably at best
equivalent to a good hashtable.

> > In my hands, the only part of the low-level parsing code that commonly
> > shows up as interesting in profiles is the Bison engine. That's probably
> > because the grammar tables are circa half a megabyte and blow out cache
> > pretty badly :-(. I don't know of any way to make that better,
> > unfortunately. I suspect that it's just going to get worse, because
> > people keep submitting additions to the grammar.
>
> I'm kinda surprised that you haven't seen ScanKeywordLookup() in
> there, but I agree with you that the size of the main parser tables is
> a real issue, and that there's no easy solution. At various times
> there has been discussion of using some other parser generator, and
> I've also toyed with the idea of writing one specifically for
> PostgreSQL. Unfortunately, it seems like bison is all but
> unmaintained; the alternatives are immature and have limited adoption
> and limited community; and writing something from scratch is a ton of
> work. :-(

My bet is, and has been for quite a while, that we'll have to go for a
hand-written recursive descent type parser. They can be *substantially*
faster, and their performance isn't as affected by the grammar size. And,
about as important, they also allow for a lot more heuristics around
grammar errors - I do think we'll soon have to do better than to throw a
generic syntax error for the cases where the grammar doesn't match at all.

Greetings,

Andres Freund
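For readers following the thread, a minimal, self-contained sketch of the kind of lookup under discussion: case-fold the candidate identifier, then binary-search a table sorted by keyword name. With several hundred keywords, each lookup costs on the order of the nine strcmp() steps Robert mentions. The table contents and function names below are illustrative only, not PostgreSQL's actual kwlist or ScanKeywordLookup() code:

```c
/*
 * Sketch of a binary-search keyword lookup, loosely shaped like the
 * pre-patch ScanKeywordLookup().  Tiny illustrative table, not the
 * real keyword list.
 */
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct ScanKeyword
{
    const char *name;           /* keyword, in lower case */
    int         value;          /* token value */
} ScanKeyword;

/* must stay sorted by name for the binary search to work */
static const ScanKeyword keywords[] = {
    {"and", 1}, {"between", 2}, {"from", 3},
    {"group", 4}, {"select", 5}, {"where", 6},
};

static const ScanKeyword *
keyword_lookup(const char *text)
{
    char        word[64];
    size_t      len = strlen(text);
    int         lo = 0;
    int         hi = (int) (sizeof(keywords) / sizeof(keywords[0])) - 1;

    if (len >= sizeof(word))
        return NULL;            /* too long to be any keyword */

    /* ASCII-only downcasing, including the terminating NUL */
    for (size_t i = 0; i <= len; i++)
    {
        char        ch = text[i];

        word[i] = (ch >= 'A' && ch <= 'Z') ? ch + ('a' - 'A') : ch;
    }

    /* classic binary search: ~log2(N) strcmp() calls per lookup */
    while (lo <= hi)
    {
        int         mid = (lo + hi) / 2;
        int         cmp = strcmp(word, keywords[mid].name);

        if (cmp == 0)
            return &keywords[mid];
        if (cmp < 0)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return NULL;                /* an ordinary identifier */
}

int
main(void)
{
    const ScanKeyword *kw = keyword_lookup("SELECT");

    printf("%s -> %d\n", kw ? kw->name : "not found", kw ? kw->value : -1);
    return 0;
}
```

The strcmp() calls chase pointers into scattered string storage, which is part of what the footprint-reduction patch attacks; a hash-based lookup would replace the whole loop with one hash computation and (ideally) one comparison.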
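To make the recursive-descent point concrete, here is a toy sketch (my own illustration, not code from any PostgreSQL patch): each grammar rule becomes a plain C function, so there are no large state tables to blow out the cache, and at any failure point the parser knows which construct it was inside and can say so, rather than emitting a generic "syntax error":

```c
/*
 * Toy recursive descent parser for:
 *   expr := term (('+' | '-') term)*
 *   term := NUMBER | '(' expr ')'
 * Illustrates rule-per-function structure and targeted error messages.
 */
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

static const char *p;           /* cursor into the input string */

static void
skip_ws(void)
{
    while (isspace((unsigned char) *p))
        p++;
}

static long expr(void);

static long
term(void)
{
    skip_ws();
    if (isdigit((unsigned char) *p))
    {
        char       *end;
        long        v = strtol(p, &end, 10);

        p = end;
        return v;
    }
    if (*p == '(')
    {
        long        v;

        p++;
        v = expr();
        skip_ws();
        if (*p != ')')
        {
            /* the payoff: we know exactly what we were parsing */
            fprintf(stderr, "expected \")\" to close parenthesized expression, found \"%c\"\n",
                    *p ? *p : '?');
            exit(1);
        }
        p++;
        return v;
    }
    fprintf(stderr, "expected a number or \"(\", found \"%c\"\n",
            *p ? *p : '?');
    exit(1);
}

static long
expr(void)
{
    long        v = term();

    for (;;)
    {
        skip_ws();
        if (*p == '+')
        {
            p++;
            v += term();
        }
        else if (*p == '-')
        {
            p++;
            v -= term();
        }
        else
            return v;
    }
}

int
main(void)
{
    p = "1 + (2 - 3)";
    printf("%ld\n", expr());    /* prints 0 */
    return 0;
}
```

Scaling this shape up to the SQL grammar is of course the hard part; the sketch only shows why hand-written control flow makes both the cache behavior and the error reporting easier to tune than generated tables.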