Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) - Mailing list pgsql-hackers
| From | Andres Freund |
|---|---|
| Subject | Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) |
| Date | 2018-12-26 |
| Msg-id | 20181226180217.qad6ahkkzk4g5poy@alap3.anarazel.de |
| In response to | Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) (Robert Haas <robertmhaas@gmail.com>) |
| Responses | Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) |
| List | pgsql-hackers |
Hi,

On 2018-12-26 11:50:18 -0500, Robert Haas wrote:
> On Wed, Dec 26, 2018 at 11:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I think there's a lot of goalpost-moving going on here. The original
> > idea was to trim the physical size of the data structure, as stated
> > in the thread subject, and just reap whatever cache benefits we got
> > along the way from that. I am dubious that we actually have any
> > performance problem in this code that needs a big dollop of added
> > complexity to fix.
>
> I have seen ScanKeywordLookup show up in profiles quite often and
> fairly prominently -- like several percent of total runtime. I'm not
> trying to impose requirements on John's patch, and I agree that
> reducing the physical size of the structure is a good step whether
> anything else is done or not. However, I don't see that as a reason to
> shut down further discussion of other possible improvements. If his
> patch makes this disappear from profiles, cool, but if it doesn't,
> then sooner or later somebody's going to want to do more.

I agree. And most of the patch would be a pre-requisite for anything
more elaborate anyway.

> FWIW, my bet is this helps but isn't enough to get rid of the problem
> completely. A 9-step binary search has got to be slower than a really
> well-optimized hash table lookup.

Yea, at least with a non-optimized layout. If we'd used a binary search
optimized lookup order it might be different, but probably at best
equivalent to a good hashtable.

> > In my hands, the only part of the low-level parsing code that commonly
> > shows up as interesting in profiles is the Bison engine. That's probably
> > because the grammar tables are circa half a megabyte and blow out cache
> > pretty badly :-(. I don't know of any way to make that better,
> > unfortunately. I suspect that it's just going to get worse, because
> > people keep submitting additions to the grammar.
>
> I'm kinda surprised that you haven't seen ScanKeywordLookup() in
> there, but I agree with you that the size of the main parser tables is
> a real issue, and that there's no easy solution. At various times
> there has been discussion of using some other parser generator, and
> I've also toyed with the idea of writing one specifically for
> PostgreSQL. Unfortunately, it seems like bison is all but
> unmaintained; the alternatives are immature and have limited adoption
> and limited community; and writing something from scratch is a ton of
> work. :-(

My bet is, and has been for quite a while, that we'll have to go for a
hand-written recursive descent type parser. They can be *substantially*
faster, and their performance isn't as affected by the grammar size. And,
about as important, they also allow for a lot more heuristics around
grammar errors - I do think we'll soon have to do better than to throw a
generic syntax error for the cases where the grammar doesn't match at all.

Greetings,

Andres Freund
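For readers following the thread, a minimal, self-contained sketch of the kind of lookup under discussion: case-fold the candidate identifier, then binary-search a table sorted by keyword name. With several hundred keywords, each lookup costs on the order of the nine strcmp() steps Robert mentions. The table contents and function names below are illustrative only, not PostgreSQL's actual kwlist or ScanKeywordLookup() code:

```c
/*
 * Sketch of a binary-search keyword lookup, loosely shaped like the
 * pre-patch ScanKeywordLookup().  Tiny illustrative table, not the
 * real keyword list.
 */
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct ScanKeyword
{
    const char *name;           /* keyword, in lower case */
    int         value;          /* token value */
} ScanKeyword;

/* must stay sorted by name for the binary search to work */
static const ScanKeyword keywords[] = {
    {"and", 1}, {"between", 2}, {"from", 3},
    {"group", 4}, {"select", 5}, {"where", 6},
};

static const ScanKeyword *
keyword_lookup(const char *text)
{
    char        word[64];
    size_t      len = strlen(text);
    int         lo = 0;
    int         hi = (int) (sizeof(keywords) / sizeof(keywords[0])) - 1;

    if (len >= sizeof(word))
        return NULL;            /* too long to be any keyword */

    /* ASCII-only downcasing, including the terminating NUL */
    for (size_t i = 0; i <= len; i++)
    {
        char        ch = text[i];

        word[i] = (ch >= 'A' && ch <= 'Z') ? ch + ('a' - 'A') : ch;
    }

    /* classic binary search: ~log2(N) strcmp() calls per lookup */
    while (lo <= hi)
    {
        int         mid = (lo + hi) / 2;
        int         cmp = strcmp(word, keywords[mid].name);

        if (cmp == 0)
            return &keywords[mid];
        if (cmp < 0)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return NULL;                /* an ordinary identifier */
}

int
main(void)
{
    const ScanKeyword *kw = keyword_lookup("SELECT");

    printf("%s -> %d\n", kw ? kw->name : "not found", kw ? kw->value : -1);
    return 0;
}
```

The strcmp() calls chase pointers into scattered string storage, which is part of what the footprint-reduction patch attacks; a hash-based lookup would replace the whole loop with one hash computation and (ideally) one comparison.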
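To make the recursive-descent point concrete, here is a toy sketch (my own illustration, not code from any PostgreSQL patch): each grammar rule becomes a plain C function, so there are no large state tables to blow out the cache, and at any failure point the parser knows which construct it was inside and can say so, rather than emitting a generic "syntax error":

```c
/*
 * Toy recursive descent parser for:
 *   expr := term (('+' | '-') term)*
 *   term := NUMBER | '(' expr ')'
 * Illustrates rule-per-function structure and targeted error messages.
 */
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

static const char *p;           /* cursor into the input string */

static void
skip_ws(void)
{
    while (isspace((unsigned char) *p))
        p++;
}

static long expr(void);

static long
term(void)
{
    skip_ws();
    if (isdigit((unsigned char) *p))
    {
        char       *end;
        long        v = strtol(p, &end, 10);

        p = end;
        return v;
    }
    if (*p == '(')
    {
        long        v;

        p++;
        v = expr();
        skip_ws();
        if (*p != ')')
        {
            /* the payoff: we know exactly what we were parsing */
            fprintf(stderr, "expected \")\" to close parenthesized expression, found \"%c\"\n",
                    *p ? *p : '?');
            exit(1);
        }
        p++;
        return v;
    }
    fprintf(stderr, "expected a number or \"(\", found \"%c\"\n",
            *p ? *p : '?');
    exit(1);
}

static long
expr(void)
{
    long        v = term();

    for (;;)
    {
        skip_ws();
        if (*p == '+')
        {
            p++;
            v += term();
        }
        else if (*p == '-')
        {
            p++;
            v -= term();
        }
        else
            return v;
    }
}

int
main(void)
{
    p = "1 + (2 - 3)";
    printf("%ld\n", expr());    /* prints 0 */
    return 0;
}
```

Scaling this shape up to the SQL grammar is of course the hard part; the sketch only shows why hand-written control flow makes both the cache behavior and the error reporting easier to tune than generated tables.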