Re: Parser Cruft in gram.y - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Parser Cruft in gram.y |
Date | |
Msg-id | CA+TgmoYYs++fs0WkVHWXXq=7Ynj94VviDPUorE2=EGFCuz7uQg@mail.gmail.com |
In response to | Re: Parser Cruft in gram.y (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Parser Cruft in gram.y; Re: Parser Cruft in gram.y |
List | pgsql-hackers |
On Sat, Dec 15, 2012 at 11:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <kgrittn@mail.com> writes:
>> Tom Lane wrote:
>>> the parser tables are basically number-of-tokens wide by
>>> number-of-states high. (In HEAD there are 433 tokens known to the
>>> grammar, all but 30 of which are keywords, and 4367 states.)
>>>
>>> Splitting the grammar into multiple grammars is unlikely to do
>>> much to improve this --- in fact, it could easily make matters
>>> worse due to duplication.
>
>> Of course if they were both at 80% it would be a higher total than
>> combined, but unless you have a handle on the percentages, it
>> doesn't seem like a foregone conclusion. Do you have any feel for
>> what the split would be?
>
> I don't really, but I will note that the scalar-expression subgrammar is
> a pretty sizable part of the whole, and it's difficult to see how you'd
> make a useful split that didn't duplicate it. I guess you could push
> CREATE TABLE, ALTER TABLE, CREATE DOMAIN, ALTER DOMAIN, COPY, and
> anything else that included expression arguments over into the "main"
> grammar. But that path leads to more and more stuff getting moved to
> the "main" grammar over time, making the whole thing more and more
> questionable. The whole concept seems ugly and unmaintainable in any
> case.

In the past, I thought a little bit about the sort of thing that Dimitri is proposing, and it seemed to me that one could put DML in one grammar and everything else in another grammar and then decide, based on the first word of the input, which grammar to use. But there are a couple of problems with this.

First, the DML grammar has to include an awful lot of stuff, because the grammar for expressions is really complicated and involves a lot of things, like special-case syntax for XML, that are probably not really carrying their weight but cannot easily be factored out.
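As a rough illustration of the first-word routing idea, here is a minimal C sketch. Everything in it is hypothetical: `GrammarChoice`, `choose_grammar`, `word_equals_ci`, and the keyword list are invented for illustration and do not correspond to anything in PostgreSQL's gram.y or parser code. It only decides which grammar would be used; it does not build two real Bison grammars, and it deliberately ignores comments, parenthesized queries such as `(SELECT 1)`, and non-letter characters in the first word.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Which of two hypothetical grammars should parse a statement. */
typedef enum
{
    GRAMMAR_DML,    /* SELECT, INSERT, UPDATE, DELETE, ... */
    GRAMMAR_OTHER   /* DDL and utility commands */
} GrammarChoice;

/* Hypothetical leading keywords routed to the DML grammar (uppercase). */
static const char *const dml_keywords[] = {
    "SELECT", "INSERT", "UPDATE", "DELETE", "WITH", "VALUES", "TABLE"
};

/* Case-insensitive comparison of word[0..len) against an uppercase keyword. */
static int
word_equals_ci(const char *word, size_t len, const char *kw)
{
    size_t i;

    if (strlen(kw) != len)
        return 0;
    for (i = 0; i < len; i++)
    {
        if (toupper((unsigned char) word[i]) != kw[i])
            return 0;
    }
    return 1;
}

/* Choose a grammar by looking only at the first word of the input. */
GrammarChoice
choose_grammar(const char *query)
{
    const char *start;
    size_t      len;
    size_t      i;

    assert(query != NULL);

    /* Skip leading whitespace. */
    while (isspace((unsigned char) *query))
        query++;

    /* Treat the first word as a run of letters (a simplification). */
    start = query;
    while (isalpha((unsigned char) *query))
        query++;
    len = (size_t) (query - start);

    for (i = 0; i < sizeof(dml_keywords) / sizeof(dml_keywords[0]); i++)
    {
        if (word_equals_ci(start, len, dml_keywords[i]))
            return GRAMMAR_DML;
    }
    return GRAMMAR_OTHER;
}
```

Note that a statement like `CREATE TABLE t (a int DEFAULT 1 + 2)` is routed to the non-DML side, so that grammar would still need the full expression subgrammar; the cheap dispatch step does nothing to avoid the duplication problem.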
Second, the DDL grammar would have to duplicate a lot of stuff that also shows up in the DML grammar, because expressions can also appear in DEFAULT or USING clauses, which show up in commands like CREATE TABLE, ALTER TABLE, and CREATE SCHEMA .. CREATE TABLE.

Now, either one of these problems by itself might not be sufficient to kill the idea. If the DML grammar were a small subset of the full grammar, one might not mind duplicating some stuff, on the grounds that in most cases the full grammar would not be used, and using only the smaller tables most of the time would be easier on the L1 cache. On the other hand, if you could get a clean split between the two grammars, then regardless of exactly what the split was, it might seem like a win. But when I looked at this, it seemed to me that you'd have to duplicate a lot of stuff and the small parser still wouldn't end up being very small, which I found hard to get excited about.

I'm frankly kind of shocked at the revelation that the parser is already 14% of the backend. I knew it was big; I didn't realize it was THAT big.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company