Re: Parser Cruft in gram.y - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: Parser Cruft in gram.y
Date	December 18, 2012 02:00:40
Msg-id	CA+TgmoYYs++fs0WkVHWXXq=7Ynj94VviDPUorE2=EGFCuz7uQg@mail.gmail.com
In response to	Re: Parser Cruft in gram.y (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Parser Cruft in gram.y Re: Parser Cruft in gram.y
List	pgsql-hackers

On Sat, Dec 15, 2012 at 11:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <kgrittn@mail.com> writes:
>> Tom Lane wrote:
>>> the parser tables are basically number-of-tokens wide by
>>> number-of-states high. (In HEAD there are 433 tokens known to the
>>> grammar, all but 30 of which are keywords, and 4367 states.)
>>>
>>> Splitting the grammar into multiple grammars is unlikely to do
>>> much to improve this --- in fact, it could easily make matters
>>> worse due to duplication.
>
>> Of course if they were both at 80% it would be a higher total than
>> combined, but unless you have a handle on the percentages, it
>> doesn't seem like a foregone conclusion. Do you have any feel for
>> what the split would be?
>
> I don't really, but I will note that the scalar-expression subgrammar is
> a pretty sizable part of the whole, and it's difficult to see how you'd
> make a useful split that didn't duplicate it.  I guess you could push
> CREATE TABLE, ALTER TABLE, CREATE DOMAIN, ALTER DOMAIN, COPY, and
> anything else that included expression arguments over into the "main"
> grammar.  But that path leads to more and more stuff getting moved to
> the "main" grammar over time, making the whole thing more and more
> questionable.  The whole concept seems ugly and unmaintainable in any
> case.

I thought a little bit in the past about the sort of thing that Dimitri
is proposing, and it seemed to me that one could put DML in
one grammar and everything else in another grammar and then decide,
based on the first word of the input, which grammar to use.  But there
are a couple of problems with this.  First, the DML grammar has to
include an awful lot of stuff, because the grammar for expressions is
really complicated and involves a lot of things like special-case
syntax for XML that are probably not really carrying their weight but
which cannot easily be factored out.  Second, the DDL grammar would
have to duplicate a lot of stuff that also shows up in the DML
grammar, because things like expressions can also show up in DEFAULT
or USING clauses which show up in things like CREATE TABLE and ALTER
TABLE and CREATE SCHEMA .. CREATE TABLE.
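
As a rough illustration of the first-word dispatch described above, here
is a minimal sketch. It is not how PostgreSQL's parser is structured: the
keyword set, the grammar names "dml" and "main", and both helper functions
are hypothetical, and the keyword set is deliberately incomplete (it
ignores WITH, VALUES, leading parentheses, comments, and so on).

```python
# Hypothetical sketch only: none of these names exist in PostgreSQL.
# Assumed (incomplete) set of first words that would select the DML grammar.
DML_KEYWORDS = {"select", "insert", "update", "delete"}


def pick_grammar(sql: str) -> str:
    """Return "dml" or "main" based on the first word of the input."""
    words = sql.split(None, 1)
    first = words[0].lower() if words else ""
    return "dml" if first in DML_KEYWORDS else "main"


def mentions_expression_clause(sql: str) -> bool:
    """Crude check for DEFAULT or USING, which introduce expressions."""
    lowered = sql.lower().split()
    return "default" in lowered or "using" in lowered


if __name__ == "__main__":
    for stmt in (
        "SELECT a + 1 FROM t",
        "CREATE TABLE t (a int DEFAULT 1 + 2)",
    ):
        print(pick_grammar(stmt), mentions_expression_clause(stmt), stmt)
```

The second statement is routed to the "main" grammar yet still contains an
expression after DEFAULT, which is the duplication problem: the non-DML
grammar cannot simply leave the expression rules out.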

Now either one of these problems by itself might not be sufficient to
kill the idea: if the DML grammar were a small subset of the full
grammar, one might not mind duplicating some stuff, on the grounds
that in most cases the full grammar would not be used, and using only
the smaller tables most of the time would be easier on the L1 cache.
And on the other hand, if you could get a clean split between the two
grammars, then regardless of exactly what the split was, it might seem
a win.  But it seemed to me when I looked at this that you'd have to
duplicate a lot of stuff and the small parser still wouldn't end up
being very small, which I found hard to get excited about.
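
To put rough numbers on that trade-off, the sketch below uses the token
and state counts Tom quoted above and Kevin's 80% example. It assumes a
dense tokens-by-states table, which follows Tom's "basically" description;
the tables bison actually emits are compressed, so the real figure is
smaller. The function names are made up for illustration.

```python
# Figures quoted earlier in this thread for HEAD at the time.
TOKENS = 433
STATES = 4367


def dense_table_entries(tokens: int, states: int) -> int:
    """Entries in an uncompressed tokens-wide by states-high table."""
    return tokens * states


def combined_percent(first_pct: int, second_pct: int) -> int:
    """Total size of two split grammars, as a percentage of the original."""
    return first_pct + second_pct


if __name__ == "__main__":
    print(dense_table_entries(TOKENS, STATES))  # 1890911
    print(combined_percent(80, 80))             # 160
```

So if both halves of a split came out at 80% of the current grammar, the
pair would total 160% of it, and the "small" parser would still be most of
the size of the full one.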

I'm frankly kind of shocked at the revelation that the parser is
already 14% of the backend.  I knew it was big; I didn't realize it
was THAT big.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
