Re: benchmarking Flex practices - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: benchmarking Flex practices |
Date | |
Msg-id | 25775.1564414812@sss.pgh.pa.us Whole thread Raw |
In response to | Re: benchmarking Flex practices (John Naylor <john.naylor@2ndquadrant.com>) |
Responses |
Re: benchmarking Flex practices
|
List | pgsql-hackers |
John Naylor <john.naylor@2ndquadrant.com> writes: > On Sun, Jul 21, 2019 at 3:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> So I'm feeling like maybe we should experiment to see what that >> solution looks like, before we commit to going in this direction. >> What do you think? > Given the above wrinkles, I thought it was worth trying. Attached is a > rough patch (don't mind the #include mess yet :-) ) that works like > this: > The lexer returns UCONST from xus and UIDENT from xui. The grammar has > rules that are effectively: > SCONST { do nothing} > | UCONST { esc char is backslash } > | UCONST UESCAPE SCONST { esc char is from $3 } > ...where UESCAPE is now an unreserved keyword. To prevent shift-reduce > conflicts, I added UIDENT to the %nonassoc precedence list to match > IDENT, and for UESCAPE I added a %left precedence declaration. Maybe > there's a more principled way. I also added an unsigned char type to > the %union, but it worked fine on my compiler without it. I think it might be better to drop the separate "Uescape" production and just inline that into the calling rules, exactly per your sketch above. You could avoid duplicating the escape-checking logic by moving that into the str_udeescape support function. This would avoid the need for the "uchr" union variant, but more importantly it seems likely to be more future-proof: IME, any time you can avoid or postpone shift/reduce decisions, it's better to do so. I didn't try, but I think this might allow dropping the %left for UESCAPE. That bothers me because I don't understand why it's needed or what precedence level it ought to have. > litbuf_udeescape() and check_uescapechar() were moved to gram.y. The > former had be massaged to give error messages similar to HEAD. They're > not quite identical, but the position info is preserved. Some of the > functions I moved around don't seem to have any test coverage, so I > should eventually do some work in that regard. I don't terribly like the cross-calls you have between gram.y and scan.l in this formulation. If we have to make these functions (hexval() etc) non-static anyway, maybe we should shove them all into scansup.c? > -Binary size is very close to v6. That is to say the grammar tables > grew by about the same amount the scanner table shrank, so the binary > is still about 200kB smaller than HEAD. OK. > -Performance is very close to v6 with the information_schema and > pgbench-like queries with standard strings, which is to say also very > close to HEAD. When the latter was changed to use Unicode escapes, > however, it was about 15% slower than HEAD. That's a big regression > and I haven't tried to pinpoint why. I don't quite follow what you changed to produce the slower test case? But that seems to be something we'd better run to ground before deciding whether to go this way. > -The ecpg changes here are only the bare minimum from HEAD to get it > to compile, since I'm borrowing its additional token names (although > they mean slightly different things). After a bit of experimentation, > it's clear there's a bit more work needed to get it functional, and > it's not easy to debug, so I'm putting that off until we decide > whether this is the way forward. On the whole I like this approach, modulo the performance question. Let's try to work that out before worrying about ecpg. regards, tom lane
pgsql-hackers by date: