Thread: Status report: long-query-string changes
I have finished applying Mike Ansley's changes for long queries, along with a bunch of my own. The current status is:

* You can send a query string of indefinite length to the backend. (This is poorly tested for MULTIBYTE, though; would someone who uses MULTIBYTE more than I do try it out?)

* You can get back an EXPLAIN or error message string of indefinite length.

* Single lexical tokens within a query are currently limited to 64k because of the lexer's use of YY_REJECT. I have not committed any of Leon's proposed lexer changes, since that issue still seems controversial. I would like to see us agree on a solution. (ecpg's lexer has the same problem, of course.)

Although I think the backend is in fairly good shape, there are still a few minor trouble spots. (The rule deparser will blow up at 8K, for example --- I have some work to do in there and will fix it when I get a chance.)

In the frontend libraries and clients, both libpq and psql are length-limit-free. I have not looked much at any of the other frontend interface libraries. I suspect that at least odbc and the python interface need work, because quick glimpse searches show suspicious-looking constants:

    MAX_QUERY_SIZE
    ERROR_MSG_LENGTH
    SQL_PACKET_SIZE
    MAX_MESSAGE_LEN
    TEXT_FIELD_SIZE
    MAX_VARCHAR_SIZE
    DRV_VARCHAR_SIZE
    DRV_LONGVARCHAR_SIZE
    MAX_BUFFER_SIZE
    MAX_FIELDS

The real problem in the clients is that pg_dump blithely assumes it will never need to deal with strings over MAX_QUERY_SIZE. This is a bad idea --- it ought to be rewritten to use the expansible-string-buffer facility that now exists in libpq. There may be restrictions in the other programs in bin/ as well, though glimpse didn't turn up any red flags.

I would like to encourage the odbc and python folks to get rid of the length limitations in their modules; I don't use either and have no intention of touching either. I'd like to find a volunteer other than myself to fix pg_dump, too.

Now, all we need is someone to implement multiple-disk-block tuples ;-)

regards, tom lane
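[For illustration, a minimal sketch of how pg_dump could build queries with libpq's expansible string buffers (the PQExpBuffer facility, which appears to be what Tom refers to). The dumpOneTable() helper and the query it builds are made up for this example; only the PQExpBuffer and PQexec calls are existing libpq API.]

#include <stdio.h>
#include "libpq-fe.h"
#include "pqexpbuffer.h"

/* Hypothetical pg_dump-style helper: builds an arbitrarily long query
 * in a PQExpBuffer instead of a fixed MAX_QUERY_SIZE array. */
static void
dumpOneTable(PGconn *conn, const char *tablename)
{
    PQExpBuffer query = createPQExpBuffer();    /* grows as needed */
    PGresult   *res;

    appendPQExpBuffer(query, "SELECT * FROM \"%s\"", tablename);

    res = PQexec(conn, query->data);
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
        fprintf(stderr, "query failed: %s", PQerrorMessage(conn));

    PQclear(res);
    destroyPQExpBuffer(query);                  /* frees buffer and struct */
}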
Tom Lane wrote:
>
> * Single lexical tokens within a query are currently limited to 64k
>   because of the lexer's use of YY_REJECT. I have not committed any
>   of Leon's proposed lexer changes, since that issue still seems
>   controversial. I would like to see us agree on a solution.

Thomas Lockhart should speak up - he seems to be the only person who still has objections. If the proposed change is to be declined, something else has to be applied instead to deal with the lexer's reject feature and its accompanying size limits, as well as the grammar inconsistency. The alternatives all seem to be awkward solutions. As you probably remember, the proposed change only breaks constructs like 1+-2, which anyone in a sane condition should avoid writing anyway :)

There are more size restrictions there. I noticed (by simply eyeing the lexer source, without testing) that in the case of the flex lexer (FLEX_SCANNER being defined in scan.c) the lexer can't swallow big queries. You (Tom and Michael) aren't using flex, are you?

--
Leon.
-------
He knows he'll never have to answer for any of his theories actually
being put to test. If they were, they would be contaminated by reality.
Leon <leon@udmnet.ru> writes:
> There are more size restrictions there. I noticed (by simply eyeing the
> lexer source, without testing) that in the case of the flex lexer
> (FLEX_SCANNER being defined in scan.c) the lexer can't swallow big
> queries. You (Tom and Michael) aren't using flex, are you?

Huh? flex is the only lexer that works with the Postgres .l files, as far as I know. Certainly it's what I'm using.

If you're looking at the "literal" buffer, that would need to be made expansible, but there's not much point until flex's internal stuff is fixed.

regards, tom lane
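[For concreteness, a minimal sketch of what an expansible "literal" buffer could look like. The startlit()/addlit() names, the initial size, and the plain malloc/realloc calls are assumptions for illustration; this is not the code that was, or would be, committed.]

#include <stdlib.h>
#include <string.h>

/* Illustrative replacement for a fixed-size literal[] array:
 * the buffer is grown on demand, so token length is unbounded. */
static char *literalbuf = NULL;     /* accumulated token text */
static int   literallen;            /* bytes used so far */
static int   literalalloc;          /* bytes currently allocated */

static void
startlit(void)
{
    if (literalbuf == NULL)
    {
        literalalloc = 128;
        literalbuf = (char *) malloc(literalalloc);
    }
    literallen = 0;
    literalbuf[0] = '\0';
}

static void
addlit(const char *ytext, int yleng)
{
    /* double the allocation until the new text fits */
    while (literallen + yleng + 1 > literalalloc)
    {
        literalalloc *= 2;
        literalbuf = (char *) realloc(literalbuf, literalalloc);
    }
    memcpy(literalbuf + literallen, ytext, yleng);
    literallen += yleng;
    literalbuf[literallen] = '\0';
}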
Tom Lane wrote:
>
> If you're looking at the "literal" buffer, that would need to be made
> expansible, but there's not much point until flex's internal stuff is
> fixed.

Look at this piece of code. It seems that once myinput() has been called, the second time around it will return 0 even if the string isn't over yet. The 'max' parameter is 8192 bytes on my system, so the query is simply truncated to that size.

#ifdef FLEX_SCANNER
/* input routine for flex to read input from a string instead of a file */
static int
myinput(char *buf, int max)
{
    int len, copylen;

    if (parseCh == NULL)
    {
        len = strlen(parseString);
        if (len >= max)
            copylen = max - 1;
        else
            copylen = len;
        if (copylen > 0)
            memcpy(buf, parseString, copylen);
        buf[copylen] = '\0';
        parseCh = parseString;
        return copylen;
    }
    else
        return 0;               /* end of string */
}
#endif /* FLEX_SCANNER */

--
Leon.
-------
He knows he'll never have to answer for any of his theories actually
being put to test. If they were, they would be contaminated by reality.
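[For illustration, a minimal sketch of how myinput() could feed the string to flex in successive chunks instead of truncating it. Reusing parseCh as a cursor into parseString is an assumption made for this example; it is not necessarily what the committed fix looks like (per Tom's follow-up, the current sources already handle this).]

#ifdef FLEX_SCANNER
/* Illustration only: hand parseString to flex at most max-1 bytes per
 * call, advancing parseCh each time, so nothing is ever truncated. */
static int
myinput(char *buf, int max)
{
    int copylen;

    if (parseCh == NULL)
        parseCh = parseString;      /* first call: start at the beginning */

    copylen = strlen(parseCh);
    if (copylen >= max)
        copylen = max - 1;
    if (copylen == 0)
        return 0;                   /* end of string */

    memcpy(buf, parseCh, copylen);
    parseCh += copylen;             /* remember how far flex has been fed */
    return copylen;
}
#endif /* FLEX_SCANNER */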
Leon <leon@udmnet.ru> writes:
> Look at this piece of code. It seems that once myinput() has been
> called, the second time around it will return 0 even if the string
> isn't over yet.

It's always a good idea to pull a fresh copy of the sources before opinionating about what works or doesn't work in someone's just-committed changes ;-)

regards, tom lane
> I have finished applying Mike Ansley's changes for long queries, along
> with a bunch of my own. The current status is:
>
> * You can send a query string of indefinite length to the backend.
>   (This is poorly tested for MULTIBYTE, though; would someone who
>   uses MULTIBYTE more than I do try it out?)

I'll take care of this.
---
Tatsuo Ishii
> Thomas Lockhart should speak up - he seems to be the only person who
> still has objections. If the proposed change is to be declined, something
> else has to be applied instead to deal with the lexer's reject feature
> and its accompanying size limits, as well as the grammar inconsistency.

Hmm. I'd suggest that we go with the "greedy lexer" solution, which continues to gobble characters which *could* be an operator until other characters or whitespace are encountered. I don't recall any compelling cases for which this would be an inadequate solution, and we have plenty of time until v6.6 is released to discover problems and work out alternatives.

Sorry for slowing things up; but fwiw I *did* think about it some more ;)

- Thomas

--
Thomas Lockhart    lockhart@alumni.caltech.edu
South Pasadena, California
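[To make the behavior concrete, here is a small self-contained C program illustrating the "greedy lexer" idea: keep consuming characters that could belong to an operator until something else or whitespace appears. The operator character set below is an assumption for the example, not the exact set in scan.l; note how 1+-2 comes out as the three tokens 1, +- and 2, which is the construct Leon mentioned.]

#include <stdio.h>
#include <string.h>

/* Characters that could be part of an operator (illustrative set only). */
static int
is_op_char(char c)
{
    return c != '\0' && strchr("+-*/<>=~!@#%^&|`?$", c) != NULL;
}

int
main(void)
{
    const char *s = "1+-2";

    while (*s)
    {
        int len = 1;

        if (is_op_char(*s))
            while (is_op_char(s[len]))  /* greedy: extend the operator */
                len++;
        printf("token: %.*s\n", len, s);
        s += len;
    }
    return 0;                           /* prints: 1, +-, 2 */
}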
> Thomas Lockhart should speak up...
> He knows he'll never have to answer for any of his theories actually
> being put to test. If they were, they would be contaminated by reality.

You talkin' to me?? ;)

So, while you are on the lexer warpath, I'd be really happy if someone would fix the following behavior (I'm doing this from memory, but afaik it is close to correct):

For non-psql applications, such as tcl or ecpg, which do not do any pre-processing on input tokens, a trailing unterminated string will be lost, and no error will be detected. For example,

  select * from t1 'abc

sent directly to the server will not fail as it should with that garbage at the end. The lexer is in a non-standard mode after all tokens are processed, and the accumulated string "abc" is left in a buffer and not sent to yacc/bison. I think you can see this behavior just by looking at the lexer code.

A simple fix would be to check the size of that accumulated string buffer after lexing, and if it is non-zero then elog(ERROR) a complaint. Perhaps a more general fix would be to ensure that you are never in an exclusive state after all tokens are processed, but I'm not sure how to do that.

- Thomas

--
Thomas Lockhart    lockhart@alumni.caltech.edu
South Pasadena, California
Thomas Lockhart wrote:
>
> > Thomas Lockhart should speak up - he seems to be the only person who
> > still has objections. If the proposed change is to be declined, something
> > else has to be applied instead to deal with the lexer's reject feature
> > and its accompanying size limits, as well as the grammar inconsistency.
>
> Hmm. I'd suggest that we go with the "greedy lexer" solution, which
> continues to gobble characters which *could* be an operator until
> other characters or whitespace are encountered.

'Xcuse my dumbness ;), but is it in any way different from what was proposed (by me and some others)?

--
Leon.
-------
He knows he'll never have to answer for any of his theories actually
being put to test. If they were, they would be contaminated by reality.
Thomas Lockhart wrote:
>
> > Thomas Lockhart should speak up...
> > He knows he'll never have to answer for any of his theories actually
> > being put to test. If they were, they would be contaminated by reality.
>
> You talkin' to me?? ;)

Nein, nein! Sei still bitte! :) This is my signature, which is a week old already :)

> A simple fix would be to check the size of that accumulated string
> buffer after lexing, and if it is non-zero then elog(ERROR) a
> complaint. Perhaps a more general fix would be to ensure that you are
> never in an exclusive state after all tokens are processed, but I'm
> not sure how to do that.

The solution is obvious - to eliminate exclusive states entirely! Banzai!!!

--
Leon.
-------
He knows he'll never have to answer for any of his theories actually
being put to test. If they were, they would be contaminated by reality.
Thomas Lockhart wrote:
>
> > The solution is obvious - to eliminate exclusive states entirely!
> > Banzai!!!
>
> That will complicate the lexer, and make it more brittle and difficult
> to read, since you will have to, essentially, implement the exclusive
> states using flags within each element.
>
> If you want to try it as an exercise, we *might* find it isn't as ugly
> as I am afraid it will be, but...

Gimme the latest lexer source. (I pay for my Internet on a per-minute basis, so I can't connect to CVS.) You will see what I mean.

--
Leon.
-------
He knows he'll never have to answer for any of his theories actually
being put to test. If they were, they would be contaminated by reality.
Leon <leon@udmnet.ru> writes:
>> A simple fix would be to check the size of that accumulated string
>> buffer after lexing, and if it is non-zero then elog(ERROR) a
>> complaint. Perhaps a more general fix would be to ensure that you are
>> never in an exclusive state after all tokens are processed, but I'm
>> not sure how to do that.

> The solution is obvious - to eliminate exclusive states entirely!
> Banzai!!!

Can we do that? Seems like a more likely approach is to ensure that all of the lexer states have rules that ensure they terminate (or raise an error, as for unterminated quoted string) at end of input. I do think checking the token buffer is a hack, and changing the rules a cleaner solution...

regards, tom lane
Tom Lane wrote:
>
> Leon <leon@udmnet.ru> writes:
> >> A simple fix would be to check the size of that accumulated string
> >> buffer after lexing, and if it is non-zero then elog(ERROR) a
> >> complaint. Perhaps a more general fix would be to ensure that you are
> >> never in an exclusive state after all tokens are processed, but I'm
> >> not sure how to do that.
>
> > The solution is obvious - to eliminate exclusive states entirely!
> > Banzai!!!
>
> Can we do that? Seems like a more likely approach is to ensure that
> all of the lexer states have rules that ensure they terminate (or
> raise an error, as for unterminated quoted string) at end of input.
> I do think checking the token buffer is a hack, and changing the rules
> a cleaner solution...

Hmm, yeah, you are right. That is a much simpler solution. We can check in myinput() and input(), when we are about to return end-of-input, that YYSTATE == INITIAL, and raise an error if that's not so. Well, I give up my idea of total extermination of start conditions :)

BTW, while eyeing scan.l again, I noticed that the handling of C-style comments may also have bugs, but I am not completely sure.

--
Leon.
-------
He knows he'll never have to answer for any of his theories actually
being put to test. If they were, they would be contaminated by reality.
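[For illustration, a minimal sketch of the end-of-input check Leon describes, folded into the chunk-feeding myinput() sketched earlier in the thread. YYSTATE is flex's alias for the current start condition (YY_START); the error message text and the reuse of parseCh as a cursor are assumptions for the example, not committed code.]

#ifdef FLEX_SCANNER
/* Illustration only: when the string-input routine is about to report
 * end-of-input, make sure the scanner has returned to the INITIAL
 * start condition; otherwise some token (e.g. a quoted string) was
 * left unterminated at the end of the query. */
static int
myinput(char *buf, int max)
{
    int copylen;

    if (parseCh == NULL)
        parseCh = parseString;

    copylen = strlen(parseCh);
    if (copylen >= max)
        copylen = max - 1;

    if (copylen == 0)
    {
        if (YYSTATE != INITIAL)
            elog(ERROR, "parser: unterminated token at end of query string");
        return 0;                   /* end of string */
    }

    memcpy(buf, parseCh, copylen);
    parseCh += copylen;
    return copylen;
}
#endif /* FLEX_SCANNER */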