Re: percentile value check can be slow - Mailing list pgsql-hackers
| From | David Fetter |
|---|---|
| Subject | Re: percentile value check can be slow |
| Date | |
| Msg-id | 20171119174938.GL4411@fetter.org |
| In response to | Re: percentile value check can be slow (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
| Responses | Re: percentile value check can be slow |
| List | pgsql-hackers |
On Sun, Nov 19, 2017 at 01:23:42PM +0100, Tomas Vondra wrote:
> Hi,
>
> On 11/19/2017 03:10 AM, David Fetter wrote:
> > On Sat, Nov 18, 2017 at 11:05:47PM +0100, Tomas Vondra wrote:
> >> Hi,
> >>
> >> ...
> >>
> >> Is 'recognizer' an established definition I should know? Is it the
> >> same as 'validator' or is it something new/different?
> >
> > I borrowed it from http://langsec.org/
> >
> > I'm not entirely sure what you mean by a validator, but a recognizer
> > is something that gives a quick and sure read as to whether the input
> > is well-formed. In general, it's along the lines of a tokenizer, a
> > parser, and something that does very light post-parse analysis for
> > correctness of form.
> >
> > For the case that started the thread, a recognizer would check
> > something along the lines of
> >
> >     CHECK('[0,1]' @> ALL(input_array))
>
> OK, thanks. From what I understand, a recognizer is more about
> recognizing whether a string is valid within a given formal language
> (essentially, whether it's a well-formed program). That may not be the
> right term for checks on parameter values.

There are two hard problems in computer science: naming things, cache
coherency, and off-by-one.

> OTOH we already have "validators" in a number of places - functions
> checking various parameters, e.g. reloptions for FDWs, etc.
>
> But I guess the naming can be solved later ...

Indeed.

> >>> Way Bigger Lift, As Far As I Can Tell, But More Fun For Users:
> >>> Allow optional CHECK constraints in CREATE AGGREGATE for direct
> >>> arguments.
> >>
> >> How will any of the approaches deal with something like
> >>
> >>     select percentile_cont((select array_agg(v) from p))
> >>     within group (order by a) from t;
> >>
> >> In this case the values are unknown after parse analysis, so I
> >> guess it does not really address that.
> >
> > It doesn't. Does it make sense to do a one-shot execution for cases
> > like that? It might well be worth it to do the aggregate once in
> > advance as a throw-away if the query execution time is already going
> > to take a while. Of course, you can break that one by making p a JOIN
> > to yet another thing...
>
> >> FWIW while I won't stand in the way of improving this, I wonder if
> >> this is really worth the additional complexity. If you get errors
> >> like this with a static list of values, you will fix the list and
> >> you're done. If the list is dynamic (computed in the query itself),
> >> you'll still get the error much later during query execution.
> >>
> >> So if you're getting many failures like this for the "delayed error
> >> reporting" to be an issue, perhaps there's something wrong in your
> >> stack and you should address that instead?
> >
> > I'd like to think that getting something to fail quickly and cheaply
> > when it can will give our end users a better experience. Here,
> > "cheaply" refers to their computing resources and time.
>
> The trouble is, this increases execution time for everyone, including
> people who carefully construct the parameter values. That seems rather
> undesirable.

I may be wrong, but I'm pretty sure that a check for well-formed direct
parameters will not impose a significant cost on aggregates.

It occurs to me that this particular aggregate could take an array of a
domain defined along the lines of:

    CREATE DOMAIN float4_0_1_closed AS float4
        NOT NULL
        CHECK(VALUE >= 0.0 AND VALUE <= 1.0);

Then the check would happen much earlier without adding a bunch of
potentially expensive machinery.

> > Clearly, not having this happen in this case bothered Johannes
> > enough to wade in here.
>
> No. He was surprised the error is reported after a significant amount
> of time, but he does not seem to claim failing faster would be
> valuable to him. That is your assumption, and I have my doubts about
> it.

My mistake. I shouldn't have guessed when there was a better alternative.
Johannes, could you help us understand your thinking in reporting this?

Best,
David.

--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
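[Editor's note: to make the two checks discussed in the thread concrete, here is an untested sketch against a hypothetical table `t(a float8)`. The recognizer shorthand is spelled out with an explicit range type, and the domain behavior is shown on a scalar; note that arrays *of* domain types, as the mail proposes, only became available in PostgreSQL 11.]

```sql
-- Recognizer-style range check (the mail's shorthand was
-- CHECK('[0,1]' @> ALL(input_array))), with an explicit range type:
SELECT numrange(0, 1, '[]') @> ALL (ARRAY[0.25, 0.5, 0.75]::numeric[]);  -- true
SELECT numrange(0, 1, '[]') @> ALL (ARRAY[0.5, 1.5]::numeric[]);         -- false

-- The domain from the mail; its CHECK fires at coercion time,
-- before any input rows are aggregated:
CREATE DOMAIN float4_0_1_closed AS float4
    NOT NULL
    CHECK (VALUE >= 0.0 AND VALUE <= 1.0);

SELECT 0.5::float4_0_1_closed;  -- ok
SELECT 1.5::float4_0_1_closed;  -- ERROR: violates check constraint
```

The point of the sketch is the ordering: a domain coercion on the direct argument fails before the ordered-set aggregate sorts its input, whereas today `percentile_cont` reports the out-of-range fraction only after the sort has run.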