Thread: sampling.c and potential divisions by 0 and log(0) with tablesample and ANALYZE in 9.5
sampling.c and potential divisions by 0 and log(0) with tablesample and ANALYZE in 9.5
From
Michael Paquier
Date:
Hi all, (Petr in CC)

Coverity is pointing out that anl_random_fract and sampler_random_fract can
return 0, which in some code paths causes math errors, i.e. division by 0 or
even log(0), in the case of TABLESAMPLE or even ANALYZE.

In 9.4, anl_random_fract is careful enough to use random() + 1 to prevent
that, but that's not the case in 9.5, where we have begun to use pg_erand48,
which returns a double in the range [0.0,1.0).

I think that we should change the returned double to be (0.0,1.0] instead,
as in the patch attached (the bernoulli and system methods need a brush-up
as well). I haven't updated tsm_system_rows and tsm_system_time, but their
regression diffs are attached.

This bug can be triggered when using TABLESAMPLE; ANALYZE is more worrying
because it could happen during an auto-analyze.

Thoughts?
--
Michael
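For illustration, here is a minimal sketch of the failure mode; the names
mirror sampling.c, but the snippet is illustrative rather than actual source,
and the two "bad" expressions only stand in for the kind of math the callers
do:

    /* pg_erand48() returns a double in [0.0, 1.0), so exactly 0.0 can occur. */
    double  u = pg_erand48(randstate);

    double  r1 = 1.0 / u;   /* division by zero when u == 0.0 */
    double  r2 = log(u);    /* log(0.0) yields -infinity */

    /* The attached patch effectively flips the range to (0.0, 1.0]: */
    double  u2 = 1.0 - pg_erand48(randstate);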
Attachment
Re: sampling.c and potential divisions by 0 and log(0) with tablesample and ANALYZE in 9.5
From
Petr Jelinek
Date:
On 2015-06-25 10:01, Michael Paquier wrote:
> Hi all, (Petr in CC)
>
> Coverity is pointing out that anl_random_fract and
> sampler_random_fract can return 0, which in some code paths causes math
> errors, i.e. division by 0 or even log(0), in the case of TABLESAMPLE or
> even ANALYZE.
>
> In 9.4, anl_random_fract is careful enough to use random() + 1 to
> prevent that, but that's not the case in 9.5, where we have begun to use
> pg_erand48, which returns a double in the range [0.0,1.0).
>
> I think that we should change the returned double to be (0.0,1.0]

Agreed.

> instead, as in the patch attached (the bernoulli and system methods need
> a brush-up as well). I haven't updated tsm_system_rows and
> tsm_system_time, but their regression diffs are attached.
>

I don't see a need for the other 1 - x changes outside of
sampler_random_fract(), tbh. Yes, it changes the regression test output, but
I don't think it changes the correctness of the results.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Re: sampling.c and potential divisions by 0 and log(0) with tablesample and ANALYZE in 9.5
From
Michael Paquier
Date:
On Sat, Jun 27, 2015 at 11:27 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
> On 2015-06-25 10:01, Michael Paquier wrote:
>> instead, as in the patch attached (the bernoulli and system methods need
>> a brush-up as well). I haven't updated tsm_system_rows and
>> tsm_system_time, but their regression diffs are attached.
>>
>
> I don't see a need for the other 1 - x changes outside of
> sampler_random_fract(), tbh. Yes, it changes the regression test output,
> but I don't think it changes the correctness of the results.

OK, thanks for the confirmation. I thought at first that things were wanted
this way in HEAD for the tablesample sampling, but if it does not matter
much, let's move on with your patch, then. It looks good to me.
--
Michael
Re: Re: sampling.c and potential divisions by 0 and log(0) with tablesample and ANALYZE in 9.5
From
Tom Lane
Date:
Petr Jelinek <petr@2ndquadrant.com> writes:
> On 2015-06-25 10:01, Michael Paquier wrote:
>> I think that we should change the returned double to be (0.0,1.0]

> Agreed.

I find this to be a pretty bad idea. That definition is simply weird; where
else in the world will you find a random number generator that does that?
What are the odds that any callers are actually designed for that behavior?

Another problem is that we consider anl_random_fract() to be an exported
API, and the very longstanding definition of that is that the result is in
(0,1), excluding both endpoints. Whatever we do with sampler_random_fract(),
we'd better make sure that anl_random_fract() preserves that behavior, else
we are likely to break third-party modules.

A simple fix would be to adjust sampler_random_fract to disallow 0 as
result, say by repeating the pg_erand48 call if it produces 0. I'm not sure
if that would throw off any of the math in the new tablesample-related
callers. If it would, I'm inclined to fix the problem call-site-by-call-site,
rather than inventing a definition of sampler_random_fract() that fails to
satisfy the POLA.

			regards, tom lane
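A sketch of that retry approach, for concreteness (illustrative only, not a
committed patch):

    double
    sampler_random_fract(SamplerRandomState randstate)
    {
        double      res;

        /*
         * pg_erand48 yields a value in [0.0, 1.0); reject an exact 0.0 and
         * draw again, so the result stays strictly inside (0, 1).
         */
        do
        {
            res = pg_erand48(randstate);
        } while (res == 0.0);

        return res;
    }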
Re: Re: sampling.c and potential divisions by 0 and log(0) with tablesample and ANALYZE in 9.5
From
Tom Lane
Date:
I wrote:
> Petr Jelinek <petr@2ndquadrant.com> writes:
>> On 2015-06-25 10:01, Michael Paquier wrote:
>>> I think that we should change the returned double to be (0.0,1.0]

>> Agreed.

> I find this to be a pretty bad idea. That definition is simply weird;
> where else in the world will you find a random number generator that does
> that? What are the odds that any callers are actually designed for that
> behavior?

And, in fact, a bit of looking quickly finds a counterexample, in analyze.c:

    int k = (int) (targrows * sampler_random_fract(rstate.randstate));

    Assert(k >= 0 && k < targrows);

You can't just whack this around to satisfy some new call sites without
considering the behavior of existing use-cases.

			regards, tom lane
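Spelling out why a (0.0,1.0] range breaks this caller (the numbers below are
purely illustrative):

    int targrows = 30000;               /* illustrative value */
    int k = (int) (targrows * 1.0);     /* fraction came back as exactly 1.0 */
    /* k == targrows here, so Assert(k >= 0 && k < targrows) fails. */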
Re: Re: sampling.c and potential divisions by 0 and log(0) with tablesample and ANALYZE in 9.5
From
Petr Jelinek
Date:
On 2015-06-30 18:21, Tom Lane wrote:
> I wrote:
>> Petr Jelinek <petr@2ndquadrant.com> writes:
>>> On 2015-06-25 10:01, Michael Paquier wrote:
>>>> I think that we should change the returned double to be (0.0,1.0]
>
>>> Agreed.
>
>> I find this to be a pretty bad idea. That definition is simply weird;
>> where else in the world will you find a random number generator that does
>> that? What are the odds that any callers are actually designed for that
>> behavior?
>
> And, in fact, a bit of looking quickly finds a counterexample, in
> analyze.c:
>
>     int k = (int) (targrows * sampler_random_fract(rstate.randstate));
>
>     Assert(k >= 0 && k < targrows);
>
> You can't just whack this around to satisfy some new call sites without
> considering the behavior of existing use-cases.
>

Right, very good point.

So, we used to protect against this problem by using a long value and doing:

    ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2)

Maybe the best solution is to have a pg_lrand48() variant which accepts the
seed as a parameter, the same way pg_erand48() does, and use the same logic
we used before sampling was added. The attached patch does that, and it does
not even need changes to the regression test output, which I consider a nice
bonus.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
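To make the arithmetic explicit, here is a sketch of the mapping the patch
relies on (the helper name is made up; it assumes a generator yielding an
integer in [0, PG_INT32_MAX]):

    /*
     * Map an integer r in [0, PG_INT32_MAX] to a double strictly inside
     * (0, 1): the "+ 1" in the numerator rules out 0, and the "+ 2" in
     * the denominator rules out 1.
     */
    static double
    fract_from_random_int(long r)
    {
        return ((double) r + 1) / ((double) PG_INT32_MAX + 2);
    }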
Attachment
Re: Re: sampling.c and potential divisions by 0 and log(0) with tablesample and ANALYZE in 9.5
From
Tom Lane
Date:
Petr Jelinek <petr@2ndquadrant.com> writes:
> Right, very good point.

> So, we used to protect against this problem by using a long value and doing:
> ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2)

> Maybe the best solution is to have a pg_lrand48() variant which accepts
> the seed as a parameter, the same way pg_erand48() does, and use the same
> logic we used before sampling was added.

I'm unimpressed with this coding because it presumes that MAX_RANDOM_VALUE
(which is defined as the maximum value returned by random()) has something to
do with the output range of pg_lrand48(). While that might accidentally fail
to fail on all known platforms, it's not good, because those functions don't
have the same provenance and so there's no good reason to assume that they
produce the same output range.

Rather than expanding the API of port.h still further, I continue to think
that the best answer is just to repeat if we get a zero from pg_erand48.

			regards, tom lane
Re: Re: sampling.c and potential divisions by 0 and log(0) with tablesample and ANALYZE in 9.5
From
Petr Jelinek
Date:
On 2015-06-30 20:52, Tom Lane wrote:
> Petr Jelinek <petr@2ndquadrant.com> writes:
>> Right, very good point.
>
>> So, we used to protect against this problem by using a long value and doing:
>> ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2)
>
>> Maybe the best solution is to have a pg_lrand48() variant which accepts
>> the seed as a parameter, the same way pg_erand48() does, and use the same
>> logic we used before sampling was added.
>
> I'm unimpressed with this coding because it presumes that MAX_RANDOM_VALUE
> (which is defined as the maximum value returned by random()) has something
> to do with the output range of pg_lrand48(). While that might
> accidentally fail to fail on all known platforms, it's not good, because
> those functions don't have the same provenance and so there's no good
> reason to assume that they produce the same output range.

Well, by this logic all the other places which use MAX_RANDOM_VALUE would be
broken as well, since our port version of random() just calls pg_lrand48().
Not to mention that we hardwire MAX_RANDOM_VALUE to be the same as
PG_INT32_MAX (which we could use in the code I proposed instead of
MAX_RANDOM_VALUE, to be safe).

>
> Rather than expanding the API of port.h still further, I continue to think
> that the best answer is just to repeat if we get a zero from pg_erand48.
>

Looping over a function call until you get a value you like seems quite ugly
to me, tbh.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: sampling.c and potential divisions by 0 and log(0) with tablesample and ANALYZE in 9.5
From
Michael Paquier
Date:
On Wed, Jul 1, 2015 at 1:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Petr Jelinek <petr@2ndquadrant.com> writes:
>> On 2015-06-25 10:01, Michael Paquier wrote:
>>> I think that we should change the returned double to be (0.0,1.0]
>
>> Agreed.
>
> I find this to be a pretty bad idea. That definition is simply weird;
> where else in the world will you find a random number generator that does
> that? What are the odds that any callers are actually designed for that
> behavior?
>
> Another problem is that we consider anl_random_fract() to be an exported
> API, and the very longstanding definition of that is that the result is
> in (0,1), excluding both endpoints. Whatever we do with
> sampler_random_fract(), we'd better make sure that anl_random_fract()
> preserves that behavior, else we are likely to break third-party modules.

Wait a minute... Yes, you are right, I clearly missed the fact that in ~9.4
the range of values returned by anl_random_fract() does not include 1. I
thought it did; visibly I misread the code...

> A simple fix would be to adjust sampler_random_fract to disallow 0
> as result, say by repeating the pg_erand48 call if it produces 0.
> I'm not sure if that would throw off any of the math in the new
> tablesample-related callers. If it would, I'm inclined to fix the
> problem call-site-by-call-site, rather than inventing a definition
> of sampler_random_fract() that fails to satisfy the POLA.

Agreed. Disallowing 0 in sampler_random_fract looks like a good answer to
that. Looking at the tablesample code, for the Bernoulli trial I recall that
the success and failure probabilities are actually in (0,1) (if p is the
success rate, the failure rate is 1 - p). I am not sure about Knuth's
Algorithm S for the system sampling, but that looks OK from a purely logical
viewpoint.
--
Michael
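For reference, the Bernoulli-style trial can be sketched like this (not the
actual bernoulli.c code; the function name is illustrative):

    /*
     * One trial per row: probability is the success rate in (0, 1), so the
     * failure rate 1 - probability also stays in (0, 1).  With
     * sampler_random_fract() never returning 0, the comparison is well
     * defined for every row.
     */
    static bool
    bernoulli_trial(SamplerRandomState randstate, double probability)
    {
        return sampler_random_fract(randstate) < probability;
    }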
Attachment
Re: Re: sampling.c and potential divisions by 0 and log(0) with tablesample and ANALYZE in 9.5
From
Michael Paquier
Date:
On Wed, Jul 1, 2015 at 4:56 PM, Michael Paquier wrote:
> On Wed, Jul 1, 2015 at 1:17 AM, Tom Lane wrote:
>> A simple fix would be to adjust sampler_random_fract to disallow 0
>> as result, say by repeating the pg_erand48 call if it produces 0.
>> I'm not sure if that would throw off any of the math in the new
>> tablesample-related callers. If it would, I'm inclined to fix the
>> problem call-site-by-call-site, rather than inventing a definition
>> of sampler_random_fract() that fails to satisfy the POLA.
>
> Agreed. Disallowing 0 in sampler_random_fract looks like a good answer to
> that. Looking at the tablesample code, for the Bernoulli trial I recall
> that the success and failure probabilities are actually in (0,1) (if p is
> the success rate, the failure rate is 1 - p). I am not sure about Knuth's
> Algorithm S for the system sampling, but that looks OK from a purely
> logical viewpoint.

For the sake of the archives, this has been fixed by commit d7c19d68.
(Thanks Tom for pushing it!)
--
Michael
Re: Re: sampling.c and potential divisions by 0 and log(0) with tablesample and ANALYZE in 9.5
From
Michael Paquier
Date:
On Wed, Jul 1, 2015 at 3:52 AM, Tom Lane wrote:
> Rather than expanding the API of port.h still further
> [...]

The current trend, as far as I understand it, is not to increase the amount
of code in src/port, but actually to reduce it and to rely as much as
possible on what modern platforms offer. See for example the recent
discussion about rint and numeric rounding here:
http://www.postgresql.org/message-id/flat/20150320194337.2573.72944@wrigleys.postgresql.org#20150320194337.2573.72944@wrigleys.postgresql.org

We are keeping rint around because MSVC only added it recently, in MSVC 2013,
even though the POSIX spec has had it for a long time, but I am sure that we
would drop it from the code tree as soon as we could.
--
Michael
Re: Re: sampling.c and potential divisions by 0 and log(0) with tablesample and ANALYZE in 9.5
From
Petr Jelinek
Date:
On 2015-07-02 02:51, Michael Paquier wrote:
> On Wed, Jul 1, 2015 at 3:52 AM, Tom Lane wrote:
>> Rather than expanding the API of port.h still further
>> [...]
>
> The current trend, as far as I understand it, is not to increase the
> amount of code in src/port, but actually to reduce it and to rely as much
> as possible on what modern platforms offer. See for example the recent
> discussion about rint and numeric rounding here:
> http://www.postgresql.org/message-id/flat/20150320194337.2573.72944@wrigleys.postgresql.org#20150320194337.2573.72944@wrigleys.postgresql.org
> We are keeping rint around because MSVC only added it recently, in
> MSVC 2013, even though the POSIX spec has had it for a long time, but I am
> sure that we would drop it from the code tree as soon as we could.
>

Which is a completely moot point here, because the sampling depends on
pg_erand48 on every platform, no matter what the platform provides.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services