Bug in numeric multiplication - Mailing list pgsql-hackers
From | Dean Rasheed |
---|---|
Subject | Bug in numeric multiplication |
Date | |
Msg-id | CAEZATCUUEeYOzi-cOt+1p3wo_15q=ffqVc+oV-uMnmBH-rPFGg@mail.gmail.com Whole thread Raw |
Responses |
Re: Bug in numeric multiplication
|
List | pgsql-hackers |
Hi, By chance, while testing the nearby numeric log/exp/pow patch, I came across the following case which generates an initially puzzling looking error on HEAD -- (5.6-1e-500) ^ (3.2-1e-200). This computation actually works OK with that other patch, but only by blind luck. It turns out that the underlying problem is a bug in the low-level numeric multiplication function mul_var(). It is possible to trigger it directly with the following carefully crafted inputs: select 4790999999999999999999999999999999999999999999999999999999999999999999999999999999999999 * 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999; Result: 47909999999999999999999999999999999999999999999999999999999999999999999999999999997852304953000000000000000000000000000000000000000000000000000000000000000000000000000000000001 That answer is actually incorrect. Tweaking the input a little, it is possible to generate a much more obviously nonsensical result: select 4789999999999999999999999999999999999999999999999999999999999999999999999999999999999999 * 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999; Result: 4789999999999999999999999999999999999999999999999999999999999999999999999999999999785231+0,*000000000000000000000000000000000000000000000000000000000000000000000000000000000001 Notice those garbage digits in the middle of the number returned. The problem is that these examples trigger an overflow of the digits in the accumulator array in mul_var(). The number on the left in the first example consists of 21 copies of 9999, preceded by 4790. Those are chosen so that when added together they lead to a value for maxdig in mul_var() of 21*9999 + 4790 = 214769, which is exactly equal to INT_MAX / (NBASE - 1). So this doesn't quite trigger a normalisation of the accumulator array, and leaves several of the digits in that array a little under INT_MAX at the end of the main multiplication loop. The problem then arises in the final carry propagation pass. During this phase of the computation, the carry from one digit (which can be a shade under INT_MAX / NBASE) is added to the next digit, and that's where the overflow happens. To fix that, the initial value for maxdig needs to be made larger to leave headroom for the carry. The largest possible carry is INT_MAX / NBASE, and maxdig is the maximum possible dig value divided by NBASE-1, so maxdig needs to be initialised to (INT_MAX / NBASE) / (NBASE - 1) which is 21 for NBASE = 10000. A new corner-case input that doesn't quite trigger an accumulator normalisation is then 47699999... The worst case inputs are now values like this for which the sum of a sequence of input digits is INT_MAX / (NBASE - 1) - 21 = 214769 - 21 = 214748. So in the worst case, the accumulator's digits can be up to 214748 * 9999 = 2147265252 in the main multiplication loop. Then, during the carry propagation phase (or any of the normalisation phases), the carry can be anything up to INT_MAX / NBASE = 214748. So the maximum value that can be assigned to any individual digit is now 2147265252 + 214748 = 2147480000, which is now less than INT_MAX. Patch attached. Regards, Dean
Attachment
pgsql-hackers by date: