Re: Parallel Aggregate - Mailing list pgsql-hackers
From | Haribabu Kommi |
---|---|
Subject | Re: Parallel Aggregate |
Date | |
Msg-id | CAJrrPGdqCb06j2qNag2aQUyMN4aEDL7pqeDkYLfHgY8YN+rBzg@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel Aggregate (David Rowley <david.rowley@2ndquadrant.com>) |
Responses |
Re: Parallel Aggregate
|
List | pgsql-hackers |
On Fri, Jan 22, 2016 at 10:13 PM, David Rowley <david.rowley@2ndquadrant.com> wrote: > On 22 January 2016 at 17:25, Haribabu Kommi <kommi.haribabu@gmail.com> wrote: >> Along with these changes, I added a float8 combine function to see >> how it works under parallel aggregate, it is working fine for float4, but >> giving small data mismatch with float8 data type. >> >> postgres=# select avg(f3), avg(f4) from tbl; >> avg | avg >> ------------------+------------------ >> 1.10000002384186 | 100.123449999879 >> (1 row) >> >> postgres=# set enable_parallelagg = true; >> SET >> postgres=# select avg(f3), avg(f4) from tbl; >> avg | avg >> ------------------+------------------ >> 1.10000002384186 | 100.123449999918 >> (1 row) >> >> Column - f3 - float4 >> Column - f4 - float8 >> >> similar problem for all float8 var_pop, var_samp, stddev_pop and stddev_samp >> aggregates. Any special care is needed for float8 datatype? > > I'm not sure if this is what's going on here, as I don't really know > the range of numbers that you've used to populate f4 with. It would be > good to know, does "f4" contain negative values too? No negative values are present in the f4 column. Following are the SQL statements, create table tbl(f1 int, f2 char(100), f3 float4, f4 float8); insert into tbl values(generate_series(1,100000), 'Fujitsu', 1.1, 100.12345); > It's not all that hard to demonstrate the instability of addition with > float8. Take the following example: > > create table d (d float8); > insert into d values(1223123223412324.2231),(0.00000000000023),(-1223123223412324.2231); > > # select sum(d order by random()) from d; > sum > ----- > 0 > (1 row) > > same query, once more. > > # select sum(d order by random()) from d; > sum > ---------- > 2.3e-013 > (1 row) > > Here the result just depends on the order which the numbers have been > added. You may need to execute a few more times to see the result > change. > > Perhaps a good test would be to perform a sum(f4 order by random()) in > serial mode, and see if you're getting a stable result from the > numbers that you have populated the table with. > > If that's the only problem at play here, then I for one am not worried > about it, as the instability already exists today depending on which > path is chosen to scan the relation. For example an index scan is > likely not to return rows in the same order as a seq scan. > > We do also warn about this in the manual: "Inexact means that some > values cannot be converted exactly to the internal format and are > stored as approximations, so that storing and retrieving a value might > show slight discrepancies. Managing these errors and how they > propagate through calculations is the subject of an entire branch of > mathematics and computer science and will not be discussed here, > except for the following points:" [1] > > > [1] http://www.postgresql.org/docs/devel/static/datatype-numeric.html > Thanks for the detailed explanation. Now I understood. Here I attached updated patch with additional combine function for two stage aggregates also. Regards, Hari Babu Fujitsu Australia
Attachment
pgsql-hackers by date: