Thread: Re: [GENERAL] select statement against pg_stats returns inconsistent data

Re: [GENERAL] select statement against pg_stats returns inconsistent data

From

Tom Lane

Date:

24 February 2004, 19:28:43

Shelby Cain <alyandon@yahoo.com> writes:
> The select statements return different data for
> most_commons_vals depending on whether n_distinct is
> included in the select clause or not.
> I only seem to get the behavior below against int8
> columns - but I haven't interated through every
> conceivable data type either.

Hoo, I'm surprised no one noticed this during 7.4 development/testing.
The problem applies for any datatype that requires double alignment,
which includes int8, float8, and timestamp as well as most of the
geometric types.  pg_statistic is declared as using type "anyarray",
and this type really needs to be marked as requiring double alignment
so that arrays of double-aligned datatypes will come out correctly.

The correct source fix is a one-line change in pg_type.h, but this will
not propagate into existing databases without an initdb.  It looks like
what you'd need to do to fix an existing database is

-- clear out broken data in pg_statistic
DELETE FROM pg_statistic;
-- this should update 1 row:
UPDATE pg_type SET typalign = 'd' WHERE oid = 2277;
-- this should update 6 rows:
UPDATE pg_attribute SET attalign = 'd' WHERE atttypid = 2277;
-- might be a good idea to start a fresh backend at this point
-- repopulate pg_statistic
ANALYZE;

Ugh :-(

            regards, tom lane

Re: [GENERAL] select statement against pg_stats returns inconsistent

From

Joe Conway

Date:

24 February 2004, 20:38:57

Tom Lane wrote:
> Hoo, I'm surprised no one noticed this during 7.4 development/testing.
> The problem applies for any datatype that requires double alignment,
> which includes int8, float8, and timestamp as well as most of the
> geometric types.  pg_statistic is declared as using type "anyarray",
> and this type really needs to be marked as requiring double alignment
> so that arrays of double-aligned datatypes will come out correctly.

anyarray has been defined this way since 7.3 -- any concerns there? I
see that back then pg_statistic used text[] instead of anyarray, so
perhaps not.

Joe

Re: [GENERAL] select statement against pg_stats returns inconsistent data

From

Tom Lane

Date:

24 February 2004, 23:35:10

Joe Conway <mail@joeconway.com> writes:
> anyarray has been defined this way since 7.3 -- any concerns there?

I don't think so --- we weren't trying to use it as an actual column
datatype back then.

7.4 has a problem though :-( ... this is one of the "damn I wish we'd
caught that before release" ones, since it can't easily be fixed without
initdb.  Reminds me that I need to get to work on making pg_upgrade
viable again.

            regards, tom lane

Re: [GENERAL] select statement against pg_stats returns

From

Christopher Kings-Lynne

Date:

25 February 2004, 00:04:18

> I don't think so --- we weren't trying to use it as an actual column
> datatype back then.
>
> 7.4 has a problem though :-( ... this is one of the "damn I wish we'd
> caught that before release" ones, since it can't easily be fixed without
> initdb.  Reminds me that I need to get to work on making pg_upgrade
> viable again.

Has anyone given any thought as to whether dumping and restoring
pg_statistic is worthwhile?

eg. some sort of ALTER TABLE..SET STATISTICS (1.0, 3.3, 'asdf',....)
command?

Chris

Re: [GENERAL] select statement against pg_stats returns inconsistent data

From

Tom Lane

Date:

25 February 2004, 00:15:42

Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> Has anyone given any thought as to whether dumping and restoring
> pg_statistic is worthwhile?

Why?  You can reconstruct it with a simple "ANALYZE" command.  Dumping
and restoring would mean nailing down cross-version assumptions about
what it contains, which doesn't seem real forward-looking...

            regards, tom lane

Re: [GENERAL] select statement against pg_stats returns

From

Christopher Kings-Lynne

Date:

25 February 2004, 00:33:17

> Why?  You can reconstruct it with a simple "ANALYZE" command.  Dumping
> and restoring would mean nailing down cross-version assumptions about
> what it contains, which doesn't seem real forward-looking...

I seem to recall that people like that kind of thing so that the dump is 
really the current state of the database.

Also, I believe big db's like DB2 and Oracle do such a thing.

I just recall it being discussed some time ago...

Chris

Re: [GENERAL] select statement against pg_stats returns

From

Bruce Momjian

Date:

02 March 2004, 14:45:56

Christopher Kings-Lynne wrote:
> > Why?  You can reconstruct it with a simple "ANALYZE" command.  Dumping
> > and restoring would mean nailing down cross-version assumptions about
> > what it contains, which doesn't seem real forward-looking...
> 
> I seem to recall that people like that kind of thing so that the dump is 
> really the current state of the database.
> 
> Also, I believe big db's like DB2 and Oracle do such a thing.
> 
> I just recall it being discussed some time ago...

I have heard of dumping stats so you are sure your production db has the
same stats as your test database, but with ANALYZE so fast, and our
optimizer so good, I don't see a use case for us.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073