Index corruption - Mailing list pgsql-general
From | Bankim Bhavsar |
---|---|
Subject | Index corruption |
Date | |
Msg-id | 5511B1AA.9060505@nimblestorage.com |
Responses | Re: Index corruption; Re: Index corruption; Re: Index corruption |
List | pgsql-general |
Hello postgres experts,

We are running a test that periodically kills the postgres process abruptly (equivalent to kill -9) and restarts it. After running this test for 24 hrs or so, we see duplicate primary key entries in a postgres table. We detect this when a separate process loads the primary key entries into an internal hash-table data structure.

Before hitting this issue we see the following warning messages in pg_log:

    17365 2015-03-24 03:01:42.729 GMTWARNING: page is not marked all-visible but visibility map bit is set in relation "table_foo" page 12
    17365 2015-03-24 03:01:42.729 GMTWARNING: page is not marked all-visible but visibility map bit is set in relation "table_foo" page 13

Some information about the schema:

- This table can contain up to 150k entries.
- *IMPORTANT*: We constantly insert new entries and remove older entries from the table.

Relevant columns in table_foo:

    -----------------------------------------------------------------------------
     pk_col3 | bigint | not null default 0::bigint
     pk_col1 | bigint | not null default 0::bigint
     pk_col2 | bigint | not null default 0::bigint
        "table_foo_pkey" PRIMARY KEY, btree (pk_col1, pk_col2, pk_col3)

There are 3 other indexes on non-primary key columns in the table.

Duplicate entries:

    db=# select pk_col1, pk_col2, pk_col3, count(1) from table_foo group by pk_col1, pk_col2, pk_col3 having count(1) > 1;
          pk_col1       | pk_col2 | pk_col3 | count
    --------------------+---------+---------+-------
     627708949163497688 |       1 |   13467 |     2
     627708949163497688 |       4 |   13566 |     2
     627708949163497688 |     266 |   13565 |     2
    (3 rows)

The query analyzer is using an index-only scan for this query:

    sodb=# explain select pk_col1, pk_col2, pk_col3, count(1) from table_foo group by pk_col1, pk_col2, pk_col3 having count(1) > 1 order by pk_col3;
                                                  QUERY PLAN
    ------------------------------------------------------------------------------------------------------
     Sort  (cost=166.25..167.97 rows=689 width=24)
       Sort Key: pk_col3
       ->  HashAggregate  (cost=125.16..133.77 rows=689 width=24)
             Filter: (count(1) > 1)
             ->  Index Only Scan using table_foo_pkey on table_foo  (cost=0.00..113.36 rows=944 width=24)
    (5 rows)

When a non-primary key column is also queried, we don't get duplicate entries. In that case the query analyzer is using a sequential scan on table_foo:

    sodb=# select pk_col1, pk_col2, pk_col3, creation_time, count(1) from table_foo group by pk_col1, pk_col2, pk_col3 having count(1) > 1 order by pk_col3;
     pk_col1 | pk_col2 | pk_col3 | creation_time | count
    -------------------------------------------
    (0 rows)

    sodb=# explain select pk_col1, pk_col2, pk_col3, creation_time, count(1) from table_foo group by pk_col1, pk_col2, pk_col3 having count(1) > 1 order by pk_col3;
                                    QUERY PLAN
    --------------------------------------------------------------------------
     Sort  (cost=174.33..176.06 rows=689 width=32)
       Sort Key: pk_col3
       ->  HashAggregate  (cost=133.24..141.85 rows=689 width=32)
             Filter: (count(1) > 1)
             ->  Seq Scan on table_foo  (cost=0.00..121.44 rows=944 width=32)
    (5 rows)

We ran an experiment in which we reindex the offending table on every postgres startup, and we don't see the same issue after the reindex. This leads us to believe that the index is corrupted but the actual data in the table is fine.

Some information about the postgres setup:

- Version 9.2.0
- We use the standard configuration with shared_buffers set to 32MB and checkpoint_timeout set to 1 min.
- In this particular case postgres replication is not enabled.

Let me know if more information is needed to help understand this issue. Any help or pointers will be appreciated.

Thanks,
Bankim.
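The index-versus-heap comparison described above can also be run on the identical query, rather than by adding creation_time to change the plan. The following is a minimal sketch, not part of the original report: it assumes the table and column names from the post and uses the standard planner settings enable_indexonlyscan, enable_indexscan and enable_bitmapscan to steer the planner toward a sequential scan for one transaction. The final REINDEX mirrors the experiment described in the post.

```sql
-- Sketch only: run in a psql session against the affected database.

-- 1. Duplicate check with default planner settings
--    (in the report this plan used an Index Only Scan on table_foo_pkey).
SELECT pk_col1, pk_col2, pk_col3, count(*)
FROM table_foo
GROUP BY pk_col1, pk_col2, pk_col3
HAVING count(*) > 1;

-- 2. Same query with index access discouraged, so the rows are read
--    from the heap by a sequential scan. SET LOCAL lasts until COMMIT.
BEGIN;
SET LOCAL enable_indexonlyscan = off;
SET LOCAL enable_indexscan = off;
SET LOCAL enable_bitmapscan = off;

SELECT pk_col1, pk_col2, pk_col3, count(*)
FROM table_foo
GROUP BY pk_col1, pk_col2, pk_col3
HAVING count(*) > 1;
COMMIT;

-- 3. Rebuild every index on the table, as in the experiment above.
REINDEX TABLE table_foo;
```

If step 1 returns rows and step 2 returns none, that matches the observation that only the index path shows duplicates. If the heap did contain duplicate keys, rebuilding the unique index in step 3 would be expected to fail instead of succeeding.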