Thread: Index corruption / planner issue with one table in my pg 11.6 instance
I have a table with about 7 million records. I had a query for which I needed 2 indexes added, one for a created timestamp field and another for an id field; both very high cardinality.
First I noticed the query would not use the timestamp index no matter what session config settings I used. I finally created a temp table copy of the table and verified the index was used there. Then I rebuilt the main table with VACUUM FULL, and this caused the index to be used.
I repeated this process for an id index on the same table. I created the index and it would never be chosen no matter what, until I rebuilt the table with VACUUM FULL.
I have run bt_index_check and bt_index_parent_check with heapallindexed on one of these indexes but nothing comes up.
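For reference, a sketch of how such checks are typically run with contrib/amcheck on 11 (the index name here is hypothetical, not necessarily the exact invocation used):

    -- amcheck ships as a contrib extension
    CREATE EXTENSION IF NOT EXISTS amcheck;

    -- verifies btree invariants; passing true for heapallindexed also confirms
    -- that every heap tuple has a matching index entry
    SELECT bt_index_check('my_table_created_at_idx', true);

    -- stricter parent/child check; takes a ShareLock on the table
    SELECT bt_index_parent_check('my_table_created_at_idx', true);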
One other noteworthy thing: a cluster restart appears to fix the issue, because on a snapshot of this system (which has been restarted), also at 11.6, the planner does pick up the index.
We quite recently (~2-3 weeks ago) did a pg_upgrade from 9.6 to 11. The table in question is fed via pglogical. I checked for similar behavior on another table in the same stream and could not reproduce it. So for now it seems limited to this one table.
Any suggestions as to how I could verify what is going on here? Has anyone experienced the same?
Thanks!
Jeremy
On Mon, Dec 9, 2019 at 1:00 PM Jeremy Finzel <finzelj@gmail.com> wrote:
> I have a table with about 7 million records. I had a query for which I needed 2 indexes added, one for a created timestamp field and another for an id field; both very high cardinality.
>
> First I noticed the query would not use the timestamp index no matter what session config settings I used. I finally created a temp table copy of the table and verified the index was used there. Then I rebuilt the main table with VACUUM FULL, and this caused the index to be used.
Were they built with CONCURRENTLY? Do you have any long-open snapshots?
Cheers,
Jeff
On Mon, Dec 09, 2019 at 03:51:39PM -0500, Jeff Janes wrote:
> On Mon, Dec 9, 2019 at 1:00 PM Jeremy Finzel <finzelj@gmail.com> wrote:
>> I have a table with about 7 million records. I had a query for which I needed 2 indexes added, one for a created timestamp field and another for an id field; both very high cardinality.
>>
>> First I noticed the query would not use the timestamp index no matter what session config settings I used. I finally created a temp table copy of the table and verified the index was used there. Then I rebuilt the main table with VACUUM FULL, and this caused the index to be used.
>
> Were they built with CONCURRENTLY? Do you have any long-open snapshots?

Something new as of 11 is that btree indexes can be built in parallel, and before releasing it we found some bugs with covering indexes. Perhaps we have an issue hidden behind one of these, but hard to be sure. I have not seen that yet as of v11.

--
Michael
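(If one wanted to rule out the parallel build path Michael mentions, the build can be forced to run serially; a sketch, with hypothetical table and index names:)

    -- PG 11 may parallelize CREATE INDEX; setting this to 0 for the session
    -- forces a serial build, taking any parallel-build bug out of play
    SET max_parallel_maintenance_workers = 0;
    CREATE INDEX my_table_created_at_idx ON my_table (created_at);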
On Mon, Dec 9, 2019 at 6:33 PM Michael Paquier <michael@paquier.xyz> wrote:
> Something new as of 11 is that btree indexes can be built in parallel, and before releasing it we found some bugs with covering indexes. Perhaps we have an issue hidden behind one of these, but hard to be sure.

I doubt it.

Jeremy did not report queries that give wrong answers. He only said that the optimizer refused to use one particular index, before a VACUUM FULL seemingly corrected the problem. OTOH, Jeremy did report using contrib/amcheck on the index, which didn't complain. (Note also that the amcheck functions will throw an error with an !indisvalid index.)

--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes:
> On Mon, Dec 9, 2019 at 6:33 PM Michael Paquier <michael@paquier.xyz> wrote:
>> Something new as of 11 is that btree indexes can be built in parallel, and before releasing it we found some bugs with covering indexes. Perhaps we have an issue hidden behind one of these, but hard to be sure.
>
> I doubt it.
>
> Jeremy did not report queries that give wrong answers. He only said that the optimizer refused to use one particular index, before a VACUUM FULL seemingly corrected the problem. OTOH, Jeremy did report using contrib/amcheck on the index, which didn't complain. (Note also that the amcheck functions will throw an error with an !indisvalid index.)

I suspect this was due to indcheckxmin=true for the involved index and the documented (but IMO confusing) interplay w/broken hot-chains and visibility.

Checking the same DB today, I find 35 indexes across the entire system having indcheckxmin=true, including one on the same table, though not the same index that Pg refused to use recently. Many of the indexes have very old xmins and thus s/b all considered in plans.

I was able to get that remaining index out of the indcheckxmin=true list by...

1. Reindexing $index (did not change anything)
2. begin; drop; create; commit (still in the list but with a much newer xmin.)
3. Vac-Full the table again (and now the index is gone from the indcheckxmin=true list.)

Please advise.

Thx

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
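(Jerry didn't post his exact catalog query; something along these lines lists the indexes carrying the indcheckxmin flag, how old each pg_index row is, and whether each index is valid, since amcheck errors out on !indisvalid indexes:)

    SELECT indexrelid::regclass AS index_name,
           indrelid::regclass   AS table_name,
           indisvalid,
           age(xmin)            AS pg_index_row_xmin_age  -- transactions since the row was written
    FROM pg_index
    WHERE indcheckxmin;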
Jerry Sievers <gsievers19@comcast.net> writes:
> I suspect this was due to indcheckxmin=true for the involved index and the documented (but IMO confusing) interplay w/broken hot-chains and visibility.

Yeah. The reported behavior can mostly be explained if we assume that there's some HOT chain in the table that involves an update of this particular column, so that if we build an index on that column we see a broken HOT chain, but building an index on some other column doesn't have a problem.

The thing this doesn't easily explain is that the behavior persists across repeated index rebuilds. A broken HOT chain is only broken as long as the older entry is still visible-to-somebody, so that such situations ought to be self-healing as time passes. If it fails repeatedly, this theory requires assuming that either

1. You've got some extremely old open transactions (maybe forgotten prepared transactions?), or

2. Your workload is constantly generating new broken HOT chains of the same sort, so that there's usually a live one when you try to build an index.

The fact that you even notice the indcheckxmin restriction indicates that you do tend to have long-running transactions in the system, else the index would come free for use fairly quickly. So #1 isn't as implausible as I might otherwise think. But #2 seems probably more likely on the whole. OTOH, neither point is exactly within the offered evidence.

			regards, tom lane
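(Tom's point #1 can be checked straight from the catalogs; a sketch:)

    -- forgotten prepared transactions hold the xmin horizon back indefinitely
    SELECT gid, prepared, owner, database FROM pg_prepared_xacts;

    -- oldest open transactions and the snapshots they hold
    SELECT pid, xact_start, backend_xmin, state, left(query, 60) AS query
    FROM pg_stat_activity
    WHERE xact_start IS NOT NULL
    ORDER BY xact_start
    LIMIT 10;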
On Tue, Dec 10, 2019 at 12:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah. The reported behavior can mostly be explained if we assume that there's some HOT chain in the table that involves an update of this particular column, so that if we build an index on that column we see a broken HOT chain, but building an index on some other column doesn't have a problem.
The problem exists, so far as I can tell, on indexing *any column* of *this particular table*. I tried the same experiment on another table in the same replication stream, and I cannot reproduce it.
I am building the index **non-concurrently** every time.
> The thing this doesn't easily explain is that the behavior persists across repeated index rebuilds. A broken HOT chain is only broken as long as the older entry is still visible-to-somebody, so that such situations ought to be self-healing as time passes. If it fails repeatedly, this theory requires assuming that either
>
> 1. You've got some extremely old open transactions (maybe forgotten prepared transactions?), or
No prepared_xacts, and no transactions older than a few hours. Several-hour transactions are common in this reporting system. I have not yet checked whether the index starts showing up in plans after several hours.
> 2. Your workload is constantly generating new broken HOT chains of the same sort, so that there's usually a live one when you try to build an index.
>
> The fact that you even notice the indcheckxmin restriction indicates that you do tend to have long-running transactions in the system, else the index would come free for use fairly quickly. So #1 isn't as implausible as I might otherwise think. But #2 seems probably more likely on the whole. OTOH, neither point is exactly within the offered evidence.
Is there a way for me to test this theory? I tried the following (see the SQL sketch below the list) with no change in behavior:
- Disable write load to table
- Vacuum analyze table (not vac full)
- Create index
- Explain
Still did not pick up the index.
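(A literal version of those steps, with hypothetical table and column names:)

    VACUUM ANALYZE my_table;                          -- plain vacuum + analyze, not VACUUM FULL
    CREATE INDEX my_table_created_at_idx             -- non-concurrent build
        ON my_table (created_at);
    EXPLAIN
    SELECT * FROM my_table
    WHERE created_at >= now() - interval '1 day';     -- plan still showed a seq scan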
Thanks,
Jeremy
On Tue, Dec 10, 2019 at 8:25 AM Jeremy Finzel <finzelj@gmail.com> wrote:
> Is there a way for me to test this theory? I tried the following with no change in behavior:
>
> - Disable write load to table
> - Vacuum analyze table (not vac full)
> - Create index
> - Explain
>
> Still did not pick up the index.
Just another followup: with no other intervention on our part, after many hours the planner is picking up the index.
I still don't quite know what is causing it, but is this behavior actually desired? It's pretty inconvenient when you build an index to serve an immediate query need and can't use it right away, which used to work :).
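(If the indcheckxmin explanation is right, one rough way to watch for the restriction lifting is to compare the age of the index's pg_index row against the oldest snapshot still held by any backend; the index name is hypothetical:)

    -- how old, in transactions, the index's catalog row is
    SELECT age(xmin) AS index_row_xmin_age
    FROM pg_index
    WHERE indexrelid = 'my_table_created_at_idx'::regclass
      AND indcheckxmin;

    -- the oldest snapshot currently held by any backend
    SELECT max(age(backend_xmin)) AS oldest_snapshot_age
    FROM pg_stat_activity;

    -- roughly: once the index row is older than every held snapshot,
    -- new queries should start considering the index again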
Thanks,
Jeremy