Thread: missing chunk number 0 for toast value 25693266 in pg_toast_25497233
I'm running into this issue again:

psql --version
psql (PostgreSQL) 8.3.7

COPY items_extended TO '/dev/null';
ERROR: missing chunk number 0 for toast value 25693266 in pg_toast_25497233

Unfortunately, I do not know where these are coming from, and I cannot
replicate the data in at least one of my tables (which has 20 million
records) because of this. I've already found 10 bad records. There are
more.

I have four tables with 20 million records (and a fifth which has even
more) that may have this problem. Right now, checking every record one
by one is going to take *DAYS* to complete, and I'm in the middle of
adding additional nodes to our cluster because we are already stretched
to the limit.

I have a few options, such as checking batches of records and spinning
up multiple checkers in parallel, each looking at a different subset of
the data on WAL-shipped spares (assuming the WAL-shipped spares would
suffer the same problem, which is a big assumption), but this is a lot
of effort to get going.

Help!

Thanks,
Bryan
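One concrete form of the batch-checking idea is a range checker that reads
the table in fixed-size primary-key windows and records which windows fail
to detoast; several copies can then be pointed at disjoint ranges in
parallel. The sketch below is illustrative only: it assumes psycopg2 and an
integer primary key (hypothetically named "id"), and the DSN and table
names are placeholders.

# Sketch only: the DSN, table, and "id" key column are assumptions.
import psycopg2

def check_range(dsn, table, pk, lo, hi, batch=10000):
    """Return the (start, stop) key windows whose rows fail to detoast."""
    bad = []
    conn = psycopg2.connect(dsn)
    cur = conn.cursor()
    start = lo
    while start < hi:
        stop = min(start + batch, hi)
        try:
            # Returning the rows to the client forces every TOASTed column
            # to be detoasted, which is what trips the "missing chunk" error.
            cur.execute(
                "SELECT * FROM {0} WHERE {1} >= %s AND {1} < %s".format(table, pk),
                (start, stop))
            cur.fetchall()
        except psycopg2.DatabaseError:
            conn.rollback()            # clear the aborted transaction, move on
            bad.append((start, stop))
        start = stop
    conn.close()
    return bad

# e.g. run one worker per key slice, each on a different range in parallel:
# check_range("dbname=mydb", "items_extended", "id", 0, 5000000)

Only the windows reported bad then need to be rechecked row by row.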
On Fri, May 7, 2010 at 01:24, Bryan Murphy <bmurphy1976@gmail.com> wrote:
> I'm running into this issue again:
>
> psql --version
> psql (PostgreSQL) 8.3.7
>
> COPY items_extended TO '/dev/null';
> ERROR: missing chunk number 0 for toast value 25693266 in pg_toast_25497233
>
> Unfortunately, I do not know where these are coming from, and I cannot
> replicate the data in at least one of my tables (which has 20 million
> records) because of this. I've already found 10 bad records. There are
> more.
>
> I have four tables with 20 million records (and a fifth which has even
> more) that may have this problem. Right now, checking every record one
> by one is going to take *DAYS* to complete, and I'm in the middle of
> adding additional nodes to our cluster because we are already stretched
> to the limit.
>
> I have a few options, such as checking batches of records and spinning
> up multiple checkers in parallel, each looking at a different subset of
> the data on WAL-shipped spares (assuming the WAL-shipped spares would
> suffer the same problem, which is a big assumption), but this is a lot
> of effort to get going.

Try doing a binary search with LIMIT. E.g., if you have 20M records, do
a SELECT * FROM ... LIMIT 10M and throw away the results. If that
breaks, the bad record is in the lower half, so narrow down there; if
not, check the upper half (with OFFSET).

If you have a serial primary key or something, you can use WHERE on it,
which will likely be a lot faster than using LIMIT, but the same idea
applies - do a binary search. Should take a lot less than days, and is
reasonably easy to script.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
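A rough sketch of that LIMIT/OFFSET binary search, assuming psycopg2 and a
stable scan order for the table; the DSN, table name, and row count are
placeholders, not part of the original suggestion:

# Sketch only: narrows a failing full-table read down to one row position.
import psycopg2

def probe(cur, table, offset, limit):
    """True if the `limit` rows starting at `offset` can be read (detoasted)."""
    try:
        cur.execute(
            "SELECT * FROM {} OFFSET %s LIMIT %s".format(table), (offset, limit))
        cur.fetchall()
        return True
    except psycopg2.DatabaseError:
        cur.connection.rollback()   # clear the aborted transaction before retrying
        return False

def find_bad_offset(dsn, table, total_rows):
    """Binary-search the offset of the first unreadable row.
    Assumes the full-table read is already known to fail."""
    conn = psycopg2.connect(dsn)
    cur = conn.cursor()
    lo, hi = 0, total_rows              # invariant: a bad row lies in [lo, hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if probe(cur, table, lo, mid - lo):
            lo = mid                    # rows [lo, mid) are clean; look higher
        else:
            hi = mid                    # failure reproduced below mid; look lower
    conn.close()
    return lo

# find_bad_offset("dbname=mydb", "items_extended", 20000000)

Using WHERE ranges on a serial key instead of OFFSET, as suggested above,
avoids rescanning from the start of the table on every probe.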
On Fri, May 7, 2010 at 9:02 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Try doing a binary search with LIMIT. E.g., if you have 20M records, do
> a SELECT * FROM ... LIMIT 10M and throw away the results. If that
> breaks, the bad record is in the lower half, so narrow down there; if
> not, check the upper half (with OFFSET).
>
> If you have a serial primary key or something, you can use WHERE on it,
> which will likely be a lot faster than using LIMIT, but the same idea
> applies - do a binary search. Should take a lot less than days, and is
> reasonably easy to script.

That's my next step if I can't find a quicker/simpler method. My math
tells me that my current script is going to take 24 days to test every
record. Obviously, there are ways I can speed that up if I have no
choice, but I'm hoping for a simpler solution.

I'd prefer to run a COPY TABLE-like command and have it skip the bad
records.

Thanks,
Bryan
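There is no COPY option in 8.3 that silently skips unreadable rows, but a
client-side approximation is to dump in batches and fall back to one key at
a time when a batch hits the missing-chunk error, logging the keys that get
skipped. A sketch along those lines, again assuming psycopg2 and a
hypothetical integer key "id" (all names are placeholders):

# Sketch only: write readable rows as CSV, skipping rows that fail to detoast.
import csv
import sys
import psycopg2

def dump_skipping_bad(dsn, table, pk, max_id, out=sys.stdout, batch=10000):
    conn = psycopg2.connect(dsn)
    cur = conn.cursor()
    writer = csv.writer(out)
    sql = "SELECT * FROM {0} WHERE {1} >= %s AND {1} < %s".format(table, pk)
    lo = 0
    while lo <= max_id:
        hi = lo + batch
        try:
            cur.execute(sql, (lo, hi))
            writer.writerows(cur.fetchall())
        except psycopg2.DatabaseError:
            conn.rollback()
            # The batch contains at least one bad row: retry one key at a time
            # so that only the genuinely unreadable rows are dropped.
            for key in range(lo, hi):
                try:
                    cur.execute(sql, (key, key + 1))
                    writer.writerows(cur.fetchall())
                except psycopg2.DatabaseError:
                    conn.rollback()
                    sys.stderr.write("skipping %s = %d\n" % (pk, key))
        lo = hi
    conn.close()

# dump_skipping_bad("dbname=mydb", "items_extended", "id", 25000000)

The stderr log doubles as the list of bad keys, so the skipped rows can be
restored later from a backup or from the application side.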