Thread: BUG #12617: DETAIL: Could not read from file "pg_subtrans/06F8" at offset 90112: Success.

BUG #12617: DETAIL: Could not read from file "pg_subtrans/06F8" at offset 90112: Success.

From

davividal@gmail.com

Date:

21 January 2015, 14:33:10

The following bug has been logged on the website:

Bug reference:      12617
Logged by:          Davi Koscianski Vidal
Email address:      davividal@gmail.com
PostgreSQL version: 9.4.0
Operating system:   Linux CentOS 6.6 x86_64
Description:

My scenario: I have a daily physical backup from my production server
(9.1).
I create a 9.1 cluster from it and upgrade it to 9.4 daily. After that, I
deploy a bunch of changes using sqitch (sqitch.org). Again: I do it on a
daily basis, but this is the first time I faced this issue:

$ sqitch deploy -t sandbox views/sistema.sf_guard_user@HEAD
Deploying changes through views/sistema.sf_guard_user to sandbox
  + data_migration/rate_plan.payment_policies ...............
psql:sql/deploy/data_migration/rate_plan.payment_policies.sql:21: ERROR:
could not access status of transaction 116940611
DETAIL:  Could not read from file "pg_subtrans/06F8" at offset 90112:
Success.
CONTEXT:  while updating tuple (6302,20) in relation "rate_daily_policies"
not ok

Re-running the command yields the exactly same error.
Running from psql give the same output:

reserva_imediata=# ALTER TABLE rate_plan.rate_daily_policies DISABLE TRIGGER
USER;
ALTER TABLE
reserva_imediata=# ALTER TABLE rate_plan.payment_policies DISABLE TRIGGER
USER;
ALTER TABLE
reserva_imediata=# UPDATE rate_plan.rate_daily_policies AS rp
reserva_imediata-#         SET
reserva_imediata-#         (
reserva_imediata(#                   discount,
reserva_imediata(#                   min_stay,
reserva_imediata(#                   max_stay,
reserva_imediata(#                   meal_plan_type_ota_code
reserva_imediata(#           ) = (
reserva_imediata(#                   pp.discount,
reserva_imediata(#                   pp.min_stay,
reserva_imediata(#                   pp.max_stay,
reserva_imediata(#                   pp.meal_plan_type_ota_code
reserva_imediata(#           )
reserva_imediata-# FROM rate_plan.payment_policies AS pp
reserva_imediata-# WHERE pp.policy_id = rp.policy_id_payment;
ERROR:  could not access status of transaction 116940611
DETAIL:  Could not read from file "pg_subtrans/06F8" at offset 90112:
Success.
CONTEXT:  while updating tuple (6302,20) in relation "rate_daily_policies"
reserva_imediata=# ALTER TABLE rate_plan.rate_daily_policies ENABLE TRIGGER
USER;
ALTER TABLE
reserva_imediata=# ALTER TABLE rate_plan.payment_policies ENABLE TRIGGER
USER;
ALTER TABLE

The command I used to upgrade was: /usr/pgsql-9.4/bin/pg_upgrade -d
/var/lib/pgsql/9.1/data -D /var/lib/9.4/data -b /usr/pgsql/9.1/bin -B
/usr/pgsql/9.4/bin

What other info can I provide?
My database is 30 GB in size and have customers data, so it is *very* hard
(or even impossible) to provide a dump.

Re: BUG #12617: DETAIL: Could not read from file "pg_subtrans/06F8" at offset 90112: Success.

From

Alvaro Herrera

Date:

21 January 2015, 15:12:09

davividal@gmail.com wrote:

> My scenario: I have a daily physical backup from my production server
> (9.1).
> I create a 9.1 cluster from it and upgrade it to 9.4 daily. After that, I
> deploy a bunch of changes using sqitch (sqitch.org). Again: I do it on a
> daily basis, but this is the first time I faced this issue:
>
> $ sqitch deploy -t sandbox views/sistema.sf_guard_user@HEAD
> Deploying changes through views/sistema.sf_guard_user to sandbox
>   + data_migration/rate_plan.payment_policies ...............
> psql:sql/deploy/data_migration/rate_plan.payment_policies.sql:21: ERROR:
> could not access status of transaction 116940611
> DETAIL:  Could not read from file "pg_subtrans/06F8" at offset 90112:
> Success.
> CONTEXT:  while updating tuple (6302,20) in relation "rate_daily_policies"
> not ok

Can you please paste the output of pg_controldata on both clusters?


--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: BUG #12617: DETAIL: Could not read from file "pg_subtrans/06F8" at offset 90112: Success.

From

davi vidal

Date:

22 January 2015, 14:01:59

On Wed, Jan 21, 2015 at 1:07 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

> davividal@gmail.com wrote:
>
> > My scenario: I have a daily physical backup from my production server
> > (9.1).
> > I create a 9.1 cluster from it and upgrade it to 9.4 daily. After that, I
> > deploy a bunch of changes using sqitch (sqitch.org). Again: I do it on a
> > daily basis, but this is the first time I faced this issue:
> >
> > $ sqitch deploy -t sandbox views/sistema.sf_guard_user@HEAD
> > Deploying changes through views/sistema.sf_guard_user to sandbox
> >   + data_migration/rate_plan.payment_policies ...............
> > psql:sql/deploy/data_migration/rate_plan.payment_policies.sql:21: ERROR:
> > could not access status of transaction 116940611
> > DETAIL:  Could not read from file "pg_subtrans/06F8" at offset 90112:
> > Success.
> > CONTEXT:  while updating tuple (6302,20) in relation
> "rate_daily_policies"
> > not ok
>
> Can you please paste the output of pg_controldata on both clusters?
>
>
Sure:

# /usr/pgsql-9.1/bin/pg_controldata /var/lib/pgsql/9.1/data
pg_control version number:            903
Catalog version number:               201105231
Database system identifier:           6084433861875394175
Database cluster state:               in production
pg_control last modified:             Tue Jan 20 03:01:14 2015
Latest checkpoint location:           7E/4C3989F8
Prior checkpoint location:            7E/49B6F770
Latest checkpoint's REDO location:    7E/49C3C350
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          0/116927458
Latest checkpoint's NextOID:          2168506
Latest checkpoint's NextMultiXactId:  899337
Latest checkpoint's NextMultiOffset:  2295779
Latest checkpoint's oldestXID:        1875
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  116927458
Time of latest checkpoint:            Tue Jan 20 03:00:01 2015
Minimum recovery ending location:     0/0
Backup start location:                0/0
Current wal_level setting:            hot_standby
Current max_connections setting:      1200
Current max_prepared_xacts setting:   64
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value



# /usr/pgsql-9.4/bin/pg_controldata /var/lib/pgsql/9.4/data/
pg_control version number:            942
Catalog version number:               201409291
Database system identifier:           6106609909976182597
Database cluster state:               in production
pg_control last modified:             Thu Jan 22 10:48:57 2015
Latest checkpoint location:           7F/8F000830
Prior checkpoint location:            7F/8F000758
Latest checkpoint's REDO location:    7F/8F0007F8
Latest checkpoint's REDO WAL file:    000000010000007F0000008F
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0/116936094
Latest checkpoint's NextOID:          2168930
Latest checkpoint's NextMultiXactId:  899338
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        1875
Latest checkpoint's oldestXID's DB:   16414
Latest checkpoint's oldestActiveXID:  116936094
Latest checkpoint's oldestMultiXid:   899337
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint:            Thu Jan 22 10:48:57 2015
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            hot_standby
Current wal_log_hints setting:        off
Current max_connections setting:      1200
Current max_worker_processes setting: 8
Current max_prepared_xacts setting:   64
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0



As soon as I got that error, I stop'ed the services and aborted my daily
restore routine.


Davi

Re: BUG #12617: DETAIL: Could not read from file "pg_subtrans/06F8" at offset 90112: Success.

From

Alvaro Herrera

Date:

22 January 2015, 15:24:07

davi vidal wrote:
> On Wed, Jan 21, 2015 at 1:07 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
>
> > davividal@gmail.com wrote:
> >
> > > My scenario: I have a daily physical backup from my production server
> > > (9.1).
> > > I create a 9.1 cluster from it and upgrade it to 9.4 daily. After that, I
> > > deploy a bunch of changes using sqitch (sqitch.org). Again: I do it on a
> > > daily basis, but this is the first time I faced this issue:
> > >
> > > $ sqitch deploy -t sandbox views/sistema.sf_guard_user@HEAD
> > > Deploying changes through views/sistema.sf_guard_user to sandbox
> > >   + data_migration/rate_plan.payment_policies ...............
> > > psql:sql/deploy/data_migration/rate_plan.payment_policies.sql:21: ERROR:
> > > could not access status of transaction 116940611
> > > DETAIL:  Could not read from file "pg_subtrans/06F8" at offset 90112:
> > > Success.
> > > CONTEXT:  while updating tuple (6302,20) in relation
> > "rate_daily_policies"

So the mechanism for this error is that heap_update calls
XactLockTableWait (either directly or through MultiXactIdWait), which in
turn calls SubTransGetParent().  That's the source of the error (you can
see because the error context callback is installed with an oper code of
XLTW_Update, which is what generates that precise wording.)

Question is why can't it read from the file?  Presumably the file hasn't
been extended to that point yet, or maybe it doesn't even exist.  Maybe
a glitch in pg_upgrade?  I don't know how does the new cluster acquire
its pg_subtrans files after pg_upgrade.  Can you paste a listing of
$PGDATA/pg_subtrans?

The file should be created by ExtendSUBTRANS(), but this routine only
works at the first ID of each page.  I guess if you upgrade and the XID
counter ends up in the middle of a page, then try to read the parent XID
of some subxact, the file extension won't have happened.  To workaround
the problem, I think, you could just read a few hundred transactions
until the next pg_subtrans page extension takes place.

I haven't tested this hypothesis.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: BUG #12617: DETAIL: Could not read from file "pg_subtrans/06F8" at offset 90112: Success.

From

Alvaro Herrera

Date:

22 January 2015, 17:48:41

davi vidal wrote:
> On Thu, Jan 22, 2015 at 1:18 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:

> The 9.1 cluster is empty.
>
>
> # ls -a /var/lib/pgsql/9.4/data/pg_subtrans/
> .  ..  06F8

How big is this file?

> > I haven't tested this hypothesis.
>
> Sorry, I didn't understand anything.
> But I can do as much tests as you want/need.

Don't worry, that was mostly notes to other hackers and to myself.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: BUG #12617: DETAIL: Could not read from file "pg_subtrans/06F8" at offset 90112: Success.

From

davi vidal

Date:

22 January 2015, 20:29:41

On Thu, Jan 22, 2015 at 1:18 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

> davi vidal wrote:
> > On Wed, Jan 21, 2015 at 1:07 PM, Alvaro Herrera <
> alvherre@2ndquadrant.com>
> > wrote:
> >
> > > davividal@gmail.com wrote:
> > >
> > > > My scenario: I have a daily physical backup from my production server
> > > > (9.1).
> > > > I create a 9.1 cluster from it and upgrade it to 9.4 daily. After
> that, I
> > > > deploy a bunch of changes using sqitch (sqitch.org). Again: I do it
> on a
> > > > daily basis, but this is the first time I faced this issue:
> > > >
> > > > $ sqitch deploy -t sandbox views/sistema.sf_guard_user@HEAD
> > > > Deploying changes through views/sistema.sf_guard_user to sandbox
> > > >   + data_migration/rate_plan.payment_policies ...............
> > > > psql:sql/deploy/data_migration/rate_plan.payment_policies.sql:21:
> ERROR:
> > > > could not access status of transaction 116940611
> > > > DETAIL:  Could not read from file "pg_subtrans/06F8" at offset 90112:
> > > > Success.
> > > > CONTEXT:  while updating tuple (6302,20) in relation
> > > "rate_daily_policies"
>
> So the mechanism for this error is that heap_update calls
> XactLockTableWait (either directly or through MultiXactIdWait), which in
> turn calls SubTransGetParent().  That's the source of the error (you can
> see because the error context callback is installed with an oper code of
> XLTW_Update, which is what generates that precise wording.)
>
> Question is why can't it read from the file?  Presumably the file hasn't
> been extended to that point yet, or maybe it doesn't even exist.  Maybe
> a glitch in pg_upgrade?  I don't know how does the new cluster acquire
> its pg_subtrans files after pg_upgrade.  Can you paste a listing of
> $PGDATA/pg_subtrans?
>
>
The 9.1 cluster is empty.


# ls -a /var/lib/pgsql/9.4/data/pg_subtrans/
.  ..  06F8

# file /var/lib/pgsql/9.4/data/pg_subtrans/06F8
/var/lib/pgsql/9.4/data/pg_subtrans/06F8: data



> The file should be created by ExtendSUBTRANS(), but this routine only
> works at the first ID of each page.  I guess if you upgrade and the XID
> counter ends up in the middle of a page, then try to read the parent XID
> of some subxact, the file extension won't have happened.  To workaround
> the problem, I think, you could just read a few hundred transactions
> until the next pg_subtrans page extension takes place.
>
> I haven't tested this hypothesis.
>
>
Sorry, I didn't understand anything.
But I can do as much tests as you want/need.


Davi

Re: BUG #12617: DETAIL: Could not read from file "pg_subtrans/06F8" at offset 90112: Success.

From

davi vidal

Date:

22 January 2015, 20:30:40

On Thu, Jan 22, 2015 at 3:48 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

> davi vidal wrote:
> > On Thu, Jan 22, 2015 at 1:18 PM, Alvaro Herrera <
> alvherre@2ndquadrant.com>
> > wrote:
>
> > The 9.1 cluster is empty.
> >
> >
> > # ls -a /var/lib/pgsql/9.4/data/pg_subtrans/
> > .  ..  06F8
>
> How big is this file?
>
>
40 KB

I can't open it using vim.


Davi

Re: BUG #12617: DETAIL: Could not read from file "pg_subtrans/06F8" at offset 90112: Success.

From

Alvaro Herrera

Date:

22 January 2015, 20:39:37

davi vidal wrote:
> On Thu, Jan 22, 2015 at 3:48 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
>
> > davi vidal wrote:
> > > On Thu, Jan 22, 2015 at 1:18 PM, Alvaro Herrera <
> > alvherre@2ndquadrant.com>
> > > wrote:
> >
> > > The 9.1 cluster is empty.
> > >
> > >
> > > # ls -a /var/lib/pgsql/9.4/data/pg_subtrans/
> > > .  ..  06F8
> >
> > How big is this file?
> >
> 40 KB
>
> I can't open it using vim.

Yeah, it's a binary file, don't worry about that.

Anyway that's pretty odd.  I assume that your blocksize is 8 kB.  Then
this file is 5 pages long, but the system is asking for data that should
be in page 11.  For some reason 6 page extensions in pg_subtrans were
ignored?  That sounds really odd.

I'm also surprised that the subdir in 9.1 is empty.  The file certainly
should still be there.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services