Thread: heap_delete, heap_mark4update must reset t_ctid
I have been looking at an example of the "no one parent tuple found"
VACUUM error provided by Mario Weilguni. It appears to me that VACUUM
is getting confused by a tuple that looks like so in pg_filedump:

 Item   4 -- Length:  249  Offset: 31616 (0x7b80)  Flags: USED
  OID: 0  CID: min(240) max(18)  XID: min(5691267) max(6484551)
  Block Id: 1  linp Index: 1   Attributes: 38   Size: 40
  infomask: 0x3503 (HASNULL|HASVARLENA|XMIN_COMMITTED|XMAX_COMMITTED|MARKED_FOR_UPDATE|UPDATED)

Notice that the t_ctid field is not pointing to this tuple, but to a
different item on the same page (which in fact is an unused item).
This causes VACUUM to believe that the tuple is part of an update chain.
But in point of fact it is not part of a chain (indeed there are *no*
chains in the test relation, thus leading to the observed failure).

As near as I can tell, the sequence of events was:

1. This row was updated by a transaction that stored the updated version
in lineindex 1, but later aborted. t_ctid is left pointing to linp 1.

2. Some other transaction came along, marked the row FOR UPDATE, and
committed (with no actual update).

So we now have XMAX_COMMITTED and t_ctid != t_self, which looks way too
much like a tuple that's been updated, when in fact it is the latest
good version of its row.

I think an appropriate fix would be to reset t_ctid to equal t_self
whenever we clear XMAX_INVALID, which in practice means heap_delete and
heap_mark4update need to do this. (heap_update also clears
XMAX_INVALID, but of course it's setting t_ctid to point to the updated
tuple.)

Comments?

			regards, tom lane

Tom,

When/if you have a patch for this, I would like to test it. I still
have a copy of a database showing the same problem that I would like to
test this on when it is ready.

thanks,
--Barry

Has this been fixed? I think we did.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Has this been fixed? I think we did.

Yes.

			regards, tom lane

As you can see, there is a lot of cruft left in my mailbox, but there
are some items that we left behind that may be fixable before 7.3.

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Has this been fixed? I think we did.
>
> Yes.

-- 
  Bruce Momjian