Thread: Wrong assert in TransactionGroupUpdateXidStatus
While testing, my colleague Vignesh has hit an assert in TransactionGroupUpdateXidStatus. But that is not reproducible. After some analysis and code review, I have found the reason for the same. As shown in the below code, there is an assert in TransactionGroupUpdateXidStatus, which assumes that an overflowed transaction can never get registered for the group update. But, actually, that is not true because while registering the transaction for group update, we only check how many committed children this transaction has because all aborted sub-transaction would have already updated their status. So if the transaction once overflowed but later all its children are aborted (i.e remaining committed children are <= THRESHOLD_SUBTRANS_CLOG_OPT) then it will be registered for the group update. /* * Overflowed transactions should not use group XID status update * mechanism. */ Assert(!pgxact->overflowed); A solution could be either we remove this assert or change this assert to Assert(pgxact->nxids <= THRESHOLD_SUBTRANS_CLOG_OPT); Note: I could not come up with the reproducible test case as we can not ensure whether a backend will try to group updates or not because that depends upon whether it gets the CLogControlLock or not. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
Hi, Amit, Robert, IIRC that's mostly your feature? On 2019-12-10 13:55:40 +0530, Dilip Kumar wrote: > While testing, my colleague Vignesh has hit an assert in > TransactionGroupUpdateXidStatus. But that is not reproducible. After > some analysis and code review, I have found the reason for the same. > > As shown in the below code, there is an assert in > TransactionGroupUpdateXidStatus, which assumes that an overflowed > transaction can never get registered for the group update. But, > actually, that is not true because while registering the transaction > for group update, we only check how many committed children this > transaction has because all aborted sub-transaction would have already > updated their status. So if the transaction once overflowed but later > all its children are aborted (i.e remaining committed children are <= > THRESHOLD_SUBTRANS_CLOG_OPT) then it will be registered for the group > update. > /* > * Overflowed transactions should not use group XID status update > * mechanism. > */ > Assert(!pgxact->overflowed); > > A solution could be either we remove this assert or change this assert > to Assert(pgxact->nxids <= THRESHOLD_SUBTRANS_CLOG_OPT); Maybe I'm missing something, but isn't this a bug then? IIRC We can't rely on MyProc->subxids once we overflowed, even if since then the remaining number of children has become low enough. It seems to me that the actual fix here is to correct the condition in TransactionIdSetPageStatus() checking whether group updates are possible - it seems it'd need to verify that the transaction isn't overflowed. Also, it's somewhat odd that TransactionIdSetPageStatus() first has /* Can't use group update when PGPROC overflows. */ StaticAssertStmt(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS, "group clog threshold less than PGPROC cached subxids"); and then, within an if(): /* * We don't try to do group update optimization if a process has * overflowed the subxids array in its PGPROC, since in that case we * don't have a complete list of XIDs for it. */ Assert(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS); Even if these weren't redundant, it can't make sense to test such a static condition only within an if? Is it possible this was actually intended to test something different? Greetings, Andres Freund
On Wed, Dec 11, 2019 at 4:02 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > Amit, Robert, IIRC that's mostly your feature? > I will look into this today. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Dec 11, 2019 at 4:02 AM Andres Freund <andres@anarazel.de> wrote: > On 2019-12-10 13:55:40 +0530, Dilip Kumar wrote: > > > /* > > * Overflowed transactions should not use group XID status update > > * mechanism. > > */ > > Assert(!pgxact->overflowed); > > > > A solution could be either we remove this assert or change this assert > > to Assert(pgxact->nxids <= THRESHOLD_SUBTRANS_CLOG_OPT); > > Maybe I'm missing something, but isn't this a bug then? IIRC We can't > rely on MyProc->subxids once we overflowed, even if since then the > remaining number of children has become low enough. > AFAICS, the MyProc->subxids is maintained properly if the number of subtransactions is lesser than PGPROC_MAX_CACHED_SUBXIDS (64). Can you explain the case where that won't be true? Also, even if what you are saying is true, I think the memcmp in TransactionIdSetPageStatus should avoid taking us a wrong decision. > > Also, it's somewhat odd that TransactionIdSetPageStatus() first has > > /* Can't use group update when PGPROC overflows. */ > StaticAssertStmt(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS, > "group clog threshold less than PGPROC cached subxids"); > > and then, within an if(): > > /* > * We don't try to do group update optimization if a process has > * overflowed the subxids array in its PGPROC, since in that case we > * don't have a complete list of XIDs for it. > */ > Assert(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS); > > Even if these weren't redundant, it can't make sense to test such a > static condition only within an if? > I don't remember exactly the reason for this, but now I don't find the Assert within if () meaningful. I think we should remove the Assert inside if() unless Robert or someone see any use of it. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Dec 11, 2019 at 11:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Dec 11, 2019 at 4:02 AM Andres Freund <andres@anarazel.de> wrote: > > On 2019-12-10 13:55:40 +0530, Dilip Kumar wrote: > > > > > /* > > > * Overflowed transactions should not use group XID status update > > > * mechanism. > > > */ > > > Assert(!pgxact->overflowed); > > > > > > A solution could be either we remove this assert or change this assert > > > to Assert(pgxact->nxids <= THRESHOLD_SUBTRANS_CLOG_OPT); > > > > Maybe I'm missing something, but isn't this a bug then? IIRC We can't > > rely on MyProc->subxids once we overflowed, even if since then the > > remaining number of children has become low enough. > > > > AFAICS, the MyProc->subxids is maintained properly if the number of > subtransactions is lesser than PGPROC_MAX_CACHED_SUBXIDS (64). Can > you explain the case where that won't be true? Also, even if what you > are saying is true, I think the memcmp in TransactionIdSetPageStatus > should avoid taking us a wrong decision. > I am able to reproduce the issue by reducing the values of PGPROC_MAX_CACHED_SUBXIDS and THRESHOLD_SUBTRANS_CLOG_OPT to 2. Below is what I did after reducing the values: Session-1 -------------- postgres=# begin; BEGIN postgres=# insert into t1 values(1); INSERT 0 1 postgres=# savepoint s1; SAVEPOINT postgres=# insert into t1 values(2); INSERT 0 1 postgres=# savepoint s2; SAVEPOINT postgres=# insert into t1 values(3); INSERT 0 1 postgres=# savepoint s3; SAVEPOINT postgres=# insert into t1 values(4); INSERT 0 1 postgres=# rollback to s2; ROLLBACK Session-2 --------------- insert into t1 values(4); -- attach debugger and stop in TransactionIdSetPageStatus after acquiring CLogControlLock Session-1 --------------- Commit; -- This will wait to acquire CLogControlLock in a group update path (TransactionGroupUpdateXidStatus). Now, continue in the session-2 debugger. After that continue in session-1's debugger and it will hit the Assert. The attached patch fixes it by changing the Assert. I have additionally removed the redundant Assert in TransactionIdSetPageStatus as pointed out by Andres. I am planning to commit and backpatch this early next week (Monday) unless someone wants to review it further or has objections to it. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
On 2019-Dec-11, Amit Kapila wrote: > On Wed, Dec 11, 2019 at 4:02 AM Andres Freund <andres@anarazel.de> wrote: > > On 2019-12-10 13:55:40 +0530, Dilip Kumar wrote: > > /* > > * We don't try to do group update optimization if a process has > > * overflowed the subxids array in its PGPROC, since in that case we > > * don't have a complete list of XIDs for it. > > */ > > Assert(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS); > > > > Even if these weren't redundant, it can't make sense to test such a > > static condition only within an if? > > I don't remember exactly the reason for this, but now I don't find the > Assert within if () meaningful. I think we should remove the Assert > inside if() unless Robert or someone see any use of it. The more I look at both these asserts, the less sense they make. Why does clog.c care about PGPROC at all? Looking at the callers of that routine, nowhere do they concern themselves with whether the overflowed flag has been set or not. It seems to me that the StaticAssert() should be near the PGPROC_MAX_CACHED_SUBXIDS definition, not the SUBTRANS definition (maybe as StaticAssertDecl, as in 201DD0641B056142AC8C6645EC1B5F62014B8E8030@SYD1217 ) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Dec 12, 2019 at 6:10 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > On 2019-Dec-11, Amit Kapila wrote: > > > On Wed, Dec 11, 2019 at 4:02 AM Andres Freund <andres@anarazel.de> wrote: > > > On 2019-12-10 13:55:40 +0530, Dilip Kumar wrote: > > > > /* > > > * We don't try to do group update optimization if a process has > > > * overflowed the subxids array in its PGPROC, since in that case we > > > * don't have a complete list of XIDs for it. > > > */ > > > Assert(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS); > > > > > > Even if these weren't redundant, it can't make sense to test such a > > > static condition only within an if? > > > > I don't remember exactly the reason for this, but now I don't find the > > Assert within if () meaningful. I think we should remove the Assert > > inside if() unless Robert or someone see any use of it. > > The more I look at both these asserts, the less sense they make. Why > does clog.c care about PGPROC at all? > It is mainly for group updates. Basically, we want to piggyback the procs that are trying to update clog at the same time on the proc which got the CLogControlLock. This avoids taking/releasing that lock multiple times. See TransactionGroupUpdateXidStatus. > Looking at the callers of that > routine, nowhere do they concern themselves with whether the overflowed > flag has been set or not. It seems to me that the StaticAssert() should > be near the PGPROC_MAX_CACHED_SUBXIDS definition, not the SUBTRANS > definition (maybe as StaticAssertDecl, as in > 201DD0641B056142AC8C6645EC1B5F62014B8E8030@SYD1217 ) > Sounds reasonable. We can do that once the patch mentioned by you got committed. For now, we are planning to just remove the Assert inside if() condition. Do you see any problem with that? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 2019-Dec-12, Amit Kapila wrote: > On Thu, Dec 12, 2019 at 6:10 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > The more I look at both these asserts, the less sense they make. Why > > does clog.c care about PGPROC at all? > > It is mainly for group updates. Basically, we want to piggyback the > procs that are trying to update clog at the same time on the proc > which got the CLogControlLock. This avoids taking/releasing that lock > multiple times. See TransactionGroupUpdateXidStatus. Yeah, I (think I) understand that. My point is that conceptually, the fact that a PGPROC has overflowed does not really affect clog.c in any way. > > Looking at the callers of that routine, nowhere do they concern > > themselves with whether the overflowed > > flag has been set or not. It seems to me that the StaticAssert() should > > be near the PGPROC_MAX_CACHED_SUBXIDS definition, not the SUBTRANS > > definition (maybe as StaticAssertDecl, as in > > 201DD0641B056142AC8C6645EC1B5F62014B8E8030@SYD1217 ) > > Sounds reasonable. We can do that once the patch mentioned by you got > committed. For now, we are planning to just remove the Assert inside > if() condition. Do you see any problem with that? Nope. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Dec 10, 2019 at 4:32 PM Andres Freund <andres@anarazel.de> wrote: > and then, within an if(): > > /* > * We don't try to do group update optimization if a process has > * overflowed the subxids array in its PGPROC, since in that case we > * don't have a complete list of XIDs for it. > */ > Assert(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS); > > Even if these weren't redundant, it can't make sense to test such a > static condition only within an if? Is it possible this was actually > intended to test something different? Based on the comment, I imagine it might've been intended to read Assert(nsubxids <= PGPROC_MAX_CACHED_SUBXIDS). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Dec 12, 2019 at 8:44 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Dec 10, 2019 at 4:32 PM Andres Freund <andres@anarazel.de> wrote: > > and then, within an if(): > > > > /* > > * We don't try to do group update optimization if a process has > > * overflowed the subxids array in its PGPROC, since in that case we > > * don't have a complete list of XIDs for it. > > */ > > Assert(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS); > > > > Even if these weren't redundant, it can't make sense to test such a > > static condition only within an if? Is it possible this was actually > > intended to test something different? > > Based on the comment, I imagine it might've been intended to read > Assert(nsubxids <= PGPROC_MAX_CACHED_SUBXIDS). > Do you think we need such an Assert after having StaticAssert for (THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS) and then an if statement containing (nsubxids <= THRESHOLD_SUBTRANS_CLOG_OPT) just before this Assert? Sure, we can keep this for extra safety, but I don't see the need for it. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Dec 12, 2019 at 9:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > Do you think we need such an Assert after having StaticAssert for > (THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS) and then > an if statement containing (nsubxids <= THRESHOLD_SUBTRANS_CLOG_OPT) > just before this Assert? Sure, we can keep this for extra safety, but > I don't see the need for it. I don't have strong feelings about it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Dec 15, 2019 at 8:51 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Dec 12, 2019 at 9:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Do you think we need such an Assert after having StaticAssert for > > (THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS) and then > > an if statement containing (nsubxids <= THRESHOLD_SUBTRANS_CLOG_OPT) > > just before this Assert? Sure, we can keep this for extra safety, but > > I don't see the need for it. > > I don't have strong feelings about it. > Okay, in that case, I am planning to push this patch [1] tomorrow morning unless I see any other comments. I am also planning to backpatch this through 10 where it got introduced, even though this is not a serious bug, but I think it is better to keep the code consistent in back branches. [1] - https://www.postgresql.org/message-id/CAA4eK1JZ5EipQ8Ta6eLMX_ni3CNtZDUrvHg0th1C8n%3D%2Bk%2B0ojg%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, Dec 16, 2019 at 8:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sun, Dec 15, 2019 at 8:51 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Thu, Dec 12, 2019 at 9:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > Do you think we need such an Assert after having StaticAssert for > > > (THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS) and then > > > an if statement containing (nsubxids <= THRESHOLD_SUBTRANS_CLOG_OPT) > > > just before this Assert? Sure, we can keep this for extra safety, but > > > I don't see the need for it. > > > > I don't have strong feelings about it. > > > > Okay, in that case, I am planning to push this patch [1] tomorrow > morning unless I see any other comments. I am also planning to > backpatch this through 10 where it got introduced, > This was introduced in 11, so pushed and backpatched through 11. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com