Re: Parallel scan with SubTransGetTopmostTransaction assert coredump - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Parallel scan with SubTransGetTopmostTransaction assert coredump |
Date | |
Msg-id | CA+TgmoaZLxM+=L_MZ+Mm-iLBUDHWcqEuLv7Ami_oqxU3W2kZug@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel scan with SubTransGetTopmostTransaction assert coredump (Greg Nancarrow <gregn4422@gmail.com>) |
Responses |
Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
|
List | pgsql-hackers |
On Wed, Aug 4, 2021 at 10:03 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > In setting up the snapshot for the execution state used in command > execution, GetTransactionSnapshot() is called and (possibly a copy of) > the returned snapshot is pushed as the ActiveSnapshot. I mean, there are LOTS of PushActiveSnapshot() calls in the code. A lot of those specifically say PushActiveSnapshot(GetTransactionSnapshot()) but others are pushing snapshots obtained from various other places. I really don't think it can possibly be correct in general to assume that the snapshot on top of the active snapshot stack is the same as the transaction snapshot. > So why (current Postgres code, no patches applied) in setting up for > parallel-worker execution (in InitializeParallelDSM) does the Postgres > code then acquire ANOTHER TransactionSnapshot (by calling > GetTransactionSnashot(), which could return CurrentSnapshot or a new > snapshot) and serialize that, as well as serializing what the > ActiveSnapshot points to, and then restore those in the workers as two > separate snapshots? Is it a mistake? Or if intentional and correct, > how so? Well, I already agreed that in cases where GetTransactionSnapshot() will acquire an altogether new snapshot, we shouldn't call it, but beyond that I don't see why you think this is wrong. I mean, suppose we only call GetTransactionSnapshot() at parallel worker when IsolationUsesXactSnapshot(). In that case, CurrentSnapshot is a durable, transaction-lifetime piece of backend-local state that can affect the results of future calls to GetTransactionSnapshot(), and therefore seems to need to be replicated to workers. Separately, regardless of IsolationUsesXactSnapshot(), the active snapshot is accessible via calls to GetActiveSnapshot() and therefore should also be replicated to workers. Now, I don't know of any necessary reason why those two things need to be the same, because again, there are PushActiveSnapshot() calls all over the place, and they're not all pushing the transaction snapshot. So therefore it makes sense to me that we need to replicate those two things separately - the active snapshot in the leader becomes the active snapshot in the workers and the transaction snapshot in the leader becomes the transaction snapshot in the worker. Now there is evidently something wrong with this line of reasoning because the code is buggy and my proposed fix doesn't work, but I don't know what is wrong with it. You seem to think that it's crazy that we try to replicate the active snapshot to the active snapshot and the transaction snapshot to the transaction snapshot, but that did, and still does, seem really sane to me. The only part that now seems clearly wrong to me is that !IsolationUsesXactSnapshot() case takes an *extra* snapshot, but since fixing that didn't fix the bug, there's evidently more to the problem than that. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: