On Wed, Dec 11, 2024 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> I confirmed that the proposed patch fixes these issues. I have one
> question about the patch:
>
> In the main loop in SnapBuildDistributeSnapshotAndInval(), we have the
> following code:
>
> /*
> * If we don't have a base snapshot yet, there are no changes in this
> * transaction which in turn implies we don't yet need a snapshot at
> * all. We'll add a snapshot when the first change gets queued.
> *
> * NB: This works correctly even for subtransactions because
> * ReorderBufferAssignChild() takes care to transfer the base snapshot
> * to the top-level transaction, and while iterating the changequeue
> * we'll get the change from the subtxn.
> */
> if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, txn->xid))
> continue;
>
> Is there any case where we need to distribute inval messages to
> transactions that don't have the base snapshot yet but eventually need
> the inval messages?
>
Good point. It is mentioned that for snapshots: "We'll add a snapshot
when the first change gets queued.". I think we achieve this via
builder->committed.xip array such that when we set a base snapshot for
a transaction, we use that array to form a snapshot. However, I don't
see any such consideration for invalidations. Now, we could either
always add invalidations to xacts that don't have base_snapshot yet or
have a mechanism similar committed.xid array. But it is better to
first reproduce the problem.
> Overall, with this idea, we distribute invalidation messages to all
> concurrent decoded transactions. It could introduce performance
> regressions by several causes. For example, we could end up
> invalidating RelationSyncCache entries in more cases. While this is
> addressed by your selectively cache invalidation patch, there is still
> 5% regression. We might need to accept a certain amount of regressions
> for making it correct but it would be better to figure out where these
> regressions come from. Other than that, I think the performance
> regression could happen due to the costs of distributing invalidation
> messages. You've already observed there is 1~3% performance regression
> in cases where we distribute a large amount of invalidation messages
> to one concurrently decoded transaction[1]. I guess that the
> selectively cache invalidation idea would not help this case. Also, I
> think we might want to test other cases like where we distribute a
> small amount of invalidation messages to many concurrently decoded
> transactions.
>
+1.
--
With Regards,
Amit Kapila.