Thread: ON CONFLICT (and manual row locks) cause xmax of updated tuple tounnecessarily be set

ON CONFLICT (and manual row locks) cause xmax of updated tuple tounnecessarily be set

From

Andres Freund

Date:

24 July 2019, 23:24:39

Hi,


Scenario is a very plain upsert:

CREATE TABLE upsert(key int primary key);
INSERT INTO upsert VALUES(1) ON CONFLICT (key) DO UPDATE SET key = excluded.key;
INSERT INTO upsert VALUES(1) ON CONFLICT (key) DO UPDATE SET key = excluded.key;
INSERT 0 1
INSERT 0 1

postgres[8755][1]=# SELECT page_items.* FROM generate_series(0, pg_relation_size('upsert'::regclass::text) / 8192 - 1)
blkno,get_raw_page('upsert'::regclass::text, blkno::int4) AS raw_page, heap_page_items(raw_page) as page_items;
 

┌────┬────────┬──────────┬────────┬──────────┬──────────┬──────────┬────────┬─────────────┬────────────┬────────┬────────┬────────┬────────────┐
│ lp │ lp_off │ lp_flags │ lp_len │  t_xmin  │  t_xmax  │ t_field3 │ t_ctid │ t_infomask2 │ t_infomask │ t_hoff │
t_bits│ t_oid  │   t_data   │
 

├────┼────────┼──────────┼────────┼──────────┼──────────┼──────────┼────────┼─────────────┼────────────┼────────┼────────┼────────┼────────────┤
│  1 │   8160 │        1 │     28 │ 19742591 │ 19742592 │        0 │ (0,2)  │       24577 │        256 │     24 │
(null)│ (null) │ \x01000000 │
 
│  2 │   8128 │        1 │     28 │ 19742592 │ 19742592 │        0 │ (0,2)  │       32769 │       8336 │     24 │
(null)│ (null) │ \x01000000 │
 

└────┴────────┴──────────┴────────┴──────────┴──────────┴──────────┴────────┴─────────────┴────────────┴────────┴────────┴────────┴────────────┘
(2 rows)

as you can see the same xmax is set for both row version, with the new
infomask being  HEAP_XMAX_KEYSHR_LOCK | HEAP_XMAX_LOCK_ONLY | HEAP_UPDATED.


The reason that happens is that ExecOnConflictUpdate() needs to lock the
row to be able to reliably compute a new tuple version based on that
row. heap_update() then decides it needs to carry that xmax forward, as
it's a valid lock:


        /*
         * If the tuple we're updating is locked, we need to preserve the locking
         * info in the old tuple's Xmax.  Prepare a new Xmax value for this.
         */
        compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
                                                          oldtup.t_data->t_infomask,
                                                          oldtup.t_data->t_infomask2,
                                                          xid, *lockmode, true,
                                                          &xmax_old_tuple, &infomask_old_tuple,
                                                          &infomask2_old_tuple);

        /*
         * And also prepare an Xmax value for the new copy of the tuple.  If there
         * was no xmax previously, or there was one but all lockers are now gone,
         * then use InvalidXid; otherwise, get the xmax from the old tuple.  (In
         * rare cases that might also be InvalidXid and yet not have the
         * HEAP_XMAX_INVALID bit set; that's fine.)
         */
        if ((oldtup.t_data->t_infomask & HEAP_XMAX_INVALID) ||
                HEAP_LOCKED_UPGRADED(oldtup.t_data->t_infomask) ||
                (checked_lockers && !locker_remains))
                xmax_new_tuple = InvalidTransactionId;
        else
                xmax_new_tuple = HeapTupleHeaderGetRawXmax(oldtup.t_data);

but we really don't need to do any of that in this case - the only
locker is the current backend, after all.

I think this isn't great, because it'll later will cause unnecessary
hint bit writes (although ones somewhat likely combined with setting
XMIN_COMMITTED), and even extra work for freezing.

Based on a quick look this wasn't the case before the finer grained
tuple locking - which makes sense, there was no cases where locks would
need to be carried forward.


It's worthwhile to note that this *nearly* already works, because of the
following codepath in compute_new_xmax_infomask():

         * If the lock to be acquired is for the same TransactionId as the
         * existing lock, there's an optimization possible: consider only the
         * strongest of both locks as the only one present, and restart.
         */
        if (xmax == add_to_xmax)
        {
            /*
             * Note that it's not possible for the original tuple to be updated:
             * we wouldn't be here because the tuple would have been invisible and
             * we wouldn't try to update it.  As a subtlety, this code can also
             * run when traversing an update chain to lock future versions of a
             * tuple.  But we wouldn't be here either, because the add_to_xmax
             * would be different from the original updater.
             */
            Assert(HEAP_XMAX_IS_LOCKED_ONLY(old_infomask));

            /* acquire the strongest of both */
            if (mode < old_mode)
                mode = old_mode;
            /* mustn't touch is_update */

            old_infomask |= HEAP_XMAX_INVALID;
            goto l5;
        }

which set HEAP_XMAX_INVALID for old_infomask, which would then trigger
the code above not carrying forward xmax to the new tuple - but
compute_new_xmax_infomask() operates on a copy of old_infomask.


Note that this contradict comments in heap_update() itself:

         * If we found a valid Xmax for the new tuple, then the infomask bits
         * to use on the new tuple depend on what was there on the old one.
         * Note that since we're doing an update, the only possibility is that
         * the lockers had FOR KEY SHARE lock.
         */

which seems to indicate that this behaviour wasn't forseen.


I find the separation of concerns (and variable naming) between
computations happening in heap_update() itself, and
compute_new_xmax_infomask() fairly confusing and redundant.

I mean, compute_new_xmax_infomask() expands multis, builds a new one
with the updater added. Then re-expands it via GetMultiXactIdHintBits(),
to compute infomask bits for the old tuple. Then returns. Just for
heap_update() to re-re-expand the multi to compute the infomask bits for
the new tuple, again with GetMultiXactIdHintBits()?  I know that there's
a cache, but still. That's some seriously repetitive work.

Oh, and the things GetMultiXactIdHintBits() returns imo aren't really
well described with hint bits.

Greetings,

Andres Freund

Re: ON CONFLICT (and manual row locks) cause xmax of updated tuple tounnecessarily be set

From

Peter Geoghegan

Date:

25 July 2019, 00:14:39

On Wed, Jul 24, 2019 at 4:24 PM Andres Freund <andres@anarazel.de> wrote:
> as you can see the same xmax is set for both row version, with the new
> infomask being  HEAP_XMAX_KEYSHR_LOCK | HEAP_XMAX_LOCK_ONLY | HEAP_UPDATED.

Meta remark about your test case: I am a big fan of microbenchmarks
like this, which execute simple DML queries using a single connection,
and then consider if the on-disk state looks as good as expected, for
some value of "good". I had a lot of success with this approach while
developing the v12 work on nbtree, where I went to the trouble of
automating everything. The same test suite also helped with the nbtree
compression/deduplication patch just today.

I like to call these tests "wind tunnel tests". It's far from obvious
that you can take a totally synthetic, serial test, and use it to
measure something that is important to real workloads. It seems to
work well when there is a narrow, specific thing that you're
interested in. This is especially true when there is a real risk of
regressing performance in some way.

> but we really don't need to do any of that in this case - the only
> locker is the current backend, after all.
>
> I think this isn't great, because it'll later will cause unnecessary
> hint bit writes (although ones somewhat likely combined with setting
> XMIN_COMMITTED), and even extra work for freezing.
>
> Based on a quick look this wasn't the case before the finer grained
> tuple locking - which makes sense, there was no cases where locks would
> need to be carried forward.

I agree that this is unfortunate. Are you planning on working on it?

-- 
Peter Geoghegan

Re: ON CONFLICT (and manual row locks) cause xmax of updated tupleto unnecessarily be set

From

Andres Freund

Date:

25 July 2019, 22:10:00

Hi,

On 2019-07-24 17:14:39 -0700, Peter Geoghegan wrote:
> On Wed, Jul 24, 2019 at 4:24 PM Andres Freund <andres@anarazel.de> wrote:
> > but we really don't need to do any of that in this case - the only
> > locker is the current backend, after all.
> >
> > I think this isn't great, because it'll later will cause unnecessary
> > hint bit writes (although ones somewhat likely combined with setting
> > XMIN_COMMITTED), and even extra work for freezing.
> >
> > Based on a quick look this wasn't the case before the finer grained
> > tuple locking - which makes sense, there was no cases where locks would
> > need to be carried forward.
> 
> I agree that this is unfortunate. Are you planning on working on it?

Not at the moment, no. Are you planning / hoping to take a stab at it?

Greetings,

Andres Freund

Re: ON CONFLICT (and manual row locks) cause xmax of updated tuple tounnecessarily be set

From

Peter Geoghegan

Date:

26 July 2019, 00:02:21

On Thu, Jul 25, 2019 at 3:10 PM Andres Freund <andres@anarazel.de> wrote:
> > I agree that this is unfortunate. Are you planning on working on it?
>
> Not at the moment, no. Are you planning / hoping to take a stab at it?

The current behavior ought to be fixed, and it seems like it falls to
me to do that. OTOH, anything that's MultiXact adjacent makes my eyes
water. I don't consider myself to be particularly well qualified.

I'm sure that I could quickly find a way of making the behavior you
have pointed out match what is expected without causing any regression
tests to fail, but that's the easy part.

--
Peter Geoghegan