Thread: ON CONFLICT (and manual row locks) cause xmax of updated tuple to unnecessarily be set
From: Andres Freund
Hi,

Scenario is a very plain upsert:

CREATE TABLE upsert(key int primary key);
INSERT INTO upsert VALUES(1) ON CONFLICT (key) DO UPDATE SET key = excluded.key;
INSERT INTO upsert VALUES(1) ON CONFLICT (key) DO UPDATE SET key = excluded.key;
INSERT 0 1
INSERT 0 1

postgres[8755][1]=# SELECT page_items.*
    FROM generate_series(0, pg_relation_size('upsert'::regclass::text) / 8192 - 1) blkno,
         get_raw_page('upsert'::regclass::text, blkno::int4) AS raw_page,
         heap_page_items(raw_page) AS page_items;
┌────┬────────┬──────────┬────────┬──────────┬──────────┬──────────┬────────┬─────────────┬────────────┬────────┬────────┬────────┬────────────┐
│ lp │ lp_off │ lp_flags │ lp_len │  t_xmin  │  t_xmax  │ t_field3 │ t_ctid │ t_infomask2 │ t_infomask │ t_hoff │ t_bits │ t_oid  │   t_data   │
├────┼────────┼──────────┼────────┼──────────┼──────────┼──────────┼────────┼─────────────┼────────────┼────────┼────────┼────────┼────────────┤
│  1 │   8160 │        1 │     28 │ 19742591 │ 19742592 │        0 │ (0,2)  │       24577 │        256 │     24 │ (null) │ (null) │ \x01000000 │
│  2 │   8128 │        1 │     28 │ 19742592 │ 19742592 │        0 │ (0,2)  │       32769 │       8336 │     24 │ (null) │ (null) │ \x01000000 │
└────┴────────┴──────────┴────────┴──────────┴──────────┴──────────┴────────┴─────────────┴────────────┴────────┴────────┴────────┴────────────┘
(2 rows)

As you can see, the same xmax is set for both row versions, with the new infomask being HEAP_XMAX_KEYSHR_LOCK | HEAP_XMAX_LOCK_ONLY | HEAP_UPDATED.

The reason that happens is that ExecOnConflictUpdate() needs to lock the row to be able to reliably compute a new tuple version based on that row. heap_update() then decides it needs to carry that xmax forward, as it's a valid lock:

    /*
     * If the tuple we're updating is locked, we need to preserve the locking
     * info in the old tuple's Xmax.  Prepare a new Xmax value for this.
     */
    compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
                              oldtup.t_data->t_infomask,
                              oldtup.t_data->t_infomask2,
                              xid, *lockmode, true,
                              &xmax_old_tuple, &infomask_old_tuple,
                              &infomask2_old_tuple);

    /*
     * And also prepare an Xmax value for the new copy of the tuple.  If there
     * was no xmax previously, or there was one but all lockers are now gone,
     * then use InvalidXid; otherwise, get the xmax from the old tuple.  (In
     * rare cases that might also be InvalidXid and yet not have the
     * HEAP_XMAX_INVALID bit set; that's fine.)
     */
    if ((oldtup.t_data->t_infomask & HEAP_XMAX_INVALID) ||
        HEAP_LOCKED_UPGRADED(oldtup.t_data->t_infomask) ||
        (checked_lockers && !locker_remains))
        xmax_new_tuple = InvalidTransactionId;
    else
        xmax_new_tuple = HeapTupleHeaderGetRawXmax(oldtup.t_data);

but we really don't need to do any of that in this case - the only locker is the current backend, after all.

I think this isn't great, because it'll later cause unnecessary hint bit writes (although ones somewhat likely to be combined with setting XMIN_COMMITTED), and even extra work for freezing.

Based on a quick look this wasn't the case before the finer-grained tuple locking - which makes sense: there were no cases where locks would need to be carried forward.
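For completeness, a minimal sketch of the manual-row-lock variant mentioned in the subject (not part of the test above, but it should go through the same lock-carrying path in heap_update()):

    BEGIN;
    -- take a lock-only, self-owned xmax on the row
    SELECT * FROM upsert WHERE key = 1 FOR KEY SHARE;
    -- updating in the same transaction should carry that (redundant)
    -- lock forward to the new tuple version
    UPDATE upsert SET key = key WHERE key = 1;
    COMMIT;

Re-running the heap_page_items() query above should then again show the new row version with a lock-only xmax set.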
It's worthwhile to note that this *nearly* already works, because of the following codepath in compute_new_xmax_infomask():

    /*
     * If the lock to be acquired is for the same TransactionId as the
     * existing lock, there's an optimization possible: consider only the
     * strongest of both locks as the only one present, and restart.
     */
    if (xmax == add_to_xmax)
    {
        /*
         * Note that it's not possible for the original tuple to be updated:
         * we wouldn't be here because the tuple would have been invisible and
         * we wouldn't try to update it.  As a subtlety, this code can also
         * run when traversing an update chain to lock future versions of a
         * tuple.  But we wouldn't be here either, because the add_to_xmax
         * would be different from the original updater.
         */
        Assert(HEAP_XMAX_IS_LOCKED_ONLY(old_infomask));

        /* acquire the strongest of both */
        if (mode < old_mode)
            mode = old_mode;
        /* mustn't touch is_update */

        old_infomask |= HEAP_XMAX_INVALID;
        goto l5;
    }

This sets HEAP_XMAX_INVALID in old_infomask, which would then trigger the code above into not carrying xmax forward to the new tuple - but compute_new_xmax_infomask() operates on a copy of old_infomask.

Note that this contradicts a comment in heap_update() itself:

    /*
     * If we found a valid Xmax for the new tuple, then the infomask bits
     * to use on the new tuple depend on what was there on the old one.
     * Note that since we're doing an update, the only possibility is that
     * the lockers had FOR KEY SHARE lock.
     */

which seems to indicate that this behaviour wasn't foreseen.

I find the separation of concerns (and variable naming) between the computations happening in heap_update() itself and in compute_new_xmax_infomask() fairly confusing and redundant. I mean, compute_new_xmax_infomask() expands multis, builds a new one with the updater added, then re-expands that via GetMultiXactIdHintBits() to compute infomask bits for the old tuple, then returns. Just for heap_update() to re-re-expand the multi to compute the infomask bits for the new tuple, again with GetMultiXactIdHintBits()? I know that there's a cache, but still, that's some seriously repetitive work.

Oh, and the things GetMultiXactIdHintBits() returns imo aren't really well described as hint bits.

Greetings,

Andres Freund
Re: ON CONFLICT (and manual row locks) cause xmax of updated tuple to unnecessarily be set
From: Peter Geoghegan
On Wed, Jul 24, 2019 at 4:24 PM Andres Freund <andres@anarazel.de> wrote:
> As you can see, the same xmax is set for both row versions, with the new
> infomask being HEAP_XMAX_KEYSHR_LOCK | HEAP_XMAX_LOCK_ONLY | HEAP_UPDATED.

Meta remark about your test case: I am a big fan of microbenchmarks like this, which execute simple DML queries using a single connection and then check whether the on-disk state looks as good as expected, for some value of "good". I had a lot of success with this approach while developing the v12 work on nbtree, where I went to the trouble of automating everything. The same test suite also helped with the nbtree compression/deduplication patch just today.

I like to call these tests "wind tunnel tests". It's far from obvious that you can take a totally synthetic, serial test and use it to measure something that is important to real workloads, but it seems to work well when there is a narrow, specific thing that you're interested in. This is especially true when there is a real risk of regressing performance in some way.

> but we really don't need to do any of that in this case - the only
> locker is the current backend, after all.
>
> I think this isn't great, because it'll later cause unnecessary
> hint bit writes (although ones somewhat likely to be combined with
> setting XMIN_COMMITTED), and even extra work for freezing.
>
> Based on a quick look this wasn't the case before the finer-grained
> tuple locking - which makes sense: there were no cases where locks
> would need to be carried forward.

I agree that this is unfortunate. Are you planning on working on it?

--
Peter Geoghegan
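A minimal sketch of such a wind-tunnel test in plain SQL, reusing the pageinspect approach from the first mail (the table name and the expectation stated in the comments are illustrative assumptions, not from the thread):

    -- the synthetic, serial workload
    CREATE TABLE upsert_wt(key int PRIMARY KEY);
    INSERT INTO upsert_wt VALUES (1) ON CONFLICT (key) DO UPDATE SET key = excluded.key;
    INSERT INTO upsert_wt VALUES (1) ON CONFLICT (key) DO UPDATE SET key = excluded.key;

    -- the assertion on the on-disk state: once the behaviour discussed
    -- above is fixed, the live tuple should not carry a lock-only xmax
    SELECT lp, t_xmin, t_xmax, t_infomask
    FROM heap_page_items(get_raw_page('upsert_wt', 0));

Automating a handful of such checks makes it cheap to notice when a patch regresses the on-disk state.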
Re: ON CONFLICT (and manual row locks) cause xmax of updated tuple to unnecessarily be set
From: Andres Freund
Hi,

On 2019-07-24 17:14:39 -0700, Peter Geoghegan wrote:
> On Wed, Jul 24, 2019 at 4:24 PM Andres Freund <andres@anarazel.de> wrote:
> > but we really don't need to do any of that in this case - the only
> > locker is the current backend, after all.
> >
> > I think this isn't great, because it'll later cause unnecessary
> > hint bit writes (although ones somewhat likely to be combined with
> > setting XMIN_COMMITTED), and even extra work for freezing.
> >
> > Based on a quick look this wasn't the case before the finer-grained
> > tuple locking - which makes sense: there were no cases where locks
> > would need to be carried forward.
>
> I agree that this is unfortunate. Are you planning on working on it?

Not at the moment, no. Are you planning / hoping to take a stab at it?

Greetings,

Andres Freund
Re: ON CONFLICT (and manual row locks) cause xmax of updated tuple to unnecessarily be set
From: Peter Geoghegan
On Thu, Jul 25, 2019 at 3:10 PM Andres Freund <andres@anarazel.de> wrote:
> > I agree that this is unfortunate. Are you planning on working on it?
>
> Not at the moment, no. Are you planning / hoping to take a stab at it?

The current behavior ought to be fixed, and it seems like it falls to me to do that. OTOH, anything that's MultiXact-adjacent makes my eyes water; I don't consider myself to be particularly well qualified. I'm sure that I could quickly find a way of making the behavior you have pointed out match what is expected without causing any regression tests to fail - but that's the easy part.

--
Peter Geoghegan