Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0 - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
Date:
Msg-id: CAM3SWZRB6s6ydBfHQafNhnq2nxnBh2a5fCvLvUzmFdSz9GYjNg@mail.gmail.com
In response to: INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0 (Peter Geoghegan <pg@heroku.com>)
Responses: Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
           Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
List: pgsql-hackers
On Tue, Feb 2, 2015 at 01:06 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> A first (not actually that quick :() look through the patches to see
> what actually happened in the last months. I didn't keep up with the
> thread.

So, let me get this out of the way: this is the first in-depth technical
review that this work has had in a long time. Thank you for your help here.

> Generally the split into the individual commits doesn't seem to make
> much sense to me. The commits individually (except the first) aren't
> indivdiually commitable and aren't even meaningful. Splitting off the
> internal docs, tests and such actually just seems to make reviewing
> harder because you miss context. Splitting it so that individual piece
> are committable and reviewable makes sense, but... I have no problem
> doing the user docs later. If you split of RLS support, you need to
> throw an error before it's implemented.

I mostly agree. Basically, I did not intend for all of the patches to be
individually committed. The mechanism by which EXCLUDED.* expressions are
added is somewhat novel, and deserves to be independently *considered*. I'm
trying to show how the parts fit together, more so than breaking things down
into smaller commits (as you picked up on, 0001 is the exception - that is
genuinely intended to be committed early). Also, those commit messages give
me the opportunity to put those parts in their appropriate context vis-a-vis
our discussions. They refer to the Wiki, for example, or to the reasons why
pg_stat_statements shouldn't care about ExcludedExpr. Obviously the final
commit messages won't look that way.

> 0001:
> * References INSERT with ON CONFLICT UPDATE, can thus not be committed
>   independently. I don't think those references really are needed.
> * I'm not a fan of the increased code duplication in
>   ExecCheckRTEPerms(). Can you look into cleaning that up?
> * Also the comments inside the ACL_INSERT case still reference UPDATE.
>
> Other than that I think we can just go ahead and commit this ahead of
> time. Mentioning ON CONFLICT UPDATE (OCU henceforth) in the commit
> message only.

Cool. Attached revision makes those changes.

> 0007:
> * "AMs" alone isn't particularly unique.
> * Without the context of the discussion "unprincipled deadlocks" aren't
>   well defined.
> * Too many "" words.
> * Waiting "too long" isn't defined. Neither is why that'd imply
>   unprincipled deadlocks. Somewhat understandable with the context of
>   the discussion, but surely not a couple years down the road.
> * As is I don't find the README entry super helpful. It should state
>   what the reason for doing this is cleary, start at the higher level,
>   and then move to the details.
> * Misses details about the speculative heavyweight locking of tuples.

Fair points. I'll work through that feedback. Actually, I think we should
memorialize in some more general way that "unprincipled deadlocks" are to be
avoided, since they are after all a general problem that we've seen
elsewhere. I'm not sure how best to go about that, though.

> 0002:
> * Tentatively I'd say that killspeculative should be done via a separate
>   function instead of heap_delete()

Really? I guess if that were to happen, it would entail refactoring
heap_delete() to call a static function, which would also be called by a new
kill_speculative() function. Otherwise, you'd have far too much duplication.
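To make the shape of that refactoring concrete, here is a minimal sketch.
The names and the abridged parameter lists are hypothetical - this is not
code from the patch, just the division of labor I mean:

"""
/*
 * Hypothetical sketch: share the guts of deletion between the regular
 * entry point and a dedicated one for killing a speculatively inserted
 * tuple, instead of overloading heap_delete() with a flag at call sites.
 */
static HTSU_Result
heap_delete_internal(Relation relation, ItemPointer tid, CommandId cid,
                     bool killspeculative)
{
    /*
     * The bulk of today's heap_delete() would live here: buffer locking,
     * visibility checks, WAL logging.  Only the few places that must treat
     * a speculative tuple specially would look at killspeculative.
     */
    return HeapTupleMayBeUpdated;
}

/* The existing entry point keeps its current behavior... */
HTSU_Result
heap_delete(Relation relation, ItemPointer tid, CommandId cid)
{
    return heap_delete_internal(relation, tid, cid, false);
}

/* ...and "super deletion" gets a narrow entry point of its own. */
void
heap_kill_speculative(Relation relation, ItemPointer tid)
{
    (void) heap_delete_internal(relation, tid,
                                GetCurrentCommandId(true), true);
}
"""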
> * I think we should, as you ponder in a comment, do the OCU specific
>   stuff lazily and/or in a separate function from BuildIndexInfo(). That
>   function is already quite visible in profiles, and the additional work
>   isn't entirely trivial.

Okay.

> * I doubt logical decoding works with the patch as it stands.

I thought so. Perhaps you could suggest a better use of the available
XLOG_HEAP_* bits. I knew I needed to consider that more carefully (hence the
XXX comment), but didn't get around to it.

> * The added ereport (i.e. user facing) error message in
>   ExecInsertIndexTuples won't help a user at all.

So, this:

>> /* Skip this index-update if the predicate isn't satisfied */
>> if (!ExecQual(predicate, econtext, false))
>> + {
>> +     if (arbiterIdx == indexRelation->rd_index->indexrelid)
>> +         ereport(ERROR,
>> +                 (errcode(ERRCODE_TRIGGERED_ACTION_EXCEPTION),
>> +                  errmsg("partial arbiter unique index has predicate that does not cover tuple proposed for insertion"),
>> +                  errdetail("ON CONFLICT inference clause implies that the tuple proposed for insertion actually be covered by partial predicate for index \"%s\".",
>> +                            RelationGetRelationName(indexRelation)),
>> +                  errhint("ON CONFLICT inference clause must infer a unique index that covers the final tuple, after BEFORE ROW INSERT triggers fire."),
>> +                  errtableconstraint(heapRelation,
>> +                                     RelationGetRelationName(indexRelation))));
>>     continue;
>> + }

Yeah, that isn't a great error message. This happens here because you are
using a partial unique index (and so you must have had an inference
specification with a "WHERE" to get here). However, what you actually went
to insert would not be covered by this partial unique index, and so could
never take the alternative path - which is likely not what was intended.
Maybe it would be better to silently always let the INSERT succeed as an
INSERT. *That* actually wasn't really discussed - this is all my idea.

> * Personally I don't care one iota for comments like "Get information
>   from the result relation info structure.". Yes, one of these already
>   exists, but ...

Okay.

> * If a arbiter index is passed to ExecCheckIndexConstraints(), can't we
>   abort the loop after checking it? Also, do we really have to iterate
>   over indexes for that case? How about moving the loop contents to a
>   separate function and using that separately for the arbiter cases?

Well, the failure to do that only costs a few extra cycles, but sure. I'll
add a new reason to break at the end, when
check_exclusion_or_unique_constraint() is called in respect of a particular
(inferred) arbiter unique index.

> * Don't like the comment above check_exclusion_or_unique_constraint()'s
>   much. Makes too much of a special case of OCU

I guess I should just refer to speculative insertion.

> * ItemPointerIsValid

What about it?

> * ExecCheckHeapTupleVisible's comment "It is not acceptable to proceed "
>   sounds like you're talking with a child or so ;)

Fair point. I should say "It would not be consistent with the guarantees of
the higher isolation levels...".

> * ExecCheckHeapTupleVisible()'s errhint() sounds like an
>   argument/excuse (actually like a code comment). That's not going to
>   help a user at all.

Really? I thought it might be less than intuitive that the higher isolation
levels cannot decide to do nothing on the basis of something not in their
MVCC snapshot. But come to think of it, yeah, that errhint() isn't adding
much over the main error message.

> * I find the modified control flow in ExecInsert() pretty darn ugly. I
>   think this needs to be cleaned up. The speculative case should imo be
>   a separate function or something.

Hmm. I'm not quite sold on that. Basically, if we did that, we'd still have
a function that was more or less a strict superset of ExecInsert(). What
have we saved? What I do agree with is that ExecInsert() should be
refactored so that the common case (a vanilla insert) looks like the common
case, while the uncommon case (an upsert) is dealt with specially. There is
room for improvement. Is that a fair compromise?
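For what it's worth, the shape I have in mind is roughly this - purely an
illustrative sketch with invented helper names, not the patch's actual code:

"""
/*
 * One ExecInsert(), with the speculative work fenced off in a clearly
 * marked branch so that the vanilla INSERT path still reads straight
 * through.  SpecConflictExists(), ExecOnConflictAction() and PlainInsert()
 * are invented names standing in for the real steps.
 */
static TupleTableSlot *
ExecInsert(TupleTableSlot *slot, EState *estate, bool speculative)
{
    if (speculative)
    {
        /*
         * ON CONFLICT: look for a conflicting tuple first, and if one is
         * found, branch off to the UPDATE/IGNORE handling rather than
         * inserting.  Otherwise fall through and insert, under a
         * speculative insertion token.
         */
        if (SpecConflictExists(slot, estate))
            return ExecOnConflictAction(slot, estate);
    }

    /* the common case: a plain INSERT */
    PlainInsert(slot, estate);
    return slot;
}
"""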
> * /*
>   * This may occur when an instantaneously invisible tuple is blamed
>   * as a conflict because multiple rows are inserted with the same
>   * constrained values.
>   How can this happen? We don't insert multiple rows with the same
>   command id?

This is a cardinality violation [1]. It can definitely happen - just try the
examples you see on the Wiki. This is possible because I modified
heap_lock_tuple() to return HeapTupleInvisible (rather than just complain
directly when HeapTupleSatisfiesUpdate() returns "HeapTupleInvisible"). It's
also possible because we're using a DirtySnapshot at various points. This is
sort of like how ExecUpdate() handles a return value of
"HeapTupleSelfUpdated" from heap_update(). Not quite, though, because 1) I
prefer to throw an error (rather than silently not UPDATE that slot), and
2) we're not dealing with MVCC semantics, so the return values are different
in the two cases. The *nature* of the situation handled is similar between
UPSERTs (in ExecLockUpdatedTuple()) and vanilla UPDATEs (in ExecUpdate()),
though. Does that make sense?

> * ExecLockUpdatedTuple() has (too?) many comments, but little overview
>   of what/why it is doing what it does on a higher level.

Fair point. Seems like material for a better worked out executor README.

> * plan_speculative_use_index: "Use the planner to decide speculative
>   insertion arbiter index" - Huh? "rel is the target to undergo ON
>   CONFLICT UPDATE/IGNORE." - Which rel?

Sorry, that's an obsolete comment (the function signature changed). It
should refer to the target of the Query being planned.

> * formulations as "fundamental nexus" are hard to understand imo.

I'm trying to suggest that INSERT ... ON CONFLICT UPDATE is not quite two
separate top-level commands, and yet is also not a new, distinct type of
top-level command. This is mostly a high level design decision that
maximizes code reuse.

> * Perhaps it has previously been discussed but I'm not convinced by the
>   reasoning for not looking at opclasses in infer_unique_index(). This
>   seems like it'd prohibit ever having e.g. case insensitive opclasses -
>   something surely worthwile.

I don't think anyone gave that idea the thumbs-up. However, I really don't
see the problem. Sure, we could have case insensitive opclasses in the
future, and you might want to make a unique index using one. But then it
becomes a matter of whatever unique indexes are available. The limitation is
only that you cannot explicitly indicate that you want a certain opclass. It
comes down to whatever unique indexes happen to be available, since of
course taking the alternative path is arbitrated by a would-be unique
violation. It's a bit odd that we're leaving it up to the available indexes
to decide on semantics like that, but the problem is so narrow and the
solution so involved that I'd argue it's acceptable.

> * Doesn't infer_unique_index() have to look for indisvalid? This isn't
>   going to work well with a invalid (not to speak for a !ready) index.

It does (check IndexIsValid()). I think the mistake I made here was not
checking IndexIsReady(), since that is an additional concern beyond what the
similar get_relation_info() function must consider.
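Roughly, that means a check along these lines where the candidate indexes
are examined (a sketch only; the surrounding loop and variable names are
illustrative, not the patch's code):

"""
/*
 * When scanning the target table's indexes for a usable arbiter, skip
 * anything that is not both valid and ready -- e.g. an index left over
 * from a failed CREATE INDEX CONCURRENTLY cannot arbitrate conflicts,
 * so it must never be inferred.
 */
Form_pg_index index = idxRel->rd_index;

if (!IndexIsValid(index) || !IndexIsReady(index))
    continue;
"""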
> * Is ->relation in the UpdateStmt generated in transformInsertStmt ever
>   used for anything? If so, it'd possibly generate some possible
>   nastyness due to repeated name lookups. Looks like it'll be used in
>   transformUpdateStmt

What, you mean security issues, for example? I have a hard time seeing how
that could be exploited in practice, given that the one and only target RTE
is marked with the appropriate updatedCols originating from
transformUpdateStmt(). Still, it is a concern generally - better safe than
sorry. I was thinking of plugging it by ensuring that the Relations matched,
but that might not be good enough. Maybe it would be better to bite the
bullet and have transformUpdateStmt() use the same Relation directly, which
is something I hoped to avoid (better to have transformUpdateStmt() know as
little as possible about this, I'd say).

> * What's the reason for the !pstate->p_parent? Also why the parens?
>   pstate->p_is_speculative = (pstate->parentParseState &&
>                               (!pstate->p_parent_cte &&
>                                pstate->parentParseState->p_is_insert &&
>                                pstate->parentParseState->p_is_speculative));

You mean the "!pstate->p_parent_cte"? That's there because you can get
queries to segfault if this logic doesn't consider that a data-modifying CTE
can have an UPDATE that appears within a CTE referenced from an INSERT. :-)

> * Why did you need to make %nonassoc DISTINCT and ON nonassoc in the grammar?

I changed the associativity to prevent a shift/reduce conflict. Without
this, here are the details of State 700, which has the conflict (from
gram.output):

"""
State 700

 1465 opt_distinct: DISTINCT .
 1466             | DISTINCT . ON '(' expr_list ')'

    ON  shift, and go to state 1094

    ON        [reduce using rule 1465 (opt_distinct)]
    $default  reduce using rule 1465 (opt_distinct)
"""

> * The whole speculative insert logic isn't really well documented. Why,
>   for example, do we actually need the token? And why are there no
>   issues with overflow? And where is it documented that a 0 means
>   there's no token? ...

Fair enough. Presumably it's okay that overflow theoretically could occur,
because a race is all but impossible. The token represents a particular
attempt by some backend at inserting a tuple, which needs to be waited on
specifically only while it is that backend's active attempt (and the xact is
still running). Otherwise, you get unprincipled deadlocks. Even if by some
incredible set of circumstances the token wraps around, the worst case is an
unprincipled deadlock, which is hardly the end of the world given the
immense number of insertions required and the immense unlikelihood that
things would work out that way - it'd be basically impossible. I'll document
the "0" thing.
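To spell out the intent (again just a sketch, with invented names - this is
not the patch's actual data structure or wait logic):

"""
/*
 * A waiter should block on a specific speculative insertion attempt, not
 * on the inserting transaction as a whole.  Waiting on the xact would mean
 * waiting until it commits or aborts, which is where the unprincipled
 * deadlocks come from.
 */
typedef struct SpeculativeWait
{
    TransactionId xid;      /* the inserting backend's transaction */
    uint32        token;    /* which attempt; 0 means "no token assigned" */
} SpeculativeWait;

static void
WaitForConflictingInserter(SpeculativeWait *wait)
{
    if (wait->token != 0)
    {
        /*
         * Wait only until this particular attempt is either confirmed or
         * super-deleted; the inserter's transaction may well still be in
         * progress afterwards, and that's fine.
         */
        WaitForSpeculativeToken(wait->xid, wait->token);    /* invented */
    }
    else
    {
        /* No active speculative attempt: ordinary transaction-level wait. */
        XactLockTableWait(wait->xid, NULL, NULL, XLTW_InsertIndex);
    }
}
"""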
> * Isn't "SpecType" a awfully generic (and nondescriptive) name?

OK. That'll be changed.

> * /* XXX: Make sure that re-use of bits is safe here */ - no, not
>   unless you change existing checks.

I think that this is a restatement of your remarks on logical decoding. No?

> * /*
>   * Immediately VACUUM "super-deleted" tuples
>   */
>   if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple),
>                           InvalidTransactionId))
>       return HEAPTUPLE_DEAD;
>   Is that branch really needed? Shouldn't it just be happening as a
>   consequence of the already existing code? Same in SatisfiesMVCC. If
>   you actually needed that block, it'd need to be done in SatisfiesSelf
>   as well, no? You have a comment about a possible loop - but that seems
>   wrong to me, implying that HEAP_XMIN_COMMITTED was set invalidly.

Indeed, this code is kind of odd. While I think the omission within
SatisfiesSelf() may be problematic too, if you really want to know why this
code is needed, comment it out and run Jeff's stress-test. It will reliably
break. This code:

"""
if (HeapTupleHeaderXminInvalid(tuple))
    return HEAPTUPLE_DEAD;
"""

and this code:

"""
/*
 * Immediately VACUUM "super-deleted" tuples
 */
if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple),
                        InvalidTransactionId))
    return HEAPTUPLE_DEAD;
"""

are not equivalent (nor is the latter a superset of the former). Maybe they
should be, but they're not. What's more, a heap tuple header's raw xmin has
never before been able to change, so existing code has no reason to expect
it to be InvalidTransactionId. See my new comments within
EvalPlanQualFetch() remarking on how it's now possible for that to change
(before, the comment claimed that it wasn't possible).

> Ok, I can't focus at all any further at this point. But there's enough
> comments here that some even might make sense ;)

Most do. :-) Thanks.

[1] https://wiki.postgresql.org/wiki/UPSERT#.22Cardinality_violation.22_errors_in_detail

--
Peter Geoghegan