Re: Changeset Extraction v7.0 (was logical changeset generation) - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Changeset Extraction v7.0 (was logical changeset generation) |
Date | |
Msg-id | 20140116145432.GD4498@alap3.lan Whole thread Raw |
In response to | Re: Changeset Extraction v7.0 (was logical changeset generation) (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Changeset Extraction v7.0 (was logical changeset generation)
|
List | pgsql-hackers |
Hi, On 2014-01-16 09:34:51 -0500, Robert Haas wrote: > >> - ReplicationSlotAcquire probably needs to ignore slots that are not active. > > Not sure what you mean? If the slot isn't in_use we'll skip it in the loop. > > active != in_use. > > I suppose your point is that the slot can't be in_use if it's not also > active. Yes. There's asserts to that end... > Maybe it would be better to get rid of active/in_use and have > three states: REPLSLOT_CONNECTED, REPLSLOT_NOT_CONNECTED, > REPLSLOT_FREE. Or something like that. Hm. Color me unenthusiastic. If you feel strongly I can change it, but otherwise not. > >> - If there's a coding rule that slot->database can't be changed while > >> the slot is active, then the check to make sure that the user isn't > >> trying to bind to a slot with a mis-matching database could be done > >> before the code described in the previous point, avoiding the need to > >> go back and release the resource. > > > > I don't think slot->database should be allowed to change at all... > > Well, it can if the slot is dropped and a new one created. Well. That obviously requires the lwlock to be acquired... > >> - I think the critical section in ReplicationSlotDrop is bogus. If > >> DeleteSlot() fails, we scarcely need to PANIC. The slot just isn't > >> gone. > > > > Well, if delete slot fails, we don't really know at which point it > > failed which means that the on-disk state might not correspond to the > > in-memory state. I don't see a point in adding code trying to handle > > that case correctly... > > Deleting the slot should be an atomic operation. There's some > critical point before which the slot will be picked up by recovery and > after which it won't. You either did that operation, or not, and can > adjust the in-memory state accordingly. I am not sure I understand that point. We can either update the in-memory bit before performing the on-disk operations or afterwards. Either way, there's a way to be inconsistent if the disk operation fails somewhere inbetween (it might fail but still have deleted the file/directory!). The normal way to handle that in other places is PANICing when we don't know so we recover from the on-disk state. I really don't see the problem here? Code doesn't get more robust by doing s/PANIC/ERROR/, rather the contrary. It takes extra smarts to only ERROR, often that's not warranted. > >> - A comment in KillSlot wonders whether locking is required. I say > >> yes. It's safe to take lwlocks and spinlocks during shmem exit, and > >> failing to do so seems like a recipe for subtle corner-case bugs. > > > > I agree that it's safe to use spinlocks, but lwlocks? What if we are > > erroring out while holding an lwlock? Code that runs inside a > > TransactionCommand is protected against that, but if not ProcKill() > > which invokes LWLockReleaseAll() runs pretty late in the teardown > > process... > > Hmm. I guess it'd be fine to decide that a connected slot can be > marked not-connected without the lock. I now acquire the spinlock since that has to work, or we have much worse problems... That guarantees that other backends see the value as well. > I think you'd want a rule that > a slot can't be freed except when it's not-connected; otherwise, you > might end up marking the slot not-connected after someone else had > already recycled it for an unrelated purpose (drop slot, create new > slot). Yea, that rule is there. Otherwise we'd get in great trouble. > >> - There seems to be no interface to acquire or release slots from > >> either SQL or the replication protocol, nor any way for a client of > >> this code to update its slot details. > > > > I don't think either ever needs to do that - START_TRANSACTION SLOT slot > > ...; and decoding_slot_*changes will acquire/release for them while > > active. What would the usecase be to allow them to be acquired from SQL? > > My point isn't so much about SQL as that with just this patch I don't > see any way for anyone to ever acquire a slot for anything, ever. So > I think there's a piece missing, or three. The slot is acquired by code using the slot. So when START_TRANSACTION SLOT ... (in contrast to a START_TRANSACTION without SLOT) is sent, walsender.c does an ReplicationSlotAcquire(cmd->slotname) in StartReplication() and releases it after it has finished. > > The slot details get updates by the respective replication code. For > > streaming rep, that should happen via reply and feedback > > messages. For changeset extraction it happens when > > LogicalConfirmReceivedLocation() is called; the walsender interface > > does that using reply messages, the SQL interface calls it when > > finished (unless you use the _peek_ functions). > > Right, but where is this code? I don't see this updating the reply > and feedback message processing code to touch slots. Did I miss that? It's in "wal_decoding: logical changeset extraction walsender interface" currently :(. Splitting the streaming replication part of that patch off isn't easy... Greetings, Andres Freund
pgsql-hackers by date: