Re: Single transaction in the tablesync worker? - Mailing list pgsql-hackers

From Petr Jelinek
Subject Re: Single transaction in the tablesync worker?
Date
Msg-id e2d6c854-61ac-2720-37bd-d463a8fc1210@enterprisedb.com
In response to Re: Single transaction in the tablesync worker?  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Single transaction in the tablesync worker?
List pgsql-hackers
Hi,

We had a high-level discussion about these patches with Amit 
off-list, so I decided to also take a look at the actual code.

My main concern originally was the potential for left-over slots on the 
publisher, but I think the state now is relatively okay, with a couple of 
corner cases that are documented and don't seem much worse than the main 
slot.

I wonder if we should mention the max_slot_wal_keep_size GUC in the 
table sync docs, though.
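For illustration, capping the WAL retained by slots is a one-line setting 
on the publisher; the value below is just an example, not a recommendation:

```
# postgresql.conf on the publisher: limit how much WAL any replication
# slot can retain, so a leaked tablesync slot cannot fill the disk
# indefinitely (default is -1, i.e. no limit).
max_slot_wal_keep_size = 10GB
```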

Another thing that might need documentation is that the visibility 
of changes done by table sync is no longer isolated, in that the table 
contents will show intermediate progress to other backends, rather than 
switching from nothing to a state consistent with the rest of the 
replication.


Some minor comments about code:

> +        else if (res->status == WALRCV_ERROR && missing_ok)
> +        {
> +            /* WARNING. Error, but missing_ok = true. */
> +            ereport(WARNING,

I wonder if we need to add an error code to the WalRcvExecResult and check 
for the appropriate ones here, because this can return an error due to a 
timeout, for example, not because the slot is missing. Not sure if it 
matters for the current callers though (but then maybe don't call the 
param missing_ok?).


> +ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
> +{
> +    if (syncslotname)
> +        sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
> +    else
> +        syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
> +
> +    return syncslotname;
> +}

Given that we are now explicitly dropping slots, what happens here if we 
have 2 different downstreams that happen to get the same suboid and relid? 
Will one of them drop the slot of the other one? Previously, with the 
cleanup being left to the temp slot, we'd at most have gotten an error 
when creating it, but with the new logic in LogicalRepSyncTableStart it 
feels like we could get into a situation where 2 downstreams are fighting 
over the slot, no?


-- 
Petr


