Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name |
Date | |
Msg-id | CA+TgmoZVmJX1+QTWw2tSnPHrnkwKZxC3ZsRynFB-Fpzm1Oxuhw@mail.gmail.com Whole thread Raw |
In response to | Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name |
List | pgsql-hackers |
On Thu, Jan 10, 2019 at 7:42 PM Andres Freund <andres@anarazel.de> wrote: > I still think this whole direction of accessing the GUC in walreceiver > is a bad idea and shouldn't be pursued further. There's definite > potential for startup process and WAL receiver having different states > of GUCs, the code doesn't get meaningfully simpler, the GUC value checks > in walreceiver make for horrible reporting up the chain. I'm trying to assess the validity of this complaint. It seems to me that it might advance the discussion to be more clear about what the problem is here. When pg_ctl reload is executed (or the equivalent), different backends reload the file at different times. Currently, because the values are explicitly passed down from the startup process to the walreceiver, it's guaranteed that they will both use the same value. With this change, one might see an older value and the other the current value. So there's room to be nervous about code like this: if (PrimaryConnInfo && strcmp(PrimaryConnInfo, "") != 0) { ... assorted other code ... RequestXLogStreaming(tli, ptr, PrimaryConnInfo, PrimarySlotName); receivedUpto = 0; } With the patch, the PrimaryConnInfo and PrimarySlotName arguments are removed from RequestXLogStreaming. That means that the new walreceiver could come to a different conclusion about the values of those arguments than the startup process. In particular, it could end up thinking that primary_conninfo is empty when, if the startup process had thought that, the walreceiver would never have been launched in the first place. But it's not obvious that you've added any logic in WALReceiverMain or elsewhere to compensate for this possibility -- what would happen in that case? Would we crash? Connect to the wrong server? I might be wrong here, but I'm inclined to think that this scenario hasn't really been contemplated carefully by the patch authors. There are no added TAP tests for the scenario where the values differ between the two processes, no code comments which explain why it's OK if that happens, really no mention of it in the patch at all. And on that basis I'm inclined to think that Andres is really quite correct to be worried about this. The problem he's talking about here is very low-probability because the race condition is narrow, but it's real, and it surely needs to be handled somehow. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: