RE: ReplicationSlotRelease() crashes when the instance is in the single user mode - Mailing list pgsql-hackers
From | Hayato Kuroda (Fujitsu) |
---|---|
Subject | RE: ReplicationSlotRelease() crashes when the instance is in the single user mode |
Date | |
Msg-id | OSCPR01MB14966B39B3ED21229EAEFA10FF531A@OSCPR01MB14966.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: ReplicationSlotRelease() crashes when the instance is in the single user mode (Paul A Jungwirth <pj@illuminatedcomputing.com>) |
Responses |
Re: ReplicationSlotRelease() crashes when the instance is in the single user mode
|
List | pgsql-hackers |
Dear Paul, Mutaamba, Here are updated patches. Based on the Robert's suggestion, I separated into two parts. 0001 fixed the original issue and 0002 prohibited the slot manipulation in single-user mode. I want to focus on 0001 first because on one would argue it. All comments from you were included in 0002. > The patch does not apply. The attached v5 fixes it with a small update > to the context for the slot.h hunk. Oh, I didn't realize because cfbot said OK. Anyway, new patch could be applied atop HEAD. > The commit message needs some explanation. Why are we doing this? What > does the patch do? What alternatives did we consider? Updated. How do you feel? > The patch has no docs. If we plan to forbid some functions in > single-user mode, we should document which ones (e.g. in func.sgml). A statement was added. > There are no tests. Do we have any tests at all for single-user mode? > The only one I see is in recovery/t/017_shm.pl. What if we had tests > that ran the regress suite in single-user mode? Basically instead of > psql we run postgres --single? Do we expect a lot of it would fail? Is > it small enough the test could maintain a diff that expects those > failures? I'm not saying we should do that as part of this patch, but > is there any interest in that? Since single-user mode is most often > used when things are broken, it's a harsh place to hit a bug. > > The patch lacks source comments. Something on > CheckSlotIsInSingleUserMode explaining why would be good. Comments were added in all caller functions. > In ReplicationSlotRelease, why only assert that `slot->active_pid != > 0`? Why not assert that it's MyProcPid (even outside single-user > mode)? Can one backend really release a slot while another is using > it? Can you drop it? Maybe you can copy it? You meant we should assert `slot->active_pid == MyProcPid`, right? Naively considered it can be fix, but it is out-of-scope of the thread. It is existing code, should be discussed verified in another place. > Are we calling CheckSlotIsInSingleUserMode from everywhere that needs > it? We tried to check all the functions that call > ReplicationSlotCreate, ReplicationSlotRelease, and > update_synced_slots_inactive_since (since they all have these > asserts). To call out a few: > > The pg_replication_origin_* functions don't call the Assert-ing functions. You asked that whether we should call CheckSlotIsInSingleUserMode in pg_replication_origin_* APIs, right? It depends on the policy. If we want to prohibit all replication-related command, it should be. It is still under discussion, so I did not touch. > binary_upgrade_logical_slot_has_caught_up - Not available from the command > line. I found that we can in both single-user and binary-upgrade mode. At that time the function could be called: ``` $ postgres --single -D data/ postgres -b PostgreSQL stand-alone backend 19devel backend> SELECT binary_upgrade_logical_slot_has_caught_up('test'); 1: binary_upgrade_logical_slot_has_caught_up (typeid = 16, len = 1, typmod = -1, byval = t) ---- LOG: starting logical decoding for slot "test" DETAIL: Streaming transactions committing after 0/0182C640, reading WAL from 0/0182C608. STATEMENT: SELECT binary_upgrade_logical_slot_has_caught_up('test'); LOG: logical decoding found consistent point at 0/0182C608 DETAIL: There are no running transactions. STATEMENT: SELECT binary_upgrade_logical_slot_has_caught_up('test'); 1: binary_upgrade_logical_slot_has_caught_up = "f" (typeid = 16, len = 1, typmod = -1, byval = t) ---- ``` I don't think this is a realistic case, but the check was added. > WalSndErrorCleanup - Probably not used in single-user mode? IIUC we won't reach in the single-user mode. Hence it was not called intentionally. > We also see that PostgresMain calls ReplicationSlotRelease from its > error handler. Presumably single-user mode would be executing > PostgresSingleUserMain instead, ISTM, PostgresMain() would be called from the PostgresSingleUserMain(). > but still perhaps we should call > CheckSlotIsInSingleUserMode here just as a sanity-check? I feel this code is to release the acquired slot when ERROR was raised. Since we have already covered the entrance, it is not needed. [1]: https://www.postgresql.org/message-id/CA%2BTgmobspWhoMysNHL9b3C9xi%3DOpHdBYhtAgDH4N_A2foyjN-w%40mail.gmail.com Best regards, Hayato Kuroda FUJITSU LIMITED
Attachment
pgsql-hackers by date: