Thread: Persist injection points across server restarts
Hi all, A bug related to data visibility on standbys with a race condition mixing 2PC transactions and synchronous_standby_names with the checkpointer has been fixed a couple of days ago: https://www.postgresql.org/message-id/163fcbec-900b-4b07-beaa-d2ead8634bec@postgrespro.ru One issue that we had while discussing this thread is that there was no easy way to have a regression test because the problem requires a wait in the checkpointer at a very early startup phase where the shared memory status data related to s_s_names has *not* been initialized yet. I have been working on the problem and found out one nice way to address this limitation, introducing in the module injection_points a new function that flushes to a file the set of injection points currently attached to a cluster, reloading the data from the file to shmem early at startup when initializing the shmem state data through shared_preload_libraries. With that in place, it is then possible to make the checkpointer wait when it starts at a very early stage, giving a way to reproduce the original failure reported on the other thread: - A wait injection point is attached. - A flush is used to write the points' data to disk. - Node restarts, loading back their state. - The wait triggers in the checkpointer. So, please find attached a patch set for all that: - 0001 is a patch I have stolen from a different thread (see [1]), introducing InjectionPointList() that retrieves a list of the injection points attached. - 0002 extends injection_points with a new flush function, that can be used in TAP tests to persist some points across restarts. One sticky point is that I did not want to add any of this information in the core backend injection point APIs, nor to any of the backend structures because that's not necessary, and what's here is enough for some TAP tests. - 0003 adds a new regression test providing some coverage for 2e57790836c6. Reverting 2e57790836c6 causes the test to fail. This shows how to use this new facility. This is v19 work, so I am adding that to the next commit fest. Thanks, [1]: https://www.postgresql.org/message-id/Z_xYkA21KyLEHvWR@paquier.xyz -- Michael
Attachment
> On 30 Apr 2025, at 10:35, Michael Paquier <michael@paquier.xyz> wrote: > > - A flush is used to write the points' data to disk. Interesting functionality. Did you consider custom resource manager to WAL-log injection points? This way we do not need to flush them explicitly, theyalways will be persistent. Though they will appear on standby, which is, probably, not expected functionality... Best regards, Andrey Borodin.
On Wed, Apr 30, 2025 at 10:52:51AM +0500, Andrey Borodin wrote: > Did you consider custom resource manager to WAL-log injection > points? This way we do not need to flush them explicitly, they > always will be persistent. WAL-logging would not work in the case I've faced, because we need a point much earlier than the startup process beginning any redo activity, so I don't think that this would be useful. -- Michael
Attachment
On Mon, May 26, 2025 at 01:17:46PM +0530, Rahila Syed wrote: > It appears that Github CI is reporting failures with > injection_points/002_data_persist > failing across all OSes. I am not sure to see what you are referring to here, based on the following reports: https://commitfest.postgresql.org/patch/5731/ https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5731 http://cfbot.cputube.org/highlights/all.html The last failure reported that I know of was due to the addition of the runtime arguments to the macro INJECTION_POINT(). > I wonder if it would be better to include a check for the return value of > `durable_rename` to > manage potential failure scenarios. Or we can be smarter and make this call an ERROR. > 2. + > + file = AllocateFile(INJ_DUMP_FILE ".tmp", PG_BINARY_W); > > Could we perhaps add a brief comment clarifying the rationale behind the > use of a > temporary file in this context? Sure. This is about avoiding the load of incomplete files if there's a crash in the middle of a flush call, for example. > It seems that `buf` should be null-terminated before being passed to > `pstrdup(buf)`. > Otherwise, `strlen` might not accurately determine the length. This seems > to be > consistent with the handling of `fread` calls in `snapmgr.c`. Hmm, good point. That would be safer, even if the write part should already push the null termination. -- Michael
Attachment
On Thu, May 29, 2025 at 01:17:21PM +0530, Rahila Syed wrote: > After applying the v3-patches, I see failure like these: > macOS - Sonoma - Meson - Cirrus CI > <https://cirrus-ci.com/task/4589192849653760> > Windows - Server 2019, VS 2019 - Meson & ninja - Cirrus CI > <https://cirrus-ci.com/task/5715092756496384> > Linux - Debian Bookworm - Meson - Cirrus CI > <https://cirrus-ci.com/task/6629886430806016> These tasks complain about the following: [07:22:58.931] ../src/include/utils/injection_point.h:62:28: note: 'InjectionPointList' declared here [07:22:58.931] 62 | extern InjectionPointData *InjectionPointList(uint32 *num_points); The latest version of the patch (v3) posted on this thread includes the following: ./v3-0001-Add-InjectionPointList-to-retrieve-list-of-inject.patch: +extern List *InjectionPointList(void); ./v3-0001-Add-InjectionPointList-to-retrieve-list-of-inject.patch: +InjectionPointList(void) ./v3-0002-injection_points-Add-function-to-flush-injection-.patch: + inj_points = InjectionPointList(); How is it possible for what you have applied to include changes that are not part of the patch set posted? Manual mistake on your side perhaps where versions of the patches have been mixed? The CF bot is green: https://commitfest.postgresql.org/patch/5731/ > Should we also consider unlinking the target file INJ_DUMP_FILE in case of > an error during renaming? I don't think that this is worth caring for the sake of tests. If an error happens before the atomic rename is done to the final file, leaving around a temporary file, the next flush would just recreate a new file from scratch with the new contents. -- Michael
Attachment
> On 27 May 2025, at 05:51, Michael Paquier <michael@paquier.xyz> wrote: > > <v3-0001-Add-InjectionPointList-to-retrieve-list-of-inject.patch> > <v3-0002-injection_points-Add-function-to-flush-injection-.patch> > <v3-0003-Add-regression-test-for-2PC-visibility-check-on-s.patch> I've made another round of looking into these patches. I think it's fine that private data is not included. But the feature looks orthogonal, and I do not see specific reasonswhy it should be omitted when IPs are persisted. Another idea of improvement is using distinguishable errors in injection_shmem_startup(). Like differentiating between readerror and wrong magic number. But there's no big value in these improvements, so the patch is fine as is too. Best regards, Andrey Borodin.
On Wed, 2025-04-30 at 14:35 +0900, Michael Paquier wrote: > - 0001 is a patch I have stolen from a different thread (see [1]), > introducing InjectionPointList() that retrieves a list of the > injection points attached. Unless that other thread is blocked for some reason, I'd suggest just going ahead with the SQL callable function first. That seems independently useful. > - 0002 extends injection_points with a new flush function, that can > be > used in TAP tests to persist some points across restarts. One sticky > point is that I did not want to add any of this information in the > core backend injection point APIs, nor to any of the backend > structures because that's not necessary, and what's here is enough > for > some TAP tests. This seems like a fair amount of complexity for a single use case. Arguably, complexity around testing infrastructure is a lot less of a problem than other kinds of complexity, so it might be OK. But it would be nice if there were a couple cases that would benefit rather than one. Regards, Jeff Davis
Hi, On 2025-06-03 13:13:06 +0900, Michael Paquier wrote: > On Mon, Jun 02, 2025 at 12:42:35PM -0700, Jeff Davis wrote: > > This seems like a fair amount of complexity for a single use case. > > Arguably, complexity around testing infrastructure is a lot less of a > > problem than other kinds of complexity, so it might be OK. But it would > > be nice if there were a couple cases that would benefit rather than > > one. > > In all the approaches I've considered, this one was the least worst of > all based on the point that all the complexity is hidden in the test > module; there is no need to touch the backend code at all as long as > there is a way to retrieve the list of points that would be dumped to > disk. > > Another set of test cases I had in mind was waits during recovery > before consistency is reached. There is no way to add a point without > connecting to the database, and we've had plenty of fixes involving > the startup process and a different process, mostly the checkpointer. > That's an annoying limitation. I'm somewhat doubtful this is is the right direction. Tests that require injection points before consistency also can't wait for injection points using the SQL interface or such, so most of the stuff has to be written in C anyway. And if so, you also can attach to injection points in the relevant shared_preload_libraries entry. Greetings, Andres Freund