Re: Crash during backend start when low on memory - Mailing list pgsql-bugs
From | Alvaro Herrera |
---|---|
Subject | Re: Crash during backend start when low on memory |
Date | |
Msg-id | 20230113115352.sbuv5egpjyoxn52i@alvherre.pgsql Whole thread Raw |
In response to | Re: Crash during backend start when low on memory (Mats Kindahl <mats@timescale.com>) |
Responses |
Re: Crash during backend start when low on memory
Re: Crash during backend start when low on memory Re: Crash during backend start when low on memory |
List | pgsql-bugs |
On 2023-Jan-13, Mats Kindahl wrote: > I have an improved patch based on raising an error when malloc fails. I > used the version of mallocfail that is available at > https://github.com/ralight/mallocfail which allows you to run the program > with incrementally failing memory allocations but since the count is around > 6886 or so when the postmaster starts to allow backend starts and other > processes are starting in between, this count is not always accurate and > something "smarter" is needed to get a reliable failure test. Hmm, but I'm not sure I like what you did to BackendStartup: @@ -4054,14 +4073,7 @@ BackendStartup(Port *port) * Create backend data structure. Better before the fork() so we can * handle failure cleanly. */ - bn = (Backend *) malloc(sizeof(Backend)); - if (!bn) - { - ereport(LOG, - (errcode(ERRCODE_OUT_OF_MEMORY), - errmsg("out of memory"))); - return STATUS_ERROR; - } + CHECKED_CALL(bn = malloc(sizeof(Backend))); With this code, now we will have postmaster raise ERROR if this malloc fails, rather than returning STATUS_ERROR as previously. How does postmaster react to this? I think it won't go very well. (Stylistically, I'd rather have the test and corresponding reaction at each call site rather than using a macro to hide it. I especially don't like the fact that the errdetail you propose would report a line of C code to the user, rather than an intelligible explanation of what that code is trying to do.) > It is quite straightforward to test it by attaching a debugger and setting > the return value of malloc, but to be able to have some "smarter" failures > only when, for example, starting a backend it would be necessary to have > something better integrated with the postgres code. Being able to attach a > uprobe using bpftrace and override the return value would be the most > elegant solution, but that is currently not possible, so considering > whether to add something like the mallocfail above to the regular testing. Hmm, I'm not sure that this type of testing is something to do on an ongoing basis. > Note: there are a bunch of other core dumps that occur during startup > before the postmaster is ready to accept connections, but I think that is > less of a problem since it fails during startup before the system is ready > to accept connections. That sounds reasonable. The service watcher should restart postmaster appropriately in that case (be it old SysV init, systemd, your container management layer, or whatever.) -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Ellos andaban todos desnudos como su madre los parió, y también las mujeres, aunque no vi más que una, harto moza, y todos los que yo vi eran todos mancebos, que ninguno vi de edad de más de XXX años" (Cristóbal Colón)
pgsql-bugs by date: