Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o - Mailing list pgsql-committers
From | Andres Freund |
---|---|
Subject | Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o |
Date | |
Msg-id | 20210807044952.orwuf7hua2wytx4g@alap3.anarazel.de Whole thread Raw |
In response to | Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o
|
List | pgsql-committers |
Hi, On 2021-08-06 23:22:36 -0400, Tom Lane wrote: > I wrote: > > Guessing the common factor is "macOS", but that's just a guess. > > I can poke into it tomorrow. > > I did try it real quick on my Mac laptop, and that fails too. > Here's a more accurate backtrace, in case that helps. Thanks! I now managed to reproduce it on a CI system (after some initial failed attempts due to pg_upgrade tests failing due to path length issues): https://api.cirrus-ci.com/v1/artifact/task/4863568911794176/log/src/bin/pg_basebackup/tmp_check/log/010_pg_basebackup_main.log 2021-08-06 21:18:40.096 PDT [22031] 010_pg_basebackup.pl LOG: received replication command: BASE_BACKUP LABEL 'pg_basebackupbase backup' PROGRESS NOWAIT MANIFEST 'yes' 2021-08-06 21:18:40.096 PDT [22031] 010_pg_basebackup.pl STATEMENT: BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS NOWAIT MANIFEST 'yes' 2021-08-06 21:18:42.210 PDT [22031] 010_pg_basebackup.pl LOG: could not send data to client: Broken pipe 2021-08-06 21:18:42.210 PDT [22031] 010_pg_basebackup.pl STATEMENT: BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS NOWAIT MANIFEST 'yes' 2021-08-06 21:18:42.210 PDT [22031] 010_pg_basebackup.pl FATAL: connection to client lost 2021-08-06 21:18:42.210 PDT [22031] 010_pg_basebackup.pl STATEMENT: BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS NOWAIT MANIFEST 'yes' TRAP: FailedAssertion("pgstat_is_initialized && !pgstat_is_shutdown", File: "pgstat.c", Line: 4810, PID: 22031) vs what I locally get: 2021-08-06 20:52:06.163 PDT [1397252] 010_pg_basebackup.pl LOG: received replication command: BASE_BACKUP LABEL 'pg_basebackupbase backup' PROGRESS NOWAIT MANIFEST 'yes' 2021-08-06 20:52:06.163 PDT [1397252] 010_pg_basebackup.pl STATEMENT: BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS NOWAIT MANIFEST 'yes' 2021-08-06 20:52:08.189 PDT [1397252] 010_pg_basebackup.pl LOG: could not send data to client: Broken pipe 2021-08-06 20:52:08.189 PDT [1397252] 010_pg_basebackup.pl STATEMENT: BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS NOWAIT MANIFEST 'yes' 2021-08-06 20:52:08.189 PDT [1397252] 010_pg_basebackup.pl ERROR: base backup could not send data, aborting backup 2021-08-06 20:52:08.189 PDT [1397252] 010_pg_basebackup.pl STATEMENT: BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS NOWAIT MANIFEST 'yes' 2021-08-06 20:52:08.189 PDT [1397252] 010_pg_basebackup.pl FATAL: connection to client lost Note that OSX version doesn't have the "base backup could not send data, aborting backup" message. I guess that is what leads to the difference in behaviour: The temp file is created by InitializeBackupManifest(). In the !OSX case, we first abort via an ERROR, which triggers the cleanup via WalSndResourceCleanup(). On OSX however, we immediately error out with FATAL for some reason (timing? network buffering differences?), which will never reach WalSndErrorCleanup(). Therefore the temp file only gets deleted during proc_exit(), which triggers the issue... Not yet really sure what the best way to deal with this is. Presumably this issue would be fixed if AtProcExit_Files()/CleanupTempFiles() were scheduled via before_shmem_exit(). And perhaps it's not too off to schedule CleanupTempFiles() there - but it doesn't quite seem entirely right either. I'd kinda like to avoid having to overhaul the process exit infrastructure as a prerequisite to getting the shared memory stats patch in :(. Greetings, Andres Freund
pgsql-committers by date: