Problem in starting postgresql on a probably slow machine:pg_ctlcluster ignores pg_ctl timeout - Mailing list pgsql-pkg-debian
From | Giuseppe Sacco |
---|---|
Subject | Problem in starting postgresql on a probably slow machine:pg_ctlcluster ignores pg_ctl timeout |
Date | |
Msg-id | 831b8b45c4e1d4732aa39b41e276d49e3ea4bf59.camel@eppesuigoccas.homedns.org Whole thread Raw |
Responses |
Re: Problem in starting postgresql on a probably slow machine:pg_ctlcluster ignores pg_ctl timeout
|
List | pgsql-pkg-debian |
Hello, when I boot a machine that hosts a postgresql cluster, the cluster does not start. The cluster is configured for starting automatically, but this is what I find in the log via journalctl: root@hostname:~# journalctl --unit postgresql@9.4-main.service -- Logs begin at sab 2018-09-29 10:05:56 CEST, end at lun 2018-10-01 14:37:11 CEST. -- set 29 10:06:04 hostname systemd[1]: Starting PostgreSQL Cluster 9.4-main... set 29 10:06:06 hostname postgresql@9.4-main[685]: Removed stale pid file. set 29 10:06:13 hostname postgresql@9.4-main[685]: The PostgreSQL server failed to start. Please check the log output: set 29 10:06:13 hostname postgresql@9.4-main[685]: 2018-09-29 10:06:07 CEST [1034-1] LOG: il database è stato interrotto;l'ultimo segno di vita risale alle 2018-09-20 1 set 29 10:06:13 hostname postgresql@9.4-main[685]: 2018-09-29 10:06:07 CEST [1058-1] [sconosciuto]@[sconosciuto] LOG: pacchettodi avvio incompleto set 29 10:06:13 hostname postgresql@9.4-main[685]: 2018-09-29 10:06:08 CEST [1134-1] postgres@postgres FATALE: il databasesi sta avviando set 29 10:06:13 hostname postgresql@9.4-main[685]: 2018-09-29 10:06:09 CEST [1137-1] postgres@postgres FATALE: il databasesi sta avviando set 29 10:06:13 hostname postgresql@9.4-main[685]: 2018-09-29 10:06:09 CEST [1147-1] postgres@postgres FATALE: il databasesi sta avviando set 29 10:06:13 hostname postgresql@9.4-main[685]: 2018-09-29 10:06:10 CEST [1151-1] postgres@postgres FATALE: il databasesi sta avviando set 29 10:06:13 hostname postgresql@9.4-main[685]: 2018-09-29 10:06:10 CEST [1159-1] postgres@postgres FATALE: il databasesi sta avviando set 29 10:06:13 hostname postgresql@9.4-main[685]: 2018-09-29 10:06:11 CEST [1180-1] postgres@postgres FATALE: il databasesi sta avviando set 29 10:06:13 hostname postgresql@9.4-main[685]: 2018-09-29 10:06:11 CEST [1199-1] postgres@postgres FATALE: il databasesi sta avviando set 29 10:06:13 hostname postgresql@9.4-main[685]: 2018-09-29 10:06:12 CEST [1224-1] postgres@postgres FATALE: il databasesi sta avviando set 29 10:06:13 hostname postgresql@9.4-main[685]: 2018-09-29 10:06:12 CEST [1238-1] postgres@postgres FATALE: il databasesi sta avviando set 29 10:06:13 hostname postgresql@9.4-main[685]: 2018-09-29 10:06:13 CEST [1265-1] postgres@postgres FATALE: il databasesi sta avviando set 29 10:06:13 hostname postgresql@9.4-main[685]: 2018-09-29 10:06:13 CEST [1271-1] postgres@postgres FATALE: il databasesi sta avviando set 29 10:06:13 hostname systemd[1]: postgresql@9.4-main.service: control process exited, code=exited status=1 set 29 10:06:15 hostname systemd[1]: Failed to start PostgreSQL Cluster 9.4-main. set 29 10:06:15 hostname systemd[1]: Unit postgresql@9.4-main.service entered failed state. I think this is a problema related to a timeout, so I investigated it. I found a generic timeout in systemd unit files and a specific timeout in pg_ctl. They are different so I leveled both of them at 5 minutes. I created a directory named /etc/systemd/system/postgresql@.service.d and I put there a file named timeout.conf with these three lines: [Service] Environment=PGCTLTIMEOUT=270 TimeoutSec=300s the first line is supposed to change the environment variale for pg_ctl, the second one is changing the timeout unit for systemd. (I know I could probably have set pg_ctl timeout in /etc/postgresql/9.4/main/pg_ctl.conf but I've preferred to keep both timeounts in one file, and I don't want to set this file in every cluster. Systemd will use this file when starting the relevant units. Because the path use the generic system unit name, it is applied to all postgresql clusters.) The problem has not been solved and I found a different problem in /usr/bin/pg_ctlcluster. At about line 220, it does a loop for 60 times, and in each loop it calls select() with a timeout of 0.5 seconds. If the database does not start during this loop, pg_ctlcluster abort it start operation. [...] my $pidfile = $info{'pgdata'}.'/postmaster.pid'; for (my $attempt = 0; $attempt < 60; $attempt++) { select (undef, undef, undef, 0.5); $currlogsize = (stat $info{'logfile'})[7] if $info{'logfile'} && -r $info{'logfile'}; [...] So, beside the timeout I put in systemd and pg_ctl, pg_ctlcluster always use its internal 30 seconds timeout. Please note that is select() returns early, then its 60 iteration could last less than 30 seconds. In my case, the database seems started at 10:06:07 and it gives up at 10:06:13. Postgresql log contains some more lines that show the process receives a smart stop request: 2018-09-29 10:06:13 CEST [1265-1] postgres@postgres FATALE: il database si sta avviando 2018-09-29 10:06:13 CEST [1271-1] postgres@postgres FATALE: il database si sta avviando 2018-09-29 10:06:13 CEST [866-1] LOG: richiesta di arresto smart ricevuta 2018-09-29 10:06:14 CEST [1034-2] LOG: il database non è stato arrestato correttamente; ripristino automatico in corso 2018-09-29 10:06:15 CEST [1034-3] LOG: record con lunghezza zero a 20/B9365820 and this is probably pg_ctlcluster (or is it systemd?) that tries to stop it when the startup did not correctly work. So, finally, my question: is pg_ctlcluster the root cause of my problem? Why does it loop and wait if pg_ctl will exit anyway when the database start or when its timeout is triggered? Should I open a bug report for this? Thank you, Giuseppe
pgsql-pkg-debian by date: