Re: BUG #16161: pg_ctl stop fails sometimes (on Windows) - Mailing list pgsql-bugs
From | Alexander Lakhin |
---|---|
Subject | Re: BUG #16161: pg_ctl stop fails sometimes (on Windows) |
Date | |
Msg-id | b2b8cc4c-b7d7-0d1b-f6ee-a02bbef5bdd0@gmail.com Whole thread Raw |
In response to | Re: BUG #16161: pg_ctl stop fails sometimes (on Windows) (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
Responses |
Re: BUG #16161: pg_ctl stop fails sometimes (on Windows)
|
List | pgsql-bugs |
Hello, 18.12.2019 6:25, Kyotaro Horiguchi wrote: > At Tue, 17 Dec 2019 18:50:26 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in >> >> Now, it's mighty suspicious that this has a timeout of 30 sec >> which is exactly how long we made pgwin32_open retry for --- >> but how is it that this test is causing an ERROR_ACCESS_DENIED >> failure? And if it was, how did we get past that before? Not exactly, we set timeout for 1 sec. It seems we have this timeout occurred many times. >> It does look suspiciously like this wait is triggering on these >> machines and somehow getting masked in most other test cases. >> If you compare the per-step runtimes in jacana's last successful >> run, >> >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2019-12-16%2013%3A01%3A26 >> >> with those after this patch went in, >> >> https://buildfarm.postgresql.org/cgi-bin/shocw_log.pl?nm=jacana&dt=2019-12-17%2004%3A16%3A40 >> >> there's clearly something very wrong. fairywren likewise. > The immediate cause of the failure seems to be a slowdown of startup > or recovery. It took only 8 minutes in the successful case but 25 > minutes even though stopped on the way in the failure case. > > In the failure case, it took 30 seconds to move only 26.1MB from > 15E8C48 to 30013A0. just around 0.3 seconds for a move of the same > size (15E9C48->3000DE8 (26.1MB)) in the successful case (*1). In the > sucessful case, the time took from archiving 00000002.history to > removing old segments is 0.15 s, but about 3.5 s in the failure case > (*2). > > Seemingly the immediate cause of the timeout looks like just an > extreme slowdown of file read, but it took 2 seconds to move within a > page from 3000C18 to 3000C50. So I suspect that 6d7547c219 might slow > down readTimeLineHistory() significantly in the case. > > regards. > > Yes, some tests are really slowed down (namely, test-decoding-check, pg_basebackup-check, ...). ProcessMonitor shows: "8:11:57.1428800 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","SUCCESS",... "8:11:57.1429851 AM","postgres.exe","2640","SetRenameInformationFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot\state.tmp","SUCCESS",... "8:11:57.1432041 AM","postgres.exe","2640","CloseFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","SUCCESS","" "8:11:57.1432964 AM","postgres.exe","2640","CloseFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot\state","SUCCESS","" "8:11:57.1435137 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot\state","SUCCESS",... "8:11:57.1435709 AM","postgres.exe","2640","CloseFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot\state","SUCCESS","" "8:11:57.1437251 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS DIRECTORY",... "8:11:57.2451100 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS DIRECTORY",... "8:11:57.3571077 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS DIRECTORY",... "8:11:57.4643527 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS DIRECTORY",... "8:11:57.5778052 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS DIRECTORY",... "8:11:57.6825679 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS DIRECTORY",... "8:11:57.7921704 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS DIRECTORY",... "8:11:57.9018507 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS DIRECTORY",... "8:11:58.0109597 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS DIRECTORY",... "8:11:58.1204329 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS DIRECTORY",... "8:11:58.2300829 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS DIRECTORY",... "8:11:58.2302124 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS DIRECTORY",... "8:11:58.3393345 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS DIRECTORY",... "8:11:58.4484310 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS DIRECTORY",... "8:11:58.5584079 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS DIRECTORY",... "8:11:58.6683586 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS DIRECTORY",... "8:11:58.7773396 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS DIRECTORY",... "8:11:58.8858091 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS DIRECTORY",... "8:11:58.9955264 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS DIRECTORY",... "8:11:59.1050665 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS DIRECTORY",... "8:11:59.2137752 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS DIRECTORY",... "8:11:59.3234641 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS DIRECTORY",... "8:11:59.3261361 AM","postgres.exe","2640","QueryOpen","C:\Windows\System32\kernel.appcore.dll","FAST IO DISALLOWED","" https://web.archive.org/web/20150317121919/http://support.microsoft.com/en-us/kb/113996 tells that STATUS_FILE_IS_A_DIRECTORY is mapped to ERROR_ACCESS_DENIED too. It seems that the cause of the issue is in fsync_fname_ext: fd = OpenTransientFile(fname, flags); /* * Some OSs don't allow us to open directories at all (Windows returns * EACCES), just ignore the error in that case. If desired also silently * ignoring errors about unreadable files. Log others. */ if (fd < 0 && isdir && (errno == EISDIR || errno == EACCES)) return 0; Maybe we need to take isdir into account before calling OpenTransientFile on Windows... Please look at the simple patch that makes the recoverycheck test pass. Best regards, Alexander
Attachment
pgsql-bugs by date: