BUG #15438: Standby corruption after "Too many open files in system" error - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #15438: Standby corruption after "Too many open files in system" error |
Date | |
Msg-id | 15438-bbe4679a595d795f@postgresql.org |
Responses | Re: BUG #15438: Standby corruption after "Too many open files in system" error |
List | pgsql-bugs |
The following bug has been logged on the website:

Bug reference:      15438
Logged by:          Juanjo Santamaría
Email address:      juanjo.santamaria@gmail.com
PostgreSQL version: 10.4
Operating system:   Redhat 7.3
Description:

After some time working under poor conditions (the open file limit was too low for the postgres user), the streaming replica ended up in a corrupted state. The primary kept working the whole time, and recreating the standby replica after configuring the user limits seems to solve the issue.

The log is quite verbose, so I have tried to summarize at least one occurrence of every error:

Oct 15 10:47:54 postgres[9507]: [9-1] LOG: started streaming WAL from primary at 6/BE000000 on timeline 3
Oct 15 10:48:38 postgres[9507]: [10-1] FATAL: could not receive data from WAL stream: ERROR: epoll_create1 failed: Too many open files in system
Oct 15 10:48:38 postgres[40098]: [15-1] LOG: record with incorrect prev-link 6/A1A993C8 at 6/BEA99400
Oct 15 10:48:38 postgres[9550]: [9-1] FATAL: could not connect to the primary server: server closed the connection unexpectedly
Oct 15 10:48:38 postgres[9550]: [9-2] #011#011This probably means the server terminated abnormally
Oct 15 10:48:38 postgres[9550]: [9-3] #011#011before or while processing the request.
Oct 15 10:48:53 postgres[9586]: [9-1] LOG: started streaming WAL from primary at 6/BE000000 on timeline 3
Oct 15 10:49:32 postgres[9586]: [10-1] FATAL: could not receive data from WAL stream: ERROR: epoll_create1 failed: Too many open files in system
Oct 15 10:49:32 postgres[40098]: [16-1] LOG: invalid record length at 6/BEABD0F8: wanted 24, got 0
Oct 15 10:50:31 postgres[9765]: [9-1] LOG: started streaming WAL from primary at 6/BE000000 on timeline 3
Oct 15 10:50:39 postgres[9765]: [10-1] FATAL: could not receive data from WAL stream: ERROR: epoll_create1 failed: Too many open files in system
Oct 15 10:50:39 postgres[40098]: [19-1] LOG: invalid resource manager ID 101 at 6/BEAE3708
Oct 15 10:51:01 postgres[9767]: [9-1] FATAL: could not connect to the primary server: server closed the connection unexpectedly
Oct 15 10:51:01 postgres[9767]: [9-2] #011#011This probably means the server terminated abnormally
Oct 15 10:51:01 postgres[9767]: [9-3] #011#011before or while processing the request.
Oct 15 10:51:06 postgres[9783]: [9-1] LOG: started streaming WAL from primary at 6/BE000000 on timeline 3
Oct 15 10:51:39 postgres[9783]: [10-1] FATAL: could not receive data from WAL stream: ERROR: epoll_create1 failed: Too many open files in system
Oct 15 10:51:39 postgres[40098]: [20-1] LOG: record with incorrect prev-link 3E/746F6F72 at 6/BEC3D5E8
Oct 15 11:03:39 postgres[10850]: [9-1] FATAL: could not connect to the primary server: FATAL: could not open relation mapping file "global/pg_filenode.map": Too many open files in system
Oct 15 12:00:10 postgres[40101]: [3732-1] LOG: restartpoint complete: wrote 61 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=6.159 s, sync=0.018 s, total=6.223 s; sync files=16, longest=0.006 s, average=0.001 s; distance=451 kB, estimate=2215 kB
Oct 15 12:01:40 postgres[15182]: [10-1] FATAL: could not receive data from WAL stream: ERROR: epoll_create1 failed: Too many open files in system
Oct 15 12:01:40 postgres[40098]: [80-1] LOG: invalid resource manager ID 64 at 6/BEE34F00
Oct 15 12:01:42 postgres[15711]: [9-1] LOG: started streaming WAL from primary at 6/BE000000 on timeline 3
Oct 15 12:05:38 postgres[15711]: [10-1] FATAL: could not receive data from WAL stream: ERROR: epoll_create1 failed: Too many open files in system
Oct 15 12:05:38 postgres[40098]: [81-1] LOG: unexpected pageaddr 6/A1E6E000 in log segment 0000000300000006000000BE, offset 15130624
Oct 16 01:34:15 postgres[18890]: [9-1] LOG: started streaming WAL from primary at 6/C3000000 on timeline 3
Oct 16 01:34:15 postgres[18890]: [10-1] FATAL: could not receive data from WAL stream: ERROR: could not open file "pg_wal/0000000300000006000000C3": Too many open files in system
Oct 16 17:58:44 postgres[38030]: [9-1] FATAL: the database system is starting up
Oct 16 17:58:47 postgres[38029]: [9-1] LOG: entering standby mode
Oct 16 17:58:47 postgres[38029]: [10-1] LOG: redo starts at 6/CF9B3598
Oct 16 17:58:47 postgres[38029]: [11-1] LOG: consistent recovery state reached at 6/CF9C53B8
Oct 16 17:58:47 postgres[38026]: [8-1] LOG: database system is ready to accept read only connections
Oct 16 17:58:48 postgres[38029]: [12-1] FATAL: invalid memory alloc request size 1715044352
Oct 16 17:58:48 postgres[38026]: [9-1] LOG: startup process (PID 38029) exited with exit code 1
Oct 16 17:58:48 postgres[38026]: [10-1] LOG: terminating any other active server processes
Oct 16 17:58:48 postgres[38026]: [11-1] LOG: database system is shut down
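
For anyone who needs to reduce a similarly verbose standby log to one entry per kind of error, the following is a minimal Python sketch. It is not part of the original report. It assumes syslog-formatted lines shaped like the excerpt above (timestamp, postgres[pid], a [seq-part] counter, then a severity prefix); the names LINE_RE, SEVERITY_RE, normalize and summarize are hypothetical helpers chosen for illustration, and the patterns may need adjusting for a different log_line_prefix.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt in this report:
#   Oct 15 10:48:38 postgres[9507]: [10-1] FATAL: could not receive data ...
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) postgres\[(?P<pid>\d+)\]: "
    r"\[(?P<seq>\d+)-(?P<part>\d+)\] (?P<rest>.*)$"
)
SEVERITY_RE = re.compile(r"^(?P<sev>LOG|FATAL|ERROR|WARNING|PANIC):\s+(?P<msg>.*)$")


def normalize(msg):
    """Replace values that vary between occurrences so equal errors group together."""
    # WAL positions such as 6/BE000000 or 3E/746F6F72
    msg = re.sub(r"\b[0-9A-F]+/[0-9A-F]+\b", "<lsn>", msg)
    # Standalone decimal numbers (timeline IDs, sizes, offsets)
    msg = re.sub(r"\b\d+\b", "<n>", msg)
    return msg


def summarize(lines):
    """Return (counts, first_seen), both keyed by (severity, normalized message)."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        # Only the first part of a multi-part message ([9-1]) carries the severity;
        # continuation parts such as [9-2] and [9-3] are skipped.
        if not m or m.group("part") != "1":
            continue
        s = SEVERITY_RE.match(m.group("rest"))
        if not s:
            continue
        key = (s.group("sev"), normalize(s.group("msg")))
        counts[key] += 1
        first_seen.setdefault(key, m.group("ts"))
    return counts, first_seen
```

Feeding the log lines to summarize() yields one key per distinct message, with its count and the timestamp of its first occurrence, which makes it easier to see that the repeated epoll_create1 failures surround a much smaller set of distinct WAL read errors.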