Re: Error:could not extend file " with FileFallocate(): No space left on device - Mailing list pgsql-general
| From | Aleksandr Fedorov |
|---|---|
| Subject | Re: Error:could not extend file " with FileFallocate(): No space left on device |
| Date | |
| Msg-id | c4d4536e-2f22-4a27-9499-6cafd1e7a941@postgrespro.ru |
| In response to | Error:could not extend file " with FileFallocate(): No space left on device (Pecsök Ján <jan.pecsok@profinit.eu>) |
| List | pgsql-general |
Dear community,
Based on the analysis of logs collected from several incidents under OEL 8.10 / 9.3, the most likely cause is local exhaustion of free space in an XFS allocation group.
Further investigation revealed that a similar issue is documented in the Red Hat knowledge base (https://access.redhat.com/solutions/7129010),
describing ENOSPC errors returned by fallocate() on XFS filesystems during PostgreSQL backup operations.
Red Hat references the commit https://github.com/torvalds/linux/commit/6773da870ab89123d1b513da63ed59e32a29cb77 and
believes that this kernel fix may address the PostgreSQL issue.
After analyzing the change set from this commit, we identified the following combination of conditions that can trigger the ENOSPC error:
1. Presence of delayed allocations (writes accepted into the page cache but not yet written to disk).
2. Insufficient free space in the allocation group to cover all pending delayed allocations.
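To make the sequence concrete, below is an illustrative C sketch of the write-then-fallocate pattern these two conditions describe. It is not a reliable reproducer (that depends on filesystem geometry, free space, and kernel version; see the reproduction methods further down), and the path and sizes are assumptions:

```c
/* Illustrative only: the write-then-fallocate sequence described above.
 * Whether fallocate() actually fails depends on filesystem geometry,
 * free space, and kernel version; path and sizes are assumptions. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/mnt/xfs_test/delalloc_demo", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Condition 1: buffered writes create delayed allocations, i.e. blocks
     * reserved in memory that are not yet placed in an allocation group. */
    char buf[64 * 1024];
    memset(buf, 'x', sizeof(buf));
    for (int i = 0; i < 16; i++)   /* 1 MB of dirty, unflushed data */
        if (write(fd, buf, sizeof(buf)) != sizeof(buf)) { perror("write"); return 1; }

    /* Condition 2: if the allocation group cannot cover both the pending
     * delayed allocations and this request, fallocate() returns ENOSPC
     * even though df may still report free space. */
    if (fallocate(fd, 0, (off_t)1 * 1024 * 1024, 16 * 1024) != 0)
        fprintf(stderr, "fallocate: %s (errno=%d)\n", strerror(errno), errno);
    else
        puts("fallocate succeeded");

    close(fd);
    return 0;
}
```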
Subsequent search of the PostgreSQL community knowledge base led to the message https://www.postgresql.org/message-id/50A117B6.5030300@optionshouse.com.
Important points to highlight from this message:
1. Since kernel version 2.6.x, XFS has implemented dynamic speculative preallocation.
2. The term "dynamic" means the preallocation size is regulated by internal heuristics.
3. These heuristics are based on file access patterns and history.
4. Additional space allocated during preallocation is intended to prevent file fragmentation.
5. When a file extends, its data is written into extents that may be distributed across one or more allocation groups.
6. Delayed allocation writes allow merging multiple allocations into preallocated space before writing to disk, reducing the number of extents and thus file fragmentation.
7. The logic for tracking additional space retains it as long as there are in-memory references to the file — for example, in an actively running PostgreSQL database.
8. The XFS filesystem itself considers this space as used.
9. The actual file size on disk may exceed PostgreSQL's 1 GB segment limit (not to be confused with the apparent size).
This is confirmed by output of the `du -h` command, which reports "actual" (allocated) file sizes and helps detect files larger than 1 GB at the time of execution (some as large as 2 GB, even though the maximum segment size is known to be 1 GB).
There may have been more such files, but after the replica crash the file descriptors were released, causing the "actual" sizes to return to normal.
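As a quick way to see this without `du`, the following sketch (our illustration, not part of the incident tooling) prints the apparent versus allocated size of a file via stat(); an allocated size above 1 GB on a relation segment points at speculative preallocation pinned by open descriptors:

```c
/* Our illustration (not part of the incident tooling): print apparent vs.
 * allocated size for a file, the same signal `du` gives. st_size is the
 * apparent size; st_blocks is counted in 512-byte units and includes any
 * speculative preallocation still pinned by open file descriptors. */
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

    struct stat st;
    if (stat(argv[1], &st) != 0) { perror("stat"); return 1; }

    long long apparent  = (long long)st.st_size;
    long long allocated = (long long)st.st_blocks * 512;

    printf("apparent:  %lld bytes\n", apparent);
    printf("allocated: %lld bytes\n", allocated);
    if (allocated > apparent)
        printf("beyond EOF: %lld bytes (possible speculative preallocation)\n",
               allocated - apparent);
    return 0;
}
```

Running this over the 1 GB segments under PGDATA while the replica is still alive would show allocated sizes above the segment size, matching the `du -h` observations.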
Dynamic speculative preallocation can be disabled by specifying the `allocsize` mount option when mounting the XFS filesystem.
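For completeness: XFS also exposes a per-file analog of `allocsize`, an extent size hint settable through the generic FS_IOC_FSSETXATTR ioctl. We show it only as an illustrative sketch (we did not use it in the tests below); note that XFS accepts the hint only on files that have no extents yet, or on directories, where children inherit it:

```c
/* Illustrative sketch only, not used in our tests: the per-file analog of
 * the allocsize mount option, i.e. a fixed XFS extent size hint set via the
 * generic FS_IOC_FSSETXATTR ioctl. Path and size are assumptions. */
#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/mnt/xfs_test/newfile", O_CREAT | O_RDWR, 0644);
    if (fd < 0) { perror("open"); return 1; }

    struct fsxattr fsx;
    if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) != 0) { perror("FS_IOC_FSGETXATTR"); return 1; }

    fsx.fsx_xflags |= FS_XFLAG_EXTSIZE;  /* use a fixed extent size hint */
    fsx.fsx_extsize = 4096;              /* bytes; mirrors allocsize=$((4*1024)) */
    if (ioctl(fd, FS_IOC_FSSETXATTR, &fsx) != 0) { perror("FS_IOC_FSSETXATTR"); return 1; }

    close(fd);
    return 0;
}
```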
We would like to share additional observations to help resolve the issue.
We were able to reproduce the original problem in two ways: directly on a PostgreSQL replica, and using a C program.
The first method is a test script (please see the attached README_test_pg.md) that uses the mount option `allocsize=$((1*1024*1024))` when mounting the disk where PGDATA is located.
The pgbench_accounts table is generated using the pgbench tool, and multiple copies of this table are created and populated in parallel.
During the process of filling these small tables (each table is no larger than 25 MB upon script completion), numerous delayed preallocation events occur, consuming free disk space.
The subsequent parallel INSERT statements then cause replica crashes because there is no contiguous free space left on the disk to extend the file of the large table.
Here is an example of available free space on the mount points after the replica crashed with the ENOSPC error (pgdata_main belongs to the primary server, pgdata_repl to the replica):
Filesystem Type Size Used Avail Use% Mounted on
/dev/loop0 xfs 4.0G 4.0G 74M 99% /pgdata_main
/dev/loop3 xfs 4.0G 3.8G 280M 94% /pgdata_repl
You may observe that when the issue is reproduced and the replica crashes, the available disk space on the replica side appears larger than on the primary side.
However, the ENOSPC error in the logs indicates that disk space was exhausted, and this is indeed accurate: after the crash, all file descriptors were released, and the space previously held by preallocated files was reclaimed by the filesystem. Monitoring file sizes with `du -h` right before the crash and again shortly afterwards shows them decreasing from 26 MB to 25 MB.
The issue does not occur when using the minimum possible value for the allocsize parameter, allocsize=$((4*1024)).
Testing various values of allocsize under a specific workload on PostgreSQL with synchronous physical replication shows:
+----------------------+----------------------+---------------------------------------------------------------------+
| allocsize setting | Thread model | Result |
+----------------------+----------------------+---------------------------------------------------------------------+
| 1M | single thread | No issues observed |
+----------------------+----------------------+---------------------------------------------------------------------+
| 1M | multiple threads | Replica failed: "could not extend file ... No space left on device" |
+----------------------+----------------------+---------------------------------------------------------------------+
| 1GB | multiple threads | Primary failed: "could not extend file ... No space left on device" |
+----------------------+----------------------+---------------------------------------------------------------------+
| 4KB | multiple threads | No failure occurred |
+----------------------+----------------------+---------------------------------------------------------------------+
The second method is a C program (please see the attached README_test_c.md) that reproduces the ENOSPC error on kernel version 5.15.0-101.103.2.1.el9uek.x86_64.
The program first attempts to write 748 KB to a file and then allocate an additional 16 KB using posix_fallocate().
If posix_fallocate() fails, it displays a corresponding message and retries the operation.
The second attempt succeeds, indicating that space was available.
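For reference, a minimal sketch of that sequence follows; the attached program is the authoritative version, and the file path here is an assumption:

```c
/* Minimal sketch of the attached reproducer (README_test_c.md is the
 * authoritative version): buffered-write 748 KB, then posix_fallocate()
 * 16 KB more, retrying once on failure. The file path is an assumption. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/mnt/xfs_test/testfile", O_CREAT | O_RDWR | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Step 1: write 748 KB through the page cache (delayed allocation). */
    char buf[1024];
    memset(buf, 'x', sizeof(buf));
    for (int i = 0; i < 748; i++)
        if (write(fd, buf, sizeof(buf)) != sizeof(buf)) { perror("write"); return 1; }

    /* Step 2: allocate 16 KB past the written data, retrying once. */
    for (int attempt = 1; attempt <= 2; attempt++) {
        /* posix_fallocate returns the error number instead of setting errno */
        int err = posix_fallocate(fd, 748L * 1024, 16 * 1024);
        if (err == 0) {
            printf("attempt %d: posix_fallocate succeeded\n", attempt);
            break;
        }
        fprintf(stderr, "attempt %d: posix_fallocate failed: %s\n",
                attempt, strerror(err));
    }

    close(fd);
    return 0;
}
```

On a filesystem close to the conditions described above, the first call returns ENOSPC while the retry succeeds, matching the observed behavior.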
However, the program does not fully reproduce the potential PostgreSQL scenario; the key differences are:
1. The program uses a single process with a single thread, whereas real systems involve one process with multiple threads or multiple processes operating on files.
2. The program uses a fixed buffer size for the mounted filesystem's journal, whereas in production environments the buffer size is dynamic (allocated based on historical space usage, i.e., workload-dependent).
3. The issue does not occur when there are multiple allocation groups that are completely empty.
In our practice, we identified two viable approaches:
1. As a permanent solution: Upgrade the UEK kernel.
Note that the fix has not been backported to all UEK versions:
- It is not present in UEK7 (5.15.x).
- It is present in UEK8 (6.12.x, available starting with OL 9.5) from kernel version 6.12.0-0.20.20 onwards.
2. As a temporary solution: Use the allocsize parameter to disable dynamic speculative preallocation.
However, since this does not fix the root cause, failures may still occur.
Dear community,
After upgrading Postgres from version 13.5 to 16.2, we experience the following error:
could not extend file "pg_tblspc/16401/PG_16_202307071/17820/3968302971" with FileFallocate(): No space left on device
We cannot easily replicate the problem. It happens randomly, every 1-2 weeks of intensive query computation.
Were there any changes in space allocation between Postgres 13.5 and Postgres 16.2?
The database is 91 TB in size and has 27 TB of additional space available.