Re: Large block sizes support in Linux - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: Large block sizes support in Linux |
Date | |
Msg-id | Zf5BZVA4UhbSlLa4@momjian.us |
In response to | Re: Large block sizes support in Linux (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses | Re: Large block sizes support in Linux |
List | pgsql-hackers |
On Fri, Mar 22, 2024 at 10:31:11PM +0100, Tomas Vondra wrote:
> Right, but things change over time - current storage devices support
> much larger sectors (LBA format), usually 4K. And if you do I/O with
> this size, it's usually atomic.
>
> AFAIK if you built Postgres with 4K pages, on a device with 4K LBA
> format, that would not need full-page writes - we always do I/O in 4k
> pages, and block layer does I/O (during writeback from page cache) with
> minimum guaranteed size = logical block size. 4K are great for OLTP
> systems in general, it'd be even better if we didn't need to worry about
> torn pages (but the tricky part is to be confident it's safe to disable
> them on a particular system).

Yes, even if the file system is 8k, and the storage is 8k, we only know
that torn pages are impossible if the file system never overwrites
existing 8k pages, but writes new ones and then makes it active. I think
ZFS does that to handle snapshots.

> The other thing is - is there a reliable way to say when the guarantees
> actually apply? I mean, how would the administrator *know* it's safe to
> set full_page_writes=off, or even better how could we verify this when
> the database starts (and complain if it's not safe to disable FPW)?

Yes, this is quite hard to know. Our docs have:

	https://www.postgresql.org/docs/current/wal-reliability.html

	Another risk of data loss is posed by the disk platter write operations
	themselves. Disk platters are divided into sectors, commonly 512 bytes
	each. Every physical read or write operation processes a whole sector.
	When a write request arrives at the drive, it might be for some multiple
	of 512 bytes (PostgreSQL typically writes 8192 bytes, or 16 sectors, at
	a time), and the process of writing could fail due to power loss at any
	time, meaning some of the 512-byte sectors were written while others
	were not. To guard against such failures, PostgreSQL periodically writes
	full page images to permanent WAL storage before modifying the actual
	page on disk. By doing this, during crash recovery PostgreSQL can
-->	restore partially-written pages from WAL. If you have file-system
-->	software that prevents partial page writes (e.g., ZFS), you can turn off
-->	this page imaging by turning off the full_page_writes parameter.
-->	Battery-Backed Unit (BBU) disk controllers do not prevent partial page
-->	writes unless they guarantee that data is written to the BBU as full
-->	(8kB) pages.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.
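
The sector arithmetic in the quoted documentation (8192 bytes = 16 sectors of 512 bytes) and the 4K-page case from the quoted message can be illustrated with a minimal sketch. The function names below are hypothetical and not part of PostgreSQL. The sketch assumes that a write of exactly one logical block is atomic, which the thread describes only as "usually" true, so a negative result from page_can_tear() is a necessary condition and not proof that full_page_writes=off is safe. The sysfs attribute queue/logical_block_size is what Linux reports for a block device; it says nothing about whether the file system overwrites pages in place.

```python
from pathlib import Path

# Default PostgreSQL block size in bytes, as stated in the quoted docs.
PG_PAGE_SIZE = 8192


def sectors_per_page(page_size: int, sector_size: int) -> int:
    """Return how many sectors one page write spans (rounded up)."""
    if page_size <= 0 or sector_size <= 0:
        raise ValueError("sizes must be positive")
    return -(-page_size // sector_size)  # ceiling division


def page_can_tear(page_size: int, sector_size: int) -> bool:
    """Return True if a page spans more than one sector.

    Assumption (not a guarantee): a single-sector write is atomic, so only
    multi-sector page writes can be left partially written by a power loss.
    """
    return sectors_per_page(page_size, sector_size) > 1


def logical_block_size(device: str, sysfs_root: str = "/sys/block") -> int:
    """Read the logical block size Linux reports for a device such as 'sda'."""
    path = Path(sysfs_root) / device / "queue" / "logical_block_size"
    return int(path.read_text().strip())


if __name__ == "__main__":
    for sector in (512, 4096):
        print(
            f"page={PG_PAGE_SIZE} sector={sector}: "
            f"{sectors_per_page(PG_PAGE_SIZE, sector)} sectors, "
            f"can tear: {page_can_tear(PG_PAGE_SIZE, sector)}"
        )
```

With the default 8192-byte page, both 512-byte and 4096-byte sectors leave the page spanning several sectors, so torn pages remain possible; only a 4096-byte page on a 4096-byte logical block fits in a single sector, which is the case the quoted message describes.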