Re: Direct I/O - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Direct I/O |
Date | |
Msg-id | CA+hUKG+DoL6GZyUpoVhbfT2+N53g_DXZr5fvq1-Wh=aEtaY92w@mail.gmail.com Whole thread Raw |
In response to | Re: Direct I/O (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Responses |
Re: Direct I/O
|
List | pgsql-hackers |
On Wed, Jan 25, 2023 at 8:57 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > Thanks. I have some comments on > v3-0002-Add-io_direct-setting-developer-only.patch: > > 1. I think we don't need to overwrite the io_direct_string in > check_io_direct so that show_io_direct can be avoided. Thanks for looking at this, and sorry for the late response. Yeah, agreed. > 2. check_io_direct can leak the flags memory - when io_direct is not > supported or for an invalid list syntax or an invalid option is > specified. > > I have addressed my review comments as a delta patch on top of v3-0002 > and added it here as v1-0001-Review-comments-io_direct-GUC.txt. Thanks. Your way is nicer. I merged your patch and added you as a co-author. > Some comments on the tests added: > > 1. Is there a way to know if Direct IO for WAL and data has been > picked up programmatically? IOW, can we know if the OS page cache is > bypassed? I know an external extension pgfincore which can help here, > but nothing in the core exists AFAICS. Right, that extension can tell you how many pages are in the kernel page cache which is quite interesting for this. I also once hacked up something primitive to see *which* pages are in kernel cache, so I could join that against pg_buffercache to measure double buffering, when I was studying the "smile" shape where pgbench TPS goes down and then back up again as you increase shared_buffers if the working set is nearly as big as physical memory (code available in a link from [1]). Yeah, I agree it might be nice for human investigators to put something like that in contrib/pg_buffercache, but I'm not sure you could rely on it enough for an automated test, though, 'cause it probably won't work on some file systems and the tests would probably fail for random transient reasons (for example: some systems won't kick pages out of kernel cache if they were already there, just because we decided to open the file with O_DIRECT). (I got curious about why mincore() wasn't standardised along with mmap() and all that jazz; it seems the BSD and later Sun people who invented all those interfaces didn't think that one was quite good enough[2], but every (?) Unixoid OS copied it anyway, with variations... Apparently the Windows thing is called VirtualQuery()). > 2. Can we combine these two append_conf to a single statement? > +$node->append_conf('io_direct', 'data,wal,wal_init'); > +$node->append_conf('shared_buffers', '64kB'); # tiny to force I/O OK, sure, done. And also oops, that was completely wrong and not working the way I had it in that version... > 3. A nitpick: Can we split these queries multi-line instead of in a single line? > +$node->safe_psql('postgres', 'begin; create temporary table t2 as > select 1 as i from generate_series(1, 10000); update t2 set i = i; > insert into t2count select count(*) from t2; commit;'); OK. > 4. I don't think we need to stop the node before the test ends, no? > +$node->stop; > + > +done_testing(); Sure, but why not? Otherwise, I rebased, and made a couple more changes: I found a line of the manual about wal_sync_method that needed to be removed: - The <literal>open_</literal>* options also use <literal>O_DIRECT</literal> if available. In fact that sentence didn't correctly document the behaviour in released branches (wal_level=minimal is also required for that, so probably very few people ever used it). I think we should adjust that misleading sentence in back-branches, separately from this patch set. I also updated the commit message to highlight the only expected user-visible change for this, namely the loss of the above incorrectly documented obscure special case, which is replaced by the less obscure new setting io_direct=wal, if someone still wants that behaviour. Also a few minor comment changes. [1] https://twitter.com/MengTangmu/status/994770040745615361 [2] http://kos.enix.org/pub/gingell8.pdf
Attachment
pgsql-hackers by date: