Re: Add os_page_num to pg_buffercache - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Add os_page_num to pg_buffercache |
Date | |
Msg-id | d96c343b-959a-45ae-b0d8-e728bb1d0ed2@vondra.me |
In response to | Re: Add os_page_num to pg_buffercache (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>) |
Responses | Re: Add os_page_num to pg_buffercache |
List | pgsql-hackers |
On 7/1/25 18:34, Bertrand Drouvot wrote:
> Hi,
>
> On Tue, Jul 01, 2025 at 04:31:01PM +0200, Tomas Vondra wrote:
>> On 7/1/25 15:45, Bertrand Drouvot wrote:
>>
>> I took a quick look on this,
>
> Thanks for looking at it!
>
>> and I doubt we want to change the schema of pg_buffercache like this.
>> Adding columns is fine, but it seems rather wrong to change the
>> cardinality. The view is meant to be a 1:1 mapping for buffers, but now
>> suddenly it's 1:1 with memory pages. Or rather (buffer, page), to be
>> precise.
>>
>> I think this will break a lot of monitoring queries, and possibly in a
>> very subtle way - especially on systems with huge pages, where most
>> buffers will have one row, but then a buffer that happens to be split
>> on two pages will have two rows. That seems not great.
>>
>> IMHO it'd be better to have a new view for this info, something like
>> pg_buffercache_pages, or something like that.
>
> That's a good point, fully agree!
>
>> But I'm also starting to question if the patch really is that useful.
>> Sure, people may not have NUMA support enabled (e.g. on non-linux
>> platforms), and even if they do the _numa view is quite expensive.
>>
>
> Yeah, it's not for day to day activities, more for configuration testing
> and also for development activity/testing.
>
> For example, if I set BLCKSZ to 8KB and enable huge pages (2MB), then I
> may expect to see buffers not spread across pages.
>
> But what I can see is:
>
> SELECT
>     pages_per_buffer,
>     COUNT(*) as buffer_count
> FROM (
>     SELECT bufferid, COUNT(*) as pages_per_buffer
>     FROM pg_buffercache
>     GROUP BY bufferid
> ) subq
> GROUP BY pages_per_buffer
> ORDER BY pages_per_buffer;
>
>  pages_per_buffer | buffer_count
> ------------------+--------------
>                 1 |       261120
>                 2 |         1024
>
> This is due to the shared buffers being aligned to PG_IO_ALIGN_SIZE.
>
> If I change it to:
>
> BufferManagerShmemInit(void)
>
>      /* Align buffer pool on IO page size boundary. */
>      BufferBlocks = (char *)
> -        TYPEALIGN(PG_IO_ALIGN_SIZE,
> +        TYPEALIGN(2 * 1024 * 1024,
>                    ShmemInitStruct("Buffer Blocks",
> -                                  NBuffers * (Size) BLCKSZ + PG_IO_ALIGN_SIZE,
> +                                  NBuffers * (Size) BLCKSZ + (2 * 1024 * 1024),
>                                    &foundBufs));
>
> Then I get:
>
>  pages_per_buffer | buffer_count
> ------------------+--------------
>                 1 |       262144
> (1 row)
>
> So we've been able to see that some buffers were spread across pages due
> to the shared buffers being aligned on PG_IO_ALIGN_SIZE. And if we change
> the alignment to 2MB, then I don't see any buffers spread across pages
> anymore.
>
> I think that it helps "visualize" some configuration or code changes.
>
> What are your thoughts?
>

But isn't the _numa view good enough for this? Sure, you need NUMA
support for it, and it may take a fair amount of time, but how often do
you need to run such queries?

I don't plan to block improving this use case, but I'm not sure it's
worth the effort.


cheers

--
Tomas Vondra
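A minimal sketch of the _numa-based alternative discussed above, assuming
pg_buffercache_numa exposes one row per buffer/OS-page combination with
bufferid and os_page_num columns (the view has to touch every page of
shared buffers, so it is not cheap):

    -- distribution of OS pages per buffer, derived from the NUMA view
    -- instead of a modified pg_buffercache
    SELECT pages_per_buffer, COUNT(*) AS buffer_count
    FROM (
        SELECT bufferid, COUNT(DISTINCT os_page_num) AS pages_per_buffer
        FROM pg_buffercache_numa
        GROUP BY bufferid
    ) s
    GROUP BY pages_per_buffer
    ORDER BY pages_per_buffer;

If this returns only pages_per_buffer = 1, no buffer straddles an OS page
boundary; any rows with pages_per_buffer = 2 would correspond to the split
buffers shown by the query quoted above.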