Home > mailing lists

Re: Add os_page_num to pg_buffercache - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: Add os_page_num to pg_buffercache
Date	July 1 19:45:37
Msg-id	d96c343b-959a-45ae-b0d8-e728bb1d0ed2@vondra.me Whole thread Raw
In response to	Re: Add os_page_num to pg_buffercache (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Responses	Re: Add os_page_num to pg_buffercache
List	pgsql-hackers

Tree view

On 7/1/25 18:34, Bertrand Drouvot wrote:
> Hi,
> 
> On Tue, Jul 01, 2025 at 04:31:01PM +0200, Tomas Vondra wrote:
>> On 7/1/25 15:45, Bertrand Drouvot wrote:
>>
>> I took a quick look on this,
> 
> Thanks for looking at it!
> 
>> and I doubt we want to change the schema of
>> pg_buffercache like this. Adding columns is fine, but it seems rather
>> wrong to change the cardinality. The view is meant to be 1:1 mapping for
>> buffers, but now suddenly it's 1:1 with memory pages. Or rather (buffer,
>> page), to be precise.
>>
>> I think this will break a lot of monitoring queries, and possibly in a
>> very subtle way - especially on systems with huge pages, where most
>> buffers will have one row, but then a buffer that happens to be split on
>> two pages will have two rows. That seems not great.
>>
>> IMHO it'd be better to have a new view for this info, something like
>> pg_buffercache_pages, or something like that.
> 
> That's a good point, fully agree!
> 
>> But I'm also starting to question if the patch really is that useful.
>> Sure, people may not have NUMA support enabled (e.g. on non-linux
>> platforms), and even if they do the _numa view is quite expensive.
>>
> 
> Yeah, it's not for day to day activities, more for configuration testing and
> also for development activity/testing.
> 
> For example, If I set BLCKSZ to 8KB and enable huge pages (2MB), then I may
> expect to see buffers not spread across pages.
> 
> But what I can see is:
> 
> SELECT
>     pages_per_buffer,
>     COUNT(*) as buffer_count
> FROM (
>     SELECT bufferid, COUNT(*) as pages_per_buffer
>     FROM pg_buffercache
>     GROUP BY bufferid
> ) subq
> GROUP BY pages_per_buffer
> ORDER BY pages_per_buffer;
> 
>  pages_per_buffer | buffer_count
> ------------------+--------------
>                 1 |       261120
>                 2 |         1024
> 
> This is due to the shared buffers being aligned to PG_IO_ALIGN_SIZE.
> 
> If I change it to:
> 
> BufferManagerShmemInit(void)
> 
>         /* Align buffer pool on IO page size boundary. */
>         BufferBlocks = (char *)
> -               TYPEALIGN(PG_IO_ALIGN_SIZE,
> +               TYPEALIGN(2 * 1024 * 1024,
>                                   ShmemInitStruct("Buffer Blocks",
> -                                                                 NBuffers * (Size) BLCKSZ + PG_IO_ALIGN_SIZE,
> +                                                                 NBuffers * (Size) BLCKSZ + (2 * 1024 * 1024),
>                                                                   &foundBufs));
> 
> Then I get:
> 
>  pages_per_buffer | buffer_count
> ------------------+--------------
>                 1 |       262144
> (1 row)
> 
> 
> So we've been able to see that some buffers were spread across pages due to 
> shared buffer alignment on PG_IO_ALIGN_SIZE. And that if we change the alignment
> to be set to 2MB then I don't see any buffers spread across pages anymore.
> 
> I think that it helps "visualize" some configuration or code changes.
> 
> What are your thoughts?
> 

But isn't the _numa view good enough for this? Sure, you need NUMA
support for it, and it may take a fair amount of time, but how often you
need to do such queries? I don't plan to block improving this use case,
but I'm not sure it's worth the effort.


cheers

-- 
Tomas Vondra

pgsql-hackers by date:

From: Tom Lane
Date: 01 July, 19:41:50
Subject: Re: GNU/Hurd portability patches

From: Peter Eisentraut
Date: 01 July, 19:49:43
Subject: Re: wrong comments in rewriteTargetListIU

Re: Add os_page_num to pg_buffercache - Mailing list pgsql-hackers

Previous

Next