Re: Adding basic NUMA awareness - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Adding basic NUMA awareness
Msg-id zdc2y4q4lfl4nlckd2tm64kkg6hyri6yajajreqjsg43uguzyg@svfdz2wkykbr
In response to Re: Adding basic NUMA awareness  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
List pgsql-hackers
Hi,

On 2025-07-10 14:17:21 +0000, Bertrand Drouvot wrote:
> On Wed, Jul 09, 2025 at 03:42:26PM -0400, Andres Freund wrote:
> > I wonder if we should *increase* the size of shared_buffers whenever huge
> > pages are in use and there's padding space due to the huge page
> > boundaries. Pretty pointless to waste that memory if we can instead use it for
> > the buffer pool.  Not that big a deal with 2MB huge pages, but with 1GB huge
> > pages...
> 
> I think that makes sense, except maybe for operations that need to scan
> the whole buffer pool (i.e. those related to BUF_DROP_FULL_SCAN_THRESHOLD)?

I don't think the increases here are big enough for that to matter, unless
perhaps you're using 1GB huge pages. But if you're concerned about dropping
tables very fast (i.e. you're running schema-change-heavy regression tests),
you're not going to use 1GB huge pages.
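To put rough numbers on the quoted idea of growing shared_buffers into the huge page padding, here is a minimal sketch of the arithmetic: round the buffer pool's size up to the huge page size and count how many extra buffers fit into the slack. It assumes the default 8kB block size and a hypothetical fixed per-buffer overhead (`DEMO_PER_BUFFER_OVERHEAD`), ignores every other shared memory consumer, and all names are made up for illustration; it is not how PostgreSQL actually sizes shared memory.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed block size (PostgreSQL's BLCKSZ defaults to 8192). */
#define DEMO_BLCKSZ 8192
/* Hypothetical per-buffer overhead (descriptor etc.); not a real constant. */
#define DEMO_PER_BUFFER_OVERHEAD 128
#define DEMO_BYTES_PER_BUFFER (DEMO_BLCKSZ + DEMO_PER_BUFFER_OVERHEAD)

/* Round size up to a multiple of page_size, which must be a power of 2. */
static uint64_t
round_up_to_page(uint64_t size, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (size + page_size - 1) & ~(page_size - 1);
}

/*
 * Number of additional buffers that fit into the padding between the bytes
 * needed for nbuffers and the next huge page boundary (simplified model).
 */
static uint64_t
extra_buffers_from_padding(uint64_t nbuffers, uint64_t huge_page_size)
{
    uint64_t used = nbuffers * DEMO_BYTES_PER_BUFFER;
    uint64_t mapped = round_up_to_page(used, huge_page_size);

    return (mapped - used) / DEMO_BYTES_PER_BUFFER;
}
```

Under this model the slack with 2MB pages is at most about 250 buffers (e.g. 16385 buffers leave room for 251 more), while 16384 buffers padded out to a single 1GB page leave room for roughly 112k more. That is in line with the reply above: only 1GB pages make the increase large enough to matter.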



> > > 5) v1-0005-NUMA-interleave-PGPROC-entries.patch
> > >
> > > Another area that seems like it might benefit from NUMA is PGPROC, so I
> > > gave it a try. It turned out somewhat challenging. Similarly to buffers
> > > we have two pieces that need to be located in a coordinated way - PGPROC
> > > entries and fast-path arrays. But we can't use the same approach as for
> > > buffers/descriptors, because
> > >
> > > (a) Neither of those pieces aligns with memory page size (PGPROC is
> > > ~900B, fast-path arrays are variable length).
> > 
> > > (b) We could pad PGPROC entries e.g. to 1KB, but that'd still require
> > > rather high max_connections before we use multiple huge pages.
> > 
> > Right now sizeof(PGPROC) happens to be a multiple of 64 (i.e. the most common
> > cache line size).
> 
> Oh right, it's currently 832 bytes and the patch extends that to 840 bytes.

I don't think the patch itself is the problem; it really is just happenstance
that it's a multiple of the line size right now. And it isn't a multiple of the
line size on common Armv8 platforms...
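The size arithmetic in this exchange can be checked mechanically. A minimal sketch, using the 832 and 840 byte sizes quoted above, the 1KB padding from quoted point (b), and a 2MB huge page. The 128-byte line size is an assumption about some Armv8 parts rather than something stated in the thread, and the helper and macro names are hypothetical.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stddef.h>

/* Sizes quoted in the thread; the line sizes are assumptions. */
#define DEMO_PGPROC_SIZE_NOW     832   /* current sizeof(PGPROC), per the thread */
#define DEMO_PGPROC_SIZE_PATCHED 840   /* with the patch applied */
#define DEMO_LINE_64             64
#define DEMO_LINE_128            128   /* assumed line size on some Armv8 parts */

/*
 * True if every element of a line-aligned array also starts on a cache
 * line boundary, i.e. no two elements share a line.
 */
static bool
elems_line_aligned(size_t elem_size, size_t line_size)
{
    return elem_size % line_size == 0;
}

/* Whole elements of elem_size bytes that fit into one page of page_size bytes. */
static size_t
elems_per_page(size_t page_size, size_t elem_size)
{
    return page_size / elem_size;
}

/* The "accidental" property: 832 = 13 * 64, but neither 840 % 64 nor
 * 832 % 128 is zero. */
static_assert(DEMO_PGPROC_SIZE_NOW % DEMO_LINE_64 == 0, "832 is a multiple of 64");
static_assert(DEMO_PGPROC_SIZE_PATCHED % DEMO_LINE_64 != 0, "840 is not");
static_assert(DEMO_PGPROC_SIZE_NOW % DEMO_LINE_128 != 0, "832 is not a multiple of 128");
```

With entries padded to 1KB, a single 2MB huge page holds 2048 of them. That is why quoted point (b) notes that max_connections has to be rather high before the PGPROC array spans several huge pages that could be placed on different NUMA nodes.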


> With a bit of reordering:
> 
> That could be back to 832 (the order does not make sense logically speaking
> though).

I don't think shrinking the size in a one-off way just to keep the
"accidental" size-is-multiple-of-64 property is promising. It'll just get
broken again.  I think we should:

a) pad the size of PGPROC to a cache line (or even to the next power of 2,
   to make array access cheaper: right now indexing into the array involves
   actual multiplications rather than shifts or indexed `lea` instructions).

   That's probably just a pg_attribute_aligned() annotation.

b) Reorder PGPROC to separate frequently modified fields from almost
   read-only ones, to increase the cache hit ratio.
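Points a) and b) might look roughly like the following sketch. The struct and its fields are hypothetical stand-ins, not the real PGPROC layout. `DEMO_ALIGNED` mimics what `pg_attribute_aligned()` is assumed to expand to on GCC and Clang, and the 64-byte line size is an assumption. The static assertions make the build fail if a later change breaks the layout, rather than relying on an accidental size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache line size for this sketch. */
#define DEMO_CACHE_LINE 64

/* Stand-in for pg_attribute_aligned(a) on GCC/Clang (assumption). */
#define DEMO_ALIGNED(a) __attribute__((aligned(a)))

/* Hypothetical per-backend entry; fields are illustrative, NOT real PGPROC members. */
struct DemoProc
{
    /* b) almost read-only after backend start */
    int32_t     pid;
    int32_t     proc_number;
    uint32_t    database_id;
    uint32_t    role_id;

    /*
     * b) frequently modified state, forced onto its own cache line so that
     * writes here do not invalidate the line holding the read-mostly fields
     */
    uint64_t    xmin DEMO_ALIGNED(DEMO_CACHE_LINE);
    uint32_t    wait_event_info;
    uint8_t     status_flags;
} DEMO_ALIGNED(DEMO_CACHE_LINE);    /* a) sizeof rounds up to a line multiple */

typedef struct DemoProc DemoProc;

/* Layout checks: a later change that breaks them fails to compile. */
static_assert(sizeof(DemoProc) % DEMO_CACHE_LINE == 0,
              "entries must be padded to a cache line");
static_assert((sizeof(DemoProc) & (sizeof(DemoProc) - 1)) == 0,
              "power-of-2 size lets &procs[i] be computed with a shift");
static_assert(offsetof(DemoProc, xmin) % DEMO_CACHE_LINE == 0,
              "frequently modified fields start on their own cache line");

/* Allocate n entries such that each one starts on a cache line boundary. */
static DemoProc *
demo_procs_alloc(size_t n)
{
    /*
     * aligned_alloc() wants a size that is a multiple of the alignment,
     * which holds because sizeof(DemoProc) is a multiple of the line size.
     */
    return aligned_alloc(DEMO_CACHE_LINE, n * sizeof(DemoProc));
}
```

Here the read-mostly fields occupy the first line and the hot fields the second, giving a 128-byte entry. Because that is a power of 2, compilers can typically turn `&procs[i]` into a shift instead of a multiplication. A real PGPROC of around 840 bytes would instead be padded to 896 bytes (a line multiple) or to 1024 bytes (the next power of 2), trading memory for cheaper indexing.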

Greetings,

Andres Freund


