Re: Adding basic NUMA awareness - Preliminary feedback and outline for an extensible approach - Mailing list pgsql-hackers

From Cédric Villemain
Subject Re: Adding basic NUMA awareness - Preliminary feedback and outline for an extensible approach
Msg-id c892aa85-9e09-42e5-bf74-2302f9693bf4@data-bene.io
In response to Adding basic NUMA awareness (Tomas Vondra <tomas@vondra.me>)
List pgsql-hackers





>> On 7/7/25 16:51, Cédric Villemain wrote:
>>>>> * Others might use it to integrate PostgreSQL's own resources (e.g.,
>>>>> "areas" of shared buffers) into policies.
>>>>>
>>>>> Hope this perspective is helpful.
>>>>
>>>> Can you explain how you want to manage this by an extension defined at
>>>> the SQL level, when most of this stuff has to be done when setting up
>>>> shared memory, which is waaaay before we have any access to catalogs?
>>>
>>> I should have said module instead. I didn't follow carefully, but at
>>> some point there was discussion about resizing shared buffers "on-line".
>>> Anyway, it was just to give a few examples; maybe this one is to be
>>> considered later (I'm focused on cgroup/psi, and specifically on
>>> reassigning PIDs as needed).
>>>
>>
>> I don't know. I have a hard time imagining what exactly the
>> policies / profiles would do to respond to changes in the system
>> utilization. And why should that interfere with this patch ...
>>
>> The main thing the patch series aims to implement is partitioning
>> different pieces of shared memory (buffers, freelists, ...) to work
>> better on NUMA. I don't think there are that many ways to do this, and
>> I doubt it makes sense to make this easily customizable from external
>> modules of any kind. I can imagine providing some API allowing the
>> instance to be isolated on selected NUMA nodes, but that's about it.
>>
>> Yes, there's some relation to the online resizing of shared buffers, in
>> which case we need to "refresh" some of the information. But AFAICS it's
>> not very extensive (on top of what already needs to happen after the
>> resize), and it'd happen within the boundaries of the partitioning
>> scheme. There's not that much flexibility.
>>
>> The last bit (pinning backends to a NUMA node) is experimental, and
>> mostly intended for easier evaluation of the earlier parts (e.g. to
>> limit the noise when processes get moved to a CPU from a different NUMA
>> node, and so on).
> 
> Maybe the backend pinning could be done by replacing your patch to
> proc.c with a call to an external profile manager doing exactly the
> same thing?
> 
> Similar to:
> /* in InitProcess(), for regular backends */
> pmroutine = GetPmRoutineForInitProcess();
> if (pmroutine != NULL &&
>     pmroutine->init_process != NULL)
>     pmroutine->init_process(MyProc);
> 
> ...
> 
> /* in InitAuxiliaryProcess(), for auxiliary processes */
> pmroutine = GetPmRoutineForInitAuxiliary();
> if (pmroutine != NULL &&
>     pmroutine->init_auxiliary != NULL)
>     pmroutine->init_auxiliary(MyProc);
> 
> Added in a few places, this should cover most, if not all, of the
> requirements around process placement (process_shared_preload_libraries()
> is called earlier during process creation, I believe).
> 

After a first read, I think this works for patches 002 and 005. For the
latter, InitProcGlobal() may set things up as you do, but then expose
the choice a bit later, basically in the places where you added an if
condition on the GUC numa_procs_interleave.


-- 
Cédric Villemain +33 6 20 30 22 52
https://www.Data-Bene.io
PostgreSQL Support, Expertise, Training, R&D



