Re: Adding basic NUMA awareness - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Adding basic NUMA awareness
Date
Msg-id 2db78610-b480-4aa0-a1b6-57f1c2dcb708@vondra.me
In response to Re: Adding basic NUMA awareness  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 1/13/26 01:24, Andres Freund wrote:
> Hi,
> 
> On 2026-01-12 19:10:00 -0500, Andres Freund wrote:
>> On 2026-01-13 00:58:49 +0100, Tomas Vondra wrote:
>>> On 1/10/26 02:42, Andres Freund wrote:
>>>> psql -Xq -c 'SELECT pg_buffercache_evict_all();' -c 'SELECT numa_node, sum(size) FROM pg_shmem_allocations_numa GROUP BY 1;' && perf stat --per-socket -M memory_bandwidth_read,memory_bandwidth_write -a psql -c 'SELECT sum(abalance) FROM pgbench_accounts;'
 
>>
>>> And then I initialized pgbench with scale that is much larger than
>>> shared buffers, but fits into RAM. So cached, but definitely > NB/4. And
>>> then I ran
>>>
>>>   select * from pgbench_accounts offset 1000000000;
>>>
>>> which does a sequential scan with the circular buffer you mention above
>>
>> Did you try it with the query I suggested? One plausible reason why you did
>> not see an effect with your query is that with a huge offset you actually
>> never deform the tuple, which is an important and rather latency sensitive
>> path.
> 
> Btw, this doesn't need anywhere close to as much data, it should be visible as
> soon as you're >> L3.
> 
> To show why
>   SELECT * FROM pgbench_accounts OFFSET 100000000
> doesn't show an effect but
>   SELECT sum(abalance) FROM pgbench_accounts;
> 
> does, just look at the difference using the perf command I posted. Here on a
> scale 200.
> 

OK, I tried with a smaller scale (and larger shared buffers, to make the
data set smaller than NBuffers/4).
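
As a rough sanity check of the NBuffers/4 condition for the two setups
below, here is a minimal sketch. It assumes pgbench_accounts takes about
13 MB per scale unit, which is an approximation and was not measured in
this thread; the helper name is made up for illustration.

```python
# Assumption (not from this thread): pgbench_accounts is roughly 13 MB per
# scale unit (100,000 rows per unit at the default fillfactor).
ACCOUNTS_MB_PER_SCALE = 13


def fits_below_ring_threshold(scale, shared_buffers_gb):
    """Return (table_mb, threshold_mb, fits) for a given pgbench scale.

    threshold_mb is a quarter of shared buffers, i.e. NBuffers/4 expressed
    in MB. A table above that size is scanned through the small ring buffer.
    """
    table_mb = scale * ACCOUNTS_MB_PER_SCALE
    threshold_mb = shared_buffers_gb * 1024 / 4
    return table_mb, threshold_mb, table_mb < threshold_mb


# Scale and shared_buffers values as used in the two runs in this mail.
for name, scale, sb_gb in [("azure", 200, 32), ("xeon", 100, 8)]:
    table_mb, threshold_mb, fits = fits_below_ring_threshold(scale, sb_gb)
    print(f"{name}: table ~{table_mb} MB, NBuffers/4 = {threshold_mb:.0f} MB, "
          f"below threshold: {fits}")
```

Under that assumption both data sets stay below a quarter of shared
buffers, so the scans should not use the ring buffer.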

On the azure VM (scale 200, 32GB sb), there's still no difference:

numactl --membind 0 --cpunodebind 0
297.770 ms

numactl --membind 0 --cpunodebind 1
297.924 ms


and on xeon (scale 100, 8GB sb), there's a bit of a difference:

numactl --membind 0 --cpunodebind 0
236.451 ms

numactl --membind 0 --cpunodebind 1
298.418 ms

So roughly 20%. There's also a bigger difference in the perf output,
about 5944.3 MB/s vs. 5202.3 MB/s.
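
The size of the gap depends on which run is taken as the baseline. A
small sketch of the arithmetic, using only the xeon numbers above (the
variable and function names are just for illustration, and the two
bandwidth figures are compared without assuming which run each belongs
to):

```python
# Timings copied from the xeon runs above.
local_ms = 236.451   # numactl --membind 0 --cpunodebind 0
remote_ms = 298.418  # numactl --membind 0 --cpunodebind 1

# Bandwidth figures from perf, in MB/s.
bw_high = 5944.3
bw_low = 5202.3


def pct_diff(value, base):
    """Difference between value and base, as a percentage of base."""
    return (value - base) / base * 100.0


slowdown = pct_diff(remote_ms, local_ms)   # remote run relative to local run
saving = -pct_diff(local_ms, remote_ms)    # local run relative to remote run
bw_gap = pct_diff(bw_high, bw_low)         # higher figure relative to lower

print(f"remote is {slowdown:.1f}% slower than local")
print(f"local is {saving:.1f}% faster than remote")
print(f"bandwidth figures differ by {bw_gap:.1f}%")
```

So "roughly 20%" matches the time saved relative to the remote run.
Relative to the local run, the remote run is a bit over a quarter slower.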

> 
> Interestingly I do see a performance difference, albeit a smaller one, even
> with OFFSET. I see similar numbers on two different 2 socket machines.
> 

I wonder how significant the number of sockets is. The Azure VM is a
single socket with 2 NUMA nodes, so maybe the latency differences are not
significant enough to affect this kind of test.

The xeon is a 2-socket machine, but it's also older (~10y).


regards

-- 
Tomas Vondra



