Thread: NUMA settings
Hello,

I am trying to figure out the recommended settings for a PG dedicated machine regarding NUMA.

I assume that the shared buffers are using Huge Pages only. Please correct if I am wrong:

1) postgres is started with numactl --interleave=all, in order to spread memory pages evenly on nodes.
2) vm.swappiness is left to the default 60 value, because Huge Pages never swap, and we wish the idle backend to be swapped out if necessary.
3) vm.zone_reclaim_mode = 0. I am not sure it is the right choice.
4) kernel.numa_balancing = 1. Only if it is confirmed that it will not affect postgres, because started with the interleave policy.

Thanks
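For anyone following along, here is a minimal sketch for checking what the three kernel settings named in this post are currently set to. It assumes a Linux system where sysctl `a.b.c` is exposed as the file `/proc/sys/a/b/c`. The helper names `read_sysctl` and `report` and the `root` parameter are made up for illustration. A setting that is missing on a given kernel (for example `kernel.numa_balancing` on a build without NUMA balancing) is reported as `None`.

```python
from pathlib import Path

# The kernel settings discussed in this thread.
SETTINGS = ("vm.swappiness", "vm.zone_reclaim_mode", "kernel.numa_balancing")


def read_sysctl(name, root="/proc/sys"):
    """Return the value of sysctl `name` as a string, or None if unavailable."""
    # Assumption: "vm.swappiness" maps to <root>/vm/swappiness.
    path = Path(root).joinpath(*name.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        return None


def report(root="/proc/sys"):
    """Collect the current value of every setting in SETTINGS."""
    return {name: read_sysctl(name, root) for name in SETTINGS}


if __name__ == "__main__":
    for name, value in report().items():
        print(f"{name} = {value}")
```

This only reads values; changing them is still done with `sysctl` or a file under `/etc/sysctl.d`, as usual.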
On Wed, 2020-04-29 at 08:54 +0200, Marc Rechté wrote:
> I am trying to figure out the recommended settings for a PG dedicated
> machine regarding NUMA.
>
> I assume that the shared buffers are using Huge Pages only. Please
> correct if I am wrong:
>
> 1) postgres is started with numactl --interleave=all, in order to spread
> memory pages evenly on nodes.
> 2) vm.swappiness is left to the default 60 value, because Huge Pages
> never swap, and we wish the idle backend to be swapped out if necessary.
> 3) vm.zone_reclaim_mode = 0. I am not sure it is the right choice.
> 4) kernel.numa_balancing = 1. Only if it is confirmed that it will not
> affect postgres, because started with the interleave policy.

I am not the top expert on this, but as far as I can tell:

- Disabling NUMA is good if you want to run a single database cluster
  on the machine that should use all resources.

  If you want to run several clusters that share the resources, leaving
  NUMA support enabled might be the better thing to do.

- If you can, disable NUMA in the BIOS, on as low a level as possible.

- I think "kernel.numa_balancing" should be 0.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
Hi,

On 2020-04-29 10:50:54 +0200, Laurenz Albe wrote:
> On Wed, 2020-04-29 at 08:54 +0200, Marc Rechté wrote:
> > I am trying to figure out the recommended settings for a PG dedicated
> > machine regarding NUMA.
> >
> > I assume that the shared buffers are using Huge Pages only. Please
> > correct if I am wrong:
> >
> > 1) postgres is started with numactl --interleave=all, in order to spread
> > memory pages evenly on nodes.
> > 2) vm.swappiness is left to the default 60 value, because Huge Pages
> > never swap, and we wish the idle backend to be swapped out if necessary.
> > 3) vm.zone_reclaim_mode = 0. I am not sure it is the right choice.
> > 4) kernel.numa_balancing = 1. Only if it is confirmed that it will not
> > affect postgres, because started with the interleave policy.
>
> I am not the top expert on this, but as far as I can tell:
>
> - Disabling NUMA is good if you want to run a single database cluster
>   on the machine that should use all resources.
>
>   If you want to run several clusters that share the resources, leaving
>   NUMA support enabled might be the better thing to do.
>
> - If you can, disable NUMA in the BIOS, on as low a level as possible.

I am doubtful that that's generally going to be beneficial. I think the strategy of starting postgres with interleave is probably a better answer.

- Andres
> Hi,
>
> On 2020-04-29 10:50:54 +0200, Laurenz Albe wrote:
>> On Wed, 2020-04-29 at 08:54 +0200, Marc Rechté wrote:
>>> I am trying to figure out the recommended settings for a PG dedicated
>>> machine regarding NUMA.
>>>
>>> I assume that the shared buffers are using Huge Pages only. Please
>>> correct if I am wrong:
>>>
>>> 1) postgres is started with numactl --interleave=all, in order to spread
>>> memory pages evenly on nodes.
>>> 2) vm.swappiness is left to the default 60 value, because Huge Pages
>>> never swap, and we wish the idle backend to be swapped out if necessary.
>>> 3) vm.zone_reclaim_mode = 0. I am not sure it is the right choice.
>>> 4) kernel.numa_balancing = 1. Only if it is confirmed that it will not
>>> affect postgres, because started with the interleave policy.
>>
>> I am not the top expert on this, but as far as I can tell:
>>
>> - Disabling NUMA is good if you want to run a single database cluster
>>   on the machine that should use all resources.
>>
>>   If you want to run several clusters that share the resources, leaving
>>   NUMA support enabled might be the better thing to do.
>>
>> - If you can, disable NUMA in the BIOS, on as low a level as possible.
>
> I am doubtful that that's generally going to be beneficial. I think the
> strategy of starting postgres with interleave is probably a better
> answer.
>
> - Andres
>

Thanks for answers. Further readings make me think that we should *not* start postgres with numactl --interleave=all: this may have a counterproductive effect on the backends' anon memory (heap, stack). IMHO, what is important is to use Huge Pages for shared buffers: they are allocated (reserved) by the kernel at boot time and spread evenly on all nodes. On top of that they never swap.

My (temp) conclusions are following:

vm.zone_reclaim_mode = 0
kernel.numa_balancing = 0 (still not sure with that choice)
vm.swappiness = 60 (default)
start postgres as usual (no numactl)
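The claim that reserved Huge Pages end up spread evenly over the NUMA nodes can be checked on a given machine. The sketch below assumes the per-node counters that Linux exposes under `/sys/devices/system/node/node<N>/hugepages/hugepages-<size>kB/nr_hugepages`, and a 2048 kB huge page size. The function names, the `root` parameter and the `size_kb` default are illustrative choices, not anything taken from this thread.

```python
import re
from pathlib import Path


def hugepages_per_node(root="/sys/devices/system/node", size_kb=2048):
    """Map NUMA node id -> number of reserved huge pages of the given size."""
    counts = {}
    for node_dir in Path(root).glob("node[0-9]*"):
        match = re.fullmatch(r"node(\d+)", node_dir.name)
        if not match:
            continue
        counter = node_dir / "hugepages" / f"hugepages-{size_kb}kB" / "nr_hugepages"
        try:
            counts[int(match.group(1))] = int(counter.read_text())
        except (OSError, ValueError):
            continue  # node without this page size, or unreadable counter
    return counts


def is_even(counts):
    """True if every node holds the same number of huge pages."""
    return len(set(counts.values())) <= 1


if __name__ == "__main__":
    counts = hugepages_per_node()
    if not counts:
        print("no per-node huge page counters found")
    else:
        for node, pages in sorted(counts.items()):
            print(f"node{node}: {pages} huge pages")
        print("evenly spread" if is_even(counts) else "unevenly spread")
```

If the counts turn out uneven, that would be a reason to revisit the "no numactl" conclusion rather than take the even spread for granted.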
On Tue, 2020-05-05 at 07:56 +0200, Marc Rechté wrote:
> Thanks for answers. Further readings make me think that we should *not*
> start postgres with numactl --interleave=all: this may have counter
> productive effect on backends anon memory (heap, stack). IMHO, what is
> important is to use Huge Pages for shared buffers: they are allocated
> (reserved) by the kernel at boot time and spread evenly on all nodes. On
> top of that they never swap.
>
> My (temp) conclusions are following:
> vm.zone_reclaim_mode = 0
> kernel.numa_balancing = 0 (still not sure with that choice)
> vm.swappiness = 60 (default)
> start postgres as usual (no numactl)

Thanks for sharing your insights.

I think that "vm.swappiness" should be 0. PostgreSQL does its own memory management, any swapping by the kernel would go against that.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
> On Tue, 2020-05-05 at 07:56 +0200, Marc Rechté wrote:
>> Thanks for answers. Further readings make me think that we should *not*
>> start postgres with numactl --interleave=all: this may have counter
>> productive effect on backends anon memory (heap, stack). IMHO, what is
>> important is to use Huge Pages for shared buffers: they are allocated
>> (reserved) by the kernel at boot time and spread evenly on all nodes. On
>> top of that they never swap.
>>
>> My (temp) conclusions are following:
>> vm.zone_reclaim_mode = 0
>> kernel.numa_balancing = 0 (still not sure with that choice)
>> vm.swappiness = 60 (default)
>> start postgres as usual (no numactl)
>
> Thanks for sharing your insights.
>
> I think that "vm.swappiness" should be 0.
> PostgreSQL does its own memory management, any swapping by the kernel
> would go against that.
>
> Yours,
> Laurenz Albe
>

As said in the post, we wish the idle backends to be swapped out if necessary. Therefore lowering swappiness would produce the opposite effect: swapping out Linux file cache rather than backends memory.
On Tue, 2020-05-05 at 10:11 +0200, Marc Rechté wrote:
> > I think that "vm.swappiness" should be 0.
> > PostgreSQL does its own memory management, any swapping by the kernel
> > would go against that.
> >
> > Yours,
> > Laurenz Albe
> >
> As said in the post, we wish the idle backends to be swapped out if
> necessary. Therefore lowering swappiness would produce the opposite
> effect: swapping out Linux file cache rather than backends memory.

I see. Sorry for not paying attention.

An idle backend consumes only a few MB of RAM, though.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
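Both points above (how much RAM an idle backend holds, and whether it actually gets swapped out) can be measured rather than guessed. A rough sketch, assuming the `VmRSS` and `VmSwap` fields that Linux reports in `/proc/<pid>/status`; the function names are hypothetical, and the demo reads the script's own process instead of a real backend PID. Note that `VmRSS` may include shared pages the process has touched, so it is best read as an upper bound on private memory.

```python
import os


def parse_status(text):
    """Extract VmRSS and VmSwap (in kB) from /proc/<pid>/status content."""
    result = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmRSS", "VmSwap"):
            result[key] = int(rest.split()[0])  # value is followed by "kB"
    return result


def backend_memory(pid):
    """Read resident and swapped-out memory (kB) for one process."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status(f.read())


if __name__ == "__main__":
    # Replace os.getpid() with the PID of an idle backend,
    # e.g. one taken from pg_stat_activity.
    print(backend_memory(os.getpid()))
```

Sampling a few idle backends this way before and after changing vm.swappiness would show whether the setting makes any practical difference on the machine in question.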