spin_delay() for ARM - Mailing list pgsql-hackers
From | Amit Khandekar |
---|---|
Subject | spin_delay() for ARM |
Date | |
Msg-id | CAJ3gD9fJqB5cUfBRQ2h=OG6Z2E5JRE6T7fL0uD=2ie7KXd0xBA@mail.gmail.com |
Responses | Re: spin_delay() for ARM |
List | pgsql-hackers |
Hi,

We use (an equivalent of) the PAUSE instruction in spin_delay() for Intel architectures. The goal is to slow down the tight spinlock loop and thus prevent it from eating CPU and causing CPU starvation, so that other processes get their fair share of CPU time. Intel documentation [1] clearly mentions this, along with other benefits of PAUSE, such as lower power consumption and avoiding the memory order violation penalty when exiting the loop.

Similar to PAUSE, the ARM architecture has the YIELD instruction, which is also clearly documented [2]. The documentation explicitly says that it is a way to hint to the CPU that the code is executing a spinlock loop and can be swapped out. But for ARM, we are not using any kind of spin delay.

For PG spinlocks, the goal of both instructions is the same, and both architectures recommend using them in spinlock loops. I also found multiple places where YIELD is already used in the same situation: the Linux kernel [3] and OpenJDK [4], [5].

On ARM implementations that don't implement YIELD, it executes as a no-op. Unfortunately, the ARM machine I have does not implement YIELD. But recently there have been some ARM implementations that are hyperthreaded, so they are expected to actually act on YIELD, although the docs do not explicitly say that YIELD has to be implemented only by hyperthreaded implementations.

I ran some pgbench tests of PAUSE/YIELD on the respective architectures, once with the instruction present and once with it removed. I didn't see any change in the TPS numbers; they were more or less the same. For ARM, this was expected, because my ARM machine does not implement YIELD.

On my Intel Xeon machine with 8 cores, I also tried to test PAUSE using a sample C program (attached spin.c).
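For illustration, here is a minimal sketch of the kind of per-architecture delay being discussed, loosely modeled on the SPIN_DELAY()/spin_delay() pattern in PostgreSQL's s_lock.h. It is not the attached patch. The demo_slock_t type and the demo_spin_lock()/demo_spin_unlock() functions are hypothetical, added only to show where the delay sits in a test-and-test-and-set spinlock loop; GCC or Clang inline assembly is assumed.

```c
#include <stdatomic.h>

/*
 * Per-architecture spin delay in the style of PostgreSQL's s_lock.h.
 * Hypothetical sketch, not the attached spin_delay_for_arm.patch.
 */
#if defined(__aarch64__)
#define SPIN_DELAY() spin_delay()
static inline void
spin_delay(void)
{
	/*
	 * YIELD hints that this thread is spinning and could be swapped out;
	 * cores that do not implement it execute it as a no-op.
	 */
	__asm__ __volatile__(" yield \n");
}
#elif defined(__x86_64__) || defined(__i386__)
#define SPIN_DELAY() spin_delay()
static inline void
spin_delay(void)
{
	/* "rep; nop" is the encoding of the PAUSE instruction. */
	__asm__ __volatile__(" rep; nop \n");
}
#endif

/* Any other architecture: no delay at all. */
#ifndef SPIN_DELAY
#define SPIN_DELAY() ((void) 0)
#endif

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef atomic_int demo_slock_t;

static inline void
demo_spin_lock(demo_slock_t *lock)
{
	for (;;)
	{
		/* Try to take the lock with an atomic exchange. */
		if (atomic_exchange_explicit(lock, 1, memory_order_acquire) == 0)
			return;

		/* Spin on plain loads until the lock looks free, hinting the CPU each time. */
		while (atomic_load_explicit(lock, memory_order_relaxed) != 0)
			SPIN_DELAY();
	}
}

static inline void
demo_spin_unlock(demo_slock_t *lock)
{
	atomic_store_explicit(lock, 0, memory_order_release);
}
```

Only aarch64 gets YIELD here, matching the stated scope of the patch, and the final fallback keeps SPIN_DELAY() a no-op elsewhere. Placing the delay in the read-only inner loop keeps the atomic exchange out of the wait path, so each waiting iteration costs one load plus the hint.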
In the spin.c test program, many child processes (many more than there are CPUs) wait in a tight loop for a shared variable to become 0, while the parent process continuously increments a sequence number for a fixed amount of time, after which it sets the shared variable to 0. The child's tight loop calls PAUSE in each iteration. What I hoped was that, because of PAUSE in the children, the parent process would get a larger share of the CPU, so that in a given time the sequence number would reach a higher value. I also expected the CPU cycles spent by the child processes to drop, thanks to PAUSE. Neither of these happened; there was no change. Possibly this test case is not right. Probably the process preemption occurs only within the set of hyperthreads attached to a single core, and in my test case the parent process is the only one that is ready to run. Still, I have attached the program (spin.c) for the archives, in case somebody with a YIELD-supporting ARM machine wants to use it to test YIELD.

Nevertheless, because we have clear documentation that strongly recommends using it, and because it is already used elsewhere, such as in the Linux kernel and the JDK, I think we should start using YIELD for spin_delay() on ARM.

Attached is the trivial patch (spin_delay_for_arm.patch). To start with, it contains changes only for aarch64. I haven't yet added changes in configure[.in] to make sure that yield compiles successfully (YIELD is present in the manuals from ARMv6 onwards). I wanted to get some comments first, so I haven't made the configure changes yet.

[1] https://c9x.me/x86/html/file_module_x86_id_232.html
[2] https://developer.arm.com/docs/100076/0100/instruction-set-reference/a64-general-instructions/yield
[3] https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/processor.h#L259
[4] http://cr.openjdk.java.net/~dchuyko/8186670/yield/spinwait.html
[5] http://mail.openjdk.java.net/pipermail/aarch64-port-dev/2017-August/004880.html

--
Thanks,
-Amit Khandekar
Huawei Technologies
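The spin.c program itself is not included in this text. Below is a hypothetical sketch of a harness along the lines described above, not the original program; run_spin_test(), relax_cpu() and struct spin_result are illustrative names. It assumes a POSIX system (fork(), mmap() with MAP_ANONYMOUS, getrusage()) and GCC/Clang inline assembly, and it measures the children's CPU time with getrusage(RUSAGE_CHILDREN) rather than counting cycles.

```c
#define _DEFAULT_SOURCE		/* expose MAP_ANONYMOUS under strict -std modes on glibc */
#include <stdatomic.h>
#include <stdbool.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Illustrative relax hint: PAUSE on x86, YIELD on AArch64, nothing elsewhere. */
static inline void relax_cpu(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("pause");
#elif defined(__aarch64__)
	__asm__ __volatile__("yield");
#endif
}

static double now_seconds(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double) ts.tv_sec + (double) ts.tv_nsec / 1e9;
}

/* User + system CPU seconds of all children reaped so far by this process. */
static double children_cpu_seconds(void)
{
	struct rusage ru;
	if (getrusage(RUSAGE_CHILDREN, &ru) != 0)
		return 0.0;
	return (double) ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6 +
		(double) ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}

struct spin_result {
	unsigned long long seq;	/* value the parent's counter reached */
	double child_cpu;	/* CPU seconds used by the children of this run */
	int failed;		/* children that failed to fork or to exit cleanly */
};

/* Hypothetical harness: nchildren spin on a shared flag while the parent counts. */
static struct spin_result run_spin_test(int nchildren, double seconds, bool use_hint)
{
	struct spin_result res = {0, 0.0, 0};
	double cpu_before = children_cpu_seconds();
	int started = 0;
	atomic_int *flag = mmap(NULL, sizeof(*flag), PROT_READ | PROT_WRITE,
				MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	if (flag == MAP_FAILED) {
		res.failed = nchildren;
		return res;
	}
	atomic_init(flag, 1);

	for (int i = 0; i < nchildren; i++) {
		pid_t pid = fork();
		if (pid == 0) {
			/* Child: tight loop until the parent clears the shared flag. */
			while (atomic_load_explicit(flag, memory_order_acquire) != 0) {
				if (use_hint)
					relax_cpu();
			}
			_exit(0);
		}
		if (pid < 0)
			res.failed++;
		else
			started++;
	}

	/* Parent: increment a counter until the deadline (clock checked every 64K steps). */
	volatile unsigned long long seq = 0;
	double deadline = now_seconds() + seconds;
	for (;;) {
		seq++;
		if ((seq & 0xFFFF) == 0 && now_seconds() >= deadline)
			break;
	}
	atomic_store_explicit(flag, 0, memory_order_release);

	for (int i = 0; i < started; i++) {
		int status;
		if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
			res.failed++;
	}

	res.seq = seq;
	res.child_cpu = children_cpu_seconds() - cpu_before;
	munmap(flag, sizeof(*flag));
	return res;
}
```

Calling run_spin_test() once with use_hint set to true and once with it set to false, then comparing seq and child_cpu, mirrors the with/without comparison described in the post. If preemption really is limited to hyperthreads sharing a core, this sketch may likewise show no difference when the parent is the only other runnable process.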