Thread: Remove last traces of HPPA support
We removed support for the HP-UX OS in v16, but left in support
for the PA-RISC architecture, mainly because I thought that its
spinlock mechanism is weird enough to be a good stress test
for our spinlock infrastructure.  It still is that, but my
one remaining HPPA machine has gone to the great recycle heap
in the sky.  There seems little point in keeping around nominal
support for an architecture that we can't test and no one is
using anymore.

Hence, the attached removes the remaining support for HPPA.
Any objections?

			regards, tom lane

diff --git a/contrib/pgcrypto/crypt-blowfish.c b/contrib/pgcrypto/crypt-blowfish.c
index 1264eccb3f..5a1b1e1009 100644
--- a/contrib/pgcrypto/crypt-blowfish.c
+++ b/contrib/pgcrypto/crypt-blowfish.c
@@ -41,7 +41,7 @@
 #ifdef __i386__
 #define BF_ASM 0 /* 1 */
 #define BF_SCALE 1
-#elif defined(__x86_64__) || defined(__hppa__)
+#elif defined(__x86_64__)
 #define BF_ASM 0
 #define BF_SCALE 1
 #else
diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index f4b1f81189..3608aec595 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -3359,8 +3359,8 @@ export MANPATH
  <para>
   In general, <productname>PostgreSQL</productname> can be expected to work on
-  these CPU architectures: x86, PowerPC, S/390, SPARC, ARM, MIPS, RISC-V,
-  and PA-RISC, including
+  these CPU architectures: x86, PowerPC, S/390, SPARC, ARM, MIPS,
+  and RISC-V, including
   big-endian, little-endian, 32-bit, and 64-bit variants where applicable.
   It is often possible to build on an unsupported CPU type by configuring with
@@ -3391,7 +3391,8 @@ export MANPATH
  <para>
   Historical versions of <productname>PostgreSQL</productname> or POSTGRES
   also ran on CPU architectures including Alpha, Itanium, M32R, M68K,
-  M88K, NS32K, SuperH, and VAX, and operating systems including 4.3BSD, BEOS,
+  M88K, NS32K, PA-RISC, SuperH, and VAX,
+  and operating systems including 4.3BSD, BEOS,
   BSD/OS, DG/UX, Dynix, HP-UX, IRIX, NeXTSTEP, QNX, SCO, SINIX, Sprite, SunOS,
   Tru64 UNIX, and ULTRIX.
  </para>
diff --git a/src/backend/storage/lmgr/s_lock.c b/src/backend/storage/lmgr/s_lock.c
index 327ac64f7c..0e3f04207c 100644
--- a/src/backend/storage/lmgr/s_lock.c
+++ b/src/backend/storage/lmgr/s_lock.c
@@ -110,12 +110,7 @@ s_lock(volatile slock_t *lock, const char *file, int line, const char *func)
 void
 s_unlock(volatile slock_t *lock)
 {
-#ifdef TAS_ACTIVE_WORD
-	/* HP's PA-RISC */
-	*TAS_ACTIVE_WORD(lock) = -1;
-#else
 	*lock = 0;
-#endif
 }
 
 #endif
diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h
index bbff945eba..f6f62d68c0 100644
--- a/src/include/port/atomics.h
+++ b/src/include/port/atomics.h
@@ -69,8 +69,6 @@
 #include "port/atomics/arch-x86.h"
 #elif defined(__ppc__) || defined(__powerpc__) || defined(__ppc64__) || defined(__powerpc64__)
 #include "port/atomics/arch-ppc.h"
-#elif defined(__hppa) || defined(__hppa__)
-#include "port/atomics/arch-hppa.h"
 #endif
 
 /*
diff --git a/src/include/port/atomics/arch-hppa.h b/src/include/port/atomics/arch-hppa.h
deleted file mode 100644
index 4c89fbff71..0000000000
--- a/src/include/port/atomics/arch-hppa.h
+++ /dev/null
@@ -1,17 +0,0 @@
-/*-------------------------------------------------------------------------
- *
- * arch-hppa.h
- *	  Atomic operations considerations specific to HPPA
- *
- * Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
- * Portions Copyright (c) 1994, Regents of the University of California
- *
- * NOTES:
- *
- * src/include/port/atomics/arch-hppa.h
- *
- *-------------------------------------------------------------------------
- */
-
-/* HPPA doesn't do either read or write reordering */
-#define pg_memory_barrier_impl() pg_compiler_barrier_impl()
diff --git a/src/include/port/atomics/fallback.h b/src/include/port/atomics/fallback.h
index a9e8e77c03..d119e8cc50 100644
--- a/src/include/port/atomics/fallback.h
+++ b/src/include/port/atomics/fallback.h
@@ -75,11 +75,7 @@ typedef struct pg_atomic_flag
 	 * be content with just one byte instead of 4, but that's not too much
 	 * waste.
 	 */
-#if defined(__hppa) || defined(__hppa__)	/* HP PA-RISC, GCC and HP compilers */
-	int			sema[4];
-#else
 	int			sema;
-#endif
 	volatile bool value;
 } pg_atomic_flag;
@@ -93,11 +89,7 @@ typedef struct pg_atomic_flag
 typedef struct pg_atomic_uint32
 {
 	/* Check pg_atomic_flag's definition above for an explanation */
-#if defined(__hppa) || defined(__hppa__)	/* HP PA-RISC */
-	int			sema[4];
-#else
 	int			sema;
-#endif
 	volatile uint32 value;
 } pg_atomic_uint32;
@@ -111,11 +103,7 @@ typedef struct pg_atomic_uint32
 typedef struct pg_atomic_uint64
 {
 	/* Check pg_atomic_flag's definition above for an explanation */
-#if defined(__hppa) || defined(__hppa__)	/* HP PA-RISC */
-	int			sema[4];
-#else
 	int			sema;
-#endif
 	volatile uint64 value;
 } pg_atomic_uint64;
diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
index c9fa84cc43..e76e8b6888 100644
--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -530,71 +530,6 @@ do \
 #endif /* __mips__ && !__sgi */
 
-#if defined(__hppa) || defined(__hppa__)	/* HP PA-RISC */
-/*
- * HP's PA-RISC
- *
- * Because LDCWX requires a 16-byte-aligned address, we declare slock_t as a
- * 16-byte struct.  The active word in the struct is whichever has the aligned
- * address; the other three words just sit at -1.
- */
-#define HAS_TEST_AND_SET
-
-typedef struct
-{
-	int			sema[4];
-} slock_t;
-
-#define TAS_ACTIVE_WORD(lock)	((volatile int *) (((uintptr_t) (lock) + 15) & ~15))
-
-static __inline__ int
-tas(volatile slock_t *lock)
-{
-	volatile int *lockword = TAS_ACTIVE_WORD(lock);
-	int			lockval;
-
-	/*
-	 * The LDCWX instruction atomically clears the target word and
-	 * returns the previous value.  Hence, if the instruction returns
-	 * 0, someone else has already acquired the lock before we tested
-	 * it (i.e., we have failed).
-	 *
-	 * Notice that this means that we actually clear the word to set
-	 * the lock and set the word to clear the lock.  This is the
-	 * opposite behavior from the SPARC LDSTUB instruction.  For some
-	 * reason everything that H-P does is rather baroque...
-	 *
-	 * For details about the LDCWX instruction, see the "Precision
-	 * Architecture and Instruction Reference Manual" (09740-90014 of June
-	 * 1987), p. 5-38.
-	 */
-	__asm__ __volatile__(
-		"	ldcwx	0(0,%2),%0	\n"
-:		"=r"(lockval), "+m"(*lockword)
-:		"r"(lockword)
-:		"memory");
-	return (lockval == 0);
-}
-
-#define S_UNLOCK(lock)	\
-	do { \
-		__asm__ __volatile__("" : : : "memory"); \
-		*TAS_ACTIVE_WORD(lock) = -1; \
-	} while (0)
-
-#define S_INIT_LOCK(lock) \
-	do { \
-		volatile slock_t *lock_ = (lock); \
-		lock_->sema[0] = -1; \
-		lock_->sema[1] = -1; \
-		lock_->sema[2] = -1; \
-		lock_->sema[3] = -1; \
-	} while (0)
-
-#define S_LOCK_FREE(lock)	(*TAS_ACTIVE_WORD(lock) != 0)
-
-#endif	 /* __hppa || __hppa__ */
-
 /*
  * If we have no platform-specific knowledge, but we found that the compiler
diff --git a/src/tools/pginclude/cpluspluscheck b/src/tools/pginclude/cpluspluscheck
index 4e09c4686b..96852ef75f 100755
--- a/src/tools/pginclude/cpluspluscheck
+++ b/src/tools/pginclude/cpluspluscheck
@@ -84,12 +84,10 @@ do
 	# Likewise, these files are platform-specific, and the one
 	# relevant to our platform will be included by atomics.h.
 	test "$f" = src/include/port/atomics/arch-arm.h && continue
-	test "$f" = src/include/port/atomics/arch-hppa.h && continue
 	test "$f" = src/include/port/atomics/arch-ppc.h && continue
 	test "$f" = src/include/port/atomics/arch-x86.h && continue
 	test "$f" = src/include/port/atomics/fallback.h && continue
 	test "$f" = src/include/port/atomics/generic.h && continue
-	test "$f" = src/include/port/atomics/generic-acc.h && continue
 	test "$f" = src/include/port/atomics/generic-gcc.h && continue
 	test "$f" = src/include/port/atomics/generic-msvc.h && continue
 	test "$f" = src/include/port/atomics/generic-sunpro.h && continue
diff --git a/src/tools/pginclude/headerscheck b/src/tools/pginclude/headerscheck
index 8dee1b5670..0b9b9740f4 100755
--- a/src/tools/pginclude/headerscheck
+++ b/src/tools/pginclude/headerscheck
@@ -79,12 +79,10 @@ do
 	# Likewise, these files are platform-specific, and the one
 	# relevant to our platform will be included by atomics.h.
 	test "$f" = src/include/port/atomics/arch-arm.h && continue
-	test "$f" = src/include/port/atomics/arch-hppa.h && continue
 	test "$f" = src/include/port/atomics/arch-ppc.h && continue
 	test "$f" = src/include/port/atomics/arch-x86.h && continue
 	test "$f" = src/include/port/atomics/fallback.h && continue
 	test "$f" = src/include/port/atomics/generic.h && continue
-	test "$f" = src/include/port/atomics/generic-acc.h && continue
 	test "$f" = src/include/port/atomics/generic-gcc.h && continue
 	test "$f" = src/include/port/atomics/generic-msvc.h && continue
 	test "$f" = src/include/port/atomics/generic-sunpro.h && continue
On Thu, Oct 19, 2023 at 11:16:28AM -0400, Tom Lane wrote:
> We removed support for the HP-UX OS in v16, but left in support
> for the PA-RISC architecture, mainly because I thought that its
> spinlock mechanism is weird enough to be a good stress test
> for our spinlock infrastructure.  It still is that, but my
> one remaining HPPA machine has gone to the great recycle heap
> in the sky.  There seems little point in keeping around nominal
> support for an architecture that we can't test and no one is
> using anymore.

Looks OK for the C parts.

> Hence, the attached removes the remaining support for HPPA.
> Any objections?

Would a refresh of config/config.guess and config/config.sub be
suited?  This stuff still has references to HPPA.
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
> On Thu, Oct 19, 2023 at 11:16:28AM -0400, Tom Lane wrote:
>> Hence, the attached removes the remaining support for HPPA.
>> Any objections?

> Would a refresh of config/config.guess and config/config.sub be
> suited?  This stuff still has references to HPPA.

AFAIK we just absorb those files verbatim from upstream.  There is
plenty of stuff in them for systems we don't support; it's not worth
trying to clean that out.

			regards, tom lane
On Thu, Oct 19, 2023 at 11:16:28AM -0400, Tom Lane wrote:
> We removed support for the HP-UX OS in v16, but left in support
> for the PA-RISC architecture, mainly because I thought that its
> spinlock mechanism is weird enough to be a good stress test
> for our spinlock infrastructure.  It still is that, but my
> one remaining HPPA machine has gone to the great recycle heap
> in the sky.  There seems little point in keeping around nominal
> support for an architecture that we can't test and no one is
> using anymore.
>
> Hence, the attached removes the remaining support for HPPA.
> Any objections?

I wouldn't do this.  NetBSD/hppa still claims to exist, as does the OpenBSD
equivalent.  I presume its pkgsrc compiles this code.  The code is basically
zero-maintenance, so there's not much to gain from deleting it preemptively.
Noah Misch <noah@leadboat.com> writes:
> On Thu, Oct 19, 2023 at 11:16:28AM -0400, Tom Lane wrote:
>> Hence, the attached removes the remaining support for HPPA.

> I wouldn't do this.  NetBSD/hppa still claims to exist, as does the OpenBSD
> equivalent.  I presume its pkgsrc compiles this code.  The code is basically
> zero-maintenance, so there's not much to gain from deleting it preemptively.

I doubt it: I don't think anyone is routinely building very much of
pkgsrc for backwater hardware like HPPA, on either distro.  It takes
too much time (as cross-build doesn't work IME) and there are too few
potential users.  I certainly had to build all my own packages during
my experiments with running those systems on my machine.

Moreover, if they are compiling it they aren't testing it.  I filed a
pile of bugs against NetBSD kernel and toolchains on the way to getting
the late lamented chickadee animal running.  While it was pretty much
working when I retired chickadee, it was obviously ground that nobody
else had trodden in a long time.  As for OpenBSD, while I did have a
working installation of 6.4 at one time, I completely failed to get 7.1
running on that hardware.  I think it's maintained only for very small
values of "maintained".

Lastly, even when they're working those systems are about half the speed
of HP-UX on the same hardware; and even when using HP-UX there is no
HPPA hardware that's not insanely slow by modern standards.  I can't
believe that anyone would want to run modern PG on that stack, and I
don't believe that anyone but me has tried in a long time.

			regards, tom lane
On Fri, Oct 20, 2023 at 4:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We removed support for the HP-UX OS in v16, but left in support
> for the PA-RISC architecture, mainly because I thought that its
> spinlock mechanism is weird enough to be a good stress test
> for our spinlock infrastructure.  It still is that, but my
> one remaining HPPA machine has gone to the great recycle heap
> in the sky.  There seems little point in keeping around nominal
> support for an architecture that we can't test and no one is
> using anymore.
>
> Hence, the attached removes the remaining support for HPPA.

+1
I wrote:
> Noah Misch <noah@leadboat.com> writes:
>> On Thu, Oct 19, 2023 at 11:16:28AM -0400, Tom Lane wrote:
>>> Hence, the attached removes the remaining support for HPPA.

>> I wouldn't do this.  NetBSD/hppa still claims to exist, as does the OpenBSD
>> equivalent.  I presume its pkgsrc compiles this code.  The code is basically
>> zero-maintenance, so there's not much to gain from deleting it preemptively.

> I doubt it: I don't think anyone is routinely building very much of
> pkgsrc for backwater hardware like HPPA, on either distro.

I dug a bit further on this point.  The previous discussion about our
policy for old-hardware support was here:

https://www.postgresql.org/message-id/flat/959917.1657522169%40sss.pgh.pa.us#47f7af4817dc8dc0d8901d1ee965971e

The existence of a NetBSD/sh3el package for Postgres didn't stop us from
dropping SuperH support.  Moreover, the page showing the existence of
that package:

https://ftp.netbsd.org/pub/pkgsrc/current/pkgsrc/databases/postgresql14-server/index.html

also shows a build for VAX, which we know positively would not have
passed regression tests, so they certainly weren't testing those builds.
(And, to the point here, it does *not* show any build for hppa.)

The bottom line, though, is that IMV we agreed in that thread to a
policy that no architecture will be considered supported unless it has
a representative in the buildfarm.  We've since enforced that policy in
the case of loongarch64, so it seems established.  With my HPPA animal
gone, and nobody very likely to step up with a replacement, HPPA no
longer meets that threshold requirement.

			regards, tom lane
Hi,

On 2023-10-19 17:23:04 -0700, Noah Misch wrote:
> On Thu, Oct 19, 2023 at 11:16:28AM -0400, Tom Lane wrote:
> > We removed support for the HP-UX OS in v16, but left in support
> > for the PA-RISC architecture, mainly because I thought that its
> > spinlock mechanism is weird enough to be a good stress test
> > for our spinlock infrastructure.  It still is that, but my
> > one remaining HPPA machine has gone to the great recycle heap
> > in the sky.  There seems little point in keeping around nominal
> > support for an architecture that we can't test and no one is
> > using anymore.
> >
> > Hence, the attached removes the remaining support for HPPA.
> > Any objections?
>
> I wouldn't do this.  NetBSD/hppa still claims to exist, as does the OpenBSD
> equivalent.  I presume its pkgsrc compiles this code.  The code is basically
> zero-maintenance, so there's not much to gain from deleting it preemptively.

In addition to the point Tom has made, I think it's also not correct that hppa
doesn't impose a burden: hppa is the only one of our architectures that doesn't
actually support atomic operations, requiring us to have infrastructure to
backfill atomics using spinlocks.  This does preclude some uses of atomics,
e.g. in signal handlers - I think Thomas wanted to do so for some concurrency
primitive.

Greetings,

Andres Freund
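To make the signal-handler point concrete, here is an illustrative sketch
(the handler and variable are hypothetical, while pg_atomic_fetch_or_u32()
and SIGNAL_ARGS are the real PostgreSQL spellings).  With backfilled
atomics, the operation may take a process-local spinlock internally, so a
handler that interrupts the lock holder can deadlock:

/*
 * Hypothetical example, not actual PostgreSQL code.  If atomics are
 * backfilled with spinlocks, pg_atomic_fetch_or_u32() may acquire an
 * internal lock; if SIGALRM arrives while the interrupted backend
 * already holds that lock for "pending_flags", this handler spins
 * forever and the backend self-deadlocks.
 */
static pg_atomic_uint32 pending_flags;

static void
handle_sigalrm(SIGNAL_ARGS)
{
	/* only signal-safe when atomics map to real hardware instructions */
	pg_atomic_fetch_or_u32(&pending_flags, 1);
}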
Andres Freund <andres@anarazel.de> writes:
> In addition to the point Tom has made, I think it's also not correct that hppa
> doesn't impose a burden: hppa is the only one of our architectures that doesn't
> actually support atomic operations, requiring us to have infrastructure to
> backfill atomics using spinlocks.  This does preclude some uses of atomics,
> e.g. in signal handlers - I think Thomas wanted to do so for some concurrency
> primitive.

Hmm, are you saying there's more of port/atomics/ that could be
removed?  What exactly?

Do we really want to assume that all future architectures will have
atomic operations?

			regards, tom lane
Hi,

On 2023-10-20 15:59:42 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > In addition to the point Tom has made, I think it's also not correct that hppa
> > doesn't impose a burden: hppa is the only one of our architectures that doesn't
> > actually support atomic operations, requiring us to have infrastructure to
> > backfill atomics using spinlocks.  This does preclude some uses of atomics,
> > e.g. in signal handlers - I think Thomas wanted to do so for some concurrency
> > primitive.
>
> Hmm, are you saying there's more of port/atomics/ that could be
> removed?  What exactly?

I was thinking we could remove the whole fallback path for atomic
operations, but it's a bit less, because we likely don't want to mandate
support for 64bit atomics yet.

That'd still allow removing more than half of
src/include/port/atomics/fallback.h and src/backend/port/atomics.c - and
more if we finally decided to require a spinlock implementation.

> Do we really want to assume that all future architectures will have atomic
> operations?

Yes.  Outside of the tiny microcontrollers, which obviously won't run
postgres, I cannot see any future architecture not having support for
atomic operations.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2023-10-20 15:59:42 -0400, Tom Lane wrote:
>> Hmm, are you saying there's more of port/atomics/ that could be
>> removed?  What exactly?

> I was thinking we could remove the whole fallback path for atomic operations,
> but it's a bit less, because we likely don't want to mandate support for 64bit
> atomics yet.

Yeah.  That'd be tantamount to desupporting 32-bit arches altogether,
I think.  I'm not ready to go there yet.

> That'd still allow removing more than half of
> src/include/port/atomics/fallback.h and src/backend/port/atomics.c - and more
> if we finally decided to require a spinlock implementation.

In the wake of 1c72d82c2, it seems likely that requiring some kind of
spinlock implementation is not such a big lift.  Certainly, a machine
without that hasn't been a fit target for production in a very long
time, so maybe we should just drop that semaphore-based emulation.

>> Do we really want to assume that all future architectures will have atomic
>> operations?

> Yes.  Outside of the tiny microcontrollers, which obviously won't run postgres,
> I cannot see any future architecture not having support for atomic operations.

I'd like to refine what that means a bit more.  Are we assuming that a
machine providing any of the gcc atomic intrinsics (of a given width)
will provide all of them?  Or is there a specific subset that we can
emulate the rest on top of?

			regards, tom lane
Hi,

On 2023-10-20 17:46:59 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2023-10-20 15:59:42 -0400, Tom Lane wrote:
> >> Hmm, are you saying there's more of port/atomics/ that could be
> >> removed?  What exactly?
>
> > I was thinking we could remove the whole fallback path for atomic operations,
> > but it's a bit less, because we likely don't want to mandate support for 64bit
> > atomics yet.
>
> Yeah.  That'd be tantamount to desupporting 32-bit arches altogether,
> I think.  I'm not ready to go there yet.

It shouldn't be tantamount to that - many 32bit archs support 64bit atomic
operations.  E.g. x86 supported it since the 586 (in 1993).  However, arm
only added them to 32 bit, in an extension, comparatively recently...

> > That'd still allow removing more than half of
> > src/include/port/atomics/fallback.h and src/backend/port/atomics.c - and more
> > if we finally decided to require a spinlock implementation.
>
> In the wake of 1c72d82c2, it seems likely that requiring some kind of
> spinlock implementation is not such a big lift.  Certainly, a machine
> without that hasn't been a fit target for production in a very long
> time, so maybe we should just drop that semaphore-based emulation.

Yep.  And the performance drop due to not having spinlock is also getting
worse over time, with CPU bound workloads having become a lot more common
due to larger amounts of memory and much much faster IO.

> >> Do we really want to assume that all future architectures will have atomic
> >> operations?
>
> > Yes.  Outside of the tiny microcontrollers, which obviously won't run postgres,
> > I cannot see any future architecture not having support for atomic operations.
>
> I'd like to refine what that means a bit more.  Are we assuming that a
> machine providing any of the gcc atomic intrinsics (of a given width) will
> provide all of them?  Or is there a specific subset that we can emulate the
> rest on top of?

Right now we don't require that.  As long as we know how to do atomic
compare exchange, we backfill all other atomic operations using
compare-exchange - albeit less efficiently (there's no retries for
atomic-add when implemented directly, but there are retries when using
cmpxchg, the difference can be significant under contention).

Practically speaking I think it's quite unlikely that a compiler + arch
combination will have only some intrinsics of some width - I think all
compilers have infrastructure to fall back to compare-exchange when
there's no dedicated atomic operation for some intrinsic.

Greetings,

Andres Freund
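For reference, the compare-exchange backfill described above looks roughly
like this (a sketch following the pattern of the generic fallbacks;
pg_atomic_read_u32() and pg_atomic_compare_exchange_u32() are the real API
names, while the wrapper name is invented):

/* sketch: emulate fetch-add with a compare-exchange retry loop */
static inline uint32
fetch_add_u32_via_cmpxchg(volatile pg_atomic_uint32 *ptr, int32 add_)
{
	uint32		old;

	old = pg_atomic_read_u32(ptr);	/* unlocked initial read */
	while (!pg_atomic_compare_exchange_u32(ptr, &old, old + add_))
	{
		/* contention: "old" was refreshed by the failed CAS; retry */
	}
	return old;
}

A dedicated atomic-add instruction succeeds in one shot; this loop is
where the extra retries under contention come from.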
On Fri, Oct 20, 2023 at 12:40:00PM -0700, Andres Freund wrote:
> On 2023-10-19 17:23:04 -0700, Noah Misch wrote:
> > On Thu, Oct 19, 2023 at 11:16:28AM -0400, Tom Lane wrote:
> > > We removed support for the HP-UX OS in v16, but left in support
> > > for the PA-RISC architecture, mainly because I thought that its
> > > spinlock mechanism is weird enough to be a good stress test
> > > for our spinlock infrastructure.  It still is that, but my
> > > one remaining HPPA machine has gone to the great recycle heap
> > > in the sky.  There seems little point in keeping around nominal
> > > support for an architecture that we can't test and no one is
> > > using anymore.
> > >
> > > Hence, the attached removes the remaining support for HPPA.
> > > Any objections?
> >
> > I wouldn't do this.  NetBSD/hppa still claims to exist, as does the OpenBSD
> > equivalent.  I presume its pkgsrc compiles this code.  The code is basically
> > zero-maintenance, so there's not much to gain from deleting it preemptively.
>
> In addition to the point Tom has made, I think it's also not correct that hppa
> doesn't impose a burden: hppa is the only one of our architectures that doesn't
> actually support atomic operations, requiring us to have infrastructure to
> backfill atomics using spinlocks.  This does preclude some uses of atomics,
> e.g. in signal handlers - I think Thomas wanted to do so for some concurrency
> primitive.

If the next thing is a patch removing half of the fallback atomics, that
is a solid reason to remove hppa.  The code removed in the last proposed
patch was not that and was code that never changes, hence my reaction.
Noah Misch <noah@leadboat.com> writes:
> If the next thing is a patch removing half of the fallback atomics, that is a
> solid reason to remove hppa.

Agreed, though I don't think we have a clear proposal as to what
else to remove.

> The code removed in the last proposed patch was
> not that and was code that never changes, hence my reaction.

Mmm ... I'd agree that the relevant stanzas of s_lock.h/.c haven't
changed in a long time, but port/atomics/ is of considerably newer
vintage and is still receiving a fair amount of churn.  Moreover,
much of what I proposed to remove from there is HPPA-only code with
exactly no parallel in other arches (specifically, the bits in
atomics/fallback.h).  So I don't feel comfortable that it will
continue to work without benefit of testing.  We're taking a risk
just hoping that it will continue to work in the back branches until
they hit EOL.  Expecting that it'll continue to work going forward,
sans testing, seems like the height of folly.

			regards, tom lane
Hi,

On 2023-10-20 22:06:55 -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > If the next thing is a patch removing half of the fallback atomics, that is a
> > solid reason to remove hppa.
>
> Agreed, though I don't think we have a clear proposal as to what
> else to remove.
>
> > The code removed in the last proposed patch was
> > not that and was code that never changes, hence my reaction.
>
> Mmm ... I'd agree that the relevant stanzas of s_lock.h/.c haven't
> changed in a long time, but port/atomics/ is of considerably newer
> vintage and is still receiving a fair amount of churn.  Moreover,
> much of what I proposed to remove from there is HPPA-only code with
> exactly no parallel in other arches (specifically, the bits in
> atomics/fallback.h).  So I don't feel comfortable that it will
> continue to work without benefit of testing.  We're taking a risk
> just hoping that it will continue to work in the back branches until
> they hit EOL.  Expecting that it'll continue to work going forward,
> sans testing, seems like the height of folly.

It'd be one thing to continue supporting an almost-guaranteed-to-be-unused
platform, if we expected it to become more popular or complete enough to be
usable like e.g. risc-v a few years ago.  But I doubt we'll find anybody out
there believing that there's a potential future upward trend for HPPA.

IMO a single person looking at HPPA code for a few minutes is a cost that
more than outweighs the potential benefits of continuing "supporting" this
dead arch.  Even code that doesn't need to change has costs, particularly
if it's intermingled with actually important code (which spinlocks
certainly are).

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> It'd be one thing to continue supporting an almost-guaranteed-to-be-unused
> platform, if we expected it to become more popular or complete enough to be
> usable like e.g. risc-v a few years ago.  But I doubt we'll find anybody out
> there believing that there's a potential future upward trend for HPPA.

Indeed.  I would have bet that Postgres on HPPA was extinct in the wild,
until I noticed this message a few days ago:

https://www.postgresql.org/message-id/BYAPR02MB42624ED41C15BFA82DAE2C359BD5A%40BYAPR02MB4262.namprd02.prod.outlook.com

But we already cut that user off at the knees by removing HP-UX support.

The remaining argument for worrying about this architecture being in
use in the field is the idea that somebody is using it on top of
NetBSD or OpenBSD.  But having used both of those systems (or tried
to), I feel absolutely confident in asserting that nobody is using
it in production today, let alone hoping to continue using it.

> IMO a single person looking at HPPA code for a few minutes is a cost that more
> than outweighs the potential benefits of continuing "supporting" this dead
> arch.  Even code that doesn't need to change has costs, particularly if it's
> intermingled with actually important code (which spinlocks certainly are).

Yup, that.  It's not zero cost to carry this stuff.

			regards, tom lane
Hi,

On October 20, 2023 11:18:19 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Andres Freund <andres@anarazel.de> writes:
>> It'd be one thing to continue supporting an almost-guaranteed-to-be-unused
>> platform, if we expected it to become more popular or complete enough to be
>> usable like e.g. risc-v a few years ago.  But I doubt we'll find anybody out
>> there believing that there's a potential future upward trend for HPPA.
>
>Indeed.  I would have bet that Postgres on HPPA was extinct in the wild,
>until I noticed this message a few days ago:
>
>https://www.postgresql.org/message-id/BYAPR02MB42624ED41C15BFA82DAE2C359BD5A%40BYAPR02MB4262.namprd02.prod.outlook.com
>
>But we already cut that user off at the knees by removing HP-UX support.

Not that it matters really, but I'd assume that was hpux on ia64, not hppa?

Greetings,

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Andres Freund <andres@anarazel.de> writes:
> On October 20, 2023 11:18:19 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Indeed.  I would have bet that Postgres on HPPA was extinct in the wild,
>> until I noticed this message a few days ago:
>> https://www.postgresql.org/message-id/BYAPR02MB42624ED41C15BFA82DAE2C359BD5A%40BYAPR02MB4262.namprd02.prod.outlook.com
>> But we already cut that user off at the knees by removing HP-UX support.

> Not that it matters really, but I'd assume that was hpux on ia64, not hppa?

Hmm, maybe ... impossible to tell from the given information, but ia64
was at least still in production till recently, so you might be right.
In any case, I heard no bleating when we nuked ia64 support.

			regards, tom lane
On Sat, Oct 21, 2023 at 02:18:19AM -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > It'd be one thing to continue supporting an almost-guaranteed-to-be-unused
> > platform, if we expected it to become more popular or complete enough to be
> > usable like e.g. risc-v a few years ago.  But I doubt we'll find anybody out
> > there believing that there's a potential future upward trend for HPPA.
>
> Indeed.  I would have bet that Postgres on HPPA was extinct in the wild,
> until I noticed this message a few days ago:
>
> https://www.postgresql.org/message-id/BYAPR02MB42624ED41C15BFA82DAE2C359BD5A%40BYAPR02MB4262.namprd02.prod.outlook.com
>
> But we already cut that user off at the knees by removing HP-UX support.
>
> The remaining argument for worrying about this architecture being in
> use in the field is the idea that somebody is using it on top of
> NetBSD or OpenBSD.  But having used both of those systems (or tried
> to), I feel absolutely confident in asserting that nobody is using
> it in production today, let alone hoping to continue using it.
>
> > IMO a single person looking at HPPA code for a few minutes is a cost that more
> > than outweighs the potential benefits of continuing "supporting" this dead
> > arch.  Even code that doesn't need to change has costs, particularly if it's
> > intermingled with actually important code (which spinlocks certainly are).
>
> Yup, that.  It's not zero cost to carry this stuff.

+1 for dropping it.
Noah Misch <noah@leadboat.com> writes:
> On Sat, Oct 21, 2023 at 02:18:19AM -0400, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> IMO a single person looking at HPPA code for a few minutes is a cost that more
>>> than outweighs the potential benefits of continuing "supporting" this dead
>>> arch.  Even code that doesn't need to change has costs, particularly if it's
>>> intermingled with actually important code (which spinlocks certainly are).

>> Yup, that.  It's not zero cost to carry this stuff.

> +1 for dropping it.

Done at commit edadeb0710.

			regards, tom lane
On Tue, Jul 2, 2024 at 5:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Done at commit edadeb0710.

Here are some experimental patches to try out some ideas mentioned
upthread, that are approximately unlocked by that cleanup.

1.  We could get rid of --disable-spinlocks.  It is hard to imagine a
hypothetical new port that would actually be a useful place to run
PostgreSQL where you can't implement spinlocks.  (This one isn't
exactly unlocked by PA-RISC's departure, it's just tangled up with the
relevant cruft.)

2.  We could get rid of --disable-atomics, and require at least 32 bit
lock-free (non-emulated) atomics.  AFAIK there are no relevant systems
that don't have them.  Hypothetical new systems would be unlikely to
omit them, unless they are eg embedded systems that don't intend to be
able to run an OS.

Personally I would like to do this, because I'd like to be able to use
pg_atomic_fetch_or_u32() in a SIGALRM handler in my
latchify-all-the-things patch (a stepping stone in the multi-threading
project as discussed in the Vancouver unconference).  That's not
allowed if it might be a locking fallback.  It's not strictly necessary
for my project, and I could find another way if I have to, but when
contemplating doing extra work to support imaginary computers that I
don't truly believe in...  and since this general direction was
suggested already, both on this thread and in the comments in the
tree...

Once you decide to do #2, ie require atomics, perhaps you could also
implement spinlocks with them, rendering point #1 moot, and delete all
that hand-rolled TAS stuff.  (Then you'd have spinlocks implemented
with flag/u32 atomics APIs, but potentially also u64 atomics
implemented with spinlocks!  Circular, but not impossible AFAICT,
assuming we can't require 64 bit lock-free atomics any time soon, that
is; not considered.)  🤯🤯🤯  But maybe there are still good reasons to
have hand-rolled specialisations in some cases?  I have not researched
that idea and eg compared the generated instructions...  I do
appreciate that that code reflects a lot of accumulated wisdom and
experience that I don't claim to possess, and this bit is vapourware
anyway.

3.  While tinkering with the above, and contemplating contact with
hypothetical future systems and even existing oddball systems, it
practically suggests itself that we could allow <stdatomic.h> as a way
of providing atomics (port/atomics.h clearly anticipated that, it was
just too soon).  Note: that's different from requiring C11, but it
means that the new rule would be that your system should have *either*
C11 <stdatomic.h> or a hand-rolled implementation in port/atomics/*.h.
This is not a proposal, just an early stage experiment to test the
waters!  Some early thoughts about that, not fully explored:

* Since C11 uses funky generics, perhaps we might want to add some
type checks to make sure you don't accidentally confuse u32 and u64
somewhere.

* I couldn't immediately see how to use the standard atomic_flag for
our stuff due to lack of relaxed load, so it's falling back to the
generic u32 implementation (a small waste of space).  atomic_bool or
atomic_char should work fine though, not tried.  I guess
pg_atomic_flag might be a minor historical mistake, assuming it was
supposed to be just like the standard type of the same name.  Or maybe
I'm missing something.

* The pg_spin_delay_impl() part definitely still needs hand-rolled
magic when using <stdatomic.h> (I'm not aware of any standard way to
do that).  But I'm not sure it even belongs in the "atomics" headers
anyway?  It's not the same kind of thing, is it?

* The comments seem to imply that we need to tell the compiler not to
generate any code for read/write barriers on TSO systems (compiler
barrier only), but AFAICS the right thing happens anyway when coded as
standard acquire/release barriers.  x86: nothing.  ARM: something.
What am I missing?

* It'd be interesting to learn about anything else that modern tool
chains might do worse than our hand-rolled wisdom.

* Special support for Sun's compiler could be dropped if we could just
use their <stdatomic.h>.  The same applies for MSVC 2022+ AFAICS, so
maybe in ~3 years from now we could drop the Windows-specific code.

* Uhh, yeah, so that would also apply to any modern GCC/Clang, so in
effect everyone would be using <stdatomic.h> except any hand-rolled
special bits that we decide to keep for performance reasons, and the
rest would become dead code and liable for garbage collection.  So
that would amount to a confusing policy like: "we require
<stdatomic.h> with at least lock-free int in practice, but we'd
consider patches to add a non-C11-way to do this stuff if you invent a
new kind of computer/toolchain and refuse to support C11".  Hmm.

(I have another version of this type of thinking happening in another
pending patch, the pg_threads.h one, more on that shortly...)
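As a rough illustration of the atomic_char point above, a minimal C11
spinlock with a relaxed-load wait loop might look like this (a sketch with
invented names, not one of the attached patches):

#include <stdatomic.h>

typedef atomic_char c11_slock_t;	/* invented name */

static inline void
c11_spin_lock(c11_slock_t *lock)
{
	for (;;)
	{
		/* try to take the lock; acquire order on success */
		if (!atomic_exchange_explicit(lock, 1, memory_order_acquire))
			return;
		/* wait using relaxed loads, which atomic_flag cannot express */
		while (atomic_load_explicit(lock, memory_order_relaxed))
			;					/* a spin delay would go here */
	}
}

static inline void
c11_spin_unlock(c11_slock_t *lock)
{
	atomic_store_explicit(lock, 0, memory_order_release);
}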
Thomas Munro <thomas.munro@gmail.com> writes:
> Here are some experimental patches to try out some ideas mentioned
> upthread, that are approximately unlocked by that cleanup.

FWIW, I'm good with getting rid of --disable-spinlocks and
--disable-atomics.  That's a fair amount of code and needing to
support it causes problems, as you say.  I am very much less
excited about ripping out our spinlock and/or atomics code in favor
of <stdatomic.h>; I just don't see the gain there, and I do see risk
in ceding control of the semantics and performance of those
primitives.

			regards, tom lane
On Wed, Jul 3, 2024 at 8:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Here are some experimental patches to try out some ideas mentioned
> > upthread, that are approximately unlocked by that cleanup.
>
> FWIW, I'm good with getting rid of --disable-spinlocks and
> --disable-atomics.  That's a fair amount of code and needing to
> support it causes problems, as you say.  I am very much less
> excited about ripping out our spinlock and/or atomics code in favor
> of <stdatomic.h>; I just don't see the gain there, and I do see risk
> in ceding control of the semantics and performance of those
> primitives.

OK, <stdatomic.h> part on ice for now.  Here's an update of the rest,
this time also removing the barrier fallbacks as discussed in the LTO
thread[1].

I guess we should also consider reimplementing the spinlock on the
atomic API, but I can see that Andres is poking at spinlock code right
now so I'll keep out of his way...

Side issue: I noticed via CI failure when I tried to require read/write
barriers to be provided (a choice I backed out of), that on MSVC we seem
to be using the full memory barrier fallback for those.  Huh?  For x86,
I think they should be using pg_compiler_barrier() (no code gen, just
prevent reordering), not pg_memory_barrier(), no?  Perhaps I'm missing
something but I suspect we might be failing to include arch-x86.h on
that compiler when we should... maybe it needs to detect _M_AMD64 too?

For ARM, from a quick look, the only way to reach real acquire/release
barriers seems to be to use the C11 interface (which would also be fine
on x86 where it should degrade to a no-op compiler barrier or signal
fence as the standard calls it), but IIRC the Windows/ARM basics haven't
gone in yet anyway.

[1] https://www.postgresql.org/message-id/flat/721bf39a-ed8a-44b0-8b8e-be3bd81db748%40technowledgy.de#66ba381b05e8ee08b11503b846acc4a1
On 30/07/2024 00:50, Thomas Munro wrote:
> On Wed, Jul 3, 2024 at 8:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Thomas Munro <thomas.munro@gmail.com> writes:
>>> Here are some experimental patches to try out some ideas mentioned
>>> upthread, that are approximately unlocked by that cleanup.
>>
>> FWIW, I'm good with getting rid of --disable-spinlocks and
>> --disable-atomics.  That's a fair amount of code and needing to
>> support it causes problems, as you say.  I am very much less
>> excited about ripping out our spinlock and/or atomics code in favor
>> of <stdatomic.h>; I just don't see the gain there, and I do see risk
>> in ceding control of the semantics and performance of those
>> primitives.
>
> OK, <stdatomic.h> part on ice for now.  Here's an update of the rest,
> this time also removing the barrier fallbacks as discussed in the LTO
> thread[1].

Looks good to me.

> I guess we should also consider reimplementing the spinlock on the
> atomic API, but I can see that Andres is poking at spinlock code right
> now so I'll keep out of his way...
>
> Side issue: I noticed via CI failure when I tried to require
> read/write barriers to be provided (a choice I backed out of), that on
> MSVC we seem to be using the full memory barrier fallback for those.
> Huh?  For x86, I think they should be using pg_compiler_barrier() (no
> code gen, just prevent reordering), not pg_memory_barrier(), no?

Agreed, arch-x86.h is quite clear on that.

> Perhaps I'm missing something but I suspect we might be failing to
> include arch-x86.h on that compiler when we should... maybe it needs
> to detect _M_AMD64 too?

Aha, yes I think that's it.  Apparently, __x86_64__ is not defined on
MSVC.  To prove that, I added garbage to the "#ifdef __x86_64__" guarded
block in atomics.h.  The compilation passes on MSVC, but not on other
platforms: https://cirrus-ci.com/build/6310061188841472.

That means that we're not getting the x86-64 instructions in
src/port/pg_crc32c_sse42.c on MSVC either.

I think we should do:

#ifdef _M_AMD64
#define __x86_64__
#endif

somewhere, perhaps in src/include/port/win32.h.

-- 
Heikki Linnakangas
Neon (https://neon.tech)
On Tue, Jul 30, 2024 at 11:16 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 30/07/2024 00:50, Thomas Munro wrote:
> > On Wed, Jul 3, 2024 at 8:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Thomas Munro <thomas.munro@gmail.com> writes:
> > OK, <stdatomic.h> part on ice for now.  Here's an update of the rest,
> > this time also removing the barrier fallbacks as discussed in the LTO
> > thread[1].
>
> Looks good to me.

Thanks.  I'll wait just a bit longer to see if anyone else has comments.

> > Perhaps I'm missing something but I suspect we might be failing to
> > include arch-x86.h on that compiler when we should... maybe it needs
> > to detect _M_AMD64 too?
>
> Aha, yes I think that's it.  Apparently, __x86_64__ is not defined on
> MSVC.  To prove that, I added garbage to the "#ifdef __x86_64__" guarded
> block in atomics.h.  The compilation passes on MSVC, but not on other
> platforms: https://cirrus-ci.com/build/6310061188841472.
>
> That means that we're not getting the x86-64 instructions in
> src/port/pg_crc32c_sse42.c on MSVC either.
>
> I think we should do:
>
> #ifdef _M_AMD64
> #define __x86_64__
> #endif
>
> somewhere, perhaps in src/include/port/win32.h.

Hmm.  I had come up with the opposite solution, because we already tested
for _M_AMD64 explicitly elsewhere, and also I was thinking we would
back-patch, and I don't want to cause problems for external code that
thinks that __x86_64__ implies it can bust out some GCC inline assembler
or something.  But I don't have a strong opinion, your idea is certainly
simpler to implement and I also wouldn't mind much if we just fixed it in
master only, for fear of subtle breakage...

Same problem probably exists for i386.  I don't think CI, build farm or
the EDB packaging team do 32 bit Windows, so that makes it a little hard
to know if your blind code changes have broken or fixed anything... on
the other hand it's pretty simple...

I wondered if the pre-Meson system might have somehow defined __x86_64__,
but I'm not seeing it.  Commit b64d92f1a56 explicitly mentions that it
was tested on MSVC, so I guess maybe it was just always "working" but not
quite taking the intended code paths?  Funny though, that code that calls
_mm_pause() on AMD64 or the __asm thing that only works on i386 doesn't
look like blind code to me.  Curious.
On Tue, Jul 30, 2024 at 12:39 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Tue, Jul 30, 2024 at 11:16 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > Looks good to me.
>
> Thanks.  I'll wait just a bit longer to see if anyone else has comments.

And pushed.  I am aware of a couple of build farm animals that will now
fail because they deliberately test --disable-spinlocks: francolin and
rorqual, which will need adjustment or retirement on master.  I'll watch
out for other surprises on the farm...
On Tue, Jul 30, 2024 at 9:50 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> I guess we should also consider reimplementing the spinlock on the
> atomic API, but I can see that Andres is poking at spinlock code right
> now so I'll keep out of his way...

Here is a first attempt at that.  I haven't compared the generated asm
yet, but it seems to work OK.  I solved some mysteries (or probably just
rediscovered things that others already knew) along the way:

1.  The reason we finished up with OK-looking MSVC atomics code that was
probably never actually reachable might be that it was copied-and-pasted
from the spinlock code.  This patch de-duplicates that (and much more).

2.  The pg_atomic_unlocked_test_flag() function was surprising to me: it
returns true if it's not currently set (according to a relaxed load).
Most of this patch was easy, but figuring out that I had reverse polarity
here was a multi-coffee operation :-)  I can't call it wrong though, as
it's not based on <stdatomic.h>, and it's clearly documented, so *shrug*.

3.  As for why we have a function that <stdatomic.h> doesn't, I speculate
that it might have been intended for implementing this exact patch, ie
wanting to perform that relaxed load while spinning as recommended by
Intel.  (If we strictly had to use <stdatomic.h> functions, we couldn't
use atomic_flag due to the lack of a relaxed load operation on that type,
so we'd probably have to use atomic_char instead.  Perhaps one day we
will cross that bridge.)

4.  Another reason would be that you need it to implement SpinLockFree()
and S_LOCK_FREE().  They don't seem to have had any real callers since
the beginning of open source PostgreSQL!, except for a test of limited
value in a new world without ports developing their own spinlock code.
Let's remove them!  I see this was already threatened by Andres in
3b37a6de.

Archeological notes: I went back further and found that POSTGRES 4.2
used them only twice for assertions.  These S_LOCK() etc interfaces seem
to derive from Dynix's parallel programming library, but it didn't have
S_LOCK_FREE() either.  It looks like the Berkeley guys added _FREE() for
*internal* use when dealing with PA-RISC, where free spinlocks were
non-zero, but we later developed a different way of dealing with that.
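For readers following along, the test-and-test-and-set shape implied by
points 2 and 3 is roughly this (a sketch, not the attached patch; the
pg_atomic_* calls are the real API, the function name is invented):

/*
 * Sketch: do a cheap relaxed read before retrying the atomic
 * test-and-set, per the Intel guidance mentioned in the thread.
 */
static inline bool
try_lock_with_spin(volatile pg_atomic_flag *lock)
{
	/*
	 * Note the reverse polarity described above:
	 * pg_atomic_unlocked_test_flag() returns true when the flag is NOT set.
	 */
	if (!pg_atomic_unlocked_test_flag(lock))
		return false;			/* looks taken; caller should delay and retry */

	/* returns true if we set the flag, i.e. acquired the lock */
	return pg_atomic_test_set_flag(lock);
}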
On 31/07/2024 08:52, Thomas Munro wrote:
> On Tue, Jul 30, 2024 at 9:50 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>> I guess we should also consider reimplementing the spinlock on the
>> atomic API, but I can see that Andres is poking at spinlock code right
>> now so I'll keep out of his way...
>
> Here is a first attempt at that.

Looks good, thanks!

> I haven't compared the generated asm yet, but it seems to work OK.

The old __i386__ implementation of TAS() said:

> * When this was last tested, we didn't have separate TAS() and TAS_SPIN()
> * macros.  Nowadays it probably would be better to do a non-locking test
> * in TAS_SPIN() but not in TAS(), like on x86_64, but no-one's done the
> * testing to verify that.  Without some empirical evidence, better to
> * leave it alone.

It seems that you did what the comment suggested.  That seems fine.  For
sake of completeness, if someone has an i386 machine lying around, it
would be nice to verify that.  Or an official CPU manufacturer's
implementation guide, or references to other implementations or something.

> 2. The pg_atomic_unlocked_test_flag() function was surprising to me:
> it returns true if it's not currently set (according to a relaxed
> load).  Most of this patch was easy, but figuring out that I had
> reverse polarity here was a multi-coffee operation :-)  I can't call
> it wrong though, as it's not based on <stdatomic.h>, and it's clearly
> documented, so *shrug*.

Huh, yeah that's unexpected.

> 3. As for why we have a function that <stdatomic.h> doesn't, I
> speculate that it might have been intended for implementing this exact
> patch, ie wanting to perform that relaxed load while spinning as
> recommended by Intel.  (If we strictly had to use <stdatomic.h>
> functions, we couldn't use atomic_flag due to the lack of a relaxed
> load operation on that type, so we'd probably have to use atomic_char
> instead.  Perhaps one day we will cross that bridge.)

As a side note, I remember when I've tried to use pg_atomic_flag in the
past, I wanted to do an atomic compare-and-exchange on it, to clear the
value and return the old value.  Surprisingly, there's no function to do
that.  There's pg_atomic_test_set_flag(), but no
pg_atomic_test_clear_flag().  C11 has both "atomic_flag" and
"atomic_bool", and I guess what I actually wanted was atomic_bool.

> - * On platforms with weak memory ordering, the TAS(), TAS_SPIN(), and
> - * S_UNLOCK() macros must further include hardware-level memory fence
> - * instructions to prevent similar re-ordering at the hardware level.
> - * TAS() and TAS_SPIN() must guarantee that loads and stores issued after
> - * the macro are not executed until the lock has been obtained.  Conversely,
> - * S_UNLOCK() must guarantee that loads and stores issued before the macro
> - * have been executed before the lock is released.

That old comment means that both SpinLockAcquire() and SpinLockRelease()
acted as full memory barriers, and looking at the implementations, that
was indeed so.  With the new implementation, SpinLockAcquire() will have
"acquire semantics" and SpinLockRelease will have "release semantics".
That's very sensible, and I don't believe it will break anything, but
it's a change in semantics nevertheless.

-- 
Heikki Linnakangas
Neon (https://neon.tech)
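In C11 terms, the operation described as missing above could be written
over atomic_bool like this (a hedged sketch, not an existing PostgreSQL
function):

#include <stdatomic.h>
#include <stdbool.h>

/* sketch: clear the flag and report whether it was previously set */
static inline bool
atomic_test_clear(atomic_bool *flag)
{
	return atomic_exchange_explicit(flag, false, memory_order_acq_rel);
}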
On Wed, Jul 31, 2024 at 8:47 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 31/07/2024 08:52, Thomas Munro wrote:
> The old __i386__ implementation of TAS() said:
>
> > * When this was last tested, we didn't have separate TAS() and TAS_SPIN()
> > * macros.  Nowadays it probably would be better to do a non-locking test
> > * in TAS_SPIN() but not in TAS(), like on x86_64, but no-one's done the
> > * testing to verify that.  Without some empirical evidence, better to
> > * leave it alone.
>
> It seems that you did what the comment suggested.  That seems fine.  For
> sake of completeness, if someone has an i386 machine lying around, it
> would be nice to verify that.  Or an official CPU manufacturer's
> implementation guide, or references to other implementations or something.

Hmm, the last "real" 32 bit CPU is from ~20 years ago.  Now the only 32
bit x86 systems we should nominally care about are modern CPUs that can
also run 32 bit instructions; is there a reason to think they'd behave
differently at this level?  Looking at the current Intel optimisation
guide's discussion of spinlock implementation at page 2-34 of [1], it
doesn't distinguish between 32 and 64, and it has that double-check thing.

> > - * On platforms with weak memory ordering, the TAS(), TAS_SPIN(), and
> > - * S_UNLOCK() macros must further include hardware-level memory fence
> > - * instructions to prevent similar re-ordering at the hardware level.
> > - * TAS() and TAS_SPIN() must guarantee that loads and stores issued after
> > - * the macro are not executed until the lock has been obtained.  Conversely,
> > - * S_UNLOCK() must guarantee that loads and stores issued before the macro
> > - * have been executed before the lock is released.
>
> That old comment means that both SpinLockAcquire() and SpinLockRelease()
> acted as full memory barriers, and looking at the implementations, that
> was indeed so.  With the new implementation, SpinLockAcquire() will have
> "acquire semantics" and SpinLockRelease will have "release semantics".
> That's very sensible, and I don't believe it will break anything, but
> it's a change in semantics nevertheless.

Yeah.  It's interesting that our pg_atomic_clear_flag(f) is like
standard atomic_flag_clear_explicit(f, memory_order_release), not like
atomic_flag_clear(f) which is short for atomic_flag_clear_explicit(f,
memory_order_seq_cst).  Example spinlock code I've seen written in
modern C or C++ therefore uses the _explicit variants, so it can get
acquire/release, which is what people usually want from a lock-like
thing.  What's a good way to test the performance in PostgreSQL?

In a naive loop that just test-and-sets and clears a flag a billion
times in a loop and does nothing else, I see 20-40% performance increase
depending on architecture when comparing _seq_cst with _acquire/_release.

You're right that this semantic change deserves explicit highlighting,
in comments somewhere...  I wonder if we have anywhere that is counting
on the stronger barrier...

[1] https://www.intel.com/content/www/us/en/content-details/671488/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html
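The "naive loop" measurement mentioned above can be reproduced with a
stand-alone C11 program along these lines (an assumed reconstruction, not
the actual test harness; rebuild with the plain non-_explicit calls to
compare against seq_cst):

#include <stdatomic.h>
#include <stdio.h>

int
main(void)
{
	atomic_flag flag = ATOMIC_FLAG_INIT;

	/* a billion uncontended acquire/release pairs */
	for (long i = 0; i < 1000000000L; i++)
	{
		while (atomic_flag_test_and_set_explicit(&flag, memory_order_acquire))
			;
		atomic_flag_clear_explicit(&flag, memory_order_release);
	}
	puts("done");
	return 0;
}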
Hi,

On 2024-07-31 17:52:34 +1200, Thomas Munro wrote:
> 2. The pg_atomic_unlocked_test_flag() function was surprising to me:
> it returns true if it's not currently set (according to a relaxed
> load).  Most of this patch was easy, but figuring out that I had
> reverse polarity here was a multi-coffee operation :-)  I can't call
> it wrong though, as it's not based on <stdatomic.h>, and it's clearly
> documented, so *shrug*.

I have no idea why I did it that way round.  This was a long time ago...

> 4. Another reason would be that you need it to implement
> SpinLockFree() and S_LOCK_FREE().  They don't seem to have had any
> real callers since the beginning of open source PostgreSQL!, except
> for a test of limited value in a new world without ports developing
> their own spinlock code.  Let's remove them!  I see this was already
> threatened by Andres in 3b37a6de.

Note that I would like to add a user for S_LOCK_FREE(), to detect repeated
SpinLockRelease():
https://postgr.es/m/20240729182952.hua325647e2ggbsy%40awork3.anarazel.de

Greetings,

Andres Freund
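The use sketched in that link would presumably look something like this
(a guess, using the historical S_LOCK_FREE()/S_UNLOCK() macro names):

/* hypothetical assertion-build wrapper: catch releasing a lock twice */
#define SpinLockRelease(lock) \
	do { \
		Assert(!S_LOCK_FREE(lock)); \
		S_UNLOCK(lock); \
	} while (0)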
Hi,

On 2024-07-30 23:08:36 +1200, Thomas Munro wrote:
> On Tue, Jul 30, 2024 at 12:39 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > On Tue, Jul 30, 2024 at 11:16 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > > Looks good to me.
> >
> > Thanks.  I'll wait just a bit longer to see if anyone else has comments.
>
> And pushed.

Yay!

> I am aware of a couple of build farm animals that will now fail
> because they deliberately test --disable-spinlocks: francolin and
> rorqual, which will need adjustment or retirement on master.  I'll
> watch out for other surprises on the farm...

I've now adjusted rorqual, francolin, piculet to not run on master
anymore - they're just there to test combinations of --disable-atomics
and --disable-spinlocks, so there seems not much point in just disabling
those options for HEAD.

Greetings,

Andres Freund
Hi,

On 2024-07-31 22:32:19 +1200, Thomas Munro wrote:
> > That old comment means that both SpinLockAcquire() and SpinLockRelease()
> > acted as full memory barriers, and looking at the implementations, that
> > was indeed so.  With the new implementation, SpinLockAcquire() will have
> > "acquire semantics" and SpinLockRelease will have "release semantics".
> > That's very sensible, and I don't believe it will break anything, but
> > it's a change in semantics nevertheless.
>
> Yeah.  It's interesting that our pg_atomic_clear_flag(f) is like
> standard atomic_flag_clear_explicit(f, memory_order_release), not like
> atomic_flag_clear(f) which is short for atomic_flag_clear_explicit(f,
> memory_order_seq_cst).  Example spinlock code I've seen written in
> modern C or C++ therefore uses the _explicit variants, so it can get
> acquire/release, which is what people usually want from a lock-like
> thing.  What's a good way to test the performance in PostgreSQL?

I've used

c=8;pgbench -n -Mprepared -c$c -j$c -P1 -T10 -f <(echo "SELECT pg_logical_emit_message(false, \:client_id::text, '1'),generate_series(1, 1000) OFFSET 1000;")

in the past.  Because of NUM_XLOGINSERT_LOCKS = 8 this ends up with 8
backends doing tiny xlog insertions and heavily contending on
insertpos_lck.  The generate_series() is necessary as otherwise the
context switch and executor startup overhead dominates.

> In a naive loop that just test-and-sets and clears a flag a billion times in
> a loop and does nothing else, I see 20-40% performance increase depending on
> architecture when comparing _seq_cst with _acquire/_release.

I'd expect the difference to be even bigger on concurrent workloads on
x86-64 - the added memory barrier during lock release really hurts.  I
have a test program to play around with this and the difference in
isolation is like 0.4x the throughput with a full barrier release on my
older 2 socket workstation [1].  Of course it's not trivial to hit "pure
enough" cases in the real world.

On said workstation [1], with the above pgbench, I get ~1.95M inserts/sec
(1959 TPS * 1000) on HEAD and 1.80M insert/sec after adding

#define S_UNLOCK(lock) __atomic_store_n(lock, 0, __ATOMIC_SEQ_CST)

If I change NUM_XLOGINSERT_LOCKS = 40 and use 40 clients, I get 1.03M
inserts/sec with the current code and 0.86M inserts/sec with
__ATOMIC_SEQ_CST.

Greetings,

Andres Freund

[1] 2x Xeon Gold 5215
On Thu, Aug 1, 2024 at 7:07 AM Andres Freund <andres@anarazel.de> wrote:
> Note that I would like to add a user for S_LOCK_FREE(), to detect repeated
> SpinLockRelease():
> https://postgr.es/m/20240729182952.hua325647e2ggbsy%40awork3.anarazel.de

What about adding a "magic" member in assertion builds?  Here is my
attempt at that, in 0002.

I also realised that we might as well skip the trivial S_XXX macros and
delete s_lock.h.  In this version of 0001 we retain just spin.h, but
s_lock.c still exists to hold the slow path.
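The 0002 idea is presumably shaped roughly like this (a guess at the
attached patch, with an invented constant):

/*
 * Assertion builds embed a magic number that SpinLockInit() sets and
 * acquire/release can verify; release could also clobber it to catch
 * double release and uninitialized locks.
 */
typedef struct
{
	pg_atomic_flag value;
#ifdef USE_ASSERT_CHECKING
	uint32		magic;			/* == SLOCK_MAGIC while initialized */
#endif
} slock_t;

#define SLOCK_MAGIC 0x51ab51ab	/* invented value */

Note that the struct's size then differs between assertion and production
builds, which is the ABI objection raised in the next message.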
Hi,

On 2024-08-01 10:09:07 +1200, Thomas Munro wrote:
> On Thu, Aug 1, 2024 at 7:07 AM Andres Freund <andres@anarazel.de> wrote:
> > Note that I would like to add a user for S_LOCK_FREE(), to detect repeated
> > SpinLockRelease():
> > https://postgr.es/m/20240729182952.hua325647e2ggbsy%40awork3.anarazel.de
>
> What about adding a "magic" member in assertion builds?  Here is my
> attempt at that, in 0002.

That changes the ABI, which we don't want, because it breaks using
extensions against a differently built postgres.

I don't really see a reason to avoid having S_LOCK_FREE(), am I missing
something?  Previously the semaphore fallback was a reason, but that's
gone now...

Greetings,

Andres Freund
On Thu, Aug 1, 2024 at 10:38 AM Andres Freund <andres@anarazel.de> wrote:
> On 2024-08-01 10:09:07 +1200, Thomas Munro wrote:
> > On Thu, Aug 1, 2024 at 7:07 AM Andres Freund <andres@anarazel.de> wrote:
> > > Note that I would like to add a user for S_LOCK_FREE(), to detect repeated
> > > SpinLockRelease():
> > > https://postgr.es/m/20240729182952.hua325647e2ggbsy%40awork3.anarazel.de
> >
> > What about adding a "magic" member in assertion builds?  Here is my
> > attempt at that, in 0002.
>
> That changes the ABI, which we don't want, because it breaks using
> extensions against a differently built postgres.

Yeah, right, bad idea.  Let me think about how to do something like what
you showed, but with the atomics patch...

Hmm.  One of the interesting things about the atomic_flag interface is
that it completely hides the contents of memory.  (Guess: its weird
minimal interface was designed to help weird architectures like PA-RISC,
staying on topic for $SUBJECT; I doubt we'll see such a system again but
it's useful for this trick.)  So I guess we could push the check down to
that layer, and choose arbitrary non-zero values for the arch-x86.h
implementation of pg_atomic_flag.  See attached.  Is this on the right
track?

(Looking ahead, if we eventually move to using <stdatomic.h>, we won't
be able to use atomic_flag due to lack of relaxed load anyway, so we
could generalise this to atomic_char (rather than atomic_bool), and keep
using non-zero values.  Presumably at that point we could also decree
that zero-initialised memory is valid for initialising our spinlocks,
but it seems useful as a defence against uninitialised objects anyway.)

> I don't really see a reason to avoid having S_LOCK_FREE(), am I missing
> something?  Previously the semaphore fallback was a reason, but that's gone
> now...

Sure, but if it's just for assertions, we don't need it.  Or any of the
S_XXX stuff.
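A hedged sketch of the arch-x86.h idea (the constant values here are
arbitrary placeholders; the actual change was in the patch attached to
this message):

/*
 * Pick distinct non-zero bytes for both states, so that zero can only
 * mean "never initialized" and assertions can catch stray or reused
 * memory.
 */
#define SLOCK_CLEAR_VALUE	0x5a	/* placeholder */
#define SLOCK_SET_VALUE		0xa5	/* placeholder */

typedef struct pg_atomic_flag
{
	volatile char value;		/* holds one of the two values above */
} pg_atomic_flag;

static inline void
pg_atomic_init_flag_impl(volatile pg_atomic_flag *ptr)
{
	ptr->value = SLOCK_CLEAR_VALUE;
}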
On Tue, Jul 30, 2024 at 12:39 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Tue, Jul 30, 2024 at 11:16 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > I think we should do:
> >
> > #ifdef _M_AMD64
> > #define __x86_64__
> > #endif
> >
> > somewhere, perhaps in src/include/port/win32.h.

I suppose we could define our own
PG_ARCH_{ARM,MIPS,POWER,RISCV,S390,SPARC,X86}_{32,64} in one central
place, instead.  Draft patch for illustration.
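For illustration, such a central header might look like this (a sketch of
the idea, not the draft patch itself; the macro names follow the pattern
given above):

/*
 * Hypothetical src/include/port/pg_arch.h: one place that understands
 * both the GCC/Clang and the MSVC spellings of each architecture.
 */
#if defined(__x86_64__) || defined(_M_AMD64)
#define PG_ARCH_X86_64
#elif defined(__i386__) || defined(_M_IX86)
#define PG_ARCH_X86_32
#elif defined(__aarch64__) || defined(_M_ARM64)
#define PG_ARCH_ARM_64
#elif defined(__arm__) || defined(_M_ARM)
#define PG_ARCH_ARM_32
#elif defined(__powerpc64__)
#define PG_ARCH_POWER_64
#elif defined(__riscv) && __riscv_xlen == 64
#define PG_ARCH_RISCV_64
#endif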