Thread: Intermediate report for AIX 5L port
This is an intermediate report for the AIX 5L port.

o gmake check sometimes hangs up
o pgbench almost always hangs up

In summary, current on AIX 5L is quite unstable (7.1 worked very well).

FYI, here is the back trace info from while pgbench hangs up. It includes backtraces for 3 processes. I will continue to investigate as long as the hardware is available to me.

[ps indicates UPDATE]
semop(??, ??, ??) at 0xd02be73c
IpcSemaphoreLock() at 0x100091d0
LWLockAcquire() at 0x10019df4
LockRelease() at 0x10052540
UnlockRelation() at 0x10051950
index_endscan() at 0x10051104
ExecCloseR() at 0x100a0584
ExecEndIndexScan() at 0x100a3a30
ExecEndNode() at 0x1009e960
EndPlan() at 0x1009d700
ExecutorEnd() at 0x1009e434
ProcessQuery() at 0x100b60e8
pg_exec_query_string() at 0x1001c5b4
PostgresMain() at 0x1001c0a8
DoBackend() at 0x10003380
BackendStartup() at 0x1000287c
ServerLoop() at 0x10002be8
PostmasterMain() at 0x10004934
main() at 0x100004ec
(dbx)

[ps indicates UPDATE]
semop(??, ??, ??) at 0xd02be73c
IpcSemaphoreLock() at 0x100091d0
LWLockAcquire() at 0x10019df4
LockRelease() at 0x10052540
UnlockRelation() at 0x10051950
relation_close() at 0x10014f1c
SearchCatCache() at 0x100103f8
SearchSysCache() at 0x1000daac
IndexSupportInitialize() at 0x100ae150
RelationInitIndexAccessInfo() at 0x100332a4
RelationBuildDesc() at 0x10030a20
RelationNameGetRelation() at 0x100337b0
relation_openr() at 0x10014f84
index_openr() at 0x100513dc
SearchCatCache() at 0x1001022c
SearchSysCache() at 0x1000daac
pg_aclcheck() at 0x1008d844
ExecCheckRTEPerms() at 0x1009c838
ExecCheckRTPerms() at 0x1009c94c
ExecCheckQueryPerms() at 0x1009cb28
InitPlan() at 0x1009da10
ExecutorStart() at 0x1009e6f4
ProcessQuery() at 0x100b6028
pg_exec_query_string() at 0x1001c5b4
PostgresMain() at 0x1001c0a8
DoBackend() at 0x10003380
BackendStartup() at 0x1000287c
ServerLoop() at 0x10002be8
PostmasterMain() at 0x10004934
main() at 0x100004ec
(dbx)

[ps indicates UPDATE waiting]
semop(??, ??, ??) at 0xd02be73c
IpcSemaphoreLock() at 0x100091d0
ProcSleep() at 0x1001d680
WaitOnLock() at 0x10051e70
LockAcquire() at 0x10052d00
XactLockTableWait() at 0x10051580
heap_update() at 0x100133d0
ExecReplace() at 0x1009ce28
ExecutePlan() at 0x1009d678
ExecutorRun() at 0x1009e564
ProcessQuery() at 0x100b60c4
pg_exec_query_string() at 0x1001c5b4
PostgresMain() at 0x1001c0a8
DoBackend() at 0x10003380
BackendStartup() at 0x1000287c
ServerLoop() at 0x10002be8
PostmasterMain() at 0x10004934
main() at 0x100004ec
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > In summary, current on AIX 5L is quite unstable (7.1 worked very well).
>
> This looks like a problem with the new LWLock logic. I guess I have
> to answer for that ;-). I am willing to dig into it if you can get
> me access to the machine.

Thanks. Since the machine is behind a firewall, I have to negotiate with the network admin. Once I succeed, I will let you know.

> BTW, is this a single-CPU or multi-CPU machine?

It's a 4-way machine.
--
Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> In summary, current on AIX 5L is quite unstable (7.1 worked very well).

This looks like a problem with the new LWLock logic. I guess I have to answer for that ;-). I am willing to dig into it if you can get me access to the machine.

BTW, is this a single-CPU or multi-CPU machine?

            regards, tom lane
Got it. The AIX compiler apparently feels free to rearrange the sequence

    proc->lwWaiting = true;
    proc->lwExclusive = (mode == LW_EXCLUSIVE);
    proc->lwWaitLink = NULL;
    if (lock->head == NULL)
        lock->head = proc;
    else
        lock->tail->lwWaitLink = proc;
    lock->tail = proc;

    /* Can release the mutex now */
    SpinLockRelease_NoHoldoff(&lock->mutex);

into something wherein the SpinLockRelease (which is just "x = 0") occurs before the last two assignments into the lock structure. Boo, hiss. Evidently, on your multiprocessor machine, there may be another CPU that is able to obtain the spinlock and then read the un-updated lock values before the stores occur.

Declaring the lock pointer "volatile" seems to prevent this misbehavior.

Personally I'd call this a compiler bug; isn't it supposed to consider semicolons as sequence points? I never heard that rearranging the order of stores into memory was considered a kosher optimization of C code.

            regards, tom lane
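For readers following along, here is a minimal, self-contained sketch of the pattern and the fix being described. It is not the actual lwlock.c code: "LWLockStub", "ProcStub", and the field layout are simplified stand-ins. The point is that once the lock pointer is volatile-qualified, the head/tail updates and the final mutex store are all accesses to volatile objects, which the compiler may not reorder relative to one another at compile time.

    typedef struct ProcStub ProcStub;
    struct ProcStub
    {
        int       lwWaiting;
        int       lwExclusive;
        ProcStub *lwWaitLink;
    };

    typedef struct
    {
        int       mutex;          /* spinlock word */
        ProcStub *head;
        ProcStub *tail;
    } LWLockStub;

    static void
    enqueue_self(volatile LWLockStub *lock, ProcStub *proc, int exclusive)
    {
        /* ... assume lock->mutex is already held at this point ... */
        proc->lwWaiting = 1;
        proc->lwExclusive = exclusive;
        proc->lwWaitLink = NULL;
        if (lock->head == NULL)
            lock->head = proc;
        else
            lock->tail->lwWaitLink = proc;
        lock->tail = proc;

        /* release the spinlock; because "lock" points to a volatile
         * object, this store may not be hoisted above the head/tail
         * updates by the compiler */
        lock->mutex = 0;
    }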
...
> Declaring the lock pointer "volatile" seems to prevent this misbehavior.

Great. That is what it is anyway, right?

> Personally I'd call this a compiler bug; isn't it supposed to consider
> semicolons as sequence points? I never heard that rearranging the order
> of stores into memory was considered a kosher optimization of C code.

Sure it is. Presumably "-O0" or equivalent would have kept this from happening, but seemingly unrelated stores into non-overlapping memory are always fair game at even modest levels of optimization. The "x = 0" is cheaper than the other operations, though it may be reordered for internal RISC-y reasons rather than "cheapest first" considerations.

                    - Thomas
Thomas Lockhart <lockhart@fourpalms.org> writes:
>> Declaring the lock pointer "volatile" seems to prevent this misbehavior.

> Great. That is what it is anyway, right?

The reason I hadn't declared it volatile in the first place was that I was assuming there'd be sequence points at the spin lock and unlock calls. The order of operations *while holding the lock* is, and should be, optimizable. Pushing updates outside of the range where the lock is held, however, isn't cool.

Now that I think a little more, I am worried about the idea that the same thing could potentially happen in other modules that access shared memory and use spinlocks to serialize the access. Perhaps the fix I applied was wrong, and the correct fix is to change S_LOCK from

    #define S_UNLOCK(lock)    (*(lock) = 0)

to

    #define S_UNLOCK(lock)    (*((volatile slock_t *) (lock)) = 0)

Assuming that the compiler does faithfully treat that as a sequence point, it would solve potential related problems in other modules, not only LWLock. I note that we've carefully marked all the TAS operations as using volatile pointers ... but we forgot about S_UNLOCK.

Comments?

            regards, tom lane
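To make the asymmetry being pointed out concrete, here is an illustrative sketch, not the actual s_lock.h: the acquire side takes a volatile-qualified pointer (as the TAS declarations reportedly do), while the old-style unlock is a plain store. The tas() body and the _OLD/_NEW macro names are placeholders invented for this sketch; a real TAS is an atomic test-and-set instruction.

    typedef unsigned char slock_t;

    /* acquire side: declared with a volatile pointer; the body here is
     * only a placeholder and is not atomic */
    static int
    tas(volatile slock_t *lock)
    {
        if (*lock)
            return 1;            /* lock already held */
        *lock = 1;
        return 0;
    }

    /* release side, before and after the proposed change */
    #define S_UNLOCK_OLD(lock)  (*(lock) = 0)                           /* plain store */
    #define S_UNLOCK_NEW(lock)  (*((volatile slock_t *) (lock)) = 0)    /* volatile store */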
> Got it. The AIX compiler apparently feels free to rearrange the
> sequence
>
>     proc->lwWaiting = true;
>     proc->lwExclusive = (mode == LW_EXCLUSIVE);
>     proc->lwWaitLink = NULL;
>     if (lock->head == NULL)
>         lock->head = proc;
>     else
>         lock->tail->lwWaitLink = proc;
>     lock->tail = proc;
>
>     /* Can release the mutex now */
>     SpinLockRelease_NoHoldoff(&lock->mutex);
>
> into something wherein the SpinLockRelease (which is just "x = 0")
> occurs before the last two assignments into the lock structure.
> Boo, hiss. Evidently, on your multiprocessor machine, there may be
> another CPU that is able to obtain the spinlock and then read the
> un-updated lock values before the stores occur.
>
> Declaring the lock pointer "volatile" seems to prevent this misbehavior.
>
> Personally I'd call this a compiler bug; isn't it supposed to consider
> semicolons as sequence points? I never heard that rearranging the order
> of stores into memory was considered a kosher optimization of C code.

Looks funny to me too. I will let the IBM engineers know what you have found. Thanks.
--
Tatsuo Ishii
I said:
> Now that I think a little more, I am worried about the idea that the
> same thing could potentially happen in other modules that access shared
> memory and use spinlocks to serialize the access. Perhaps the fix I
> applied was wrong, and the correct fix is to change S_LOCK from
> #define S_UNLOCK(lock)    (*(lock) = 0)
> to
> #define S_UNLOCK(lock)    (*((volatile slock_t *) (lock)) = 0)

I have applied this patch also, since on reflection it seems the clearly Right Thing. However, I find that AIX 5's compiler must have the LWLock pointers marked volatile as well, else it still generates bad code.

Original assembly code fragment (approximately lines 244-271 of lwlock.c):

    l       r3,8(r25)
    stb     r24,44(r25)
    cmpi    0,r0,0
    stb     r4,45(r25)
    st      r23,48(r25)
    cal     r5,0(r0)
    stx     r23,r28,r27          <----- spinlock release
    bc      BO_IF_NOT,CR0_EQ,__L834
    st      r25,12(r26)          <----- store into lock->head
    st      r25,16(r26)          <----- store into lock->tail
    l       r4,12(r25)
    bl      .IpcSemaphoreLock{PR}

With "volatile" added in S_UNLOCK:

    stb     r24,44(r25)
    stb     r3,45(r25)
    cmpi    0,r0,0
    cal     r5,0(r0)
    st      r23,48(r25)
    bc      BO_IF_NOT,CR0_EQ,__L81c
    st      r25,12(r26)          <----- store into lock->head
    stx     r23,r28,r27          <----- spinlock release
    l       r3,8(r25)
    st      r25,16(r26)          <----- store into lock->tail
    l       r4,12(r25)
    bl      .IpcSemaphoreLock{PR}

With "volatile" lock pointer in LWLockAcquire:

    stb     r25,44(r23)
    stb     r3,45(r23)
    cmpi    0,r0,0
    cal     r5,0(r0)
    st      r24,48(r23)
    bc      BO_IF_NOT,CR0_EQ,__L850
    st      r23,12(r26)          <----- store into lock->head
    st      r23,16(r26)          <----- store into lock->tail
    stx     r24,r28,r27          <----- spinlock release
    l       r3,8(r23)
    l       r4,12(r23)
    bl      .IpcSemaphoreLock{PR}

I believe the second of these cases is inarguably a compiler bug. It is moving a store (into lock->tail) across a store through a volatile-qualified pointer. As I read the spec, that's not kosher.

            regards, tom lane
> Thomas Lockhart <lockhart@fourpalms.org> writes:
> >> Declaring the lock pointer "volatile" seems to prevent this misbehavior.
>
> > Great. That is what it is anyway, right?
>
> The reason I hadn't declared it volatile in the first place was that I
> was assuming there'd be sequence points at the spin lock and unlock
> calls. The order of operations *while holding the lock* is, and should
> be, optimizable. Pushing updates outside of the range where the lock is
> held, however, isn't cool.
>
> Now that I think a little more, I am worried about the idea that the
> same thing could potentially happen in other modules that access shared
> memory and use spinlocks to serialize the access. Perhaps the fix I
> applied was wrong, and the correct fix is to change S_LOCK from

Here is my limited experience with volatile. There was a BSD/OS multiport card that mapped card memory to a RAM address, but the pointers pointing to that address weren't marked as volatile. An upgrade to a better compiler caused the driver to fail, and I finally figured out why. Marking them as volatile fixed it.

Seems this is the same case. We are not pointing to memory on a card but to shared memory which can change on its own, hence it is volatile.

Tom, I assume what you are saying is that the access to the spinlocks, already marked as volatile, should have prevented any code from migrating over those locks. I guess my big question is: does any volatile access prevent optimization of other variables across that volatile access? I didn't think that was guaranteed.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom, I assume what you are saying is that the access to the spinlocks,
> already marked as volatile, should have prevented any code from
> migrating over those locks. I guess my big question is does any
> volatile access prevent optimization of other variables across that
> volatile access? I didn't think that was guaranteed.

After eyeballing the C spec some more, I think you might be right. If that's the correct reading then it is indeed necessary for lwlock.c to mark the whole lock structure as volatile, not only the spinlock fields.

However, if that's true then (a) 7.2 has three other modules that are potentially vulnerable to similar problems; (b) prior releases had many more places that were potentially vulnerable --- ie, all the modules that used to use spinlocks directly and now use LWLocks. If this sort of behavior is allowed, ISTM we should have been seeing major instability on lots of SMP machines.

Comments? Do we need to put a bunch of "volatile" keywords into every place that uses raw spinlocks? If so, why wasn't the previous code equally broken??

I don't think the places that use LWLocks need volatile markers on their data structures, since the LWLock lock and unlock calls will be out-of-line subroutine calls. But for spinlocks that can be inlined, it seems there is a risk.

            regards, tom lane
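Bruce's question can be restated as a small hedged sketch (illustrative names only, not PostgreSQL code): only the spinlock word is volatile below, and the open issue is whether the ordinary store to "counter" may legally be moved across the volatile mutex stores.

    typedef struct
    {
        volatile int mutex;      /* spinlock word: accesses are volatile */
        int          counter;    /* ordinary shared field: not volatile  */
    } SharedStub;

    static void
    bump(SharedStub *s)
    {
        /* NOT a real test-and-set: atomicity is ignored here; this sketch
         * is only about compile-time ordering */
        while (s->mutex != 0)
            ;
        s->mutex = 1;            /* "acquire": a volatile store */

        /* an ordinary, non-volatile store: whether the compiler may move
         * it across the volatile mutex stores above and below is exactly
         * the question under discussion */
        s->counter++;

        s->mutex = 0;            /* "release": a volatile store */
    }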
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom, I assume what you are saying is that the access to the spinlocks,
> > already marked as volatile, should have prevented any code from
> > migrating over those locks. I guess my big question is does any
> > volatile access prevent optimization of other variables across that
> > volatile access? I didn't think that was guaranteed.
>
> After eyeballing the C spec some more, I think you might be right.
> If that's the correct reading then it is indeed necessary for lwlock.c
> to mark the whole lock structure as volatile, not only the spinlock
> fields.

OK.

> However, if that's true then (a) 7.2 has three other modules that are
> potentially vulnerable to similar problems; (b) prior releases had

That was going to be my next question.

> many more places that were potentially vulnerable --- ie, all the
> modules that used to use spinlocks directly and now use LWLocks.
> If this sort of behavior is allowed, ISTM we should have been seeing
> major instability on lots of SMP machines.

Again, a good question. No idea.

Here is a more general question: If you do:

    get lock;
    a=4
    release lock;

Can the compiler reorder that to:

    a=4
    get lock;
    release lock;

It can see the lock values don't have any effect on 'a'. What actually does keep this stuff from moving around?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
...
> It can see the lock values don't have any effect on 'a'. What actually
> does keep this stuff from moving around?

Lack of ambition? I'm pretty sure that the only reasons *to* reorder instructions are:

1) there could be a performance gain, as in
   a) loop unrolling
   b) pipeline fill considerations
   c) unnecessary assignment (e.g. result is ignored, or only used on one path)
2) the optimization level allows it (-O0 does not reorder at all)

I vaguely recall that the gcc docs discuss the kinds of optimizations allowed at each level. Presumably IBM's AIX compiler was a bit more aggressive in evaluating costs or pipeline fills than is gcc on other processors.

                    - Thomas
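As one concrete (and deliberately trivial) illustration of the kind of reordering being described: two stores to distinct fields of the same struct have no language-imposed order in the emitted code, so a scheduler optimizing for pipeline fill is free to swap them. That is harmless in single-threaded code, but not when one of the stores is a lock release that other CPUs observe. The names here are invented for the sketch.

    struct pair
    {
        int a;
        int b;
    };

    /* the abstract-machine result is identical either way, so the
     * compiler may emit these two stores in whichever order schedules
     * better on the target pipeline */
    void
    store_pair(struct pair *p)
    {
        p->a = 1;
        p->b = 2;
    }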
Tom, have you fixed the case on AIX 5L? I still see hanging backends with pgbench -c 64. Maybe AIX 5L (more precisely xlc) needs additional fixes? If so, I'm wondering why I see no improvements even with gcc.
--
Tatsuo Ishii

(dbx) where
semop(??, ??, ??) at 0xd02be73c
IpcSemaphoreLock() at 0x100091d0
LWLockAcquire() at 0x10019df4
ReleaseBuffer() at 0x100205a4
CatalogIndexFetchTuple() at 0x1005a31c
AttributeRelidNumIndexScan() at 0x1005a4e4
build_tupdesc_ind() at 0x10030c5c
RelationBuildTupleDesc() at 0x10031180
RelationBuildDesc() at 0x100309c0
RelationNameGetRelation() at 0x100337b0
relation_openr() at 0x10014f84
heap_openr() at 0x10014d3c
CatalogCacheInitializeCache() at 0x1000f194
SearchCatCache() at 0x1000fe9c
SearchSysCache() at 0x1000daac
eqsel() at 0x100c5388
OidFunctionCall4() at 0x10045ccc
restriction_selectivity() at 0x100c7594
clauselist_selectivity() at 0x100c72a4
restrictlist_selectivity() at 0x100c7424
set_baserel_size_estimates() at 0x100c8924
set_plain_rel_pathlist() at 0x100e1268
set_base_rel_pathlists() at 0x100e13a4
make_one_rel() at 0x100e1518
subplanner() at 0x100e0b6c
query_planner() at 0x100e0d98
grouping_planner() at 0x100df0f0
subquery_planner() at 0x100dff00
planner() at 0x100dffe0
pg_plan_query() at 0x1001c6b0
pg_exec_query_string() at 0x1001c530
PostgresMain() at 0x1001c0a8
DoBackend() at 0x10003380
BackendStartup() at 0x1000287c
ServerLoop() at 0x10002be8
PostmasterMain() at 0x10004934
main() at 0x100004ec
(dbx)

> I said:
> > Now that I think a little more, I am worried about the idea that the
> > same thing could potentially happen in other modules that access shared
> > memory and use spinlocks to serialize the access. Perhaps the fix I
> > applied was wrong, and the correct fix is to change S_LOCK from
> > #define S_UNLOCK(lock)    (*(lock) = 0)
> > to
> > #define S_UNLOCK(lock)    (*((volatile slock_t *) (lock)) = 0)
>
> I have applied this patch also, since on reflection it seems the clearly
> Right Thing. However, I find that AIX 5's compiler must have the LWLock
> pointers marked volatile as well, else it still generates bad code.
>
> Original assembly code fragment (approximately lines 244-271 of
> lwlock.c):
>
>     l       r3,8(r25)
>     stb     r24,44(r25)
>     cmpi    0,r0,0
>     stb     r4,45(r25)
>     st      r23,48(r25)
>     cal     r5,0(r0)
>     stx     r23,r28,r27          <----- spinlock release
>     bc      BO_IF_NOT,CR0_EQ,__L834
>     st      r25,12(r26)          <----- store into lock->head
>     st      r25,16(r26)          <----- store into lock->tail
>     l       r4,12(r25)
>     bl      .IpcSemaphoreLock{PR}
>
> With "volatile" added in S_UNLOCK:
>
>     stb     r24,44(r25)
>     stb     r3,45(r25)
>     cmpi    0,r0,0
>     cal     r5,0(r0)
>     st      r23,48(r25)
>     bc      BO_IF_NOT,CR0_EQ,__L81c
>     st      r25,12(r26)          <----- store into lock->head
>     stx     r23,r28,r27          <----- spinlock release
>     l       r3,8(r25)
>     st      r25,16(r26)          <----- store into lock->tail
>     l       r4,12(r25)
>     bl      .IpcSemaphoreLock{PR}
>
> With "volatile" lock pointer in LWLockAcquire:
>
>     stb     r25,44(r23)
>     stb     r3,45(r23)
>     cmpi    0,r0,0
>     cal     r5,0(r0)
>     st      r24,48(r23)
>     bc      BO_IF_NOT,CR0_EQ,__L850
>     st      r23,12(r26)          <----- store into lock->head
>     st      r23,16(r26)          <----- store into lock->tail
>     stx     r24,r28,r27          <----- spinlock release
>     l       r3,8(r23)
>     l       r4,12(r23)
>     bl      .IpcSemaphoreLock{PR}
>
> I believe the second of these cases is inarguably a compiler bug.
> It is moving a store (into lock->tail) across a store through a
> volatile-qualified pointer. As I read the spec, that's not kosher.
>
>             regards, tom lane
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> Tom, have you fixed the case on AIX 5L? I still see hanging backends
> with pgbench -c 64. Maybe AIX 5L (more precisely xlc) needs additional
> fixes? If so, I'm wondering why I see no improvements even with gcc.

Hmm, I thought I'd fixed it ... are you using CVS tip? The largest test case I tried was 5 clients * 10000 transactions, but that ran to completion just fine.

            regards, tom lane
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > Tom, have you fixed the case on AIX 5L? I still see hanging backends
> > with pgbench -c 64. Maybe AIX 5L (more precisely xlc) needs additional
> > fixes? If so, I'm wondering why I see no improvements even with gcc.
>
> Hmm, I thought I'd fixed it ... are you using CVS tip? The largest
> test case I tried was 5 clients * 10000 transactions, but that ran to
> completion just fine.

I believe I grabbed the latest current source. But I will try again...
--
Tatsuo Ishii