Re: User concurrency thresholding: where do I look? - Mailing list pgsql-performance
From: Jignesh K. Shah
Subject: Re: User concurrency thresholding: where do I look?
Msg-id: 46A8BD62.2080904@sun.com
In response to: Re: User concurrency thresholding: where do I look? (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-performance
I will look for runs with longer samples. Also, the script could have mislabeled lock names. Anyway, digging into the one that seems to increase over time, I did stack profiles on how that increases, and here are some numbers.

For 600-850 users, that potentially mislabeled CheckPointStartLock (LockID==12) comes from various sources, where the top source (while the system is still doing great) is:

postgres`LWLockAcquire+0x1c8
postgres`SimpleLruReadPage_ReadOnly+0xc
postgres`TransactionIdGetStatus+0x14
postgres`TransactionLogFetch+0x58
postgres`TransactionIdDidCommit+0x4
postgres`HeapTupleSatisfiesSnapshot+0x234
postgres`heap_release_fetch+0x1a8
postgres`index_getnext+0xf4
postgres`IndexNext+0x7c
postgres`ExecScan+0x8c
postgres`ExecProcNode+0xb4
postgres`ExecutePlan+0xdc
postgres`ExecutorRun+0xb0
postgres`PortalRunSelect+0x9c
postgres`PortalRun+0x244
postgres`exec_execute_message+0x3a0
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
8202100

postgres`LWLockAcquire+0x1c8
postgres`TransactionIdSetStatus+0x1c
postgres`RecordTransactionCommit+0x2a8
postgres`CommitTransaction+0xc8
postgres`CommitTransactionCommand+0x90
postgres`finish_xact_command+0x60
postgres`exec_execute_message+0x3d8
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
30822100

However, at 900 users, where the big drop in throughput occurs, there is a different top "consumer" of time:

postgres`LWLockAcquire+0x1c8
postgres`TransactionIdSetStatus+0x1c
postgres`RecordTransactionCommit+0x2a8
postgres`CommitTransaction+0xc8
postgres`CommitTransactionCommand+0x90
postgres`finish_xact_command+0x60
postgres`exec_execute_message+0x3d8
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
406601300

postgres`LWLockAcquire+0x1c8
postgres`SimpleLruReadPage+0x1ac
postgres`TransactionIdGetStatus+0x14
postgres`TransactionLogFetch+0x58
postgres`TransactionIdDidCommit+0x4
postgres`HeapTupleSatisfiesUpdate+0x360
postgres`heap_lock_tuple+0x27c
postgres`ExecutePlan+0x33c
postgres`ExecutorRun+0xb0
postgres`PortalRunSelect+0x9c
postgres`PortalRun+0x244
postgres`exec_execute_message+0x3a0
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
444523100

postgres`LWLockAcquire+0x1c8
postgres`SimpleLruReadPage+0x1ac
postgres`TransactionIdGetStatus+0x14
postgres`TransactionLogFetch+0x58
postgres`TransactionIdDidCommit+0x4
postgres`HeapTupleSatisfiesSnapshot+0x234
postgres`heap_release_fetch+0x1a8
postgres`index_getnext+0xf4
postgres`IndexNext+0x7c
postgres`ExecScan+0x8c
postgres`ExecProcNode+0xb4
postgres`ExecutePlan+0xdc
postgres`ExecutorRun+0xb0
postgres`PortalRunSelect+0x9c
postgres`PortalRun+0x244
postgres`exec_execute_message+0x3a0
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
1661300000

Maybe you all will understand more than I do about what it is doing here. It looks to me like IndexNext has a problem at a high number of users, but I could be wrong.

-Jignesh

Tom Lane wrote:
> "Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:
>
>> The count is only for a 10-second snapshot.. Plus remember there are
>> about 1000 users running so the connection being profiled only gets
>> 0.01 of the period on CPU.. And the count is for that CONNECTION only.
>
> OK, that explains the low absolute levels of the counts, but if the
> counts are for a regular backend then there's still a lot of bogosity
> here. Backends won't be taking the CheckpointLock at all, nor do they
> take CheckpointStartLock in exclusive mode. The bgwriter would do that
> but it'd not be taking most of these other locks. So I think the script
> is mislabeling the locks somehow.
> Also, elementary statistics should tell you that a sample taken as above
> is going to have enormous amounts of noise. You should be sampling over
> a much longer period, say on the order of a minute of runtime, to have
> numbers that are trustworthy.
>
> regards, tom lane
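[Editor's note: Tom's point about sampling noise can be illustrated with a small simulation. This is a sketch, not part of the original thread; the event rate is a made-up number. Treating lock-acquisition counts as roughly Poisson events, the relative noise of a rate estimate scales as 1/sqrt(N), so a 60-second window should show about sqrt(6) (roughly 2.4x) less relative noise than a 10-second one.]

```python
import random
import statistics

# Sketch of why short profiling windows are noisy (assumed numbers,
# not measurements from the thread): lock acquisitions are modeled as
# Poisson arrivals at a hypothetical fixed rate.
random.seed(42)

RATE = 50.0  # hypothetical lock acquisitions per second


def sample_count(seconds):
    """Simulate one profiling run: events observed in a window of `seconds`.

    Counts exponential inter-arrival times until the window closes.
    """
    t, n = 0.0, 0
    while True:
        t += random.expovariate(RATE)
        if t > seconds:
            return n
        n += 1


def relative_noise(seconds, runs=200):
    """Std-dev of the per-second rate estimate, relative to its mean."""
    rates = [sample_count(seconds) / seconds for _ in range(runs)]
    return statistics.stdev(rates) / statistics.mean(rates)


noise_10s = relative_noise(10)
noise_60s = relative_noise(60)
print(f"10 s window: relative noise {noise_10s:.3f}")
print(f"60 s window: relative noise {noise_60s:.3f}")
# The 60 s window should come out roughly sqrt(6) ~ 2.4x less noisy,
# which is why a minute-long sample gives trustworthy counts where a
# 10-second snapshot does not.
```

The same logic applies to the stack counts above: a count of a few hundred events in a 10-second snapshot carries several percent of pure sampling noise before any real effect is visible.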