Re: pgbench - exclude pthread_create() from connection start timing - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: pgbench - exclude pthread_create() from connection start timing |
Date | |
Msg-id | 20130930223621.GA125986@tornado.leadboat.com |
In response to | Re: pgbench - exclude pthread_create() from connection start timing (Fabien COELHO <coelho@cri.ensmp.fr>) |
Responses | Re: pgbench - exclude pthread_create() from connection start timing |
List | pgsql-hackers |
On Thu, Sep 26, 2013 at 01:41:01PM +0200, Fabien COELHO wrote:
> >I don't get it; why is taking the time just after pthread_create() more sane
> >than taking it just before pthread_create()?
>
> Thread create time seems to be expensive as well, maybe up to 0.1
> seconds under some conditions (?). Under --rate, this create delay
> means that throttling is lagging behind schedule by about that time,
> so all the first transactions are trying to catch up.

threadRun() already initializes throttle_trigger with a fresh timestamp.
Please detail how the problem remains despite that.

> >typically far more expensive than pthread_create(). The patch for threaded
> >pgbench made the decision to account for pthread_create() as though it were
> >part of establishing the connection. You're proposing to not account for it
> >at all. Both of those designs are reasonable to me, but I do not comprehend
> >the benefit you anticipate from switching from one to the other.
> >
> >>-j 800 vs -j 100 : ISTM that if you create more threads, the time delay
> >>incurred is cumulative, so the strangeness of the result should worsen.
> >
> >Not in general; we do one INSTR_TIME_SET_CURRENT() per thread, just before
> >calling pthread_create(). However, thread 0 is a special case; we set its
> >start time first and actually start it last. Your observation of cumulative
> >delay fits those facts.
>
> Yep, that must be thread 0 which has a very large delay. I think it
> is simpler for each thread to record its start time when it has
> started, without exception.
>
> > Initializing the thread-0 start time later, just before calling
> >its threadRun(), should clear this anomaly without changing other
> >aspects of the measurement.
>
> Always taking the thread start time when the thread is started does
> solve the issue as well, and it is homogeneous for all cases, so the
> solution I suggest seems reasonable and simple.
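To make the catch-up effect in the throttling discussion concrete, here is a minimal sketch that counts how far behind a --rate schedule a client starts when the schedule's trigger is initialized some time before the client can actually send queries. It assumes a fixed interval of 1/rate seconds between transactions, whereas pgbench itself uses randomized delays; the function schedule_deficit() is hypothetical and not part of pgbench.

```c
/*
 * Hypothetical helper, not pgbench code.  Assumes a simplified --rate
 * schedule with one transaction due every 1/rate_tps seconds, starting at
 * trigger_init (pgbench itself uses randomized delays).  Returns how many
 * whole intervals elapsed between initializing the trigger and the moment
 * the client is ready to send, i.e. the backlog it will try to catch up on.
 */
static long schedule_deficit(double rate_tps, double trigger_init,
                             double ready_time)
{
    double delay = ready_time - trigger_init;

    if (delay <= 0.0)
        return 0;               /* trigger taken at or after readiness */

    /* delay is positive here, so the cast truncates like floor() */
    return (long) (delay * rate_tps);
}
```

For example, schedule_deficit(100.0, 0.0, 0.25) reports 25 transactions of backlog, while a trigger taken at the moment the client becomes ready, schedule_deficit(100.0, 0.25, 0.25), reports none; that is the effect of initializing the trigger with a fresh timestamp.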
To exercise the timing semantics before and after your proposed change, I
added a "sleep(1);" before the pthread_create() call. Here are the results
with and without "-j", with and without pgbench-measurements-v5.patch:

    $ echo 'select 1' >test.sql

    # just the sleep(1) addition
    $ env time pgbench -c4 -t1000 -S -n -f test.sql | grep tps
    tps = 6784.410104 (including connections establishing)
    tps = 7094.701854 (excluding connections establishing)
    0.03user 0.07system 0:00.60elapsed 16%CPU (0avgtext+0avgdata 0maxresident)k
    $ env time pgbench -j4 -c4 -t1000 -S -n -f test.sql | grep tps
    tps = 1224.327010 (including connections establishing)
    tps = 2274.160899 (excluding connections establishing)
    0.02user 0.03system 0:03.27elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k

    # w/ pgbench-measurements-v5.patch
    $ env time pgbench -c4 -t1000 -S -n -f test.sql | grep tps
    tps = 6792.393877 (including connections establishing)
    tps = 7207.142278 (excluding connections establishing)
    0.08user 0.06system 0:00.60elapsed 23%CPU (0avgtext+0avgdata 0maxresident)k
    $ env time pgbench -j4 -c4 -t1000 -S -n -f test.sql | grep tps
    tps = 1212.040409 (including connections establishing)
    tps = 1214.728830 (excluding connections establishing)
    0.09user 0.06system 0:03.31elapsed 4%CPU (0avgtext+0avgdata 0maxresident)k

Rather than, as I supposed before, excluding the cost of thread start
entirely, pgbench-measurements-v5.patch has us count pthread_create() as
part of the main runtime. I now see the cumulative delay you mentioned, but
pgbench-measurements-v5.patch does not fix it. The problem is centered on
the fact that pgbench.c:main() calculates a single total_time and models
each thread as having run for that entire period. If pthread_create() is
slow, reality diverges considerably from that model; some threads start
notably late, and other threads finish notably early.
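The gap between the current global model and a per-thread one can be worked through with the approximate threadRun() intervals for the patched -j4 run (threads 1, 2, 3 and 0 running 1.0-1.3 s, 2.0-2.3 s, 3.0-3.3 s and 3.0-3.3 s; 4 clients x 1000 transactions = 4000 transactions). This is an illustrative sketch only: the thread_window type and both function names are hypothetical, and averaging the per-thread runtimes is an assumed way of aggregating them, not pgbench's code.

```c
#include <stddef.h>

/* Hypothetical type: one thread's runtime window, in seconds after main()
 * took its start timestamp. */
typedef struct
{
    double start;
    double end;
} thread_window;

/* Global model: every thread is charged from main()'s start (0.0) until
 * the last thread finishes. */
static double tps_global(long xacts, const thread_window *w, size_t n)
{
    double last_end = 0.0;

    for (size_t i = 0; i < n; i++)
        if (w[i].end > last_end)
            last_end = w[i].end;
    return xacts / last_end;
}

/* Assumed per-thread model: charge each thread only for its own window
 * and divide by the average window length. */
static double tps_per_thread(long xacts, const thread_window *w, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += w[i].end - w[i].start;
    return xacts / (sum / n);
}
```

With those four windows, tps_global() gives about 1212 tps, in line with the reported 1212.04, while tps_per_thread() gives about 13333 tps: since no thread is charged for time it was not running, a slow pthread_create() makes the run look faster than it was.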
The threadRun() runtime intervals in the last test run above are actually
something like this:

    thread 1: 1.0s - 1.3s
    thread 2: 2.0s - 2.3s
    thread 3: 3.0s - 3.3s
    thread 0: 3.0s - 3.3s

Current pgbench instead models every thread as having run 0.0s - 3.3s,
hence the numbers reported.

To make the numbers less surprising, we could axe the global
total_time=end_time-start_time and instead accumulate total_time on a
per-thread basis, just as we now accumulate conn_time on a per-thread basis.
That ceases charging threads for time spent not-yet-running or
already-finished, but it can add its own inaccuracy. Performance during a
period in which some clients have yet to start is not interchangeable with
performance when all clients are running. pthread_create() slowness would
actually make the workload seem to perform better.

An alternate strategy would be to synchronize the actual start of command
issuance across threads. All threads would start and make their database
connections, then signal readiness. Once the first thread notices that
every other thread is ready, it would direct them to actually start issuing
queries. This might minimize the result skew problem of the first strategy.

A third strategy is to just add a comment and write this off as one of the
several artifacts of short benchmark runs.

Opinions, other ideas?

> >While pondering this area of the code, it occurs to me --
> >shouldn't we initialize the throttle rate trigger later, after
> >establishing connections and sending startup queries? As it
> >stands, we build up a schedule deficit during those tasks. Was
> >that deliberate?
>
> In principle, I agree with you.
>
> The connection creation time is another thing, but it depends on the
> options set. Under some options the connection is opened and closed
> for every transaction, so there is no point in avoiding it in the
> measure or in the scheduling, and I want to avoid having to
> distinguish those cases.

That's fair enough.
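The second strategy, synchronizing the start of command issuance across threads, amounts to a one-shot barrier. Below is a minimal sketch with POSIX threads; every name in it (start_gate, gate_arrive_and_wait(), worker, run_gated()) is hypothetical, the connection and transaction steps are stubbed out as comments, and a 20 ms nanosleep() before each pthread_create() stands in for slow thread creation. Here the last thread to become ready opens the gate, which has the same effect as the first thread noticing that all the others are ready; pthread_barrier_wait() could serve the same purpose where POSIX barriers are available.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

#define MAX_WORKERS 64

/* Hypothetical one-shot start gate, not pgbench code. */
typedef struct
{
    pthread_mutex_t lock;
    pthread_cond_t  opened;
    int             nthreads;   /* threads expected at the gate */
    int             nready;     /* threads that have arrived so far */
    int             is_open;    /* set once everyone may issue queries */
} start_gate;

/* Caller must hold g->lock. */
static void gate_open(start_gate *g)
{
    g->is_open = 1;
    pthread_cond_broadcast(&g->opened);
}

static void gate_arrive_and_wait(start_gate *g)
{
    pthread_mutex_lock(&g->lock);
    if (++g->nready == g->nthreads)
        gate_open(g);           /* last thread to get ready releases all */
    while (!g->is_open)         /* loop guards against spurious wakeups */
        pthread_cond_wait(&g->opened, &g->lock);
    pthread_mutex_unlock(&g->lock);
}

static double now_sec(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

typedef struct
{
    start_gate *gate;
    double      ready_at;       /* setup (connections) finished */
    double      go_at;          /* allowed to start issuing queries */
} worker;

static void *worker_main(void *arg)
{
    worker     *w = arg;

    /* ... open database connections, send startup queries ... */
    w->ready_at = now_sec();
    gate_arrive_and_wait(w->gate);
    w->go_at = now_sec();       /* per-thread timing would start here */
    /* ... issue transactions ... */
    return NULL;
}

/* Start n workers, pausing 20 ms before each pthread_create(). */
static int run_gated(worker *ws, int n)
{
    start_gate  gate = {.nthreads = n};
    pthread_t   tids[MAX_WORKERS];
    struct timespec stagger = {0, 20L * 1000 * 1000};
    int         created = 0;
    int         rc = 0;

    if (n < 1 || n > MAX_WORKERS)
        return -1;
    pthread_mutex_init(&gate.lock, NULL);
    pthread_cond_init(&gate.opened, NULL);

    for (; created < n; created++)
    {
        ws[created].gate = &gate;
        nanosleep(&stagger, NULL);
        if (pthread_create(&tids[created], NULL, worker_main,
                           &ws[created]) != 0)
        {
            /* don't leave already-started workers waiting forever */
            pthread_mutex_lock(&gate.lock);
            gate_open(&gate);
            pthread_mutex_unlock(&gate.lock);
            rc = -1;
            break;
        }
    }
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    pthread_cond_destroy(&gate.opened);
    pthread_mutex_destroy(&gate.lock);
    return rc;
}
```

After run_gated() returns, every worker's go_at is no earlier than the latest ready_at, so timing taken after the gate excludes both the staggered thread creation and connection setup.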
--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com