Re: Reduce ProcArrayLock contention - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: Reduce ProcArrayLock contention
Date:
Msg-id: 20150820101940.GA5070@alap3.anarazel.de
In response to: Re: Reduce ProcArrayLock contention (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: Reduce ProcArrayLock contention
List: pgsql-hackers
On 2015-08-20 15:38:36 +0530, Amit Kapila wrote:
> On Wed, Aug 19, 2015 at 9:09 PM, Andres Freund <andres@anarazel.de> wrote:
> > I spent some time today reviewing the committed patch. So far my only
> > major complaint is that I think the comments are only insufficiently
> > documenting the approach taken:
> > Stuff like avoiding ABA type problems by clearing the list entirely and
> > it being impossible that entries end up on the list too early absolutely
> > needs to be documented explicitly.
>
> I think more comments can be added to explain such behaviour if it is
> not clear via looking at current code and comments.

It's not mentioned at all, so yes.

> I think you are right and here we need to use something like what is
> suggested below by you. Originally the code was similar to what you
> have written below, but it was using a different (new) variable to achieve
> what you have achieved with lwWaiting, and to avoid the use of a new
> variable the code has been refactored in the current way. I think we should
> do this change (I can write a patch) unless Robert feels otherwise.

I think we can just rename lwWaiting to something more generic.

> > Consider what happens if such a follower enqueues in another
> > transaction. It is not, as far as I could find out, guaranteed on all
> > types of cpus that a third backend can actually see nextClearXidElem
> > as INVALID_PGPROCNO. That'd likely require SMT/HT cores and multiple
> > sockets. If the write to nextClearXidElem is entered into the local
> > store buffer (leader #1) a hyper-threaded process (leader #2) can
> > possibly see it (store forwarding) while another backend doesn't
> > yet.
> >
> > I think this is very unlikely to be an actual problem due to
> > independent barriers until enqueued again, but I don't want to rely
> > on it undocumentedly. It seems safer to replace
> > +       wakeidx = pg_atomic_read_u32(&proc->nextClearXidElem);
> > +       pg_atomic_write_u32(&proc->nextClearXidElem, INVALID_PGPROCNO);
> > with a pg_atomic_exchange_u32().
>
> I didn't follow this point, if we ensure that follower can never return
> before leader wakes it up, then why it will be a problem to update
> nextClearXidElem like above.

Because it doesn't generally enforce that *other* backends have seen the
write as there's no memory barrier.
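For illustration only, the leader's wakeup loop could then look roughly
like the sketch below. This is not the committed code; the loop shape and
the allProcs / MyProc / PGSemaphoreUnlock() usage are assumed to match the
procarray.c of that time. The point is that the exchange has full-barrier
semantics, which the plain read/write pair lacks:

	while (wakeidx != INVALID_PGPROCNO)
	{
		PGPROC	   *proc = &allProcs[wakeidx];

		/*
		 * Fetch the next link and reset it to INVALID_PGPROCNO in one
		 * atomic step.  Unlike a plain read followed by a plain write,
		 * the exchange acts as a full memory barrier, so all backends
		 * are guaranteed to observe the reset before the follower can
		 * enqueue itself again.
		 */
		wakeidx = pg_atomic_exchange_u32(&proc->nextClearXidElem,
										 INVALID_PGPROCNO);

		if (proc != MyProc)
			PGSemaphoreUnlock(&proc->sem);
	}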
> > +/*
> > + * ProcArrayGroupClearXid -- group XID clearing
> > + *
> > + * When we cannot immediately acquire ProcArrayLock in exclusive mode at
> > + * commit time, add ourselves to a list of processes that need their XIDs
> > + * cleared.  The first process to add itself to the list will acquire
> > + * ProcArrayLock in exclusive mode and perform ProcArrayEndTransactionInternal
> > + * on behalf of all group members.  This avoids a great deal of context
> > + * switching when many processes are trying to commit at once, since the lock
> > + * only needs to be handed from the last share-locker to one process waiting
> > + * for the exclusive lock, rather than to each one in turn.
> > + */
> > +static void
> > +ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid)
> > +{
> >
> > This comment, in my opinion, is rather misleading. If you have a
> > workload that primarily is slowed down due to transaction commits, this
> > patch doesn't actually change the number of context switches very
> > much. Previously all backends enqueued in the lwlock and got woken up
> > one-by-one. Save for backends 'jumping' the queue while a lock has been
> > released but the woken-up backend doesn't yet run, there'll be exactly
> > as many context switches as today.
>
> I think this is debatable as in previous mechanism all the backends
> one-by-one clears their transaction id's (which is nothing but context
> switching) which lead to contention not only with read lockers
> but Write lockers as well.

Huh? You can benchmark it, there's barely any change in the number of
context switches. I am *not* saying that there's no benefit to the
patch, I am saying that context switches are the wrong explanation.
Reduced contention (due to shorter lock holding times, fewer cacheline
moves etc.) is the reason this is beneficial.

> > I don't think it's a good idea to use the variable name in PROC_HDR and
> > PGPROC, it's confusing.
>
> What do you mean by this, are you not happy with variable name?

Yes. I think it's a bad idea to have the same variable name in PROC_HDR
and PGPROC.

struct PGPROC
{
	...
	/* Support for group XID clearing. */
	volatile pg_atomic_uint32	nextClearXidElem;
	...
}

typedef struct PROC_HDR
{
	...
	/* First pgproc waiting for group XID clear */
	volatile pg_atomic_uint32	nextClearXidElem;
	...
}

PROC_HDR's variable imo isn't well named.
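Purely as an illustration (the name here is only an example, not a
concrete proposal), giving the PROC_HDR field a name that says "head of
the wait list" rather than reusing the per-process link's name would make
the distinction obvious:

typedef struct PROC_HDR
{
	...
	/* First pgproc waiting for group XID clearing */
	volatile pg_atomic_uint32	firstClearXidElem;
	...
}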
Greetings,

Andres Freund