Re: A long-running transaction - Mailing list pgsql-sql
From: Andrew Sullivan
Subject: Re: A long-running transaction
Msg-id: 20070411154923.GA22771@phlogiston.dyndns.org
In response to: Re: A long-running transaction (John Summerfield <postgres@herakles.homelinux.org>)
Responses: Re: A long-running transaction
List: pgsql-sql
On Wed, Apr 11, 2007 at 10:23:34PM +0800, John Summerfield wrote:

> Each row's identified by a key, and the key doesn't change.

That doesn't matter.

> ADABAS would put the updated record right back where it came from, it

That isn't how PostgreSQL works. I'm having a hard time laying my hands on the bits of the docs that explain this (anyone?), but MVCC stands for Multi-Version Concurrency Control. When you write a row, it has a version number. That version number tells other transactions which version of the row they may look at.

Here's a _really oversimplified_ explanation, at a conceptual level. [Note to experts: please don't correct details of how this is wrong. I know that. But John's problem is that he doesn't have this conceptually yet. The following explanation has worked for me in the past in getting the idea across first, so that the details can later make sense.]

Suppose you have two transactions, t1 and t2. They start at approximately the same time. I'm assuming they're READ COMMITTED. Suppose t1 updates row R. That version of R has a number, R(t1). Suppose at the same time, t2 is looking at R. The _version_ of R that t2 sees is R(tn), where n<1. This way, t2 does not have to wait on t1 in order for t2 to proceed (unless t2 actually modifies R; ignore that for the sake of this explanation, please). The way this works is that, in marking R(t1), the system says "R(t1) is valid for transactions that committed after t1". Since t2 hasn't committed, it sees the old version.

So when t1 commits, there are _two_ versions of R: R(t1) and R(tn), n<1. When all transactions have committed such that there is no (tn), n<1, the old version is marked dead, and its space can be recovered by VACUUM. (This is one of the main things VACUUM does. The idea is to move the housekeeping of concurrency control out of the main transaction. Oracle does something conceptually similar, except using rollback segments, which is why long-running transactions can exhaust the supply of rollback segments on Oracle.)

Now, you're thinking, "But this is all in one transaction!" Yes, but the implementation details make it impossible for t1 to rewrite the same physical row over and over again, so rewriting the same row again and again actually creates huge numbers of dead row versions. The unfortunate thing is that you have to follow the whole dead-row chain to find the currently live version of your row. The upshot is that updating the same row over and over in one transaction will make your transaction go slower on each round, until you are nearly stopped. That's why a lot of updates of the sort you describe are in fact the worst case under Postgres. EnterpriseDB seems to have a proposal (which I believe is going to make 8.3) that will mitigate, but not completely solve, some of this.

> user  : 1d  3:17:04.03 16.0%  page in : 131097310  disk 1: 3079516r 20087664w
> nice  :     0:05:39.64  0.1%  page out: 197016649
> system: 2d 20:38:37.13 40.1%  page act:  87906251
>
> That line above. The way I figure it the kernel's doing an enormous
> amount of work handling its buffers.

No, I think it's doing an enormous amount of I/O, because it has to keep looking for these new rows (and scanning over the old areas while it's at it). This is system time, if I recall correctly through the haze that is now my understanding of Linux I/O, because it's down at the I/O subsystem and can't actually be done by the user program. (Feel free to correct me on this. But I still bet it's dead rows.)

> Since all the work is within one transaction, there's no real need for
> postgresql to write data at all, unless it's running out of memory.

No, I'm sure this has to do with the multi-version row writing. Your alternatives are OK, but I'd be more tempted to commit a range of data in a timestamped series, and then join on a subselect picking the latest insert (and add another daemon to knock old values off the bottom of the list).
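A few concrete sketches might help here. All table names and values below are invented, and none of this is tested code; treat it as illustration only. First, the two-transaction visibility story from the explanation above, run as two separate psql sessions against some little pre-existing table r(k, v):

  -- Session t2 (READ COMMITTED is the default):
  BEGIN;
  SELECT v FROM r WHERE k = 1;          -- sees the old version; does not block

  -- Session t1, while t2's transaction is still open:
  BEGIN;
  UPDATE r SET v = v + 1 WHERE k = 1;   -- writes a new version of the row
  COMMIT;

  -- Session t2 again, in the same transaction:
  SELECT v FROM r WHERE k = 1;          -- each statement takes a fresh snapshot,
  COMMIT;                               -- so this one now sees t1's version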
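Second, the dead-row buildup that makes the rewrite-the-same-row pattern so painful. Every UPDATE leaves another dead version behind it, and VACUUM is what eventually reclaims them:

  CREATE TABLE balances (k integer PRIMARY KEY, v numeric);
  INSERT INTO balances VALUES (1, 0);

  BEGIN;
  UPDATE balances SET v = v + 1 WHERE k = 1;  -- new version; the old one is dead
  UPDATE balances SET v = v + 1 WHERE k = 1;  -- another version, another dead row
  -- ...repeat a few hundred thousand times, and every UPDATE must walk an
  -- ever-longer chain of dead versions to find the live row...
  COMMIT;

  VACUUM VERBOSE balances;  -- reports how many dead row versions it removed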
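And third, roughly what I mean by committing a timestamped series and joining on a subselect for the latest insert:

  CREATE TABLE readings (
      k  integer,
      v  numeric,
      ts timestamptz NOT NULL DEFAULT now()
  );

  -- Writers only ever INSERT (committing in batches); no UPDATEs:
  INSERT INTO readings (k, v) VALUES (1, 42);

  -- Readers pick the newest row per key:
  SELECT r.k, r.v
  FROM readings r
  JOIN (SELECT k, max(ts) AS ts
        FROM readings
        GROUP BY k) latest ON latest.k = r.k AND latest.ts = r.ts;

  -- The housekeeping daemon knocks old values off the bottom of the list:
  DELETE FROM readings WHERE ts < now() - interval '1 day';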
> In both cases, I am using default values. Unless someone thinks there's
> clearly something to be gained by changing them, I won't.

The PostgreSQL install defaults are there to get things started, not to perform well. There is a long-standing debate in the community about whether that's a good idea (and the defaults are way more sensible than they used to be), but in general you won't get really good performance from the default configuration of a PostgreSQL installation. Here is a possibly useful bit of reading to start with on tuning, although it's getting long in the tooth:

http://www.powerpostgresql.com/Downloads/annotated_conf_80.html

Also http://varlena.com/varlena/GeneralBits/Tidbits/perf.html

And the -performance list will probably help you more than this one.

Hope this helps.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
Users never remark, "Wow, this software may be buggy and hard
to use, but at least there is a lot of code underneath."
        --Damien Katz