Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers
From: Christopher Browne
Subject: Re: vacuum, performance, and MVCC
Date:
Msg-id: 87veqtbcq3.fsf@wolfe.cbbrowne.com
In response to: vacuum, performance, and MVCC ("Mark Woodward" <pgsql@mohawksoft.com>)
List: pgsql-hackers
After a long battle with technology, pgsql@mohawksoft.com ("Mark
Woodward"), an earthling, wrote:
>> Clinging to sanity, pgsql@mohawksoft.com ("Mark Woodward") mumbled into
>> her beard:
>>> We all know that PostgreSQL suffers performance problems when rows are
>>> updated frequently prior to a vacuum. The most serious example can be
>>> seen by using PostgreSQL as a session handler for a busy web site. You
>>> may have thousands or millions of active sessions, each being updated
>>> per page hit.
>>>
>>> Each time the record is updated, a new version is created, thus
>>> lengthening the "correct" version search each time the row is
>>> accessed, until, of course, the next vacuum comes along and corrects
>>> the index to point to the latest version of the record.
>>>
>>> Is that a fair explanation?
>>
>> No, it's not.
>>
>> 1. The index points to all the versions, until they get vacuumed out.
>
> It can't point to "all" versions; it points to the last "current"
> version as updated by vacuum, or the first version of the row.

No, it points to *all* the versions.
Suppose I take a table with two rows:

INFO:  analyzing "public.test"
INFO:  "test": 1 pages, 2 rows sampled, 2 estimated total rows
VACUUM

Then, over and over, I remove and insert one entry with the same PK:

sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842550 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842551 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842552 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842553 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842554 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842555 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842556 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842557 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842558 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842559 1

Now, I vacuum it.

sample=# vacuum verbose analyze test;
INFO:  vacuuming "public.test"
INFO:  index "test_id_key" now contains 2 row versions in 2 pages
DETAIL:  10 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "test": removed 10 row versions in 1 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "test": found 10 removable, 2 nonremovable row versions in 1 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.test"
INFO:  "test": 1 pages, 2 rows sampled, 2 estimated total rows
VACUUM

Notice that the index contained 10 versions of that one row.
It pointed to *ALL* the versions.

>> 2. There may simultaneously be multiple "correct" versions. The
>> notion that there is one version that is The Correct One is wrong,
>> and you need to get rid of that thought.
>
> Sorry, this is a misunderstanding. By "correct version search" it was
> implied "for this transaction." Later I mention finding the first row
> with a transaction lower than the current.

Ah. Then you would need each transaction to spawn an index for each
table that excludes non-current values.

>>> If my assertion is fundamentally true, then PostgreSQL will always
>>> suffer performance penalties under a heavy modification load. Of
>>> course, tables with many inserts are not an issue; it is mainly
>>> updates. The problem is that there are classes of problems where
>>> updates are the primary operation.
>>
>> The trouble with your assertion is that it is true for *all* database
>> systems except for those whose only transaction mode is READ
>> UNCOMMITTED, where the only row visible is the "Latest" version.
>
> Not true. Oracle does not seem to exhibit this problem.

Oracle suffers a problem in this regard that PostgreSQL doesn't: in
Oracle, rollbacks are quite expensive, as "recovery" requires doing
extra work that PostgreSQL doesn't do.
-- 
output = ("cbbrowne" "@" "gmail.com") http://cbbrowne.com/info/
Marriage means commitment.  Of course, so does insanity.
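[Editor's note] The "multiple simultaneously correct versions" point above can be made concrete with a toy model. The sketch below is an illustration only, not PostgreSQL's actual code: it reduces visibility to "created by an earlier transaction and not yet deleted by one," ignoring in-progress and aborted transactions, which the real visibility rules must handle. All names (`TupleVersion`, `visible`) are invented for the example.

```python
# Toy model of MVCC tuple visibility (illustration only; PostgreSQL's
# real rules handle in-progress, aborted, and subtransaction states).
from dataclasses import dataclass
from typing import Optional

@dataclass
class TupleVersion:
    value: str
    xmin: int                   # transaction id that created this version
    xmax: Optional[int] = None  # transaction id that deleted/updated it

def visible(tup: TupleVersion, snapshot_xid: int) -> bool:
    """A version is visible to a snapshot if created by a transaction at
    or before the snapshot and not deleted by one at or before it."""
    created = tup.xmin <= snapshot_xid
    deleted = tup.xmax is not None and tup.xmax <= snapshot_xid
    return created and not deleted

# One logical row updated twice: three versions, *all* of which stay in
# the heap (and are pointed to by the index) until vacuum removes the
# dead ones.
versions = [
    TupleVersion("v1", xmin=100, xmax=200),
    TupleVersion("v2", xmin=200, xmax=300),
    TupleVersion("v3", xmin=300),
]

# An old snapshot (xid 150) and a new one (xid 350) each see a
# different -- and equally "correct" -- version of the same row.
old_view = [t.value for t in versions if visible(t, 150)]
new_view = [t.value for t in versions if visible(t, 350)]
print(old_view)  # ['v1']
print(new_view)  # ['v3']

# "Vacuum": versions dead to the oldest live snapshot can be removed.
oldest_live_snapshot = 350
versions = [t for t in versions if visible(t, oldest_live_snapshot)]
print(len(versions))  # 1
```

The point of the model is exactly the one made above: there is no single "The Correct One" version to index; which version is correct depends on the asking transaction's snapshot, so the index must cover them all until vacuum can prove no live snapshot needs the dead ones.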