Home > mailing lists

Performance tuning - Mailing list pgsql-performance

From	Jacques Caron
Subject	Performance tuning
Date	March 11, 2005 15:41:13
Msg-id	6.2.0.14.0.20050311155221.041aa528@pop.interactivemediafactory.net Whole thread Raw
Responses	Questions about 2 databases. Re: Performance tuning
List	pgsql-performance

Tree view

Hi all,

I'm preparing a set of servers which will eventually need to handle a high
volume of queries (both reads and writes, but most reads are very simple
index-based queries returning a limited set of rows, when not just one),
and I would like to optimize things as much as possible, so I have a few
questions on the exact way PostgreSQL's MVCC works, and how transactions,
updates and vacuuming interact. I hope someone will be able to point me in
the right direction (feel free to give pointers if I missed the places
where this is described).

 From what I understand (and testing confirms it), bundling many queries in
one single transaction is more efficient than having each query be a
separate transaction (like with autocommit on). However, I wonder about the
limits of this:

- are there any drawbacks to grouping hundreds or thousands of queries
(inserts/updates) over several minutes in one single transaction? Other
than the fact that the inserts/updates will not be visible until committed,
of course. Essentially turning autocommit off, and doing a commit once in a
while.

- does this apply only to inserts/selects/updates or also for selects?
Another way to put this is: does a transaction with only one select
actually have much transaction-related work to do? Or, does a transaction
with only selects actually have any impact anywhere? Does it really leave a
trace anywhere? Again, I understand that selects grouped in a transaction
will not see updates done after the start of the transaction (unless done
by the same process).

- if during a single transaction several UPDATEs affect the same row, will
MVCC generate as many row versions as there are updates (like would be the
case with autocommit) or will they be grouped into one single row version?

Another related issue is that many of the tables are indexed on a date
field, and one process does a lot of updates on "recent" rows (which lead
to many dead tuples), but after that "older" rows tend to remain pretty
much unchanged for quite a while. Other than splitting the tables into
"old" and "recent" tables, is there any way to make vacuum more efficient?
Scanning the whole table for dead tuples when only a small portion of the
table actually has any does not feel like being very efficient in this
situation.

Other issue: every five minutes or so, I see a noticeable performance drop
as PostgreSQL checkpoints. This is 7.4.3 with pretty lousy hardware, I know
8.0 with decent hardware and separate disk(s) for pg_xlog will definitely
help, but I really wonder if there is any way to reduce the amount of work
that needs to be done at that point (I'm a strong believer of fixing
software before hardware). I have already bumped checkpoint_segments to 8,
but I'm not quite sure I understand how this helps (or doesn't help)
things. Logs show 3 to 6 "recycled transaction log file" lines at that
time, that seems quite a lot of work for a load that's still pretty low.
Does grouping of more queries in transactions help with this? Are there
other parameters that can affect things, or is just a matter of how much
inserts/updates/deletes are done, and the amount of data that was changed?

Last point: some of the servers have expandable data (and will be
replicated with slony-I) and will run with fsync off. I have read
conflicting statements as to what exactly this does: some sources indicate
that setting fsync off actually switches off WAL/checkpointing, others that
it just prevents the fsync (or equivalent) system calls. Since I still see
checkpointing in that case, I guess it's not exactly the former, but I
would love to understand more about it. Really, I would love to be able to
set some tables or databases to "go as fast as you can and don't worry
about transactions, MVCC or anything like that", but I'm not sure that
option exists...

Thanks,

Jacques.

pgsql-performance by date:

From: John A Meinel
Date: 11 March 2005, 14:43:59
Subject: Re: What is the number of rows in explain?

From: "Lou O'Quin"
Date: 11 March 2005, 18:48:31
Subject: Query performance

Performance tuning - Mailing list pgsql-performance

Previous

Next