Re: Help tuning postgres - Mailing list pgsql-performance
From | Csaba Nagy |
---|---|
Subject | Re: Help tuning postgres |
Date | |
Msg-id | 1129648896.27587.22.camel@coppola.muc.ecircle.de Whole thread Raw |
In response to | Re: Help tuning postgres (Andrew Sullivan <ajs@crankycanuck.ca>) |
Responses |
Re: Help tuning postgres
|
List | pgsql-performance |
In the light of what you've explained below about "nonremovable" row versions reported by vacuum, I wonder if I should worry about the following type of report: INFO: vacuuming "public.some_table" INFO: "some_table": removed 29598 row versions in 452 pages DETAIL: CPU 0.01s/0.04u sec elapsed 18.77 sec. INFO: "some_table": found 29598 removable, 39684 nonremovable row versions in 851 pages DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages are entirely empty. CPU 0.02s/0.07u sec elapsed 23.16 sec. VACUUM Does that mean that 39684 nonremovable pages are actually the active live pages in the table (as it reports 0 dead) ? I'm sure I don't have any long running transaction, at least according to pg_stats_activity (backed by the linux ps too). Or I should run a vacuum full... This table is one of which has frequently updated rows. TIA, Csaba. On Thu, 2005-10-13 at 14:40, Andrew Sullivan wrote: > On Thu, Oct 13, 2005 at 10:15:03AM +0200, Csaba Nagy wrote: > > > > OK, this sounds interesting, but I don't understand: why would an update > > "chase down a lot of dead tuples" ? Should I read up on some docs, cause > > I obviously don't know enough about how updates work on postgres... > > Right. Here's the issue: > > MVCC does not replace rows when you update. Instead, it marks the > old row as expired, and sets the new values. The old row is still > there, and it's available for other transactions who need to see it. > As the docs say (see > <http://www.postgresql.org/docs/8.0/interactive/transaction-iso.html>), > "In effect, a SELECT query sees a snapshot of the database as of the > instant that that query begins to run." And that can be true because > the original data is still there, although marked as expired for > subsequent transactions. > > UPDATE works the same was as SELECT in terms of searching for rows > (so does any command that searches for data). > > Now, when you select data, you actually have to traverse all the > existing versions of the tuple in order to get the one that's live > for you. This is normally not a problem: VACUUM goes around and > cleans out old, expired data that is not live for _anyone_. It does > this by looking for the oldest transaction that is open. (As far as > I understand it, this is actually the oldest transaction in the > entire back end; but I've never understood why that should the the > case, and I'm too incompetent/dumb to understand the code, so I may > be wrong on this point.) If you have very long-running transactions, > then, you can end up with a lot of versions of dead tuples on the > table, and so reading the few records you want can turn out actually > to be a very expensive operation, even though it ought to be cheap. > > You can see this by using the VERBOSE option to VACUUM: > > test=# VACUUM VERBOSE eval1 ; > INFO: vacuuming "public.eval1" > INFO: "eval1": found 0 removable, 0 nonremovable row versions in 0 > pages > DETAIL: 0 dead row versions cannot be removed yet. > There were 0 unused item pointers. > 0 pages are entirely empty. > CPU 0.00s/0.00u sec elapsed 0.00 sec. > INFO: vacuuming "pg_toast.pg_toast_18831" > INFO: index "pg_toast_18831_index" now contains 0 row versions in 1 > pages > DETAIL: 0 index pages have been deleted, 0 are currently reusable. > CPU 0.00s/0.00u sec elapsed 0.00 sec. > INFO: "pg_toast_18831": found 0 removable, 0 nonremovable row > versions in 0 pages > DETAIL: 0 dead row versions cannot be removed yet. > There were 0 unused item pointers. > 0 pages are entirely empty. > CPU 0.00s/0.00u sec elapsed 0.00 sec. > VACUUM > > Note those "removable" and "nonremovable" row versions. It's the > unremovable ones that can hurt. WARNING: doing VACUUM on a big table > on a disk that's already pegged is going to cause you performance > pain, because it scans the whole table. In some cases, though, you > have no choice: if the winds are already out of your sails, and > you're effectively stopped, anything that might get you moving again > is an improvement. > > > And how would the analyze help in finding this out ? I thought it would > > only show me additionally the actual timings, not more detail in what > > was done... > > Yes, it shows the actual timings, and the actual number of rows. But > if the estimates that the planner makes are wildly different than the > actual results, then you know your statistics are wrong, and that the > planner is going about things the wrong way. ANALYSE is a big help. > There's also a verbose option to it, but it's usually less useful in > production situations. > > A
pgsql-performance by date: