Thread: vacuum vs open transactions
I'm looking at some 7.3.4 vacuum output, and at first glance it does not appear that vacuum is reclaiming any dead tuple space if there is even a single open transaction, even if the open transaction does not in any way reference the table being vacuumed. Is that correct? Is the behavior different in later versions? Ed
On Wed, 2005-01-12 at 11:59, Ed L. wrote: > I'm looking at some 7.3.4 vacuum output, and at first glance it does not > appear that vacuum is reclaiming any dead tuple space if there is even a > single open transaction, even if the open transaction does not in any way > reference the table being vacuumed. Is that correct? Is the behavior > different in later versions? I believe the problem is occurring if the open transaction is older than the tuples that could be vacuumed. The MVCC system means that as long as a transaction that started X hours ago is still open, the tuples that have been freed since then can't vacuumed because they need to stay visible for that transaction.
On Wednesday January 12 2005 11:10, Scott Marlowe wrote: > > I believe the problem is occurring if the open transaction is older than > the tuples that could be vacuumed. The MVCC system means that as long > as a transaction that started X hours ago is still open, the tuples that > have been freed since then can't vacuumed because they need to stay > visible for that transaction. Is it possible via SQL query to tell how long a transaction has been open? Ed
On Wed, Jan 12, 2005 at 11:49:12AM -0700, Ed L. wrote: > > Is it possible via SQL query to tell how long a transaction has been open? I'm not aware of a way to find out when a transaction started, but if you have stats_command_string enabled then you can query pg_stat_activity to see when a session's current query started. -- Michael Fuhr http://www.fuhr.org/~mfuhr/
On Wed, Jan 12, 2005 at 11:30:50PM -0700, Michael Fuhr wrote: > On Wed, Jan 12, 2005 at 11:49:12AM -0700, Ed L. wrote: > > > > Is it possible via SQL query to tell how long a transaction has been open? > > I'm not aware of a way to find out when a transaction started, but > if you have stats_command_string enabled then you can query > pg_stat_activity to see when a session's current query started. now() returns the current transaction's start time. -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) "No necesitamos banderas No reconocemos fronteras" (Jorge González)
On Thursday January 13 2005 5:50, Alvaro Herrera wrote: > On Wed, Jan 12, 2005 at 11:30:50PM -0700, Michael Fuhr wrote: > > On Wed, Jan 12, 2005 at 11:49:12AM -0700, Ed L. wrote: > > > Is it possible via SQL query to tell how long a transaction has been > > > open? > > > > I'm not aware of a way to find out when a transaction started, but > > if you have stats_command_string enabled then you can query > > pg_stat_activity to see when a session's current query started. > > now() returns the current transaction's start time. That would be perfect if I could see that for transactions other than my own. Ed
On Wednesday January 12 2005 11:30, Michael Fuhr wrote: > On Wed, Jan 12, 2005 at 11:49:12AM -0700, Ed L. wrote: > > Is it possible via SQL query to tell how long a transaction has been > > open? > > I'm not aware of a way to find out when a transaction started, but > if you have stats_command_string enabled then you can query > pg_stat_activity to see when a session's current query started. Yes, I see that in 7.4 (not in 7.3). But my purpose would be to remotely identify long-open transactions that are causing table bloat by making vacuum fail to reclaim space, so it seems I need the transaction start time, not query start time. Most likely this situation occurs when 1) someone starts a transaction in psql and then leaves it there, or 2) an application opens a transaction prior to getting user input. Ed
On Thu, Jan 13, 2005 at 08:45:38AM -0700, Ed L. wrote: > On Thursday January 13 2005 5:50, Alvaro Herrera wrote: > > On Wed, Jan 12, 2005 at 11:30:50PM -0700, Michael Fuhr wrote: > > > On Wed, Jan 12, 2005 at 11:49:12AM -0700, Ed L. wrote: > > > > Is it possible via SQL query to tell how long a transaction has been > > > > open? > > now() returns the current transaction's start time. > > That would be perfect if I could see that for transactions other than my > own. No, there's no way to know that. Unless, of course, you save it yourself somewhere at transaction start (though because of isolation, using a regular table would not help you any. Perhaps some PL's shared data can help you there.) -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) Este mail se entrega garantizadamente 100% libre de sarcasmo.
On Thursday January 13 2005 10:09, Michael Fuhr wrote: > > For idle transactions pg_stat_activity shows "<IDLE> in transaction" > and the query_start column shows when the transaction became idle > (i.e., when the last statement completed). So if long-lived idle > transactions are the problem, then at least you can find out how > long they've been idle. That will help, thanks. Unfortunately, that doesn't appear to work for many of our pre-7.4 clusters, of which we have 60, but that will definitely help as we migrate forward. Ed
On Thu, Jan 13, 2005 at 08:44:56AM -0700, Ed L. wrote: > On Wednesday January 12 2005 11:30, Michael Fuhr wrote: > > > > I'm not aware of a way to find out when a transaction started, but > > if you have stats_command_string enabled then you can query > > pg_stat_activity to see when a session's current query started. > > Yes, I see that in 7.4 (not in 7.3). But my purpose would be to remotely > identify long-open transactions that are causing table bloat by making > vacuum fail to reclaim space, so it seems I need the transaction start > time, not query start time. Most likely this situation occurs when 1) > someone starts a transaction in psql and then leaves it there, or 2) an > application opens a transaction prior to getting user input. For idle transactions pg_stat_activity shows "<IDLE> in transaction" and the query_start column shows when the transaction became idle (i.e., when the last statement completed). So if long-lived idle transactions are the problem, then at least you can find out how long they've been idle. -- Michael Fuhr http://www.fuhr.org/~mfuhr/
On Thu, Jan 13, 2005 at 09:50:38AM -0300, Alvaro Herrera wrote: > On Wed, Jan 12, 2005 at 11:30:50PM -0700, Michael Fuhr wrote: > > > > I'm not aware of a way to find out when a transaction started, but > > if you have stats_command_string enabled then you can query > > pg_stat_activity to see when a session's current query started. > > now() returns the current transaction's start time. I meant when any transaction started, particularly a transaction other than the current one. -- Michael Fuhr http://www.fuhr.org/~mfuhr/
On Thu, Jan 13, 2005 at 08:44:56AM -0700, Ed L. wrote: > Yes, I see that in 7.4 (not in 7.3). But my purpose would be to remotely > identify long-open transactions that are causing table bloat by making > vacuum fail to reclaim space, so it seems I need the transaction start > time, not query start time. Most likely this situation occurs when 1) > someone starts a transaction in psql and then leaves it there, or 2) an > application opens a transaction prior to getting user input. Wouldn't the transaction ID be more useful. An earlier transaction ID obviously started earlier. So you should be able to identify the oldest transaction. Would the transaction ID field in pg_locks do? Hope this helps, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
On Thu, Jan 13, 2005 at 12:04:28PM -0700, Ed L. wrote: > It'd be nice if pg_stat_activity.transaction_start were added in the future > for a 100% answer, but I'm not sure there's much interest in this apart > from our needs. I wouldn't expect that to be hard to add. Consider submitting a patch or proposing it to the developers. -- Michael Fuhr http://www.fuhr.org/~mfuhr/
On Thu, Jan 13, 2005 at 10:33:50AM -0700, Ed L. wrote: > On Thursday January 13 2005 10:09, Michael Fuhr wrote: > > > > For idle transactions pg_stat_activity shows "<IDLE> in transaction" > > and the query_start column shows when the transaction became idle > > That will help, thanks. Unfortunately, that doesn't appear to work for many > of our pre-7.4 clusters, of which we have 60, but that will definitely help > as we migrate forward. Ah yes, I see in the Relase Notes that the query start time was added in 7.4. -- Michael Fuhr http://www.fuhr.org/~mfuhr/
On Thursday January 13 2005 11:37, Michael Fuhr wrote: > > That'll show which transaction is oldest but not how long it's been > open or idle, i.e., whether it's "long-open" or not. I assumed, > perhaps incorrectly, that he was already looking at pg_locks and > wanted to find out which of those transactions had been open for a > long time. Since pg_locks has a pid column, you can join (visually > or via a join query) with pg_stat_activity's procpid column. What I'm after is a simple way to automatically tell via cron query if there is a transaction staying open long enough (probably an hour) to cause bloat but that doesn't require any visual inspection. While not fool-proof, pg_stat_activity.query_start looks like a 90% answer. It'd be nice if pg_stat_activity.transaction_start were added in the future for a 100% answer, but I'm not sure there's much interest in this apart from our needs. Ed
On Thu, Jan 13, 2005 at 07:11:09PM +0100, Martijn van Oosterhout wrote: > > Wouldn't the transaction ID be more useful. An earlier transaction ID > obviously started earlier. So you should be able to identify the oldest > transaction. Would the transaction ID field in pg_locks do? That'll show which transaction is oldest but not how long it's been open or idle, i.e., whether it's "long-open" or not. I assumed, perhaps incorrectly, that he was already looking at pg_locks and wanted to find out which of those transactions had been open for a long time. Since pg_locks has a pid column, you can join (visually or via a join query) with pg_stat_activity's procpid column. -- Michael Fuhr http://www.fuhr.org/~mfuhr/