Thread: wrong database name in error message?
I'm on 9.2.4 and I have several databases on the instance, say db1, db2,
db3, etc. Today I got this error message on connecting to any of the
databases:

ERROR: database is not accepting commands to avoid wraparound data loss
in database "db1"
Suggestion: Stop the postmaster and use a standalone backend to
vacuum that database.

The error message says the problem is in db1, but I got it when connecting
to any database. After I did VACUUM FULL on some other databases such as
db2 and db3, but not db1, the problem disappeared.

Was that just a wrong database name in the error message?
Rural Hunter wrote:
> I'm on 9.2.4 and I have several databases on the instance, say db1, db2,
> db3, etc. Today I got this error message on connecting to any of the
> databases:
>
> ERROR: database is not accepting commands to avoid wraparound data loss
> in database "db1"
> Suggestion: Stop the postmaster and use a standalone backend to
> vacuum that database.
>
> The error message says the problem is in db1, but I got it when
> connecting to any database. After I did VACUUM FULL on some other
> databases such as db2 and db3, but not db1, the problem disappeared.
> Was that just a wrong database name in the error message?

Likely autovacuum kicked in and took care of the wraparound issue while
the system was handling your manual vacuum routines... but that is just a
theory.

David J.
On 2013/9/14 10:25, David Johnston wrote:
> Likely autovacuum kicked in and took care of the wraparound issue while
> the system was handling your manual vacuum routines... but that is just
> a theory.

I don't think so. I had to use single-user mode to run the VACUUM FULL on
the other dbs.
Rural Hunter wrote:
> I'm on 9.2.4 and I have several databases on the instance, say db1, db2,
> db3, etc. Today I got this error message on connecting to any of the
> databases:
>
> ERROR: database is not accepting commands to avoid wraparound data loss
> in database "db1"
> Suggestion: Stop the postmaster and use a standalone backend to
> vacuum that database.
>
> The error message says the problem is in db1, but I got it when
> connecting to any database. After I did VACUUM FULL on some other
> databases such as db2 and db3, but not db1, the problem disappeared.
> Was that just a wrong database name in the error message?

The ultimate source of truth here is the pg_class and pg_database catalogs
(pg_class in each database stores the age of every table in that database;
pg_database stores the minimum of those values for each database). The
database name you see in the error messages is stored in pg_control
(actually it's the OID that's stored, not the name), but vacuuming other
databases might have updated the pg_control info because of updated
calculations from the shared catalog.

I don't know how to explain the discrepancy other than concurrent
processing by autovacuum, though. Perhaps autovacuum, in the last few XIDs
you had left, processed that database, but the field in pg_control didn't
get updated until after you processed the other databases? Not sure about
this. But it's past my bed time here, so no further speculation from me.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
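For reference, the catalog state described above can be inspected
directly. A minimal sketch, using only standard catalog columns available
on 9.2 (the ordering is arbitrary):

    -- How close each database's oldest transaction ID is to wraparound;
    -- the database with the largest age is normally the one the error
    -- message names.
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;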
Sure, thanks anyway and have a good night. Let me lay out the whole
scenario:

1. I was called by our application users because all updates were failing.
So I went to check the db. Any update transaction, including a manual
vacuum, was blocked by the error message:
ERROR: database is not accepting commands to avoid wraparound data loss
in database "db1"
Suggestion: Stop the postmaster and use a standalone backend to
vacuum that database.

2. Since db1 is a very large database (it is the main db the users are
on), I could not afford the long time a VACUUM FULL on it would take. So I
thought I would try the other, smaller dbs first.

3. I stopped the instance.

4. I used "echo 'vacuum full;' | postgres --single -D /pgdata [other dbs]"
to vacuum some of the other dbs. I still got several warning messages when
vacuuming the first database (let's say db2):
2013-09-14 08:56:44 CST [5536]: [453-1] user=,db=,host= WARNING: database
"db1" must be vacuumed within 999775 transactions
2013-09-14 08:56:44 CST [5536]: [454-1] user=,db=,host= HINT: To avoid a
database shutdown, execute a database-wide VACUUM in that database.
Here the warning message still points to db1.

5. When I ran the single-user-mode vacuum on the remaining dbs (not db1),
there was no error/warning message at all. So I tried to start the whole
instance.

6. I started the instance and found everything was fine.

So I actually have 3 questions here:
1. Was the db name in the error message wrong?
2. How could that happen? Shouldn't autovacuum handle it and avoid such a
problem?
3. How can I detect such a problem earlier?

On 2013/9/14 12:55, Alvaro Herrera wrote:
> The ultimate source of truth here is the pg_class and pg_database
> catalogs (pg_class in each database stores the age of every table in
> that database; pg_database stores the minimum of those values for each
> database). The database name you see in the error messages is stored in
> pg_control (actually it's the OID that's stored, not the name), but
> vacuuming other databases might have updated the pg_control info because
> of updated calculations from the shared catalog. I don't know how to
> explain the discrepancy other than concurrent processing by autovacuum,
> though. Perhaps autovacuum, in the last few XIDs you had left, processed
> that database, but the field in pg_control didn't get updated until
> after you processed the other databases? Not sure about this. But it's
> past my bed time here, so no further speculation from me.
Rural Hunter <ruralhunter@gmail.com> wrote:

> Let me lay out the whole scenario:
> 1. I was called by our application users because all updates were
> failing. So I went to check the db. Any update transaction, including a
> manual vacuum, was blocked by the error message:
> ERROR: database is not accepting commands to avoid wraparound data loss
> in database "db1"
> Suggestion: Stop the postmaster and use a standalone backend to
> vacuum that database.
>
> 2. Since db1 is a very large database (it is the main db the users are
> on), I could not afford the long time a VACUUM FULL on it would take. So
> I thought I would try the other, smaller dbs first.

Why in the world would you want to use VACUUM FULL in this circumstance?

> 3. I stopped the instance.
>
> 4. I used "echo 'vacuum full;' | postgres --single -D /pgdata [other
> dbs]" to vacuum some of the other dbs. I still got several warning
> messages when vacuuming the first database (let's say db2):
> 2013-09-14 08:56:44 CST [5536]: [453-1] user=,db=,host= WARNING:
> database "db1" must be vacuumed within 999775 transactions
> 2013-09-14 08:56:44 CST [5536]: [454-1] user=,db=,host= HINT: To avoid
> a database shutdown, execute a database-wide VACUUM in that database.

WARNING, not ERROR, so the VACUUM would have run.

> Here the warning message still points to db1.

I'm not sure which database would be referenced if the table which needed
the VACUUM was a shared table, like pg_database or pg_authid.

> 5. When I ran the single-user-mode vacuum on the remaining dbs (not
> db1), there was no error/warning message at all. So I tried to start the
> whole instance.
>
> 6. I started the instance and found everything was fine.
>
> So I actually have 3 questions here:
> 1. Was the db name in the error message wrong?

Probably not, to the extent that running VACUUM (FULL is not necessary)
against that database would have solved the problem. If it was a shared
catalog table, it might be that it was not the *only* database which would
work.

> 2. How could that happen? Shouldn't autovacuum handle it and avoid such
> a problem?

There are two possibilities -- either you had a long-running transaction
in the cluster or your autovacuum is not configured to be aggressive
enough to keep you out of trouble.

> 3. How can I detect such a problem earlier?

We would need a description of the machine (cores, RAM, storage system)
and the output of these queries to be able to make good suggestions on
tuning autovacuum:

SELECT version();
SELECT name, current_setting(name), source
FROM pg_settings
WHERE source NOT IN ('default', 'override');

In addition, make sure that you are monitoring for long-running
transactions. A reasonable monitoring scheme might be to alert when either
of these queries returns any rows:

select * from pg_stat_activity
where xact_start < (now() - interval '1 hour');

select * from pg_prepared_xacts
where prepared < (now() - interval '1 minute');

You can, of course, adjust the intervals to what makes the most sense for
your environment. If you have max_prepared_transactions set to zero, the
latter query is not really necessary.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
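On question 3 specifically, a wraparound check could sit alongside those
monitoring queries. A minimal sketch -- the 1.5-billion threshold is an
illustrative assumption, not a value from this thread; pick anything
comfortably below 2^31:

    -- Alert when any database's oldest XID age gets close to the
    -- wraparound cutoff; any row returned means vacuuming is falling
    -- dangerously behind.
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    WHERE age(datfrozenxid) > 1500000000
    ORDER BY xid_age DESC;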
On 2013/9/15 1:06, Kevin Grittner wrote:
> Why in the world would you want to use VACUUM FULL in this circumstance?

I just googled around and found the solution. What's the other option?

> There are two possibilities -- either you had a long-running transaction
> in the cluster or your autovacuum is not configured to be aggressive
> enough to keep you out of trouble.

I checked the running transactions before I stopped the instance. There
was no long-running transaction except one autovacuum that had been
running for about 1 hour.

> We would need a description of the machine (cores, RAM, storage system)
> and the output of these queries to be able to make good suggestions on
> tuning autovacuum:
>
> SELECT version();
> SELECT name, current_setting(name), source
> FROM pg_settings
> WHERE source NOT IN ('default', 'override');

I'm on Ubuntu 12.04.1 64-bit with 32 cores and 377GB of memory. The data
is stored on several RAID10 SAS 15k disk arrays.

postgres=# SELECT version();
                                                    version
---------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-52), 64-bit
(1 row)

postgres=# SELECT name, current_setting(name), source
postgres-# FROM pg_settings
postgres-# WHERE source NOT IN ('default', 'override');
            name             |                                  current_setting                                    |        source
-----------------------------+-------------------------------------------------------------------------------------+----------------------
 application_name            | psql.bin                                                                            | client
 archive_command             | test ! -f /dbbk/postgres/logarch/%f.gz && gzip -c %p >/dbbk/postgres/logarch/%f.gz | configuration file
 archive_mode                | on                                                                                  | configuration file
 autovacuum                  | on                                                                                  | configuration file
 autovacuum_freeze_max_age   | 2000000000                                                                          | configuration file
 checkpoint_segments         | 20                                                                                  | configuration file
 client_encoding             | UTF8                                                                                | client
 DateStyle                   | ISO, YMD                                                                            | configuration file
 default_text_search_config  | chinesecfg                                                                          | configuration file
 effective_cache_size        | 100GB                                                                               | configuration file
 full_page_writes            | off                                                                                 | configuration file
 lc_messages                 | zh_CN.utf8                                                                          | configuration file
 lc_monetary                 | zh_CN.utf8                                                                          | configuration file
 lc_numeric                  | zh_CN.utf8                                                                          | configuration file
 lc_time                     | zh_CN.utf8                                                                          | configuration file
 listen_addresses            | *                                                                                   | configuration file
 log_autovacuum_min_duration | 30min                                                                               | configuration file
 log_destination             | stderr                                                                              | configuration file
 log_line_prefix             | %t [%p]: [%l-1] user=%u,db=%d,host=%h                                               | configuration file
 log_min_duration_statement  | 10s                                                                                 | configuration file
 log_statement               | ddl                                                                                 | configuration file
 log_timezone                | PRC                                                                                 | configuration file
 logging_collector           | on                                                                                  | configuration file
 maintenance_work_mem        | 20GB                                                                                | configuration file
 max_connections             | 2500                                                                                | configuration file
 max_stack_depth             | 2MB                                                                                 | environment variable
 max_wal_senders             | 1                                                                                   | configuration file
 port                        | 3500                                                                                | configuration file
 shared_buffers              | 16GB                                                                                | configuration file
 synchronous_commit          | off                                                                                 | configuration file
 TimeZone                    | PRC                                                                                 | configuration file
 track_activities            | on                                                                                  | configuration file
 track_counts                | on                                                                                  | configuration file
 vacuum_freeze_table_age     | 1000000000                                                                          | configuration file
 wal_buffers                 | 16MB                                                                                | configuration file
 wal_level                   | hot_standby                                                                         | configuration file
 work_mem                    | 8MB                                                                                 | configuration file
(37 rows)
On Sat, Sep 14, 2013 at 6:05 PM, Rural Hunter <ruralhunter@gmail.com> wrote:
> On 2013/9/15 1:06, Kevin Grittner wrote:
>> Why in the world would you want to use VACUUM FULL in this
>> circumstance?
>
> I just googled around and found the solution. What's the other option?

A plain "VACUUM" will suffice -- there is no need to also use the FULL
keyword in this case (wraparound).
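Concretely, step 4 from the earlier message would become something like
this (a sketch reusing the /pgdata path from that message):

    # Plain VACUUM in single-user mode; FULL is not needed for wraparound.
    echo 'VACUUM;' | postgres --single -D /pgdata db1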
Rural Hunter wrote:
> 2. Since db1 is a very large database (it is the main db the users are
> on), I could not afford the long time a VACUUM FULL on it would take. So
> I thought I would try the other, smaller dbs first.
>
> 3. I stopped the instance.
>
> 4. I used "echo 'vacuum full;' | postgres --single -D /pgdata [other
> dbs]" to vacuum some of the other dbs.

Two things. One is that you don't need VACUUM FULL, as already pointed
out; plain VACUUM suffices. The other is that you don't actually need to
vacuum all tables, only those with a very old pg_class.relfrozenxid. The
one with the oldest value is what feeds pg_database.datfrozenxid, and that
in turn is what feeds the "must be vacuumed within XY transactions"
messages. So you can just connect to db1, examine pg_class looking for
tables whose age(relfrozenxid) is old, and vacuum only those. No need for
downtime.

Now the interesting question is why autovacuum didn't get to these.
Normally it does, but when there's conflicting activity (say you have a
periodic ALTER TABLE for some reason) it might not be able to. Check the
log for ERRORs that made autovacuum kill itself, for example. In the long
run, the best solution is to not have such conflicting activity in the
first place.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
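A minimal sketch of that per-table check, run while connected to db1 (the
LIMIT is arbitrary; relkind 'r' restricts it to ordinary tables):

    -- Tables with the oldest relfrozenxid; a plain VACUUM of these
    -- advances pg_database.datfrozenxid for this database.
    SELECT c.oid::regclass AS table_name, age(c.relfrozenxid) AS xid_age
    FROM pg_class c
    WHERE c.relkind = 'r'
    ORDER BY xid_age DESC
    LIMIT 20;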
Rural Hunter <ruralhunter@gmail.com> wrote:

> I'm on Ubuntu 12.04.1 64-bit with 32 cores and 377GB of memory. The data
> is stored on several RAID10 SAS 15k disk arrays.

With a machine that beefy, I have found it necessary to make the
autovacuum settings more aggressive. Otherwise the need for vacuuming can
outpace the ability of autovacuum to keep up.

> autovacuum_freeze_max_age   | 2000000000 | configuration file
> vacuum_freeze_table_age     | 1000000000 | configuration file

There's your problem. You left so little space between when autovacuum
would kick in for wraparound prevention (2 billion transactions) and when
the server prevents new transactions in order to protect your data
(2^31 - 1000000 transactions) that autovacuum didn't have enough time to
complete its effort to do so.

Changing a setting to ten times its default value is something which
should always be approached with caution. In this case you changed the
threshold for starting the work to prevent data loss from a little under
10% of the distance to the disastrous condition to a little under 100% of
that distance.

You could play with non-standard settings for these, but if you go
anywhere near this extreme you risk downtime like you have just
experienced. Personally, I have never had a reason to change these from
the defaults.

To ensure that autovacuum can keep up with the activity on a machine like
this, I have generally gone to something like:

autovacuum_cost_limit = 800

If you have more than a couple of large tables which take long enough to
scan to prevent small, frequently-updated tables from getting attention
soon enough, you might want to boost autovacuum_max_workers, too.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
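Put together, that advice amounts to a postgresql.conf change along these
lines (a sketch: the freeze values shown are the stock 9.2 defaults, and
the worker count is an illustrative assumption, not a number given in this
thread):

    # Revert the freeze thresholds to the 9.2 defaults.
    autovacuum_freeze_max_age = 200000000
    vacuum_freeze_table_age = 150000000
    # More vacuum work per cost-based sleep cycle (effective default 200).
    autovacuum_cost_limit = 800
    # Optional, illustrative value: more workers if several large tables
    # keep the small, frequently-updated ones waiting (default is 3).
    #autovacuum_max_workers = 6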
On 2013/9/16 1:31, Kevin Grittner wrote:
> There's your problem. You left so little space between when autovacuum
> would kick in for wraparound prevention (2 billion transactions) and
> when the server prevents new transactions in order to protect your data
> (2^31 - 1000000 transactions) that autovacuum didn't have enough time to
> complete its effort to do so.
>
> Changing a setting to ten times its default value is something which
> should always be approached with caution. In this case you changed the
> threshold for starting the work to prevent data loss from a little under
> 10% of the distance to the disastrous condition to a little under 100%
> of that distance.
>
> You could play with non-standard settings for these, but if you go
> anywhere near this extreme you risk downtime like you have just
> experienced. Personally, I have never had a reason to change these from
> the defaults.

OK, thanks for pointing out the problem. This was changed quite a long
time ago, when I saw too-frequent anti-wraparound autovacuums on a very
busy/large table, which slowed down performance. I will change it back to
the default to see how it works.

> To ensure that autovacuum can keep up with the activity on a machine
> like this, I have generally gone to something like:
>
> autovacuum_cost_limit = 800
>
> If you have more than a couple of large tables which take long enough to
> scan to prevent small, frequently-updated tables from getting attention
> soon enough, you might want to boost autovacuum_max_workers, too.

I will try the parameters you suggested too. So you guys still think the
problem was in db1 (that's my main db), as the error message stated? Just
that autovacuum on db1 kicked in somehow and fixed the problem while I was
running vacuum on the other dbs?
Rural Hunter <ruralhunter@gmail.com> wrote:

> This was changed quite a long time ago, when I saw too-frequent
> anti-wraparound autovacuums on a very busy/large table, which slowed
> down performance. I will change it back to the default to see how it
> works.

There was a long-standing bug which could cause over-frequent
wraparound-prevention autovacuums. As long as you are on the latest minor
release, things should be much better now.

> I will try the parameters you suggested too. So you guys still think the
> problem was in db1, as the error message stated?

Possibly. As I said before, I think the symptoms might better fit a
situation where the table in need of VACUUM was a shared table, and it
just happened to mention db1 because that was the database it was scanning
at the time. (Every database includes the shared system tables in its
catalog.)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
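That theory is checkable after the fact: shared relations are visible from
every database, so a query like this (a sketch using standard pg_class
columns) would show whether a shared catalog was the old one:

    -- XID age of the shared catalogs (pg_database, pg_authid, etc.).
    -- These exist once per cluster, so an old relfrozenxid here could be
    -- reported against whichever database happens to be inspected.
    SELECT c.oid::regclass AS shared_catalog, age(c.relfrozenxid) AS xid_age
    FROM pg_class c
    WHERE c.relisshared AND c.relkind = 'r'
    ORDER BY xid_age DESC;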
On 2013/9/17 0:02, Kevin Grittner wrote:
> Possibly. As I said before, I think the symptoms might better fit a
> situation where the table in need of VACUUM was a shared table, and it
> just happened to mention db1 because that was the database it was
> scanning at the time. (Every database includes the shared system tables
> in its catalog.)

OK, that sounds reasonable. Thanks a lot for your advice and explanation.