Re: Problem after VACUUM ANALYZE - Mailing list pgsql-general
From | Richard Huxton
Subject | Re: Problem after VACUUM ANALYZE
Date |
Msg-id | 47FB91EB.1030004@archonet.com
In response to | Problem after VACUUM ANALYZE (mljv@planwerk6.de)
Responses | Re: Problem after VACUUM ANALYZE, Re: Problem after VACUUM ANALYZE
List | pgsql-general
mljv@planwerk6.de wrote:
> We looked in our cpu monitoring and saw that we have huge IOwait while
> VACUUM is running, not unusual though. But just after VACUUM was finished,
> the userCPU load rose to 200% (dual cpu). At this time in the early
> morning there were only very few users using our app. We analyzed the
> situation further and saw that most queries had a very long duration time
> (minutes instead of milliseconds). BTW: the vacuum process was not running
> at this moment. It was logged before as done.

But WHAT was using all of your cpu? Was it PostgreSQL, and if so just one
backend? If something else was using all your cpu, then it might just be
that the PostgreSQL server wasn't getting a chance to run your queries.

> * Maybe the VACUUM ANALYZE process has not enough memory and therefore
> could not ANALYZE the tables correctly. It then writes wrong statistics
> to the database, which results in wrong execution plans using sequential
> scans instead of index scans. This only happens if the vacuum analyze
> process runs concurrently with user requests. If it runs on its own, the
> vacuum process has enough memory and writes correct statistics to the
> database.

Doesn't sound likely to me. You've got 8GB of RAM, and if you were going
into swap you'd have noticed the disk I/O.

> Here are some of our configuration parameters. We never really tweaked
> them as things ran fine. We just raised some parameters. The following
> list should show all parameters changed from the default:
>
> max_connections = 300
> shared_buffers = 30000
> work_mem = 10240

OK, so that's 30,000 * 8KB = 240MB of shared_buffers.

You have 10MB of work_mem, and if all 300 connections were using that much
you'd have committed 3GB of your RAM for that. Of course they'll want more
than just that. Do you really have 300 concurrent connections?

> maintenance_work_mem = 163840

160MB for vacuums - should be OK given how much memory you have and the
fact that it's quiet when you vacuum.

> vacuum_mem = 32000

This is what maintenance_work_mem used to be called. You can delete this
entry.

> max_fsm_pages = 500000

You can track at most 500,000 pages with free space on them. In 8.2+
versions VACUUM VERBOSE will show you how many are currently being used.
Not sure about 8.1.

> bgwriter_lru_percent = 10.0
> bgwriter_lru_maxpages = 100
> bgwriter_all_percent = 5
> bgwriter_all_maxpages = 200
> wal_buffers = 16
> checkpoint_segments = 10

If you have bursts of write activity you might want to increase this.

> checkpoint_warning = 3600
> effective_cache_size = 180000

That's 180,000 * 8KB = 180 * 8MB = 1.4GB. If that's really all you're
using as cache, I'd reduce the number of concurrent connections. Check
free/top and see how much RAM is really being used as disk cache.

> random_page_cost = 3

Might be too high - you don't mention what disks you have.

> stats_command_string = off

If you turn this one on, you'll be able to see the queries each backend is
executing as they happen. Might be useful, but does have some cost.

The crucial thing is to find out exactly what is happening when things get
very slow. Check vmstat and top, look in the pg_locks system table, and if
needs be we can see what strace says a particular backend is doing.

--
  Richard Huxton
  Archonet Ltd
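For the question above about whether all 300 connections are really in use,
pg_stat_activity gives a quick answer, and once stats_command_string is on
the same view shows what each backend is running. A minimal sketch, assuming
the 8.1/8.2 column names (procpid, current_query):

  -- How many backends are actually connected right now
  SELECT count(*) FROM pg_stat_activity;

  -- What each backend is doing; current_query only shows real query text
  -- once stats_command_string = on
  SELECT procpid, usename, query_start, current_query
  FROM pg_stat_activity
  ORDER BY query_start;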
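To test the "ANALYZE wrote bad statistics" theory, one way is to look at the
planner's row estimates and the plan of one of the slow queries before and
after the nightly run. The table and column names below are only
placeholders for one of your own slow queries:

  -- 'my_table' / 'my_column' are placeholders; substitute a real table
  SELECT relname, reltuples, relpages
  FROM pg_class
  WHERE relname = 'my_table';

  -- A switch from an index scan to a sequential scan after the VACUUM
  -- ANALYZE run would point at the statistics
  EXPLAIN ANALYZE
  SELECT * FROM my_table WHERE my_column = 42;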
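It can also be worth confirming what the running server actually has, rather
than what postgresql.conf says, for example:

  SELECT name, setting
  FROM pg_settings
  WHERE name IN ('max_connections', 'shared_buffers', 'work_mem',
                 'maintenance_work_mem', 'max_fsm_pages',
                 'effective_cache_size', 'random_page_cost');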
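For the max_fsm_pages question, on 8.2 a database-wide VACUUM VERBOSE prints
a free-space-map summary at the end of its output, which can be compared
against the 500,000 limit. Run it at a quiet time, since it vacuums
everything:

  -- FSM usage summary appears at the end of the output (8.2+)
  VACUUM VERBOSE;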
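And when things are slow, the pg_locks check could look something like this
(joining on pid = procpid, as in 8.1/8.2) to list any backend waiting on a
lock together with the query it is stuck on:

  SELECT l.pid, l.locktype, l.relation::regclass, l.mode, a.current_query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.procpid = l.pid
  WHERE NOT l.granted;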