Thread: increased load on server
Hi,

I'm using PostgreSQL 8.3 on Ubuntu 8.04 with ltree. I tried to tune postgresql.conf to use more memory. In two years in production everything was fine, but today something went wrong: PostgreSQL processes are eating my server resources, and the load, which was between 0 and 2.5 on my 4-CPU server, grows to 20. When the trouble happens there are a lot of waiting queries on my ltree table.

I tried to restart the PostgreSQL server and to reboot the machine, but 5 minutes later the trouble was back. I did a backup/restore of my database (pg_dump -Fc / pg_restore), and it was fine for 2 hours, but after that the trouble reappeared. I noticed that the "base" directory grows a lot when the trouble begins. I ran 'vacuum', 'vacuum analyze', and 'vacuum full', but the disk usage isn't any lower.

Do you have any clue to help me? It's the weekend and I'm hoping to fix this before Monday.

Thanks.
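P.S. A query along these lines (just a sketch, untested here) should show which relations in "base" account for the growth:

  -- ten largest tables, counting their indexes and TOAST data
  SELECT relname,
         pg_size_pretty(pg_total_relation_size(oid)) AS total_size
    FROM pg_class
   WHERE relkind = 'r'
   ORDER BY pg_total_relation_size(oid) DESC
   LIMIT 10;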
jf <jfmeteo@gmail.com> wrote:

> I tried to tune postgresql.conf to use more memory.

Right before the problems started? As an attempt to deal with the problems? Years ago?

> In two years in production everything was fine, but today
> something went wrong: PostgreSQL processes are eating my server
> resources, and the load, which was between 0 and 2.5 on my 4-CPU
> server, grows to 20.

The output from `vmstat 1` during such an episode might be enlightening.

> When the trouble happens there are a lot of waiting queries on my
> ltree table.

Waiting where? Are you talking about executing queries which are blocked trying to acquire locks on the ltree table?

> I tried to restart the PostgreSQL server and to reboot the machine,
> but 5 minutes later the trouble was back.

Any idea what changed during those 5 minutes?

> I did a backup/restore of my database (pg_dump -Fc / pg_restore),
> and it was fine for 2 hours, but after that the trouble reappeared.

Did you notice anything about the usage during those two hours? Did you take a close look at what queries were being run when things are OK versus when they are not?

> I noticed that the "base" directory grows a lot when the trouble
> begins. I ran 'vacuum', 'vacuum analyze', and 'vacuum full', but
> the disk usage isn't any lower.

Hopefully you are running autovacuum. VACUUM FULL is normally not a good idea in 8.3. What you might want to look for is a lingering transaction which is preventing normal autovacuum from working. Does anything show if you select from pg_prepared_xacts? Are there any rows in pg_stat_activity with a xact_start value older than it should be? One connection sitting in status '<IDLE> in transaction' for a long time can cause all sorts of problems.

If these hints don't point you in the right direction, please review this page for ideas on more information you can supply to help people understand the problem and its cause:

http://wiki.postgresql.org/wiki/Guide_to_reporting_problems

In particular, it would help to know your exact PostgreSQL version, the contents of your postgresql.conf file (minus all comments), the number of connections you have active during the problem periods, and (if possible) listings of pg_stat_activity and pg_locks during a problem episode.

-Kevin
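P.S. Roughly the sort of checks I mean, as a sketch (adjust the column lists to taste):

  -- any lingering prepared transactions?
  SELECT gid, prepared, owner, database
    FROM pg_prepared_xacts;

  -- backends with the oldest open transactions,
  -- including any sitting '<IDLE> in transaction'
  SELECT procpid, usename, xact_start, current_query
    FROM pg_stat_activity
   ORDER BY xact_start
   LIMIT 10;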
Hi,

>> I tried to tune postgresql.conf to use more memory.
> Right before the problems started? As an attempt to deal with the
> problems? Years ago?

Yes, two years ago.

>> PostgreSQL processes are eating my server resources, and the load,
>> which was between 0 and 2.5 on my 4-CPU server, grows to 20.
>
> The output from `vmstat 1` during such an episode might be
> enlightening.

I saw these things during the problem:
- a lot of blocks read per second (peak of 46k) on the disk which only contains /var/lib (the PostgreSQL data, but not pg_xlog)
- a lot of blocks written on the same disk (peak of 20k) and on the system + pg_xlog disk (peak of 25k)
- a peak of 64k PostgreSQL buffers allocated per second

>> When the trouble happens there are a lot of waiting queries on my
>> ltree table.
>
> Waiting where? Are you talking about executing queries which are
> blocked trying to acquire locks on the ltree table?

pg_stat_activity shows only SELECT queries on the ltree table and "<IDLE>" connections. All of those queries had an xact_start value older than it should be.

>> I tried to restart the PostgreSQL server and to reboot the machine,
>> but 5 minutes later the trouble was back.
>
> Any idea what changed during those 5 minutes?

5 minutes: just the time for my users to reconnect to my frontend.

>> I did a backup/restore of my database (pg_dump -Fc / pg_restore),
>> and it was fine for 2 hours, but after that the trouble reappeared.
>
> Did you notice anything about the usage during those two hours? Did
> you take a close look at what queries were being run when things are
> OK versus when they are not?

I think the queries are always pretty much the same.

> In particular, knowing your exact PostgreSQL version,

It was PostgreSQL 8.3.9; after this I upgraded to 8.3.14.

> your postgresql.conf file (minus all comments),

data_directory = '/var/lib/postgresql/8.3/main'
hba_file = '/etc/postgresql/8.3/main/pg_hba.conf'
ident_file = '/etc/postgresql/8.3/main/pg_ident.conf'
external_pid_file = '/var/run/postgresql/8.3-main.pid'
port = 5432
max_connections = 100
shared_buffers = 64MB
work_mem = 16MB
maintenance_work_mem = 128MB
max_fsm_pages = 153600
log_min_duration_statement = 100
autovacuum = on
datestyle = 'iso, dmy'
lc_messages = 'fr_FR.UTF-8'
lc_monetary = 'fr_FR.UTF-8'
lc_numeric = 'fr_FR.UTF-8'
lc_time = 'fr_FR.UTF-8'
default_text_search_config = 'pg_catalog.french'

> the number of connections you have active during the problem periods,

Connections during the problem: peak of 63. During normal use my average is 6.5.

My system:
- RAM: 4 GB
- 2 disks:
  - sdb: contains only /var/lib; /var/lib/postgresql/8.3/main/pg_xlog is a symlink to a partition on the other disk
  - sda: contains partitions for /, /var, /tmp, /boot, /home
- CPU: Intel Xeon 1.6 GHz with 4 cores
- No errors in the Linux system logs: dmesg, /var/log/syslog

Thanks.
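P.S. For what it's worth, I counted the connections with a query more or less like this:

  SELECT datname, count(*) AS connections,
         min(xact_start) AS oldest_xact_start
    FROM pg_stat_activity
   GROUP BY datname;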
jf wrote:

>>> I tried to restart the PostgreSQL server and to reboot the machine,
>>> but 5 minutes later the trouble was back.
>>
>> Any idea what changed during those 5 minutes?

> 5 minutes: just the time for my users to reconnect to my frontend.

> max_fsm_pages = 153600

Probably not the main problem, but you should be sure to schedule a VACUUM ANALYZE of the whole database (run as the database superuser) and check the last few lines to make sure the free space manager has enough space to prevent bloat.

> I saw these things during the problem:
> - a lot of blocks read per second (peak of 46k) on the disk which
>   only contains /var/lib (the PostgreSQL data, but not pg_xlog)
> - a lot of blocks written on the same disk (peak of 20k) and on the
>   system + pg_xlog disk (peak of 25k)

> max_connections = 100
> shared_buffers = 64MB

> Connections during the problem: peak of 63. During normal use my
> average is 6.5.
>
> My system:
> - RAM: 4 GB
> - 2 disks:
>   - sdb: contains only /var/lib; /var/lib/postgresql/8.3/main/pg_xlog
>     is a symlink to a partition on the other disk
>   - sda: contains partitions for /, /var, /tmp, /boot, /home
> - CPU: Intel Xeon 1.6 GHz with 4 cores
> - No errors in the Linux system logs: dmesg, /var/log/syslog

You need to use some sort of connection pool to funnel those requests through about nine or ten connections. When you have more connections than that active on the hardware you describe, you're going to cut both throughput and response time, just when you don't want those to suffer. You want a pool which has a hard upper limit on the number of connections and will queue requests when at the limit.

Once you've done that you'll be in a better position to determine whether your hardware is hefty enough to handle the load. In particular, I'm concerned that you don't have RAID, both from a performance perspective and for the ability to survive a drive failure. Those do happen.

-Kevin
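P.S. A concrete form of the free space map check, as a sketch (run in psql as the database superuser):

  -- the last few lines of the VERBOSE output report how many page
  -- slots are needed versus the current max_fsm_pages limit
  VACUUM VERBOSE ANALYZE;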
Hi,

I encountered this problem when there were a large number of writes on the disk (the operation involved bulk imports - reads and writes) and the disk configuration at that point showed latency in write operations, which in turn slowed down the reads.

The problem was resolved with the following steps:

1) Changing the disk configuration - it was a RAID 5; I had it replaced by a RAID 10 array. You may not require a large array due to the size of your database, but it will certainly help to have multiple disks.

2) Regular VACUUM (autovacuum, manual VACUUM/ANALYZE on certain tables during the day, and a complete database VACUUM ANALYZE during weekends).

3) Regular reindexing - I schedule reindexes on tables every 3-4 weeks (see the P.S. below for a minimal example).

HTH,

Shrirang Chitnis
Sr. Manager, Applications Development
HOV Services
Office: (866) 808-0935 Ext: 39210
shrirang.chitnis@hovservices.com
www.hovservices.com
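P.S. For point 3, something like this (the table and index names are only placeholders; REINDEX blocks writes to the table while it runs, so schedule it for a quiet window):

  -- rebuild every index on a table
  REINDEX TABLE your_ltree_table;

  -- or rebuild just one index, e.g. the GiST index used by ltree
  REINDEX INDEX your_ltree_gist_index;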