Thread: Backend dies for DBI
I have looked at the FAQs and archives, but haven't seen anything resembling this so please bear with me: Setup: Linux x86, 2.2.16 Apache 1.3.19 mod_perl 1.25 HTML-Mason-0.89 PostgreSQL 7.0.3 DBI-1.14 DBD-Pg-0.95 Mason (which is a template package on top of mod_perl, see http://www.masonhq.com/) opens a connection to the database, using DBI. This handle is then used for subsequent DB access. Mason runs within the web server. This worked flawlessly until I had a power outage a month or so ago. Since then, the backend postmaster process seems to die now and then, and I have been unable to reproduce it consistently. That is, the number of times I access the (dynamic HTML) page (and thus the DB) before the DB crashes, varies. The log file from postmaster simply says: ---> StartTransactionCommand [...] ProcessQuery CommitTransactionCommand proc_exit(0) shmem_exit(0) exit(0) /usr/local/pgsql/bin/postmaster: reaping dead processes... /usr/local/pgsql/bin/postmaster: CleanupProc: pid 1297 exited with status 0 proc_exit(0) shmem_exit(0) exit(0) /usr/local/pgsql/bin/postmaster: reaping dead processes... /usr/local/pgsql/bin/postmaster: CleanupProc: pid 1298 exited with status 0 ---< The postmaster is started like this: /usr/local/pgsql/bin/postmaster -d 2 -D /usr/local/pgsql/data >> /usr/local/pgsql/pgsql.log 2>&1 DBI (I think) sometimes output a message like "Backend message type 0x50 arrived while idle". This can happen several times, and the crash does not occur directly following any of the messages. The DBI error message simply says ---> DBD::Pg::st execute failed: pqReadData() -- backend closed the channel unexpectedly. This probably means the backendterminated abnormally before or while processing the request. ---< The same is true for Apache 1.3.12, mod_perl 1.24. Any pointers and hints so I can figure out how to make the backend not die, are greatly appreciated. Thanks, Bjorn.
Bjorn Solberg <bjorn@geocities.com> writes: > The log file from postmaster simply says: > ---> > StartTransactionCommand > [...] > ProcessQuery > CommitTransactionCommand > proc_exit(0) > shmem_exit(0) > exit(0) > /usr/local/pgsql/bin/postmaster: reaping dead processes... > /usr/local/pgsql/bin/postmaster: CleanupProc: pid 1297 exited with status 0 > proc_exit(0) > shmem_exit(0) > exit(0) > /usr/local/pgsql/bin/postmaster: reaping dead processes... > /usr/local/pgsql/bin/postmaster: CleanupProc: pid 1298 exited with status 0 > ---< This looks absolutely normal --- there's no evidence here for anything except normal backend exit. At least the backend thinks it's normal. You should be looking for client or protocol-level problems. > DBI (I think) sometimes output a message like "Backend message type 0x50 > arrived while idle". This can happen several times, and the crash does > not occur directly following any of the messages. This may indicate running out of memory for a query result on the client side; libpq is not very graceful about dealing with that :-(. You should check on the sizes of query results you are fetching and the amount of memory available to your DBI applications. Fetching a few rows at a time via a CURSOR is a good workaround if that seems to be the problem. regards, tom lane
Tom Lane writes: > Bjorn Solberg <bjorn@geocities.com> writes: >> The log file from postmaster simply says: ---> >> StartTransactionCommand >> [...] >> ProcessQuery >> CommitTransactionCommand >> proc_exit(0) >> shmem_exit(0) >> exit(0) >> /usr/local/pgsql/bin/postmaster: reaping dead processes... >> /usr/local/pgsql/bin/postmaster: CleanupProc: pid 1297 exited with status 0 >> proc_exit(0) >> shmem_exit(0) >> exit(0) >> /usr/local/pgsql/bin/postmaster: reaping dead processes... >> /usr/local/pgsql/bin/postmaster: CleanupProc: pid 1298 exited with status 0 >> ---< > This looks absolutely normal --- there's no evidence here for anything > except normal backend exit. At least the backend thinks it's normal. > You should be looking for client or protocol-level problems. OK. I thought it was abnormal because it only happens when DBI loses connection. >> DBI (I think) sometimes output a message like "Backend message type 0x50 >> arrived while idle". This can happen several times, and the crash does >> not occur directly following any of the messages. > This may indicate running out of memory for a query result on the client > side; libpq is not very graceful about dealing with that :-(. You > should check on the sizes of query results you are fetching and the > amount of memory available to your DBI applications. Fetching a few > rows at a time via a CURSOR is a good workaround if that seems to be > the problem. Thank you for your help, I'll look into this. I do check the result of each query, this is a very small database so I can't quite see how that would affect the system like this. What confuses me is that it ran fine for a couple of months. Then, after the power outage, it crashes every few hours, resulting in having to restart the web server to regain the DB handle. Currently the DB handle is allocated once at (web-)server startup. Maybe I'll change it to do a connect and disconnect for each web page instead. I figured doing it only once and reusing the DB handle would save some time, but maybe it is an unsafe thing to do? Bjorn.
> What confuses me is that it ran fine for a couple of months. Then, after > the power outage, it crashes every few hours, resulting in having to > restart the web server to regain the DB handle. Hmm. Possibly some corrupted data in the database now? regards, tom lane
> -----Mensaje original----- > De: pgsql-interfaces-owner@postgresql.org > [mailto:pgsql-interfaces-owner@postgresql.org]En nombre de Tom Lane > Enviado el: jueves, 08 de marzo de 2001 2:51 > Para: Bjorn Halvor Solberg > CC: pgsql-interfaces@postgresql.org > Asunto: Re: [INTERFACES] Backend dies for DBI > > > > What confuses me is that it ran fine for a couple of months. > Then, after > > the power outage, it crashes every few hours, resulting in having to > > restart the web server to regain the DB handle. > > Hmm. Possibly some corrupted data in the database now? > > regards, tom lane > Recently I had a problem similar to this. I don't know why, once or twice a day, some backend died abnormaly so killing all other backends. The problem was some index got corrupted. Using the command REINDEX TABLE table_name FORCE, it fixed the problem with the index. But the problem started again. Finally I've found that the problem was in NFS. My databases where in another computer (different from the postgres binaries) mounted by NFS. Now I've put postmaster and databases in the same hard disk and now it works. BTW, I have a conexion to the postmaster on every cgi (mod_perl) because I thought that it was not possible to open a conexión from within the httpd.conf and use it for every cgi. > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org >
Tom Lane writes: >> What confuses me is that it ran fine for a couple of months. Then, after >> the power outage, it crashes every few hours, resulting in having to >> restart the web server to regain the DB handle. > Hmm. Possibly some corrupted data in the database now? That thought did come to mind, but manual queries go through just fine. I did vacuum the database, but that didn't help. Maybe using just one DBI database handle is the wrong way to use it? Still, what puzzles me is that it worked fine for months before the power outage. I've changed it to open and close the database handle (connect/ disconnect) for every web page served now, instead of having just one per server process for the lifetime of the process, and haven't seen the problem since. I suppose there are performance issues with that, but the database and nature of the access is such that that isn't important now. :) The database is on the local machine, the same host as the web server is running on, so it is not due to an NFS problem. I suppose there are more people out there using PostgreSQL through DBI in Apache, either through mod_perl, Mason or other means - what do you do? Connect and disconnect per serverd web page, or keep one connection open for each web server process? Thanks for hints and help! Bjorn.
Bjorn Halvor Solberg <bhso@pacbell.net> writes: > What confuses me is that it ran fine for a couple of months. Then, after > the power outage, it crashes every few hours, resulting in having to > restart the web server to regain the DB handle. >> Hmm. Possibly some corrupted data in the database now? > That thought did come to mind, but manual queries go through just fine. > I did vacuum the database, but that didn't help. This is reaching a little bit, but something to think about: maybe you are not running exactly the same software or configuration as you were before the power outage. A well known gotcha on machines that stay up for a long time between boots is that you may have changed something without realizing that the effects would disappear at the next boot (or alternatively, only come into effect at the next boot). For example, starting a daemon without remembering to arrange for the boot scripts to start it, or changing a daemon's config file and neglecting to test the results, or installing a new version of some shlib that's used by a daemon. When a reboot does finally happen, you've forgotten that there was anything to worry about, and so the sudden change in behavior is mysterious. I realize this doesn't give anything concrete to go on, but maybe it will give you an idea of where to look. regards, tom lane