Thread: Re: [NOVICE] URGENT! pg_dump doesn't work! (fwd)
---------- Forwarded message ----------
Date: Tue, 23 Jul 2002 14:39:50 +0100 (BST)
From: Nigel J. Andrews <nandrews@investsystems.co.uk>
To: Wim <wdh@belbone.be>
Subject: Re: [NOVICE] [GENERAL] URGENT! pg_dump doesn't work!

On Tue, 23 Jul 2002, Wim wrote:

> I looked for core files in my data directory, but couldn't find any... I
> installed gdb, since I run postgres on Solaris 9.
> I have no experience with gdb, so could you tell me how I get a trace
> when postgres fails?

What you need to do is:

1) Start psql to connect to your backend, which I assume is running (or you
already know how to do this).

2) Identify the backend (postgres) server process your psql session from 1)
is attached to. Use ps ax | grep postgres to list all backend processes
(possibly ps -fe instead of ps ax). If your database is quiet and only your
psql session is using it, the relevant process is obvious. If your database
is busy, you may find it easier to connect as a user that is not normally
used, for example the database superuser (usually postgres), as this
information should appear in the process listing.

3) Start gdb using: gdb <path to postgres backend binary> <process id from
step 2>. This attaches gdb to the process serving your psql session.

4) At the gdb prompt, type 'c' (continue). This makes the backend process
resume normal operation (gdb stopped it when it attached).

5) In your psql session, issue the statement that causes the fault.

6) When the backend generates its fault you should get a prompt back in your
gdb session. Typing 'bt' at the gdb prompt will give you a stack trace.

Now, the complications.

1) To get a useful stack trace you may need postgres built with debugging
enabled, which would require a reconfigure and rebuild from source if that
hasn't already been done.

2) I'm starting to think I've taken you down the wrong route with this. I'm
no expert on the internals, but I'm not convinced the backend is generating
a fault such that you would get a core file or be able to do the final stage
of the steps above. My apologies for wasting your time if that's the case.
However, as you've already gone to the trouble of installing gdb, it may be
easiest to give the above instructions a go and see if you do indeed get a
gdb prompt in the final step.

Whether this works or not, you should also attempt to obtain a stack trace
from the pg_dump attempt. There should be a way to start pg_dump such that
you have time to determine the backend process for it; however, reading the
manpage I can only see -W as a possibility.

One other question that would be interesting is whether the problem you are
experiencing persists across a backend restart.

Hopefully someone like Tom or Bruce can provide some guidance. Tom already
suggested data corruption.

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants
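In condensed form, steps 1) to 6) look roughly like this (the database name,
binary path and process id are placeholders; substitute the ones from your
own system and ps listing):

    $ psql mydb                                # 1) open a session
    $ ps -fe | grep postgres                   # 2) find the backend serving it
    $ gdb /usr/local/pgsql/bin/postgres 12345  # 3) attach gdb to that backend
    (gdb) c                                    # 4) let the backend run again
    -- 5) in the psql session, issue the statement that causes the fault --
    (gdb) bt                                   # 6) stack trace once it faults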
Nigel J. Andrews wrote:

snip...

> 2) I'm starting to think I've taken you down the wrong route with this.
> I'm no expert on the internals, but I'm not convinced the backend is
> generating a fault such that you would get a core file or be able to do
> the final stage of the steps above. My apologies for wasting your time
> if that's the case. However, as you've already gone to the trouble of
> installing gdb, it may be easiest to give the above instructions a go
> and see if you do indeed get a gdb prompt in the final step. Whether
> this works or not, you should also attempt to obtain a stack trace from
> the pg_dump attempt. There should be a way to start pg_dump such that
> you have time to determine the backend process for it; however, reading
> the manpage I can only see -W as a possibility.
>
> One other question that would be interesting is whether the problem you
> are experiencing persists across a backend restart.
>
> Hopefully someone like Tom or Bruce can provide some guidance. Tom
> already suggested data corruption.

This is a partial output when I enable debugging level 3:

DEBUG:  reaping dead processes
DEBUG:  child process (pid 10917) was terminated by signal 10
DEBUG:  server process (pid 10917) was terminated by signal 10
DEBUG:  terminating any other active server processes
DEBUG:  CleanupProc: sending SIGQUIT to process 10899
NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
DEBUG:  CleanupProc: sending SIGQUIT to process 10898
NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
DEBUG:  child process (pid 10898) exited with exit code 1
DEBUG:  child process (pid 10899) exited with exit code 1
DEBUG:  all server processes terminated; reinitializing shared memory and semaphores
DEBUG:  shmem_exit(0)
invoking IpcMemoryCreate(size=1441792)
DEBUG:  database system was interrupted at 2002-07-23 15:52:57 CEST
DEBUG:  checkpoint record is at 1/1888F010
DEBUG:  redo record is at 1/1888F010; undo record is at 0/0; shutdown TRUE
DEBUG:  next transaction id: 647512; next oid: 12041996
DEBUG:  database system was not properly shut down; automatic recovery in progress
DEBUG:  redo starts at 1/1888F050
DEBUG:  reaping dead processes
DEBUG:  ReadRecord: record with zero length at 1/18904CE8
DEBUG:  redo done at 1/18904CC0
DEBUG:  database system is ready
DEBUG:  proc_exit(0)
DEBUG:  shmem_exit(0)
DEBUG:  exit(0)
DEBUG:  reaping dead processes

I have trouble with corrupted data on two databases, each on a different
machine and with different data and structure. With this DB I can't do a
pg_dumpall, and with the other I have problems with pg_clog, where some
files are missing. It all began when I upgraded to version 7.2.1. I
reported the problem I mention below a few weeks ago, but fixed it with a
pg_dumpall and a recreation of the database.

This is the message from the other DB...
FATAL 2:  open of /pgdata/pg_clog/0700 failed: No such file or directory
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset:
NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.

Cheers!

Wim
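One way to apply steps 1) to 6) above to the pg_dump failure itself: by
default pg_dump pulls each table's data with COPY ... TO stdout, so the
statement that brings the backend down can often be reproduced from a psql
session that already has gdb attached (the database and table names here
are only placeholders):

    -- in the psql session from steps 1) to 6), with gdb already attached:
    mydb=# COPY mytable TO stdout;
    -- if this is the statement that kills the backend, typing 'bt' at the
    -- returning gdb prompt shows the same stack trace pg_dump runs into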