Thread: Re: [NOVICE] URGENT! pg_dump doesn't work! (fwd)
---------- Forwarded message ----------
Date: Tue, 23 Jul 2002 14:39:50 +0100 (BST)
From: Nigel J. Andrews <nandrews@investsystems.co.uk>
To: Wim <wdh@belbone.be>
Subject: Re: [NOVICE] [GENERAL] URGENT! pg_dump doesn't work!

On Tue, 23 Jul 2002, Wim wrote:

> I looked for core files in my data directory, but couldn't find any... I
> installed gdb, since I run postgres on Solaris 9.
> I have no experience with gdb, so could you tell me how I get a trace
> when postgres fails?

What you need to do is:

1) Start psql to connect to your backend, which I assume is running (or you
already know how to do this).

2) Identify the backend (postgres) server process your psql session from 1)
is attached to. Use ps ax | grep postgres to list all backend processes
(possibly ps -fe instead of ps ax). If your database is quiet and only your
psql session is using it, the relevant process is obvious. If your database
is busy, you may find it easier to connect as a user that is not normally
used, for example the database superuser (usually postgres), as this
information should appear in the process listing.

3) Start gdb using: gdb <path to postgres backend binary> <process id from
step 2>. This attaches gdb to the process serving your psql session.

4) At the gdb prompt, type 'c' (continue). This makes the backend process
resume normal operation (gdb stopped it when it attached).

5) In your psql session, issue the statement that causes the fault.

6) When the backend generates its fault you should get a prompt back in your
gdb session. Typing 'bt' at the gdb prompt will give you a stack trace.

Now, the complications.

1) To get a useful stack trace you may need postgres built with debugging
enabled, which would require a reconfigure and rebuild from source if that
hasn't already been done.

2) I'm starting to think I've taken you down the wrong route with this. I'm
no expert on the internals, but I'm not convinced the backend is generating
a fault such that you would get a core file or be able to do the final stage
of the steps above. My apologies for wasting your time if that's the case.
However, as you've already gone to the trouble of installing gdb, it may be
easiest to give the above instructions a go and see if you do indeed get a
gdb prompt in the final step.

Whether this works or not, you should also attempt to obtain a stack trace
from the pg_dump attempt. There should be a way to start pg_dump such that
you have time to determine the backend process for it; however, reading the
manpage I can only see -W as a possibility.

One other question that would be interesting is whether the problem you are
experiencing persists across a backend restart.

Hopefully someone like Tom or Bruce can provide some guidance. Tom already
suggested data corruption.

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants
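In condensed form, steps 1) to 6) look roughly like this (the database name,
binary path and process id are placeholders; substitute the ones from your
own system and ps listing):

    $ psql mydb                                # 1) open a session
    $ ps -fe | grep postgres                   # 2) find the backend serving it
    $ gdb /usr/local/pgsql/bin/postgres 12345  # 3) attach gdb to that backend
    (gdb) c                                    # 4) let the backend run again
    -- 5) in the psql session, issue the statement that causes the fault --
    (gdb) bt                                   # 6) stack trace once it faults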
Nigel J. Andrews wrote:

snip...

> 2) I'm starting to think I've taken you down the wrong route with this.
> I'm no expert on the internals, but I'm not convinced the backend is
> generating a fault such that you would get a core file or be able to do
> the final stage of the steps above. My apologies for wasting your time
> if that's the case. However, as you've already gone to the trouble of
> installing gdb, it may be easiest to give the above instructions a go
> and see if you do indeed get a gdb prompt in the final step. Whether
> this works or not, you should also attempt to obtain a stack trace from
> the pg_dump attempt. There should be a way to start pg_dump such that
> you have time to determine the backend process for it; however, reading
> the manpage I can only see -W as a possibility.
>
> One other question that would be interesting is whether the problem you
> are experiencing persists across a backend restart.
>
> Hopefully someone like Tom or Bruce can provide some guidance. Tom
> already suggested data corruption.

This is a partial output when I enable debugging level 3:

DEBUG:  reaping dead processes
DEBUG:  child process (pid 10917) was terminated by signal 10
DEBUG:  server process (pid 10917) was terminated by signal 10
DEBUG:  terminating any other active server processes
DEBUG:  CleanupProc: sending SIGQUIT to process 10899
NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
DEBUG:  CleanupProc: sending SIGQUIT to process 10898
NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
DEBUG:  child process (pid 10898) exited with exit code 1
DEBUG:  child process (pid 10899) exited with exit code 1
DEBUG:  all server processes terminated; reinitializing shared memory and semaphores
DEBUG:  shmem_exit(0)
invoking IpcMemoryCreate(size=1441792)
DEBUG:  database system was interrupted at 2002-07-23 15:52:57 CEST
DEBUG:  checkpoint record is at 1/1888F010
DEBUG:  redo record is at 1/1888F010; undo record is at 0/0; shutdown TRUE
DEBUG:  next transaction id: 647512; next oid: 12041996
DEBUG:  database system was not properly shut down; automatic recovery in progress
DEBUG:  redo starts at 1/1888F050
DEBUG:  reaping dead processes
DEBUG:  ReadRecord: record with zero length at 1/18904CE8
DEBUG:  redo done at 1/18904CC0
DEBUG:  database system is ready
DEBUG:  proc_exit(0)
DEBUG:  shmem_exit(0)
DEBUG:  exit(0)
DEBUG:  reaping dead processes

I have trouble with corrupted data on two databases, each on a different
machine and with different data and structure. With this DB I can't do a
pg_dumpall, and with the other I have problems with pg_clog, where some
files are missing. It all began when I upgraded to version 7.2.1. I
reported the problem I mention below a few weeks ago, but fixed it with a
pg_dumpall and a recreation of the database.

This is the message from the other DB...
FATAL 2:  open of /pgdata/pg_clog/0700 failed: No such file or directory
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset:
NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.

Cheers!

Wim
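One way to apply steps 1) to 6) above to the pg_dump failure itself: by
default pg_dump pulls each table's data with COPY ... TO stdout, so the
statement that brings the backend down can often be reproduced from a psql
session that already has gdb attached (the database and table names here
are only placeholders):

    -- in the psql session from steps 1) to 6), with gdb already attached:
    mydb=# COPY mytable TO stdout;
    -- if this is the statement that kills the backend, typing 'bt' at the
    -- returning gdb prompt shows the same stack trace pg_dump runs into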