Thread: Fwd: Stalled post to pgsql-es-ayuda
Hello everyone, I have a problem.
First, some basic data:
OS: Red Hat
PostgreSQL version: 8.3
Database size: 15 GB
The partition containing PGDATA (the data directory) filled up, so I moved the entire data directory to another partition and created a symbolic link back at the original PGDATA location.
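For clarity, the relocation was done roughly like this, with the server not running (a sketch; the paths here are placeholders, not the real ones):

$ mv /var/lib/pgsql/data /new_partition/pgdata        # move the whole data directory to the partition with free space
$ ln -s /new_partition/pgdata /var/lib/pgsql/data     # symbolic link back at the original PGDATA location
$ chown -h postgres:postgres /var/lib/pgsql/data      # keep the link owned by the postgres user
$ pg_ctl -D /var/lib/pgsql/data start                 # then try to start the server again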
Now, when I try to start the server, it stays stuck on "startup process recovering ...".
When I try to connect, this is the message:
psql: FATAL: the database system is starting up
and it never gets past that point.
I am attaching some additional information that may be relevant:
---------------------------------------------------------------------------------------------------------------
$ ps aux | grep postgres
root 24586 0.0 0.1 4920 1236 pts/2 S 10:52 0:00 su - postgres
postgres 24587 0.0 0.1 4540 1440 pts/2 S+ 10:52 0:00 -bash
root 27444 0.0 0.1 4920 1240 pts/1 S 14:40 0:00 su - postgres
postgres 27445 0.0 0.1 4536 1436 pts/1 S+ 14:40 0:00 -bash
root 27593 0.0 0.1 4920 1244 pts/3 S 14:44 0:00 su - postgres
postgres 27594 0.0 0.1 4540 1468 pts/3 S 14:44 0:00 -bash
postgres 27725 4.1 0.8 283372 7236 pts/3 S 14:47 0:00 /usr/bin/postgres
postgres 27726 0.0 0.0 13952 696 ? Ss 14:47 0:00 postgres: logger process
postgres 27727 2.0 0.5 283640 4828 ? Ds 14:47 0:00 postgres: startup process recovering 000000010000001A00000074
postgres 27730 0.0 0.1 4260 940 pts/3 R+ 14:48 0:00 ps aux
postgres 27731 0.0 0.0 3920 696 pts/3 S+ 14:48 0:00 grep postgres
---------------------------------------------------------------------------------------------------------------
$ pg_ctl status
pg_ctl: server is running (PID: 27725)
/usr/bin/postgres
---------------------------------------------------------------------------------------------------------------
$ pg_controldata
pg_control version number: 843
Catalog version number: 200904091
Database system identifier: 5409925551577885669
Database cluster state: in crash recovery
pg_control last modified: Wed 23 Jun 2010 11:20:42 AM CLT
Latest checkpoint location: 1A/73839208
Prior checkpoint location: 1A/73839208
Latest checkpoint's REDO location: 1A/7382D978
Latest checkpoint's TimeLineID: 1
Latest checkpoint's NextXID: 0/47874035
Latest checkpoint's NextOID: 51407
Latest checkpoint's NextMultiXactId: 1647347
Latest checkpoint's NextMultiOffset: 3431468
Time of latest checkpoint: Sun 30 May 2010 11:10:30 AM CLT
Minimum recovery ending location: 0/0
Maximum data alignment: 4
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 2000
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by reference
---------------------------------------------------------------------------------------------------------------
On the last day with transactions, the log ends as follows:
$ tail $PGDATA/pg_log/postgresql-Sun.log
ERROR: null value in column "protocolo" violates not-null constraint
STATEMENT: INSERT INTO trx_5(nodo, puerto, dnis, ani, protocolo, time, status, service, fecha, trama) VALUES (2, 1, 900, 0, (SELECT pk FROM protocolos WHERE data = 78), 247, (SELECT pk FROM status WHERE status = 55), (SELECT pk FROM services WHERE service = 0),now(), (SELECT pk FROM tramas WHERE equipo = 2 AND puerto = 1))
LOG: could not write temporary statistics file "pg_stat_tmp/pgstat.tmp": No space left on device
LOG: could not write temporary statistics file "pg_stat_tmp/pgstat.tmp": No space left on device
LOG: could not write temporary statistics file "pg_stat_tmp/pgstat.tmp": No space left on device
LOG: could not write temporary statistics file "pg_stat_tmp/pgstat.tmp": No space left on device
LOG: could not write temporary statistics file "pg_stat_tmp/pgstat.tmp": No space left on device
LOG: could not write temporary statistics file "pg_stat_tmp/pgstat.tmp": No space left on device
LOG: could not write temporary statistics file "pg_stat_tmp/pgstat.tmp": No space left on device
LOG: could not write temporary statistics file "pg_stat_tmp/pgstat.tmp": No space left on device
LOG: could not write temporary statistics file "pg_st
---------------------------------------------------------------------------------------------------------------
$ psql
psql: FATAL: the database system is starting up
---------------------------------------------------------------------------------------------------------------
I don't have lsof on the server, but this is the information I was able to gather:
With vmstat, %wa stays above 50%, while CPU usage is very low; postgres never goes above 2%.
Under /proc/27725/ (the postgres process; pid 27727 is the "postgres: startup process recovering 000000010000001A00000074" shown above), the io file shows an intensive write rate: roughly 1 GB every 20 minutes.
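For reference, those numbers come from commands roughly like these (a sketch; the pids are the ones from the ps output above):

$ vmstat 5                  # the wa column stays above 50
$ cat /proc/27725/io        # write_bytes keeps growing, roughly 1 GB every 20 minutes
$ cat /proc/27727/io        # the same check for the startup process doing the recovery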
Any suggestions?
--
Sebastian Machuca
Estudiante Ingeniería Civil en Computación
+56 9 77449117
http://twitter.com/serroba
Sebastian Machuca <serroba@gmail.com> writes:
> The partition containing PGDATA (the data directory) filled up, so I moved
> the entire data directory to another partition and created a symbolic link
> back at the original PGDATA location.
> Now, when I try to start the server, it stays stuck on "startup process
> recovering ...".
> When I try to connect, this is the message:
> psql: FATAL: the database system is starting up
> and it never gets past that point.

I think you just need to wait.

The thing that jumps out at me from your details is

> Time of latest checkpoint:            Sun 30 May 2010 11:10:30 AM CLT

If it hasn't checkpointed since May, then either you've got some seriously silly checkpoint settings or there was something blocking checkpoints. I'm guessing that it was the latter, and the reason you ran out of disk space was that WAL files kept on accumulating (since they can't be recycled until a checkpoint finishes).

So my theory is that you have got a *whole lot* of WAL to churn through, and there is nothing to do but wait for that to happen. You could get an idea of how long it will take by noting how fast the startup process is advancing through WAL segments (watch its display in "ps") and looking to see what the last WAL segment in $PGDATA/pg_xlog is.

Once you have a working database again, you should look into why checkpoints were blocked. Is there anything unusual in the postmaster log from before you had the out-of-space condition?

			regards, tom lane
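Concretely, checking that progress could look roughly like this (a sketch; 27727 is the startup process pid from the ps output above, and $PGDATA is assumed to point at the data directory):

$ watch -n 60 'ps -o cmd= -p 27727'                   # shows which WAL segment recovery is currently replaying
$ ls $PGDATA/pg_xlog | grep '^0' | sort | tail -n 1   # the last WAL segment that has to be replayed before startup finishes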