Re: Postgres "invalid page header" - Mailing list pgsql-general
From | Hubert Fröhlich
Subject | Re: Postgres "invalid page header"
Msg-id | 40FF949A.9050506@bvv.bayern.de
In response to | Postgres "invalid page header" (Carl Anderson <carl.anderson@co.fulton.ga.us>)
Responses | Re: Postgres "invalid page header"
List | pgsql-general
Hi list, hello Carl, Florian,

maybe you remember my posting from 18.06.2004 and your reaction(s). The problem was reproducing my error, which does not happen very often, i.e. isolating the problem. As Tom Lane put it: "I would very much like to see a test case that reliably produces 'invalid page header' ..."

Meanwhile, I was able to reproduce the error (and others) with the very basic scripts enclosed below. If you are interested in the problem, feel free to use them (no liability ...). The basic procedure is:

1) prepCrashtest.sh sets up SQL files for creating and populating (via COPY) a database and for doing a big update in one transaction. You can scale it by varying the numberOfRows parameter, which sets the size of the only table (generating the rows may take a while, because they are produced randomly via openssl). There is only one table, containing just characters and integers, with some indexes. No special datatypes, just very basic SQL.

2) multiCrashtest.sh starts working on several databases at the same time. On each db it a) creates, populates, dumps and vacuums the db and performs a "big" update operation (via singleCrashtest.sh), b) all that in several cycles (5 to 10 seems absolutely enough). You can scale this by varying the number of databases accessed simultaneously.

E.g. on a client box (Athlon 1400+, 512 MB RAM, XFS, SuSE Linux 8.1 with 2.4.19 kernel), using numberOfRows=300000 and ndatabases=3, I regularly get crashes like

ERROR: could not access status of transaction 1497515332
ERROR: could not open segment 1 of relation "idx_vmpnfl_obnr" (...
ERROR: invalid page header in block 25190 of relation "vmpnfl"
PANIC: right sibling's left-link doesn't match
ERROR: could not access status of transaction 188219392

(even a PANIC!). The same holds for an even smaller notebook. On my big box (see below) I am still testing with larger values. But as you can see, the error can be reproduced on a smaller scale with a smaller box.

The postgresql.conf settings are fairly standard in all cases, and I don't believe different settings would change the behavior. I am not too familiar with the PostgreSQL internals, let alone the C code, but I guess there might be some buffer overflow?! Since the hardware used for testing was very different, I meanwhile don't believe this is a hardware issue; but that is just a "feeling", not an expert's opinion. The crashes seem to happen only when *** several *** databases are accessed simultaneously on the same box, and only under heavy load.

The three scripts are enclosed below in plain text to avoid attachments being cut off (sorry for the bad line formatting). The scripts may be VERY BASIC, but I hope they can shed some light on the issue.
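For concreteness, the exact invocation for the run described above looks like this (a minimal driver sketch, not one of the three scripts enclosed below; the parameter values are the ones that crash the Athlon box):

<snip>
#!/bin/bash
# runCrashtest.sh -- hypothetical driver, not part of the enclosed scripts
# parameter values as reported above: 300000 rows, 3 databases, 5 cycles
./prepCrashtest.sh 300000          # writes crashSetup.sql, crashUpdate.sql, crashTable.sql
./multiCrashtest.sh 3 localhost 5  # databases crash0..crash2, 5 R/W cycles each
tail -f log_crash0                 # watch any of the logs for "invalid page header" & friends
<snip>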
<snip>
#!/bin/bash
# prepCrashtest.sh
# prepare a SQL file containing a COPY with $1 randomly generated rows
# ----------------------------------------------------------------------
THIS=`basename $0`
if [ $# != 1 ]
then
    echo "usage: $THIS numberOfRows"
    exit 1
fi
nrows=$1

make_copy_row()
# ----------------------------------------------------------------------
# create one row for the COPY command, filled with random data
{
    obnr=`openssl rand -base64 20 | cut -c1-16`
    pkn=`openssl rand -base64 25 | cut -c1-30`
    echo -e "$obnr\t0\t2147483647\t10\t$pkn\t\\N\t0"
}

make_copy_nrows()
# ----------------------------------------------------------------------
# create a COPY command for a table with $nrows randomly generated rows
{
    echo "COPY vmpnfl (obnr, edat, udat, vma, pkn, dpp, orient) FROM stdin;"
    n=0
    while [[ $n -lt $nrows ]]
    do
        make_copy_row
        n=`expr $n + 1`
    done
    echo "\."
}

echo "...create SQL CREATE TABLE file"
echo "CREATE TABLE vmpnfl (obnr character(16), edat integer DEFAULT 0, udat integer DEFAULT 2147483647, vma smallint, pkn character(30), dpp smallint, orient double precision DEFAULT 0);" >crashSetup.sql
echo "CREATE INDEX idx_vmpnfl_obnr ON vmpnfl USING btree (obnr);" >>crashSetup.sql
echo "CREATE INDEX idx_vmpnfl_udat ON vmpnfl USING btree (udat);" >>crashSetup.sql
echo "CREATE INDEX idx_vmpnfl_oid ON vmpnfl USING btree (oid);" >>crashSetup.sql

echo "...create SQL UPDATE file"
echo "begin; update vmpnfl set dpp = 1234; commit;" >crashUpdate.sql

echo "...create SQL COPY file with $nrows rows ... this may take a little while"
make_copy_nrows >crashTable.sql

echo "Finished. Try now:"
echo "multiCrashtest.sh ndatabases servername ncycles"
<snip>

<snip>
#!/bin/bash
# singleCrashtest.sh
# testing a single database under heavy load
# ----------------------------------------------------------------------
# setting up a database with one table and some indexes
# populating the db
# dumping the db somewhere
# vacuuming the db
# performing a big update operation: updating one column in all rows
THIS=`basename $0`
if [ $# != 3 ]
then
    echo "usage: $THIS dbname servername ncycles"
    exit 1
fi
database=$1
server=$2
ncycles=$3
dumpfile=dump$database.sql

echo "=== test on database $database, server $server, $ncycles R/W cycles: "`date +%Y%m%d%H%M%S`" ==="
cycle=0
while [[ $cycle -lt $ncycles ]]
do
    cycle=`expr $cycle + 1`
    echo "=== cycle $cycle: "`date +%Y%m%d%H%M%S`" ==="
    dropdb -h $server $database 2>&1
    createdb -h $server $database 2>&1
    psql -h $server $database -f crashSetup.sql 2>&1 | grep -v CREATE   # reducing debug output
    psql -h $server $database -f crashTable.sql 2>&1 | grep -v INSERT   # reducing debug output
    sleep 3
    echo "--- cycle $cycle: Setup "`date +%Y%m%d%H%M%S`" ---"
    pg_dump -h $server $database >$dumpfile 2>&1
    sleep 3
    echo "--- cycle $cycle: Dump "`date +%Y%m%d%H%M%S`" ---"
    rm $dumpfile
    echo -n "Tuples: "
    psql $database -h $server -P tuples_only -c "select count(*) from vmpnfl"
    psql $database -h $server -c "vacuum verbose analyze" 2>&1
    sleep 3
    echo "--- cycle $cycle: Vacuum "`date +%Y%m%d%H%M%S`" ---"
    psql -h $server $database -f crashUpdate.sql 2>&1
    sleep 3
    echo "--- cycle $cycle: BIG Update "`date +%Y%m%d%H%M%S`" ---"
done
<snip>

<snip>
#!/bin/bash
# multiCrashtest.sh
# testing databases under heavy load:
# reading/writing into several databases simultaneously;
# this can be looped several times
# load can be increased
#   by increasing ndb = no. of databases being accessed simultaneously
#   or by increasing the number of rows in prepCrashtest.sh
# ----------------------------------------------------------------------
# ndatabases = no. of databases accessed simultaneously
# servername = name of database server
# ncycles    = no. of R/W cycles per database
THIS=`basename $0`
if [ $# != 3 ]
then
    echo "usage: $THIS ndatabases servername ncycles"
    exit 1
fi
ndb=$1
server=$2
ncycles=$3
# ----------------------------------------------------------------------
echo "...loop over $ndb databases on server $server: $ncycles R/W cycles per database"
db=0
while [[ $db -lt $ndb ]]
do
    # putting the job into the background
    singleCrashtest.sh crash${db} $server $ncycles >log_crash${db} &
    echo " ... process for database crash${db} started"
    sleep 5
    db=`expr $db + 1`
done
exit
<snip>
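One more sketch, not part of the test suite: to look at the raw bytes of the page named in an "invalid page header" error (e.g. block 25190 of "vmpnfl"), something like the following should work with plain dd/od. It assumes the default 8 kB block size and 1 GB (131072-block) segment files, $PGDATA set, and that it runs on the server as the postgres user; error handling is omitted. The pg_filedump links quoted below give a nicer decoding of the same bytes.

<snip>
#!/bin/bash
# inspectPage.sh -- hypothetical helper, NOT part of the test suite
# hex-dump one block of a relation, e.g.: inspectPage.sh crash0 vmpnfl 25190
database=$1
relation=$2
block=$3

# locate the datafile: $PGDATA/base/<database oid>/<relfilenode>
dboid=`psql -At -c "select oid from pg_database where datname='$database'" template1`
filenode=`psql -At $database -c "select relfilenode from pg_class where relname='$relation'"`

# blocks >= 131072 live in segment files <relfilenode>.1, .2, ...
segment=`expr $block / 131072`
offset=`expr $block % 131072`
file=$PGDATA/base/$dboid/$filenode
if [ $segment -gt 0 ]
then
    file=$file.$segment
fi

# hex dump of the start of the page; the page header sits at the front
dd if=$file bs=8192 skip=$offset count=1 2>/dev/null | od -A d -x | head -4
<snip>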
> Hubert,
>
> I too have noticed such an issue.
>
> Would you be willing to try to load a large table (from a SQL file)?
> It reliably demonstrates the "invalid page header" behavior
> after completion (for me).
>
> It seems to me that while there are WAL logs being flushed, a certain
> type of failed SQL insert will cause a page to be marked wrong.
>
> C.
>
>> Hi list,
>>
>> I am working with 7.4.1 under Linux (SuSE 8.1). The server is an HP
>> ProLiant DL 380-G3, 2x Intel Pentium4-Xeon, 2.8 GHz, 4 GB memory and a
>> RAID 5 system with ca. 500 GB disk space (xfs file system).
>>
>> When doing big transactions or changes (UPDATE of several million rows
>> in one step) on a database with ca. 50 GB disk space and ca. 15 million
>> entries, PostGIS and lots of indexes, I get errors like
>>
>> ERROR: invalid page header in block 582024 of relation ...
>>
>> The error does not occur regularly. It seems to me that the error is
>> related to heavy load and heavy I/O. CPU and memory do not seem to be
>> the problem, according to my hotsanic tools.
>>
>> I have worked through
>>
>> http://archives.postgresql.org/pgsql-general/2003-11/msg01288.php
>> http://archives.postgresql.org/pgsql-admin/2003-09/msg00423.php
>> http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=20030922162322.E12708%40quartz.newn.cam.ac.uk&rnum=8&prev=/groups%3Fq%3Dpg_filedump%26hl%3Den%26lr%3D%26ie%3DUTF-8%26selm%3D20030922162322.E12708%2540quartz.newn.cam.ac.uk%26rnum%3D8
>>
>> which suggest to me that I have run into hardware trouble (disk?
>> memory?!). But I supposed that my server was not SO bad for this
>> database application...
>>
>> How can I fix that? (I mean, not repairing the tables or pages, but
>> getting a stable system.) Trying other storage hardware?!

--
-------------------------------------------------------------------------------
Dr.-Ing. Hubert Fröhlich
Bezirksfinanzdirektion München
Alexandrastr. 3, D-80538 München, GERMANY
Tel.: +49 (0)89 / 2190 - 2980
Fax : +49 (0)89 / 2190 - 2997
hubert dot froehlich at bvv dot bayern dot de