Vacuum analyze bug CAUGHT - Mailing list pgsql-hackers
From | Michael Simms |
---|---|
Subject | Vacuum analyze bug CAUGHT |
Date | |
Msg-id | 199909090143.CAA11936@argh.demon.co.uk Whole thread Raw |
Responses |
Re: [HACKERS] Vacuum analyze bug CAUGHT
RE: [HACKERS] Vacuum analyze bug CAUGHT |
List | pgsql-hackers |
Hi Okee, I have caught the vacuum analyse crash that was giving me a load of grief Firstly, I create the following database $psql crashtest Welcome to the POSTGRESQL interactive sql monitor: Please read the file COPYRIGHT for copyright terms of POSTGRESQL [PostgreSQL 6.5.1 on i586-pc-linux-gnu, compiled by gcc -ggdb ] type \? for help on slash commands type \q to quit type \g or terminate with semicolon to execute queryYou are currentlyconnected to the database: crashtest crashtest=> select version(); version ------------------------------------------------------------- PostgreSQL 6.5.1 on i586-pc-linux-gnu, compiled by gcc -ggdb (1 row) crashtest=> \d Database = crashtest+------------------+----------------------------------+----------+| Owner | Relation | Type |+------------------+----------------------------------+----------+| test |testtable | table |+------------------+----------------------------------+----------+ crashtest=> select distinct * from testtable; vara|varb ----+---- abc |def (1 row) crashtest=> select count(*) from testtable;count ------ 319800 (1 row) ----------------------------------- Now, I then create a text file with 780 (just a random large number) new lines that are exactly the same as the ones already in the database, so they can be added with the copy command. (the 319,800 rows I have in my database were actually created using copy of this file, but I think that that is irrelavent). I then run the followiung script (which you will note just runs forever: ------------------------------------------------------ #!/bin/sh - ( while [ "1" = "1" ] do echo "create temp table temptest ( vara text, varb text );" echo "copy temptest from '/tmp/copyeffort';" echo "insert into testtable select * from temptest;" echo "drop table temptest;" sleep 1 done ) | psql crashtest ----------------------------------------------------- I attach gdb to the backend that is handling this task I then start a psql session to the same database Then I attach another GDB to this backend too, so I have GDB on both Then, in psql, I run and get the following result: ----------- crashtest=> vacuum analyze; NOTICE: Rel pg_type: TID 4/3: InsertTransactionInProgress 129915 - can't shrink relation NOTICE: Rel pg_attribute: TID 23/5: InsertTransactionInProgress 129915 - can't shrink relation NOTICE: Rel pg_attribute: TID 23/6: InsertTransactionInProgress 129915 - can't shrink relation NOTICE: Rel pg_attribute: TID 23/7: InsertTransactionInProgress 129915 - can't shrink relation NOTICE: Rel pg_attribute: TID 23/8: InsertTransactionInProgress 129915 - can't shrink relation NOTICE: Rel pg_attribute: TID 23/9: InsertTransactionInProgress 129915 - can't shrink relation NOTICE: Rel pg_attribute: TID 23/10: InsertTransactionInProgress 129915 - can't shrink relation NOTICE: Rel pg_attribute: TID 23/11: InsertTransactionInProgress 129915 - can't shrink relation NOTICE: Rel pg_attribute: TID 23/12: InsertTransactionInProgress 129915 - can't shrink relation NOTICE: Rel pg_class: TID 3/22: InsertTransactionInProgress 129915 - can't shrink relation NOTICE: AbortTransaction and not in in-progress state ------------ At this point, the backend attatched to the script doing the creates inserts and drops has the following: Program received signal SIGUSR2, User defined signal 2. 0x4018db4d in __libc_fsync () (gdb) where #0 0x4018db4d in __libc_fsync () #1 0x80d5908 in pg_fsync (fd=3) at fd.c:202 #2 0x80d615a in FileSync (file=3) at fd.c:876 #3 0x80dc9ed in mdcommit () at md.c:796 #4 0x80dd1bc in smgrcommit () at smgr.c:375 #5 0x80d47f7 in FlushBufferPool (StableMainMemoryFlag=0) at bufmgr.c:1260 #6 0x8078755 in RecordTransactionCommit () at xact.c:636 #7 0x8078919 in CommitTransaction () at xact.c:940 #8 0x8078ac1 in CommitTransactionCommand () at xact.c:1177 #9 0x80df0cf in PostgresMain (argc=-1073742286, argv=0xbffff7a0, real_argc=7, real_argv=0xbffffd04) at postgres.c:1679 #10 0x80c91ba in DoBackend (port=0x81ea4a0) at postmaster.c:1628 #11 0x80c8cda in BackendStartup (port=0x81ea4a0) at postmaster.c:1373 #12 0x80c8429 in ServerLoop () at postmaster.c:823 #13 0x80c7f67 in PostmasterMain (argc=7, argv=0xbffffd04) at postmaster.c:616 #14 0x80a0986 in main (argc=7, argv=0xbffffd04) at main.c:97 #15 0x400fccb3 in __libc_start_main (main=0x80a0920 <main>, argc=7, argv=0xbffffd04, init=0x8061360 <_init>, fini=0x810d63c<_fini>, rtld_fini=0x4000a350 <_dl_fini>, stack_end=0xbffffcfc) at ../sysdeps/generic/libc-start.c:78 (gdb) Program received signal SIGUSR2, User defined signal 2. 0x4018db4d in __libc_fsync () (gdb) where #0 0x4018db4d in __libc_fsync () #1 0x80d5908 in pg_fsync (fd=3) at fd.c:202 #2 0x80d615a in FileSync (file=3) at fd.c:876 #3 0x80dc9ed in mdcommit () at md.c:796 #4 0x80dd1bc in smgrcommit () at smgr.c:375 #5 0x80d47f7 in FlushBufferPool (StableMainMemoryFlag=0) at bufmgr.c:1260 #6 0x8078755 in RecordTransactionCommit () at xact.c:636 #7 0x8078919 in CommitTransaction () at xact.c:940 #8 0x8078ac1 in CommitTransactionCommand () at xact.c:1177 #9 0x80df0cf in PostgresMain (argc=-1073742286, argv=0xbffff7a0, real_argc=7, real_argv=0xbffffd04) at postgres.c:1679 #10 0x80c91ba in DoBackend (port=0x81ea4a0) at postmaster.c:1628 #11 0x80c8cda in BackendStartup (port=0x81ea4a0) at postmaster.c:1373 #12 0x80c8429 in ServerLoop () at postmaster.c:823 #13 0x80c7f67 in PostmasterMain (argc=7, argv=0xbffffd04) at postmaster.c:616 #14 0x80a0986 in main (argc=7, argv=0xbffffd04) at main.c:97 #15 0x400fccb3 in __libc_start_main (main=0x80a0920 <main>, argc=7, argv=0xbffffd04, init=0x8061360 <_init>, fini=0x810d63c<_fini>, rtld_fini=0x4000a350 <_dl_fini>, stack_end=0xbffffcfc) at ../sysdeps/generic/libc-start.c:78 (gdb) About 3 - 5 seconds later, the gdb attached to the vacuuming psql gets the following - Bear in mind this is a DIFFERENT BACKEND Program received signal SIGSEGV, Segmentation fault. AllocSetReset (set=0x0) at aset.c:159 aset.c:159: No such file or directory. (gdb) where #0 AllocSetReset (set=0x0) at aset.c:159 #1 0x810ac12 in EndPortalAllocMode () at portalmem.c:938 #2 0x8078833 in AtAbort_Memory () at xact.c:800 #3 0x80789ff in AbortTransaction () at xact.c:1026 #4 0x8078aef in AbortCurrentTransaction () at xact.c:1243 #5 0x80deed6 in PostgresMain (argc=-1073742288, argv=0xbffff7a0, real_argc=7, real_argv=0xbffffd04) at postgres.c:1550 #6 0x80c91ba in DoBackend (port=0x81ea4a0) at postmaster.c:1628 #7 0x80c8cda in BackendStartup (port=0x81ea4a0) at postmaster.c:1373 #8 0x80c8429 in ServerLoop () at postmaster.c:823 #9 0x80c7f67 in PostmasterMain (argc=7, argv=0xbffffd04) at postmaster.c:616 #10 0x80a0986 in main (argc=7, argv=0xbffffd04) at main.c:97 #11 0x400fccb3 in __libc_start_main (main=0x80a0920 <main>, argc=7, argv=0xbffffd04, init=0x8061360 <_init>, fini=0x810d63c<_fini>, rtld_fini=0x4000a350 <_dl_fini>, stack_end=0xbffffcfc) at ../sysdeps/generic/libc-start.c:78 (gdb) ------------------------------------------------------------------------ This is REPRODUCABLE I have run this through 5 times and it only managed to vacuum two times. Now, I have the two GDBs still running, so if you need any values such as variable states, let me know and I will supply them. (I'll be out of the house till about 7.30pm UK time tomorrow, so it will have to be after then). Hopefully, this will be something can get fixed for 6.5.2 as it is a BIG problem for me, as it happens when I am building my postgresql search engine. Thanks M Simms
pgsql-hackers by date: