Thread: VACUUM ANALYZE problem on linux
Hello!

   A week ago I reported a problem with VACUUM ANALYZE on linux and a
memory error. Three good guys looked at my database, and two of them at
the VACUUM problem, I hope (Tom Lane and Thomas Lockhart).
   Have you reproduced the case?

   I ran VACUUM ANALYZE table by table. Results:

----- Script -----
VACUUM ANALYZE sections;
VACUUM ANALYZE subsections;
VACUUM ANALYZE positions;
VACUUM ANALYZE cities;
VACUUM ANALYZE districts;
VACUUM ANALYZE shop_types;
VACUUM ANALYZE shops;
VACUUM ANALYZE producers;
VACUUM ANALYZE products;
VACUUM ANALYZE correspondents;
VACUUM ANALYZE shop_corr;
VACUUM ANALYZE money4corr;
VACUUM ANALYZE raw_maillog;
VACUUM ANALYZE corr_mail_errors;
VACUUM ANALYZE pos_rating;
VACUUM ANALYZE motd;
VACUUM ANALYZE central;
VACUUM ANALYZE bad_data;
VACUUM ANALYZE today_history;
VACUUM ANALYZE currencies;
VACUUM ANALYZE currency_exch;
VACUUM ANALYZE param_int;
VACUUM ANALYZE param_str;
VACUUM ANALYZE param_float;
VACUUM ANALYZE param_datetime;
VACUUM ANALYZE palette;
VACUUM ANALYZE units;
VACUUM ANALYZE mail_ecod;
----- Log -----
VACUUM
VACUUM
VACUUM
VACUUM
VACUUM
VACUUM
VACUUM
VACUUM
----- Errors -----
VACUUM ANALYZE sections;
VACUUM ANALYZE subsections;
VACUUM ANALYZE positions;
VACUUM ANALYZE cities;
VACUUM ANALYZE districts;
VACUUM ANALYZE shop_types;
VACUUM ANALYZE shops;
VACUUM ANALYZE producers;
VACUUM ANALYZE products;
pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.
We have lost the connection to the backend, so further processing is
impossible.  Terminating.
----- End -----

   If I remove "products" from the script, the next table to fail is
"palette". If I remove "palette" - all goes well.

Oleg.
----
    Oleg Broytmann      http://members.xoom.com/phd2/      phd2@earthling.net
           Programmers don't die, they just GOSUB without RETURN.
Oleg Broytmann <phd@sun.med.ru> writes:
> Hello!
> A week ago I reported a problem with VACUUM ANALYZE on linux and memory
> error. Three good guys saw my database and two of them for VACUUM problem,
> I hope (Tom Lane and Thomas Lockhart).
> Have you reproduced the case?

Oh! I'm sorry, I thought I saw a report that someone had already fixed
the problem, so I didn't look at it.

			regards, tom lane
>
> Oleg Broytmann <phd@sun.med.ru> writes:
> > Hello!
> > A week ago I reported a problem with VACUUM ANALYZE on linux and memory
> > error. Three good guys saw my database and two of them for VACUUM problem,
> > I hope (Tom Lane and Thomas Lockhart).
> > Have you reproduced the case?
>
> Oh! I'm sorry, I thought I saw a report that someone had already fixed
> the problem, so I didn't look at it.

Maybe a little misunderstanding. Oleg reported a memory exhaustion
problem on COPY FROM in the same context (which also happened on large
updates). I've tracked that down in the executor. It was because his
table had a CHECK clause that got stringToNode()'ed for each single
tuple.

This problem is fixed in CURRENT, along with a speedup of factor 2++
for the case of CHECK on large ranges. The checks are stringToNode()'ed
only once now and live until the executor's memory context gets
destroyed (the portal level the plan is executed in).

I don't know if the same caused the VACUUM problem. Oleg, could you
please check against the CURRENT source tree whether the problem still
exists?

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #
> I reported a problem with VACUUM ANALYZE on linux and memory
> error. Three good guys saw my database and two of them for VACUUM
> problem, I hope (Tom Lane and Thomas Lockhart).
> Have you reproduced the case?

I understood someone else was starting to look at it, so I have not
(yet). Can do so later if needed...

                      - Tom
Hello!

   I'll try CURRENT a bit later (things gonna get real slow these days :).
But I am sure these are two different problems. First, I had the memory
problem on a big table, and the VACUUM ANALYZE problem on two very small
tables (a few lines). Second, I have the memory problem on 3 systems -
RedHat 5.1 on Pentium, Debian 2.0 on Pentium, and Solaris on Ultra-1.
But I have the VACUUM ANALYZE problem only on linucies.
   BTW, I noted both linucies are glibc2-based. It would be interesting
to try a libc5-based system. Maybe we can narrow the problem down to
glibc2-based linux? Does anyone have a libc5-based linux ready to test?

On Mon, 8 Feb 1999, Jan Wieck wrote:
> >
> > Oleg Broytmann <phd@sun.med.ru> writes:
> > > Hello!
> > > A week ago I reported a problem with VACUUM ANALYZE on linux and memory
> > > error. Three good guys saw my database and two of them for VACUUM problem,
> > > I hope (Tom Lane and Thomas Lockhart).
> > > Have you reproduced the case?
> >
> > Oh! I'm sorry, I thought I saw a report that someone had already fixed
> > the problem, so I didn't look at it.
>
> Maybe a little misunderstanding. Oleg reported a memory
> exhaustion problem on COPY FROM in the same context (which
> also happened on large updates). I've tracked that down in
> the executor. It was because his table had a CHECK clause and
> that got stringToNode()'ed for each single tuple.
>
> This problem is fixed in CURRENT along with a speedup of
> factor 2++ for the case of CHECK on large ranges. The checks
> are stringToNode()'ed only once now and live until the
> executor's memory context gets destroyed (the portal level
> the plan is executed in).
>
> I don't know if the same caused the VACUUM problem. Oleg,
> could you please check against the CURRENT source tree if the
> problem still exists?

Oleg.
----
    Oleg Broytmann     National Research Surgery Centre     http://sun.med.ru/~phd/
           Programmers don't die, they just GOSUB without RETURN.
> Second, I have memory problem on 3 systems - RedHat 5.1 on Pentium,
> Debian 2.0 on Pentium, and Solaris on Ultra-1.
> But I have VACUUM ANALYZE problem only on linucies.
> BTW, I noted both linucies are glibc2-based. It would be interesting to
> try libc5-based system. Maybe we can narrow the problem down to
> glibc2-based linux?
> Does anyone have a libc5-based linux ready to test?

I can test on libc5 if you still see trouble after you have verified
Jan's fixes for your memory exhaustion problem. Let me know...

                      - Tom
Hello!

On Tue, 9 Feb 1999, Thomas G. Lockhart wrote:
> I can test on libc5 if you still see trouble after you have verified
> Jan's fixes for your memory exhaustion problem.

   I've downloaded the latest snapshot (9 Feb) and reproduced the problem
with VACUUM ANALYZE on Debian 2.0 (glibc2).

Oleg.
----
    Oleg Broytmann      http://members.xoom.com/phd2/      phd2@earthling.net
           Programmers don't die, they just GOSUB without RETURN.
Oleg Broytmann <phd@sun.med.ru> writes:
>> I can test on libc5 if you still see trouble after you have verified
>> Jan's fixes for your memory exhaustion problem.

> I've downloaded latest snapshot (9 Feb) and reproduced the problem with
> VACUUM ANALYZE on Debian 2.0 (glibc2).

I am not able to reproduce the problem on HPUX, using either current
sources or 6.4.2. Looks like it must be platform specific.

Could you build the backend with -g and send a gdb backtrace from the
corefile produced when the crash occurs?

			regards, tom lane
On Thu, 11 Feb 1999, Tom Lane wrote:
> I am not able to reproduce the problem on HPUX, using either current
> sources or 6.4.2. Looks like it must be platform specific.

   Of course it is platform-specific. I reported the problem on
glibc2-based linucies, but the same database works fine (and allows
VACUUM ANALYZE) on sparc-solaris. I don't know about libc5 linux - I
don't have one at hand.

> Could you build the backend with -g and send a gdb backtrace from the
> corefile produced when the crash occurs?

   I'll do it this Saturday.

Oleg.
----
    Oleg Broytmann     National Research Surgery Centre     http://sun.med.ru/~phd/
           Programmers don't die, they just GOSUB without RETURN.
Hi!

On Thu, 11 Feb 1999, Tom Lane wrote:
> Could you build the backend with -g and send a gdb backtrace from the
> corefile produced when the crash occurs?

   Problem compiling with -g:

make[3]: Entering directory `/usr/local/src/PostgreSQL/postgresql-6.4.2/src/interfaces/ecpg/preproc'
gcc -I../../../include -I../../../backend -g -Wall -Wmissing-prototypes -I../include -DMAJOR_VERSION=2 -DMINOR_VERSION=4 -DPATCHLEVEL=4 -DINCLUDE_PATH=\"/usr/local/stow/pgsql-debug/include\" -c y.tab.c -o y.tab.o
preproc.y:2389: parse error at end of input
preproc.y:20: warning: `struct_level' defined but not used
preproc.y:22: warning: `QueryIsRule' defined but not used
preproc.y:23: warning: `actual_type' defined but not used
preproc.y:24: warning: `actual_storage' defined but not used
preproc.y:219: warning: `remove_variables' defined but not used
preproc.y:254: warning: `reset_variables' defined but not used
preproc.y:263: warning: `add_variable' defined but not used
preproc.y:332: warning: `make1_str' defined but not used
preproc.y:341: warning: `make2_str' defined but not used
preproc.y:353: warning: `cat2_str' defined but not used
preproc.y:366: warning: `make3_str' defined but not used
preproc.y:380: warning: `cat3_str' defined but not used
preproc.y:396: warning: `make4_str' defined but not used
preproc.y:412: warning: `cat4_str' defined but not used
preproc.y:431: warning: `make5_str' defined but not used
preproc.y:449: warning: `cat5_str' defined but not used
preproc.y:471: warning: `make_name' defined but not used
preproc.y:481: warning: `output_statement' defined but not used
make[3]: *** [y.tab.o] Error 1
make[3]: Leaving directory `/usr/local/src/PostgreSQL/postgresql-6.4.2/src/interfaces/ecpg/preproc'

Oleg.
----
    Oleg Broytmann     National Research Surgery Centre     http://sun.med.ru/~phd/
           Programmers don't die, they just GOSUB without RETURN.
> > Could you build the backend with -g and send a gdb backtrace from
> > the corefile produced when the crash occurs?
> Problem compiling with -g:

I'd be surprised if "-g" would do that to you. Are you sure the input
file is well-formed?

                      - Tom
Hi!

On Sat, 13 Feb 1999, Thomas G. Lockhart wrote:
> > > Could you build the backend with -g and send a gdb backtrace from
> > > the corefile produced when the crash occurs?
> > Problem compiling with -g:
>
> I'd be surprised if "-g" would do that to you. Are you sure the input
> file is well-formed?

   Maybe there was not enough memory or disk space. In the beginning of
next week I'll have a new linux box, so I'll retry.

Oleg.
----
    Oleg Broytmann     National Research Surgery Centre     http://sun.med.ru/~phd/
           Programmers don't die, they just GOSUB without RETURN.
Hello!

   Sorry for making this late :(

On Thu, 11 Feb 1999, Tom Lane wrote:
> Could you build the backend with -g and send a gdb backtrace from the
> corefile produced when the crash occurs?

   I have compiled with -g, but postgres didn't produce a core file. Do I
need something special on startup to generate a core on crash?

Oleg.
----
    Oleg Broytmann      http://members.xoom.com/phd2/      phd2@earthling.net
           Programmers don't die, they just GOSUB without RETURN.
Oleg Broytmann <phd@sun.med.ru> writes:
> I have compiled with -g, but postgres didn't produce core. Do I need
> something special on startup to generate core on crash?

Ordinarily not, but perhaps you have a shell 'limit' setting in place
that prevents a corefile from being made? I think csh has such a
setting but I forget the details. Anyway, if postmaster is started from
a shell with any limit variables enabled, they will apply to the
backends too.

Or you might just not be looking in the right place. Backend crashes
produce corefiles in the database subdirectory, eg,
/usr/local/pgsql/data/base/MyDatabase/core

			regards, tom lane
Hi!

On Wed, 24 Feb 1999, Tom Lane wrote:
> Oleg Broytmann <phd@sun.med.ru> writes:
> > I have compiled with -g, but postgres didn't produce core. Do I need
> > something special on startup to generate core on crash?
>
> Ordinarily not, but perhaps you have a shell 'limit' setting in place
> that prevents a corefile from being made? I think csh has such a

   I am using bash all the time.

> setting but I forget the details. Anyway, if postmaster is started from
> a shell with any limit variables enabled, they will apply to the
> backends too.

   Ok, I'll retest this.

> Or you might just not be looking in the right place. Backend crashes
> produce corefiles in the database subdirectory, eg,
> /usr/local/pgsql/data/base/MyDatabase/core

   I searched with find / -name core. I got /dev/core and
/usr/src/linux/.../core :)

Oleg.
----
    Oleg Broytmann      http://members.xoom.com/phd2/      phd2@earthling.net
           Programmers don't die, they just GOSUB without RETURN.
On Wed, 24 Feb 1999, Oleg Broytmann wrote:
> > Or you might just not be looking in the right place. Backend crashes
> > produce corefiles in the database subdirectory, eg,
> > /usr/local/pgsql/data/base/MyDatabase/core
>
> I search with find / -name core. I got /dev/core and
> /usr/src/linux/.../core :)

Try this instead:

# find / -name '*.core'

and you should find the other core dumps.

Vince.
--
==========================================================================
Vince Vielhaber -- KA8CSH  email: vev@michvhf.com  flame-mail: /dev/null
     # include <std/disclaimers.h>                  TEAM-OS2
  Online Campground Directory    http://www.camping-usa.com
  Online Giftshop Superstore     http://www.cloudninegifts.com
==========================================================================
Hi!

   I ran the postmaster from the command line (usually I run it from
/etc/init.d/), connected to it and ran VACUUM ANALYZE.
   It worked.
   I don't know whether I should use :) or :( - it failed on the
production server and worked on the debugging server...

Oleg.
----
    Oleg Broytmann      http://members.xoom.com/phd2/      phd2@earthling.net
           Programmers don't die, they just GOSUB without RETURN.
Followup to myself...

On Wed, 24 Feb 1999, Oleg Broytmann wrote:
> I ran postmaster from command line (usually I run it from /etc/init.d/),
> connected to it and ran VACUUM ANALYZE.
> It worked.

   I tested the following way:

1. Ran postmaster without parameters; connected and ran VACUUM ANALYZE -
   worked.
2. Ran postmaster -b -D/usr/local/pgsql/data -o -Fe
   and ran VACUUM ANALYZE - worked.
3. Ran postmaster -b -D/usr/local/pgsql/data -o -Fe -S (to detach it)
   and ran VACUUM ANALYZE - worked.
   (I took these parameters from the script /etc/init.d/postgres)
4. Ran /etc/init.d/postgres start
   and ran VACUUM ANALYZE - failed, no core file.

Oleg.
----
    Oleg Broytmann      http://members.xoom.com/phd2/      phd2@earthling.net
           Programmers don't die, they just GOSUB without RETURN.
Oleg Broytmann <phd@sun.med.ru> writes:
> 3. Run postmaster -b -D/usr/local/pgsql/data -o -Fe -S (to detach it)
>    and run VACUUM ANALYZE - worked
>    (I took these parameters from script /etc/init.d/postgres)
> 4. Run /etc/init.d/postgres start
>    and run VACUUM ANALYZE - failed, no core file.

So there is something different about the environment of your
postmaster when it's started by init.d versus when it's started by
hand. Now you just have to figure out what.

I thought of environment variables, ulimit settings,
ownership/permission settings ... but it's not clear why any of these
would affect VACUUM in particular yet leave you able to do other stuff
successfully. Puzzling.

			regards, tom lane
Hello!

On Thu, 11 Feb 1999, Tom Lane wrote:
> Could you build the backend with -g and send a gdb backtrace from the
> corefile produced when the crash occurs?

   I had a problem getting a core - postgres didn't produce one. Recently
I got a suggestion (from Vadim) to attach gdb to a running process and
debug it this way. Ok, done.
   To remind of the problem - I have a problem running VACUUM ANALYZE on
a glibc2 linux (Debian 2.0). On solaris it is Ok (and I got a report it
is Ok on HP-UX).
   Here is the traceback. The problem is in strcoll; I don't understand
why.

Program received signal SIGSEGV, Segmentation fault.
0x40119587 in strcoll ()
(gdb) where
#0  0x40119587 in strcoll ()
#1  0x816cadd in varstr_cmp (arg1=0x4020fccc " ", len1=0,
    arg2=0x8268604 " ", len2=0) at varlena.c:511
#2  0x816b31d in bpcharlt (arg1=0x4020fcc8 "\n", arg2=0x8268600 "\n")
    at varchar.c:504
#3  0x80a1bb7 in vc_attrstats (onerel=0x8264378, vacrelstats=0x8262a10,
    tuple=0x4020fc90) at vacuum.c:1630
#4  0x809ffec in vc_scanheap (vacrelstats=0x8262a10, onerel=0x8264378,
    vacuum_pages=0xbfffcf84, fraged_pages=0xbfffcf78) at vacuum.c:806
#5  0x809f773 in vc_vacone (relid=32600, analyze=1 '\001', va_cols=0x0)
    at vacuum.c:504
#6  0x809ed84 in vc_vacuum (VacRelP=0x0, analyze=1 '\001', va_cols=0x0)
    at vacuum.c:257
#7  0x809ec3e in vacuum (vacrel=0x0, verbose=0 '\000', analyze=1 '\001',
    va_spec=0x0) at vacuum.c:160
#8  0x8138c43 in ProcessUtility (parsetree=0x82464e0, dest=Remote)
    at utility.c:644
#9  0x8135388 in pg_exec_query_dest (query_string=0xbfffd0d4 "vacuum analyze;",
    dest=Remote, aclOverride=0 '\000') at postgres.c:758
#10 0x8135264 in pg_exec_query (query_string=0xbfffd0d4 "vacuum analyze;")
    at postgres.c:699
#11 0x813677e in PostgresMain (argc=6, argv=0xbffff15c, real_argc=8,
    real_argv=0xbffffafc) at postgres.c:1645
#12 0x8111561 in DoBackend (port=0x82110f0) at postmaster.c:1541
#13 0x8110f35 in BackendStartup (port=0x82110f0) at postmaster.c:1312
#14 0x811012f in ServerLoop () at postmaster.c:757
#15 0x810fb5e in PostmasterMain (argc=8, argv=0xbffffafc) at postmaster.c:563
#16 0x80c8c32 in main (argc=8, argv=0xbffffafc) at main.c:93

Oleg.
----
    Oleg Broytmann      http://members.xoom.com/phd2/      phd2@earthling.net
           Programmers don't die, they just GOSUB without RETURN.
Oleg Broytmann <phd@sun.med.ru> writes:
> To remind of the problem - I have a problem running VACUUM ANALYZE on a
> glibc2 linux (Debian 2.0). On solaris it is Ok (and I got a report it is Ok
> on HP-UX).
> Here is the traceback. The problem is in strcoll, don't understand why.

> Program received signal SIGSEGV, Segmentation fault.
> 0x40119587 in strcoll ()
> (gdb) where
> #0  0x40119587 in strcoll ()
> #1  0x816cadd in varstr_cmp (arg1=0x4020fccc " ", len1=0,
>     arg2=0x8268604 " ", len2=0) at varlena.c:511
> #2  0x816b31d in bpcharlt (arg1=0x4020fcc8 "\n", arg2=0x8268600 "\n")
>     at varchar.c:504

Sure looks like strcoll is broken on your platform. Build a little test
program and see if strcoll("", "") coredumps ... if the traceback is
accurate, that's what was getting passed to it.

BTW, why in the world is varstr_cmp written to duplicate the strings
it's passed, rather than just handing them off to strcoll() as-is?

			regards, tom lane
Hi!

On Mon, 15 Mar 1999, Tom Lane wrote:
> Sure looks like strcoll is broken on your platform. Build a little test
> program and see if strcoll("", "") coredumps ... if the traceback is
> accurate, that's what was getting passed to it.

   Will test it...

> BTW, why in the world is varstr_cmp written to duplicate the strings
> it's passed, rather than just handing them off to strcoll() as-is?

   I got the code... No, I didn't "get" it - I found the code. Initially
it was written by Oleg Bartunov, and I extended it a bit to cover all
char types (the initial code worked only with the "text" type).

Oleg.
----
    Oleg Broytmann      http://members.xoom.com/phd2/      phd2@earthling.net
           Programmers don't die, they just GOSUB without RETURN.
> Oleg Broytmann <phd@sun.med.ru> writes:
> > To remind of the problem - I have a problem running VACUUM ANALYZE on a
> > glibc2 linux (Debian 2.0). On solaris it is Ok (and I got a report it is Ok
> > on HP-UX).
> > Here is the traceback. The problem is in strcoll, don't understand why.
>
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x40119587 in strcoll ()
> > (gdb) where
> > #0  0x40119587 in strcoll ()
> > #1  0x816cadd in varstr_cmp (arg1=0x4020fccc " ", len1=0,
> >     arg2=0x8268604 " ", len2=0) at varlena.c:511
> > #2  0x816b31d in bpcharlt (arg1=0x4020fcc8 "\n", arg2=0x8268600 "\n")
> >     at varchar.c:504
>
> Sure looks like strcoll is broken on your platform. Build a little test
> program and see if strcoll("", "") coredumps ... if the traceback is
> accurate, that's what was getting passed to it.
>
> BTW, why in the world is varstr_cmp written to duplicate the strings
> it's passed, rather than just handing them off to strcoll() as-is?

It appears he is unsure whether the string is null-terminated, and he is
right in not assuming that. We have strncmp, but there is no strncoll().

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> writes:
>> BTW, why in the world is varstr_cmp written to duplicate the strings
>> it's passed, rather than just handing them off to strcoll() as-is?

> It appears he is unsure whether the string is null-terminated, and he is
> right in not assuming that.

Oh, of course. Excuse momentary brain fade :-(

			regards, tom lane
On Mon, 15 Mar 1999, Tom Lane wrote:
> Sure looks like strcoll is broken on your platform. Build a little test
> program and see if strcoll("", "") coredumps ... if the traceback is
> accurate, that's what was getting passed to it.

#include <stdio.h>
#include <string.h>

int main()
{
   printf("strcoll: %d\n", strcoll("", ""));
   return 0;
}

   prints: "strcoll: 0". No core.

Oleg.
----
    Oleg Broytmann      http://members.xoom.com/phd2/      phd2@earthling.net
           Programmers don't die, they just GOSUB without RETURN.
Hello!

> > To remind of the problem - I have a problem running VACUUM ANALYZE on a
> > glibc2 linux (Debian 2.0). On solaris it is Ok (and I got a report it is Ok
> > on HP-UX).

   I have upgraded Debian 2.0 to 2.1 and the problem has mysteriously
gone away! I am using the word "mysteriously" because:

-- I have not upgraded the kernel (yet) - I am still running 2.0.34
-- I have not upgraded glibc2 - both 2.0 and 2.1 are based upon libc-2.0.7
-- I have not upgraded nor recompiled postgres.

   Yes, this fixes my problem, but what next? It seems suspicious to me,
so I can expect other glibc2-related problems.
   BTW, I already reported yet another problem with glibc2 - a bug with a
complex join (actually, not so complex - 4 tables). I overcame the error
by rewriting the query into a correlated subquery with EXISTS. I'll test
whether the Debian upgrade "mysteriously" fixes that problem too.

Oleg.
----
    Oleg Broytmann      http://members.xoom.com/phd2/      phd2@earthling.net
           Programmers don't die, they just GOSUB without RETURN.
Hi!

   Followup to myself...

On Sun, 21 Mar 1999, Oleg Broytmann wrote:
> BTW, I already reported yet another problem with glibc2 - bug with
> complex join (actually, not so complex - 4 tables). I overcame the error by
> rewriting the query into correlated subquery with EXISTS. I'll test if
> Debian upgrade "mysteriously" fix the problem too.

   No, the join is still buggy (it returns 0 rows, where the rewritten
query returns some number of rows, and those rows look pretty good to me)
- either there is a bug in the join, or I rewrote the query in a wrong
way and got correct results :)

Oleg.
----
    Oleg Broytmann      http://members.xoom.com/phd2/      phd2@earthling.net
           Programmers don't die, they just GOSUB without RETURN.