Valgrind Memcheck support - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Valgrind Memcheck support |
Date | |
Msg-id | 20130609212559.GB491289@tornado.leadboat.com Whole thread Raw |
Responses |
Re: Valgrind Memcheck support
Re: Valgrind Memcheck support Re: Valgrind Memcheck support |
List | pgsql-hackers |
Valgrind's Memcheck tool[1] is handy for finding bugs, but our use of a custom allocator limits its ability to detect problems in unmodified PostgreSQL. During the 9.1 beta cycle, I found some bugs[2] with a rough patch adding instrumentation to aset.c and mcxt.c such that Memcheck understood our allocator. I've passed that patch around to a few people over time, and I've now removed the roughness such that it's ready for upstream. In hopes of making things clearer in the commit history, I've split out a preliminary refactoring patch from the main patch and attached each separately. Besides the aset.c/mcxt.c instrumentation, this patch adds explicit checks for undefined memory to PageAddItem() and printtup(); this has caught C-language functions that fabricate a Datum without initializing all bits. The inclusion of all this is controlled by a pg_config_manual.h setting. The patch also adds a "suppression file" that directs Valgrind to silences nominal errors we don't plan to fix. To smoke-test the instrumentation, I used "make installcheck" runs on x86_64 GNU/Linux and ppc64 GNU/Linux. This turned up various new and newly-detected memory bugs, which I will discuss in a separate thread. With those fixed, "make installcheck" has a clean report (in my one particular configuration). I designed the suppression file to work across platforms; I specifically anticipated eventual use on x86_64 Darwin and x86_64 FreeBSD. Valgrind 3.8.1 quickly crashed when running PostgreSQL on Darwin; I did not dig further. Since aset.c and mcxt.c contain the hottest code paths in the backend, I verified that a !USE_VALGRIND, non-assert build produces the same code before and after the patch. Testing that revealed the need to move up the AllocSizeIsValid() check in repalloc(), though I don't understand why GCC reacted that way. Peter Geoghegan and Korry Douglas provided valuable feedback on earlier versions of this code. Possible future directions: - Test "make installcheck-world". When I last did this in past years, contrib did trigger some errors. - Test recovery, such as by running a streaming replica under Memcheck while the primary runs "make installcheck-world". - Test newer compilers and higher optimization levels. I used GCC 4.2 at -O1. A brief look at -O2 results showed a new error that I have not studied. GCC 4.8 at -O3 might show still more due to increasingly-aggressive assumptions. - A buildfarm member running its installcheck steps this way. - Memcheck has support for detecting leaks. I have not explored that side at all, always passing --leak-check=no. We could add support for freeing "everything" at process exit, thereby making the leak detection meaningful. Brief notes for folks reproducing my approach: I typically start the Memcheck-hosted postgres like this: valgrind --leak-check=no --gen-suppressions=all \ --suppressions=src/tools/valgrind.supp --time-stamp=yes \ --log-file=$HOME/pg-valgrind/%p.log postgres If that detected an error on which I desired more detail, I would rerun a smaller test case with "--track-origins=yes --read-var-info=yes". That slows things noticeably but gives more-specific messaging. When even that left the situation unclear, I would temporarily hack allocChunkLimit so every palloc() turned into a malloc(). I strongly advise installing the latest-available Valgrind, particularly because recent releases suffer far less of a performance drop processing the instrumentation added by this patch. A "make installcheck" run takes 273 minutes under Vaglrind 3.6.0 but just 27 minutes under Valgrind 3.8.1. Thanks, nm [1] http://valgrind.org/docs/manual/mc-manual.html [2] http://www.postgresql.org/message-id/20110312133224.GA7833@tornado.gateway.2wire.net -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Attachment
pgsql-hackers by date: