Valgrind Memcheck support - Mailing list pgsql-hackers
| From | Noah Misch | 
|---|---|
| Subject | Valgrind Memcheck support | 
| Date | |
| Msg-id | 20130609212559.GB491289@tornado.leadboat.com | 
| List | pgsql-hackers | 
Valgrind's Memcheck tool[1] is handy for finding bugs, but our use of a custom
allocator limits its ability to detect problems in unmodified PostgreSQL.
During the 9.1 beta cycle, I found some bugs[2] with a rough patch adding
instrumentation to aset.c and mcxt.c such that Memcheck understood our
allocator.  I've passed that patch around to a few people over time, and I've
now removed the roughness such that it's ready for upstream.  In hopes of
making things clearer in the commit history, I've split out a preliminary
refactoring patch from the main patch and attached each separately.
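As a rough illustration of what "instrumentation such that Memcheck
understands the allocator" means, here is a minimal sketch. It is not the
patch itself: ToyArena and the toy_* functions are hypothetical names for a
trivial bump allocator, and USE_VALGRIND is assumed to be the build-time
switch. The two VALGRIND_MAKE_MEM_* client requests are the standard ones
from valgrind/memcheck.h; without USE_VALGRIND they compile to nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed build switch: without it, client requests become no-ops. */
#ifdef USE_VALGRIND
#include <valgrind/memcheck.h>
#else
#define VALGRIND_MAKE_MEM_NOACCESS(addr, size)  ((void) 0)
#define VALGRIND_MAKE_MEM_UNDEFINED(addr, size) ((void) 0)
#endif

#define TOY_ARENA_SIZE 1024     /* multiple of 8, see toy_alloc() */

/* Hypothetical bump allocator; a stand-in, not PostgreSQL's AllocSet. */
typedef struct ToyArena
{
    size_t  used;                   /* bytes handed out, multiple of 8 */
    char    space[TOY_ARENA_SIZE];  /* one big block we carve up ourselves */
} ToyArena;

void
toy_arena_init(ToyArena *arena)
{
    arena->used = 0;
    /* Nothing is allocated yet, so any access to the block is a bug. */
    VALGRIND_MAKE_MEM_NOACCESS(arena->space, TOY_ARENA_SIZE);
}

void *
toy_alloc(ToyArena *arena, size_t size)
{
    size_t  padded;
    void   *chunk;

    /*
     * The remaining space is always a multiple of 8, so a request that
     * fits still fits after rounding up.
     */
    if (size > TOY_ARENA_SIZE - arena->used)
        return NULL;
    padded = (size + 7) & ~(size_t) 7;

    chunk = arena->space + arena->used;
    arena->used += padded;

    /*
     * Only the requested bytes become addressable, and they start out
     * undefined, like fresh malloc() memory.  The alignment padding stays
     * inaccessible, so overruns into it are reported.
     */
    VALGRIND_MAKE_MEM_UNDEFINED(chunk, size);
    return chunk;
}

void
toy_arena_reset(ToyArena *arena)
{
    arena->used = 0;
    /* Chunks handed out earlier are dead; flag any later use of them. */
    VALGRIND_MAKE_MEM_NOACCESS(arena->space, TOY_ARENA_SIZE);
}
```

Without such requests, Memcheck sees only the one large block obtained from
the system and cannot tell a live chunk from a freed one or from padding.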
Besides the aset.c/mcxt.c instrumentation, this patch adds explicit checks for
undefined memory to PageAddItem() and printtup(); this has caught C-language
functions that fabricate a Datum without initializing all bits.  The inclusion
of all this is controlled by a pg_config_manual.h setting.  The patch also
adds a "suppression file" that directs Valgrind to silence nominal errors we
don't plan to fix.
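The explicit checks for undefined memory can be sketched as follows.
Memcheck does not complain when undefined bytes are merely copied, so a
value with uninitialized bits can travel far before anything is reported; an
explicit VALGRIND_CHECK_MEM_IS_DEFINED request at the point where data
leaves the caller's hands reports it there. ToyValue, toy_emit() and
toy_value_make() are hypothetical names for illustration, not code from
PageAddItem() or printtup(), and USE_VALGRIND is again the assumed build
switch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed build switch: without it, the check compiles to nothing. */
#ifdef USE_VALGRIND
#include <valgrind/memcheck.h>
#else
#define VALGRIND_CHECK_MEM_IS_DEFINED(addr, size) ((void) 0)
#endif

/* Hypothetical value type; most ABIs put padding after "tag". */
typedef struct ToyValue
{
    char    tag;
    int     number;
} ToyValue;

/*
 * Hypothetical stand-in for a routine that stores a value or sends it to a
 * client.  Returns the number of bytes copied, or 0 if dest is too small.
 */
size_t
toy_emit(char *dest, size_t destlen, const void *src, size_t len)
{
    if (len > destlen)
        return 0;

    /*
     * memcpy() of undefined bytes is silent under Memcheck, so ask for a
     * report right here, at the boundary.
     */
    VALGRIND_CHECK_MEM_IS_DEFINED(src, len);
    memcpy(dest, src, len);
    return len;
}

/* Build a value with every byte defined, padding included. */
void
toy_value_make(ToyValue *value, char tag, int number)
{
    memset(value, 0, sizeof(*value));   /* defines the padding bytes too */
    value->tag = tag;
    value->number = number;
}
```

A caller that fills in the two fields without the memset() leaves the
padding undefined, and the check in toy_emit() would flag it under Memcheck.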
To smoke-test the instrumentation, I used "make installcheck" runs on x86_64
GNU/Linux and ppc64 GNU/Linux.  This turned up various new and newly-detected
memory bugs, which I will discuss in a separate thread.  With those fixed,
"make installcheck" has a clean report (in my one particular configuration).
I designed the suppression file to work across platforms; I specifically
anticipated eventual use on x86_64 Darwin and x86_64 FreeBSD.  Valgrind 3.8.1
quickly crashed when running PostgreSQL on Darwin; I did not dig further.
Since aset.c and mcxt.c contain the hottest code paths in the backend, I
verified that a !USE_VALGRIND, non-assert build produces the same code before
and after the patch.  Testing that revealed the need to move up the
AllocSizeIsValid() check in repalloc(), though I don't understand why GCC
reacted that way.
Peter Geoghegan and Korry Douglas provided valuable feedback on earlier
versions of this code.
Possible future directions:
- Test "make installcheck-world".  When I did this in past years, contrib
  triggered some errors.
- Test recovery, such as by running a streaming replica under Memcheck while
  the primary runs "make installcheck-world".
- Test newer compilers and higher optimization levels.  I used GCC 4.2 at -O1.
  A brief look at -O2 results showed a new error that I have not studied.  GCC
  4.8 at -O3 might show still more due to increasingly-aggressive assumptions.
- A buildfarm member running its installcheck steps this way.
- Memcheck has support for detecting leaks.  I have not explored that side at
  all, always passing --leak-check=no.  We could add support for freeing
  "everything" at process exit, thereby making the leak detection meaningful.
Brief notes for folks reproducing my approach: I typically start the
Memcheck-hosted postgres like this:
  valgrind --leak-check=no --gen-suppressions=all \
    --suppressions=src/tools/valgrind.supp --time-stamp=yes \
    --log-file=$HOME/pg-valgrind/%p.log postgres
If that detected an error on which I desired more detail, I would rerun a
smaller test case with "--track-origins=yes --read-var-info=yes".  That slows
things noticeably but gives more-specific messaging.  When even that left the
situation unclear, I would temporarily hack allocChunkLimit so every palloc()
turned into a malloc().
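The effect of that hack can be sketched with a toy pool. The assumption,
matching my reading of aset.c, is that a request larger than the chunk limit
bypasses the shared blocks and gets its own malloc() allocation; forcing the
limit to zero therefore routes every request through malloc(), where Memcheck
tracks each chunk individually. ToyPool and the toy_pool_* functions are
hypothetical names, not PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_POOL_SIZE 4096      /* multiple of 8 */

/* Hypothetical pool; chunk_limit plays the role of allocChunkLimit. */
typedef struct ToyPool
{
    size_t  chunk_limit;    /* larger requests bypass the pool */
    size_t  used;           /* bytes carved from space, multiple of 8 */
    size_t  malloc_calls;   /* how many requests went to malloc() */
    char    space[TOY_POOL_SIZE];
} ToyPool;

void
toy_pool_init(ToyPool *pool, size_t chunk_limit)
{
    pool->chunk_limit = chunk_limit;
    pool->used = 0;
    pool->malloc_calls = 0;
}

void *
toy_pool_alloc(ToyPool *pool, size_t size)
{
    void   *chunk;

    if (size == 0)
        size = 1;           /* keep every chunk distinct and nonempty */

    /* Over the limit, or pool exhausted: dedicated malloc() allocation. */
    if (size > pool->chunk_limit || size > TOY_POOL_SIZE - pool->used)
    {
        pool->malloc_calls++;
        return malloc(size);
    }

    /* Otherwise carve the chunk out of the shared block. */
    chunk = pool->space + pool->used;
    pool->used += (size + 7) & ~(size_t) 7;
    return chunk;
}

void
toy_pool_free(ToyPool *pool, void *ptr)
{
    uintptr_t   p = (uintptr_t) ptr;
    uintptr_t   lo = (uintptr_t) pool->space;

    if (ptr == NULL)
        return;
    if (p >= lo && p < lo + TOY_POOL_SIZE)
        return;             /* pooled chunk: reclaimed only on re-init */
    free(ptr);              /* dedicated chunk: hand back to malloc */
}
```

Calling toy_pool_init(&pool, 0) is the analogue of the temporary hack: no
request is ever small enough to be carved from the shared block.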
I strongly advise installing the latest-available Valgrind, particularly
because recent releases suffer far less of a performance drop processing the
instrumentation added by this patch.  A "make installcheck" run takes 273
minutes under Valgrind 3.6.0 but just 27 minutes under Valgrind 3.8.1.
Thanks,
nm
[1] http://valgrind.org/docs/manual/mc-manual.html
[2] http://www.postgresql.org/message-id/20110312133224.GA7833@tornado.gateway.2wire.net
--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com
		