amcheck prototype - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | amcheck prototype |
Date | |
Msg-id | CAM3SWZSRS2kS1Npdwn1V-BvSHBcNrzXUZD=pxXhL3Gqf7k-z5Q@mail.gmail.com |
Responses | Re: amcheck prototype; Re: amcheck prototype |
List | pgsql-hackers |
Attached is a revision of what I previously called btreecheck, which is now renamed to amcheck. This is not 9.5 material. I already have three bigger patches in the queue: two of them are large and complex and have major controversies, and the third has details that still need to be worked out and is currently consuming a lot of reviewer time. There seems to be little point in trying to get amcheck into shape for 9.5. The goals for this as a real patch need to be worked out in greater detail. At some point we'll need to have a discussion about both stress-testing (as a way of finding bugs) and allowing users to verify indexes on production systems when corruption is suspected. Since, as far as I know, no one else has so much as applied and compiled my ON CONFLICT UPDATE patch, it would be pretty senseless of me to add another patch to the queue. Reviewers are clearly more overburdened than ever.

Anyway, this revision adds the ability to check invariants across pages: it verifies that a page's right-link comports with the target page's last item. This is necessary because, when targeting a particular page, there is no locally available "next" item to check the last item against, other than the page high key. This cross-page check even occurs for the user-callable SQL functions that only acquire an AccessShareLock (bt_index_verify() and bt_page_verify()).

As before, it also exhaustively tests certain other related invariants previously described [1], without really considering their plausibility as either bugs in the B-Tree code or things likely to be violated in the event of organic data corruption. In other words, I could probably stand to be considerably more selective in what I'm testing, but in order to do that I'd need to make up my mind about my exact goals for this tool.

I thought amcheck might find bugs in approach #1 to value locking [2] (for the ON CONFLICT UPDATE patch). However, extensive stress testing while constantly using the tool has not revealed any bugs.
That doesn't mean that no such bugs exist, of course, and it doesn't really alter our understanding of approach #1, but it's worth mentioning.

Anyway, this is presented here in the hope that it will be useful for testing other patches, and perhaps even for checking for corruption on production systems (with appropriate precautions taken: this is still a prototype patch, but it's also still the only thing of its kind). I post this with the expectation that it won't make it into contrib until PostgreSQL 9.6, or whatever we end up calling it. It might be that someone has some feedback that allows me to build a better temporary prototype (certainly, some testing tools were maintained outside of git for a while in the past, such as the precursor to the isolation tester), but I don't expect even that. If no one wants to do anything with this patch in the foreseeable future (probably the current cycle), there may still be some value in dumping my progress here. As I said, I tend to think that its biggest problem right now is that it's just too scattershot, but that's probably appropriate for an early iteration.

In general, I think we could prevent a lot of bugs by performing targeted stress-testing with custom tools. Ideally, this tool would go on to provide a way of doing so for several different areas of the code.

[1] http://www.postgresql.org/message-id/CAM3SWZRtV+xmRWLWq6c-x7czvwavFdwFi4St1zz4dDgFH4yN4g@mail.gmail.com
[2] https://wiki.postgresql.org/wiki/Value_locking#.231._Heavyweight_page_locking_.28Peter_Geoghegan.29

--
Peter Geoghegan