Thread: offline consistency check and info on attributes
Hi, a recent discussion about possible data corruption (in the general list) suggests that a tool for offline integrity check might be useful. Something that might be run when the database is shut down, to check that the data files are OK. The idea is to read all heap blocks and check for various 'suspicious' things like 'pd_lower > pd_upper', invalid lengths (negative lengths, resulting in in alloc errors) etc. It might list blocks that are somehow corrupted, dump them in a separate file etc. Right now I do have a very simple tool that reads a given file and performs a lot of checks at the block level (as described in bufpage.h), and the next step should be validating basic structure of the tuples (lengths). And that's the point where I'm stuck right now - I'm thinking what might be the most elegant way to get info about attributes, without access to the pg_attribute catalog (the tool is intended for offline checks). I've figured out the catalog-to-file mapping (in relmapper.c), but now I'm wondering - it's just another relation, so I'd have to read the block, parse the items and interpret them (not sure how to do that without the pg_attribute data itself). So I wonder - what would be an elegant solution? regards Tomas
Excerpts from Tomas Vondra's message of dom abr 24 13:49:31 -0300 2011: > Right now I do have a very simple tool that reads a given file and > performs a lot of checks at the block level (as described in bufpage.h), > and the next step should be validating basic structure of the tuples > (lengths). And that's the point where I'm stuck right now - I'm thinking > what might be the most elegant way to get info about attributes, without > access to the pg_attribute catalog (the tool is intended for offline > checks). Each tuple declares its length. You don't need to know each attribute's length to check that. Doing attribute-level checks is probably pointless without catalog access. > I've figured out the catalog-to-file mapping (in relmapper.c), but now > I'm wondering - it's just another relation, so I'd have to read the > block, parse the items and interpret them (not sure how to do that > without the pg_attribute data itself). So I wonder - what would be an > elegant solution? This reminds me -- we need to have pg_filedump be able to dump the relmapper stuff. I was going to write a patch for it but then I forgot. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Dne 25.4.2011 18:16, Alvaro Herrera napsal(a): > Excerpts from Tomas Vondra's message of dom abr 24 13:49:31 -0300 2011: > >> Right now I do have a very simple tool that reads a given file and >> performs a lot of checks at the block level (as described in bufpage.h), >> and the next step should be validating basic structure of the tuples >> (lengths). And that's the point where I'm stuck right now - I'm thinking >> what might be the most elegant way to get info about attributes, without >> access to the pg_attribute catalog (the tool is intended for offline >> checks). > > Each tuple declares its length. You don't need to know each attribute's > length to check that. Doing attribute-level checks is probably > pointless without catalog access. Yes, I know the tuple length is in HeapTupleHeader (and I'm already checking that), but that does not allow to check lengths of the individual columns, especially those with varlena types. That's a very annoying type of corruption, because the queries that do not touch such columns seem to work fine, but once you attempt to access the corrupted column you'll get something like this: pg_dump: SQL command failed pg_dump: Error message from server: ERROR: invalid memory alloc request size 4294967293 So the ability to check where a the column lengths do not make sense (in this case it's a negative value) would be a nice thing. But without the access to pg_attribute this seems to be very difficult. Hmmm, maybe the idea to build it as an offline tool (to use it when the DB is not running) is not a good idea ... Tomas
Excerpts from Tomas Vondra's message of lun abr 25 14:50:18 -0300 2011: > Yes, I know the tuple length is in HeapTupleHeader (and I'm already > checking that), but that does not allow to check lengths of the > individual columns, especially those with varlena types. > > That's a very annoying type of corruption, because the queries that do > not touch such columns seem to work fine, but once you attempt to access > the corrupted column you'll get something like this: > > pg_dump: SQL command failed > pg_dump: Error message from server: ERROR: invalid memory alloc > request size 4294967293 Yeah, I agree with this being less than ideal. However, as you conclude, I don't think it's really workable to check this without support from the running system. I wrote a dumb tool to attempt to detoast all varlena columns, capture exceptions and report them; see the code here: http://alvherre.livejournal.com/4404.html (You need to pass it a table name as a text parameter; that bit is crap, as it fails for funny names). Note that this assumes that there is a function length() for every varlena datatype in the table, which may not be true for some of them. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Dne 25.4.2011 18:16, Alvaro Herrera napsal(a): > Excerpts from Tomas Vondra's message of dom abr 24 13:49:31 -0300 2011: > >> I've figured out the catalog-to-file mapping (in relmapper.c), but now >> I'm wondering - it's just another relation, so I'd have to read the >> block, parse the items and interpret them (not sure how to do that >> without the pg_attribute data itself). So I wonder - what would be an >> elegant solution? > > This reminds me -- we need to have pg_filedump be able to dump the > relmapper stuff. I was going to write a patch for it but then I forgot. Was this a polite question whether I volunteer to write that patch? ;-) I've never used pg_filedump and I'm not quite sure what exactly is needed, but it seem simple enough to do it. OK, I know the iceberg that sank Titanic seemed small too ... I think I'll move the integrity check to the db, so that it's possible to check the column lengths etc. (pageinspect seems like a good module to mutilate in this direction) but I still believe it'd be useful to have an offline tool for basic checks. Would pg_filedump be a resonable tool to do that? regards Tomas
Excerpts from Tomas Vondra's message of mar abr 26 17:39:19 -0300 2011: > Dne 25.4.2011 18:16, Alvaro Herrera napsal(a): > > This reminds me -- we need to have pg_filedump be able to dump the > > relmapper stuff. I was going to write a patch for it but then I forgot. > > Was this a polite question whether I volunteer to write that patch? ;-) > > I've never used pg_filedump and I'm not quite sure what exactly is > needed, but it seem simple enough to do it. OK, I know the iceberg that > sank Titanic seemed small too ... Err, no, sorry if you thought it was. If you still want to volunteer I'm sure it'd be more than welcome. > I think I'll move the integrity check to the db, so that it's possible > to check the column lengths etc. (pageinspect seems like a good module > to mutilate in this direction) but I still believe it'd be useful to > have an offline tool for basic checks. Would pg_filedump be a resonable > tool to do that? No, I don't think pg_filedump is a good host for such checks. If we're going to have a tool to do that it'd be better to be able to include it in core (or at least contrib), and we can't have pg_filedump in there for licensing reasons. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Excerpts from Tomas Vondra's message of mar abr 26 17:39:19 -0300 2011: >> Dne 25.4.2011 18:16, Alvaro Herrera napsal(a): >> I think I'll move the integrity check to the db, so that it's possible >> to check the column lengths etc. (pageinspect seems like a good module >> to mutilate in this direction) but I still believe it'd be useful to >> have an offline tool for basic checks. Would pg_filedump be a resonable >> tool to do that? > No, I don't think pg_filedump is a good host for such checks. If we're > going to have a tool to do that it'd be better to be able to include it in > core (or at least contrib), and we can't have pg_filedump in there for > licensing reasons. Quite aside from licensing reasons, pg_filedump is only meant to print out a very low-level representation of file contents; it has little real understanding of what it's printing. So I think it's a bad basis for a verification utility on technical grounds too. regards, tom lane