Re: In-place upgrade: catalog side - Mailing list pgsql-hackers
From | Greg Smith |
---|---|
Subject | Re: In-place upgrade: catalog side |
Date | |
Msg-id | Pine.GSO.4.64.0812040401260.27355@westnet.com Whole thread Raw |
In response to | Re: In-place upgrade: catalog side (Zdenek Kotala <Zdenek.Kotala@Sun.COM>) |
Responses |
Re: In-place upgrade: catalog side
|
List | pgsql-hackers |
On Thu, 4 Dec 2008, Zdenek Kotala wrote: > The TOAST problem is already addressed and script should handle it correctly. > But I don't like it much, because it is kind of magic. I just read through the whole "toast by chunk-end" thread again and it does seem pretty complicated. What is the magic part you're still not happy with? > I'll send this version when I finish tests. You really should feel free to forward these things over as soon as you've got something working, even if you're still running your own tests. With all due respect to how much you've done here, the sooner we can get more people working with and on something closer to candidate code the better. I've have started a couple of days ago but couldn't find anything but the old script. If some parts have comments like "this is an awful check for dropped columns that probably doesn't even work yet", that's OK. We need to get other people helping out with this besides you. > Problem is that dropped columns are only mark as a deleted and data are > still stored in tuples. Catalog contains related information about > position and length, but when you perform dump and restore, this > information is lost and columns are shifted ... Here's a good example; that seems a perfect problem for somebody else to work on. I understand it now well enough to float ideas without even needing to see your code. Stop worring about it, I'll grab responsibility for making sure it gets done by someone. So, for everyone else who isn't Zdenek: when columns are dropped, pg_attribute.attisdropped turns true and atttypid goes to 0. pg_dump skips over them, and even if it didn't pg_restore doesn't know how to put them back. I can think of a couple of hacks to work around this, and one of them might even work: 1) Create a dummy type that exists only to flag these during conversion. Re-add all the deleted columns by turning off attisdropped and flag them with that type. Dump. Restore. Re-delete the columns. My first pass through poking holes in this idea wonders how the dump will go crazy if it finds rows that were created after the column was dropped, that therefore have no value for it. 2) Query the database to find all these deleted columns and store the information we need about them, save that into some text files (similary to how relids are handled by the script right now). After the schema restore, read that list in, iterating over the missing ones. For each column that was gone, increment attnum for everything above that position to renumber a place for it. Put a dummy column entry back in that's already marked as deleted. 3) Wander back into pre-upgrade land by putting together something that wanders through every table updating any row that contains data for a dropped column. Since dropping columns isn't really common in giant data warehouses, something that had to wander over all the tuples related to a table that has lost a column should only need to consider a pretty small subset of the database. You might even make it off-line without getting too many yelps from the giant DW crowd, seems like it would be easy to write something to estimate the amount of work needed in advance of doing it even (before you take the system down, run a check utility that says "The server currently has 65213 rows of data for tables with deleted columns"). Who wants to show off how much more they know about this than me by saying what's right or wrong with these various ideas? If we care about the fact that columns never go away and are using (1) or (2), could also consider adding some additional meta-data to 8.4 such that something like vacuum can flag when a column no longer exists in any part of the data. All deleted columns move from 8.3 to 8.4, but one day the 8.5 upgrade could finally blow them away. There's already plenty of per-table catalog data being proposed to push into 8.4 for making future upgrades easier, this seems like a possible candidate for something to make space for there. As I just came to appreciate the problem I'm not sure about that. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
pgsql-hackers by date: