Re: [WIP] In-place upgrade - Mailing list pgsql-hackers
| From | Bruce Momjian | 
|---|---|
| Subject | Re: [WIP] In-place upgrade | 
| Date | |
| Msg-id | 200811061715.mA6HF9g00508@momjian.us | 
| In response to | Re: [WIP] In-place upgrade ("Robert Haas" <robertmhaas@gmail.com>) | 
| Responses | Re: [WIP] In-place upgrade | 
| List | pgsql-hackers | 
Robert Haas wrote:
> > That's all fine and dandy, except that it presumes that you can perform
> > SELECT/UPDATE/DELETE on V3 tuple versions; you can't just pretend that
> > A-E aren't there until they get converted.  Which is exactly the
> > overhead we were looking to avoid.
>
> I don't understand this comment at all.  Unless you have some sort of
> magical wand in your back pocket that will instantaneously transform
> the entire database, there is going to be a period of time when you
> have to cope with both V3 and V4 pages.  ISTM that what we should be
> talking about here is:
>
> (1) How are we going to do that in a way that imposes near-zero
> overhead once the entire database has been converted?
> (2) How are we going to do that in a way that is minimally invasive to the code?
> (3) Can we accomplish (1) and (2) while still retaining somewhat
> reasonable performance for V3 pages?
>
> Zdenek's initial proposal did this by replacing all of the tuple
> header macros with functions that were conditionalized on page
> version.  I think we agree that's not going to work.  That doesn't
> mean that there is no approach that can work, and we were discussing
> possible ways to make it work upthread until the thread got hijacked
> to discuss the right way of handling page expansion.  Now that it
> seems we agree that a transaction can be used to move tuples onto new
> pages, I think we'd be well served to stop talking about page
> expansion and get back to the original topic: where and how to insert
> the hooks for V3 tuple handling.

I think the above is a good summary.  For me, the problem with any
approach that has information about prior-version block formats in the
main code path is code complexity, and secondarily performance.

I know there is concern that converting all blocks on read-in might
expand the page beyond 8k in size.
One idea Heikki had was to require that some tool be run on minor
releases before a major upgrade to guarantee there is enough free space
to convert the block to the current format on read-in, which would
localize the information about prior block formats.  We could release
the tool in minor branches around the same time as a major release.
Also consider that there are very few releases that expand the page
size.

For these reasons, the expand-the-page-beyond-8k problem should not be
dictating what approach we take for upgrade-in-place because there are
workarounds for the problem, and the problem is rare.  I would like us
to again focus on converting the pages to the current version format on
read-in, and perhaps a tool to convert all old pages to the new format.

FYI, we are also going to need the ability to convert all pages to the
current format for multi-release upgrades.  For example, if you did
upgrade-in-place from 8.2 to 8.3, you are going to need to update all
pages to the 8.3 format before doing upgrade-in-place to 8.4; perhaps
vacuum can do something like this on a per-table basis, and we can
record that status in a pg_class column.

Also, consider that when we did PITR, we required commands before and
after the tar so that there was a consistent API for PITR, and later had
to add capabilities to those functions, but the user API didn't change.

I envision a similar system where we have utilities to guarantee all
pages have enough free space, and all pages are the current version,
before allowing an upgrade-in-place to the next version.  Such a
consistent API will make the job for users easier and our job simpler,
and with upgrade-in-place, where we have limited time and resources to
code this for each release, simplicity is important.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
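
Illustrative sketch (not part of the original message, and not PostgreSQL code): the scheme described above can be modeled in a few lines of Python to make the three pieces concrete: a pre-upgrade check that finds pages without enough free space, conversion to the current format on read-in, and an "all pages current" test before the next upgrade. Every name here (`Page`, `pages_needing_room`, `convert_on_read`, `ready_for_next_upgrade`) is hypothetical, and the header size and the per-tuple growth of 4 bytes are made-up numbers chosen only so the arithmetic is visible.

```python
from dataclasses import dataclass
from typing import List

PAGE_SIZE = 8192       # the 8k block size mentioned in the message
CURRENT_VERSION = 4    # the thread refers to old pages as V3 and new ones as V4
HEADER_SIZE = 24       # assumption: illustrative page-header size in bytes
# Assumption: hypothetical growth per tuple, in bytes, keyed by old page version.
TUPLE_GROWTH = {3: 4}


@dataclass
class Page:
    version: int
    tuple_sizes: List[int]   # size in bytes of each tuple on the page

    def free_space(self) -> int:
        return PAGE_SIZE - HEADER_SIZE - sum(self.tuple_sizes)


def extra_space_needed(page: Page) -> int:
    """Bytes the page would grow by if converted to the current format."""
    if page.version == CURRENT_VERSION:
        return 0
    return TUPLE_GROWTH[page.version] * len(page.tuple_sizes)


def pages_needing_room(pages: List[Page]) -> List[int]:
    """Pre-upgrade check: indexes of pages that could not be converted in place."""
    return [i for i, p in enumerate(pages)
            if extra_space_needed(p) > p.free_space()]


def convert_on_read(page: Page) -> Page:
    """Convert an old-format page to the current format as it is read in."""
    if page.version == CURRENT_VERSION:
        return page
    if extra_space_needed(page) > page.free_space():
        # The pre-upgrade tool is supposed to make this case impossible.
        raise ValueError("page would exceed 8k; run the pre-upgrade tool first")
    growth = TUPLE_GROWTH[page.version]
    return Page(CURRENT_VERSION, [size + growth for size in page.tuple_sizes])


def ready_for_next_upgrade(pages: List[Page]) -> bool:
    """True only when every page is already in the current format."""
    return all(p.version == CURRENT_VERSION for p in pages)
```

In this toy model, knowledge of the old format is confined to `TUPLE_GROWTH` and `convert_on_read`, which is the localization the message argues for; code that runs after read-in only ever sees current-format pages.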