Re: Thoughts on how to avoid a massive integer update. - Mailing list pgsql-general
| From | Fehrle, Brian |
|---|---|
| Subject | Re: Thoughts on how to avoid a massive integer update. |
| Msg-id | 0F5992D4-9317-4007-9036-202DC89BA0D8@comscore.com |
| In response to | Re: Thoughts on how to avoid a massive integer update. (Adrian Klaver <adrian.klaver@aklaver.com>) |
| Responses | Re: Thoughts on how to avoid a massive integer update. |
| List | pgsql-general |
On 5/4/20, 3:56 PM, "Adrian Klaver" <adrian.klaver@aklaver.com> wrote:
[External Email]
On 5/4/20 2:32 PM, Fehrle, Brian wrote:
> Hi all,
>
> This is a shot in the dark in hopes to find a magic bullet to fix an
> issue I have, I can’t personally think of any solution myself.
>
> I have a database with hundreds of terabytes of data, where every table
> has an integer column referencing a small table. For reasons out of my
> control and cannot change, I NEED to update every single row in all
> these tables, changing the integer value to a different integer.
>
> Since I have to deal with dead space, I can only do a couple tables at a
> time, then do a vacuum full after each one.
Why?
A regular vacuum would mark the space as available.
A regular vacuum would mark the space as available ***for re-use***; it will not release the space back to the drive as ‘unused’. 99% of my tables are old data that will not receive any future inserts or updates, which means that space that is marked ready for ‘reuse’ will not ever be used.
This means that a 100GB table will be updated, and every row marked as dead and re-created with the newly updated data. This table is now 200GB in size. A vacuum will keep it at 200GB of space used, freeing up the 100GB of dead space as ready-to-reuse. A vacuum full will make it 100GB again.
Since 99% of my tables will never be updated or inserted into again, this means my ~300 Terabytes of data would be ~600 Terabytes of data on disk. Thus, vacuum full.
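For concreteness, a minimal sketch of the per-table pass described above, using illustrative names only (data_table and info_table_id as in the example further down; id_remap is an assumed table holding the old-to-new mapping, not something from the thread):

```sql
-- Illustrative only: data_table, info_table_id and id_remap are assumed names.
-- Rewrite the referencing column in one pass ...
UPDATE data_table AS d
SET    info_table_id = m.new_sid
FROM   id_remap AS m
WHERE  d.info_table_id = m.old_sid;

-- ... then reclaim the dead tuples. VACUUM FULL rewrites the table at its
-- compact size, but it holds an ACCESS EXCLUSIVE lock while it runs.
VACUUM FULL data_table;
```

Each table is effectively rewritten twice this way: once by the UPDATE and once more by the VACUUM FULL.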
More below.
> Another option is to build a new table with the new values, then drop
> the old one and swap in the new, either way is very time consuming.
>
> Initial tests suggest this effort will take several months to complete,
> not to mention cause blocking issues on tables being worked on.
>
> Does anyone have any hackery ideas on how to achieve this in less time?
> I was looking at possibly converting the integer column type to another
> that would present the integer differently, like a hex value, but
> everything still ends up requiring all data to be re-written to disk. In
> a well designed database (I didn’t design it :) ), I would simply change
> the data in the referenced table (200 total rows), however the key being
> referenced isn’t just an arbitrary ID, it’s actual ‘data’, and must be
> changed.
I'm not following above.
Could you show an example table relationship?
It’s a simple one-to-many relationship:

*info_table*
    info_table_sid integer

*data_table*
    data_table_sid integer,
    info_table_id integer references info_table(info_table_sid)
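For concreteness, a sketch of that relationship as DDL, together with the "build a new table and swap" option mentioned earlier in the thread. The DDL, the id_remap mapping table and the remapping rule are illustrative assumptions, not the real schema:

```sql
-- The relationship as described (illustrative DDL; the real tables will differ).
CREATE TABLE info_table (
    info_table_sid integer PRIMARY KEY
);

CREATE TABLE data_table (
    data_table_sid integer,
    info_table_id  integer REFERENCES info_table (info_table_sid)
);

-- "Build new, then swap": write the remapped rows once into a fresh table,
-- so there is no dead space to reclaim afterwards.
-- id_remap is an assumed old-id -> new-id mapping table.
CREATE TABLE data_table_new AS
SELECT d.data_table_sid,
       m.new_sid AS info_table_id
FROM   data_table AS d
JOIN   id_remap   AS m ON m.old_sid = d.info_table_id;

BEGIN;
DROP TABLE data_table;
ALTER TABLE data_table_new RENAME TO data_table;
-- indexes, the foreign key and any grants must be recreated by hand here
COMMIT;
```

Either route still writes every row to disk once, which is why both the in-place UPDATE and the table swap are slow at this scale.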
>
> Thanks for any thoughts or ideas,
>
> * Brian F
>
--
Adrian Klaver
adrian.klaver@aklaver.com