Add checksums without --initdb - Mailing list pgsql-hackers

From	David Christensen
Subject	Add checksums without --initdb
Date	July 2, 2015 19:39:19
Msg-id	7A00D9D1-535A-4C37-94C7-02296AAF063F@endpoint.com
Responses	Re: Add checksums without --initdb
List	pgsql-hackers

So on #postgresql, I was musing about methods of getting checksums enabled/disabled without requiring a separate initdb
step, and minimizing the downtime required to get such functionality enabled.

What about adapting pg_basebackup to add the following options:

-k|--checksums - build the replica with checksums enabled.
-K|--no-checksums - build the replica with checksums disabled.

The way this would work would be to have pg_basebackup's ReceiveAndUnpackTarFile() calculate and/or remove the
checksums from each heap page as it is streamed, and update the pg_control file to reflect the new checksums setting.
After this checksum-enabled replica is created, it could stream/process WAL and get caught up, and then the user fails
over to their brand-spanking-new checksum-enabled database.  Obviously calculating each page's checksum would be a bit
slower than just writing the data out from the tar stream, but it seems to me like this is a single point where the
whole database has to be processed page-by-page anyway.
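To make the per-page step concrete, here is a minimal C sketch of what ReceiveAndUnpackTarFile() might do to each relation segment as it is unpacked. It assumes the default 8 kB block size and 1 GB segments (131072 blocks), and that pd_checksum sits at byte 8 of the page header and pd_upper at byte 14. The mixing step follows the FNV-1a-derived scheme PostgreSQL's data checksums are built on (32 parallel lanes, the checksum field treated as zero while hashing, the block number mixed in, the result folded into 1..65535), but the lane seeds are made-up placeholders, so this will not reproduce the checksums a real server computes. All `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLCKSZ          8192u    /* default block size */
#define SKETCH_RELSEG_SIZE     131072u  /* blocks per 1 GB segment file */
#define SKETCH_N_LANES         32u      /* parallel partial sums */
#define SKETCH_FNV_PRIME       16777619u
#define SKETCH_PD_CHECKSUM_OFF 8u       /* pd_checksum follows the 8-byte pd_lsn */
#define SKETCH_PD_UPPER_OFF    14u      /* pd_upper; 0 means a new, unused page */

/* One FNV-1a-derived mixing step: xor in a word, multiply, fold high bits down. */
static uint32_t sketch_mix(uint32_t sum, uint32_t value)
{
    uint32_t tmp = sum ^ value;
    return (tmp * SKETCH_FNV_PRIME) ^ (tmp >> 17);
}

/* Checksum one page as if pd_checksum were zero, mixing in its block number. */
static uint16_t sketch_page_checksum(const unsigned char *page, uint32_t blkno)
{
    unsigned char copy[SKETCH_BLCKSZ];
    uint32_t sums[SKETCH_N_LANES];
    uint32_t result = 0;

    memcpy(copy, page, SKETCH_BLCKSZ);
    copy[SKETCH_PD_CHECKSUM_OFF] = 0;      /* the stored checksum must not */
    copy[SKETCH_PD_CHECKSUM_OFF + 1] = 0;  /* influence its own value      */

    for (uint32_t j = 0; j < SKETCH_N_LANES; j++)
        sums[j] = 0x9E3779B9u * (j + 1);   /* ASSUMED seeds, not PostgreSQL's */

    /* Feed the page to the lanes one 32-bit word at a time (native byte order). */
    for (size_t off = 0; off < SKETCH_BLCKSZ; off += 4) {
        uint32_t word;
        size_t lane = (off / 4) % SKETCH_N_LANES;

        memcpy(&word, copy + off, sizeof word);
        sums[lane] = sketch_mix(sums[lane], word);
    }

    /* Two extra rounds of zeros for additional mixing, then xor-fold the lanes. */
    for (int round = 0; round < 2; round++)
        for (uint32_t j = 0; j < SKETCH_N_LANES; j++)
            sums[j] = sketch_mix(sums[j], 0);
    for (uint32_t j = 0; j < SKETCH_N_LANES; j++)
        result ^= sums[j];

    result ^= blkno;                           /* detect transposed pages */
    return (uint16_t) ((result % 65535u) + 1); /* 1..65535, never zero */
}

/* Compute the checksum and store it in the page's pd_checksum field. */
static void sketch_set_page_checksum(unsigned char *page, uint32_t blkno)
{
    uint16_t checksum = sketch_page_checksum(page, blkno);
    memcpy(page + SKETCH_PD_CHECKSUM_OFF, &checksum, sizeof checksum);
}

/*
 * Stamp checksums on every page of one relation segment file held in buf
 * (e.g. as it is unpacked from the tar stream). Block numbers are
 * relation-wide, so page i of segment segno is block segno * RELSEG_SIZE + i.
 */
static void sketch_checksum_segment(unsigned char *buf, size_t len, uint32_t segno)
{
    assert(len % SKETCH_BLCKSZ == 0);
    for (size_t i = 0; i < len / SKETCH_BLCKSZ; i++) {
        unsigned char *page = buf + i * SKETCH_BLCKSZ;
        uint16_t pd_upper;

        memcpy(&pd_upper, page + SKETCH_PD_UPPER_OFF, sizeof pd_upper);
        if (pd_upper == 0)
            continue;  /* new page: left without a checksum */
        sketch_set_page_checksum(page, segno * SKETCH_RELSEG_SIZE + (uint32_t) i);
    }
}
```

Two details fall out of this: pages with pd_upper == 0 are skipped, mirroring PostgreSQL, which does not checksum new pages; and the block number mixed into each checksum is relation-wide, so pg_basebackup would have to recover the segment number from each file name.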

Possible concerns here are whether checksums are included in WAL full_page_writes or are calculated independently;
if the latter, I think we'd be fine.  If checksums are all handled at the layer below WAL, then any streamed/processed
changes should be fine to get us to the point where we could come up as a master.

We'd also need to be careful to add checksums only to heap files, but that could be handled via the filename
prefixes (base|global). (I'm not sure if the relation forks are in standard Page format, but if not we could exclude
those as well.)  Obviously this bakes quite a bit of cluster structural awareness into pg_basebackup and may tie it more
strongly to a specific major version, but it seems to me like the tradeoffs would be worth it if you wanted to have that
option, and separate code paths could keep the existing behavior when it isn't requested.
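The filename-prefix filter could be sketched as follows, again in C and under assumptions: tar member names are relative to the data directory (e.g. `base/16384/16385.2`), only `base/` and `global/` are considered (tablespaces under `pg_tblspc/` are ignored), and whether the non-main forks get checksums is left as a switch, since their page format is the open question above. `sketch_is_checksummable()` is a hypothetical helper, not an existing pg_basebackup function. It also returns the `.N` segment number, because pages in segment N do not start at block 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_DIGITS "0123456789"

/*
 * Hypothetical filter: should this tar member get page checksums?
 * Accepts "base/<dboid>/<file>" and "global/<file>" where <file> is
 * "<relfilenode>[_fsm|_vm|_init][.<segno>]". Non-numeric names such as
 * pg_control, pg_filenode.map, PG_VERSION or temp relations ("t3_...")
 * are rejected. With main_fork_only set, fork files are skipped as well.
 * On success *segno holds the segment number (0 when there is no suffix).
 */
static bool sketch_is_checksummable(const char *path, bool main_fork_only,
                                    unsigned long *segno)
{
    const char *name;

    assert(path != NULL && segno != NULL);
    if (strncmp(path, "global/", 7) == 0) {
        name = path + 7;
    } else if (strncmp(path, "base/", 5) == 0) {
        const char *db = path + 5;
        const char *slash = strchr(db, '/');
        size_t dblen;

        if (slash == NULL)
            return false;
        dblen = (size_t) (slash - db);
        if (dblen == 0 || strspn(db, SKETCH_DIGITS) != dblen)
            return false;              /* database directory must be an OID */
        name = slash + 1;
    } else {
        return false;                  /* pg_xlog, pg_clog, pg_tblspc, ... */
    }

    size_t n = strspn(name, SKETCH_DIGITS);
    if (n == 0)
        return false;                  /* not a relfilenode */
    const char *rest = name + n;

    if (*rest == '_') {                /* optional fork suffix */
        if (main_fork_only)
            return false;
        if (strncmp(rest, "_fsm", 4) == 0)
            rest += 4;
        else if (strncmp(rest, "_init", 5) == 0)
            rest += 5;
        else if (strncmp(rest, "_vm", 3) == 0)
            rest += 3;
        else
            return false;
    }

    *segno = 0;
    if (*rest == '.') {                /* optional ".N" segment suffix */
        size_t m = strspn(rest + 1, SKETCH_DIGITS);
        if (m == 0 || rest[1 + m] != '\0')
            return false;
        *segno = strtoul(rest + 1, NULL, 10);
        return true;
    }
    return *rest == '\0';              /* also rejects nested paths */
}
```

Matching on purely numeric file names rather than on the directory alone is what keeps files like global/pg_control out, since that file lives under one of the two prefixes but is not in page format.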

Andres suggested a separate tool that would basically rewrite the existing data directory's heap files in place, which I
can also see a use case for, but I think there's some benefit to having it happen while the replica is being
streamed/built.

Ideas/thoughts/reasons this wouldn’t work?

David
--
David Christensen
PostgreSQL Team Manager
End Point Corporation
david@endpoint.com
785-727-1171