Thread: Does PostgreSQL check database integrity at startup?
Recently I had a problem with a base file with size 0 in a standby server.
This raised one question: does PostgreSQL (9.6.6) check base integrity at startup?
At least if there are 0 byte size files in base dir? Or CRC? Something?
Regards,
--
Edson Carlos Ericksson Richter
SimKorp Ltda
Phone: (51) 3366-7964
"A mind that opens itself to a new idea will never return to its original size" - Albert Einstein
On 26/12/17 at 09:52, Edson Carlos Ericksson Richter wrote:
> Recently I had a problem with a base file with size 0 in a standby server.
>
> This raised one question: does PostgreSQL (9.6.6) check base integrity
> at startup?
>
> At least if there are 0 byte size files in base dir? Or CRC? Something?
Yes, it has a CRC check, but only if you initialize the cluster with
--data-checksums, and there's a price to pay in performance.
--
Martín Marqués http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
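Whether a given cluster was initialized with data checksums can be read from its control file. The following is a minimal sketch, not a definitive implementation: it assumes `pg_controldata` is on the PATH and prints a line beginning with "Data page checksum version:", where 0 means checksums are disabled. The function names `checksum_version` and `checksums_enabled` are hypothetical helpers introduced only for illustration.

```shell
#!/bin/sh
# Sketch: report whether data checksums are enabled for a cluster.
# Assumption: pg_controldata prints a line such as
#   "Data page checksum version:           1"
# and a value of 0 means checksums are disabled.

checksum_version() {
    # $1: path to the data directory
    # LC_ALL=C keeps the field label in English regardless of locale.
    LC_ALL=C pg_controldata "$1" |
        awk -F': *' '/^Data page checksum version/ { print $2 }'
}

checksums_enabled() {
    # Succeeds (exit status 0) only when a non-zero version was reported.
    version=$(checksum_version "$1")
    [ -n "$version" ] && [ "$version" != "0" ]
}
```

A startup wrapper could call `checksums_enabled "$PGDATA" || echo "data checksums are off" >&2`. From a running server, `SHOW data_checksums;` reports the same setting.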
2017-12-26 14:44 GMT+01:00 Martin Marques <martin.marques@2ndquadrant.com>:
> Yes it has CRC check, but only if you initialize the cluster with
> --data-checksums, and there's a price to pay in performance.
It has a CRC check, but it is applied at runtime, when the data is actually read.
So Postgres usually checks nothing at startup, apart from a few system tables and indexes.
Regards
Pavel
On 26/12/2017 12:25, Pavel Stehule wrote:
> So Postgres usually check nothing on start - few system tables and indexes
Any tips to make the database server refuse to start if it is corrupt?
If I can change the startup script to run some checks before actually starting the database, what would you recommend?
One check that seems obvious to me is looking for empty data files (something like "find $PG_DATA/base -size 0")...
But I'm sure that more experienced PostgreSQL DBAs would have more tests to run before startup.
Thanks,
Edson
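The empty-file scan suggested above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function names are hypothetical, and the data directory is passed as an argument rather than taken from any particular variable. A zero-length file is not by itself proof of corruption (an empty table is stored as a zero-length file), so the output is a list to review rather than a reason to block startup.

```shell
#!/bin/sh
# Sketch: list zero-length regular files under base/ of a data directory.
# An empty file can be perfectly legitimate, so treat the result as a
# list of candidates to inspect, not as a corruption verdict.

list_empty_relation_files() {
    # $1: path to the data directory
    find "$1/base" -type f -size 0 -print
}

count_empty_relation_files() {
    # Prints only the number of zero-length files found.
    list_empty_relation_files "$1" | wc -l | tr -d ' '
}
```

Comparing the list between primary and standby for the same file names is likely to be more informative than the raw count on one server.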
Look into amcheck:
2017-12-26 16:37 GMT+01:00 Edson Carlos Ericksson Richter <richter@simkorp.com.br>:
> If I can change the startup script to run some checks before actually starting the database, what would you recommend?
I don't think anybody does that. Reading a 1 TB database takes more than a few hours.
Regards
Pavel
On 26/12/2017 13:40, Pavel Stehule wrote:
> I don't think anybody does that. Reading a 1 TB database takes more than a few hours.
I have been rebuilding the standby server for two days already, and it is only 23% complete...
If I lost the database and the backups because of that failure, it would be a giant disaster.
A few hours spent checking integrity would be acceptable... especially if I can run the check on the standby only.
Regards,
Edson
2017-12-26 16:50 GMT+01:00 Edson Carlos Ericksson Richter <richter@simkorp.com.br>:
> A few hours spent checking integrity would be acceptable... especially if I can run the check on the standby only.
A very simple check:
pg_dumpall > /dev/null
but this doesn't check indexes.
Regards
Pavel
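The check above can be wrapped so that a script sees a clear pass or fail. This is a hedged sketch rather than an established procedure: `check_dump_readable` is a hypothetical name, and connection options are simply passed through to `pg_dumpall`. It relies only on the exit status of `pg_dumpall`, and, as noted above, it reads table data but does not verify indexes.

```shell
#!/bin/sh
# Sketch: read every database through pg_dumpall and discard the output.
# A read error in table data should make pg_dumpall exit non-zero.
# Indexes are not verified by this check.

check_dump_readable() {
    # Any arguments (for example -h host -p port) go straight to pg_dumpall.
    if pg_dumpall "$@" > /dev/null; then
        echo "dump read OK"
    else
        status=$?
        echo "pg_dumpall failed with exit status $status" >&2
        return 1
    fi
}
```

For example, `check_dump_readable -p 5432 || exit 1` in a maintenance script would stop on the first failed read; the port shown is only a placeholder.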
On 26/12/2017 13:57, Pavel Stehule wrote:
> very simple check
> pg_dumpall > /dev/null
> but this doesn't check indexes.
I'll check it and run some tests.
Thanks!
Edson
Edson Carlos Ericksson Richter wrote:
> I'm rebuilding the standby server for two days already, with 23% of
> completion status...
So how do you build your standbys, exactly? Maybe there's a bug in your
procedure, rather than a bug in the software.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 26/12/2017 15:13, Alvaro Herrera wrote:
> Edson Carlos Ericksson Richter wrote:
>> I'm rebuilding the standby server for two days already, with 23% of
>> completion status...
> So how do you build your standbys, exactly? Maybe there's a bug in your
> procedure, rather than a bug in the software.
Usually either with "pg_basebackup" or with "rsync", as described in the PostgreSQL Wiki.
Right now, I'm doing it via pg_basebackup.
Nothing extraordinary.
When using rsync:
rsync -e "ssh -2 -C -p slave_ssh_port" --progress --partial -az /pgsql/9.6/master_data_folder/* superuser_name@$slave_host:/home/pgsql/9.6/slave_data_folder/ --exclude postmaster.pid --exclude postgresql.conf --exclude pg_log
When using pg_basebackup:
/usr/pgsql-9.6/bin/pg_basebackup -h master_host -p master_port -D /home/pgsql/9.6/master_data_folder -R -S slot_name -X stream -P -d " user=superuser_name password=superuser_pass sslmode=require "
Of course, this is inside a bash script, with variables in the right places to make the script generic enough to work with dozens of databases...
Regards,
Edson
On 12/26/17 10:57 AM, Pavel Stehule wrote:
> 2017-12-26 16:50 GMT+01:00 Edson Carlos Ericksson Richter
>> I'm rebuilding the standby server for two days already, with 23% of
>> completion status...
>> If lost the database and backups because of that failure, it would
>> be a giant disaster.
>> Few hours checking integrity would be acceptable... Specially if I
>> can run it on standby only.
>
> very simple check
>
> pg_dumpall > /dev/null
>
> but this doesn't check indexes.
pgBackRest will validate all page checksums (including indexes, etc.) in the
cluster during backup. Full backups check everything, incr/differential
backups check only the files that have changed.
--
-David
david@pgmasters.net
David Steele wrote:
> pgBackRest will validate all page checksums (including indexes, etc.) in the
> cluster during backup. Full backups check everything, incr/differential
> backups check only the files that have changed.
If a table or index file is of zero length when backed up, as in the
described case, nothing will be checked, right? I mean, there is nothing
externally indicating that the file ought to be of a different size. Am I
wrong? So Edson's situation here would not raise any red flags.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello,
On Tue, 2017-12-26 at 18:58 -0300, Alvaro Herrera wrote:
> David Steele wrote:
> > pgBackRest will validate all page checksums (including indexes,
> > etc.) in the cluster during backup. Full backups check everything,
> > incr/differential backups check only the files that have changed.
>
> If a table or index file is of zero length when backed up, as in the
> described case, nothing will be checked, right? I mean, there is
> nothing externally indicating that the file ought to be of a different
> size. Am I wrong? So Edson's situation here would not raise any red
> flags.
Could the following occur:
1) Your app issues a BEGIN followed by an INSERT.
2) Postgres decides to open a new file in order to store the new row.
3) Your app then does a ROLLBACK.
Wouldn't that leave you with a zero length file on disk?
There's no reason for Postgres to delete the file just because a
rollback was issued. All it has to do is clear the buffer in memory.
My 2 cents.
Rob
On 26/12/17 at 14:46, Edson Carlos Ericksson Richter wrote:
> Usually, or by "pg_basebackup" or by "rsync" as described in PostgreSQL
> Wiki.
> Right now, I'm doing via pg_basebackup.
> Nothing extraordinary.
>
> When using rsync:
>
> rsync -e "ssh -2 -C -p slave_ssh_port" --progress --partial -az
> /pgsql/9.6/master_data_folder/*
> superuser_name@$slave_host:/home/pgsql/9.6/slave_data_folder/ --exclude
> postmaster.pid --exclude postgresql.conf --exclude pg_log
You don't say so anywhere, but I suspect you run the rsync between a
pg_start_backup() and pg_stop_backup(), right?
That is the way it's described in the wiki.
Regards,
--
Martín Marqués http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 26/12/2017 20:42, Martin Marques wrote:
> You don't say so anywhere, but I suspect you run the rsync between a
> pg_start_backup() and pg_stop_backup(), right?
>
> That is the way it's described in the wiki.
Yes, sure. When using the wiki procedure, I follow all the instructions. I have revisited the process many times to check whether anything changed (since the 9.0 days :-) )
Right now, I prefer to use pg_basebackup, but sometimes (when the database is just too large) rsync seems more reliable (maybe it is the cause of the problem).
Anyway, instead of digging into rsync functionality (or bugs; I doubt it, but who knows?), I would prefer to have a script I can run to check whether there are obvious failures in standby servers.
Looking for empty files would be a starting point. I'm learning from experienced people on the list that there are other points I would like to check as well.
Regards,
Edson.
On 26/12/2017 20:11, rob stone wrote:
> Could the following occur:
> 1) Your app issues a BEGIN followed by an INSERT.
> 2) Postgres decides to open a new file in order to store the new row.
> 3) Your app then does a ROLLBACK.
>
> Wouldn't that leave you with a zero length file on disk?
>
> There's no reason for Postgres to delete the file just because a
> rollback was issued. All it has to do is clear the buffer in memory.
>
> My 2 cents.
>
> Rob
You are right: in several databases I've looked at, there are many files with 0 byte size in the ./base folder.
So searching for empty files is not a useful check.
Regards,
Edson
Greetings,
* Edson Carlos Ericksson Richter (richter@simkorp.com.br) wrote:
> Right now, I do prefer to use pg_basebackup instead - but sometimes
> (when database is just too large), rsync seems more reliable (maybe
> it is the cause of the problem).
I don't generally recommend rsync for various reasons, but if you make
sure to call pg_start/stop_backup(), keep track of all your WAL and
verify everything ends up written to disk then it should work. Don't
use it for incrementals though; there are cases where that can fail.
> Anyway, instead digging into rsync functionality (or bugs - I doubt,
> but who knows?), I do prefer to have a script I can run to check if
> there is obvious failures in standby servers.
As mentioned, zero-byte files can be perfectly valid. PostgreSQL does
have page-level CRCs, if you initialized your database with them (which
I would strongly recommend).
There are also backup tools which will verify those checksums when
performing a backup of the system. In addition to that, you can do
parallel backup and restore, which can reduce the downtime for doing
restores quite a bit (though this will depend on what your bottleneck
is, of course).
I'd suggest you take a look at pgBackRest, though I think that other
solutions now also have all of these features (though they all have
their own features too).
Thanks!
Stephen