Thread: 15 pg_upgrade with -j
Hello!
We are moving from 10 to 15 and are in testing now.
Our development database is about 1400G and takes 12 minutes to complete a pg_upgrade with the -k (hard links) option. This is on a CentOS 7 server with 80 cores.
Adding -j 40 to use half of those cores also finishes in 12 minutes, and ps / top / htop never show more than a single process in use at a time.
Bumping that to -j 80 to use them all also finishes in 12 minutes and still only a single process.
Running the suggested vacuum analyze after pg_upgrade completes takes about 19 minutes. Adding -j 40 takes that time down to around 5 minutes, jumps the server load up over 30 and htop shows 40 processes.
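(The analyze step is the vacuumdb run that pg_upgrade suggests on completion; with the parallel flag added, it looks roughly like this -- exact flags approximate:

time /usr/pgsql-15/bin/vacuumdb --all --analyze-in-stages -j 40
)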
If -j 40 helps there--why not with pg_upgrade?
The full commands we are using for pg_upgrade are pretty stock:
time /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-10/bin/ -B /usr/pgsql-15/bin/ -d /var/lib/pgsql/10/data -D /var/lib/pgsql/15up -k
time /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-10/bin/ -B /usr/pgsql-15/bin/ -d /var/lib/pgsql/10/data -D /var/lib/pgsql/15up -k -j 40
time /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-10/bin/ -B /usr/pgsql-15/bin/ -d /var/lib/pgsql/10/data -D /var/lib/pgsql/15up -k -j 80
Our production database is closer to 1900G. If we're looking at a 30-minute pg_upgrade window we'll be okay, but if there is anything we can do to knock that time down, we will -- any suggestions would be greatly appreciated.
Jeff Ross
On 5/22/23 16:20, Jeff Ross wrote:
> Our development database is about 1400G and takes 12 minutes to complete
> a pg_upgrade with the -k (hard links) option. This is on a CentOS 7
> server with 80 cores.
>
> Adding -j 40 to use half of those cores also finishes in 12 minutes and
> ps / top / htop never show more than a single process at a time in use.
>
> Bumping that to -j 80 to use them all also finishes in 12 minutes and
> still only a single process.
>
> If -j 40 helps there--why not with pg_upgrade?

From the docs:

https://www.postgresql.org/docs/current/pgupgrade.html

"The --jobs option allows multiple CPU cores to be used for copying/linking of files and to dump and restore database schemas in parallel; a good place to start is the maximum of the number of CPU cores and tablespaces. This option can dramatically reduce the time to upgrade a multi-database server running on a multiprocessor machine."

So is the 1400G mostly in one database in the cluster?

--
Adrian Klaver
adrian.klaver@aklaver.com
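(Per-database sizes are easy to check with the standard catalog functions; an illustrative query:

psql -c "SELECT datname, pg_size_pretty(pg_database_size(datname)) FROM pg_database ORDER BY pg_database_size(datname) DESC;"
)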
On 5/22/23 5:24 PM, Adrian Klaver wrote:
> The --jobs option allows multiple CPU cores to be used for
> copying/linking of files and to dump and restore database schemas in
> parallel; a good place to start is the maximum of the number of CPU
> cores and tablespaces.
>
> So is the 1400G mostly in one database in the cluster?

Yes, one big database with about 80 schemas and several other smaller databases, so -j should help, right?

Jeff
Jeff Ross <jross@openvistas.net> writes:
> On 5/22/23 5:24 PM, Adrian Klaver wrote:
>> So is the 1400G mostly in one database in the cluster?

> Yes, one big database with about 80 schemas and several other smaller
> databases so -j should help, right?

AFAICT from a quick look at the code, you won't get any meaningful parallelism unless you have several large DBs and/or several large tablespaces. It looks like the assumption was that issuing link() requests in parallel wouldn't help much but just swamp your disk if they're all on the same filesystem. Maybe that could use rethinking, not sure.

			regards, tom lane
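(To see how a given cluster splits across tablespaces, an illustrative check:

psql -c "SELECT spcname, pg_size_pretty(pg_tablespace_size(spcname)) FROM pg_tablespace;"
)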
On 5/22/23 16:29, Jeff Ross wrote:
> On 5/22/23 5:24 PM, Adrian Klaver wrote:
>> So is the 1400G mostly in one database in the cluster?
>
> Yes, one big database with about 80 schemas and several other smaller
> databases so -j should help, right?

As I understand it, no. The parallelism is between databases, not within a database. Further, 'database schemas' there refers to schema in the sense of the overall database object definitions, not the namespaces known as schemas within a database.

--
Adrian Klaver
adrian.klaver@aklaver.com
On 5/22/23 18:42, Tom Lane wrote:
> AFAICT from a quick look at the code, you won't get any meaningful
> parallelism unless you have several large DBs and/or several large
> tablespaces.

Hmm. I'm glad I'm reading this now.

> It looks like the assumption was that issuing link()
> requests in parallel wouldn't help much but just swamp your disk
> if they're all on the same filesystem.
> Maybe that could use rethinking, not sure.

It does need rethinking in the era of VMs and SANs. /var/lib/pgsql/15 is going to be on a different LUN from /var/lib/pgsql/9.6, just like /var/lib/pgsql/backups is on a different LUN.

--
Born in Arizona, moved to Babylonia.
On 5/22/23 5:42 PM, Tom Lane wrote:
> AFAICT from a quick look at the code, you won't get any meaningful
> parallelism unless you have several large DBs and/or several large
> tablespaces. It looks like the assumption was that issuing link()
> requests in parallel wouldn't help much but just swamp your disk
> if they're all on the same filesystem. Maybe that could use
> rethinking, not sure.

Thanks, Tom. These are all smokingly fast SSDs, so it would be interesting to see how well they'd hold up under that load.

Jeff
On 5/22/23 5:43 PM, Adrian Klaver wrote:
> The --jobs option allows multiple CPU cores to be used for
> copying/linking of files and to dump and restore database schemas in
> parallel; a good place to start is the maximum of the number of CPU
> cores and tablespaces.
>
>> Yes, one big database with about 80 schemas and several other smaller
>> databases so -j should help, right?
>
> As I understand it, no. The parallelism is between databases, not
> within a database.

Thanks, Adrian. That "restore database schemas in parallel" phrase seems really easy to read the way we did, expecting it to work with one database and multiple schemas.

Maybe it should be changed to "restore multiple databases in parallel" instead?

Jeff
On 2023-05-22 21:10:48 -0500, Ron wrote:
> On 5/22/23 18:42, Tom Lane wrote:
>> It looks like the assumption was that issuing link()
>> requests in parallel wouldn't help much but just swamp your disk
>> if they're all on the same filesystem.
>> Maybe that could use rethinking, not sure.
>
> It does need rethinking in the era of VMs and SANs. /var/lib/pgsql/15 is
> going to be on a different LUN from /var/lib/pgsql/9.6

You can't hardlink between different file systems. Even if you could assign single directories to specific LUNs (does any file system allow this?), this would at best spread the updates across two LUNs (the inodes would presumably stay on the source LUN and the target directory would be on the target LUN).

hp

--
Peter J. Holzer | hjp@hjp.at | http://www.hjp.at/
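(Easy to demonstrate; the paths here are hypothetical, and the exact message is GNU coreutils':

$ ln /mnt/lun1/somefile /mnt/lun2/somefile
ln: failed to create hard link '/mnt/lun2/somefile' => '/mnt/lun1/somefile': Invalid cross-device link
)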
On 5/23/23 12:19, Peter J. Holzer wrote:
> On 2023-05-22 21:10:48 -0500, Ron wrote:
>> It does need rethinking in the era of VMs and SANs. /var/lib/pgsql/15 is
>> going to be on a different LUN from /var/lib/pgsql/9.6
>
> You can't hardlink between different file systems.

We'd never hardlink. It eliminates the ability to return to the old system if something goes wrong.

--
Born in Arizona, moved to Babylonia.
## Ron (ronljohnsonjr@gmail.com):
> We'd never hardlink. It eliminates the ability to return to the old
> system if something goes wrong.

That's why you get yourself a recent XFS and use clone mode (it still sticks you to the same filesystem, but gets you up and running much faster).

Regards,
Christoph

--
Spare Space
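(Clone mode is pg_upgrade's --clone option. On a reflink-capable filesystem such as XFS created with reflink=1, the earlier command would become something like:

time /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-10/bin/ -B /usr/pgsql-15/bin/ -d /var/lib/pgsql/10/data -D /var/lib/pgsql/15up --clone

Like -k, it requires old and new data directories on the same filesystem, but the old cluster is left untouched.)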
On 2023-05-23 13:17:24 -0500, Ron wrote:
> On 5/23/23 12:19, Peter J. Holzer wrote:
>> On 2023-05-22 21:10:48 -0500, Ron wrote:
>>> On 5/22/23 18:42, Tom Lane wrote:
>>>> It looks like the assumption was that issuing link()
>>>> requests in parallel wouldn't help much but just swamp your disk
>>>> if they're all on the same filesystem.
>>
>> You can't hardlink between different file systems.
>
> We'd never hardlink.

But that was what Jeff and Tom were talking about. If you are changing the subject you should at least make it explicit.

hp

--
Peter J. Holzer | hjp@hjp.at | http://www.hjp.at/
On 5/23/23 13:58, Christoph Moench-Tegeder wrote:
> ## Ron (ronljohnsonjr@gmail.com):
>> We'd never hardlink. It eliminates the ability to return to the old
>> system if something goes wrong.
>
> That's why you get yourself a recent XFS and use clone mode (it still
> sticks you to the same filesystem, but gets you up and running much
> faster).

Sadly, our 9.6 systems (the ones we really need to get off of) are RHEL 6.10 and ext4.

--
Born in Arizona, moved to Babylonia.
On 2023-May-23, Ron wrote:
> We'd never hardlink. It eliminates the ability to return to the old
> system if something goes wrong.

If you'd never hardlink, then you should run your test without the -k option. Otherwise, the timings are meaningless.

--
Álvaro Herrera        PostgreSQL Developer — https://www.EnterpriseDB.com/
"La vida es para el que se aventura" ("Life is for the one who ventures")
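(In other words, time the default copy mode: the same command with -k dropped, e.g.:

time /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-10/bin/ -B /usr/pgsql-15/bin/ -d /var/lib/pgsql/10/data -D /var/lib/pgsql/15up
)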