Thread: pg_upgrade --jobs
The pg_upgrade --jobs option is not passed as an argument when it calls pg_dump. I haven't found anything in the docs or forums mentioning a reason for not supporting it under certain circumstances, other than possibly for pre-9.2. The pg_upgrade docs page states that it allows multiple CPUs to be used for dump and reload of schemas. Some databases I'm upgrading have 500,000+ tables, and running with a single process is greatly increasing the upgrade time.

I am also using the --link option.
I have tried "--jobs 20", "--jobs=20", placing this option first and last, and many other variations.
I am upgrading 9.2.4 to 9.6.12 on CentOS 6.
Varying hardware, but all with 32+ CPU cores.

su - postgres -c "/usr/pgsql-9.6/bin/pg_upgrade --jobs=20 --link \
    --old-bindir=/usr/pgsql-9.2/bin/ \
    --new-bindir=/usr/pgsql-9.6/bin/ \
    --old-datadir=/var/lib/pgsql/9.2/data/ \
    --new-datadir=/var/lib/pgsql/9.6/data/"

I feel like there's a simple reason I've missed, but this seems pretty straightforward. A secondary plan would be to find instructions for doing the same as "pg_upgrade --link" manually so I can run "pg_dump --jobs 20".

Any assist is appreciated.

Thanks,
S. Cervesa
On 4/6/19 11:44 AM, senor wrote:
> The pg_upgrade --jobs option is not passed as an argument when it calls pg_dump.
> [snip]
> I feel like there's a simple reason I've missed, but this seems pretty straightforward.

https://www.postgresql.org/docs/9.6/app-pgdump.html

"--jobs=njobs

    Run the dump in parallel by dumping njobs tables simultaneously. This option reduces the time of the dump but it also increases the load on the database server. You can only use this option with the directory output format because this is the only output format where multiple processes can write their data at the same time."

> A secondary plan would be to find instructions for doing the same as "pg_upgrade --link" manually so I can run "pg_dump --jobs 20".

--
Adrian Klaver
adrian.klaver@aklaver.com
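(For illustration of what that restriction means in practice, a parallel pg_dump has to target a directory archive. The database names and output path below are made up.)

# A parallel dump requires the directory archive format; pg_restore can
# then also reload it in parallel.
pg_dump --format=directory --jobs=20 --file=/backups/mydb.dir mydb
pg_restore --jobs=20 --dbname=mydb_new /backups/mydb.dir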
Thank you for responding. I did see that note and should have included it as part of my question.

Since pg_upgrade is in control of how it is calling pg_dump, is there a reason pg_upgrade cannot use the directory output format when calling pg_dump? Is the schema-only operation incompatible?

________________________________________
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Saturday, April 6, 2019 1:52 PM
To: senor; pgsql-general@lists.postgresql.org
Subject: Re: pg_upgrade --jobs

> [snip]
senor <frio_cervesa@hotmail.com> writes:
> Since pg_upgrade is in control of how it is calling pg_dump, is there a reason pg_upgrade cannot use the directory output format when calling pg_dump? Is the schema-only operation incompatible?

Well, there's no point in it.  pg_dump can only parallelize data dumping, and there's none to be done in the --schema-only case that pg_upgrade uses.

Also, since pg_upgrade *does* use parallelism across multiple pg_dump calls (if you've got multiple databases in the cluster), it'd be a bit problematic to have another layer of parallelism below that, if it did indeed do anything.  You don't want "--jobs=10" to suddenly turn into 100 sessions.

                        regards, tom lane
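(To make the cross-database parallelism concrete, the dump phase of pg_upgrade amounts to roughly the following simplified sketch. It is not the actual implementation: the real tool names the dump files by database OID, caps the concurrency at --jobs, and dumps globals separately with pg_dumpall --globals-only. The database-list query and file names here are only illustrative.)

# Simplified sketch: one schema-only, binary-upgrade, custom-format dump
# per database, run in parallel across databases (not across tables).
for db in $(psql -At -c "SELECT datname FROM pg_database WHERE datallowconn" postgres); do
    pg_dump --schema-only --binary-upgrade --format=custom \
            --file="pg_upgrade_dump_${db}.custom" "$db" &
done
wait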
Thanks Tom. I suppose "pg_dump can only parallelize data dumping" answers my original question as "expected behavior", but I would like to understand the reason better.

My knowledge of Postgres and other DBMSs is at casual admin level, with the occasional deep dive on specific errors or analysis. I'm not averse to getting into the code. Before my OP I searched for reasons that the schema-only option would prevent pg_dump from being able to run multiple jobs and didn't find anything that I understood to confirm either way. Is the limitation simply the state of development to date, or is there something about dumping the schemas that conflicts with parallelizing? I'm willing to do some studying if provided links to relevant articles.

The --link option to pg_upgrade would be so much more useful if it weren't still bound to serially dumping the schemas of half a million tables. As already mentioned, if there is an alternate process that mimics pg_upgrade but allows for parallelizing, I'm open to that.

Thanks all

________________________________________
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Saturday, April 6, 2019 3:02 PM
To: senor
Cc: pgsql-general@lists.postgresql.org
Subject: Re: pg_upgrade --jobs

> [snip]
senor <frio_cervesa@hotmail.com> writes:
> Is the limitation simply the state of development to date, or is there
> something about dumping the schemas that conflicts with parallelizing?

At minimum, it'd take a complete redesign of pg_dump's output format, and I'm not even very sure what such a redesign would look like.  All the schema information goes into a single file that has to be written serially.  Trying to make it be one file per table definition wouldn't really fix much: somewhere there has to be a "table of contents", plus where are you going to put the dependency info that shows what ordering is required for restore?

> The --link option to pg_upgrade would be so much more useful if it
> weren't still bound to serially dumping the schemas of half a million
> tables.

To be perfectly blunt, if you've got a database with half a million tables, You're Doing It Wrong.

                        regards, tom lane
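(That single, serially written "table of contents" is easy to see for yourself: even a directory-format archive keeps all schema information in one toc.dat file, with only table data split across parallel-writable files. The path and database name below are hypothetical.)

pg_dump --format=directory --jobs=4 --file=/tmp/mydb.dir mydb
ls /tmp/mydb.dir                         # toc.dat plus one data file per table
pg_restore --list /tmp/mydb.dir | head   # the TOC entries, in restore order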
On 4/6/19 6:50 PM, Tom Lane wrote:
> senor <frio_cervesa@hotmail.com> writes:
>> [snip]
>> The --link option to pg_upgrade would be so much more useful if it
>> weren't still bound to serially dumping the schemas of half a million
>> tables.
>
> To be perfectly blunt, if you've got a database with half a million
> tables, You're Doing It Wrong.

Heavy (really heavy) partitioning?
--
Angular momentum makes the world go 'round.
Thanks Tom for the explanation. I assumed it was my ignorance of how the schema was handled that was making this look like a problem that had already been solved and that I was missing something.

I fully expected the "You're Doing It Wrong" part. That is out of my control but not beyond my influence.

I suspect I know the answer to this but have to ask. Using a simplified example where there are 100K sets of 4 tables, each set representing the output of a single job, are there any shortcuts to upgrading that would circumvent exporting the entire schema? I'm sure a different DB design would be better, but that's not what I'm working with.

Thanks

________________________________________
From: Ron <ronljohnsonjr@gmail.com>
Sent: Saturday, April 6, 2019 4:57 PM
To: pgsql-general@lists.postgresql.org
Subject: Re: pg_upgrade --jobs

> [snip]
> are there any shortcuts to upgrading that would circumvent exporting the entire schema?
By "shortcuts," do you mean you want to minimize the time and energy you put into the upgrade, or that you want to minimize database downtime? If you mean downtime, I was able to upgrade a customer-facing database with ~350,000 tables from Postgres 9.0 to 9.6 last year with only 86 seconds of downtime, using Slony, but I had to make many custom modifications to Slony and test thoroughly beforehand, and it was not for the faint of heart, the pressed for time, or the inexperienced. There may be better ways (and if so, I would be curious to learn about them), but Slony was the tool with which I was most familiar at the time.
This method does, of course, require exporting the entire schema, but because our only constraint was to minimize customer downtime, and the database was online while the schema was being exported, we didn't care how long it took. Your constraints may be different.
For those reading: we do know that 350,000 tables is Doing It Wrong, and we're getting rid of them, but we decided being on an EOLed version of Postgres was worse and should be fixed first.
Sherrylyn
On 4/6/19 5:47 PM, senor wrote:
> [snip]
> I suspect I know the answer to this but have to ask. Using a simplified example where there are 100K sets of 4 tables, each set representing the output of a single job, are there any shortcuts to upgrading that would circumvent exporting the entire schema? I'm sure a different DB design would be better, but that's not what I'm working with.

An answer is going to depend on more information:

1) What is the time frame for moving from one version to another? Both the setup and the actual downtime.

2) There are 500,000+ tables, but what is the amount of data involved?

3) Are all the tables active?

4) How are the tables distributed across databases in the cluster and schemas in each database?

--
Adrian Klaver
adrian.klaver@aklaver.com
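(For questions 2 and 4, a quick inventory query run against each database in the cluster gives table counts and on-disk size per schema. This is only a sketch; the database name is a placeholder.)

psql -d mydb -c "
  SELECT n.nspname AS schema,
         count(*) AS tables,
         pg_size_pretty(sum(pg_total_relation_size(c.oid))::bigint) AS total_size
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  WHERE c.relkind = 'r'
  GROUP BY n.nspname
  ORDER BY count(*) DESC;"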
Thank you Adrian. I'm not sure if I can provide as much as you'd need for a definite answer but I'll give you what I have.

The original scheduled downtime for one installation was 24 hours. By 21 hours it had not completed the schema-only pg_dump, so it was returned to operation.

The amount of data per table is widely varied. Some daily tables are 100-200GB, and thousands of report tables with stats are much smaller. I'm not connected to check now but I'd guess 1GB max. We chose to use the --link option partly because some servers do not have the disk space to copy. The time necessary to copy 1-2TB was also going to be an issue.

The vast majority of activity is on current-day inserts and stats reports of that data. All previous days and existing reports are read only.

As is all too common, the DB usage grew with no redesign, so it is a single database on a single machine with a single schema.

I get the impression there may be an option of getting the schema dump while in service but possibly not in this scenario. Plan B is to drop a lot of tables and deal with imports later.

I appreciate the help.

________________________________________
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Sunday, April 7, 2019 8:19 AM
To: senor; pgsql-general@lists.postgresql.org
Subject: Re: pg_upgrade --jobs

> [snip]
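(If "Plan B" comes to that, one way to do it is to generate the DROP statements from the catalogs, review the script, and only then run it. The database name and the table-name pattern below are made-up placeholders.)

# Generate (but do not yet run) DROP statements for old report tables.
psql -At -d mydb -c "
  SELECT format('DROP TABLE %I.%I;', schemaname, tablename)
  FROM pg_tables
  WHERE schemaname = 'public'
    AND tablename LIKE 'report\_%'   -- placeholder pattern; adjust to the real naming scheme
  " > drop_old_reports.sql
# Review drop_old_reports.sql, then: psql -d mydb -f drop_old_reports.sql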
I just noticed I missed Sherrylyn's post.

I did some reading about Slony and believe it would be useful if I had the time to dig in. As pointed out, it's not an out-of-the-box solution. It is included on the TODO list though. For now I can only dream of the 86-second downtime.

Thanks

________________________________________
From: Sherrylyn Branchaw <sbranchaw@gmail.com>
Sent: Sunday, April 7, 2019 6:43 AM
To: senor
Cc: pgsql-general@lists.postgresql.org
Subject: Re: pg_upgrade --jobs

> [snip]
On 4/7/19 12:05 PM, senor wrote:
> Thank you Adrian. I'm not sure if I can provide as much as you'd need for a definite answer but I'll give you what I have.
>
> The original scheduled downtime for one installation was 24 hours. By 21 hours it had not completed the schema-only pg_dump, so it was returned to operation.

So this is more than one cluster?
I am assuming the below was repeated at different sites?

> The amount of data per table is widely varied. Some daily tables are 100-200GB, and thousands of report tables with stats are much smaller. I'm not connected to check now but I'd guess 1GB max. We chose to use the --link option partly because some servers do not have the disk space to copy. The time necessary to copy 1-2TB was also going to be an issue.
> The vast majority of activity is on current-day inserts and stats reports of that data. All previous days and existing reports are read only.
> As is all too common, the DB usage grew with no redesign, so it is a single database on a single machine with a single schema.
> I get the impression there may be an option of getting the schema dump while in service but possibly not in this scenario. Plan B is to drop a lot of tables and deal with imports later.

I take the above to mean that a lot of the tables are cruft, correct?

> I appreciate the help.

--
Adrian Klaver
adrian.klaver@aklaver.com
> The original scheduled downtime for one installation was 24 hours. By 21 hours it had not completed the schema-only pg_dump, so it was returned to operation.
To me, your best option is to create a slony cluster with the version you need to upgrade to.
When slony is in sync, simply make it the master and switch to it. It may take a while for
slony replication to be in sync, but when it is, there will be very little down time to switch
over.
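For anyone who hasn't used Slony before, the rough shape of that setup is a slonik script along the following lines. Everything here is a placeholder (cluster name, conninfo strings, the single example table), the hundreds of thousands of "set add table" lines would have to be generated programmatically, and the exact slonik syntax should be checked against the Slony-I documentation for your version; this is only a sketch of the idea.

slonik <<'_EOF_'
cluster name = upgrade_cluster;

-- how slonik reaches each node (placeholder conninfo strings)
node 1 admin conninfo = 'dbname=mydb host=old_server user=slony';
node 2 admin conninfo = 'dbname=mydb host=new_server user=slony';

init cluster (id = 1, comment = 'current 9.2 origin');

create set (id = 1, origin = 1, comment = 'all replicated tables');
set add table (set id = 1, origin = 1, id = 1,
               fully qualified name = 'public.some_table', comment = 'example table');
-- ...one "set add table" per table, generated from the catalogs...

store node (id = 2, comment = 'new 9.6 subscriber', event node = 1);
store path (server = 1, client = 2, conninfo = 'dbname=mydb host=old_server user=slony');
store path (server = 2, client = 1, conninfo = 'dbname=mydb host=new_server user=slony');

subscribe set (id = 1, provider = 1, receiver = 2, forward = no);

-- once node 2 is in sync, the switchover itself is roughly:
--   lock set (id = 1, origin = 1);
--   move set (id = 1, old origin = 1, new origin = 2);
_EOF_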
On Sun, Apr 7, 2019 at 3:36 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> [snip]
--
Melvin Davidson
Maj. Database & Exploration Specialist
Universe Exploration Command – UXC
Employment by invitation only!
> It may take a while for slony replication to be in sync, but when it is, there will be very little down time to switch over.
I agree in principle, which is why I chose Slony over pg_upgrade for my company's very similar situation, but my experience was that, out of the box, Slony was projected to take unacceptably long (months) to sync our 350,000 tables, and downtime was going to be many hours. In order to get those numbers down, I had to reduce the number of per-table steps Slony was executing, e.g. by eliminating unnecessary-for-us ones and by rewriting others to happen in bulk.
Here's something I didn't know Slony did when I didn't have 350,000 tables to worry about: add a table to replication, run through the existing list of replicated tables to make sure it's captured any changes that have happened in the meantime, add one more table to replication, run through the existing list of tables to make sure no changes have happened, and so on. The more tables you add, the longer it takes to add the next table. Here's another thing I didn't know it did: during the switchover, manage 4 triggers per table serially on primary and standby. 4 * 350000 * 2 = 2.8 million triggers. (I knew it managed 4 triggers per table, but not that it was serial or how long that would take when pg_trigger had almost 3 million relevant rows.)
I would love to help the OP out in a more hands-on way (I have upgrade-via-Slony consulting experience), as well as to open source the custom changes I came up with, but I'm debating whether I have the time to offer to take on another project right now. I'm also reluctant to summarize exactly what I did, because messing with pg_catalog directly is very delicate and likely to go wrong, and I don't recommend it to the inexperienced all, or to the experienced if they have alternatives.
> Plan B is to drop a lot of tables and deal with imports later.
If it were me, I would take a copy of my database, restore it to a sandbox environment, set up Slony, and get an estimate for the projected sync time. Let it run for a few hours to see how dramatically the sync time slows down as the number of tables grows. I.e. don't count the number of replicated tables after an hour and assume that's your hourly rate of syncing. If it looks like you can sync your entire database in a few days or less, then let it run and test out your failover time. If that's acceptable, you're good to go.
If sync time looks like it's going to be incredibly long on your schema + hardware + Postgres version, etc., then failover time probably would be too. In that case, temporarily dropping the tables you can drop may be preferable to the complexity of making changes to speed up Slony, if you don't have a seriously experienced DBA on hand.
Sherrylyn