Thread: Suggestions for Large DB Dump/Reload

Suggestions for Large DB Dump/Reload

From
Chris Gamache
Date:
As I prepare to rev up to the latest postgresql, I find my stomach twisting yet
again at the thought of dumping my one huge table of 27,000,000 rows (of
heavily indexed data) and reloading and reindexing it.

I'm looking for suggestions for streamlining my process... Most of the steps
are normal upgrading steps

1. disallow access to database (except from me)
2. drop all my indexes on the HUGE table
3. pg_dumpall > outputfile
   (thankfully, I don't have large objects. I don't need to keep OID's)

wait... wait... wait...

4. kill -INT `cat /usr/local/pgsql/data/postmaster.pid`
5. mv /usr/local/pgsql /usr/local/pgsql.old
6. make new postgresql
7. /usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
8. /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data
9. /usr/local/pgsql/bin/psql -d template1 -f outputfile

wait a whole lot more...

10. recreate the indexes on the huge table

many hours later....

11. vacuum analyze the whole database
12. go back into production

This takes _forever_ on a (dual) p2 450 with 256MB Ram and a 10000 RPM SCSI
filesystem... Besides upgrading the hardware, is there anything else I can do
process-wise to speed things up? The fsync is off, and I've increased WAL Files
to a good large number... Have I left any critical detail out of my problem
description? Do you need to see my actual config settings?

CG



__________________________________________________
Do You Yahoo!?
Yahoo! Games - play chess, backgammon, pool and more
http://games.yahoo.com/

Re: Suggestions for Large DB Dump/Reload

From
Neil Conway
Date:
On Tue, 23 Apr 2002 07:02:28 -0700 (PDT)
"Chris Gamache" <cgg007@yahoo.com> wrote:
> 1. disallow access to database (except from me)
> 2. drop all my indexes on the HUGE table
> 3. pg_dumpall > outputfile
>    (thankfully, I don't have large objects. I don't need to keep OID's)
>
> wait... wait... wait...
>
> 4. kill -INT `cat /usr/local/pgsql/data/postmaster.pid`
> 5. mv /usr/local/pgsql /usr/local/pgsql.old
> 6. make new postgresql
> 7. /usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
> 8. /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data
> 9. /usr/local/pgsql/bin/psql -d template1 -f outputfile

Steps 6 & 7 can be done prior to taking the production DB offline.
That will save a little time, at any rate.

> wait a whole lot more...
>
> 10. recreate the indexes on the huge table

When you recreate the indexes, how are you doing it? If you
run several index creations in parallel, that should probably
speed things up, especially on an SMP box. However, I haven't
checked what locks CREATE INDEX acquires, it may prevent
other concurrent index creations...

> This takes _forever_ on a (dual) p2 450 with 256MB Ram and a 10000 RPM SCSI
> filesystem... Besides upgrading the hardware, is there anything else I can do
> process-wise to speed things up? The fsync is off, and I've increased WAL Files
> to a good large number... Have I left any critical detail out of my problem
> description? Do you need to see my actual config settings?

Perhaps increase shared_buffers and wal_buffers?

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC

Re: Suggestions for Large DB Dump/Reload

From
Chris Gamache
Date:
Since I'm at about 75% understanding on the whole WAL concept, I dare not push
the buffers any higher without understanding what I'm doing. :) I did read the
docs, but WAL just seems like a glorified scratchpad for postgresql to use so
it doesn't have to make too many trips to the actual database, and so you might
have a few extra bits of data upon crash. I have no idea how pushing it to its
limit might affect the performance of the database, pro or con. Why have a
setting at all if the max value gives the best performance in all situations?

Here are my settings in postgresql.conf that I've been using since my 7.1.3
install:
---[snip...]---
max_connections = 64
sort_mem = 512
shared_buffers = 128
fsync=false
wal_buffers = 8
wal_files = 32
wal_sync_method = fsync
wal_debug = 0
commit_delay = 0
commit_siblings = 5
checkpoint_segments = 3
checkpoint_timeout = 300
---[snip...]---

I compiled postgres --with-syslog, but I don't have it turned on. (or do I?) I
thought it was set to "off" by default, and only will be turned on if specified
explicitly, even when it is compiled in.


--- Neil Conway <nconway@klamath.dyndns.org> wrote:
> On Tue, 23 Apr 2002 07:02:28 -0700 (PDT)
> "Chris Gamache" <cgg007@yahoo.com> wrote:
> > 1. disallow access to database (except from me)
> > 2. drop all my indexes on the HUGE table
> > 3. pg_dumpall > outputfile
> >    (thankfully, I don't have large objects. I don't need to keep OID's)
> >
> > wait... wait... wait...
> >
> > 4. kill -INT `cat /usr/local/pgsql/data/postmaster.pid`
> > 5. mv /usr/local/pgsql /usr/local/pgsql.old
> > 6. make new postgresql
> > 7. /usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
> > 8. /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data
> > 9. /usr/local/pgsql/bin/psql -d template1 -f outputfile
>
> Steps 6 & 7 can be done prior to taking the production DB offline.
> That will save a little time, at any rate.
>
> > wait a whole lot more...
> >
> > 10. recreate the indexes on the huge table
>
> When you recreate the indexes, how are you doing it? If you
> run several index creations in parallel, that should probably
> speed things up, especially on an SMP box. However, I haven't
> checked what locks CREATE INDEX acquires, it may prevent
> other concurrent index creations...
>
> > This takes _forever_ on a (dual) p2 450 with 256MB Ram and a 10000 RPM SCSI
> > filesystem... Besides upgrading the hardware, is there anything else I can
> do
> > process-wise to speed things up? The fsync is off, and I've increased WAL
> Files
> > to a good large number... Have I left any critical detail out of my problem
> > description? Do you need to see my actual config settings?
>
> Perhaps increase shared_buffers and wal_buffers?
>
> Cheers,
>
> Neil
>
> --
> Neil Conway <neilconway@rogers.com>
> PGP Key ID: DB3C29FC
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html


__________________________________________________
Do You Yahoo!?
Yahoo! Games - play chess, backgammon, pool and more
http://games.yahoo.com/