Thread: problems with large objects dump
Our production database (PostgreSQL 8.4) is approximately 200 GB in size, and most of the data consists of large objects (174 GB). Until a few months ago we used pg_dump to perform backups, and the whole process took about 3-4 hours. Some time ago the process became interminable, taking one or two days; we noticed that performance degrades considerably once the backup reaches the large objects, so we had to switch to physical backups.
We ran various tests on similar servers, both with the same version and with PostgreSQL 9.2, and the behavior is exactly the same. The database has no other problems, nor any performance problems during everyday use.
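A typical custom-format invocation covering the whole database, large objects included, looks something like this (an illustrative sketch only; the exact command and flags used are not shown in this thread):

$ pg_dump -Fc -f /backups/production.dump production_db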
Could someone suggest a solution? Thanks,
Sergio
Sergio Gabriel Rodriguez <sgrodriguez@gmail.com> writes:
> Our production database (PostgreSQL 8.4) is approximately 200 GB in size,
> and most of the data consists of large objects (174 GB). Until a few
> months ago we used pg_dump to perform backups, and the whole process took
> about 3-4 hours. Some time ago the process became interminable, taking
> one or two days; we noticed that performance degrades considerably once
> the backup reaches the large objects, so we had to switch to physical
> backups.

Hm ... there's been some recent work to reduce O(N^2) behaviors in pg_dump when there are many objects to dump, but I'm not sure that's relevant to your situation, because before 9.0 pg_dump didn't treat blobs as full-fledged database objects. You wouldn't happen to be trying to use a 9.0 or later pg_dump would you? Exactly what 8.4.x release is this, anyway?

			regards, tom lane
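Checking which client binary and which server release are actually in play is straightforward (a generic sketch, not commands from the thread):

$ pg_dump --version                              # release of the pg_dump client binary
$ psql -d production_db -c "SELECT version();"   # release of the server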
On Thu, Sep 20, 2012 at 11:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> You wouldn't happen to be trying to use a 9.0 or later pg_dump would you?
> Exactly what 8.4.x release is this, anyway?
Tom, thanks for replying. Yes, we tried it with PostgreSQL 9.1 and 9.2 and the behavior is exactly the same. The production version is 8.4.9.
Greetings,
sergio.
Sergio Gabriel Rodriguez <sgrodriguez@gmail.com> writes:
> On Thu, Sep 20, 2012 at 11:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> You wouldn't happen to be trying to use a 9.0 or later pg_dump would you?
>> Exactly what 8.4.x release is this, anyway?

> Tom, thanks for replying. Yes, we tried it with PostgreSQL 9.1 and 9.2
> and the behavior is exactly the same. The production version is 8.4.9.

Well, I see three different fixes for O(N^2) pg_dump performance problems in the 8.4.x change logs since 8.4.9, so you're a bit behind the times there. However, all of those fixes would have been in 9.2.0, so if you saw no improvement with a 9.2.0 pg_dump then the problem is something else. Can you put together a test case for somebody else to try, or try to locate the bottleneck yourself using oprofile or perf?

			regards, tom lane
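For the perf route mentioned above, a minimal sketch would be the following (assuming a Linux with the perf tools installed; these are not commands from the thread):

# attach to the running pg_dump for 60 seconds, then inspect the hot functions
$ perf record -p $(pgrep -x pg_dump) -- sleep 60
$ perf report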
Hi,
I tried with PostgreSQL 9.2 and the process, which used to take almost a day and a half, was reduced significantly to 6 hours; before the slowdown appeared it used to take only four hours. My question now is: how long should a backup take for a 200 GB database where 80% of the data is large objects?
HP ProLiant Xeon G5
32 GB RAM
OS (SLES 10) + logs --> RAID 6
data --> RAID 6
thanks!
Sergio Gabriel Rodriguez <sgrodriguez@gmail.com> writes:
> I tried with PostgreSQL 9.2 and the process, which used to take almost a
> day and a half, was reduced significantly to 6 hours; before the slowdown
> appeared it used to take only four hours. My question now is: how long
> should a backup take for a 200 GB database where 80% of the data is large
> objects?

It's pretty hard to say without knowing a lot more info about your system than you provided. One thing that would shed some light is if you spent some time finding out where the time is going --- is the system constantly I/O busy, or is it CPU-bound, and if so in which process, pg_dump or the connected backend? Also, how many large objects is that? (If you don't know already, "select count(*) from pg_largeobject_metadata" would tell you.)

			regards, tom lane
On 10/11/2012 05:46 PM, Sergio Gabriel Rodriguez wrote:
> Hi,
> I tried with PostgreSQL 9.2 and the process, which used to take almost a
> day and a half, was reduced significantly to 6 hours; before the slowdown
> appeared it used to take only four hours. My question now is: how long
> should a backup take for a 200 GB database where 80% of the data is large
> objects?

That depends on several things.
Can you share your postgresql.conf here?

> HP ProLiant Xeon G5
> 32 GB RAM
> OS (SLES 10) + logs --> RAID 6
> data --> RAID 6

Which filesystem are you using for your data directory?
What options are you using to do the backup?

> Thanks!
> Regards, Sergio.
--
Marcos Luis Ortíz Valmaseda
about.me/marcosortiz
@marcosluis2186

On Thu, Oct 11, 2012 at 7:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It's pretty hard to say without knowing a lot more info about your system
> than you provided. One thing that would shed some light is if you spent
> some time finding out where the time is going --- is the system
> constantly I/O busy, or is it CPU-bound, and if so in which process,
> pg_dump or the connected backend?
The greatest amount of time is lost to I/O; the system is constantly I/O busy.
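(How the I/O wait was measured is not shown in the thread; a common way to confirm that a dump is I/O-bound rather than CPU-bound, sketched here assuming the sysstat tools are installed, is:)

$ iostat -x 5   # sustained high %util/await with mostly idle CPU => I/O-bound
$ top -c        # if CPU-bound instead, shows whether pg_dump or the backend is hot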
database_test=# select count(*) from pg_largeobject_metadata;
  count
---------
 5231973
(1 row)
I had never used oprofile before, but a few hours into the process I was able to capture this report:
opreport -l /var/lib/pgsql/bin/pg_dump
Using /var/lib/oprofile/samples/ for samples directory.
CPU: Core 2, speed 2333.42 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples % symbol name
1202449 56.5535 sortDumpableObjects
174626 8.2130 DOTypeNameCompare
81181 3.8181 DeflateCompressorZlib
70640 3.3223 _WriteByte
68020 3.1991 DOCatalogIdCompare
53789 2.5298 WriteInt
39797 1.8717 WriteToc
38252 1.7991 WriteDataToArchive
32947 1.5496 WriteStr
32488 1.5280 pg_qsort
30122 1.4167 dumpTableData_copy
27706 1.3031 dumpDumpableObject
26078 1.2265 dumpBlobs
25591 1.2036 _tocEntryRequired
23030 1.0831 WriteData
21171 0.9957 buildTocEntryArrays
20825 0.9794 _WriteData
18936 0.8906 _WriteBuf
18113 0.8519 BuildArchiveDependencies
12607 0.5929 findComments
11642 0.5475 EndCompressor
10833 0.5095 _CustomWriteFunc
10562 0.4968 WriteDataChunks
10247 0.4819 dumpBlob
5947 0.2797 EndBlob
5824 0.2739 _EndBlob
5047 0.2374 main
5030 0.2366 dumpComment
4959 0.2332 AllocateCompressor
4762 0.2240 dumpSecLabel
4705 0.2213 StartBlob
4052 0.1906 WriteOffset
3285 0.1545 ArchiveEntry
2640 0.1242 _StartBlob
2391 0.1125 pg_calloc
2233 0.1050 findObjectByDumpId
2197 0.1033 SetArchiveRestoreOptions
2149 0.1011 pg_strdup
1760 0.0828 getDumpableObjects
1311 0.0617 ParseCompressionOption
1288 0.0606 med3
1248 0.0587 _WriteExtraToc
944 0.0444 AssignDumpId
916 0.0431 findSecLabels
788 0.0371 pg_malloc
340 0.0160 addObjectDependency
317 0.0149 _ArchiveEntry
144 0.0068 swapfunc
72 0.0034 ScanKeywordLookup
60 0.0028 findObjectByCatalogId
41 0.0019 fmtId
27 0.0013 ExecuteSqlQuery
20 9.4e-04 dumpTable
10 4.7e-04 getTableAttrs
8 3.8e-04 fmtCopyColumnList
6 2.8e-04 shouldPrintColumn
5 2.4e-04 findObjectByOid
3 1.4e-04 dumpFunc
3 1.4e-04 format_function_signature
3 1.4e-04 getTypes
2 9.4e-05 _StartData
2 9.4e-05 buildACLCommands
2 9.4e-05 findLoop
2 9.4e-05 getTables
2 9.4e-05 parseOidArray
2 9.4e-05 selectSourceSchema
1 4.7e-05 TocIDRequired
1 4.7e-05 _EndData
1 4.7e-05 archprintf
1 4.7e-05 dumpACL
1 4.7e-05 dumpCollation
1 4.7e-05 dumpConstraint
1 4.7e-05 dumpOpr
1 4.7e-05 expand_schema_name_patterns
1 4.7e-05 findDumpableDependencies
1 4.7e-05 fmtQualifiedId
1 4.7e-05 getCollations
1 4.7e-05 getExtensions
1 4.7e-05 getFormattedTypeName
1 4.7e-05 getIndexes
1 4.7e-05 makeTableDataInfo
1 4.7e-05 vwrite_msg
Thank you very much for your help.
Regards,
Sergio
Sergio Gabriel Rodriguez <sgrodriguez@gmail.com> writes:
> On Thu, Oct 11, 2012 at 7:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It's pretty hard to say without knowing a lot more info about your system
>> than you provided. One thing that would shed some light is if you spent
>> some time finding out where the time is going --- is the system
>> constantly I/O busy, or is it CPU-bound, and if so in which process,
>> pg_dump or the connected backend?

> The greatest amount of time is lost to I/O; the system is constantly I/O
> busy.

In that case there's not going to be a whole lot you can do about it, probably. Or at least not that's very practical --- I assume "buy faster disks" isn't a helpful answer. If the blobs are relatively static, it's conceivable that clustering pg_largeobject would help, but you're probably not going to want to take down your database for as long as that would take --- and the potential gains are unclear anyway.

> I had never used oprofile before, but a few hours into the process I was
> able to capture this report:
> 1202449 56.5535 sortDumpableObjects

Hm. I suspect a lot of that has to do with the large objects; and it's really overkill to treat them as full-fledged objects since they never have unique dependencies. This wasn't a problem when commit c0d5be5d6a736d2ee8141e920bc3de8e001bf6d9 went in, but I think now it might be because of the additional constraints added in commit a1ef01fe163b304760088e3e30eb22036910a495. I wonder if it's time to try to optimize pg_dump's handling of blobs a bit better. But still, any such fix probably wouldn't make a huge difference for you. Most of the time is going into pushing the blob data around, I think.

			regards, tom lane
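For anyone tempted by the clustering idea, the command would look roughly like this (a sketch only, not a command given in the thread; it must be run as superuser, and it holds an exclusive lock on pg_largeobject for the whole rewrite, blocking all large-object access while it runs):

-- rewrite pg_largeobject in (loid, pageno) index order
CLUSTER pg_largeobject USING pg_largeobject_loid_pn_index;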
I wrote:
> Sergio Gabriel Rodriguez <sgrodriguez@gmail.com> writes:
>> I had never used oprofile before, but a few hours into the process I was
>> able to capture this report:
>> 1202449 56.5535 sortDumpableObjects

> Hm. I suspect a lot of that has to do with the large objects; and it's
> really overkill to treat them as full-fledged objects since they never
> have unique dependencies. This wasn't a problem when commit
> c0d5be5d6a736d2ee8141e920bc3de8e001bf6d9 went in, but I think now it
> might be because of the additional constraints added in commit
> a1ef01fe163b304760088e3e30eb22036910a495. I wonder if it's time to try
> to optimize pg_dump's handling of blobs a bit better. But still, any
> such fix probably wouldn't make a huge difference for you. Most of the
> time is going into pushing the blob data around, I think.

For fun, I tried adding 5 million empty blobs to the standard regression database, and then did a pg_dump. It took a bit under 9 minutes on my workstation. oprofile showed about 32% of pg_dump's runtime going into sortDumpableObjects, which might make you think that's worth optimizing ... until you look at the bigger picture system-wide:

  samples|      %|
------------------
   727394 59.4098 kernel
   264874 21.6336 postgres
   136734 11.1677 /lib64/libc-2.14.90.so
    39878  3.2570 pg_dump
    37025  3.0240 libpq.so.5.6
    17964  1.4672 /usr/bin/wc
      354  0.0289 /usr/bin/oprofiled

So actually sortDumpableObjects took only about 1% of the CPU cycles. And remember this is with empty objects. If we'd been shoving 200GB of data through the dump, the data pipeline would surely have swamped all else.

So I think the original assumption that we didn't need to optimize pg_dump's object management infrastructure for blobs still holds good. If there's anything that is worth fixing here, it's the number of server roundtrips being used ...

			regards, tom lane
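A test setup like the one described above can be reproduced with something along these lines (a guess at the method; the actual commands are not shown in the thread):

-- create 5 million empty large objects in the regression database;
-- lo_create(0) asks the server to assign a fresh OID for each one
SELECT lo_create(0) FROM generate_series(1, 5000000);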
On Fri, Oct 12, 2012 at 10:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So I think the original assumption that we didn't need to optimize
> pg_dump's object management infrastructure for blobs still holds good.
> If there's anything that is worth fixing here, it's the number of server
> roundtrips being used ...
I found something similar:
samples| %|
------------------
233391664 60.5655 no-vmlinux
78789949 20.4461 libz.so.1.2.3
31984753 8.3001 postgres
21564413 5.5960 libc-2.4.so
4086941 1.0606 ld-2.4.so
2427151 0.6298 bash
2355895 0.6114 libc-2.4.so
2173558 0.5640 pg_dump
1771931 0.4598 oprofiled
Is there anything I can do to improve this?
Thanks
Sergio