Thread: problems with large objects dump
Our production database (PostgreSQL 8.4) is approximately 200 GB in size, and most of the data consists of large objects (174 GB). Until a few months ago we used pg_dump to perform backups, and the whole process took about 3-4 hours. Some time ago the process became interminable, taking one or two days; we noticed that performance degrades considerably once the backup reaches the large objects, so we had to switch to physical backups.
We ran various tests on similar servers, both with the same version and with PostgreSQL 9.2, and the behavior is exactly the same. The database has no other problems, nor any performance problems during everyday use.
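A typical custom-format invocation covering the whole database, large objects included, looks something like this (an illustrative sketch only; the exact command and flags used are not shown in this thread):

$ pg_dump -Fc -f /backups/production.dump production_db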
Could someone suggest a solution? Thanks,
Sergio
Sergio Gabriel Rodriguez <sgrodriguez@gmail.com> writes:
> Our production database (PostgreSQL 8.4) is approximately 200 GB in size,
> and most of the data consists of large objects (174 GB). Until a few
> months ago we used pg_dump to perform backups, and the whole process took
> about 3-4 hours. Some time ago the process became interminable, taking
> one or two days; we noticed that performance degrades considerably once
> the backup reaches the large objects, so we had to switch to physical
> backups.

Hm ... there's been some recent work to reduce O(N^2) behaviors in pg_dump when there are many objects to dump, but I'm not sure that's relevant to your situation, because before 9.0 pg_dump didn't treat blobs as full-fledged database objects. You wouldn't happen to be trying to use a 9.0 or later pg_dump would you? Exactly what 8.4.x release is this, anyway?

			regards, tom lane
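Checking which client binary and which server release are actually in play is straightforward (a generic sketch, not commands from the thread):

$ pg_dump --version                              # release of the pg_dump client binary
$ psql -d production_db -c "SELECT version();"   # release of the server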
On Thu, Sep 20, 2012 at 11:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> You wouldn't happen to be trying to use a 9.0 or later pg_dump would you?
> Exactly what 8.4.x release is this, anyway?
Tom, thanks for replying. Yes, we tried it with PostgreSQL 9.1 and 9.2 and the behavior is exactly the same. The production version is 8.4.9.
Greetings,
sergio.
Sergio Gabriel Rodriguez <sgrodriguez@gmail.com> writes:
> On Thu, Sep 20, 2012 at 11:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> You wouldn't happen to be trying to use a 9.0 or later pg_dump would you?
>> Exactly what 8.4.x release is this, anyway?

> Tom, thanks for replying. Yes, we tried it with PostgreSQL 9.1 and 9.2
> and the behavior is exactly the same. The production version is 8.4.9.

Well, I see three different fixes for O(N^2) pg_dump performance problems in the 8.4.x change logs since 8.4.9, so you're a bit behind the times there. However, all of those fixes would have been in 9.2.0, so if you saw no improvement with a 9.2.0 pg_dump then the problem is something else. Can you put together a test case for somebody else to try, or try to locate the bottleneck yourself using oprofile or perf?

			regards, tom lane
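For the perf route mentioned above, a minimal sketch would be the following (assuming a Linux with the perf tools installed; these are not commands from the thread):

# attach to the running pg_dump for 60 seconds, then inspect the hot functions
$ perf record -p $(pgrep -x pg_dump) -- sleep 60
$ perf report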
Hi,
I tried with PostgreSQL 9.2 and the process, which used to take almost a day and a half, was reduced significantly to 6 hours; before the slowdown appeared it used to take only four hours. My question now is: how long should a backup take for a 200 GB database where 80% of the data is large objects?
HP ProLiant Xeon G5
32 GB RAM
OS (SLES 10) + logs --> RAID 6
data --> RAID 6
thanks!
Sergio Gabriel Rodriguez <sgrodriguez@gmail.com> writes:
> I tried with PostgreSQL 9.2 and the process, which used to take almost a
> day and a half, was reduced significantly to 6 hours; before the slowdown
> appeared it used to take only four hours. My question now is: how long
> should a backup take for a 200 GB database where 80% of the data is large
> objects?

It's pretty hard to say without knowing a lot more info about your system than you provided. One thing that would shed some light is if you spent some time finding out where the time is going --- is the system constantly I/O busy, or is it CPU-bound, and if so in which process, pg_dump or the connected backend? Also, how many large objects is that? (If you don't know already, "select count(*) from pg_largeobject_metadata" would tell you.)

			regards, tom lane
On 10/11/2012 05:46 PM, Sergio Gabriel Rodriguez wrote:
> Hi,
> I tried with PostgreSQL 9.2 and the process, which used to take almost a
> day and a half, was reduced significantly to 6 hours; before the slowdown
> appeared it used to take only four hours. My question now is: how long
> should a backup take for a 200 GB database where 80% of the data is large
> objects?

That depends on several things.
Can you share your postgresql.conf here?

> HP ProLiant Xeon G5
> 32 GB RAM
> OS (SLES 10) + logs --> RAID 6
> data --> RAID 6

Which filesystem are you using for your data directory?
What options are you using to do the backup?

> Thanks!
> Regards, Sergio.
--
Marcos Luis Ortíz Valmaseda
about.me/marcosortiz
@marcosluis2186

On Thu, Oct 11, 2012 at 7:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It's pretty hard to say without knowing a lot more info about your system
> than you provided. One thing that would shed some light is if you spent
> some time finding out where the time is going --- is the system
> constantly I/O busy, or is it CPU-bound, and if so in which process,
> pg_dump or the connected backend?
The greatest amount of time is lost to I/O; the system is constantly I/O busy.
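(How the I/O wait was measured is not shown in the thread; a common way to confirm that a dump is I/O-bound rather than CPU-bound, sketched here assuming the sysstat tools are installed, is:)

$ iostat -x 5   # sustained high %util/await with mostly idle CPU => I/O-bound
$ top -c        # if CPU-bound instead, shows whether pg_dump or the backend is hot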
database_test=# select count(*) from pg_largeobject_metadata;
  count
---------
 5231973
(1 row)
I had never used oprofile before, but a few hours into the process I was able to capture this report:
opreport -l /var/lib/pgsql/bin/pg_dump
Using /var/lib/oprofile/samples/ for samples directory.
CPU: Core 2, speed 2333.42 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples % symbol name
1202449 56.5535 sortDumpableObjects
174626 8.2130 DOTypeNameCompare
81181 3.8181 DeflateCompressorZlib
70640 3.3223 _WriteByte
68020 3.1991 DOCatalogIdCompare
53789 2.5298 WriteInt
39797 1.8717 WriteToc
38252 1.7991 WriteDataToArchive
32947 1.5496 WriteStr
32488 1.5280 pg_qsort
30122 1.4167 dumpTableData_copy
27706 1.3031 dumpDumpableObject
26078 1.2265 dumpBlobs
25591 1.2036 _tocEntryRequired
23030 1.0831 WriteData
21171 0.9957 buildTocEntryArrays
20825 0.9794 _WriteData
18936 0.8906 _WriteBuf
18113 0.8519 BuildArchiveDependencies
12607 0.5929 findComments
11642 0.5475 EndCompressor
10833 0.5095 _CustomWriteFunc
10562 0.4968 WriteDataChunks
10247 0.4819 dumpBlob
5947 0.2797 EndBlob
5824 0.2739 _EndBlob
5047 0.2374 main
5030 0.2366 dumpComment
4959 0.2332 AllocateCompressor
4762 0.2240 dumpSecLabel
4705 0.2213 StartBlob
4052 0.1906 WriteOffset
3285 0.1545 ArchiveEntry
2640 0.1242 _StartBlob
2391 0.1125 pg_calloc
2233 0.1050 findObjectByDumpId
2197 0.1033 SetArchiveRestoreOptions
2149 0.1011 pg_strdup
1760 0.0828 getDumpableObjects
1311 0.0617 ParseCompressionOption
1288 0.0606 med3
1248 0.0587 _WriteExtraToc
944 0.0444 AssignDumpId
916 0.0431 findSecLabels
788 0.0371 pg_malloc
340 0.0160 addObjectDependency
317 0.0149 _ArchiveEntry
144 0.0068 swapfunc
72 0.0034 ScanKeywordLookup
60 0.0028 findObjectByCatalogId
41 0.0019 fmtId
27 0.0013 ExecuteSqlQuery
20 9.4e-04 dumpTable
10 4.7e-04 getTableAttrs
8 3.8e-04 fmtCopyColumnList
6 2.8e-04 shouldPrintColumn
5 2.4e-04 findObjectByOid
3 1.4e-04 dumpFunc
3 1.4e-04 format_function_signature
3 1.4e-04 getTypes
2 9.4e-05 _StartData
2 9.4e-05 buildACLCommands
2 9.4e-05 findLoop
2 9.4e-05 getTables
2 9.4e-05 parseOidArray
2 9.4e-05 selectSourceSchema
1 4.7e-05 TocIDRequired
1 4.7e-05 _EndData
1 4.7e-05 archprintf
1 4.7e-05 dumpACL
1 4.7e-05 dumpCollation
1 4.7e-05 dumpConstraint
1 4.7e-05 dumpOpr
1 4.7e-05 expand_schema_name_patterns
1 4.7e-05 findDumpableDependencies
1 4.7e-05 fmtQualifiedId
1 4.7e-05 getCollations
1 4.7e-05 getExtensions
1 4.7e-05 getFormattedTypeName
1 4.7e-05 getIndexes
1 4.7e-05 makeTableDataInfo
1 4.7e-05 vwrite_msg
Thank you very much for your help.
Regards,
Sergio
Sergio Gabriel Rodriguez <sgrodriguez@gmail.com> writes:
> On Thu, Oct 11, 2012 at 7:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It's pretty hard to say without knowing a lot more info about your system
>> than you provided. One thing that would shed some light is if you spent
>> some time finding out where the time is going --- is the system
>> constantly I/O busy, or is it CPU-bound, and if so in which process,
>> pg_dump or the connected backend?

> The greatest amount of time is lost to I/O; the system is constantly I/O
> busy.

In that case there's not going to be a whole lot you can do about it, probably. Or at least not that's very practical --- I assume "buy faster disks" isn't a helpful answer. If the blobs are relatively static, it's conceivable that clustering pg_largeobject would help, but you're probably not going to want to take down your database for as long as that would take --- and the potential gains are unclear anyway.

> I had never used oprofile before, but a few hours into the process I was
> able to capture this report:
> 1202449 56.5535 sortDumpableObjects

Hm. I suspect a lot of that has to do with the large objects; and it's really overkill to treat them as full-fledged objects since they never have unique dependencies. This wasn't a problem when commit c0d5be5d6a736d2ee8141e920bc3de8e001bf6d9 went in, but I think now it might be because of the additional constraints added in commit a1ef01fe163b304760088e3e30eb22036910a495. I wonder if it's time to try to optimize pg_dump's handling of blobs a bit better. But still, any such fix probably wouldn't make a huge difference for you. Most of the time is going into pushing the blob data around, I think.

			regards, tom lane
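For anyone tempted by the clustering idea, the command would look roughly like this (a sketch only, not a command given in the thread; it must be run as superuser, and it holds an exclusive lock on pg_largeobject for the whole rewrite, blocking all large-object access while it runs):

-- rewrite pg_largeobject in (loid, pageno) index order
CLUSTER pg_largeobject USING pg_largeobject_loid_pn_index;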
I wrote:
> Sergio Gabriel Rodriguez <sgrodriguez@gmail.com> writes:
>> I had never used oprofile before, but a few hours into the process I was
>> able to capture this report:
>> 1202449 56.5535 sortDumpableObjects

> Hm. I suspect a lot of that has to do with the large objects; and it's
> really overkill to treat them as full-fledged objects since they never
> have unique dependencies. This wasn't a problem when commit
> c0d5be5d6a736d2ee8141e920bc3de8e001bf6d9 went in, but I think now it
> might be because of the additional constraints added in commit
> a1ef01fe163b304760088e3e30eb22036910a495. I wonder if it's time to try
> to optimize pg_dump's handling of blobs a bit better. But still, any
> such fix probably wouldn't make a huge difference for you. Most of the
> time is going into pushing the blob data around, I think.

For fun, I tried adding 5 million empty blobs to the standard regression database, and then did a pg_dump. It took a bit under 9 minutes on my workstation. oprofile showed about 32% of pg_dump's runtime going into sortDumpableObjects, which might make you think that's worth optimizing ... until you look at the bigger picture system-wide:

  samples|      %|
------------------
   727394 59.4098 kernel
   264874 21.6336 postgres
   136734 11.1677 /lib64/libc-2.14.90.so
    39878  3.2570 pg_dump
    37025  3.0240 libpq.so.5.6
    17964  1.4672 /usr/bin/wc
      354  0.0289 /usr/bin/oprofiled

So actually sortDumpableObjects took only about 1% of the CPU cycles. And remember this is with empty objects. If we'd been shoving 200GB of data through the dump, the data pipeline would surely have swamped all else.

So I think the original assumption that we didn't need to optimize pg_dump's object management infrastructure for blobs still holds good. If there's anything that is worth fixing here, it's the number of server roundtrips being used ...

			regards, tom lane
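A test setup like the one described above can be reproduced with something along these lines (a guess at the method; the actual commands are not shown in the thread):

-- create 5 million empty large objects in the regression database;
-- lo_create(0) asks the server to assign a fresh OID for each one
SELECT lo_create(0) FROM generate_series(1, 5000000);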
On Fri, Oct 12, 2012 at 10:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So I think the original assumption that we didn't need to optimize
> pg_dump's object management infrastructure for blobs still holds good.
> If there's anything that is worth fixing here, it's the number of server
> roundtrips being used ...
I found something similar:
samples| %|
------------------
233391664 60.5655 no-vmlinux
78789949 20.4461 libz.so.1.2.3
31984753 8.3001 postgres
21564413 5.5960 libc-2.4.so
4086941 1.0606 ld-2.4.so
2427151 0.6298 bash
2355895 0.6114 libc-2.4.so
2173558 0.5640 pg_dump
1771931 0.4598 oprofiled
Is there anything I can do to improve this?
Thanks
Sergio