Re: parallel pg_restore - WIP patch - Mailing list pgsql-hackers
From:           Andrew Dunstan
Subject:        Re: parallel pg_restore - WIP patch
Msg-id:         48DCD590.4000207@dunslane.net
In response to: parallel pg_restore - WIP patch (Andrew Dunstan <andrew@dunslane.net>)
Responses:      Re: parallel pg_restore - WIP patch
List:           pgsql-hackers
Russell Smith wrote:

> Hi,
>
> As I'm interested in this topic, I thought I'd take a look at the
> patch. I have no capability to test it on high-end hardware, but I did
> some basic testing on my workstation and a basic review of the patch.
>
> I somehow had the impression that instead of creating a new connection
> for each restore item, we would create the processes at the start and
> then send them the dumpIds they should be restoring. That would allow
> the controller to batch dumpIds together and expect the worker to
> process them in a transaction. But this is probably just an idea I
> created in my head.

Yes, it is. To do that I would have to invent a protocol for talking to
the workers, etc., and there is not the slightest chance I would get
that done by November. And I don't see the virtue in processing them
all in one transaction; I've provided a much simpler means of avoiding
WAL logging of the COPY. (Sketches of both ideas appear after this
message.)

> Do we know why we experience "tuple concurrently updated" errors if we
> spawn threads too fast?

No. That's an open item.

> I completed some test restores using the pg_restore from head with the
> patch applied. The dump was a custom dump created with pg 8.2 and
> restored to an 8.2 database. To confirm this would work, I completed a
> restore using the standard single-threaded mode. The schema restored
> successfully; the only errors reported involved non-existent roles.
>
> When I attempt to restore using parallel restore, I get out-of-memory
> errors reported from _PrintData. The code returning the error is:
>
>     _PrintData(...
>         while (blkLen != 0)
>         {
>             if (blkLen + 1 > ctx->inSize)
>             {
>                 free(ctx->zlibIn);
>                 ctx->zlibIn = NULL;
>                 ctx->zlibIn = (char *) malloc(blkLen + 1);
>                 if (!ctx->zlibIn)
>                     die_horribly(AH, modulename, " out of memory\n");
>
>                 ctx->inSize = blkLen + 1;
>                 in = ctx->zlibIn;
>             }
>
> It appears from my debugging and looking at the code that in
> _PrintData:
>
>     lclContext *ctx = (lclContext *) AH->formatData;
>
> the memory context is shared across all threads, which means it's
> possible the memory contexts are stomping on each other. My GDB skills
> are not up to reproducing this in a gdb session, as there are forks
> going on all over the place, and if you process the items in a serial
> fashion there aren't any errors. I'm not sure of the fix for this, but
> in a parallel environment it doesn't seem possible to store the memory
> context in the AH.

There are no threads, hence nothing is shared. fork() creates a new
process, not a new thread, and all the processes share are file
descriptors. (A small illustration follows this message.)

> I also receive messages saying "pg_restore: [custom archiver] could
> not read from input file: end of file". I have not investigated these
> further, as my current guess is that they are linked to the
> out-of-memory error.
>
> Given that I ran into this error on my first testing attempt, I
> haven't evaluated much else at this point in time. Now, all this could
> be because I'm using the 8.2 archive, but it works fine in single
> restore mode. The dump file is about 400M compressed, and an entire
> archive schema was removed from the restore path with a custom restore
> list.
>
> Command line used:
>
>     PGPORT=5432 ./pg_restore -h /var/run/postgresql -m4
>     --truncate-before-load -v -d tt2 -L tt.list
>     /home/mr-russ/pg-index-test/timetable.pgdump 2> log.txt
>
> I've attached the log.txt file so you can review the errors that I
> saw. I have adjusted the "out of memory" error to include a number to
> work out which one was being triggered, so you'll see "5 out of
> memory" in the log file, which corresponds to the code above.

However, there does seem to be something odd happening with the
compression lib, which I will investigate. Thanks for the report.

cheers

andrew
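For readers, a minimal sketch of the worker-pool idea Russell floats at
the top of the message: pre-fork the workers once, then have the
controller hand dump IDs to them over pipes. This is purely
hypothetical and is not the patch's design (the patch forks a fresh
worker per item); the dispatch here is naive round-robin, and worker()
just prints each ID where a real worker would restore the corresponding
archive item.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    #define NWORKERS 4

    static void
    worker(int readfd)
    {
        int     dumpId;

        /* Read dump IDs until the controller closes the pipe (EOF). */
        while (read(readfd, &dumpId, sizeof(dumpId)) == (ssize_t) sizeof(dumpId))
            printf("worker %d: would restore dump id %d\n",
                   (int) getpid(), dumpId);
        _exit(0);
    }

    int
    main(void)
    {
        int     fds[NWORKERS][2];
        int     i, id;

        for (i = 0; i < NWORKERS; i++)
        {
            pipe(fds[i]);
            if (fork() == 0)
            {
                int     j;

                /* Workers never write; closing the inherited write
                 * ends is also what makes EOF detection work. */
                for (j = 0; j <= i; j++)
                    close(fds[j][1]);
                worker(fds[i][0]);
            }
            close(fds[i][0]);   /* the controller does not read */
        }

        /* Naive round-robin dispatch; a real controller could batch
         * several IDs per worker here, as Russell suggests. */
        for (id = 1; id <= 12; id++)
            write(fds[id % NWORKERS][1], &id, sizeof(id));

        for (i = 0; i < NWORKERS; i++)
            close(fds[i][1]);   /* EOF tells each worker to exit */
        while (wait(NULL) > 0)
            ;
        return 0;
    }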
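The "much simpler means of avoiding WAL logging" is presumably the
--truncate-before-load option visible in Russell's command line: when a
table is truncated and then loaded within the same transaction (and WAL
archiving is not in use), the server can skip WAL-logging the loaded
rows and simply sync the table file at commit, since a crash would roll
back to the truncated state anyway. A rough libpq sketch of that
pattern; the connection string and the table name my_table are made up
for illustration:

    #include <stdio.h>
    #include <string.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("dbname=tt2");
        PGresult   *res;
        const char *row = "1\tfirst row\n";

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        /* TRUNCATE and COPY must share one transaction for the
         * WAL-skip optimization to apply. */
        PQclear(PQexec(conn, "BEGIN"));
        PQclear(PQexec(conn, "TRUNCATE my_table"));

        res = PQexec(conn, "COPY my_table FROM STDIN");
        if (PQresultStatus(res) == PGRES_COPY_IN)
        {
            PQputCopyData(conn, row, (int) strlen(row));
            PQputCopyEnd(conn, NULL);   /* NULL: no error, end the copy */
            PQclear(PQgetResult(conn)); /* fetch the COPY's final status */
        }
        PQclear(res);

        PQclear(PQexec(conn, "COMMIT"));
        PQfinish(conn);
        return 0;
    }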
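To illustrate the fork()-versus-threads point: after fork() the child
gets its own copy of the parent's memory, so the lclContext a worker
reaches through AH->formatData is private to that process; only file
descriptors refer to shared kernel objects. A standalone demonstration
(not pg_restore code):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int
    main(void)
    {
        char       *buf = strdup("parent");
        pid_t       pid = fork();

        if (pid == 0)
        {
            /* The child writes to its own copy of the heap... */
            strcpy(buf, "child");
            printf("child sees:  %s\n", buf);
            _exit(0);
        }
        waitpid(pid, NULL, 0);
        /* ...so the parent's copy is unchanged. */
        printf("parent sees: %s\n", buf);   /* prints "parent" */
        return 0;
    }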