Thread: Re: [PERFORM] multi-threaded pgloader makes it in version 2.3.0
Hi,

On Saturday 01 March 2008, Simon Riggs wrote:
> On Tue, 2008-02-26 at 13:08 +0100, Dimitri Fontaine wrote:
> > I'd like to have some feedback about the new version, in terms of bugs
> > encountered and performance limitations (is pgloader up to what you would
> > expect a multi-threaded loader to be at?)
>
> Maybe post to general as well if you don't get any replies here.
> New feature is very important for us.

So here is yet another mail about the new pgloader 2.3.0 release; please
forgive me if I come across as overzealous...

These links give detailed information about the new release:
http://pgfoundry.org/projects/pgloader
http://pgfoundry.org/forum/forum.php?forum_id=1283
http://pgloader.projects.postgresql.org/#_parallel_loading

Regards,
--
dim
On Mon, 2008-03-10 at 17:18 +0100, Dimitri Fontaine wrote:
> On Saturday 01 March 2008, Simon Riggs wrote:
> > On Tue, 2008-02-26 at 13:08 +0100, Dimitri Fontaine wrote:
> > > I'd like to have some feedback about the new version, in terms of bugs
> > > encountered and performance limitations (is pgloader up to what you would
> > > expect a multi-threaded loader to be at?)
> >
> > Maybe post to general as well if you don't get any replies here.
> > New feature is very important for us.
>
> So, here's yet another mail about pgloader new 2.3.0 version, please forgive
> me for being over zealous here if that's how I appear to be to you...
>
> Those links will give you detailed information about the new release.
> http://pgfoundry.org/projects/pgloader
> http://pgfoundry.org/forum/forum.php?forum_id=1283
> http://pgloader.projects.postgresql.org/#_parallel_loading

Sounds good.

I'm not sure when or why I would want an rrqueue_size larger than
copy_every, and a smaller one sounds very strange. Can we get away with it
being the same thing in all cases?

Do you have some basic performance numbers? It would be good to
understand the overhead of the parallelism on a large file with 1, 2 and
4 threads. It would also be good to see whether synchronous_commit = off
helps speed things up.

Presumably -V and -T still work when we go parallel, but just issue one
query?

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
On Monday 10 March 2008, Simon Riggs wrote:
> Not sure when or why I would want an rrqueue_size larger than
> copy_every, and less sounds very strange. Can we get away with it being
> the same thing in all cases?

The parameter exists because you asked for a reader which reads one line at
a time and feeds the workers in a round-robin fashion, and I wanted to feed
them more than one line at a time. Of course it may well turn out not to be
needed, in which case I'll deprecate it in the next version. Please note
that it defaults to what you want it to be, so you can just forget about
it...

I'm beginning to think you asked for one line at a time so that the first
version would be easier to implement... :)

> Do you have some basic performance numbers? It would be good to
> understand the overhead of the parallelism on a large file with 1, 2 and
> 4 threads. Would be good to see if synchronous_commit = off helped speed
> things up as well.

I haven't had the time to test performance, which is why I asked for
testing last time. I've planned some performance tests, if only to have the
opportunity to write up a presentation article, but haven't found the time
to run them yet.

> Presumably -V and -T still work when we go parallel, but just issue one
> query?

They still work, of course: the 'controller' thread issues them before it
parallelizes the work or begins to read the input file.

Rejecting still works the same way too: the threads share a reject object
which is protected by a lock (mutex), so lines in the reject file don't get
mixed together.

I tried not to compromise any existing feature when adding the parallel
ones, and in the end I didn't have to.

Regards,
--
dim
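
For readers who want to picture the two mechanisms discussed above, here is
a minimal Python sketch of a reader that hands batches of lines to workers
in round-robin order, with a shared reject collector guarded by a mutex. It
is an illustration under stated assumptions, not pgloader's actual code:
the names RejectLog, worker and load_round_robin are hypothetical, the
rrqueue_size argument merely borrows the parameter name from this thread to
mean "lines per batch", and the semicolon check stands in for whatever
makes a real row fail.

```python
import queue
import threading


class RejectLog:
    """Hypothetical shared reject collector; the lock keeps entries whole."""

    def __init__(self):
        self._lock = threading.Lock()
        self.entries = []

    def add(self, line, reason):
        # Only one thread at a time may record a rejected line.
        with self._lock:
            self.entries.append((line, reason))


def worker(batches, rejects, loaded):
    """Consume batches until the None sentinel arrives."""
    while True:
        batch = batches.get()
        if batch is None:
            break
        for line in batch:
            # Stand-in validity check (assumption): exactly two columns.
            if line.count(";") != 1:
                rejects.add(line, "bad column count")
            else:
                loaded.append(line)  # stand-in for sending the row via COPY


def load_round_robin(lines, nworkers=2, rrqueue_size=3):
    """Feed rrqueue_size lines at a time to each worker in turn."""
    rejects = RejectLog()
    queues = [queue.Queue() for _ in range(nworkers)]
    loaded = [[] for _ in range(nworkers)]
    threads = [
        threading.Thread(target=worker, args=(queues[i], rejects, loaded[i]))
        for i in range(nworkers)
    ]
    for t in threads:
        t.start()

    # The single reader: slice the input and rotate over the worker queues.
    target = 0
    for start in range(0, len(lines), rrqueue_size):
        queues[target].put(lines[start:start + rrqueue_size])
        target = (target + 1) % nworkers

    # Tell every worker that the input is exhausted, then wait for them.
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return loaded, rejects.entries
```

With two workers and rrqueue_size=3, the first three lines go to worker 0,
the next three to worker 1, and so on. Each worker keeps its own list of
loaded rows, so only the reject collector needs the lock.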