Thread: Re: [PERFORM] multi-threaded pgloader makes it in version 2.3.0

Re: [PERFORM] multi-threaded pgloader makes it in version 2.3.0

From

Dimitri Fontaine

Date:

10 March 2008, 13:18:33

Hi,

On Saturday 1 March 2008, Simon Riggs wrote:
> On Tue, 2008-02-26 at 13:08 +0100, Dimitri Fontaine wrote:
> > I'd like to have some feedback about the new version, in terms of bugs
> > encountered and performance limitations (is pgloader up to what you would
> > expect a multi-threaded loader to be at?)
>
> Maybe post to general as well if you don't get any replies here.
> New feature is very important for us.

So, here's yet another mail about the new pgloader 2.3.0 release; please
forgive me if I come across as overzealous...

These links give detailed information about the new release:
  http://pgfoundry.org/projects/pgloader
  http://pgfoundry.org/forum/forum.php?forum_id=1283
  http://pgloader.projects.postgresql.org/#_parallel_loading

Regards,
--
dim

Re: [PERFORM] multi-threaded pgloader makes it in version 2.3.0

From

Simon Riggs

Date:

10 March 2008, 14:14:14

On Mon, 2008-03-10 at 17:18 +0100, Dimitri Fontaine wrote:

> On Saturday 1 March 2008, Simon Riggs wrote:
> > On Tue, 2008-02-26 at 13:08 +0100, Dimitri Fontaine wrote:
> > > I'd like to have some feedback about the new version, in terms of bugs
> > > encountered and performance limitations (is pgloader up to what you would
> > > expect a multi-threaded loader to be at?)
> >
> > Maybe post to general as well if you don't get any replies here.
> > New feature is very important for us.
>
> So, here's yet another mail about the new pgloader 2.3.0 release; please
> forgive me if I come across as overzealous...
>
> These links give detailed information about the new release:
>   http://pgfoundry.org/projects/pgloader
>   http://pgfoundry.org/forum/forum.php?forum_id=1283
>   http://pgloader.projects.postgresql.org/#_parallel_loading

Sounds good.

Not sure when or why I would want an rrqueue_size larger than
copy_every, and less sounds very strange. Can we get away with it being
the same thing in all cases?

Do you have some basic performance numbers? It would be good to
understand the overhead of the parallelism on a large file with 1, 2 and
4 threads. Would be good to see if synchronous_commit = off helped speed
things up as well.

Presumably -V and -T still work when we go parallel, but just issue one
query?

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk

Re: [PERFORM] multi-threaded pgloader makes it in version 2.3.0

From

Dimitri Fontaine

Date:

10 March 2008, 14:57:56

On Monday 10 March 2008, Simon Riggs wrote:
> Not sure when or why I would want an rrqueue_size larger than
> copy_every, and less sounds very strange. Can we get away with it being
> the same thing in all cases?

The parameter exists because you asked for a reader that reads one line at a
time and feeds the workers in round-robin fashion, whereas I wanted to feed
them more than one line at a time. It may well turn out to be unnecessary, in
which case I'll deprecate it in the next version.
Please note that it defaults to what you want it to be, so you can just forget
about it...
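
To make the round-robin feeding concrete, here is a minimal sketch of a reader
that hands out batches of rrqueue_size lines to each worker queue in turn. It
is an illustration only, not pgloader's actual code: the function names
feed_round_robin and drain, the None end-of-input marker and the sample rows
are all assumptions made for the example.

```python
from itertools import islice
from queue import Queue


def feed_round_robin(lines, queues, rrqueue_size):
    """Give each queue, in turn, a batch of up to rrqueue_size lines.

    Hypothetical helper: pgloader's real reader is not shown in this thread.
    """
    it = iter(lines)
    turn = 0
    while True:
        batch = list(islice(it, rrqueue_size))
        if not batch:
            break
        queues[turn % len(queues)].put(batch)
        turn += 1
    # Assumed convention: None tells each worker that input is exhausted.
    for q in queues:
        q.put(None)


def drain(q):
    """Collect every line a worker would receive from its queue."""
    rows = []
    while True:
        batch = q.get()
        if batch is None:
            return rows
        rows.extend(batch)


lines = ["row%d" % i for i in range(10)]
queues = [Queue(), Queue()]
feed_round_robin(lines, queues, rrqueue_size=3)
worker_rows = [drain(q) for q in queues]
```

With rrqueue_size=1 this degenerates to the one-line-at-a-time reader; larger
values just mean fewer, bigger hand-offs per worker.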

I'm beginning to think you asked for one line at a time so that the first
version would be easier to implement... :)

> Do you have some basic performance numbers? It would be good to
> understand the overhead of the parallelism on a large file with 1, 2 and
> 4 threads. Would be good to see if synchronous_commit = off helped speed
> things up as well.

I haven't had time to test performance, which is why I asked for testing last
time. I've planned some performance tests, if only to have material for a
presentation article, but haven't found the time to run them yet.

> Presumably -V and -T still work when we go parallel, but just issue one
> query?

They still work, of course: the 'controller' thread issues them before
parallelizing the work or beginning to read the input file. Rejecting works
the same way too: the threads share a reject object which is protected by a
lock (mutex), so lines in the reject file don't get interleaved.
I tried not to compromise any existing feature while adding the parallel ones,
and in the end I didn't have to.
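
For readers wondering what the shared reject object looks like in principle,
here is a minimal sketch using Python's threading.Lock. It is not pgloader's
implementation: the Reject class, its log method, the worker function and the
in-memory stream standing in for the reject file are all hypothetical.

```python
import io
import threading


class Reject:
    """Hypothetical shared reject log; one lock serializes all writers."""

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.Lock()
        self.count = 0

    def log(self, line, reason):
        # Each record takes two writes; holding the lock across both keeps
        # another thread from slipping its own output in between them.
        with self._lock:
            self._stream.write(line.rstrip("\n"))
            self._stream.write(" # %s\n" % reason)
            self.count += 1


def worker(reject, worker_id, n):
    """Simulated loader thread that rejects every line it sees."""
    for i in range(n):
        reject.log("w%d;bad;%d" % (worker_id, i), "simulated error")


out = io.StringIO()  # stands in for the reject file
reject = Reject(out)
threads = [threading.Thread(target=worker, args=(reject, w, 50))
           for w in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
rejected_lines = out.getvalue().splitlines()
```

Every record in rejected_lines comes out whole, whichever thread wrote it; the
order of records across threads is not guaranteed.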

Regards,
--
dim
