Re: Concurrent COPY commands - Mailing list pgsql-novice

From: Phillip Sitbon
Subject: Re: Concurrent COPY commands
Msg-id: 536685ea0807090935i2e191146y7f3acf5bba4ca0ea@mail.gmail.com
In response to: Re: Concurrent COPY commands (Alan Hodgson <ahodgson@simkin.ca>)
List: pgsql-novice
Sorry about the late reply.
I only have two fast SATA drives on software RAID, but that really isn't the issue: while the COPY commands are running, disk activity is relatively low. By "relatively" I mean I have seen it much higher under other circumstances, so I know the disks aren't holding anything back. It's not a perfect comparison, but the process generating this huge amount of data can write directly to disk very fast and still be CPU-bound, yet it ends up waiting on postgres when I try to pipe the data into the database. I figured some overhead was to be expected, which is why I tried the parallel setup in the first place.
What I see is that after some buffering (not sure it is buffering, but after it gets some data), one postgres process will ramp up to 100% CPU (on one core) for some time, thus blocking its input FIFO. That is when the hard drive activity goes up a bit, but whatever it is doing is definitely CPU-bound on that core.
No more than one worker process does this at a time. And no matter what kind of FIFO buffers and select() calls I use, the calling process eventually gets blocked because the postgres processes don't appear to be working in parallel as well as they could be; hence, postgres doesn't take in any more data for a while. I'm really curious about why going parallel x6 is so much slower than one process when the disks aren't being pushed that hard compared to their capabilities.
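For reference, the fan-out pattern I'm describing looks roughly like the sketch below: one producer feeding several FIFOs through select(), so a stalled consumer never blocks the rest. This is an illustrative Python sketch rather than my actual C code, and the reader threads are stand-ins for the postgres backends that would each be running something like `COPY mytable FROM '/tmp/fifoN'` (names and counts are made up):

```python
# Sketch: one producer distributing rows across several FIFOs using
# select(), writing each row to whichever FIFO can currently accept data.
# Reader threads stand in for postgres COPY backends.
import os
import select
import tempfile
import threading

NUM_FIFOS = 3
ROWS = 300

tmpdir = tempfile.mkdtemp()
paths = [os.path.join(tmpdir, "fifo%d" % i) for i in range(NUM_FIFOS)]
for p in paths:
    os.mkfifo(p)

received = [[] for _ in range(NUM_FIFOS)]

def reader(i):
    # Stand-in for a backend consuming COPY data from its FIFO.
    with open(paths[i], "rb") as f:
        for line in f:
            received[i].append(line)

threads = [threading.Thread(target=reader, args=(i,)) for i in range(NUM_FIFOS)]
for t in threads:
    t.start()

# Opening a FIFO for writing blocks until a reader has opened it,
# so the reader threads must already be running at this point.
fds = [os.open(p, os.O_WRONLY) for p in paths]

for n in range(ROWS):
    # select() reports which FIFOs have room in their pipe buffer;
    # write the next row to the first one that is ready instead of
    # blocking on a full pipe whose consumer is busy.
    _, writable, _ = select.select([], fds, [])
    os.write(writable[0], ("row %d\n" % n).encode())

for fd in fds:
    os.close(fd)  # EOF lets each reader finish
for t in threads:
    t.join()

total = sum(len(r) for r in received)
print("rows delivered:", total)
```

In practice the distribution is only as parallel as the consumers: if only one backend is draining its pipe, select() keeps steering rows to it, which matches the single-active-process behavior I'm seeing.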
I suspect something is wrong with my config, but I can't be sure. Is 1-2 GB for work_mem OK, or would that hurt?
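For what it's worth, my understanding is that work_mem is allocated per sort/hash operation per backend, so a single query can use it several times over and concurrent connections multiply that again; 1-2 GB on a 16 GB box could therefore exhaust memory under load, and the COPY data path itself shouldn't need it anyway. Something like the following is what I'd try next (values are illustrative guesses for this machine, not a tested recommendation):

```
# postgresql.conf -- illustrative values for a 16 GB / 8-core box
shared_buffers = 2GB           # shared cache for all backends
work_mem = 64MB                # per sort/hash operation, per backend;
                               # multi-GB values risk exhausting RAM
maintenance_work_mem = 1GB     # used by CREATE INDEX and VACUUM, which
                               # is where bulk loads usually benefit
checkpoint_segments = 32       # fewer checkpoints during sustained COPY
```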
On a positive note, I let the single-process version run to completion and I now have a solid TB of data that I can access and use at lightning speed :)
Cheers,
Phillip
On Wed, Jul 2, 2008 at 10:02 AM, Alan Hodgson <ahodgson@simkin.ca> wrote:
On Wednesday 02 July 2008, Phillip Sitbon <phillip@sitbon.net> wrote:
> Hello,
>
> I am running some queries that use multiple connections to issue COPY
> commands which bring data into the same table via different files (FIFOs
> to be precise). This is being done on a SMP machine and I am noticing
> that none of the postgres worker processes operate in parallel, even
> though there is data available to all of them. The performance is nearly
> exactly the same as it is for issuing a single COPY command. Is this
> normal behavior, even with all of the separate transactions still in
> progress? Would I be better off doing multithreaded bulk inserts from my
> C program rather than sending the data to FIFOs?

Sounds like you're I/O bound - I doubt any other concurrency mechanism will
change that much. Ah, but what does your RAID controller and drives look
like?

> The machine I am using has 16GB of memory and 8 cores, so I've tried to
> optimize the configuration accordingly but I am a little lost in some
> places.
--
Alan
--
Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-novice