Re: Concurrent COPY commands - Mailing list pgsql-novice

From: Phillip Sitbon
Subject: Re: Concurrent COPY commands
Msg-id: 536685ea0807090935i2e191146y7f3acf5bba4ca0ea@mail.gmail.com
In response to: Re: Concurrent COPY commands (Alan Hodgson <ahodgson@simkin.ca>)
List: pgsql-novice
Sorry about the late reply.
I only have two fast SATA drives on software RAID, but that really isn't the issue: while the COPY commands are running, disk activity is relatively low. By "relatively" I mean I have seen it much higher under other circumstances, so I know the disks aren't holding anything back. It's not a perfect comparison, but the process generating this huge amount of data can write directly to disk very fast and still be CPU-bound, yet it ends up waiting on postgres when I try to pipe the data into the database. I figured some overhead was to be expected, which is why I tried the parallel setup in the first place.
What I see is that after some buffering (not sure it is buffering, but after it gets some data), one postgres process will ramp up to 100% CPU (on one core) for some time, thus blocking its input FIFO. That is when the hard drive activity goes up a bit, but whatever it is doing is definitely CPU-bound on that core.
No more than one worker process does this at a time. And no matter what kind of FIFO buffers and select() calls I use, the calling process eventually gets blocked because the postgres processes don't appear to be working in parallel as well as they could be; hence, postgres doesn't take in any more data for a while. I'm really curious about why going parallel x6 is so much slower than one process when the disks aren't being pushed that hard compared to their capabilities.
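For reference, the fan-out pattern I'm describing looks roughly like the sketch below: one producer feeding several FIFOs through select(), so a stalled consumer never blocks the rest. This is an illustrative Python sketch rather than my actual C code, and the reader threads are stand-ins for the postgres backends that would each be running something like `COPY mytable FROM '/tmp/fifoN'` (names and counts are made up):

```python
# Sketch: one producer distributing rows across several FIFOs using
# select(), writing each row to whichever FIFO can currently accept data.
# Reader threads stand in for postgres COPY backends.
import os
import select
import tempfile
import threading

NUM_FIFOS = 3
ROWS = 300

tmpdir = tempfile.mkdtemp()
paths = [os.path.join(tmpdir, "fifo%d" % i) for i in range(NUM_FIFOS)]
for p in paths:
    os.mkfifo(p)

received = [[] for _ in range(NUM_FIFOS)]

def reader(i):
    # Stand-in for a backend consuming COPY data from its FIFO.
    with open(paths[i], "rb") as f:
        for line in f:
            received[i].append(line)

threads = [threading.Thread(target=reader, args=(i,)) for i in range(NUM_FIFOS)]
for t in threads:
    t.start()

# Opening a FIFO for writing blocks until a reader has opened it,
# so the reader threads must already be running at this point.
fds = [os.open(p, os.O_WRONLY) for p in paths]

for n in range(ROWS):
    # select() reports which FIFOs have room in their pipe buffer;
    # write the next row to the first one that is ready instead of
    # blocking on a full pipe whose consumer is busy.
    _, writable, _ = select.select([], fds, [])
    os.write(writable[0], ("row %d\n" % n).encode())

for fd in fds:
    os.close(fd)  # EOF lets each reader finish
for t in threads:
    t.join()

total = sum(len(r) for r in received)
print("rows delivered:", total)
```

In practice the distribution is only as parallel as the consumers: if only one backend is draining its pipe, select() keeps steering rows to it, which matches the single-active-process behavior I'm seeing.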
I suspect something is wrong with my config, but I can't be sure. Is 1-2 GB for work_mem OK, or would that hurt?
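For what it's worth, my understanding is that work_mem is allocated per sort/hash operation per backend, so a single query can use it several times over and concurrent connections multiply that again; 1-2 GB on a 16 GB box could therefore exhaust memory under load, and the COPY data path itself shouldn't need it anyway. Something like the following is what I'd try next (values are illustrative guesses for this machine, not a tested recommendation):

```
# postgresql.conf -- illustrative values for a 16 GB / 8-core box
shared_buffers = 2GB           # shared cache for all backends
work_mem = 64MB                # per sort/hash operation, per backend;
                               # multi-GB values risk exhausting RAM
maintenance_work_mem = 1GB     # used by CREATE INDEX and VACUUM, which
                               # is where bulk loads usually benefit
checkpoint_segments = 32       # fewer checkpoints during sustained COPY
```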
On a positive note, I let the single-process version run to completion and I now have a solid TB of data that I can access and use at lightning speed :)
Cheers,
Phillip
On Wed, Jul 2, 2008 at 10:02 AM, Alan Hodgson <ahodgson@simkin.ca> wrote:
On Wednesday 02 July 2008, Phillip Sitbon <phillip@sitbon.net> wrote:
> Hello,
>
> I am running some queries that use multiple connections to issue COPY
> commands which bring data into the same table via different files (FIFOs
> to be precise). This is being done on a SMP machine and I am noticing
> that none of the postgres worker processes operate in parallel, even
> though there is data available to all of them. The performance is nearly
> exactly the same as it is for issuing a single COPY command. Is this
> normal behavior, even with all of the separate transactions still in
> progress? Would I be better off doing multithreaded bulk inserts from my
> C program rather than sending the data to FIFOs?

Sounds like you're I/O bound - I doubt any other concurrency mechanism will
change that much. Ah, but what does your RAID controller and drives look
like?

> The machine I am using has 16GB of memory and 8 cores, so I've tried to
> optimize the configuration accordingly but I am a little lost in some
> places.
--
Alan
--
Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-novice