Re: COPY v. java performance comparison - Mailing list pgsql-general
From | Adrian Klaver |
---|---|
Subject | Re: COPY v. java performance comparison |
Date | |
Msg-id | 533C74CB.4070403@aklaver.com Whole thread Raw |
In response to | Re: COPY v. java performance comparison (Rob Sargent <robjsargent@gmail.com>) |
Responses |
Re: COPY v. java performance comparison
Re: COPY v. java performance comparison |
List | pgsql-general |
On 04/02/2014 01:14 PM, Rob Sargent wrote: > On 04/02/2014 01:56 PM, Steve Atkins wrote: >> On Apr 2, 2014, at 12:37 PM, Rob Sargent<robjsargent@gmail.com> wrote: >> >>> I'm playing with various data models to compare performance and practicalities and not sure if I should be surprisedby the numbers I'm getting. I hope this report isn't too wishy-washy for reasoned comment. >>> >>> One model says a genotype is defined as follows: >>> Table "public.oldstyle" >>> +-------------+--------------+-----------+ >>> | Column | Type | Modifiers | >>> +-------------+--------------+-----------+ >>> | id | uuid | not null | >>> | sample_name | text | not null | >>> | marker_name | text | not null | >>> | allele1 | character(1) | | >>> | allele2 | character(1) | | >>> +-------------+--------------+-----------+ >>> (0. id is a Primary Key) >>> (1. Take what you will from the table name.) >>> (2. I hadn't thought of "char" type at this point) >>> (3. Ultimately the names would become ids, RI included) >>> (4. We're loading 39 samples and ~950K markers) >>> I loaded 37M+ records using jOOQ (batching every 1000 lines) in 12+ hours (800+ records/sec). Then I tried COPY andkilled that after 11.25 hours when I realised that I had added on non-unque index on the name fields after the first load.By that point is was on line 28301887, so ~0.75 done which implies it would have take ~15hours to complete. >>> >>> Would the overhead of the index likely explain this decrease in throughput? >>> >>> Impatience got the better of me and I killed the second COPY. This time it had done 54% of the file in 6.75 hours, extrapolatingto roughly 12 hours to do the whole thing. >> That seems rather painfully slow. How exactly are you doing the bulk load? Are you CPU limited or disk limited? >> >> Have you readhttp://www.postgresql.org/docs/current/interactive/populate.html ? >> >> Cheers, >> Steve >> > The copy command was pretty vanilla: > > copy oldstyle from '/export/home/rob/share/testload/<file-redacted>' > with delimiter ' '; > > I've been to that page, but (as I read them) none sticks out as a sure > thing. I'm not so worried about the actual performance as I am with the > relative throughput (sixes so far). Have you looked at the Postgres logs from that time period to see if there is anything of interest, say complaining about checkpoints. > > I'm not cpu bound, but I confess I didn't look at io stats during the > copy runs. I just assume it was pegged :) > > Thanks, -- Adrian Klaver adrian.klaver@aklaver.com
pgsql-general by date: