Thread: Bulk Insert
Hi,
What is the fastest way to insert 237 million records into a table that has rules (for distributing the data across 84 child tables)?
First I tried inserts. No go.
Then I tried inserts with BEGIN/COMMIT. Not nearly fast enough.
Next, I tried COPY FROM, but then noticed the documentation states that rules are ignored. (It was also having trouble with the column order and date format: it said that '1984-07-1' was not a valid integer; true, but a bit unexpected.)
Here is some example data:
station_id,taken,amount,category_id,flag
1,'1984-07-1',0,4,
1,'1984-07-2',0,4,
1,'1984-07-3',0,4,
1,'1984-07-4',0,4,T
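(Guessing at the COPY failure: the "not a valid integer" error for '1984-07-1' suggests the file's columns were not lining up with the table's declared order, and the single-quoted dates were being read literally. Something like the following might get past both, though the staging table name and path are illustrative, and COPY still bypasses the rules, so the rows would need to be routed to the children separately:)

```sql
-- Sketch only: assumes a staging table with the same columns as
-- climate.measurement and the CSV above at an illustrative path.
-- QUOTE '''' makes the single quotes around the dates the CSV quote
-- character; FORCE NOT NULL keeps an empty flag field as '' rather
-- than NULL, so the NOT NULL constraint is satisfied.
COPY climate.staging (station_id, taken, amount, category_id, flag)
FROM '/tmp/measurement.csv'
WITH CSV HEADER QUOTE '''' FORCE NOT NULL flag;
```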
Here is the table structure (with one rule included):
CREATE TABLE climate.measurement
(
  id bigserial NOT NULL,
  station_id integer NOT NULL,
  taken date NOT NULL,
  amount numeric(8,2) NOT NULL,
  category_id smallint NOT NULL,
  flag character varying(1) NOT NULL DEFAULT ' '::character varying
)
WITH (
  OIDS=FALSE
);

ALTER TABLE climate.measurement OWNER TO postgres;

-- Rule: "i_measurement_01_001 ON climate.measurement"
-- DROP RULE i_measurement_01_001 ON climate.measurement;

CREATE OR REPLACE RULE i_measurement_01_001 AS
  ON INSERT TO climate.measurement
  WHERE date_part('month'::text, new.taken)::integer = 1
    AND new.category_id = 1
  DO INSTEAD
    INSERT INTO climate.measurement_01_001
      (id, station_id, taken, amount, category_id, flag)
    VALUES
      (new.id, new.station_id, new.taken, new.amount, new.category_id, new.flag);
I can generate the data in any format.
I'm looking for something that won't take four days.
I originally had the data in MySQL (still do), but am hoping to get a performance increase by switching to PostgreSQL and am eager to use its PL/R extensions for stats.
I was also thinking about using:
Any help, tips, or guidance would be greatly appreciated.
Thank you!
Dave
On 2010-05-16, David Jarvis <thangalin@gmail.com> wrote:
> Hi,
>
> What is the fastest way to insert 237 million records into a table that has
> rules (for distributing the data across 84 child tables)?

Segregate the data first and then copy it into the child tables.

Else copy it into a temp table and insert it from there using

insert into CHILD_NAME select * from TEMPNAME where SEGREGATION CONDITION;

Or you may have success with the simple copy if you increase the number of
file handles and buffers sufficiently.
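(The temp-table route above, sketched against the schema in the original post; the staging table name, file path, and COPY options are illustrative, and there would be one segregating INSERT ... SELECT per child, each mirroring its rule's predicate:)

```sql
-- Illustrative sketch of the staging-table approach.
CREATE TABLE climate.staging (LIKE climate.measurement INCLUDING DEFAULTS);

COPY climate.staging (station_id, taken, amount, category_id, flag)
FROM '/tmp/measurement.csv'
WITH CSV HEADER QUOTE '''' FORCE NOT NULL flag;

-- One of the 84 segregating inserts, mirroring rule i_measurement_01_001:
INSERT INTO climate.measurement_01_001
SELECT * FROM climate.staging
WHERE date_part('month', taken) = 1 AND category_id = 1;
```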
On 05/16/2010 06:44 AM, Jasen Betts wrote:
> On 2010-05-16, David Jarvis <thangalin@gmail.com> wrote:
>> What is the fastest way to insert 237 million records into a table that has
>> rules (for distributing the data across 84 child tables)?
>
> segregate the data first and then copy it into the child tables.
>
> else copy it into a temp table and insert it from there using
>
> insert into CHILD_NAME select * from TEMPNAME where SEGREGATION CONDITION;
>
> or you may have success with the simple copy if you increase the number of
> file handles and buffers sufficiently.

If you segregate data, as Jasen suggests, you can even populate the child
tables in parallel, which should provide some performance improvement.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com
The Leader in integrated Media Intelligence Solutions
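(A sketch of the parallel variant: COPY goes straight into the children, since the rules only fire on INSERTs against the parent, with each command run in its own psql session. The files are assumed to be pre-segregated to match each child's rule predicate, and the second child/file name is only a guess at the naming pattern:)

```sql
-- Session 1 (month 1, category 1):
COPY climate.measurement_01_001 (station_id, taken, amount, category_id, flag)
FROM '/tmp/m_01_001.csv' WITH CSV;

-- Session 2 (month 1, category 2 -- child name assumed from the pattern):
COPY climate.measurement_01_002 (station_id, taken, amount, category_id, flag)
FROM '/tmp/m_01_002.csv' WITH CSV;
```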