Thread: Bulk Insert
Hi,
What is the fastest way to insert 237 million records into a table that has rules (for distributing the data across 84 child tables)?
First I tried inserts. No go.
Then I tried inserts with BEGIN/COMMIT. Not nearly fast enough.
Next, I tried COPY FROM, but then noticed the documentation states that rules are ignored. (It was also having trouble with the column order and date format: it said that '1984-07-1' was not a valid integer; true, but a bit unexpected.)
Here is some example data:
station_id,taken,amount,category_id,flag
1,'1984-07-1',0,4,
1,'1984-07-2',0,4,
1,'1984-07-3',0,4,
1,'1984-07-4',0,4,T
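(Guessing at the COPY failure: the "not a valid integer" error for '1984-07-1' suggests the file's columns were not lining up with the table's declared order, and the single-quoted dates were being read literally. Something like the following might get past both, though the staging table name and path are illustrative, and COPY still bypasses the rules, so the rows would need to be routed to the children separately:)

```sql
-- Sketch only: assumes a staging table with the same columns as
-- climate.measurement and the CSV above at an illustrative path.
-- QUOTE '''' makes the single quotes around the dates the CSV quote
-- character; FORCE NOT NULL keeps an empty flag field as '' rather
-- than NULL, so the NOT NULL constraint is satisfied.
COPY climate.staging (station_id, taken, amount, category_id, flag)
FROM '/tmp/measurement.csv'
WITH CSV HEADER QUOTE '''' FORCE NOT NULL flag;
```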
Here is the table structure (with one rule included):
CREATE TABLE climate.measurement
(
  id bigserial NOT NULL,
  station_id integer NOT NULL,
  taken date NOT NULL,
  amount numeric(8,2) NOT NULL,
  category_id smallint NOT NULL,
  flag character varying(1) NOT NULL DEFAULT ' '::character varying
)
WITH (
  OIDS=FALSE
);

ALTER TABLE climate.measurement OWNER TO postgres;

-- Rule: "i_measurement_01_001 ON climate.measurement"
-- DROP RULE i_measurement_01_001 ON climate.measurement;

CREATE OR REPLACE RULE i_measurement_01_001 AS
  ON INSERT TO climate.measurement
  WHERE date_part('month'::text, new.taken)::integer = 1
    AND new.category_id = 1
  DO INSTEAD
    INSERT INTO climate.measurement_01_001
      (id, station_id, taken, amount, category_id, flag)
    VALUES
      (new.id, new.station_id, new.taken, new.amount, new.category_id, new.flag);
I can generate the data in any format.
I'm looking for something that won't take four days.
I originally had the data in MySQL (still do), but am hoping to get a performance increase by switching to PostgreSQL and am eager to use its PL/R extensions for stats.
I was also thinking about using:
Any help, tips, or guidance would be greatly appreciated.
Thank you!
Dave
On 2010-05-16, David Jarvis <thangalin@gmail.com> wrote:
> Hi,
>
> What is the fastest way to insert 237 million records into a table that has
> rules (for distributing the data across 84 child tables)?

Segregate the data first and then copy it into the child tables.

Else copy it into a temp table and insert it from there using

insert into CHILD_NAME select * from TEMPNAME where SEGREGATION CONDITION;

Or you may have success with the simple copy if you increase the number of
file handles and buffers sufficiently.
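(The temp-table route above, sketched against the schema in the original post; the staging table name, file path, and COPY options are illustrative, and there would be one segregating INSERT ... SELECT per child, each mirroring its rule's predicate:)

```sql
-- Illustrative sketch of the staging-table approach.
CREATE TABLE climate.staging (LIKE climate.measurement INCLUDING DEFAULTS);

COPY climate.staging (station_id, taken, amount, category_id, flag)
FROM '/tmp/measurement.csv'
WITH CSV HEADER QUOTE '''' FORCE NOT NULL flag;

-- One of the 84 segregating inserts, mirroring rule i_measurement_01_001:
INSERT INTO climate.measurement_01_001
SELECT * FROM climate.staging
WHERE date_part('month', taken) = 1 AND category_id = 1;
```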
On 05/16/2010 06:44 AM, Jasen Betts wrote:
> On 2010-05-16, David Jarvis <thangalin@gmail.com> wrote:
>> What is the fastest way to insert 237 million records into a table that has
>> rules (for distributing the data across 84 child tables)?
>
> segregate the data first and then copy it into the child tables.
>
> else copy it into a temp table and insert it from there using
>
> insert into CHILD_NAME select * from TEMPNAME where SEGREGATION CONDITION;
>
> or you may have success with the simple copy if you increase the number of
> file handles and buffers sufficiently.

If you segregate data, as Jasen suggests, you can even populate the child
tables in parallel, which should provide some performance improvement.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com
The Leader in integrated Media Intelligence Solutions
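(A sketch of the parallel variant: COPY goes straight into the children, since the rules only fire on INSERTs against the parent, with each command run in its own psql session. The files are assumed to be pre-segregated to match each child's rule predicate, and the second child/file name is only a guess at the naming pattern:)

```sql
-- Session 1 (month 1, category 1):
COPY climate.measurement_01_001 (station_id, taken, amount, category_id, flag)
FROM '/tmp/m_01_001.csv' WITH CSV;

-- Session 2 (month 1, category 2 -- child name assumed from the pattern):
COPY climate.measurement_01_002 (station_id, taken, amount, category_id, flag)
FROM '/tmp/m_01_002.csv' WITH CSV;
```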