Thread: need help with import

need help with import

From

Andreas

Date:

15 February 2012, 21:03:16

Hi
I get CSV files to import.
Th structure is like this.
main part, sub part
Could be like this

A, a1
A, a2
A, a3
B, b1
B, b2

The database has a table for main_part and one for sub_part.
The relation needs to be n:m so there is a relation table that holds ( 
main_id, sub_id ).
The 2 primary keys main_part.id and sub_part.id are both serials.

Is there a way to do an import with SQL?

I can read the CSV into a temporary table
and I can do a
INSERT INTO main_part ( ... ) SELECT DISTINCT main columns FROM import;
as well as a
INSERT INTO sub_part ( ... ) SELECT sub columns FROM import;

But how would I know what main_id and sub_id to insert into the n:m 
relation?

At first when I do the import the relation is actually 1:n.

Re: need help with import

From

"David Johnston"

Date:

15 February 2012, 21:14:19

-----Original Message-----
From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Andreas
Sent: Wednesday, February 15, 2012 8:03 PM
To: pgsql-sql@postgresql.org
Subject: [SQL] need help with import

Hi
I get CSV files to import.
Th structure is like this.
main part, sub part
Could be like this

A, a1
A, a2
A, a3
B, b1
B, b2

The database has a table for main_part and one for sub_part.
The relation needs to be n:m so there is a relation table that holds ( main_id, sub_id ).
The 2 primary keys main_part.id and sub_part.id are both serials.

Is there a way to do an import with SQL?

I can read the CSV into a temporary table and I can do a INSERT INTO main_part ( ... ) SELECT DISTINCT main columns
FROMimport; as well as a INSERT INTO sub_part ( ... ) SELECT sub columns FROM import; 

But how would I know what main_id and sub_id to insert into the n:m relation?

At first when I do the import the relation is actually 1:n.

--------------------------------------------

You will need to use the temporary table and perform multiple insert+select.

I do not understand where you are confused.  It would help to provide more meaningful sample data and/or the final
resultyou are trying to achieve.  Keep in mind any n:m setup requires three tables with the joining table usually
havingsome descriptive meaning.  Is time one of your components that you are not  showing us? 

Dave

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: need help with import

From

Andreas

Date:

15 February 2012, 22:06:12

Am 16.02.2012 02:13, schrieb David Johnston:
> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Andreas
> Sent: Wednesday, February 15, 2012 8:03 PM
> To: pgsql-sql@postgresql.org
> Subject: [SQL] need help with import
>
> Hi
> I get CSV files to import.
> Th structure is like this.
> main part, sub part
> Could be like this
>
> A, a1
> A, a2
> A, a3
> B, b1
> B, b2
>
> The database has a table for main_part and one for sub_part.
> The relation needs to be n:m so there is a relation table that holds ( main_id, sub_id ).
> The 2 primary keys main_part.id and sub_part.id are both serials.
>
> Is there a way to do an import with SQL?
>
> I can read the CSV into a temporary table and I can do a INSERT INTO main_part ( ... ) SELECT DISTINCT main columns
FROMimport; as well as a INSERT INTO sub_part ( ... ) SELECT sub columns FROM import;
 
>
> But how would I know what main_id and sub_id to insert into the n:m relation?
>
> At first when I do the import the relation is actually 1:n.
>
> --------------------------------------------
>
> You will need to use the temporary table and perform multiple insert+select.
>
> I do not understand where you are confused.  It would help to provide more meaningful sample data and/or the final
resultyou are trying to achieve.  Keep in mind any n:m setup requires three tables with the joining table usually
havingsome descriptive meaning.  Is time one of your components that you are not  showing us?
 
>
>
As you say there are 3 tables
main_part ( id serial primary key, ... )
sub_part ( id serial primary key, ... )
main_to_sub ( main_id, sub_id )

I would read the csv into a temporary table "import" and insert the main 
columns into main_part ().
Then there are new tuples in main_part()
42, A
43, B

Now I insert the sub columns into sub_part()
I'll get e.g.
1000, a1
1001, a2
1002, a3
1003, b1
1004, b2

To fill up main_to_sub ( ... ) I needed to know which sub_id belongs to 
which main_id.
( 42, 1000 )
( 42, 1001 )
( 42, 1002 )
( 43, 1003 )
( 43, 1004 )

I could compare every main-column in "import" to every related 
data-column in main_part to get the newly created main_id and do the 
same with every sub-data-column but this seems to be a wee bit tedious.

Is there a more elegant way hat I don't see, yet?

Re: need help with import

From

David Johnston

Date:

15 February 2012, 22:42:05

On Feb 15, 2012, at 21:05, Andreas <maps.on@gmx.net> wrote:

> Am 16.02.2012 02:13, schrieb David Johnston:
>> -----Original Message-----
>> From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Andreas
>> Sent: Wednesday, February 15, 2012 8:03 PM
>> To: pgsql-sql@postgresql.org
>> Subject: [SQL] need help with import
>>
>> Hi
>> I get CSV files to import.
>> Th structure is like this.
>> main part, sub part
>> Could be like this
>>
>> A, a1
>> A, a2
>> A, a3
>> B, b1
>> B, b2
>>
>> The database has a table for main_part and one for sub_part.
>> The relation needs to be n:m so there is a relation table that holds ( main_id, sub_id ).
>> The 2 primary keys main_part.id and sub_part.id are both serials.
>>
>> Is there a way to do an import with SQL?
>>
>> I can read the CSV into a temporary table and I can do a INSERT INTO main_part ( ... ) SELECT DISTINCT main columns
FROMimport; as well as a INSERT INTO sub_part ( ... ) SELECT sub columns FROM import; 
>>
>> But how would I know what main_id and sub_id to insert into the n:m relation?
>>
>> At first when I do the import the relation is actually 1:n.
>>
>> --------------------------------------------
>>
>> You will need to use the temporary table and perform multiple insert+select.
>>
>> I do not understand where you are confused.  It would help to provide more meaningful sample data and/or the final
resultyou are trying to achieve.  Keep in mind any n:m setup requires three tables with the joining table usually
havingsome descriptive meaning.  Is time one of your components that you are not  showing us? 
>>
>>
> As you say there are 3 tables
> main_part ( id serial primary key, ... )
> sub_part ( id serial primary key, ... )
> main_to_sub ( main_id, sub_id )
>
> I would read the csv into a temporary table "import" and insert the main columns into main_part ().
> Then there are new tuples in main_part()
> 42, A
> 43, B
>
> Now I insert the sub columns into sub_part()
> I'll get e.g.
> 1000, a1
> 1001, a2
> 1002, a3
> 1003, b1
> 1004, b2
>
> To fill up main_to_sub ( ... ) I needed to know which sub_id belongs to which main_id.
> ( 42, 1000 )
> ( 42, 1001 )
> ( 42, 1002 )
> ( 43, 1003 )
> ( 43, 1004 )
>
> I could compare every main-column in "import" to every related data-column in main_part to get the newly created
main_idand do the same with every sub-data-column but this seems to be a wee bit tedious. 
>
> Is there a more elegant way hat I don't see, yet?
>

Ditch the whole idea of using a sequence and take your primary keys from the source data. Otherwise yes, you will need
toperform the join between the import and live tables to determine the newly created identifier. 

The question to answer is if you see the same values in subsequent import files do you create a new sequence value or
reusethe existing value?  Why? 

David J.

Re: need help with import

From

"Raj Mathur (राज माथुर)"

Date:

15 February 2012, 22:56:15

On Thursday 16 Feb 2012, Andreas wrote:
> Hi
> I get CSV files to import.
> Th structure is like this.
> main part, sub part
> Could be like this
> 
> A, a1
> A, a2
> A, a3
> B, b1
> B, b2
> 
> The database has a table for main_part and one for sub_part.
> The relation needs to be n:m so there is a relation table that holds
> ( main_id, sub_id ).
> The 2 primary keys main_part.id and sub_part.id are both serials.
> 
> Is there a way to do an import with SQL?
> 
> I can read the CSV into a temporary table
> and I can do a
> INSERT INTO main_part ( ... ) SELECT DISTINCT main columns FROM
> import; as well as a
> INSERT INTO sub_part ( ... ) SELECT sub columns FROM import;
> 
> But how would I know what main_id and sub_id to insert into the n:m
> relation?

Is this what you need?

foo=> create table mp(mid serial primary key, m text);
CREATE TABLE
foo=> create table sp(sid serial primary key, s text);
CREATE TABLE
foo=> create table ms(mid int references mp, sid int references sp, 
primary key(mid, sid));
CREATE TABLE
foo=> create temporary table t(m text, s text);
CREATE TABLE
foo=> \copy t from '/tmp/x' csv
foo=> select * from t;m | s  
---+----A | a1A | a2A | a3B | b1B | b2
(5 rows)

foo=> insert into mp(m) (select distinct m from t);
INSERT 0 2
foo=> insert into sp(s) (select distinct s from t);
INSERT 0 5
foo=> select * from mp;mid | m 
-----+---  1 | A  2 | B
(2 rows)

foo=> select * from sp;sid | s  
-----+----  1 | a1  2 | a2  3 | a3  4 | b1  5 | b2
(5 rows)

foo=> insert into ms (select mid, sid from mp, sp where (m,s) in (select 
m, s from t));
INSERT 0 5
foo=> select * from ms;mid | sid 
-----+-----  1 |   1  1 |   2  1 |   3  2 |   4  2 |   5
(5 rows)

foo=> 

Regards,

-- Raj
-- 
Raj Mathur                          || raju@kandalaya.org   || GPG:
http://otheronepercent.blogspot.com || http://kandalaya.org || CC68
It is the mind that moves           || http://schizoid.in   || D17F