Thread: postgres external table
all
is there an external table create method (similar to Oracle external tables)? where can I find the information?
thanks
Amy
Amy Smith wrote: > all > is there a external table create method ( similar to oracle external > table ) ? where to find the information ? For those non-Oracle users among us, what's an external table? What are you trying to achieve? Random guess: you might be looking for tablespaces. -- Craig Ringer
On Sun, Jan 17, 2010 at 07:27:34PM -0800, Amy Smith wrote: > all > is there a external table create method ( similar to oracle external table ) > ? where to find the information ? There is a project on pgfoundry which has had some activity lately that's similar. You might also try DBI-Link. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Craig Ringer wrote: > For those non-Oracle users among us, what's an external table? External tables let you map a text file directly to a table without explicitly loading it. In PostgreSQL, if you have data in a CSV file, usually you'd import it with COPY before you'd use it. If external tables were available, you'd just say there's an external table as a CSV file and you could start running queries against it. So the quick answer is no, there is no built-in external table support in PostgreSQL; normally people load the data using COPY instead. There's a work in progress on this topic at http://pgfoundry.org/projects/pgexternaltable/ , but it's extremely rough at this point and I'm not sure if it's even moving in the right direction--the main project would do something like this via SQL/MED, and I don't think that's how the prototype is being built at all. The only PostgreSQL-based product I'm aware of that has working external table support already is Greenplum DB. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.com
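To make the contrast concrete, here is a hedged sketch: the first statement is roughly what an Oracle external table definition looks like (the directory object, columns, and file name are illustrative), and the second is the load-then-query approach you would use in PostgreSQL today.

    -- Oracle: the file itself backs the table, so there is no load step
    -- (illustrative DDL, assuming a data_dir directory object already exists).
    CREATE TABLE users_ext (
      id   NUMBER,
      name VARCHAR2(100)
    )
    ORGANIZATION EXTERNAL (
      TYPE ORACLE_LOADER
      DEFAULT DIRECTORY data_dir
      ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY ','
      )
      LOCATION ('users.csv')
    )
    REJECT LIMIT UNLIMITED;  -- bad rows are skipped instead of aborting the query

    -- PostgreSQL: load the file into an ordinary table first, then query it.
    CREATE TABLE users (id integer, name text);
    COPY users FROM '/path/to/users.csv' WITH CSV;  -- server-side path, superuser only
    SELECT count(*) FROM users;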
Greg Smith <greg@2ndquadrant.com> writes: > Craig Ringer wrote: >> For those non-Oracle users among us, what's an external table? > External tables let you map a text file directly to a table without > explicitly loading it. In PostgreSQL, if you have data in a CSV file, > usually you'd import it with COPY before you'd use it. If external > tables were available, you'd just say there's an external table as a CSV > file and you could start running queries against it. I'm finding it hard to visualize a use-case for that. We must postulate that the table is so big that you don't want to import it, and yet you don't feel a need to have any index on it. Which among other things implies that every query will seqscan the whole table. Where's the savings? regards, tom lane
On Mon, Jan 18, 2010 at 09:57:02AM -0500, Tom Lane wrote: > Greg Smith <greg@2ndquadrant.com> writes: > > Craig Ringer wrote: > >> For those non-Oracle users among us, what's an external table? > > > External tables let you map a text file directly to a table without > > explicitly loading it. In PostgreSQL, if you have data in a CSV file, > > usually you'd import it with COPY before you'd use it. If external > > tables were available, you'd just say there's an external table as a CSV > > file and you could start running queries against it. > > I'm finding it hard to visualize a use-case for that. We must postulate > that the table is so big that you don't want to import it, and yet you > don't feel a need to have any index on it. Which among other things > implies that every query will seqscan the whole table. Where's the > savings? I've mostly wanted something like it when I've been dealing with externally maintained data. The best idea I've seen so far has just been a function similar to: copy_csv_from(filename text) returns setof text[] to be used as: SELECT d[1]::int AS id, d[2] AS name FROM copy_csv_from('/tmp/usernames.csv') d; This could be wrapped in a VIEW, giving what I'd expect to be similar semantics to an "external table"; however, I've never used one, so I could be missing something. It's possible to write this function at the moment, but it's somewhat suboptimal: the CSV file is completely imported before anything else happens, so it's only good for small files. -- Sam http://samason.me.uk/
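For anyone wanting to experiment, here is a minimal sketch of the function Sam describes, along with its limitation: everything is staged through a temp table, so the whole file is read before the first row comes back. The comma split is naive (it ignores CSV quoting), and server-side COPY from a file needs superuser rights.

    CREATE OR REPLACE FUNCTION copy_csv_from(filename text)
    RETURNS SETOF text[] AS $$
    BEGIN
        -- COPY can't feed a query directly, so stage the raw lines first;
        -- this is the "completely imported before anything else happens" problem.
        CREATE TEMP TABLE _raw_lines (rawline text);
        EXECUTE 'COPY _raw_lines FROM ' || quote_literal(filename);
        RETURN QUERY SELECT string_to_array(rawline, ',') FROM _raw_lines;
        DROP TABLE _raw_lines;
    END;
    $$ LANGUAGE plpgsql;

    -- Used as in the example above (note that PostgreSQL arrays are 1-based):
    SELECT d[1]::int AS id, d[2] AS name
    FROM copy_csv_from('/tmp/usernames.csv') AS d;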
On Mon, Jan 18, 2010 at 2:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm finding it hard to visualize a use-case for that. We must postulate > that the table is so big that you don't want to import it, and yet you > don't feel a need to have any index on it. Which among other things > implies that every query will seqscan the whole table. Where's the > savings? I think it's usually more "my data is updated by other tools and it would be hard/impossible/annoying to insert another step into the pipeline to copy it to yet another place". The main benefit is that you can access the authoritative data directly without having to copy it and have some sort of process in place to do that regularly. Text files are kind of useless but they're a baseline bit of functionality on top of which to add more sophisticated external forms such as data available over at some url or over some kind of rpc -- to which various conditions could be pushed using external indexes -- or ultimately in another database to which whole joins can be pushed. -- greg
On Mon, Jan 18, 2010 at 7:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Greg Smith <greg@2ndquadrant.com> writes: >> Craig Ringer wrote: >>> For those non-Oracle users among us, what's an external table? > >> External tables let you map a text file directly to a table without >> explicitly loading it. In PostgreSQL, if you have data in a CSV file, >> usually you'd import it with COPY before you'd use it. If external >> tables were available, you'd just say there's an external table as a CSV >> file and you could start running queries against it. > > I'm finding it hard to visualize a use-case for that. We must postulate > that the table is so big that you don't want to import it, and yet you > don't feel a need to have any index on it. Which among other things > implies that every query will seqscan the whole table. Where's the > savings? I've used it mostly for importing in the past. Saves the step of loading a large file into a table with no constraints as a middle step.
Tom Lane wrote: > I'm finding it hard to visualize a use-case for that. We must postulate > that the table is so big that you don't want to import it, and yet you > don't feel a need to have any index on it. Which among other things > implies that every query will seqscan the whole table. Where's the > savings? > I've mainly seen it used for data loading where there's some sort of transformation going on, typically to clean up junk fields that would fail a constraint or to derive new columns. If you have external tables, there's no need to load the data into a temporary table if all you're going to do is modify a few things and then write the result somewhere else. Most of these use cases process the whole file anyway, so having to do a whole scan isn't an issue. I used to run an app that imported gigabytes a day of text files dumped from another server that used a weird date format I had to process via a pl/pgsql function. Having to pass them through COPY and then INSERT processed versions somewhere else was really a drag, given that there was no use for the intermediate data. It can also be handy for bootstrapping apps that are converting stuff out of a legacy system. Just make the mainframe/whatever dump a new text file periodically into where the external table looks for its data, and you skip having to schedule reloads when the content changes. That can make your life easier while running the two systems in parallel initially. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.com
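For readers who haven't done this dance, here is a rough sketch of the intermediate hop Greg is describing; the table layouts, the feed path, the dump format, and the parse_weird_date() cleanup function are all hypothetical. An external table would let you query the file in place and skip the staging table entirely.

    -- Stage the raw dump (assumes a tab-delimited text file matching the columns).
    CREATE TEMP TABLE staging (id integer, weird_date text, value numeric);
    COPY staging FROM '/data/feeds/today.txt';

    -- Transform and write the cleaned-up rows to their real destination;
    -- parse_weird_date() stands in for whatever pl/pgsql cleanup you need.
    INSERT INTO measurements (id, taken_at, value)
    SELECT id, parse_weird_date(weird_date), value
    FROM staging
    WHERE weird_date IS NOT NULL;   -- drop junk rows that would fail a constraint

    DROP TABLE staging;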
2010/1/18 Tom Lane <tgl@sss.pgh.pa.us>: > Greg Smith <greg@2ndquadrant.com> writes: >> Craig Ringer wrote: >>> For those non-Oracle users among us, what's an external table? > >> External tables let you map a text file directly to a table without >> explicitly loading it. In PostgreSQL, if you have data in a CSV file, >> usually you'd import it with COPY before you'd use it. If external >> tables were available, you'd just say there's an external table as a CSV >> file and you could start running queries against it. > > I'm finding it hard to visualize a use-case for that. We must postulate > that the table is so big that you don't want to import it, and yet you > don't feel a need to have any index on it. Which among other things > implies that every query will seqscan the whole table. Where's the > savings? > > regards, tom lane Another case, Tom, could be when the file is updated from a non-DB application and you need to synchronize the data with other DB applications ... -- Vincenzo Romano NotOrAnd Information Technologies NON QVIETIS MARIBVS NAVTA PERITVS
On 19/01/2010 1:13 AM, Vincenzo Romano wrote: > Another case, Tom, could be when the file is updated from a non-DB > application and you need to synchronize > the data with other DB applications ... How can that work without a transactional file system, though? If the external process writes to the file while you're half-way through reading it, what's the database to do? In general, how do external tables cope with the fact that they're on non-transactional storage? -- Craig Ringer
Having 'external tables' lets us avoid the step of loading data from a file into the table. We do not have to check whether a load job has run successfully, whether the data in the table really corresponds to the data in the file etc. It also lets us decide how many rejects can be allowed and so forth.
http://www.adp-gmbh.ch/ora/misc/ext_table.html
Other than that, I have not found any advantage.
Jayadevan
On Tue, Jan 19, 2010 at 4:41 AM, Craig Ringer <craig@postnewspapers.com.au> wrote: > How can that work without a transactional file system, though? If the > external process writes to the file while you're half-way through reading > it, what's the database to do? In general, how do external tables cope with > the fact that they're on non-transactional storage? Well if you use mv to replace the old file with the new one then it should be safe. Unless your query involves opening the table multiple times or your transactions are more complex than a single query... -- greg
Yes. We get quite a few files as 'feeds' from external systems. Once the files are in our network, we know that no changes will happen to those files. We access them using Oracle external tables and process them (the data, after some processing, end up in other real tables). If external tables were not there, we would have had to schedule some job to load these files.
Jayadevan
On 19 Jan 2010, at 12:16, Jayadevan M wrote: > Yes. We get quite a few files as 'feeds' from external systems. Once the files are in our network, we know that no changes will happen to those files. We access them using Oracle external tables and process them (the data, after some processing, end up in other real tables). If external tables were not there, we would have had to schedule some job to load these files. I'm surprised you even need to download them to your network. I suspect it would be quite possible to implement 'external tables' so that they could be read from eg. an RSS feed. That would be a pretty slick feature. There's a gazillion number of data formats that people would want support for though; CSV, probably excel, RSS feeds, web services, etc. Most of that should probably be implemented by means of stored procedures (using the unsafe languages - pl/pythonu, pl/perlu, etc) returning TABLE (...), which means this is in fact already possible I think? It's just that nobody's (publicly) thought of doing this so far. Alban Hertroys -- Screwing up is the best way to attach something to the ceiling.
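To flesh that out a little, here is a speculative sketch of the stored-procedure approach, assuming PL/PythonU is installed and supports set-returning functions; the composite type, the function name, and the URL are invented for illustration, not an existing API.

    CREATE TYPE feed_user AS (id integer, name text);

    CREATE OR REPLACE FUNCTION remote_csv_users()
    RETURNS SETOF feed_user AS $$
        # Python 2, as plpythonu shipped with at the time
        import csv, urllib2
        feed = urllib2.urlopen('http://example.com/users.csv')  # hypothetical feed
        for row in csv.reader(feed):
            yield (int(row[0]), row[1])   # each tuple becomes one result row
    $$ LANGUAGE plpythonu;

    -- Wrapped in a view, it reads almost like an external table:
    CREATE VIEW remote_users AS SELECT * FROM remote_csv_users();
    SELECT name FROM remote_users WHERE id > 100;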
Hi,
We use Oracle's external tables to process the files (so no need to look for the other alternatives you mentioned). We do not have access to the data providers' network to access the files. That is why we process them after we receive the files.
If there were a database-level implementation of 'external tables' supporting the formats you mentioned (CSV, probably Excel, RSS feeds, web services, etc.), that would be great. Most ETL tools already accept feeds of the type you mentioned and load those into tables.
Jayadevan
Craig Ringer wrote: > On 19/01/2010 1:13 AM, Vincenzo Romano wrote: > >> Another case, Tom, could be when the file is updated from a non-DB >> application and you need to synchronize >> the data with other DB applications ... > > How can that work without a transactional file system, though? If the > external process writes to the file while you're half-way through > reading it, what's the database to do? In general, how do external > tables cope with the fact that they're on non-transactional storage? With Oracle's implementation, you version the input files (timestamp or something) and then do ALTER TABLE to change the location the external table points to. That will block waiting for exclusive access before it fires, and then you get a clean switch to the new location. There is no built-in smartness to cope with weird behavior--I expect it will just crash the query. One thing you always have to be careful about when using these is that an external table might return a weird query error under odd circumstances such as you describe, which you might not normally expect from a simple SELECT. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.com
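For reference, the switch described above is a one-liner on the Oracle side; a hedged example, with made-up table and file names:

    -- Point the external table at the newly dropped-off file. The ALTER waits
    -- for an exclusive lock, so a query sees either the old file or the new
    -- one, never a mix.
    ALTER TABLE users_ext LOCATION ('users_20100119.csv');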