Thread: SQL Programming Question
Coming from 25 years of programming applications based on dBASE and FoxPro tables, I have a question about how to deal with a programming task.

I have a situation where I receive a file of transactions, each carrying a unique key from a vendor. These transactions should only be imported into my system once, but the vendor system will occasionally resend a transaction by mistake.

The way I currently handle this with Microsoft ADO and FoxPro files is to open a table with an index on the vendor key and seek on the key. If there is no match I add the record; if there is a match I put it in an exception file to be checked manually.

With PostgreSQL I can't open a table and do seeks against an index. I could run a SELECT against the database and check whether zero records are returned, but that seems to take more time than doing a seek on an index. Is there a more SQL-friendly way of handling this task?
On 09/10/2010 08:07 PM, tony@exquisiteimages.com wrote:
> Using PostgreSQL I can't open a table and do seeks against an index. I
> could do a select against the database and see if 0 records are returned,
> but that seems to take more time than doing a seek on an index. Is there a
> more SQL friendly way of handling this task?

Postgres isn't going to use just the index, because it needs the visibility information in the table. But it will be better to load all of the data into a staging table using COPY and then insert the missing rows from there. That will be a ton faster than going a row at a time, looking for a match and then doing an insert.

Scott
On 09/10/10 8:07 PM, tony@exquisiteimages.com wrote:
> The way I am currently handling this with Microsoft ADO and FoxPro files
> is to open a table with an index on the vendor key and seek on the key. If
> there is no match I add it, if there is a match I put it in an exception
> file to be manually checked.

The table should have a UNIQUE constraint on that vendor key field. Use a transaction: go to insert the data, and if you get a constraint violation, roll back the transaction and insert the row into your exceptions table instead.
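A minimal sketch of that "insert, trap the unique violation" pattern. This uses Python's sqlite3 as a stand-in for PostgreSQL so it is self-contained; the table and column names (live_table, exceptions, vendor_key) are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE live_table (vendor_key TEXT PRIMARY KEY, payload TEXT)")
conn.execute("CREATE TABLE exceptions (vendor_key TEXT, payload TEXT)")

def import_row(conn, vendor_key, payload):
    """Insert one transaction; duplicates go to the exceptions table."""
    try:
        with conn:  # each attempt runs in its own transaction
            conn.execute(
                "INSERT INTO live_table (vendor_key, payload) VALUES (?, ?)",
                (vendor_key, payload),
            )
    except sqlite3.IntegrityError:
        # unique-constraint violation: the transaction was rolled back,
        # so record the duplicate for manual review instead
        with conn:
            conn.execute(
                "INSERT INTO exceptions (vendor_key, payload) VALUES (?, ?)",
                (vendor_key, payload),
            )

import_row(conn, "V100", "first copy")
import_row(conn, "V100", "resent by vendor")  # duplicate -> exceptions table
```

In PostgreSQL the same idea applies with a UNIQUE constraint and a rollback on the constraint-violation error.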
On 11 Sep 2010, at 6:10, Scott Bailey wrote:
> Postgres isn't going to just use the index because it needs the visibility
> information in the table. But it will be better to load all of the data
> into a staging table using COPY and then insert the missing rows from
> there. It will be a ton faster than going a row at a time, looking for a
> match then doing an insert.

This is probably the best way to go about this. Basically you perform:

BEGIN;

-- read in the data file
COPY staging_table FROM STDIN;
Your data here
\.

-- delete duplicates
DELETE FROM staging_table
 USING live_table
 WHERE live_table.key = staging_table.key;

-- insert the remaining (non-duplicate) data
INSERT INTO live_table (key, data1, data2, etc)
SELECT key, data1, data2, etc
  FROM staging_table;

COMMIT;

Before deleting the duplicates, you could add a step that saves them:

INSERT INTO duplicate_table (key, data1, data2, etc)
SELECT key, data1, data2, etc
  FROM staging_table
 WHERE EXISTS (SELECT 1 FROM live_table WHERE key = staging_table.key);

The following are a few would-be-nice-to-have's that AFAIK aren't possible yet.
Often the reason we don't have these is the SQL standard, which is a pretty good reason. Still...

It would be great to be able to use a WITH statement to lock down a data set for multiple subsequent operations, something like:

WITH nonduplicates (key, data1, data2, etc) AS (
    SELECT key, data1, data2, etc FROM staging_table
    EXCEPT
    SELECT key, data1, data2, etc FROM live_table
)
INSERT INTO live_table (key, data1, data2, etc)
SELECT key, data1, data2, etc FROM nonduplicates
RETURNING key, data1, data2, etc
UNION ALL
DELETE FROM staging_table USING nonduplicates
 WHERE key = nonduplicates.key
RETURNING key, data1, data2, etc;

Or something like that. It's just an example of what I have in mind, after all ;)

But of course for this particular situation it would be really ideal to be able to just do:

MOVE * FROM staging_table TO live_table
WHERE NOT EXISTS (
    SELECT 1 FROM live_table WHERE key = staging_table.key
);

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.
On 11 Sep 2010, at 12:09, Alban Hertroys wrote:
> It would be great to be able to use a WITH statement to lock down a data
> set for multiple subsequent operations [...]
>
> Or something like that. It's just an example of what I have in mind, after all ;)

Gosh, I was thinking too far ahead and forgot to explain why that would be cool!

First off, you'd end up having moved all your non-duplicate data into the live_table, and you'd be left with all the duplicates in your staging_table. No need for an extra table to store them!

Secondly, you'd get back a list of all the non-duplicate records that were moved into the live_table. I realise that should have been a UNION and not a UNION ALL, or you'd get every record twice. As an alternative you could add a fictive column to each RETURNING statement to specify the origin of each record. That all depends on what you need the results for, of course... I think the RETURNING clauses are pretty much obligatory there; how else would you UNION that INSERT and DELETE together?

Lastly, of course this is already entirely possible using a temp table, but that seems a bit ugly... Big kudos to the people who added WITH-queries to Postgres, I love that feature! :)

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.
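The temp-table workaround Alban alludes to can be sketched as follows: remember which staged keys are new, move those rows into the live table, and delete only those from staging, leaving the duplicates behind for review. Again sqlite3 stands in for PostgreSQL and all names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE live_table (vendor_key TEXT PRIMARY KEY, data TEXT)")
conn.execute("CREATE TABLE staging (vendor_key TEXT, data TEXT)")
conn.execute("INSERT INTO live_table VALUES ('V1', 'already imported')")
conn.executemany("INSERT INTO staging VALUES (?, ?)",
                 [("V1", "resent"), ("V2", "new")])

with conn:  # one transaction for the whole "move"
    # remember which staged keys are not yet in the live table
    conn.execute("""
        CREATE TEMP TABLE moved AS
        SELECT vendor_key FROM staging s
        WHERE NOT EXISTS (SELECT 1 FROM live_table l
                          WHERE l.vendor_key = s.vendor_key)
    """)
    # move those rows into the live table...
    conn.execute("""
        INSERT INTO live_table
        SELECT s.vendor_key, s.data FROM staging s
        JOIN moved m ON m.vendor_key = s.vendor_key
    """)
    # ...and out of staging; duplicates stay behind for manual checking
    conn.execute(
        "DELETE FROM staging WHERE vendor_key IN (SELECT vendor_key FROM moved)")
```

After this, staging holds only the duplicate rows, which is the end state the hypothetical MOVE statement would give.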
Hi Tony
I usually INSERT the record, then check for a PK violation, e.g.:

IF pk_violation THEN UPDATE the record instead
is FoxPro still supported???
Martin Gainty
______________________________________________
Waiver and confidentiality note
This message is confidential. If you are not the intended recipient, we kindly ask you to notify us. Any unauthorised forwarding or copying is not permitted. This message serves only the exchange of information and has no legally binding effect. Owing to the ease with which e-mail can be manipulated, we accept no liability for its content.
> Date: Fri, 10 Sep 2010 23:07:11 -0400
> Subject: [GENERAL] SQL Programming Question
> From: tony@exquisiteimages.com
> To: pgsql-general@postgresql.org
>
> Coming from 25 years of programming applications based on dBASE and FoxPro
> tables, I have a question about how to deal with a programming task.
>
> I have a situation where I receive a file with transactions that have a
> unique key from a vendor. These transactions should only be imported into
> my system once, but the vendor system will occasionally resend a
> transaction by mistake.
>
> The way I am currently handling this with Microsoft ADO and FoxPro files
> is to open a table with an index on the vendor key and seek on the key. If
> there is no match I add it, if there is a match I put it in an exception
> file to be manually checked.
>
> Using PostgreSQL I can't open a table and do seeks against an index. I
> could do a select against the database and see if 0 records are returned,
> but that seems to take more time than doing a seek on an index. Is there a
> more SQL friendly way of handling this task?
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
Martin Gainty wrote:
> I usually INSERT the record, then check for a PK violation, e.g.
> IF PK_VIOLATION then UPDATE record
>
> is FoxPro still supported???

My understanding is that exception handling carries significant overhead, so doing it inside a heavily-iterated loop like this import would perform badly.

Partly for that reason and partly for code readability, I would instead recommend the earlier suggestion: copy the whole source file into a staging table with COPY, and then use a single SQL statement to reconcile/integrate that staging table with the main table. In particular, I like the staging-table approach because the single SQL statement afterwards is concise declarative code saying what you actually want to do, whereas a client-side loop is a more verbose imperative version of the same thing.

-- Darren Duncan
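The whole staging flow, including the optional duplicate-saving step, can be sketched end to end. sqlite3 stands in for PostgreSQL here (executemany substitutes for COPY), and the table names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE live_table (vendor_key TEXT PRIMARY KEY, data TEXT)")
conn.execute("CREATE TABLE staging (vendor_key TEXT, data TEXT)")
conn.execute("CREATE TABLE duplicates (vendor_key TEXT, data TEXT)")

conn.execute("INSERT INTO live_table VALUES ('V1', 'already imported')")

# bulk-load the vendor file into the staging table (COPY in PostgreSQL)
batch = [("V1", "resent"), ("V2", "new"), ("V3", "new")]
conn.executemany("INSERT INTO staging VALUES (?, ?)", batch)

with conn:  # one transaction for the reconciliation
    # set aside rows whose key already exists in the live table
    conn.execute("""
        INSERT INTO duplicates
        SELECT vendor_key, data FROM staging s
        WHERE EXISTS (SELECT 1 FROM live_table l
                      WHERE l.vendor_key = s.vendor_key)
    """)
    # import everything else with a single set-based statement
    conn.execute("""
        INSERT INTO live_table
        SELECT vendor_key, data FROM staging s
        WHERE NOT EXISTS (SELECT 1 FROM live_table l
                          WHERE l.vendor_key = s.vendor_key)
    """)
    conn.execute("DELETE FROM staging")
```

No per-row exception handling is needed; each step is one declarative statement over the whole batch.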
On Fri, Sep 10, 2010 at 11:07 PM, <tony@exquisiteimages.com> wrote:
> Using PostgreSQL I can't open a table and do seeks against an index. I
> could do a select against the database and see if 0 records are returned,
> but that seems to take more time than doing a seek on an index. Is there a
> more SQL friendly way of handling this task?

I come from a FoxPro background too, and I can tell you for sure that Postgres indexes should give you performance similar to what you're used to. Do you have an index on the table?

merlin
On 09/11/10 12:26 PM, Merlin Moncure wrote:
> I come from a FoxPro background too, and I can tell you for sure that
> Postgres indexes should give you performance similar to what you're used
> to. Do you have an index on the table?

Again, the table should have either a primary key on this field or a unique constraint (a primary key is by definition unique). Then:

BEGIN transaction
INSERT row
IF constraint violation:
    ROLLBACK transaction
    INSERT row into exceptions table
ELSE:
    COMMIT transaction

(That's pseudocode.) If you want to do it faster, batch multiple inserts into a single transaction, using a savepoint before each insert and rolling back to the savepoint on the exception.
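The batched variant above can be sketched with explicit savepoints; a duplicate rolls back only its own row while the surrounding transaction survives. sqlite3 stands in for PostgreSQL, and the table names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # autocommit mode; we manage transactions by hand
conn.execute("CREATE TABLE live_table (vendor_key TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE exceptions (vendor_key TEXT)")

rows = ["V1", "V2", "V2", "V3"]  # V2 arrives twice

conn.execute("BEGIN")
for key in rows:
    conn.execute("SAVEPOINT per_row")
    try:
        conn.execute("INSERT INTO live_table (vendor_key) VALUES (?)", (key,))
    except sqlite3.IntegrityError:
        # undo just this row, keep the batch transaction alive
        conn.execute("ROLLBACK TO per_row")
        conn.execute("INSERT INTO exceptions (vendor_key) VALUES (?)", (key,))
    conn.execute("RELEASE per_row")
conn.execute("COMMIT")
```

One commit covers the whole batch, which avoids the per-row transaction overhead of committing each insert separately.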