Thread: SQL Programming Question
Coming from 25 years of programming applications based on dBASE and FoxPro tables, I have a question about how to deal with a programming task.

I have a situation where I receive a file of transactions, each carrying a unique key from a vendor. These transactions should only be imported into my system once, but the vendor system will occasionally resend a transaction by mistake.

The way I currently handle this with Microsoft ADO and FoxPro files is to open a table with an index on the vendor key and seek on the key. If there is no match I add the record; if there is a match I put it in an exception file to be checked manually.

With PostgreSQL I can't open a table and do seeks against an index. I could run a SELECT against the database and check whether zero records are returned, but that seems to take more time than doing a seek on an index. Is there a more SQL-friendly way of handling this task?
On 09/10/2010 08:07 PM, tony@exquisiteimages.com wrote:
> Using PostgreSQL I can't open a table and do seeks against an index. I
> could do a select against the database and see if 0 records are returned,
> but that seems to take more time than doing a seek on an index. Is there a
> more SQL friendly way of handling this task?

Postgres isn't going to use just the index, because it needs the visibility information in the table. But it will be better to load all of the data into a staging table using COPY and then insert the missing rows from there. That will be a ton faster than going a row at a time, looking for a match and then doing an insert.

Scott
On 09/10/10 8:07 PM, tony@exquisiteimages.com wrote:
> The way I am currently handling this with Microsoft ADO and FoxPro files
> is to open a table with an index on the vendor key and seek on the key. If
> there is no match I add it, if there is a match I put it in an exception
> file to be manually checked.

The table should have a UNIQUE constraint on that vendor key field. Use a transaction: go to insert the data, and if you get a constraint violation, roll back the transaction and insert the row into your exceptions table instead.
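A minimal sketch of that "insert, trap the unique violation" pattern. This uses Python's sqlite3 as a stand-in for PostgreSQL so it is self-contained; the table and column names (live_table, exceptions, vendor_key) are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE live_table (vendor_key TEXT PRIMARY KEY, payload TEXT)")
conn.execute("CREATE TABLE exceptions (vendor_key TEXT, payload TEXT)")

def import_row(conn, vendor_key, payload):
    """Insert one transaction; duplicates go to the exceptions table."""
    try:
        with conn:  # each attempt runs in its own transaction
            conn.execute(
                "INSERT INTO live_table (vendor_key, payload) VALUES (?, ?)",
                (vendor_key, payload),
            )
    except sqlite3.IntegrityError:
        # unique-constraint violation: the transaction was rolled back,
        # so record the duplicate for manual review instead
        with conn:
            conn.execute(
                "INSERT INTO exceptions (vendor_key, payload) VALUES (?, ?)",
                (vendor_key, payload),
            )

import_row(conn, "V100", "first copy")
import_row(conn, "V100", "resent by vendor")  # duplicate -> exceptions table
```

In PostgreSQL the same idea applies with a UNIQUE constraint and a rollback on the constraint-violation error.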
On 11 Sep 2010, at 6:10, Scott Bailey wrote:
> Postgres isn't going to just use the index because it needs the visibility
> information in the table. But it will be better to load all of the data
> into a staging table using COPY and then insert the missing rows from
> there. It will be a ton faster than going a row at a time, looking for a
> match then doing an insert.

This is probably the best way to go about this. Basically you perform:

BEGIN;

-- read in the data file
COPY staging_table FROM STDIN;
Your data here
\.

-- delete duplicates
DELETE FROM staging_table
 USING live_table
 WHERE live_table.key = staging_table.key;

-- insert the remaining (non-duplicate) data
INSERT INTO live_table (key, data1, data2, etc)
SELECT key, data1, data2, etc
  FROM staging_table;

COMMIT;

Before deleting the duplicates, you could add a step that saves them:

INSERT INTO duplicate_table (key, data1, data2, etc)
SELECT key, data1, data2, etc
  FROM staging_table
 WHERE EXISTS (SELECT 1 FROM live_table WHERE key = staging_table.key);

The following are a few would-be-nice-to-have's that AFAIK aren't possible yet.
Often the reason we don't have these is the SQL standard, which is a pretty good reason. Still...

It would be great to be able to use a WITH statement to lock down a data set for multiple subsequent operations, something like:

WITH nonduplicates (key, data1, data2, etc) AS (
    SELECT key, data1, data2, etc FROM staging_table
    EXCEPT
    SELECT key, data1, data2, etc FROM live_table
)
INSERT INTO live_table (key, data1, data2, etc)
SELECT key, data1, data2, etc FROM nonduplicates
RETURNING key, data1, data2, etc
UNION ALL
DELETE FROM staging_table USING nonduplicates
 WHERE key = nonduplicates.key
RETURNING key, data1, data2, etc;

Or something like that. It's just an example of what I have in mind, after all ;)

But of course for this particular situation it would be really ideal to be able to just do:

MOVE * FROM staging_table TO live_table
WHERE NOT EXISTS (
    SELECT 1 FROM live_table WHERE key = staging_table.key
);

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.
On 11 Sep 2010, at 12:09, Alban Hertroys wrote:
> It would be great to be able to use a WITH statement to lock down a data
> set for multiple subsequent operations [...]
>
> Or something like that. It's just an example of what I have in mind, after all ;)

Gosh, I was thinking too far ahead and forgot to explain why that would be cool!

First off, you'd end up having moved all your non-duplicate data into the live_table, and you'd be left with all the duplicates in your staging_table. No need for an extra table to store them!

Secondly, you'd get back a list of all the non-duplicate records that were moved into the live_table. I realise that should have been a UNION and not a UNION ALL, or you'd get every record twice. As an alternative you could add a fictive column to each RETURNING statement to specify the origin of each record. That all depends on what you need the results for, of course... I think the RETURNING clauses are pretty much obligatory there; how else would you UNION that INSERT and DELETE together?

Lastly, of course this is already entirely possible using a temp table, but that seems a bit ugly... Big kudos to the people who added WITH-queries to Postgres, I love that feature! :)

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.
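The temp-table workaround Alban alludes to can be sketched as follows: remember which staged keys are new, move those rows into the live table, and delete only those from staging, leaving the duplicates behind for review. Again sqlite3 stands in for PostgreSQL and all names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE live_table (vendor_key TEXT PRIMARY KEY, data TEXT)")
conn.execute("CREATE TABLE staging (vendor_key TEXT, data TEXT)")
conn.execute("INSERT INTO live_table VALUES ('V1', 'already imported')")
conn.executemany("INSERT INTO staging VALUES (?, ?)",
                 [("V1", "resent"), ("V2", "new")])

with conn:  # one transaction for the whole "move"
    # remember which staged keys are not yet in the live table
    conn.execute("""
        CREATE TEMP TABLE moved AS
        SELECT vendor_key FROM staging s
        WHERE NOT EXISTS (SELECT 1 FROM live_table l
                          WHERE l.vendor_key = s.vendor_key)
    """)
    # move those rows into the live table...
    conn.execute("""
        INSERT INTO live_table
        SELECT s.vendor_key, s.data FROM staging s
        JOIN moved m ON m.vendor_key = s.vendor_key
    """)
    # ...and out of staging; duplicates stay behind for manual checking
    conn.execute(
        "DELETE FROM staging WHERE vendor_key IN (SELECT vendor_key FROM moved)")
```

After this, staging holds only the duplicate rows, which is the end state the hypothetical MOVE statement would give.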
Hi Tony
I usually INSERT the record, then check for a PK violation, e.g.:

IF pk_violation THEN UPDATE the record instead
is FoxPro still supported???
Martin Gainty
______________________________________________
Waiver and confidentiality note
This message is confidential. If you are not the intended recipient, we kindly ask you to notify us. Any unauthorised forwarding or copying is not permitted. This message serves only the exchange of information and has no legally binding effect. Owing to the ease with which e-mail can be manipulated, we accept no liability for its content.
> Date: Fri, 10 Sep 2010 23:07:11 -0400
> Subject: [GENERAL] SQL Programming Question
> From: tony@exquisiteimages.com
> To: pgsql-general@postgresql.org
>
> Coming from 25 years of programming applications based on dBASE and FoxPro
> tables, I have a question about how to deal with a programming task.
>
> I have a situation where I receive a file with transactions that have a
> unique key from a vendor. These transactions should only be imported into
> my system once, but the vendor system will occasionally resend a
> transaction by mistake.
>
> The way I am currently handling this with Microsoft ADO and FoxPro files
> is to open a table with an index on the vendor key and seek on the key. If
> there is no match I add it, if there is a match I put it in an exception
> file to be manually checked.
>
> Using PostgreSQL I can't open a table and do seeks against an index. I
> could do a select against the database and see if 0 records are returned,
> but that seems to take more time than doing a seek on an index. Is there a
> more SQL friendly way of handling this task?
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
Martin Gainty wrote:
> I usually INSERT the record, then check for a PK violation, e.g.
> IF PK_VIOLATION then UPDATE record
>
> is FoxPro still supported???

My understanding is that exception handling carries significant overhead, so doing it inside a heavily-iterated loop like this import would perform badly.

Partly for that reason and partly for code readability, I would instead recommend the earlier suggestion: copy the whole source file into a staging table with COPY, and then use a single SQL statement to reconcile/integrate that staging table with the main table. In particular, I like the staging-table approach because the single SQL statement afterwards is concise declarative code saying what you actually want to do, whereas a client-side loop is a more verbose imperative version of the same thing.

-- Darren Duncan
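The whole staging flow, including the optional duplicate-saving step, can be sketched end to end. sqlite3 stands in for PostgreSQL here (executemany substitutes for COPY), and the table names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE live_table (vendor_key TEXT PRIMARY KEY, data TEXT)")
conn.execute("CREATE TABLE staging (vendor_key TEXT, data TEXT)")
conn.execute("CREATE TABLE duplicates (vendor_key TEXT, data TEXT)")

conn.execute("INSERT INTO live_table VALUES ('V1', 'already imported')")

# bulk-load the vendor file into the staging table (COPY in PostgreSQL)
batch = [("V1", "resent"), ("V2", "new"), ("V3", "new")]
conn.executemany("INSERT INTO staging VALUES (?, ?)", batch)

with conn:  # one transaction for the reconciliation
    # set aside rows whose key already exists in the live table
    conn.execute("""
        INSERT INTO duplicates
        SELECT vendor_key, data FROM staging s
        WHERE EXISTS (SELECT 1 FROM live_table l
                      WHERE l.vendor_key = s.vendor_key)
    """)
    # import everything else with a single set-based statement
    conn.execute("""
        INSERT INTO live_table
        SELECT vendor_key, data FROM staging s
        WHERE NOT EXISTS (SELECT 1 FROM live_table l
                          WHERE l.vendor_key = s.vendor_key)
    """)
    conn.execute("DELETE FROM staging")
```

No per-row exception handling is needed; each step is one declarative statement over the whole batch.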
On Fri, Sep 10, 2010 at 11:07 PM, <tony@exquisiteimages.com> wrote:
> Using PostgreSQL I can't open a table and do seeks against an index. I
> could do a select against the database and see if 0 records are returned,
> but that seems to take more time than doing a seek on an index. Is there a
> more SQL friendly way of handling this task?

I come from a FoxPro background too, and I can tell you for sure that Postgres indexes should give you performance similar to what you're used to. Do you have an index on the table?

merlin
On 09/11/10 12:26 PM, Merlin Moncure wrote:
> I come from a FoxPro background too, and I can tell you for sure that
> Postgres indexes should give you performance similar to what you're used
> to. Do you have an index on the table?

Again, the table should have either a primary key on this field or a unique constraint (a primary key is by definition unique). Then:

BEGIN transaction
INSERT row
IF constraint violation:
    ROLLBACK transaction
    INSERT row into exceptions table
ELSE:
    COMMIT transaction

(That's pseudocode.) If you want to do it faster, batch multiple inserts into a single transaction, using a savepoint before each insert and rolling back to the savepoint on the exception.
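The batched variant above can be sketched with explicit savepoints; a duplicate rolls back only its own row while the surrounding transaction survives. sqlite3 stands in for PostgreSQL, and the table names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # autocommit mode; we manage transactions by hand
conn.execute("CREATE TABLE live_table (vendor_key TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE exceptions (vendor_key TEXT)")

rows = ["V1", "V2", "V2", "V3"]  # V2 arrives twice

conn.execute("BEGIN")
for key in rows:
    conn.execute("SAVEPOINT per_row")
    try:
        conn.execute("INSERT INTO live_table (vendor_key) VALUES (?)", (key,))
    except sqlite3.IntegrityError:
        # undo just this row, keep the batch transaction alive
        conn.execute("ROLLBACK TO per_row")
        conn.execute("INSERT INTO exceptions (vendor_key) VALUES (?)", (key,))
    conn.execute("RELEASE per_row")
conn.execute("COMMIT")
```

One commit covers the whole batch, which avoids the per-row transaction overhead of committing each insert separately.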