Home > mailing lists

Re: Copy From & Insert UNLESS - Mailing list pgsql-hackers

From	James William Pye
Subject	Re: Copy From & Insert UNLESS
Date	February 6, 2006 18:27:30
Msg-id	20060206214610.GB22529@lit.jwp.name Whole thread Raw
In response to	Re: Copy From & Insert UNLESS (Josh Berkus <josh@agliodbs.com>)
Responses	Re: Copy From & Insert UNLESS
List	pgsql-hackers

Tree view

On Mon, Feb 06, 2006 at 11:03:06AM -0800, Josh Berkus wrote:
> Are you sure that a new type of constraint is the way to go for this?

[Thinking that you are referring to the new constraint mode that I was
confusingly referring to...]

Well, it really wouldn't be new. It's just labeling what we do now as something
other than "immediate". Considering that immediate constraints are meant to be
checked at the end of the SQL-statement, and our implementation of immediate is
truly "immediate", as Stephan pointed out to me. However, I think our current
timing method is better for normal cases, at least for Postgres, than what the
spec specifies.
[See pages 63-66: The second paragraph in 4.17.2 "Checking of constraints"]

Ultimately, I don't care about this very much. However, I think an
implementation of my proposal would aid in implementing spec compliant
immediate timing.

[If I misunderstood what you were getting at, sorry. :]

> It doesn't solve our issues in the data warehousing space.  The spec we 
> started with for "Error-tolerant COPY" is:
>
> 1) It must be able to handle parsing errors (i.e. bad char set);

My proposal did not handle this, and purposefully so. A constraint violation,
while inhibiting insertion into the target table would still yield a "kosher"
tuple--just not okay for that table, which could then be dropped or redirected
using the "THEN INSERT INTO" into another precisely structured table for later
analysis. Bad data errors would not even have a tuple to work with in the first
place, which is why I wanted to draw a distinction.

I think having something to handle bad data is useful, but I think it should be
distinct, syntactically and implementation-wise, from constraint violations.

That's not to say that it couldn't fit into the model that "UNLESS" would try to
create:"COPY ... UNLESS BAD DATA [ON COLUMN (y)] OR CONSTRAINT VIOLATION [ON (z)] ..."

> 2) It must be able to handle constraint violations;

Check. :)

> 3) It must output all row errors to a log or "errors" table which makes 
> it possible to determine which input row failed and why;

Check; save data errors for now.

> 4) It must not slow significantly (like, not more than 15%) the speed of 
> bulk loading.

Check. (See below)

> It seems like your idea, which would involve a second constraint 
> check, would achieve neigher #1 nor #4.

I'm not proposing that a second constraint check should be made.

The difficulty of my implementation comes from the position that I don't think
the current implementation of UNIQUE constraints is ideal. It is "hidden"
inside nbtree, which, while convenient, is not likely to be the best place for
it. I believe my original letter covered this by proposing a new pg_am column;
one that would hold a regproc that would be able to 'scan for insert' and return
the state(position, locks, whether an entry exists, anything else necessary for
a quick insert) of that scan to the caller for later use in the actual insert or
update. All other constraints appear to require trivial modifications to get it
to work with UNLESS without any redundancy.
-- 
Regards, James William Pye

pgsql-hackers by date:

From: Chris Browne
Date: 06 February 2006, 18:27:18
Subject: Re: [PORTS] Failed install - libgen.so doesn't exist

From: Josh Berkus
Date: 06 February 2006, 18:51:41
Subject: Re: Copy From & Insert UNLESS

Re: Copy From & Insert UNLESS - Mailing list pgsql-hackers

Previous

Next