Re: Copy From & Insert UNLESS - Mailing list pgsql-hackers
From | James William Pye |
---|---|
Subject | Re: Copy From & Insert UNLESS |
Date | |
Msg-id | 20060206214610.GB22529@lit.jwp.name Whole thread Raw |
In response to | Re: Copy From & Insert UNLESS (Josh Berkus <josh@agliodbs.com>) |
Responses |
Re: Copy From & Insert UNLESS
|
List | pgsql-hackers |
On Mon, Feb 06, 2006 at 11:03:06AM -0800, Josh Berkus wrote: > Are you sure that a new type of constraint is the way to go for this? [Thinking that you are referring to the new constraint mode that I was confusingly referring to...] Well, it really wouldn't be new. It's just labeling what we do now as something other than "immediate". Considering that immediate constraints are meant to be checked at the end of the SQL-statement, and our implementation of immediate is truly "immediate", as Stephan pointed out to me. However, I think our current timing method is better for normal cases, at least for Postgres, than what the spec specifies. [See pages 63-66: The second paragraph in 4.17.2 "Checking of constraints"] Ultimately, I don't care about this very much. However, I think an implementation of my proposal would aid in implementing spec compliant immediate timing. [If I misunderstood what you were getting at, sorry. :] > It doesn't solve our issues in the data warehousing space. The spec we > started with for "Error-tolerant COPY" is: > > 1) It must be able to handle parsing errors (i.e. bad char set); My proposal did not handle this, and purposefully so. A constraint violation, while inhibiting insertion into the target table would still yield a "kosher" tuple--just not okay for that table, which could then be dropped or redirected using the "THEN INSERT INTO" into another precisely structured table for later analysis. Bad data errors would not even have a tuple to work with in the first place, which is why I wanted to draw a distinction. I think having something to handle bad data is useful, but I think it should be distinct, syntactically and implementation-wise, from constraint violations. That's not to say that it couldn't fit into the model that "UNLESS" would try to create:"COPY ... UNLESS BAD DATA [ON COLUMN (y)] OR CONSTRAINT VIOLATION [ON (z)] ..." > 2) It must be able to handle constraint violations; Check. :) > 3) It must output all row errors to a log or "errors" table which makes > it possible to determine which input row failed and why; Check; save data errors for now. > 4) It must not slow significantly (like, not more than 15%) the speed of > bulk loading. Check. (See below) > It seems like your idea, which would involve a second constraint > check, would achieve neigher #1 nor #4. I'm not proposing that a second constraint check should be made. The difficulty of my implementation comes from the position that I don't think the current implementation of UNIQUE constraints is ideal. It is "hidden" inside nbtree, which, while convenient, is not likely to be the best place for it. I believe my original letter covered this by proposing a new pg_am column; one that would hold a regproc that would be able to 'scan for insert' and return the state(position, locks, whether an entry exists, anything else necessary for a quick insert) of that scan to the caller for later use in the actual insert or update. All other constraints appear to require trivial modifications to get it to work with UNLESS without any redundancy. -- Regards, James William Pye
pgsql-hackers by date: