Re: Avoiding rewrite in ALTER TABLE ALTER TYPE - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: Avoiding rewrite in ALTER TABLE ALTER TYPE |
Date | |
Msg-id | 20101229234613.GB30520@tornado.gateway.2wire.net Whole thread Raw |
In response to | Re: Avoiding rewrite in ALTER TABLE ALTER TYPE (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Avoiding rewrite in ALTER TABLE ALTER TYPE
|
List | pgsql-hackers |
On Wed, Dec 29, 2010 at 10:56:39AM -0500, Robert Haas wrote: > On Dec 29, 2010, at 7:56 AM, Noah Misch <noah@leadboat.com> wrote: > > Having thought on it > > more, though, it actually seems best to attempt the verification scan *every* > > time. In most ineligible conversions, an inequality will appear very early, so > > the scan is effectively O(1) in the negative case. A notable exception is > > something like char(6)->varchar(6) in a table with a billion tuples having > > length(col) = 6 and one with length(col) = 5. The verification scan might read > > most of the table before finding the one tuple that forces a rewrite. That > > isn't a particularly regular scenario in my experience, so the "just do the > > right thing" aspect of preceding every potential rewrite with a verification > > scan seems to win out. > > I think this scenario will be more common than you might think. Tables don't contain random data; they contain data thatthe DBA thinks is valid. The situation where the data is mostly as you expect but with a few kooky rows is, in my experience,extremely common. Perhaps. A few kooky rows is indeed common, but we're talking about a specific breed of kookiness: 99.9% of the rows have identical bits after an ALTER TYPE transformation expression, and 0.1% have different bits. Is that common? In case it was not obvious, I'll note that any error thrown by a transformation expression during the verification scan still aborts the ALTER TABLE. A varchar(20)->varchar(10) that finds an 11-char string will fail permanently during the verification scan. Indeed, the primary value of the verification scan is to distinguish positive and error, not positive and negative. Expanding on my introduction, none of the following can yield a negative verification scan; the result is always positive or an error: CREATE DOMAIN loosedom AS text; CREATE DOMAIN tightdom AS text CHECK (value LIKE '<%/>'); CREATE TABLE t (c varchar(6)); INSERT INTO t VALUES ('<abc/>'),('<de/>'); ALTER TABLE t ALTER c TYPE varchar(8); ALTER TABLE t ALTER c TYPE text; ALTER TABLE t ALTER c TYPE loosedom; ALTER TABLE t ALTER c TYPE xml USING c::xml; ALTER TABLE t ALTER c TYPE varchar(64); ALTER TABLE t ALTER c TYPE tightdom; Adding a bpchar into the mix makes a negative verification scan possible, as does a USING clause having a truncating effect. Continuing the example, these can and would get a negative verification scan: ALTER TABLE t ALTER c TYPE character(6); ALTER TABLE t ALTER c TYPE varchar(5) USING c::varchar(5); Plenty of academic USING clause examples exist: ALTER TABLE t ALTER c TYPE varchar(8) USING CASE c WHEN '<de/>' THEN 'foo' ELSE c END; Verification scans for conversions between fundamentally different types will generally end in the negative at the first tuple. Consider {timestamp,bigint, real,numeric,varbit}->text, int->bigint, interval->reltime, etc. I can't think of a decent non-academic example where heterogeneous conversions like these will dupe the verification scan for even a handful of tuples. Granted, one can flip that around as an argument for declaring the conversions that have a chance. > And it makes the worst case a LOT worse. I suppose the absolute worst case would involve an ALTER TABLE adding brutally expensive CHECK constraints, such that the cost of computing those constraints would dominate the cost of both the verification scan and the rewrite, yielding a 100% slower ALTER TABLE run. A more realistic bad case might be a table much larger than memory with no indexes, and the verification scan adds a full seq scan for nothing. A crude test here has rewriting such a table taking 7x as long as a seq scan on it. By the back of that envelope, we'll take about 15% more time. The right mix of expensive expressions will raise that percentage, and the presence of indexes will drop it. Remember though, we're still only talking about the relatively-rare cases that even can get a negative verification scan. > I really doubt this is worth the complexity anyway - We'd have the verification scan regardless of how we choose when to use it, because how else would we implement no-rewrite varchar(8)->varchar(4) or text->xml? An unconditional verification scan is merely the most trivial algorithm for deciding when to employ it. Those conversions are semantically similar to adding CHECK constraints, and in that sense we already have an initial verification scan implementation: ATRewriteTable(..., InvalidOid, ...). > converting between two types that are mostly-but-not-quite binary compatible seems like an edge case in every sense. Yes. Indeed, that's the intuitive basis for my hypothesis that the verification scan will usually either fail early. I don't advocate this approach to pick up edge cases, but to pick up reasonable cases _without explicit annotations_ showing them to be achievable. Take the text->xml example, certainly of genuine value if not top-frequency. I see three ways to ensure we do a verification scan for it: 1. Have the user tell us: ALTER TABLE t ALTER c TYPE xml USING c::xml IMPLICIT 2. Mark the text->xml cast as "possibly no-rewrite" and look for that 3. Do a verification scan every time (1) gets the job done, but quality of user experience is lacking. (2) is best for the user, as long as the annotations are well-maintained, but it adds the most maintenance burden. (3) has the lowest maintenance burden and, for common use cases, the user-experience quality of (2), but it can significantly add to the ALTER TABLE runtime in rare cases. One can also do (1) with (2) to provide an override when the annotations are incomplete. Ultimately, any of these would work for my own needs. What's your preference? Are there other notable options that preserve full functionality? Thanks, nm
pgsql-hackers by date: