Thread: ALTER TEXT field to VARCHAR(1024)
ALTER TABLE images ALTER COLUMN mask_descriptors TYPE VARCHAR(1024);
http://www.postgresql.org/docs/8.3/static/datatype-character.html
Tip: There are no performance differences between these three types, apart from increased storage size when using the blank-padded type, and a few extra cycles to check the length when storing into a length-constrained column. While character
has performance advantages in some other database systems, it has no such advantages in PostgreSQL. In most situations text or character varying should be used instead.
I am curious as to why you want to change text to VARCHAR(1024), especially as I think that the middleware should be controlling how long a string is saved in the database rather than end user client code (for several reasons., including security concerns). However, I do not know your use cases, nor your overall situation - so my concerns may not apply to you.Hello,i am using Postgres 9.2 and I'd like to perform the following ALTER statement on a database table with about 30M entries :ALTER TABLE images ALTER COLUMN mask_descriptors TYPE VARCHAR(1024);The mask_descriptors field is currently having the type TEXT.I want to perform the ALTER due to the fact that it seems that copying the contents of the table to a BI SQL Server is done in row by row (instead of batch) when handling CLOBs.From the Postgres documentation I got the following :http://www.postgresql.org/docs/8.3/static/datatype-character.html
Tip: There are no performance differences between these three types, apart from increased storage size when using the blank-padded type, and a few extra cycles to check the length when storing into a length-constrained column. While character
has performance advantages in some other database systems, it has no such advantages in PostgreSQL. In most situations text or character varying should be used instead.
Can anybody explain me what happens in the background when the alter statement is executed? I've tried it out on a small copy of the table (70K) and the operation completed in 0.2 seconds.Will the table be completely locked during the execution of the ALTER statement?
Cheers,
Gavin
On Fri, 19 Sep 2014 09:32:09 +0200 Marius Grama <mariusneo@gmail.com> wrote: > Hello, > > > i am using Postgres 9.2 and I'd like to perform the following ALTER > statement on a database table with about 30M entries : > > ALTER TABLE images ALTER COLUMN mask_descriptors TYPE VARCHAR(1024); > > > > The mask_descriptors field is currently having the type TEXT. > I want to perform the ALTER due to the fact that it seems that copying the > contents of the table to a BI SQL Server is done in row by row (instead of > batch) when handling CLOBs. > > From the Postgres documentation I got the following : > > http://www.postgresql.org/docs/8.3/static/datatype-character.html > > Tip: There are no performance differences between these three types, apart > from increased storage size when using the blank-padded type, and a few > extra cycles to check the length when storing into a length-constrained > column. While character has performance advantages in some other database > systems, it has no such advantages in PostgreSQL. In most situations text > or character varying should be used instead. > > > > Can anybody explain me what happens in the background when the alter > statement is executed? I've tried it out on a small copy of the table (70K) > and the operation completed in 0.2 seconds. > Will the table be completely locked during the execution of the ALTER > statement? I share Gavin's concern that you're fixing this in the wrong place. I expect that you'll be better served by configuring the middleware to do the right thing. However, a more direct answer to your question: VARCHAR and TEXT are _the_same_ internally. Thus: ALTER TABLE images ALTER COLUMN mask_descriptors TYPE VARCHAR; would do nothing more than change the table definition. There is no need for that statement to touch any data. However, adding the length constraint of (1024) will force Postgres to check every single value to ensure it complies with the constraint. I believe if any row is longer than 1024 it will throw an error and abort the entire ATLER. -- Bill Moran I need your help to succeed: http://gamesbybill.com
On Fri, Sep 19, 2014 at 7:16 AM, Bill Moran <wmoran@potentialtech.com> wrote: > On Fri, 19 Sep 2014 09:32:09 +0200 > Marius Grama <mariusneo@gmail.com> wrote: >> Can anybody explain me what happens in the background when the alter >> statement is executed? I've tried it out on a small copy of the table (70K) >> and the operation completed in 0.2 seconds. >> Will the table be completely locked during the execution of the ALTER >> statement? > > I share Gavin's concern that you're fixing this in the wrong place. I expect > that you'll be better served by configuring the middleware to do the right thing. I'll pile on here: in almost 20 years of professional database development I've never had an actual problem that was solved by introducing or shortening a length constraint to text columns except in cases where overlong strings violate the data model (like a two character state code for example). It's a database equivalent of "C programmer's disease". Input checks from untrusted actors should happen in the application. merlin
On Mon, Sep 22, 2014 at 10:31 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Fri, Sep 19, 2014 at 7:16 AM, Bill Moran <wmoran@potentialtech.com> wrote: >> On Fri, 19 Sep 2014 09:32:09 +0200 >> Marius Grama <mariusneo@gmail.com> wrote: >>> Can anybody explain me what happens in the background when the alter >>> statement is executed? I've tried it out on a small copy of the table (70K) >>> and the operation completed in 0.2 seconds. >>> Will the table be completely locked during the execution of the ALTER >>> statement? >> >> I share Gavin's concern that you're fixing this in the wrong place. I expect >> that you'll be better served by configuring the middleware to do the right thing. > > I'll pile on here: in almost 20 years of professional database > development I've never had an actual problem that was solved by > introducing or shortening a length constraint to text columns except > in cases where overlong strings violate the data model (like a two > character state code for example). It's a database equivalent of "C > programmer's disease". Input checks from untrusted actors should > happen in the application. > > merlin > I do not have your experience level with data bases, but if I may, I will make an addition. Input checks should also happen in the RDBMS server. I have learned you cannot trust end users _or_ programmers. Most are good and conscientious. But there are a few who just aren't. And those few seem to be very prolific in making _subtle_ errors. Had one person who was really good at replacing every p with a [ and P with { -- There is nothing more pleasant than traveling and meeting new people! Genghis Khan Maranatha! <>< John McKown
You don't want that string to get all the way to the server and fail, blow out a transaction and carry that joyous news back to the user who now has to start over completely. Further no mear length constraint is going to fix p<=>[. Not say the db cannot have the constraint (no [ allowed?) but a good app checks input on the fly.On Mon, Sep 22, 2014 at 10:31 AM, Merlin Moncure <mmoncure@gmail.com> wrote:On Fri, Sep 19, 2014 at 7:16 AM, Bill Moran <wmoran@potentialtech.com> wrote:On Fri, 19 Sep 2014 09:32:09 +0200 Marius Grama <mariusneo@gmail.com> wrote:Can anybody explain me what happens in the background when the alter statement is executed? I've tried it out on a small copy of the table (70K) and the operation completed in 0.2 seconds. Will the table be completely locked during the execution of the ALTER statement?I share Gavin's concern that you're fixing this in the wrong place. I expect that you'll be better served by configuring the middleware to do the right thing.I'll pile on here: in almost 20 years of professional database development I've never had an actual problem that was solved by introducing or shortening a length constraint to text columns except in cases where overlong strings violate the data model (like a two character state code for example). It's a database equivalent of "C programmer's disease". Input checks from untrusted actors should happen in the application. merlinI do not have your experience level with data bases, but if I may, I will make an addition. Input checks should also happen in the RDBMS server. I have learned you cannot trust end users _or_ programmers. Most are good and conscientious. But there are a few who just aren't. And those few seem to be very prolific in making _subtle_ errors. Had one person who was really good at replacing every p with a [ and P with {
On 22/09/14 17:18, Rob Sargent wrote: > On 09/22/2014 09:40 AM, John McKown wrote: >> On Mon, Sep 22, 2014 at 10:31 AM, Merlin Moncure <mmoncure@gmail.com> wrote: >>> On Fri, Sep 19, 2014 at 7:16 AM, Bill Moran <wmoran@potentialtech.com> wrote: >>>> On Fri, 19 Sep 2014 09:32:09 +0200 >>>> Marius Grama <mariusneo@gmail.com> wrote: >>>>> Can anybody explain me what happens in the background when the alter >>>>> statement is executed? I've tried it out on a small copy of the table (70K) >>>>> and the operation completed in 0.2 seconds. >>>>> Will the table be completely locked during the execution of the ALTER >>>>> statement? >>>> I share Gavin's concern that you're fixing this in the wrong place. I expect >>>> that you'll be better served by configuring the middleware to do the right thing. >>> I'll pile on here: in almost 20 years of professional database >>> development I've never had an actual problem that was solved by >>> introducing or shortening a length constraint to text columns except >>> in cases where overlong strings violate the data model (like a two >>> character state code for example). It's a database equivalent of "C >>> programmer's disease". Input checks from untrusted actors should >>> happen in the application. >>> >>> merlin >>> >> I do not have your experience level with data bases, but if I may, I >> will make an addition. Input checks should also happen in the RDBMS >> server. I have learned you cannot trust end users _or_ programmers. >> Most are good and conscientious. But there are a few who just aren't. >> And those few seem to be very prolific in making _subtle_ errors. Had >> one person who was really good at replacing every p with a [ and P >> with { >> >> > You don't want that string to get all the way to the server and fail, > blow out a transaction and carry that joyous news back to the user who > now has to start over completely. Further no mear length constraint > is going to fix p<=>[. Not say the db cannot have the constraint (no > [ allowed?) but a good app checks input on the fly. > > > Indeed - both is the answer; back-end (trigger) checks for safety, front-end application polite messages for clarity and ease of use. -- Tim Clarke
Sorry guess I wasn't being as clear as I thought. To be a bit more precise, I really think that validation should occur _first_ at the point of entry (for a web browser, I put in Javascript code to verify it there as well as in the web service doing the same validation because some people disable Javascript as a possible security breach vector), then also do the same, or even more, validation in the back end server. I.e. don't trust any step of the process which is not under your immediate control. As the "owner" of the data base, I want to validate the data "myself" according to the proper business rules. The application developer should also validate the input. What I don't believe in is a "trusted application" from which I would accept data and not validate it before updating the data base. If such an application were to exist, due to management dictum, I would audit everything that I could to prove any corruption to the data base was caused by this "can't ever be wrong" application. Yes, I am a paranoid. On Mon, Sep 22, 2014 at 11:18 AM, Rob Sargent <robjsargent@gmail.com> wrote: > On 09/22/2014 09:40 AM, John McKown wrote: > > On Mon, Sep 22, 2014 at 10:31 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > > On Fri, Sep 19, 2014 at 7:16 AM, Bill Moran <wmoran@potentialtech.com> > wrote: > > On Fri, 19 Sep 2014 09:32:09 +0200 > Marius Grama <mariusneo@gmail.com> wrote: > > Can anybody explain me what happens in the background when the alter > statement is executed? I've tried it out on a small copy of the table (70K) > and the operation completed in 0.2 seconds. > Will the table be completely locked during the execution of the ALTER > statement? > > I share Gavin's concern that you're fixing this in the wrong place. I > expect > that you'll be better served by configuring the middleware to do the right > thing. > > I'll pile on here: in almost 20 years of professional database > development I've never had an actual problem that was solved by > introducing or shortening a length constraint to text columns except > in cases where overlong strings violate the data model (like a two > character state code for example). It's a database equivalent of "C > programmer's disease". Input checks from untrusted actors should > happen in the application. > > merlin > > I do not have your experience level with data bases, but if I may, I > will make an addition. Input checks should also happen in the RDBMS > server. I have learned you cannot trust end users _or_ programmers. > Most are good and conscientious. But there are a few who just aren't. > And those few seem to be very prolific in making _subtle_ errors. Had > one person who was really good at replacing every p with a [ and P > with { > > > You don't want that string to get all the way to the server and fail, blow > out a transaction and carry that joyous news back to the user who now has to > start over completely. Further no mear length constraint is going to fix > p<=>[. Not say the db cannot have the constraint (no [ allowed?) but a good > app checks input on the fly. > > > -- There is nothing more pleasant than traveling and meeting new people! Genghis Khan Maranatha! <>< John McKown
On Mon, Sep 22, 2014 at 10:40 AM, John McKown <john.archie.mckown@gmail.com> wrote: > On Mon, Sep 22, 2014 at 10:31 AM, Merlin Moncure <mmoncure@gmail.com> wrote: >> I'll pile on here: in almost 20 years of professional database >> development I've never had an actual problem that was solved by >> introducing or shortening a length constraint to text columns except >> in cases where overlong strings violate the data model (like a two >> character state code for example). It's a database equivalent of "C >> programmer's disease". Input checks from untrusted actors should >> happen in the application. >> >> merlin >> > > I do not have your experience level with data bases, but if I may, I > will make an addition. Input checks should also happen in the RDBMS > server. I have learned you cannot trust end users _or_ programmers. > Most are good and conscientious. But there are a few who just aren't. > And those few seem to be very prolific in making _subtle_ errors. Had > one person who was really good at replacing every p with a [ and P > with { Sure. The point is distinguishing things which are *demonstrably* false (like a US VIN must be exactly 17 chars) from those that are based assumption (such as a cityname must be <= 50 characters). The former should be validated in the schema and the latter should not be. If you're paranoid about the user submitting 100mb strings for "username" and don't trust the application to deal with that, I'd maybe consider making a domain 'safetext' which checks length on the order of a few thousand bytes and using that instead of 'text' and use it everywhere. This will prevent the dba from outsmarting the datamodel which is a *much* bigger problem in practice than the one length checks attempt to solve. Domains have certain disadvantages (like no array type) -- be advised. merlin
Merlin Moncure-2 wrote > On Mon, Sep 22, 2014 at 10:40 AM, John McKown > < > john.archie.mckown@ > > wrote: >> On Mon, Sep 22, 2014 at 10:31 AM, Merlin Moncure < > mmoncure@ > > wrote: >>> I'll pile on here: in almost 20 years of professional database >>> development I've never had an actual problem that was solved by >>> introducing or shortening a length constraint to text columns except >>> in cases where overlong strings violate the data model (like a two >>> character state code for example). It's a database equivalent of "C >>> programmer's disease". Input checks from untrusted actors should >>> happen in the application. >>> >>> merlin >>> >> >> I do not have your experience level with data bases, but if I may, I >> will make an addition. Input checks should also happen in the RDBMS >> server. I have learned you cannot trust end users _or_ programmers. >> Most are good and conscientious. But there are a few who just aren't. >> And those few seem to be very prolific in making _subtle_ errors. Had >> one person who was really good at replacing every p with a [ and P >> with { > > Sure. The point is distinguishing things which are *demonstrably* > false (like a US VIN must be exactly 17 chars) from those that are > based assumption (such as a cityname must be <= 50 characters). The > former should be validated in the schema and the latter should not be. > If you're paranoid about the user submitting 100mb strings for > "username" and don't trust the application to deal with that, I'd > maybe consider making a domain 'safetext' which checks length on the > order of a few thousand bytes and using that instead of 'text' and use > it everywhere. This will prevent the dba from outsmarting the > datamodel which is a *much* bigger problem in practice than the one > length checks attempt to solve. > > Domains have certain disadvantages (like no array type) -- be advised. > > merlin These responses all seem beside the point. The OP isn't concerned that too-long data is making it into the database but rather that an unadorned text type is functionally a CLOB which the application he is using is treating like a document instead of a smallish text field that would be treated like any other value. It's like the difference between choosing input/text or textarea in HTML. Now, some tools distinguish between "text" and "varchar" only and the length piece is irrelevant; but whether that applies here I have no idea. It might be easier to simply create a view over the table, using the desired type (truncating the actual value if needed), and feed that view to the reporting engine. In the end the two questions are: 1) does adding a length restriction cause a table rewrite? 2) what level of locking occurs while the length check is resolving? I don't confidently know the answers to those two questions. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/ALTER-TEXT-field-to-VARCHAR-1024-tp5819608p5819939.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
David G Johnston <david.g.johnston@gmail.com> writes: > In the end the two questions are: > 1) does adding a length restriction cause a table rewrite? Yes. In principle the restriction could be checked with just a scan, not a rewrite, but ALTER TABLE doesn't currently understand that --- and in any case a scan would still be potentially a long time. > 2) what level of locking occurs while the length check is resolving? AccessExclusiveLock. This would be necessary in any case for a data type change. regards, tom lane
On Mon, Sep 22, 2014 at 10:31 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Fri, Sep 19, 2014 at 7:16 AM, Bill Moran <wmoran@potentialtech.com> wrote:
>> On Fri, 19 Sep 2014 09:32:09 +0200
>> Marius Grama <mariusneo@gmail.com> wrote:
>>> Can anybody explain me what happens in the background when the alter
>>> statement is executed? I've tried it out on a small copy of the table (70K)
>>> and the operation completed in 0.2 seconds.
>>> Will the table be completely locked during the execution of the ALTER
>>> statement?
>>
>> I share Gavin's concern that you're fixing this in the wrong place. I expect
>> that you'll be better served by configuring the middleware to do the right thing.
>
> I'll pile on here: in almost 20 years of professional database
> development I've never had an actual problem that was solved by
> introducing or shortening a length constraint to text columns except
> in cases where overlong strings violate the data model (like a two
> character state code for example). It's a database equivalent of "C
> programmer's disease". Input checks from untrusted actors should
> happen in the application.
>
> merlin
>
I do not have your experience level with data bases, but if I may, I
will make an addition. Input checks should also happen in the RDBMS
server. I have learned you cannot trust end users _or_ programmers.
Most are good and conscientious. But there are a few who just aren't.
And those few seem to be very prolific in making _subtle_ errors. Had
one person who was really good at replacing every p with a [ and P
with {