Thread: Re: User defined types -- Social Security number...
"Michael Chaney" <mdchaney@michaelchaney.com> wrote in message news:20040301211843.GB19105@michaelchaney.com... > On Mon, Mar 01, 2004 at 09:42:48AM -0800, Steve Crawford wrote: > > I missed the start of this thread but will chime in with a comment > > anyway. > > > > My rule is to select an appropriate numeric type of data if you will > > be doing numeric types of things to it, character types if you will > > be doing character manipulations, etc. > > > > I don't know of any good reasons to need to know SSN/6.9, sqrt(SSN), > > SSN+7.86 but there are plenty of good reasons to need the first three > > characters (the "area number"), the middle two characters (the "group > > number", and the last 4 characters (the "serial number", often > > (ab)used as a password for banking and other purposes). > > Another excellent point. I often store zip codes as text for this > reason. > > The only other thing that I would mention is that if the SSN field in > the db will be a key of some sort, which is often the case, then it > might be more efficient to store it as an integer. It might be more > efficient to store it as a character string. The author should test > in this case to determine the most efficient way. > > As for character vs. integer manipulations, in most scripting style > languages, which is pretty much exlusively what I use, there's no > need to think about types, and something like an SSN will silently > change between being character or integer depending on what operations > are being performed on it. > > Michael > -- > Michael Darrin Chaney > mdchaney@michaelchaney.com > http://www.michaelchaney.com/ > > ---------------------------(end of broadcast)--------------------------- > TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match > Ther are some other points I'd like to make -- If I store the SSN as an integer -- theoretically -- leading zeroes will be stripped (041-99-9999) -- my OWN ssn is a perfect example of this as it starts with a leading zero... This would cause a problem in that one of the requirements of an SSN is that the length be exactly 9 digits or 9 chars WITHOUT the dashes.... so a CHECK CONSTRAINT would be useful... But if the SSN is stored as an integer -- there is no check constraint that wouldn't fail for SSNs that start with one or more zeroes.... So I thought how about a varchar(9) field and insert/update triggers that do the formatting (adding the dashes on insert/update --) and validate the check contraints (9 chars + the dashes)... The two extra characters making a varchar(11) field are not a concern in the normalization or schema... I simply wanted a formatting function so that I dont have to do it in my scripting language or use the same CAST over and over and over in my select/insert/update statements.... I am mainly looking to do the formatting automatically rather than having to constantly format such a simple piece of data... It would be really sweet in postgreSQL if we could apply the equivalent of a printf(columnname) to the table definition -- MS Access has what they call an "input mask" and it comes in really handy -- however -- I havent used Access for anthing serious for about 4 years... -- Greg Patnude / The Digital Demention 2916 East Upper Hayden Lake Road Hayden Lake, ID 83835 (208) 762-0762
"Greg Patnude" <gpatnude@hotmail.com> wrote in message news:c22ceg$s1a$1@news.hub.org... > "Michael Chaney" <mdchaney@michaelchaney.com> wrote in message > news:20040301211843.GB19105@michaelchaney.com... > > On Mon, Mar 01, 2004 at 09:42:48AM -0800, Steve Crawford wrote: > > > I missed the start of this thread but will chime in with a comment > > > anyway. > > > > > > My rule is to select an appropriate numeric type of data if you will > > > be doing numeric types of things to it, character types if you will > > > be doing character manipulations, etc. > > > > > > I don't know of any good reasons to need to know SSN/6.9, sqrt(SSN), > > > SSN+7.86 but there are plenty of good reasons to need the first three > > > characters (the "area number"), the middle two characters (the "group > > > number", and the last 4 characters (the "serial number", often > > > (ab)used as a password for banking and other purposes). > > > > Another excellent point. I often store zip codes as text for this > > reason. > > > > The only other thing that I would mention is that if the SSN field in > > the db will be a key of some sort, which is often the case, then it > > might be more efficient to store it as an integer. It might be more > > efficient to store it as a character string. The author should test > > in this case to determine the most efficient way. > > > > As for character vs. integer manipulations, in most scripting style > > languages, which is pretty much exlusively what I use, there's no > > need to think about types, and something like an SSN will silently > > change between being character or integer depending on what operations > > are being performed on it. > > > > Michael > > -- > > Michael Darrin Chaney > > mdchaney@michaelchaney.com > > http://www.michaelchaney.com/ > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 9: the planner will ignore your desire to choose an index scan if your > > joining column's datatypes do not match > > > > Ther are some other points I'd like to make -- > > If I store the SSN as an integer -- theoretically -- leading zeroes will be > stripped (041-99-9999) -- my OWN ssn is a perfect example of this as it > starts with a leading zero... > > This would cause a problem in that one of the requirements of an SSN is that > the length be exactly 9 digits or 9 chars WITHOUT the dashes.... so a CHECK > CONSTRAINT would be useful... > > But if the SSN is stored as an integer -- there is no check constraint that > wouldn't fail for SSNs that start with one or more zeroes.... So I thought > how about a varchar(9) field and insert/update triggers that do the > formatting (adding the dashes on insert/update --) and validate the check > contraints (9 chars + the dashes)... > > The two extra characters making a varchar(11) field are not a concern in the > normalization or schema... I simply wanted a formatting function so that I > dont have to do it in my scripting language or use the same CAST over and > over and over in my select/insert/update statements.... > > I am mainly looking to do the formatting automatically rather than having to > constantly format such a simple piece of data... > > It would be really sweet in postgreSQL if we could apply the equivalent of a > printf(columnname) to the table definition -- MS Access has what they call > an "input mask" and it comes in really handy -- however -- I havent used > Access for anthing serious for about 4 years... > > > -- > Greg Patnude / The Digital Demention > 2916 East Upper Hayden Lake Road > Hayden Lake, ID 83835 > (208) 762-0762 You might want to look at CREATE DOMAIN e.g. (for ISBNs, we want to check the format, and the check digit - replace with suitable regex and validation function for social security numbers) CREATE OR REPLACE FUNCTION utilities."validate_ISBN_check_digit"(char(10)) RETURNS boolean AS ' DECLARE isbn_sum int:=0; BEGIN IF ($1 ~ ''^[0-9]{9}[0-9Xx]{1}$''::text) THEN FOR iIN 1..10 LOOP isbn_sum:= CASE WHEN substring($1 from i for 1) IN (''X'',''x'') AND i=10 THEN isbn_sum + (11-i * 10) ELSE isbn_sum + (11-i * substring($1 from i for 1)::int) END; END LOOP; IF mod(isbn_sum,11) = 0 THEN RETURN ''t''; ENDIF; END IF; RETURN ''f''; END; ' LANGUAGE 'plpgsql'; COMMENT ON FUNCTION utilities."validate_ISBN_check_digit"(char(10)) is 'validation function for ISBN check digits'; CREATE DOMAIN utilities."ISBN" AS char(10) CONSTRAINT "ISBN format" CHECK (VALUE ~ '^[0-9]{9}[0-9Xx]{1}$'::text) CONSTRAINT "ISBN checkdigit" CHECK (utilities."validate_ISBN_check_digit"(VALUE)); Hope that helps -- Tom Hebbron www.hebbron.com
"Greg Patnude" <gpatnude@hotmail.com> writes: > It would be really sweet in postgreSQL if we could apply the > equivalent of a printf(columnname) to the table definition -- MS > Access has what they call an "input mask" and it comes in really > handy -- however -- I havent used Access for anthing serious for > about 4 years... Hmm. % man "create type" NAME CREATE TYPE - define a new data type SYNOPSIS CREATE TYPE name AS ( attribute_name data_type [, ... ] ) CREATE TYPE name ( INPUT = input_function, OUTPUT = output_function [ , RECEIVE = receive_function] [ , SEND = send_function ] [ , INTERNALLENGTH = { internallength | VARIABLE } ] [ , PASSEDBYVALUE ] [ , ALIGNMENT = alignment ] [ , STORAGE = storage ] [ , DEFAULT = default] [ , ELEMENT = element ] [ , DELIMITER = delimiter ] ) "input_function" and "output_function" sure _sound_ like what you're looking for. They would make your imagined issue of "missing leading zeros" go away. I think that CREATE TYPE has been around for a goodly few years now; it is not impossible that this functionality has been around longer than MS Access (which is saying something!). -- output = reverse("gro.mca" "@" "enworbbc") http://www3.sympatico.ca/cbbrowne/x.html Rules of the Evil Overlord #116. "If I capture the hero's starship, I will keep it in the landing bay with the ramp down, only a few token guards on duty and a ton of explosives set to go off as soon as it clears the blast-range." <http://www.eviloverlord.com/>