Re: Alternative to serial primary key - Mailing list pgsql-sql
From | David Clarke |
---|---|
Subject | Re: Alternative to serial primary key |
Date | |
Msg-id | 12b7ac1e0607070107t78a808aeuc8aba982d593abff@mail.gmail.com Whole thread Raw |
In response to | Re: Alternative to serial primary key (Scott Marlowe <smarlowe@g2switchworks.com>) |
Responses |
Re: Alternative to serial primary key
Re: Alternative to serial primary key |
List | pgsql-sql |
On 7/7/06, Scott Marlowe <smarlowe@g2switchworks.com> wrote: > On Thu, 2006-07-06 at 16:43, Aaron Bono wrote: > I'll repeat my previous statement that this is premature optimization, > and the hash is kind the wrong direction. > > If you store an int and the 1 to 100 characters in a varchar, you'll > have about 4 to 8 bytes for the int (6 I think, but it's been a while) > plus 1 to 200 or possibly more for the characters in the address. > > If you use C local with ASCII encoding, you can get single byte. > > If you switch to an md5 hash, you'll need ~50 bytes (average address > about 1/2 max length, just a guess) plus 32 bytes, plus the extra bytes > to keep track of the length of the fields. > > The table now becomes wider itself, and the md5 is generally about as > big as the address, or fairly close to it. > > And you've got the possibility of md5 collisions to deal with. > > I'd say just FK off of the address field. It's a natural key, fairly > small (100 bytes ain't really that big) and your primary key never needs > any kind of regenerating or anything, because it's already there. > > Just set it up with cascading updates and deletes in case you need to > edit it in the future. > > The first rule of optimization: Don't > Yep, this was pretty much where I started from and I totally agree with you regarding premature optimisation. I would point out that md5 hash is 128 bits or 16 bytes and not 32 so the difference between the hash value and the source data is somewhat larger than you've suggested. My original post a few weeks back was really about the use of a natural key of varchar(100) as a foreign key in other tables. The response was to not do it and instead use a serial which is basically how I was progressing. Celko's comments re the use of autonumbering schemes have obviously been gnawing away at me. That is why I asked the question about the hashing approach which I saw as a way to fulfill a lot of the criteria he has suggested for a primary key and still have a reasonable value to use as a foreign key. Please ignore the regeneration comment, it was written in haste and not because I really anticipate any need to regenerate my primary key at any stage. The question remains regarding the use of a string value as a primary key for the table and as a foreign key in other tables. If I use the address column as a foreign key in a differrent table will postgres physically duplicate the data or will it simply attach the appropriate index magically behind the scenes? I agree that the address column is fairly small and I've heard of others using the likes of GUIDs as key values which are not a lot smaller than I would expect my average address to be. Thanks Dave Dave