Thread: storing binary data
I want to store some binary data in Postgres. The data is an MD5 checksum of the user's password, in binary. It will be exactly 16 bytes (since it is a one-way hash). Can I store this safely in a CHAR column? Can the data be treated normally (i.e. compare a binary checksum of the password the user entered with the binary value stored in the DB, etc). Are there any issues I'll need to watch out for when storing/manipulating binary data? (I was going to RTFM and search the mailing list archives, but they don't seem to be working.) TIA, Neil -- Neil Conway <neilconway@home.com> Get my GnuPG key from: http://klamath.dyndns.org/mykey.asc Encrypted mail welcomed Secrecy is the beginning of tyranny. -- Heinlein
Attachment
Neil Conway <nconway@klamath.dyndns.org> writes: > I want to store some binary data in Postgres. The data is an > MD5 checksum of the user's password, in binary. It will be > exactly 16 bytes (since it is a one-way hash). > Can I store this safely in a CHAR column? No. CHAR and friends assume there are no null (zero) bytes. In MULTIBYTE setups there are probably additional constraints. You could use bytea, but I would recommend converting the checksum to a hex digit string and then storing that in a char-type field. Hex is the usual textual representation for MD5 values, no? regards, tom lane
On Mon, Oct 16, 2000 at 11:22:40PM -0400, Tom Lane wrote: > Neil Conway <nconway@klamath.dyndns.org> writes: > > I want to store some binary data in Postgres. The data is an > > MD5 checksum of the user's password, in binary. It will be > > exactly 16 bytes (since it is a one-way hash). > > > Can I store this safely in a CHAR column? > > No. CHAR and friends assume there are no null (zero) bytes. > In MULTIBYTE setups there are probably additional constraints. > > You could use bytea, but I would recommend converting the checksum > to a hex digit string and then storing that in a char-type field. > Hex is the usual textual representation for MD5 values, no? It is, but (IMHO) it's a big waste of space. The actual MD5 digest is 128 bits. If stored in binary form, it's 16 bytes. If stored in hex form (as ASCII), it's 32 characters @ 1 byte per character = 32 bytes. In Unicode, that's 64 bytes (correct me if I'm wrong). It's not a huge deal, but it would be nice to store this efficiently. Is this possible? TIA, Neil -- Neil Conway <neilconway@home.com> Get my GnuPG key from: http://klamath.dyndns.org/mykey.asc Encrypted mail welcomed Being able to break into computers doesn't make you a hacker any more than being able to hotwire cars makes you an automotive engineer.
Attachment
* Neil Conway <nconway@klamath.dyndns.org> [001016 20:41] wrote: > On Mon, Oct 16, 2000 at 11:22:40PM -0400, Tom Lane wrote: > > Neil Conway <nconway@klamath.dyndns.org> writes: > > > I want to store some binary data in Postgres. The data is an > > > MD5 checksum of the user's password, in binary. It will be > > > exactly 16 bytes (since it is a one-way hash). > > > > > Can I store this safely in a CHAR column? > > > > No. CHAR and friends assume there are no null (zero) bytes. > > In MULTIBYTE setups there are probably additional constraints. > > > > You could use bytea, but I would recommend converting the checksum > > to a hex digit string and then storing that in a char-type field. > > Hex is the usual textual representation for MD5 values, no? > > It is, but (IMHO) it's a big waste of space. The actual MD5 digest is > 128 bits. If stored in binary form, it's 16 bytes. If stored in hex > form (as ASCII), it's 32 characters @ 1 byte per character = 32 bytes. > In Unicode, that's 64 bytes (correct me if I'm wrong). > > It's not a huge deal, but it would be nice to store this efficiently. > Is this possible? Why not use base64? It's pretty gross but might work for you. -Alfred
Neil Conway <nconway@klamath.dyndns.org> writes: >> You could use bytea, but I would recommend converting the checksum >> to a hex digit string and then storing that in a char-type field. >> Hex is the usual textual representation for MD5 values, no? > It is, but (IMHO) it's a big waste of space. The actual MD5 digest is > 128 bits. If stored in binary form, it's 16 bytes. If stored in hex > form (as ASCII), it's 32 characters @ 1 byte per character =3D 32 bytes. You're worried about 16 bytes per pg_shadow entry? Get real. I'd have recommended bytea if the amount of storage involved were actually significant, but for this application readability seems more important. regards, tom lane
On Mon, Oct 16, 2000 at 11:57:36PM -0400, Tom Lane wrote: > Neil Conway <nconway@klamath.dyndns.org> writes: > > It is, but (IMHO) it's a big waste of space. The actual MD5 digest is > > 128 bits. If stored in binary form, it's 16 bytes. If stored in hex > > form (as ASCII), it's 32 characters @ 1 byte per character =3D 32 bytes. > > You're worried about 16 bytes per pg_shadow entry? Get real. I'd > have recommended bytea if the amount of storage involved were actually > significant, but for this application readability seems more important. To clarify, these are 'application users', not Postgres users. So the info is stored in one of my own tables, not pg_shadow. Although I agree, this isn't a big deal either way. Alfred: thanks for the tip. Looks like base64 will solve my problems! Thanks to everyone who responded, Neil -- Neil Conway <neilconway@home.com> Get my GnuPG key from: http://klamath.dyndns.org/mykey.asc Encrypted mail welcomed Violence is to dictatorship as propaganda is to democracy. -- Noam Chomsky