Re: [INTERFACES] (libpq question) Holy cow, what's all this fluff?! - Mailing list pgsql-interfaces
From | Tom Lane |
---|---|
Subject | Re: [INTERFACES] (libpq question) Holy cow, what's all this fluff?! |
Date | |
Msg-id | 10723.919024706@sss.pgh.pa.us Whole thread Raw |
In response to | Re: [INTERFACES] (libpq question) Holy cow, what's all this fluff?! (Matthew Hagerty <matthew@venux.net>) |
List | pgsql-interfaces |
Matthew Hagerty <matthew@venux.net> writes: > At 01:45 PM 2/14/99 -0500, Tom Lane wrote: >> Whatever docs you are looking at are obsolete. char(n), varchar(n), and >> text have essentially interchangeable performance and representation. > The docs that mention the performance hit were really the FAQ, item 3.10. > Check it out. The FAQ is mostly right if you read it very carefully, but I think it's misleading. : CHAR char 1 character : CHAR(#) bpchar blank padded to the specified fixed length : VARCHAR(#) varchar size specifies maximum length, no padding : TEXT text length limited only by maximum tuple length : BYTEA bytea variable-length array of bytes : : The last four types above are "varlena" types (i.e. the first four : bytes are the length, followed by the data). char(#) allocates the : maximum number of bytes no matter how much data is stored in the : field. text, varchar(#), and bytea all have variable length on the : disk, and because of this, there is a small performance penalty for : using them. Specifically, the penalty is for access to all columns : after the first column of this type. The FAQ is wrong to claim that char is not a varlena type --- it is in fact the same as char(1). (There is a true single-byte type named char defined internally, but an appearance of "char" in a CREATE TABLE query gets translated to char(1) instead. The inconsistency of internal and external type names has doubtless contributed to confusion in this area.) As the FAQ correctly states, char(n) is a varlena type; that means that you pay a four-byte overhead for the length word, plus an alignment penalty of zero to three bytes to ensure the length word is long-aligned. (This is why I remarked that char(1) is effectively an 8-byte type: if you put several of them in a row, they take up 8 bytes apiece.) This space overhead, plus the time overhead from dealing with non-fixed-length columns, is the performance penalty paid for any varlena type. One particular component of the time cost for varlena types is that the within-tuple offset of later columns is not fixed, meaning that fastgetattr() can't cache their offsets, but has to scan the tuple data to find them. Although char(n) is represented as a varlena type, there is a special hack in fastgetattr() to make it treat char(n) as fixed-size for the specific purpose of deciding whether later columns' offsets are cacheable. The FAQ is correct to describe this effect as "small", but perhaps it should say "very small" --- for one thing the offset cache is irrelevant if you have any NULL fields in the current tuple. I wouldn't worry about it myself, and I certainly wouldn't choose an inappropriate data type for my application in order to avoid it. The FAQ is not doing you any service to suggest that you should prefer char(n) when you don't really want a fixed-length string. The space penalty of the unwanted pad data is likely to cost you far more than you can hope to save by making fastgetattr() a little quicker. > Also, item 3.8 mentions psql's \do command, however when I > type \do nothing happens and I have to ctrl-c to get the prompt back. \do works for me, but it's pretty slow (about 20 seconds on my hardware) ... how long did you give it? regards, tom lane
pgsql-interfaces by date: