TOASTing smaller things - Mailing list pgsql-hackers
From | Chris Browne |
---|---|
Subject | TOASTing smaller things |
Date | |
Msg-id | 60slbyk0xb.fsf@dba2.int.libertyrms.com Whole thread Raw |
In response to | HELP all women were raped during the May riots in Jakarta (adm@pu.go.id) |
Responses |
Re: TOASTing smaller things
Re: TOASTing smaller things |
List | pgsql-hackers |
In some of our applications, we have cases where it would be very nice if we could activate TOAST at some sort of lower threshold than the usual 2K that is true now. Let me note the current code that controls the threshold: /** These symbols control toaster activation. If a tuple is larger than* TOAST_TUPLE_THRESHOLD, we will try to toast itdown to no more than* TOAST_TUPLE_TARGET bytes. Both numbers include all tuple header overhead* and between-fields alignmentpadding, but we do *not* consider any* end-of-tuple alignment padding; hence the values can be compared directly*to a tuple's t_len field. We choose TOAST_TUPLE_THRESHOLD with the* knowledge that toast-table tuples will be exactlythat size, and we'd* like to fit four of them per page with minimal space wastage.** The numbers need not be the same,though they currently are.** Note: sizeof(PageHeaderData) includes the first ItemId, but we have* to allow for 3 more,if we want to fit 4 tuples on a page.*/ #define TOAST_TUPLE_THRESHOLD^I\ ^IMAXALIGN_DOWN((BLCKSZ - \ ^I^I^I^I MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \ ^I^I^I^I / 4) We have cases where we're storing XML message information which is near the 0.5K mark, that being the case, tuples virtually never get TOASTed. somesystem=# select min(length(xml)), max(length(xml)), avg(length(xml)), stddev(length(xml)) from table_with_xml;min | max | avg | stddev -----+------+----------------------+------------------244 | 2883 | 651.6900720788174376 | 191.602077911138 (1 row) I can see four controls as being pretty plausible: 1. Compile time... #define TOAST_DENOMINATOR 17 /* Use this as the divisor; current default behaviour falls from TOAST_DENOMINATOR = 4 */ #define TOAST_TUPLE_THRESHOLD^I\ ^IMAXALIGN_DOWN((BLCKSZ - \ ^I^I^I^I MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \ ^I^I^I^I / TOAST_DENOMINATOR) That's obviously cheapest to the DB engine. I just made this change to a checkout of CVS HEAD, and it readily survived a regression test. 2. GUC value for TOAST_DENOMINATOR Do the above, but with the added detail that TOAST_DENOMINATOR refers to a GUC value. I think I could probably make this change; the principle remains much the same as with #1. 3. GUC value for TOAST_TUPLE_THRESHOLD This probably has to get modified to the nearest feasible value, modulo alignment; it's not all that different from #1 or #2. 4. A different mechanism would be to add a fifth storage column strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's say, TOAST. At present, the 4 values are essentially advisory; columns get TOASTed if the column permits EXTENDED storage, but that only occurs if the size is greater than TOAST_TUPLE_THRESHOLD. If the new value was chosen, the column would *always* get stored as TOAST. Presumably #1 or #2 could readily get into 8.3 as they're pretty easy; #3 is a bit trickier, whilst #4 is probably not "8.3-fittable". Question: Which of these sounds preferable? -- let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;; http://cbbrowne.com/info/finances.html Where do you *not* want to go today? "Confutatis maledictis, flammis acribus addictis" (<http://www.hex.net/~cbbrowne/msprobs.html>
pgsql-hackers by date: