Re: update on TOAST status' - Mailing list pgsql-hackers
From | JanWieck@t-online.de (Jan Wieck) |
---|---|
Subject | Re: update on TOAST status' |
Date | |
Msg-id | 200007071130.NAA24645@hot.jw.home Whole thread Raw |
In response to | Re: update on TOAST status' (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: update on TOAST status'
|
List | pgsql-hackers |
Tom Lane wrote: > JanWieck@t-online.de (Jan Wieck) writes: > > Tom Lane wrote: > >> One simple answer that might help for other scenarios too is to keep > >> a small cache of the last few values that had to be untoasted. Maybe > >> we only need it for moved-off values --- it could be that decompression > >> is fast enough that we should just do it over rather than trying to > >> cache. > > > I'm still argueing that indexing huge values is a hint for a > > misleading schema. If this is true, propagating toasted > > attributes into indices is a dead end street and I'd have to > > change the heap-access<->toaster interface so that the > > modified (stored) main tuple isn't visible to the following > > code (that does the index inserts). > > But you'll notice that is *not* what I suggested. A detoasted-value Haven't missed it in the first read - of course. > cache could be useful in more situations than just an index lookup. > I don't necessarily say we've got to have it in 7.1, but let's keep > the idea in mind in case we start finding there is a bottleneck here. > > > What is the value of supporting index tuples >2K? > > If you're toasting the whole main tuple down to <2K, you might find > yourself toasting individual fields that are a good bit less than > that. So I don't think indexing a toasted value will be all that > unusual. Exactly that's why I'm asking if we wouldn't be better off by limiting index tuples to (blocksize - overhead) / 4 and allways store plain, untoasted values in indices. I've asked now a couple of times "who really has the need for indexing huge values"? All responses I got so far where of the kind "would be nice if we support it" or "I don't like such restrictions". But noone really said "Ineed it". > But this is all speculation for now. Let's get it working bulletproof > for 7.1, and then worry about speedups after we know they are needed. Let me speculate too a little. The experience I have up to now is that the saved time from requiring less blocks in the buffer cache outweights thecost of decompression. Especially with our algorithm, because it is byte oriented (instead of huffman coding beeingbased on a bit stream), causing it to be extremely fast on decompression. And the technique ofmoving off values from the main heap causes the main tuples to be much smaller. As long as the toasted valuesaren't used in qualification or joining, only their references move around through the various executor steps, and only those values that are part of the final result set need to be fetched when sending them tothe client. Given a limited amount of total memory available for one running postmaster, we save alot of disk I/O and hold more values in their compressed format in the shared buffers. With the limit on total memory, the size of the buffercache must be lowered by the size of the new detoasted cache, and that only if we make it shared too. Givenfurther an average of 50% compression ratio (what's not unlikely with typical input like html pages), one cacheddetoasted value would require two compressed ones to go away. Wouldn't really surprise me if we gain speed from it in the average query. Even if some operations might slow down (sorting on maybe toasted fields). We need to see some results and wait for reports for this. But we know already that it can cause trouble with indexed fields, because these are likely to be used for comparision during scans. So do we want to have indices storing plain values allways and limit them in the index-tuple size or not? Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
pgsql-hackers by date: