Re: Sequential scans - Mailing list pgsql-hackers
From: Jeff Davis
Subject: Re: Sequential scans
Msg-id: 1178145421.28383.189.camel@dogma.v10.wvs
In response to: Re: Sequential scans (Heikki Linnakangas <heikki@enterprisedb.com>)
Responses: Re: Sequential scans
List: pgsql-hackers
On Wed, 2007-05-02 at 20:58 +0100, Heikki Linnakangas wrote:
> Jeff Davis wrote:
> > What should be the maximum size of this hash table?
>
> Good question. And also, how do you remove entries from it?
>
> I guess the size should somehow be related to the number of backends.
> Each backend will realistically be doing just one, or at most two, seq
> scans at a time. It also depends on the number of large tables in the
> databases, but we don't have that information easily available. How
> about using just NBackends? That should be plenty, and wasting a few
> hundred bytes of memory won't hurt anyone.

One entry per relation, not per backend, is my current design.

> I think you're going to need an LRU list and a counter of used entries
> in addition to the hash table, and when all entries are in use, remove
> the least recently used one.
>
> The thing to keep an eye on is that it doesn't add too much overhead
> or lock contention in the typical case when there are no concurrent
> scans.
>
> For the locking, use an LWLock.

Ok. What would be the potential lock contention in the case of no
concurrent scans?

Also, is it easy to determine the space used by a dynahash with N
entries? I haven't looked at the dynahash code yet, so perhaps this
will be obvious.

> No, not the segment. RelFileNode consists of the tablespace oid,
> database oid, and relation oid. You can find it in
> scan->rs_rd->rd_node. The segmentation works at a lower level.

Ok, will do.

> Hmm. Should we care then? CFQ is the default on Linux, and an average
> sysadmin is unlikely to change it.

Keep in mind that concurrent sequential scans with CFQ are *already*
very poor. I think that alone is an interesting fact that's somewhat
independent of Sync Scans.

> - when ReadBuffer is called, let the caller know if the read did
>   physical I/O.
> - when the previous ReadBuffer didn't result in physical I/O, assume
>   that we're not the pack leader. If the next buffer isn't already in
>   cache, wait a few milliseconds before initiating the read, giving
>   the pack leader a chance to do it instead.
>
> Needs testing, of course..

An interesting idea; I like it the most of the ideas for maintaining a
"pack leader". That's very similar to what the Linux anticipatory
scheduler does for us.

> >> 4. It fails regression tests. You get an assertion failure on the
> >> portal test. I believe that changing the direction of a scan isn't
> >> handled properly; it's probably pretty easy to fix.
> >
> > I will examine the code more carefully. As a first guess, is it
> > possible that the test is failing because of the non-deterministic
> > order in which tuples are returned?
>
> No, it's an assertion failure, not just different output than
> expected. But it's probably quite simple to fix..

Ok, I'll find and correct it then.

Regards,
	Jeff Davis
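[Editor's note] The design discussed above, a small shared table of scan positions keyed by relation, with LRU eviction when full, can be sketched roughly as below. This is a simplified, self-contained illustration, not PostgreSQL's actual code: the real implementation would use a dynahash table in shared memory keyed by RelFileNode, protect it with an LWLock, and size it from NBackends; the names `RelFileNodeSketch`, `ScanPosEntry`, and `MAX_SCAN_ENTRIES` are all hypothetical.

```c
#include <stdint.h>

/* Simplified stand-in for RelFileNode (tablespace oid, database oid,
 * relation oid), the key Heikki suggests: scan->rs_rd->rd_node. */
typedef struct {
    uint32_t spcNode;
    uint32_t dbNode;
    uint32_t relNode;
} RelFileNodeSketch;

#define MAX_SCAN_ENTRIES 8   /* the real design would size this from NBackends */

typedef struct {
    RelFileNodeSketch rnode;  /* which relation is being scanned */
    uint32_t last_block;      /* last page reported by a scanning backend */
    uint64_t lru_stamp;       /* larger = more recently used */
    int      in_use;
} ScanPosEntry;

static ScanPosEntry scan_table[MAX_SCAN_ENTRIES];
static uint64_t lru_clock = 0;

static int rnode_eq(const RelFileNodeSketch *a, const RelFileNodeSketch *b)
{
    return a->spcNode == b->spcNode && a->dbNode == b->dbNode &&
           a->relNode == b->relNode;
}

/* Record the current block of a scan on rnode; one entry per relation.
 * When the table is full, evict the least recently used entry.  The
 * real code would hold an LWLock around this whole update. */
void report_scan_position(RelFileNodeSketch rnode, uint32_t block)
{
    int victim = 0;
    for (int i = 0; i < MAX_SCAN_ENTRIES; i++) {
        if (scan_table[i].in_use && rnode_eq(&scan_table[i].rnode, &rnode)) {
            scan_table[i].last_block = block;
            scan_table[i].lru_stamp = ++lru_clock;
            return;
        }
        if (!scan_table[i].in_use)
            victim = i;  /* free slot: always preferred */
        else if (scan_table[victim].in_use &&
                 scan_table[i].lru_stamp < scan_table[victim].lru_stamp)
            victim = i;  /* older entry: better eviction candidate */
    }
    scan_table[victim] = (ScanPosEntry){ rnode, block, ++lru_clock, 1 };
}

/* Returns 1 and sets *block if some scan of rnode has reported a
 * position; a newly starting scan would join at that block. */
int lookup_scan_position(RelFileNodeSketch rnode, uint32_t *block)
{
    for (int i = 0; i < MAX_SCAN_ENTRIES; i++) {
        if (scan_table[i].in_use && rnode_eq(&scan_table[i].rnode, &rnode)) {
            *block = scan_table[i].last_block;
            return 1;
        }
    }
    return 0;
}
```

A linear scan stands in for the hash lookup here purely for brevity; with one entry per relation and a table sized near NBackends, the footprint is a few dozen bytes per entry, which matches the "a few hundred bytes won't hurt anyone" estimate above.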
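[Editor's note] Heikki's two-bullet "pack leader" idea quoted above can be illustrated with a toy simulation: ReadBuffer reports whether it did physical I/O, and a scanner whose previous read was a cache hit assumes a leader is ahead of it and hangs back briefly before reading an uncached block. Everything here is a hypothetical stand-in, not PostgreSQL's buffer manager; `read_buffer`, `scan_step`, and the counters exist only for this sketch, and the real code would sleep a few milliseconds where this sketch merely counts the wait.

```c
#include <stdbool.h>

#define NBLOCKS 16

static bool cached[NBLOCKS];   /* stand-in for the shared buffer cache */
static int  physical_reads;    /* counts simulated disk reads */
static int  waits;             /* counts the "hang back" delays */

/* Stand-in for ReadBuffer: reports via *did_io whether the block had
 * to be read from disk -- exactly the flag the proposal adds. */
static void read_buffer(int block, bool *did_io)
{
    *did_io = !cached[block];
    if (*did_io) {
        physical_reads++;
        cached[block] = true;
    }
}

/* One step of a follower-aware scan: if our previous read was a cache
 * hit, assume another backend (the pack leader) is doing the I/O, and
 * briefly wait before touching a block that is not yet cached, giving
 * the leader a chance to read it first. */
static void scan_step(int block, bool *prev_did_io)
{
    if (!*prev_did_io && !cached[block])
        waits++;   /* the real code would sleep a few milliseconds here */
    read_buffer(block, prev_did_io);
}
```

Under this scheme a backend that keeps hitting cache never issues competing I/O until it catches up to the frontier, which is the behavior the quoted message hopes will keep the pack together without any explicit leader election.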