Re: Sequential scans - Mailing list pgsql-hackers
From: Jeff Davis
Subject: Re: Sequential scans
Msg-id: 1178145421.28383.189.camel@dogma.v10.wvs
In response to: Re: Sequential scans (Heikki Linnakangas <heikki@enterprisedb.com>)
Responses: Re: Sequential scans
List: pgsql-hackers
On Wed, 2007-05-02 at 20:58 +0100, Heikki Linnakangas wrote:
> Jeff Davis wrote:
> > What should be the maximum size of this hash table?
>
> Good question. And also, how do you remove entries from it?
>
> I guess the size should somehow be related to the number of backends.
> Each backend will realistically be doing just one, or at most two, seq
> scans at a time. It also depends on the number of large tables in the
> databases, but we don't have that information easily available. How
> about using just NBackends? That should be plenty, and wasting a few
> hundred bytes of memory won't hurt anyone.

One entry per relation, not per backend, is my current design.

> I think you're going to need an LRU list and a counter of used entries
> in addition to the hash table, and when all entries are in use, remove
> the least recently used one.
>
> The thing to keep an eye on is that it doesn't add too much overhead
> or lock contention in the typical case when there are no concurrent
> scans.
>
> For the locking, use an LWLock.

Ok. What would be the potential lock contention in the case of no
concurrent scans?

Also, is it easy to determine the space used by a dynahash with N
entries? I haven't looked at the dynahash code yet, so perhaps this
will be obvious.

> No, not the segment. RelFileNode consists of the tablespace oid,
> database oid, and relation oid. You can find it in
> scan->rs_rd->rd_node. The segmentation works at a lower level.

Ok, will do.

> Hmm. Should we care then? CFQ is the default on Linux, and an average
> sysadmin is unlikely to change it.

Keep in mind that concurrent sequential scans with CFQ are *already*
very poor. I think that alone is an interesting fact that's somewhat
independent of Sync Scans.

> - when ReadBuffer is called, let the caller know if the read did
>   physical I/O.
> - when the previous ReadBuffer didn't result in physical I/O, assume
>   that we're not the pack leader. If the next buffer isn't already in
>   cache, wait a few milliseconds before initiating the read, giving
>   the pack leader a chance to do it instead.
>
> Needs testing, of course..

An interesting idea; I like it the most of the ideas for maintaining a
"pack leader". That's very similar to what the Linux anticipatory
scheduler does for us.

> >> 4. It fails regression tests. You get an assertion failure on the
> >> portal test. I believe that changing the direction of a scan isn't
> >> handled properly; it's probably pretty easy to fix.
> >
> > I will examine the code more carefully. As a first guess, is it
> > possible that the test is failing because of the non-deterministic
> > order in which tuples are returned?
>
> No, it's an assertion failure, not just different output than
> expected. But it's probably quite simple to fix..

Ok, I'll find and correct it then.

Regards,
	Jeff Davis
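[Editor's note] The design discussed above, a small shared table of scan positions keyed by relation, with LRU eviction when full, can be sketched roughly as below. This is a simplified, self-contained illustration, not PostgreSQL's actual code: the real implementation would use a dynahash table in shared memory keyed by RelFileNode, protect it with an LWLock, and size it from NBackends; the names `RelFileNodeSketch`, `ScanPosEntry`, and `MAX_SCAN_ENTRIES` are all hypothetical.

```c
#include <stdint.h>

/* Simplified stand-in for RelFileNode (tablespace oid, database oid,
 * relation oid), the key Heikki suggests: scan->rs_rd->rd_node. */
typedef struct {
    uint32_t spcNode;
    uint32_t dbNode;
    uint32_t relNode;
} RelFileNodeSketch;

#define MAX_SCAN_ENTRIES 8   /* the real design would size this from NBackends */

typedef struct {
    RelFileNodeSketch rnode;  /* which relation is being scanned */
    uint32_t last_block;      /* last page reported by a scanning backend */
    uint64_t lru_stamp;       /* larger = more recently used */
    int      in_use;
} ScanPosEntry;

static ScanPosEntry scan_table[MAX_SCAN_ENTRIES];
static uint64_t lru_clock = 0;

static int rnode_eq(const RelFileNodeSketch *a, const RelFileNodeSketch *b)
{
    return a->spcNode == b->spcNode && a->dbNode == b->dbNode &&
           a->relNode == b->relNode;
}

/* Record the current block of a scan on rnode; one entry per relation.
 * When the table is full, evict the least recently used entry.  The
 * real code would hold an LWLock around this whole update. */
void report_scan_position(RelFileNodeSketch rnode, uint32_t block)
{
    int victim = 0;
    for (int i = 0; i < MAX_SCAN_ENTRIES; i++) {
        if (scan_table[i].in_use && rnode_eq(&scan_table[i].rnode, &rnode)) {
            scan_table[i].last_block = block;
            scan_table[i].lru_stamp = ++lru_clock;
            return;
        }
        if (!scan_table[i].in_use)
            victim = i;  /* free slot: always preferred */
        else if (scan_table[victim].in_use &&
                 scan_table[i].lru_stamp < scan_table[victim].lru_stamp)
            victim = i;  /* older entry: better eviction candidate */
    }
    scan_table[victim] = (ScanPosEntry){ rnode, block, ++lru_clock, 1 };
}

/* Returns 1 and sets *block if some scan of rnode has reported a
 * position; a newly starting scan would join at that block. */
int lookup_scan_position(RelFileNodeSketch rnode, uint32_t *block)
{
    for (int i = 0; i < MAX_SCAN_ENTRIES; i++) {
        if (scan_table[i].in_use && rnode_eq(&scan_table[i].rnode, &rnode)) {
            *block = scan_table[i].last_block;
            return 1;
        }
    }
    return 0;
}
```

A linear scan stands in for the hash lookup here purely for brevity; with one entry per relation and a table sized near NBackends, the footprint is a few dozen bytes per entry, which matches the "a few hundred bytes won't hurt anyone" estimate above.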
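[Editor's note] Heikki's two-bullet "pack leader" idea quoted above can be illustrated with a toy simulation: ReadBuffer reports whether it did physical I/O, and a scanner whose previous read was a cache hit assumes a leader is ahead of it and hangs back briefly before reading an uncached block. Everything here is a hypothetical stand-in, not PostgreSQL's buffer manager; `read_buffer`, `scan_step`, and the counters exist only for this sketch, and the real code would sleep a few milliseconds where this sketch merely counts the wait.

```c
#include <stdbool.h>

#define NBLOCKS 16

static bool cached[NBLOCKS];   /* stand-in for the shared buffer cache */
static int  physical_reads;    /* counts simulated disk reads */
static int  waits;             /* counts the "hang back" delays */

/* Stand-in for ReadBuffer: reports via *did_io whether the block had
 * to be read from disk -- exactly the flag the proposal adds. */
static void read_buffer(int block, bool *did_io)
{
    *did_io = !cached[block];
    if (*did_io) {
        physical_reads++;
        cached[block] = true;
    }
}

/* One step of a follower-aware scan: if our previous read was a cache
 * hit, assume another backend (the pack leader) is doing the I/O, and
 * briefly wait before touching a block that is not yet cached, giving
 * the leader a chance to read it first. */
static void scan_step(int block, bool *prev_did_io)
{
    if (!*prev_did_io && !cached[block])
        waits++;   /* the real code would sleep a few milliseconds here */
    read_buffer(block, prev_did_io);
}
```

Under this scheme a backend that keeps hitting cache never issues competing I/O until it catches up to the frontier, which is the behavior the quoted message hopes will keep the pack together without any explicit leader election.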