Re: Sequential scans - Mailing list pgsql-hackers
From | Jeff Davis |
---|---|
Subject | Re: Sequential scans |
Date | |
Msg-id | 1178214849.28383.264.camel@dogma.v10.wvs |
In response to | Re: Sequential scans (Heikki Linnakangas <heikki@enterprisedb.com>) |
Responses | Re: Sequential scans |
List | pgsql-hackers |
On Thu, 2007-05-03 at 09:25 +0100, Heikki Linnakangas wrote:
> The hash table keeps track of ongoing seq scans. That's presumably
> related to number of backends; I can't imagine a plan where a backend is
> executing more than one seq scan at a time, so that gives us an upper
> bound of NBackends simultaneous seq scans in the system.
>

What I was trying to say before is that, in my design, it keeps track of
relations being scanned, not scans happening. So 5 backends scanning the
same table would result in one entry that consists of the most
recently-read block by any of those backends for that relation.

Part of the reason I did this is because, with a Sync Scan, it seems like
there may be possible reasons to do multiple seq scans concurrently in
the same backend. This is completely contrived, but consider:

SELECT * FROM bigtable UNION ALL SELECT * FROM bigtable;

I'm not trying to argue for such a plan to exist right now, but if there
is a possibility that we'd want to create such plans in the future, I
think it's better to track by relation rather than by backend.

There is really no reason that I can see to track 5 positions of 5
different backends scanning the same relation. It introduces a lot of new
questions, such as:

(1) When starting a new scan on relation R, with which backend do we
choose to synchronize?

(2) If 3/5 of those scans have finished (and there is therefore no point
in synchronizing with those scans), how do we find the two that are still
moving?

I think storing one entry per relation being scanned, regardless of the
number of backends, nicely avoids both of those questions.

> > What if I just make an LRU that occupies a fixed size? Any reads or
> > updates can start at the MRU item and work backward, and any evictions
> > can happen at the LRU item.
> >
> > Does a hash table really save us cycles if we keep the list small,
> > say, 100 items? Looking at all 100 items would only be required
> > perhaps on a scan startup. I don't have a good sense of scale here;
> > is following 50 pointers while holding a lock a significant source
> > of contention?
>
> Hmm. If you only need to scan through it when you start the scan, I
> suppose it would be ok.

The idea is that the list would never grow longer than the total number
of tables that are larger than sync_seqscan_threshold, and it would have
some maximum size so that it fits in fixed-size shared memory.

The list would be a bi-directional LRU list (MRU at the head, LRU at the
tail). When evicting a relation from the list due to space exhaustion, it
would delete the tail of the list.

When a new scan starts and the relation is already in the list, the scan
would most likely only need to read a few of the hot elements at the head
of the list before finding the relation in question. When the relation is
not in the list, the scan may need to read all the way through to see
that it's not there.

> > 100 is of course arbitrary, and that could grow or shrink
> > automatically at runtime.
>
> Except that it can't shrink, because we don't have any convenient moment
> to remove an entry from the list, other than dropping the least recently
> used entry out of the list when inserting a new one. And since it's in
> shared mem, the allocated size is fixed at boot up, so we can't grow it
> either.
>

It can shrink when a new scan looks through the list and passes by
entries that don't need to be in the list.

I don't want to over-complicate things, so if you think having a hash
table is a better, simpler, or more readable design, I'll go with that
instead of the list-only approach.
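To make the list-only idea concrete, here is a minimal sketch in plain C.
The names (ScanPosition, sync_scan_report_location, SYNC_SCAN_MAX_ENTRIES)
are made up for illustration and are not from any patch; locking, the
shared-memory allocation, and the shrink-on-lookup behavior are all left
out, and a static array stands in for the fixed-size segment:

/*
 * Sketch of the list-only approach: one entry per relation (not per scan),
 * kept in MRU order in a doubly-linked list backed by a fixed-size array
 * (standing in for fixed-size shared memory).  Locking is omitted.
 */
typedef unsigned int Oid;          /* relation being scanned */
typedef unsigned int BlockNumber;  /* most recently read block */

#define SYNC_SCAN_MAX_ENTRIES 100  /* arbitrary fixed cap */
#define NO_ENTRY (-1)

typedef struct ScanPosition
{
    Oid         relid;      /* which relation */
    BlockNumber lastblock;  /* last block read by any backend */
    int         prev;       /* toward the MRU head */
    int         next;       /* toward the LRU tail */
    int         used;
} ScanPosition;

static ScanPosition entries[SYNC_SCAN_MAX_ENTRIES];
static int head = NO_ENTRY;     /* MRU end */
static int tail = NO_ENTRY;     /* LRU end */

static void
unlink_entry(int i)
{
    if (entries[i].prev != NO_ENTRY)
        entries[entries[i].prev].next = entries[i].next;
    else
        head = entries[i].next;
    if (entries[i].next != NO_ENTRY)
        entries[entries[i].next].prev = entries[i].prev;
    else
        tail = entries[i].prev;
}

static void
push_head(int i)
{
    entries[i].prev = NO_ENTRY;
    entries[i].next = head;
    if (head != NO_ENTRY)
        entries[head].prev = i;
    head = i;
    if (tail == NO_ENTRY)
        tail = i;
}

/*
 * A backend calls this after reading 'block' of relation 'relid'.  A scan
 * that is just starting would do the same head-first walk, but only to
 * read 'lastblock' and begin scanning from there.  Because the walk starts
 * at the MRU head, a relation that is actively being scanned is normally
 * found within the first few entries.
 */
void
sync_scan_report_location(Oid relid, BlockNumber block)
{
    int i;

    for (i = head; i != NO_ENTRY; i = entries[i].next)
    {
        if (entries[i].relid == relid)
        {
            entries[i].lastblock = block;
            unlink_entry(i);
            push_head(i);       /* keep hot relations near the head */
            return;
        }
    }

    /* Not present: take a free slot, or evict the LRU entry at the tail. */
    for (i = 0; i < SYNC_SCAN_MAX_ENTRIES && entries[i].used; i++)
        ;
    if (i == SYNC_SCAN_MAX_ENTRIES)
    {
        i = tail;
        unlink_entry(i);
    }
    entries[i].used = 1;
    entries[i].relid = relid;
    entries[i].lastblock = block;
    push_head(i);
}

The point of the MRU ordering is that the common case, an ongoing scan
reporting its position, touches only the first entry or two; the full walk
is only needed when a scan starts on a relation that may not be in the
list at all.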
The only reason I suggested the list-only approach was to see if the hash
table was really gaining anything for us. In practice, it will be quite
rare that more than 10 different big relations are being scanned at once,
and I don't know how much of a gain a hash table is when there are (in
all but exceptional cases) only a few entries.

Regards,
	Jeff Davis