Re: index prefetching - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: index prefetching
Msg-id: e00dac57-0181-4d9c-9eae-71346383e69b@enterprisedb.com
In response to: Re: index prefetching (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Hi,

Here's a somewhat reworked version of the patch. My initial goal was to see if it could adopt the StreamingRead API proposed in [1], but that turned out to be less straightforward than I hoped, for two reasons:

(1) The StreamingRead API seems to be designed for pages, but the index code naturally works with TIDs/tuples. Yes, the callbacks can associate the blocks with custom data (in this case that'd be the TID), but it seemed a bit strange ...

(2) The place adding requests to the StreamingRead queue is pretty far from the place actually reading the pages - for prefetching, the requests would be generated in nodeIndexscan, but the page reading happens somewhere deep in index_fetch_heap/heapam_index_fetch_tuple. Sure, the TIDs would come from a callback, so it's a bit as if the requests were generated in heapam_index_fetch_tuple - but that code has no idea StreamingRead exists, so where would it get the queue? We might teach it about it, but what if there are multiple places calling index_fetch_heap(), not all of which use StreamingRead (only index scans would)? Or if there are multiple index scans, there'd need to be separate StreamingRead queues, right?

In any case, I felt a bit out of my depth here, and I chose not to do all this work without discussing the direction first. (Also, see the point about cursors and xs_heap_continue a bit later in this post.)

I did, however, like the general StreamingRead API - how it splits the work between the API and the callback. The patch used to do everything itself, which meant it hard-coded a lot of the IOS-specific logic etc. I did plan to have some sort of "callback" for reading from the queue, but that didn't quite solve the issue - a lot of the stuff remained hard-coded. The StreamingRead API made me realize that having a callback for the first phase (the one that adds requests to the queue) would fix that. So I did that - there's now one simple callback for index scans, and a slightly more complex callback for index-only scans.
Thanks to this, the hard-coded stuff mostly disappears, which is good.

Perhaps a bigger change is that I decided to move this into a separate API on top of indexam.c. The original idea was to integrate this into index_getnext_tid/index_getnext_slot, so that all callers benefit from the prefetching automatically. That would be nice, but it also meant it'd need to happen in the indexam.c code, which seemed dirty.

This patch introduces an API similar to StreamingRead. It calls the indexam.c stuff, but does all the prefetching on top of it, not in it. If a place calling index_getnext_tid() wants prefetching, it needs to switch to IndexPrefetchNext(). (There's no function that would replace index_getnext_slot at the moment. Maybe there should be.)

Note 1: The IndexPrefetch name is a bit misleading, because it's used even with prefetching disabled - all index reads from the index scan happen through it. Maybe it should be called IndexReader or something like that.

Note 2: I left the code in indexam.c for now, but in principle it could (should) be moved to a different place.

I think this layering makes sense, and it's probably much closer to what Andres meant when he said the prefetching should happen in the executor. Even if the patch ends up using StreamingRead in the future, I guess we'll want something like IndexPrefetch - it might use StreamingRead internally, but it would still need to do some custom stuff, like detecting I/O patterns, that does not quite fit into StreamingRead.

Now, let's talk about two (mostly unrelated) problems I ran into.

Firstly, I realized there's a bit of a problem with cursors. The prefetching works like this:

1) reading TIDs from the index
2) stashing them into a queue in IndexPrefetch
3) doing prefetches for the new TIDs added to the queue
4) returning the TIDs to the caller, one by one

And all of this works ... unless the direction of the scan changes.
Which for cursors can happen if someone does FETCH BACKWARD or something like that. I'm not sure how difficult it'd be to make this work. I suppose we could simply discard the prefetched entries and step the index scan back the right number of steps, but I haven't tried, and maybe it's more complex than I'm imagining. Also, if the cursor changes direction a lot, the prefetching would be actively harmful.

The patch simply disables prefetching for such queries, using the same logic we use for parallelism. This may be over-zealous.

FWIW this is one of the things that probably should remain outside the StreamingRead API - it seems pretty index-specific, and I'm not sure we'd even want to support these "backward" movements in that API.

The other issue I'm aware of is handling xs_heap_continue. I believe it works fine for "false", but I need to take a look at non-MVCC snapshots (i.e. when xs_heap_continue=true).

I haven't done any benchmarks with this reworked API - there are a couple more allocations etc., but nothing changed in a fundamental way, so I don't expect any major difference.

regards

[1] https://www.postgresql.org/message-id/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company