Cache relation sizes? - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Cache relation sizes? |
Date | |
Msg-id | CAEepm=3SSw-Ty1DFcK=1rU-K6GSzYzfdD4d+ZwapdN7dTa6=nQ@mail.gmail.com |
Responses | Re: Cache relation sizes?; Re: Cache relation sizes? |
List | pgsql-hackers |
Hello,

PostgreSQL likes to probe the size of relations with lseek(SEEK_END) a lot. For example, a fully prewarmed pgbench -S transaction consists of recvfrom(), lseek(SEEK_END), lseek(SEEK_END), sendto(). I think lseek() is probably about as cheap as a syscall can be, so I doubt it really costs us much, but it's still a context switch and it stands out when tracing syscalls, especially now that all the lseek(SEEK_SET) calls are gone (commit c24dcd0cfd).

If we had a different buffer mapping system of the sort that Andres has described, there might be a place in shared memory that could track the size of the relation. Even if we could do that, I wonder if it would still be better to do a kind of per-backend lock-free caching, like this:

1. Whenever a file size has been changed by extending or truncating (i.e. immediately after the syscall), bump a shared "size_change" invalidation counter.

2. Somewhere in SMgrRelation, store the last known size_change counter and the last known size. In _mdnblocks(), if the counter hasn't moved, we can use the cached size and skip the call to FileSize().

3. To minimise false cache invalidations (caused by other relations' size changes), instead of using a single size_change counter in shared memory, use an array of N counters and map relation OIDs onto them.

4. As for memory coherency, I think it might be enough to use uint32 without locks or read barriers on the read side, since you have a view of memory at least as new as your snapshot (the taking of which included a memory barrier). That's good enough because we don't need to see blocks added after our snapshot was taken (the same assumption applies today; this just takes further advantage of it), and truncations can't happen while we have a share lock on the relation (the taking of which also includes a memory barrier, covering the case where the truncation happened after our snapshot was taken but before we acquired the share lock on the relation).
In other words, there is heavy locking around truncation already, and for extension we don't care about recent extensions, so we can be quite relaxed about memory. Right?

I don't have a patch for this (though I did once try it as a throwaway hack and it seemed to work), but I just wanted to share the idea and see if anyone sees a problem with the logic/interlocking, or has a better idea for how to do this. It occurred to me that I might be missing something, or this would have been done already...

-- 
Thomas Munro
http://www.enterprisedb.com