Re: Unhappy about API changes in the no-fsm-for-small-rels patch - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Unhappy about API changes in the no-fsm-for-small-rels patch |
Date | |
Msg-id | CA+TgmobgKtO8MwdwY5tp_Sqr8OZ_s+DX1OhMXFM+eyz77mCDKg@mail.gmail.com |
In response to | Re: Unhappy about API changes in the no-fsm-for-small-rels patch (Andres Freund <andres@anarazel.de>) |
Responses | Re: Unhappy about API changes in the no-fsm-for-small-rels patch; Re: Unhappy about API changes in the no-fsm-for-small-rels patch |
List | pgsql-hackers |
On Mon, May 6, 2019 at 11:27 AM Andres Freund <andres@anarazel.de> wrote:
> > I think it's legitimate to question whether sending additional
> > invalidation messages as part of the design of this feature is a good
> > idea. If it happens frequently, it could trigger expensive sinval
> > resets more often. I don't understand the various proposals well
> > enough to know whether that's really a problem, but if you've got a
> > lot of relations for which this optimization is in use, I'm not sure I
> > see why it couldn't be.
>
> I don't think it's an actual problem. We'd only do so when creating an
> FSM, or when freeing up additional space that'd otherwise not be visible
> to other backends. The alternative to sinval would thus be a) not
> discovering there's free space and extending the relation b) checking
> disk state for a new FSM all the time. Which are much more expensive.

None of that addresses the question of the distributed cost of sending
more sinval messages. If you have a million little tiny relations and
VACUUM goes through and clears one tuple out of each one, it will be
spewing sinval messages really, really fast. How can that fail to
threaten extra sinval resets?

> > I think at some point it was proposed that, since an FSM access
> > involves touching 3 blocks, it ought to be fine for any relation of 4
> > or fewer blocks to just check all the others. I don't really
> > understand why we drifted off that design principle, because it seems
> > like a reasonable theory. Such an approach doesn't require anything
> > in the relcache, any global variables, or an every-other-page
> > algorithm.
>
> It's not that cheap to touch three heap blocks every time a new target
> page is needed. Requires determining at least the target relation size
> or the existence of the FSM fork.
>
> We'll also commonly *not* end up touching 3 blocks in the FSM -
> especially when there's actually no free space. And the FSM contents are
> much less contended than the heap pages - the hot paths don't update the
> FSM, and if so, the exclusive locks are held for a very short time only.

Well, that seems like an argument that we just shouldn't do this at all.
If the FSM is worthless for small relations, then eliding it makes sense.
But if having it is valuable even when the relation is tiny, then eliding
it is the wrong thing to do, isn't it? The underlying concerns that
prompted this patch probably have to do with either (1) not wanting to
have so many FSM forks on disk or (2) not wanting to consume 24kB of
space to track free space for a relation that may be only 8kB. I think
those goals are valid, but if we accept your argument then this is the
wrong way to achieve them.

I do find it a bit surprising that touching heap pages would be all that
much more expensive than touching FSM pages, but that doesn't mean that
it isn't the case.

I would also note that this algorithm ought to beat the FSM algorithm in
many cases where there IS space available, because you'll often find some
usable free space on the very first page you try, which will never happen
with the FSM.
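To make that concrete, a minimal sketch of the check-every-page idea might
look something like the following. This is an illustration only, not code
from the patch; the function name, the 4-block cutoff constant, and the
locking details are illustrative assumptions rather than anything taken
from the actual proposal:

/*
 * Illustrative sketch only -- not the patch's actual code.  For a heap of
 * four or fewer blocks, probe each page directly instead of consulting
 * the FSM, and return the first block with enough free space.
 */
#include "postgres.h"

#include "storage/bufmgr.h"
#include "storage/bufpage.h"
#include "utils/rel.h"

#define SMALL_REL_THRESHOLD 4    /* assumed cutoff, per the discussion */

static BlockNumber
small_rel_find_free_block(Relation rel, Size spaceNeeded)
{
    BlockNumber nblocks = RelationGetNumberOfBlocks(rel);
    BlockNumber blkno;

    Assert(nblocks <= SMALL_REL_THRESHOLD);

    for (blkno = 0; blkno < nblocks; blkno++)
    {
        Buffer      buf = ReadBuffer(rel, blkno);
        Size        freespace;

        /* A share lock suffices just to look at the page's free space. */
        LockBuffer(buf, BUFFER_LOCK_SHARE);
        freespace = PageGetHeapFreeSpace(BufferGetPage(buf));
        UnlockReleaseBuffer(buf);

        if (freespace >= spaceNeeded)
            return blkno;        /* caller re-verifies under exclusive lock */
    }

    return InvalidBlockNumber;   /* all pages full: caller extends the heap */
}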
The case where the pages are all full doesn't seem very important,
because I don't see how you can stay in that situation for all that
long. Each time it happens, the relation grows by a block immediately
afterwards, and once it hits 5 blocks, it never happens again. I guess
you could incur the overhead repeatedly if the relation starts out at 1
block, grows to 4, is vacuumed back down to 1, lather, rinse, repeat,
but is that actually realistic? It requires all the live tuples to live
in block 0 at the beginning of each vacuum cycle, which seems like a
fringe outcome.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company