Thread: SLRU limits
While reviewing the SLRU code in predicate.c again, I remembered this old thread: http://archives.postgresql.org/pgsql-hackers/2010-12/msg02374.php SLRU has a limit of 64k segment files, because the files are named using four hex digits like "00CE". Kevin's math shows that that's just enough to store 2^32 four-byte integers, which wasn't enough for predicate.c, which needs to store uint64s. Kevin worked around that by simply limiting the max range of open xids to fit the SLRU limit, ie. 2^31. However, that math was based on 8k block size, and the situation is worse for smaller block sizes. If you set BLCKSZ to 2048 or less, pg_subtrans can only hold 1 billion transactions. With 1024 block size, only half a billion. It's awfully late in the release cycle, but how about we add another digit to the filenames used by SLRU, to up the limit? At a quick glance, I don't see any protection against wrapping around page numbers in subtrans.c, so that ought to be fixed somehow. And it would make the SLRU code in predicate.c simpler (I note that the warning logic at least is wrong as it is - it doesn't take XID wrap-around into account). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, Jun 9, 2011 at 7:46 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > While reviewing the SLRU code in predicate.c again, I remembered this old > thread: > > http://archives.postgresql.org/pgsql-hackers/2010-12/msg02374.php > > SLRU has a limit of 64k segment files, because the files are named using > four hex digits like "00CE". Kevin's math shows that that's just enough to > store 2^32 four-byte integers, which wasn't enough for predicate.c, which > needs to store uint64s. Kevin worked around that by simply limiting the max > range of open xids to fit the SLRU limit, ie. 2^31. However, that math was > based on 8k block size, and the situation is worse for smaller block sizes. > If you set BLCKSZ to 2048 or less, pg_subtrans can only hold 1 billion > transactions. With 1024 block size, only half a billion. I'm pretty unexcited about this. It's not terribly sane to keep a transaction open for half a billion XIDs anyway, because of VACUUM. And I would guess that there's a lot more interest in raising BLCKSZ than lowering it. It might not be a bad idea to adopt the fix you propose anyway, but it doesn't seem urgent. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > It's awfully late in the release cycle, but how about we add another > digit to the filenames used by SLRU, to up the limit? It's way too late for that kind of thing, unless you are saying that SSI in and of itself is going to cause a release slip. (Which I'm getting the uncomfortable feeling may be true anyway.) That is not a one-line kind of fix --- it is likely to have a lot of unforeseen consequences. And who builds with smaller-than-default BLCKSZ anyway? regards, tom lane
On 09.06.2011 15:50, Robert Haas wrote: > On Thu, Jun 9, 2011 at 7:46 AM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> While reviewing the SLRU code in predicate.c again, I remembered this old >> thread: >> >> http://archives.postgresql.org/pgsql-hackers/2010-12/msg02374.php >> >> SLRU has a limit of 64k segment files, because the files are named using >> four hex digits like "00CE". Kevin's math shows that that's just enough to >> store 2^32 four-byte integers, which wasn't enough for predicate.c, which >> needs to store uint64s. Kevin worked around that by simply limiting the max >> range of open xids to fit the SLRU limit, ie. 2^31. However, that math was >> based on 8k block size, and the situation is worse for smaller block sizes. >> If you set BLCKSZ to 2048 or less, pg_subtrans can only hold 1 billion >> transactions. With 1024 block size, only half a billion. > > I'm pretty unexcited about this. It's not terribly sane to keep a > transaction open for half a billion XIDs anyway, because of VACUUM. I agree, but if you or your application is insane enough to do that anyway, it's not appropriate response for the system to give warnings about apparent XID wrap-around, and in the worst case mix up subtransactions belonging to different transactions, possibly leading to data loss. I have not actually tested that, but I don't see any safeguards to stop it from happening. Of course, another alternative is to add such safeguards. > And I would guess that there's a lot more interest in raising BLCKSZ > than lowering it. It might not be a bad idea to adopt the fix you > propose anyway, but it doesn't seem urgent. I guess we could fix pg_subtrans by not allowing BLCKSZ < 8k. That leaves the problem with pg_serial. Kevin has already worked around, but I'm not very happy with that workaround. If we don't want to change it wholesale, one option would be to support different lengths of filenames in slru.c for different slrus. At a quick glance, it seems pretty easy. That would allow keeping clog unchanged - that's the one that's most likely to have unforeseen consequences if changed. pg_subtrans and pg_serial are more ephemeral, they don't need to be retained over shutdown, so they seem less likely to cause trouble. That seems like the best option to me. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > On 09.06.2011 15:50, Robert Haas wrote: >> And I would guess that there's a lot more interest in raising BLCKSZ >> than lowering it. It might not be a bad idea to adopt the fix you >> propose anyway, but it doesn't seem urgent. > I guess we could fix pg_subtrans by not allowing BLCKSZ < 8k. That > leaves the problem with pg_serial. Kevin has already worked around, but > I'm not very happy with that workaround. > If we don't want to change it wholesale, one option would be to support > different lengths of filenames in slru.c for different slrus. At a quick > glance, it seems pretty easy. That would allow keeping clog unchanged - > that's the one that's most likely to have unforeseen consequences if > changed. pg_subtrans and pg_serial are more ephemeral, they don't need > to be retained over shutdown, so they seem less likely to cause trouble. > That seems like the best option to me. I agree with Robert that this is completely not urgent. If you want to fool with it for 9.2, fine, but let's not destabilize 9.1 for it. (BTW, while I've not looked at the SLRU code in several years, I'm quite unconvinced that this is only a matter of filename lengths.) regards, tom lane
On 09.06.2011 19:24, Tom Lane wrote: > (BTW, while I've not looked at the SLRU code in several years, I'm quite > unconvinced that this is only a matter of filename lengths.) I don't see anything but the filename length needing adjustment. In fact, when I hacked predicate.c to ignore the issue and use too high page numbers, I ended up with files like "100AB" in the directory. So slru.c goes merrily above the "limit", it just won't recognize them when it's time to truncate the slru because of the unexpected length, and will not clean them up. (perhaps not worth risking it in 9.1 anyway, though..) -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Is this a TODO? --------------------------------------------------------------------------- Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > > On 09.06.2011 15:50, Robert Haas wrote: > >> And I would guess that there's a lot more interest in raising BLCKSZ > >> than lowering it. It might not be a bad idea to adopt the fix you > >> propose anyway, but it doesn't seem urgent. > > > I guess we could fix pg_subtrans by not allowing BLCKSZ < 8k. That > > leaves the problem with pg_serial. Kevin has already worked around, but > > I'm not very happy with that workaround. > > > If we don't want to change it wholesale, one option would be to support > > different lengths of filenames in slru.c for different slrus. At a quick > > glance, it seems pretty easy. That would allow keeping clog unchanged - > > that's the one that's most likely to have unforeseen consequences if > > changed. pg_subtrans and pg_serial are more ephemeral, they don't need > > to be retained over shutdown, so they seem less likely to cause trouble. > > That seems like the best option to me. > > I agree with Robert that this is completely not urgent. If you want to > fool with it for 9.2, fine, but let's not destabilize 9.1 for it. > > (BTW, while I've not looked at the SLRU code in several years, I'm quite > unconvinced that this is only a matter of filename lengths.) > > regards, tom lane > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Bruce Momjian <bruce@momjian.us> wrote: > Is this a TODO? > Tom Lane wrote: >> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >>> If we don't want to change it wholesale, one option would be to >>> support different lengths of filenames in slru.c for different >>> slrus. At a quick glance, it seems pretty easy. That would allow >>> keeping clog unchanged - that's the one that's most likely to >>> have unforeseen consequences if changed. pg_subtrans and >>> pg_serial are more ephemeral, they don't need to be retained >>> over shutdown, so they seem less likely to cause trouble. That >>> seems like the best option to me. >> >> I agree with Robert that this is completely not urgent. If you >> want to fool with it for 9.2, fine, but let's not destabilize 9.1 >> for it. >> >> (BTW, while I've not looked at the SLRU code in several years, >> I'm quite unconvinced that this is only a matter of filename >> lengths.) I think it should be. I agree that the workaround is kinda ugly, and this is really the only clean solution I see. I don't think it's so ugly that it takes precedence over trying to bring some of Robert's LW locking improvements into the SSI area, so this one is likely not to get into 9.2, in spite of appearing to be a smaller change. So a TODO entry seems like the right way to deal with it. -Kevin