Thread: FSM rewrite committed, loose ends
I finally committed the new version of the FSM rewrite, after fixing the latest round of Tom's comments. There's a few loose ends I'm going to address as subsequent patches: pg_relation_size() doesn't include the size of the FSM. Should it? I'm thinking "no", but pg_total_relation_size() should. The FSM is not updated during WAL replay. That means that after crash recovery, the FSM won't be completely up-to-date, but at roughly the state it was at last checkpoint. In a warm stand-by, the FSM will reflect the situation at last full backup. We need to think when the FSM should be updated during WAL replay. Probably not after every record, because of the overhead, but certainly more often than never. VACUUM VERBOSE output no longer prints the number of pages with "usable free space", because we no longer track such a value during the vacuum. You can use contrib/pg_freespacemap to view the contents of the FSM in detail, but should VACUUM VERBOSE still print something about the amount of free space on the relation? Perhaps the total amount of free space in the relation? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, Sep 30, 2008 at 4:32 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
No
+1
I haven't seen the code, but would it be possible to have a resource manager for FSM, and let that rmgr accumulate the FSM updates in memory and then flush the changes after a threshold (or at Checkpoint record, whichever is first).
I vote for contrib/pg_freespacemap functions to be included in the core since FSM is in core. If that happens, we wouldn't need VACUUM for reporting this.
Best regards,
--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
EnterpriseDB http://www.enterprisedb.com
Mail sent from my BlackLaptop device
pg_relation_size() doesn't include the size of the FSM. Should it? I'm thinking "no",
No
but pg_total_relation_size() should.
+1
The FSM is not updated during WAL replay. That means that after crash recovery, the FSM won't be completely up-to-date, but at roughly the state it was at last checkpoint. In a warm stand-by, the FSM will reflect the situation at last full backup. We need to think when the FSM should be updated during WAL replay. Probably not after every record, because of the overhead, but certainly more often than never.
I haven't seen the code, but would it be possible to have a resource manager for FSM, and let that rmgr accumulate the FSM updates in memory and then flush the changes after a threshold (or at Checkpoint record, whichever is first).
VACUUM VERBOSE output no longer prints the number of pages with "usable free space", because we no longer track such a value during the vacuum. You can use contrib/pg_freespacemap to view the contents of the FSM in detail, but should VACUUM VERBOSE still print something about the amount of free space on the relation? Perhaps the total amount of free space in the relation?
I vote for contrib/pg_freespacemap functions to be included in the core since FSM is in core. If that happens, we wouldn't need VACUUM for reporting this.
Best regards,
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
EnterpriseDB http://www.enterprisedb.com
Mail sent from my BlackLaptop device
"Gurjeet Singh" <singh.gurjeet@gmail.com> writes: > I vote for contrib/pg_freespacemap functions to be included in the core > since FSM is in core. The old FSM was in core, too. That's not a helpful argument. regards, tom lane
Hi, Le mardi 30 septembre 2008, Heikki Linnakangas a écrit : > pg_relation_size() doesn't include the size of the FSM. Should it? I'm > thinking "no", but pg_total_relation_size() should. What's practical about pg_relation_size() and pg_total_relation_size() as of 8.3 is that the diff is the cumulated indexes storage volume. Your proposal makes it harder to get this information, but sounds good otherwise. Would it be possible to add in some new APIs to?a. pg_relation_size()b. pg_relation_fsm_size()c. pg_relation_indexes_size()d.pg_total_relation_size() = a + b + c This scheme will need new functions for each new kind of forks, but I think it's a good idea for the user to be able to know which fork is responsible for what on-disk volume. > VACUUM VERBOSE output no longer prints the number of pages with "usable > free space", because we no longer track such a value during the vacuum. > You can use contrib/pg_freespacemap to view the contents of the FSM in > detail, but should VACUUM VERBOSE still print something about the amount > of free space on the relation? Perhaps the total amount of free space in > the relation? What about another function to get just this information?e. pg_relation_free_space() Question for the slow readers: this new FSM scheme being dynamic, it's no longer possible to have table bloat, right? (where table bloat is full of dead-for-any-transaction tuples, and you have to CLUSTER or VACUUM FULL to be able to reuse the space it takes) Regards, keep up the good (team) work :) -- dim
On Tue, Sep 30, 2008 at 6:09 PM, Dimitri Fontaine <dfontaine@hi-media.com> wrote:
You forgot the toast size.
Best regards,
--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
EnterpriseDB http://www.enterprisedb.com
Mail sent from my BlackLaptop device
Hi,
Le mardi 30 septembre 2008, Heikki Linnakangas a écrit :> pg_relation_size() doesn't include the size of the FSM. Should it? I'mWhat's practical about pg_relation_size() and pg_total_relation_size() as of
> thinking "no", but pg_total_relation_size() should.
8.3 is that the diff is the cumulated indexes storage volume. Your proposal
makes it harder to get this information, but sounds good otherwise.
Would it be possible to add in some new APIs to?
a. pg_relation_size()
b. pg_relation_fsm_size()
c. pg_relation_indexes_size()
d. pg_total_relation_size() = a + b + c
You forgot the toast size.
Best regards,
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
EnterpriseDB http://www.enterprisedb.com
Mail sent from my BlackLaptop device
Dimitri Fontaine wrote: > Question for the slow readers: this new FSM scheme being dynamic, it's no > longer possible to have table bloat, right? > (where table bloat is full of dead-for-any-transaction tuples, and you have to > CLUSTER or VACUUM FULL to be able to reuse the space it takes) No, it didn't change that. Regular VACUUMing or autovacuum is still needed. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas napsal(a): > > The FSM is not updated during WAL replay. That means that after crash > recovery, the FSM won't be completely up-to-date, but at roughly the > state it was at last checkpoint. In a warm stand-by, the FSM will > reflect the situation at last full backup. We need to think when the FSM > should be updated during WAL replay. Probably not after every record, > because of the overhead, but certainly more often than never. > What's about after a page write during a WAL replay? Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql
Le mardi 30 septembre 2008, Heikki Linnakangas a écrit : > Dimitri Fontaine wrote: > > Question for the slow readers: this new FSM scheme being dynamic, it's no > > longer possible to have table bloat, right? > > (where table bloat is full of dead-for-any-transaction tuples, and you > > have to CLUSTER or VACUUM FULL to be able to reuse the space it takes) > > No, it didn't change that. Regular VACUUMing or autovacuum is still needed. But IIUC correctly it's no longer possible for PostgreSQL to forget about where bloat is, so regular vacuuming will know how clean out any bloat any time? For example, a purge script which does a large DELETE could overrun the max_fsm_pages setting, resulting in permanent bloat (until table rewrite). This large DELETE will now be tracked completely by the new FSM, so autovacuum will be able to have the space reused later? -- dim
Gurjeet Singh wrote: > On Tue, Sep 30, 2008 at 6:09 PM, Dimitri Fontaine <dfontaine@hi-media.com>wrote: >> What's practical about pg_relation_size() and pg_total_relation_size() as >> of >> 8.3 is that the diff is the cumulated indexes storage volume. Your proposal >> makes it harder to get this information, but sounds good otherwise. >> Would it be possible to add in some new APIs to? >> a. pg_relation_size() >> b. pg_relation_fsm_size() >> c. pg_relation_indexes_size() >> d. pg_total_relation_size() = a + b + c > > You forgot the toast size. Yeah, pg_total_relation_size() - pg_relation_size() is not equal to the total size of indexes because of that. But you can do SUM(pg_relation_size(index)) across all the indexes for that: SELECT SUM(pg_relation_size(i.oid)) FROM pg_index x JOIN pg_class c ON c.oid = x.indrelid JOIN pg_class i ON i.oid= x.indexrelid WHERE i.relkind = 'i'::"char" AND c.relname='foo'; -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Dimitri Fontaine wrote: > Le mardi 30 septembre 2008, Heikki Linnakangas a écrit : >> Dimitri Fontaine wrote: >>> Question for the slow readers: this new FSM scheme being dynamic, it's no >>> longer possible to have table bloat, right? >>> (where table bloat is full of dead-for-any-transaction tuples, and you >>> have to CLUSTER or VACUUM FULL to be able to reuse the space it takes) >> No, it didn't change that. Regular VACUUMing or autovacuum is still needed. > > But IIUC correctly it's no longer possible for PostgreSQL to forget about > where bloat is, so regular vacuuming will know how clean out any bloat any > time? That has never been a problem. VACUUM has always found and removed all dead tuples. > For example, a purge script which does a large DELETE could overrun the > max_fsm_pages setting, resulting in permanent bloat (until table rewrite). > This large DELETE will now be tracked completely by the new FSM, so > autovacuum will be able to have the space reused later? It isn't really permanent. If you increase max_fsm_pages, restart and vacuum again, vacuum will pick up all the empty pages. But yes, the new FSM is always large enough to hold information about all pages in the relation, so you can no longer get table bloat because of a too small max_fsm_pages or max_fsm_relations setting. You can still get table bloat because of other reasons, like not vacuuming often enough. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Dimitri Fontaine <dfontaine@hi-media.com> writes: > Question for the slow readers: this new FSM scheme being dynamic, it's no > longer possible to have table bloat, right? > (where table bloat is full of dead-for-any-transaction tuples, and you have to > CLUSTER or VACUUM FULL to be able to reuse the space it takes) What it does mean is that after a plain old VACUUM *all* the dead space in the table will be noted and available for reuse. There isn't the risk that your max_fsm_pages is too small and some of it gets forgotten. The two headline numbers are percentage (and # bytes) of reusable free space and percentage of non-reusable free space. The former being the thing to watch to make sure you're vacuuming frequently enough and the latter being the thing to watch to understand the impact of long-running transactions. I don't think we really need to worry about the number of pages, or the "usable" in the sense of "larger than the average allocation size" measurements. They're both interesting but not as critical as the bottom-line number which is how much of the table is being occupied by dead space. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!
Le mardi 30 septembre 2008, Heikki Linnakangas a écrit : > > You forgot the toast size. > > Yeah, pg_total_relation_size() - pg_relation_size() is not equal to the > total size of indexes because of that. Oops. Thanks for pointing this to me... > But you can do SUM(pg_relation_size(index)) across all the indexes for > that: For convenience, would it be possible to see about having all this provided by PostgreSQL?a. pg_relation_size()b. pg_relation_toast_size()c. pg_relation_fsm_size()d. pg_relation_indexes_size()e. pg_total_relation_size()= a + b + c + d Are there some other things to add in the mix? Maybe I'm the only one with the need for some simple functions covering all the base, but I thought I'd ask nevertheless :) Regards, -- dim
Dimitri Fontaine wrote: > Le mardi 30 septembre 2008, Heikki Linnakangas a écrit : >>> You forgot the toast size. >> Yeah, pg_total_relation_size() - pg_relation_size() is not equal to the >> total size of indexes because of that. > > Oops. Thanks for pointing this to me... > >> But you can do SUM(pg_relation_size(index)) across all the indexes for >> that: > > For convenience, would it be possible to see about having all this provided by > PostgreSQL? > a. pg_relation_size() > b. pg_relation_toast_size() > c. pg_relation_fsm_size() > d. pg_relation_indexes_size() > e. pg_total_relation_size() = a + b + c + d > > Are there some other things to add in the mix? Should pg_relation_indexes_size() include the FSMs of the indexes? Should pg_relation_toast_size() include the toast index and FSM as well? I fear we're going to end up with quite a lot of different combinations if we go down this path. I think we should just provide the building blocks, pg_relation_size() and pg_relation_fsm_size(), and the one total function pg_total_relation_size() that includes everything. If you're interested in the other combinations, you can call pg_relation_size() over all indexes, toast table etc. and sum them as you wish. We'll also need to consider how the scheme scales when we add more relation forks. I'm still hoping to get the DSM into this release. Perhaps we should provide an overloaded version of pg_relation_size() that takes the fork number (or name, for user-friendliness), instead of pg_relation_fsm_size(). So, you could use: pg_relation_size('footable') for size of the main data fork pg_relation_size('footable', 'fsm') for FSM size > Maybe I'm the only one with the need for some simple functions covering all > the base, but I thought I'd ask nevertheless :) What is your use case for all these, BTW? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, Sep 30, 2008 at 2:53 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: >> For convenience, would it be possible to see about having all this >> provided by PostgreSQL? >> a. pg_relation_size() >> b. pg_relation_toast_size() >> c. pg_relation_fsm_size() >> d. pg_relation_indexes_size() >> e. pg_total_relation_size() = a + b + c + d >> >> Are there some other things to add in the mix? > > Should pg_relation_indexes_size() include the FSMs of the indexes? Should > pg_relation_toast_size() include the toast index and FSM as well? It might be worth revisiting the near identical discussions we had when Andreas & I integrated this stuff into the backend for 8.1. -- Dave Page EnterpriseDB UK: http://www.enterprisedb.com
On Tue, Sep 30, 2008 at 5:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
In the absence of these functions, one has to run VACUUM just to be able to see the freespace information; putting these functions in contrib/ is just asking for extra work from DBAs, who IMHO are not very comfortable with 'make; make install'.
If a feature is mainstream, so should be the functions to extract information about it.
Best regards,
--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
EnterpriseDB http://www.enterprisedb.com
Mail sent from my BlackLaptop device
"Gurjeet Singh" <singh.gurjeet@gmail.com> writes:The old FSM was in core, too. That's not a helpful argument.
> I vote for contrib/pg_freespacemap functions to be included in the core
> since FSM is in core.
In the absence of these functions, one has to run VACUUM just to be able to see the freespace information; putting these functions in contrib/ is just asking for extra work from DBAs, who IMHO are not very comfortable with 'make; make install'.
If a feature is mainstream, so should be the functions to extract information about it.
Best regards,
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
EnterpriseDB http://www.enterprisedb.com
Mail sent from my BlackLaptop device
Dave Page wrote: > On Tue, Sep 30, 2008 at 2:53 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> Should pg_relation_indexes_size() include the FSMs of the indexes? Should >> pg_relation_toast_size() include the toast index and FSM as well? > > It might be worth revisiting the near identical discussions we had > when Andreas & I integrated this stuff into the backend for 8.1. Good point. The previous discussions evolved to having two functions, pg_relation_size() and pg_total_relation_size(), where pg_relation_size() is as fine-grained as possible, allowing you to get the size of each heap, index, toast table and toast index individually, and pg_total_relation_size() is a convenience function to sum them all. Following that philosophy, I think the idea of adding a new optional "fork name" argument to pg_relation_size() is the right thing to do: pg_relation_size('footable') for size of the main data fork pg_relation_size('footable', 'fsm') for FSM size There's currently two variants of both pg_relation_size and pg_total_relation_size, one takes an OID and one takes a relation name as argument. Any objections to having just one of each function, taking a 'regclass'? The user-visible behavior wouldn't change, but I thought I'd ask first in case I'm missing something. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Zdenek Kotala wrote: > Heikki Linnakangas napsal(a): >> The FSM is not updated during WAL replay. That means that after crash >> recovery, the FSM won't be completely up-to-date, but at roughly the >> state it was at last checkpoint. In a warm stand-by, the FSM will >> reflect the situation at last full backup. We need to think when the >> FSM should be updated during WAL replay. Probably not after every >> record, because of the overhead, but certainly more often than never. > > What's about after a page write during a WAL replay? You mean when a page is evicted from the buffer cache? That might be pretty good from performance point of view, but from a modularity point of view, the buffer manager should have no business modifying the FSM. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, Oct 2, 2008 at 8:56 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: >> It might be worth revisiting the near identical discussions we had >> when Andreas & I integrated this stuff into the backend for 8.1. > > Good point. The previous discussions evolved to having two functions, > pg_relation_size() and pg_total_relation_size(), where pg_relation_size() is > as fine-grained as possible, allowing you to get the size of each heap, > index, toast table and toast index individually, and > pg_total_relation_size() is a convenience function to sum them all. > Following that philosophy, I think the idea of adding a new optional "fork > name" argument to pg_relation_size() is the right thing to do: > > pg_relation_size('footable') for size of the main data fork > pg_relation_size('footable', 'fsm') for FSM size Sounds reasonable. > There's currently two variants of both pg_relation_size and > pg_total_relation_size, one takes an OID and one takes a relation name as > argument. Any objections to having just one of each function, taking a > 'regclass'? The user-visible behavior wouldn't change, but I thought I'd ask > first in case I'm missing something. None here. -- Dave Page EnterpriseDB UK: http://www.enterprisedb.com
Le jeudi 02 octobre 2008, Heikki Linnakangas a écrit : > pg_relation_size('footable') for size of the main data fork > pg_relation_size('footable', 'fsm') for FSM size As good as possible, if you ask me! Regards, -- dim
Heikki Linnakangas napsal(a): > Zdenek Kotala wrote: >> Heikki Linnakangas napsal(a): >>> The FSM is not updated during WAL replay. That means that after crash >>> recovery, the FSM won't be completely up-to-date, but at roughly the >>> state it was at last checkpoint. In a warm stand-by, the FSM will >>> reflect the situation at last full backup. We need to think when the >>> FSM should be updated during WAL replay. Probably not after every >>> record, because of the overhead, but certainly more often than never. >> >> What's about after a page write during a WAL replay? > > You mean when a page is evicted from the buffer cache? Yes > That might be > pretty good from performance point of view, but from a modularity point > of view, the buffer manager should have no business modifying the FSM. Yeah, it is true. I suggest to try modify FMS info on each manipulation in WAL replay and if it will have performance issue we can start think about improvements. Zdenek
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > Following that philosophy, I think the idea of adding a new optional > "fork name" argument to pg_relation_size() is the right thing to do: > pg_relation_size('footable') for size of the main data fork > pg_relation_size('footable', 'fsm') for FSM size +1. Note that the second form should also accept 'main' or some such for orthogonality. > There's currently two variants of both pg_relation_size and > pg_total_relation_size, one takes an OID and one takes a relation name > as argument. Any objections to having just one of each function, taking > a 'regclass'? The user-visible behavior wouldn't change, but I thought > I'd ask first in case I'm missing something. Um, it would only not change for someone typing pg_relation_size('literal'). Something like this: select sum(pg_relation_size(relname)) from pg_class would fail for lack of an implicit cast from name to regclass. Now the above is pretty stupid --- it would be faster and more schema-safe to be passing pg_class.oid --- so maybe we don't care about breaking it. On the whole I think it's probably a good change despite possible incompatibility. regards, tom lane
On Thursday 02 October 2008 08:37:59 Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > > Following that philosophy, I think the idea of adding a new optional > > "fork name" argument to pg_relation_size() is the right thing to do: > > > > pg_relation_size('footable') for size of the main data fork > > pg_relation_size('footable', 'fsm') for FSM size > > +1. Note that the second form should also accept 'main' or some such > for orthogonality. > > > There's currently two variants of both pg_relation_size and > > pg_total_relation_size, one takes an OID and one takes a relation name > > as argument. Any objections to having just one of each function, taking > > a 'regclass'? The user-visible behavior wouldn't change, but I thought > > I'd ask first in case I'm missing something. > > Um, it would only not change for someone typing > pg_relation_size('literal'). Something like this: > > select sum(pg_relation_size(relname)) from pg_class > > would fail for lack of an implicit cast from name to regclass. > Now the above is pretty stupid --- it would be faster and more > schema-safe to be passing pg_class.oid --- so maybe we don't care > about breaking it. > I would be more concerned about people doing: select pg_relation_size(tablename) from pg_tables; since pg_tables is presented as a more user-friendly option to something like pg_class this might be something more widely used, plus we don't have the easy way out of just telling them to use the oid instead like we do with pg_class. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Robert Treat wrote: > On Thursday 02 October 2008 08:37:59 Tom Lane wrote: >> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >>> There's currently two variants of both pg_relation_size and >>> pg_total_relation_size, one takes an OID and one takes a relation name >>> as argument. Any objections to having just one of each function, taking >>> a 'regclass'? The user-visible behavior wouldn't change, but I thought >>> I'd ask first in case I'm missing something. >> Um, it would only not change for someone typing >> pg_relation_size('literal'). Something like this: >> >> select sum(pg_relation_size(relname)) from pg_class >> >> would fail for lack of an implicit cast from name to regclass. >> Now the above is pretty stupid --- it would be faster and more >> schema-safe to be passing pg_class.oid --- so maybe we don't care >> about breaking it. > > I would be more concerned about people doing: > > select pg_relation_size(tablename) from pg_tables; > > since pg_tables is presented as a more user-friendly option to something like > pg_class this might be something more widely used, plus we don't have the > easy way out of just telling them to use the oid instead like we do with > pg_class. That won't generally work either, because "tablename" is not schema-qualified. With a WHERE clause, maybe. I'm going go ahead with this change. If an unlucky query stops working, fixing it is just a matter of adding a cast. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Oct 2, 2008, at 10:32 AM, Robert Treat wrote: > select pg_relation_size(tablename) from pg_tables; > > since pg_tables is presented as a more user-friendly option to > something like > pg_class this might be something more widely used, plus we don't > have the > easy way out of just telling them to use the oid instead like we do > with > pg_class. It would be really nice to have the table OID in pg_tables. That was one of the driving forces behind the pg_newsysviews project. -- Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828