Thread: autovacuum and orphaned large objects
Hi,

The main point of autovacuum is maintenance tasks. Currently, it executes VACUUM and ANALYZE commands. I want to propose that we incorporate vacuumlo functionality into it. For dealing with large objects (LO), we have the lo contrib module, which helps with LO maintenance but has some limitations (it does not handle the DROP TABLE and TRUNCATE cases), and vacuumlo, which does an excellent job but has to be executed outside the DBMS. The proposal is to clean up LOs when autovacuum triggers VACUUM; the LO cleanup routine would start after(?) the VACUUM command.

In the near future I want to propose that orphaned LOs be cleaned up by VACUUM itself, but that's a story for another thread...

Comments?

--
Euler Taveira de Oliveira - Timbira
http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
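For reference, the maintenance the lo module provides today is a per-column cleanup trigger; its documented usage looks roughly like this (the table and column names are just an example):

    -- the lo domain and the lo_manage trigger function come from contrib/lo
    CREATE TABLE image (title text, raster lo);

    CREATE TRIGGER t_raster
        BEFORE UPDATE OR DELETE ON image
        FOR EACH ROW EXECUTE PROCEDURE lo_manage(raster);

The trigger lo_unlinks the old large object whenever a row is updated or deleted, but because DROP TABLE and TRUNCATE do not fire per-row triggers, objects referenced only by a dropped or truncated table are left behind; that is the limitation mentioned above.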
On Mon, Oct 24, 2011 at 10:25 AM, Euler Taveira de Oliveira <euler@timbira.com> wrote:
> On 24-10-2011 10:57, Robert Haas wrote:
>> I think the main reason why vacuumlo is a contrib module rather than in core is that it is just a heuristic, and it might not be what everyone wants to do. You could store a bunch of large objects in the database and use the returned OIDs to generate links that you email to users, and then when the user clicks on the link we retrieve the corresponding LO and send it to the user over HTTP. In that design, there are no tables in the database at all, yet the large objects aren't orphaned.
>
> Wow, what a strange method to solve a problem, and one that could possibly bloat your database. No, I'm not suggesting that we forbid it. The proposed method could clean up orphaned LOs in 95% (if not 99%) of the use cases.
>
> I've never heard of anyone using LOs the way you describe. It seems strange that someone would distribute an OID number but not store a reference to it in the same database. Yes, it is a possibility, but ...

I guess we could make it an optional behavior, but once you go that far, you have to wonder whether what's really needed here is a general-purpose task scheduler. I mean, the autovacuum launcher's idea about how often to vacuum the database won't necessarily match the user's idea of how often they want to vacuum away large objects - and if the user is doing something funky (like storing arrays of large object OIDs, or inexplicably storing them using numeric or int8) then putting it in the backend removes a considerable amount of flexibility.

Another case where vacuumlo will fall over is if you have a very, very large table with an OID column, but with lots of duplicate values, so that the number of OIDs actually referenced is much smaller. You might end up doing a table scan on the large table every time this logic kicks in, and that might suck.

I'm sort of unexcited about the idea of doing a lot of engineering around this; it seems to me that the only reasons we still have a separate large object facility, rather than just letting everyone go through regular tables with toastable columns, are (1) the size limit is 2GB rather than 1GB and (2) you can read and write parts of objects rather than the whole thing. If we're going to do some more engineering here, I'd rather set our sights a little higher. Complaints I often hear about the large object machinery include: (1) 2GB is still not enough, (2) 4 billion large objects is not enough, (3) the performance is inadequate, particularly with large numbers of large objects from possibly-unrelated subsystems slammed into a single table, and (4) it would be nice to be able to do partial reads and writes on any toastable field, not just large objects.

I'm not saying that the problem you're complaining about isn't worth fixing in the abstract, and if it seemed like a nice, clean fix I'd be all in favor, but I just don't think it's going to be very simple, and for the amount of work involved I'd rather get a little bit more bang for the buck. Of course, you don't have to agree with me on any of this; I'm just giving you my take on it. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
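To illustrate complaint (4): large objects already support the kind of seek-and-read access that plain toasted columns lack. A sketch using the server-side LO functions (the OID 16438 is made up, and the descriptor returned by lo_open is assumed to be 0):

    BEGIN;
    SELECT lo_open(16438, 262144);   -- 262144 = INV_READ; returns a descriptor, typically 0
    SELECT lo_lseek(0, 1048576, 0);  -- seek descriptor 0 to the 1MB mark (0 = SEEK_SET)
    SELECT loread(0, 8192);          -- read only 8kB from that offset
    SELECT lo_close(0);
    COMMIT;

Descriptors are only valid for the duration of the transaction, hence the explicit BEGIN/COMMIT. An ordinary toastable column, by contrast, is normally read and written as a whole value.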
Euler Taveira de Oliveira <euler@timbira.com> writes:
> The main point of autovacuum is maintenance tasks. Currently, it executes VACUUM and ANALYZE commands. I want to propose that we incorporate vacuumlo functionality into it.

I'm not terribly thrilled with that because (a) large objects seem like mostly a legacy feature from here, and (b) it's hard to see how to implement it without imposing overhead on everybody, whether they use LOs or not. This is especially problematic if you're proposing that cleanup triggers not be required.

            regards, tom lane
On Mon, Oct 24, 2011 at 12:56 AM, Euler Taveira de Oliveira <euler@timbira.com> wrote:
> The main point of autovacuum is maintenance tasks. Currently, it executes VACUUM and ANALYZE commands. I want to propose that we incorporate vacuumlo functionality into it. For dealing with large objects (LO), we have the lo contrib module, which helps with LO maintenance but has some limitations (it does not handle the DROP TABLE and TRUNCATE cases), and vacuumlo, which does an excellent job but has to be executed outside the DBMS. The proposal is to clean up LOs when autovacuum triggers VACUUM; the LO cleanup routine would start after(?) the VACUUM command.
>
> In the near future I want to propose that orphaned LOs be cleaned up by VACUUM itself, but that's a story for another thread...
>
> Comments?

I think the main reason why vacuumlo is a contrib module rather than in core is that it is just a heuristic, and it might not be what everyone wants to do. You could store a bunch of large objects in the database and use the returned OIDs to generate links that you email to users, and then when the user clicks on the link we retrieve the corresponding LO and send it to the user over HTTP. In that design, there are no tables in the database at all, yet the large objects aren't orphaned.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
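For anyone who hasn't looked at it, the heuristic vacuumlo implements could be sketched in SQL roughly like this (a simplification: the real tool is a C client program that scans every oid and lo column of every table; a single hypothetical table image(raster oid) stands in for all of them here):

    -- unlink every large object that no oid/lo column references
    SELECT lo_unlink(l.oid)
    FROM pg_largeobject_metadata l
    WHERE NOT EXISTS (SELECT 1 FROM image i WHERE i.raster = l.oid);

A large object whose OID was handed out only through an external channel, as in the link-emailing design above, would be unlinked by this even though it is still in use; that is exactly why it is a heuristic.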
On 24-10-2011 10:57, Robert Haas wrote:
> I think the main reason why vacuumlo is a contrib module rather than in core is that it is just a heuristic, and it might not be what everyone wants to do. You could store a bunch of large objects in the database and use the returned OIDs to generate links that you email to users, and then when the user clicks on the link we retrieve the corresponding LO and send it to the user over HTTP. In that design, there are no tables in the database at all, yet the large objects aren't orphaned.

Wow, what a strange method to solve a problem, and one that could possibly bloat your database. No, I'm not suggesting that we forbid it. The proposed method could clean up orphaned LOs in 95% (if not 99%) of the use cases.

I've never heard of anyone using LOs the way you describe. It seems strange that someone would distribute an OID number but not store a reference to it in the same database. Yes, it is a possibility, but ...

--
Euler Taveira de Oliveira - Timbira
http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
On 24-10-2011 11:36, Tom Lane wrote:
> Euler Taveira de Oliveira <euler@timbira.com> writes:
>> The main point of autovacuum is maintenance tasks. Currently, it executes VACUUM and ANALYZE commands. I want to propose that we incorporate vacuumlo functionality into it.
>
> I'm not terribly thrilled with that because (a) large objects seem like mostly a legacy feature from here, and

Right, but there isn't a solution for column data larger than 1 GB besides LO.

> (b) it's hard to see how to implement it without imposing overhead on everybody, whether they use LOs or not. This is especially problematic if you're proposing that cleanup triggers not be required.

I was thinking about starting the LO cleanup after autovacuum finishes the VACUUM command (so no trigger, no new mechanism). As for the imposed overhead, it would only execute the cleanup code on tables that have oid/lo columns (this information would be collected when autovacuum collects table information).

--
Euler Taveira de Oliveira - Timbira
http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
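Detecting which tables qualify could be a cheap catalog lookup; a hypothetical version of the check autovacuum might perform:

    -- find user-table columns of type oid, or of the lo domain from contrib/lo,
    -- that might reference large objects
    SELECT c.oid::regclass AS table_name, a.attname AS column_name
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    JOIN pg_attribute a ON a.attrelid = c.oid
    JOIN pg_type t ON t.oid = a.atttypid
    WHERE c.relkind = 'r'
      AND n.nspname NOT IN ('pg_catalog', 'information_schema')
      AND a.attnum > 0
      AND NOT a.attisdropped
      AND t.typname IN ('oid', 'lo');

Tables that do not show up here would be skipped by the proposed cleanup pass entirely, so databases that never use large objects would pay essentially nothing.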