Thread: autovacuum and orphaned large objects
Hi,

The main point of autovacuum is maintenance tasks. Currently, it executes VACUUM and ANALYZE commands. I want to propose that we incorporate vacuumlo functionality into it. For dealing with large objects (LO), we have the lo contrib module, which helps with LO maintenance but has some limitations (it does not handle the DROP TABLE and TRUNCATE cases), and vacuumlo, which does an excellent job but has to be executed outside the DBMS. The proposal is to clean up LOs when autovacuum triggers VACUUM; the LO cleanup routine would start after(?) the VACUUM command.

In the near future I want to propose that orphaned LOs be cleaned up by VACUUM itself, but that's a story for another thread...

Comments?

--
Euler Taveira de Oliveira - Timbira
http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
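For reference, the maintenance the lo module provides today is a per-column cleanup trigger; its documented usage looks roughly like this (the table and column names are just an example):

    -- the lo domain and the lo_manage trigger function come from contrib/lo
    CREATE TABLE image (title text, raster lo);

    CREATE TRIGGER t_raster
        BEFORE UPDATE OR DELETE ON image
        FOR EACH ROW EXECUTE PROCEDURE lo_manage(raster);

The trigger lo_unlinks the old large object whenever a row is updated or deleted, but because DROP TABLE and TRUNCATE do not fire per-row triggers, objects referenced only by a dropped or truncated table are left behind; that is the limitation mentioned above.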
On Mon, Oct 24, 2011 at 10:25 AM, Euler Taveira de Oliveira <euler@timbira.com> wrote:
> On 24-10-2011 10:57, Robert Haas wrote:
>> I think the main reason why vacuumlo is a contrib module rather than in core is that it is just a heuristic, and it might not be what everyone wants to do. You could store a bunch of large objects in the database and use the returned OIDs to generate links that you email to users, and then when the user clicks on the link we retrieve the corresponding LO and send it to the user over HTTP. In that design, there are no tables in the database at all, yet the large objects aren't orphaned.
>
> Wow, what a strange method to solve a problem, and one that could possibly bloat your database. No, I'm not suggesting that we forbid it. The proposed method could clean up orphaned LOs in 95% (if not 99%) of the use cases.
>
> I've never heard of anyone using LOs the way you describe. It seems strange that someone would distribute an OID number but not store a reference to it in the same database. Yes, it is a possibility, but ...

I guess we could make it an optional behavior, but once you go that far, you have to wonder whether what's really needed here is a general-purpose task scheduler. I mean, the autovacuum launcher's idea about how often to vacuum the database won't necessarily match the user's idea of how often they want to vacuum away large objects - and if the user is doing something funky (like storing arrays of large object OIDs, or inexplicably storing them using numeric or int8) then putting it in the backend removes a considerable amount of flexibility.

Another case where vacuumlo will fall over is if you have a very, very large table with an OID column, but with lots of duplicate values, so that the number of OIDs actually referenced is much smaller. You might end up doing a table scan on the large table every time this logic kicks in, and that might suck.

I'm sort of unexcited about the idea of doing a lot of engineering around this; it seems to me that the only reasons we still have a separate large object facility, rather than just letting everyone go through regular tables with toastable columns, are (1) the size limit is 2GB rather than 1GB and (2) you can read and write parts of objects rather than the whole thing. If we're going to do some more engineering here, I'd rather set our sights a little higher. Complaints I often hear about the large object machinery include: (1) 2GB is still not enough, (2) 4 billion large objects is not enough, (3) the performance is inadequate, particularly with large numbers of large objects from possibly-unrelated subsystems slammed into a single table, and (4) it would be nice to be able to do partial reads and writes on any toastable field, not just large objects.

I'm not saying that the problem you're complaining about isn't worth fixing in the abstract, and if it seemed like a nice, clean fix I'd be all in favor, but I just don't think it's going to be very simple, and for the amount of work involved I'd rather get a little bit more bang for the buck. Of course, you don't have to agree with me on any of this; I'm just giving you my take on it. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
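To illustrate complaint (4): large objects already support the kind of seek-and-read access that plain toasted columns lack. A sketch using the server-side LO functions (the OID 16438 is made up, and the descriptor returned by lo_open is assumed to be 0):

    BEGIN;
    SELECT lo_open(16438, 262144);   -- 262144 = INV_READ; returns a descriptor, typically 0
    SELECT lo_lseek(0, 1048576, 0);  -- seek descriptor 0 to the 1MB mark (0 = SEEK_SET)
    SELECT loread(0, 8192);          -- read only 8kB from that offset
    SELECT lo_close(0);
    COMMIT;

Descriptors are only valid for the duration of the transaction, hence the explicit BEGIN/COMMIT. An ordinary toastable column, by contrast, is normally read and written as a whole value.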
Euler Taveira de Oliveira <euler@timbira.com> writes:
> The main point of autovacuum is maintenance tasks. Currently, it executes VACUUM and ANALYZE commands. I want to propose that we incorporate vacuumlo functionality into it.

I'm not terribly thrilled with that because (a) large objects seem like mostly a legacy feature from here, and (b) it's hard to see how to implement it without imposing overhead on everybody, whether they use LOs or not. This is especially problematic if you're proposing that cleanup triggers not be required.

            regards, tom lane
On Mon, Oct 24, 2011 at 12:56 AM, Euler Taveira de Oliveira <euler@timbira.com> wrote:
> The main point of autovacuum is maintenance tasks. Currently, it executes VACUUM and ANALYZE commands. I want to propose that we incorporate vacuumlo functionality into it. For dealing with large objects (LO), we have the lo contrib module, which helps with LO maintenance but has some limitations (it does not handle the DROP TABLE and TRUNCATE cases), and vacuumlo, which does an excellent job but has to be executed outside the DBMS. The proposal is to clean up LOs when autovacuum triggers VACUUM; the LO cleanup routine would start after(?) the VACUUM command.
>
> In the near future I want to propose that orphaned LOs be cleaned up by VACUUM itself, but that's a story for another thread...
>
> Comments?

I think the main reason why vacuumlo is a contrib module rather than in core is that it is just a heuristic, and it might not be what everyone wants to do. You could store a bunch of large objects in the database and use the returned OIDs to generate links that you email to users, and then when the user clicks on the link we retrieve the corresponding LO and send it to the user over HTTP. In that design, there are no tables in the database at all, yet the large objects aren't orphaned.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
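For anyone who hasn't looked at it, the heuristic vacuumlo implements could be sketched in SQL roughly like this (a simplification: the real tool is a C client program that scans every oid and lo column of every table; a single hypothetical table image(raster oid) stands in for all of them here):

    -- unlink every large object that no oid/lo column references
    SELECT lo_unlink(l.oid)
    FROM pg_largeobject_metadata l
    WHERE NOT EXISTS (SELECT 1 FROM image i WHERE i.raster = l.oid);

A large object whose OID was handed out only through an external channel, as in the link-emailing design above, would be unlinked by this even though it is still in use; that is exactly why it is a heuristic.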
On 24-10-2011 10:57, Robert Haas wrote:
> I think the main reason why vacuumlo is a contrib module rather than in core is that it is just a heuristic, and it might not be what everyone wants to do. You could store a bunch of large objects in the database and use the returned OIDs to generate links that you email to users, and then when the user clicks on the link we retrieve the corresponding LO and send it to the user over HTTP. In that design, there are no tables in the database at all, yet the large objects aren't orphaned.

Wow, what a strange method to solve a problem, and one that could possibly bloat your database. No, I'm not suggesting that we forbid it. The proposed method could clean up orphaned LOs in 95% (if not 99%) of the use cases.

I've never heard of anyone using LOs the way you describe. It seems strange that someone would distribute an OID number but not store a reference to it in the same database. Yes, it is a possibility, but ...

--
Euler Taveira de Oliveira - Timbira
http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
On 24-10-2011 11:36, Tom Lane wrote:
> Euler Taveira de Oliveira <euler@timbira.com> writes:
>> The main point of autovacuum is maintenance tasks. Currently, it executes VACUUM and ANALYZE commands. I want to propose that we incorporate vacuumlo functionality into it.
>
> I'm not terribly thrilled with that because (a) large objects seem like mostly a legacy feature from here, and

Right, but there isn't a solution for column data larger than 1 GB besides LO.

> (b) it's hard to see how to implement it without imposing overhead on everybody, whether they use LOs or not. This is especially problematic if you're proposing that cleanup triggers not be required.

I was thinking about starting the LO cleanup after autovacuum finishes the VACUUM command (so no trigger, no new mechanism). As for the imposed overhead, it would only execute the cleanup code on tables that have oid/lo columns (this information would be collected when autovacuum collects table information).

--
Euler Taveira de Oliveira - Timbira
http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
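Detecting which tables qualify could be a cheap catalog lookup; a hypothetical version of the check autovacuum might perform:

    -- find user-table columns of type oid, or of the lo domain from contrib/lo,
    -- that might reference large objects
    SELECT c.oid::regclass AS table_name, a.attname AS column_name
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    JOIN pg_attribute a ON a.attrelid = c.oid
    JOIN pg_type t ON t.oid = a.atttypid
    WHERE c.relkind = 'r'
      AND n.nspname NOT IN ('pg_catalog', 'information_schema')
      AND a.attnum > 0
      AND NOT a.attisdropped
      AND t.typname IN ('oid', 'lo');

Tables that do not show up here would be skipped by the proposed cleanup pass entirely, so databases that never use large objects would pay essentially nothing.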