Re: Cleaning up unreferenced table files - Mailing list pgsql-patches
From | Bruce Momjian |
---|---|
Subject | Re: Cleaning up unreferenced table files |
Date | |
Msg-id | 200505021830.j42IU3l21583@candle.pha.pa.us Whole thread Raw |
In response to | Cleaning up unreferenced table files (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses |
Re: Cleaning up unreferenced table files
|
List | pgsql-patches |
Applied. --------------------------------------------------------------------------- pgman wrote: > Tom Lane wrote: > > Heikki Linnakangas <hlinnaka@iki.fi> writes: > > > On Tue, 26 Apr 2005, Tom Lane wrote: > > >> Heikki Linnakangas <hlinnaka@iki.fi> writes: > > >>> I feel that crashes that leaves behind stale files are rare. > > >> > > >> Indeed, and getting more so all the time ... > > > > > How so? Have changes been made in those parts of the code? > > > > No, just that the overall reliability of Postgres keeps improving. > > > > At the time that TODO entry was created, I don't think we even had the > > ability to roll back table creates/drops properly, so there were > > scenarios in which unreferenced files could be left behind without even > > assuming any software error. And the prevalence of backend crashes was > > way higher than it is now, too. So I think a good argument could be > > made that the TODO item isn't nearly as important as it was at the time. > > Agreed, but I don't think we are ever going to have the file system obey > the same semantics as the database, so there will always be corner cases > where we have the possibility for unreferenced files. > > > > If nobody ever runs into this issue in production, and this whole exercise > > > turns out to be completely unnecessary, at least we'll know. That alone > > > makes me feel better. > > > > We will know no such thing, unless the patch is made to announce the > > problem so intrusively that people are certain to see it *and report it > > to us*. Which I don't think will be acceptable. > > You are right that checking only after a server crash isn't going to > help us very much, because the odds someone is going to look at the logs > just at that moment are slim, and if they stop and start the server > again, the message will not appear so they will think it is fixed. > > Though the problem is caused by a server crash, it remains even after a > clean startup. My new version of the patch checks not just after a > server crash, but every time the server starts up. Here are some > timings: > > 3dbs without patch 0m0.230s 300 files > 3dbs with patch 0m0.230s 300 files > 100dbs with patch 0m0.270s 10000 files > > (The timing test started the server and waited for the message "database > system is ready".) > > You can see that for three empty databases (only system tables/indexes), > the patch has no impact. For 100 databases, or 10,000 files, there is a > 17% increase in startup time, which seems acceptable. > > The message printed in the logs is: > > LOG: The table or index file "/u/pgsql/data/base/1/100001" is stale and can be safely removed > > I made a few changes to the patch. I changed 'clean' to 'check' because > I don't think we are ever going to remove the files on startup (If we > did we could do the cleanup just on crash recovery startup.) I changed > the message from NOTICE to LOG, I made it run on all startups and not > just crash recovery, and it was missing FreeDir() calls. > > -- > Bruce Momjian | http://candle.pha.pa.us > pgman@candle.pha.pa.us | (610) 359-1001 > + If your life is a hard drive, | 13 Roberts Road > + Christ can be your backup. | Newtown Square, Pennsylvania 19073 > Index: doc/src/sgml/maintenance.sgml > =================================================================== > RCS file: /cvsroot/pgsql/doc/src/sgml/maintenance.sgml,v > retrieving revision 1.41 > diff -c -c -r1.41 maintenance.sgml > *** doc/src/sgml/maintenance.sgml 20 Feb 2005 02:21:26 -0000 1.41 > --- doc/src/sgml/maintenance.sgml 1 May 2005 04:30:48 -0000 > *************** > *** 474,479 **** > --- 474,496 ---- > </para> > </sect1> > > + <sect1 id="check-files-after-crash"> > + <title>Check files after crash</title> > + > + <indexterm zone="check-files-after-crash"> > + <primary>stale file</primary> > + </indexterm> > + > + <para> > + <productname>PostgreSQL</productname> recovers automatically after crash > + using the write-ahead log (see <xref linkend="wal">) and no manual > + operations are normally needed. However, if there was a transaction running > + when the crash occured that created or dropped a relation, the > + transaction might have left a stale file in the data directory. If this > + happens, you will get a notice in the log file stating which files can be > + deleted. > + </para> > + </sect1> > > <sect1 id="logfile-maintenance"> > <title>Log File Maintenance</title> > Index: src/backend/access/transam/xlog.c > =================================================================== > RCS file: /cvsroot/pgsql/src/backend/access/transam/xlog.c,v > retrieving revision 1.189 > diff -c -c -r1.189 xlog.c > *** src/backend/access/transam/xlog.c 28 Apr 2005 21:47:10 -0000 1.189 > --- src/backend/access/transam/xlog.c 1 May 2005 04:30:52 -0000 > *************** > *** 43,48 **** > --- 43,49 ---- > #include "utils/builtins.h" > #include "utils/guc.h" > #include "utils/relcache.h" > + #include "utils/flatfiles.h" > > > /* > *************** > *** 4525,4530 **** > --- 4526,4533 ---- > > CreateCheckPoint(true, true); > > + CheckStaleRelFiles(); > + > /* > * Close down recovery environment > */ > *************** > *** 4536,4541 **** > --- 4539,4550 ---- > */ > remove_backup_label(); > } > + else > + { > + XLogInitRelationCache(); > + CheckStaleRelFiles(); > + XLogCloseRelationCache(); > + } > > /* > * Preallocate additional log files, if wanted. > Index: src/backend/catalog/catalog.c > =================================================================== > RCS file: /cvsroot/pgsql/src/backend/catalog/catalog.c,v > retrieving revision 1.59 > diff -c -c -r1.59 catalog.c > *** src/backend/catalog/catalog.c 14 Apr 2005 20:03:23 -0000 1.59 > --- src/backend/catalog/catalog.c 1 May 2005 04:30:52 -0000 > *************** > *** 106,111 **** > --- 106,144 ---- > return path; > } > > + /* > + * GetTablespacePath - construct path to a tablespace symbolic link > + * > + * Result is a palloc'd string. > + * > + * XXX this must agree with relpath and GetDatabasePath! > + */ > + char * > + GetTablespacePath(Oid spcNode) > + { > + int pathlen; > + char *path; > + > + Assert(spcNode != GLOBALTABLESPACE_OID); > + > + if (spcNode == DEFAULTTABLESPACE_OID) > + { > + /* The default tablespace is {datadir}/base */ > + pathlen = strlen(DataDir) + 5 + 1; > + path = (char *) palloc(pathlen); > + snprintf(path, pathlen, "%s/base", > + DataDir); > + } > + else > + { > + /* All other tablespaces have symlinks in pg_tblspc */ > + pathlen = strlen(DataDir) + 11 + OIDCHARS + 1; > + path = (char *) palloc(pathlen); > + snprintf(path, pathlen, "%s/pg_tblspc/%u", > + DataDir, spcNode); > + } > + return path; > + } > > /* > * IsSystemRelation > Index: src/backend/commands/tablespace.c > =================================================================== > RCS file: /cvsroot/pgsql/src/backend/commands/tablespace.c,v > retrieving revision 1.17 > diff -c -c -r1.17 tablespace.c > *** src/backend/commands/tablespace.c 14 Apr 2005 20:03:24 -0000 1.17 > --- src/backend/commands/tablespace.c 1 May 2005 04:30:53 -0000 > *************** > *** 341,348 **** > /* > * All seems well, create the symlink > */ > ! linkloc = (char *) palloc(strlen(DataDir) + 11 + 10 + 1); > ! sprintf(linkloc, "%s/pg_tblspc/%u", DataDir, tablespaceoid); > > if (symlink(location, linkloc) < 0) > ereport(ERROR, > --- 341,347 ---- > /* > * All seems well, create the symlink > */ > ! linkloc = GetTablespacePath(tablespaceoid); > > if (symlink(location, linkloc) < 0) > ereport(ERROR, > *************** > *** 495,502 **** > char *subfile; > struct stat st; > > ! location = (char *) palloc(strlen(DataDir) + 11 + 10 + 1); > ! sprintf(location, "%s/pg_tblspc/%u", DataDir, tablespaceoid); > > /* > * Check if the tablespace still contains any files. We try to rmdir > --- 494,500 ---- > char *subfile; > struct stat st; > > ! location = GetTablespacePath(tablespaceoid); > > /* > * Check if the tablespace still contains any files. We try to rmdir > *************** > *** 1036,1043 **** > set_short_version(location); > > /* Create the symlink if not already present */ > ! linkloc = (char *) palloc(strlen(DataDir) + 11 + 10 + 1); > ! sprintf(linkloc, "%s/pg_tblspc/%u", DataDir, xlrec->ts_id); > > if (symlink(location, linkloc) < 0) > { > --- 1034,1040 ---- > set_short_version(location); > > /* Create the symlink if not already present */ > ! linkloc = GetTablespacePath(xlrec->ts_id); > > if (symlink(location, linkloc) < 0) > { > Index: src/backend/utils/adt/misc.c > =================================================================== > RCS file: /cvsroot/pgsql/src/backend/utils/adt/misc.c,v > retrieving revision 1.40 > diff -c -c -r1.40 misc.c > *** src/backend/utils/adt/misc.c 31 Dec 2004 22:01:22 -0000 1.40 > --- src/backend/utils/adt/misc.c 1 May 2005 04:30:53 -0000 > *************** > *** 26,31 **** > --- 26,32 ---- > #include "funcapi.h" > #include "catalog/pg_type.h" > #include "catalog/pg_tablespace.h" > + #include "catalog/catalog.h" > > #define atooid(x) ((Oid) strtoul((x), NULL, 10)) > > *************** > *** 144,154 **** > > fctx = palloc(sizeof(ts_db_fctx)); > > - /* > - * size = path length + tablespace dirname length + 2 dir sep > - * chars + oid + terminator > - */ > - fctx->location = (char *) palloc(strlen(DataDir) + 11 + 10 + 1); > if (tablespaceOid == GLOBALTABLESPACE_OID) > { > fctx->dirdesc = NULL; > --- 145,150 ---- > *************** > *** 157,168 **** > } > else > { > ! if (tablespaceOid == DEFAULTTABLESPACE_OID) > ! sprintf(fctx->location, "%s/base", DataDir); > ! else > ! sprintf(fctx->location, "%s/pg_tblspc/%u", DataDir, > ! tablespaceOid); > ! > fctx->dirdesc = AllocateDir(fctx->location); > > if (!fctx->dirdesc) > --- 153,159 ---- > } > else > { > ! fctx->location = GetTablespacePath(tablespaceOid); > fctx->dirdesc = AllocateDir(fctx->location); > > if (!fctx->dirdesc) > Index: src/backend/utils/init/Makefile > =================================================================== > RCS file: /cvsroot/pgsql/src/backend/utils/init/Makefile,v > retrieving revision 1.18 > diff -c -c -r1.18 Makefile > *** src/backend/utils/init/Makefile 20 Feb 2005 02:22:00 -0000 1.18 > --- src/backend/utils/init/Makefile 1 May 2005 04:30:53 -0000 > *************** > *** 12,18 **** > top_builddir = ../../../.. > include $(top_builddir)/src/Makefile.global > > ! OBJS = flatfiles.o globals.o miscinit.o postinit.o > > all: SUBSYS.o > > --- 12,18 ---- > top_builddir = ../../../.. > include $(top_builddir)/src/Makefile.global > > ! OBJS = flatfiles.o globals.o miscinit.o postinit.o checkfiles.o > > all: SUBSYS.o > > Index: src/include/catalog/catalog.h > =================================================================== > RCS file: /cvsroot/pgsql/src/include/catalog/catalog.h,v > retrieving revision 1.30 > diff -c -c -r1.30 catalog.h > *** src/include/catalog/catalog.h 31 Dec 2004 22:03:24 -0000 1.30 > --- src/include/catalog/catalog.h 1 May 2005 04:30:54 -0000 > *************** > *** 19,24 **** > --- 19,25 ---- > > extern char *relpath(RelFileNode rnode); > extern char *GetDatabasePath(Oid dbNode, Oid spcNode); > + extern char *GetTablespacePath(Oid spcNode); > > extern bool IsSystemRelation(Relation relation); > extern bool IsToastRelation(Relation relation); > Index: src/include/utils/flatfiles.h > =================================================================== > RCS file: /cvsroot/pgsql/src/include/utils/flatfiles.h,v > retrieving revision 1.1 > diff -c -c -r1.1 flatfiles.h > *** src/include/utils/flatfiles.h 20 Feb 2005 02:22:07 -0000 1.1 > --- src/include/utils/flatfiles.h 1 May 2005 04:30:54 -0000 > *************** > *** 30,33 **** > --- 30,36 ---- > > extern Datum flatfile_update_trigger(PG_FUNCTION_ARGS); > > + /* from checkfiles.c */ > + extern void CheckStaleRelFiles(void); > + > #endif /* FLATFILES_H */ > /*------------------------------------------------------------------------- > * > * checkfiles.c > * support to clean up stale relation files on crash recovery > * > * If a backend crashes while in a transaction that has created or > * deleted a relfilenode, a stale file can be left over in the data > * directory. This file contains routines to clean up those stale > * files on recovery. > * > * $PostgreSQL$ > * > *------------------------------------------------------------------------- > */ > #include "postgres.h" > > #include "storage/fd.h" > > #include "utils/flatfiles.h" > #include "miscadmin.h" > #include "catalog/pg_tablespace.h" > #include "catalog/catalog.h" > #include "access/skey.h" > #include "utils/fmgroids.h" > #include "access/relscan.h" > #include "access/heapam.h" > #include "utils/resowner.h" > > static void CheckStaleRelFilesFrom(Oid tablespaceoid, Oid dboid); > static void CheckStaleRelFilesFromTablespace(Oid tablespaceoid); > > /* Like AllocateDir, but ereports on failure */ > static DIR * > AllocateDirChecked(char *path) > { > DIR *dirdesc = AllocateDir(path); > if (dirdesc == NULL) > ereport(ERROR, > (errcode_for_file_access(), > errmsg("could not open directory \"%s\": %m", > path))); > return dirdesc; > } > > /* > * Scan through all tablespaces for relations left over > * by aborted transactions. > * > * For example, if a transaction issues > * BEGIN; CREATE TABLE foobar (); > * and then the backend crashes, the file is left in the > * tablespace until CheckStaleRelFiles deletes it. > */ > void > CheckStaleRelFiles(void) > { > DIR *dirdesc; > struct dirent *de; > char *path; > int pathlen; > > pathlen = strlen(DataDir) + 11 + 1; > path = (char *) palloc(pathlen); > snprintf(path, pathlen, "%s/pg_tblspc/", DataDir); > dirdesc = AllocateDirChecked(path); > while ((de = readdir(dirdesc)) != NULL) > { > char *invalid; > Oid tablespaceoid; > > /* Check that the directory name looks like valid tablespace link. */ > tablespaceoid = (Oid) strtol(de->d_name, &invalid, 10); > if(invalid[0] == '\0') > CheckStaleRelFilesFromTablespace(tablespaceoid); > } > FreeDir(dirdesc); > pfree(path); > > CheckStaleRelFilesFromTablespace(DEFAULTTABLESPACE_OID); > } > > /* Scan a specific tablespace for stale relations */ > static void > CheckStaleRelFilesFromTablespace(Oid tablespaceoid) > { > DIR *dirdesc; > struct dirent *de; > char *path; > > path = GetTablespacePath(tablespaceoid); > > dirdesc = AllocateDirChecked(path); > while ((de = readdir(dirdesc)) != NULL) > { > char *invalid; > Oid dboid; > > dboid = (Oid) strtol(de->d_name, &invalid, 10); > if(invalid[0] == '\0') > CheckStaleRelFilesFrom(tablespaceoid, dboid); > } > FreeDir(dirdesc); > pfree(path); > } > > /* Scan a specific database in a specific tablespace for stale relations. > * > * First, pg_class for the database is opened, and the relfilenodes of all > * relations mentioned there are stored in a hash table. > * > * Then the directory is scanned. Every file in the directory that's not > * found in pg_class (the hash table) is logged. > */ > static void > CheckStaleRelFilesFrom(Oid tablespaceoid, Oid dboid) > { > DIR *dirdesc; > struct dirent *de; > HASHCTL hashctl; > HTAB *relfilenodeHash; > MemoryContext mcxt; > RelFileNode rnode; > char *path; > > /* We create a private memory context so that we can easily deallocate > * the hash table and its contents > */ > mcxt = AllocSetContextCreate(TopMemoryContext, "CheckStaleRelFiles", > ALLOCSET_DEFAULT_MINSIZE, > ALLOCSET_DEFAULT_INITSIZE, > ALLOCSET_DEFAULT_MAXSIZE); > > hashctl.hash = tag_hash; > /* The entry contents is not used for anything, we just check > * if an oid is in the hash table or not. */ > hashctl.keysize = sizeof(Oid); > hashctl.entrysize = 1; > hashctl.hcxt = mcxt; > relfilenodeHash = hash_create("relfilenodeHash", 100, &hashctl, > HASH_FUNCTION > | HASH_ELEM | HASH_CONTEXT); > > /* Read all relfilenodes from pg_class into the hash table */ > { > ResourceOwner owner, oldowner; > Relation rel; > HeapScanDesc scan; > HeapTuple tuple; > > /* Need a resowner to keep the heapam and buffer code happy */ > owner = ResourceOwnerCreate(NULL, "CheckStaleRelFiles"); > oldowner = CurrentResourceOwner; > CurrentResourceOwner = owner; > > rnode.spcNode = tablespaceoid; > rnode.dbNode = dboid; > rnode.relNode = RelationRelationId; > rel = XLogOpenRelation(true, 0 , rnode); > > scan = heap_beginscan(rel, SnapshotNow, 0, NULL); > while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL) > { > Form_pg_class classform = (Form_pg_class) GETSTRUCT(tuple); > > hash_search(relfilenodeHash, &classform->relfilenode, > HASH_ENTER, NULL); > } > heap_endscan(scan); > > XLogCloseRelation(rnode); > CurrentResourceOwner = oldowner; > ResourceOwnerDelete(owner); > } > > /* Scan the directory */ > path = GetDatabasePath(dboid, tablespaceoid); > > dirdesc = AllocateDirChecked(path); > while ((de = readdir(dirdesc)) != NULL) > { > char *invalid; > Oid relfilenode; > > relfilenode = strtol(de->d_name, &invalid, 10); > if(invalid[0] == '\0') > { > /* Filename was a valid number, check if pg_class knows > about it */ > if(hash_search(relfilenodeHash, &relfilenode, > HASH_FIND, NULL) == NULL) > { > char *filepath; > rnode.spcNode = tablespaceoid; > rnode.dbNode = dboid; > rnode.relNode = relfilenode; > > filepath = relpath(rnode); > > ereport(LOG, > (errcode_for_file_access(), > errmsg("The table or index file \"%s\" is stale and can be safely removed", > filepath))); > pfree(filepath); > } > } > } > FreeDir(dirdesc); > pfree(path); > hash_destroy(relfilenodeHash); > MemoryContextDelete(mcxt); > } -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
pgsql-patches by date: