Thread: Bug? relpages, reltuples resets to zero
Hi, there seems to be a problem with the relation statistics in pg_class. Could someone explain why this happens? doc=> vacuum; VACUUM doc=> select relname, relpages, reltuples from pg_class doc-> where relname = 'doc_wordref'; relname |relpages|reltuples -----------+--------+--------- doc_wordref| 1099| 136027 (1 row) -- ******** That's right doc=> explain select distinct refpage from doc_wordref doc-> where refword ~ '^a'; NOTICE: QUERY PLAN: Unique -> Sort -> Index Scan using doc_wordref_word_idx on doc_wordref EXPLAIN -- ******** As expected doc=> select distinct refpage from doc_wordref doc-> where refword ~ '^a'; refpage ------- 2 3 ... (164 rows) doc=> select relname, relpages, reltuples from pg_class doc-> where relname = 'doc_wordref'; relname |relpages|reltuples -----------+--------+--------- doc_wordref| 0| 0 (1 row) -- ******** Ooops - where are they gone? doc=> explain select distinct refpage from doc_wordref dos-> where refword ~ '^a'; NOTICE: QUERY PLAN: Unique -> Sort -> Index Scan using doc_wordref_word_idx on doc_wordref -- ******** Doesn't matter in the same connection, so reconnect EXPLAIN doc=> \c - connecting to new database: doc doc=> explain select distinct refpage from doc_wordref dos-> where refword ~ '^a'; NOTICE: QUERY PLAN: Unique -> Sort -> Seq Scan on doc_wordref -- ******** Boom EXPLAIN doc=> Why does the SELECT throw away the information about the number of pages and tuples from pg_class? Is this a bug or a feature? If it's a feature, how can I disable it? It is reproduceable. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #======================================== jwieck@debis.com (Jan Wieck) #
> Hi, > > there seems to be a problem with the relation statistics in > pg_class. Could someone explain why this happens? > > doc=> vacuum; > VACUUM > doc=> select relname, relpages, reltuples from pg_class > doc-> where relname = 'doc_wordref'; > doc=> select relname, relpages, reltuples from pg_class > doc-> where relname = 'doc_wordref'; > relname |relpages|reltuples > -----------+--------+--------- > doc_wordref| 0| 0 > (1 row) > > -- ******** Ooops - where are they gone? > > doc=> explain select distinct refpage from doc_wordref > dos-> where refword ~ '^a'; > NOTICE: QUERY PLAN: I have seen the optimizer stop using indexes, but could never reproduce it, and hoped my mega-patch would have fix it. My only guess is that vacuum has changed the buffer cache copy of the pg_class tuple, but did not mark it as dirty, so it was not written back out when removed from the buffer cache. When reloaded after the query, the buffer cache is loaded from the disk copy, and the disk copy has zeros, because the vacuum copy was not written to disk. The active code is in vacuum.c::vc_updstats: /* XXX -- after write, should invalidate relcache in other backends */ WriteNoReleaseBuffer(ItemPointerGetBlockNumber(&rtup->t_ctid)); RelationInvalidateHeapTuple(rd, rtup); This should be marking the buffer as dirty and written out the buffer to disk, so when it gets reloaded, it has the new vacuum statatistics. It is also invalidating the catalog cache, so that doesn't get used for stats. The code looks fine to me. I can't figure out why that would happen. Can you try replacing WriteNoReleaseBuffer() with WriteBuffer() and see if that fixes it. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> Hi, > > there seems to be a problem with the relation statistics in > pg_class. Could someone explain why this happens? > > doc=> vacuum; > VACUUM > doc=> select relname, relpages, reltuples from pg_class > doc-> where relname = 'doc_wordref'; > relname |relpages|reltuples > -----------+--------+--------- > doc_wordref| 1099| 136027 > (1 row) > > -- ******** That's right > > doc=> explain select distinct refpage from doc_wordref > doc-> where refword ~ '^a'; > NOTICE: QUERY PLAN: > Attached is a patch to vacuum.c I would like you to try. I wonder if dirty pages are somehow never getting written out because of vacuum's use of transactions. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 *** ./backend/commands/vacuum.c.orig Thu Oct 22 17:14:37 1998 --- ./backend/commands/vacuum.c Thu Oct 22 17:15:35 1998 *************** *** 1875,1886 **** heap_close(sd); } - /* XXX -- after write, should invalidate relcache in other backends */ - WriteNoReleaseBuffer(ItemPointerGetBlockNumber(&rtup->t_ctid)); - RelationInvalidateHeapTuple(rd, rtup); ! ReleaseBuffer(buffer); heap_close(rd); } --- 1875,1885 ---- heap_close(sd); } RelationInvalidateHeapTuple(rd, rtup); ! /* XXX -- after write, should invalidate relcache in other backends */ ! WriteBuffer(ItemPointerGetBlockNumber(&rtup->t_ctid)); ! heap_close(rd); }
Bruce Momjian wrote: > I have seen the optimizer stop using indexes, but could never reproduce > it, and hoped my mega-patch would have fix it. > > My only guess is that vacuum has changed the buffer cache copy of the > pg_class tuple, but did not mark it as dirty, so it was not written back > out when removed from the buffer cache. When reloaded after the query, > the buffer cache is loaded from the disk copy, and the disk copy has > zeros, because the vacuum copy was not written to disk. > > The active code is in vacuum.c::vc_updstats: Your guess was right, thanks. But your solution does not work :-( I found a way to easily reproduce the error. Run VACUUM Restart postmaster -> relpages and reltuples gone I think the simple way of modifying the tuple in the page does not work. I found that catalog/index.c does the same for relpages and reltuples in UpdateStats(). Calling UpdateStats() after vc_updstats() as a quick hack solved the problem. I'm now cvsup'ing, then I'll modify vacuum.c to do it the same way as index.c does it. I don't know if calling UpdateStats() instead is really a good idea, because vacuum potentially truncates files and UpdateStats() does RelationGetNumberOfBlocks() instead of getting it by argument. This might be wrong at that time. Let's see what happens when vacuum does it harder. Later, Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #======================================== jwieck@debis.com (Jan Wieck) #
> I think the simple way of modifying the tuple in the page > does not work. I found that catalog/index.c does the same for > relpages and reltuples in UpdateStats(). Calling > UpdateStats() after vc_updstats() as a quick hack solved the > problem. And now I know the way of modifying them in the buffer is the only one to succeed. Doing it like index.c with heap_replace() irritates vacuum itself. Sometimes it's good to check returncodes :-). WriteNoReleaseBuffer() (as it's name says) takes a buffer number as argument, not a disk block number. Thus, it returned FALSE sometimes when called to write buffer # 0. The attdisbursion is also permanently saved on disk now. It had the same problem. Anyway, the fix is below. Patch is regression tested. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #======================================== jwieck@debis.com (Jan Wieck) # *** vacuum.c.orig Fri Oct 23 16:47:21 1998 --- vacuum.c Fri Oct 23 16:41:56 1998 *************** *** 1792,1799 **** --- 1792,1807 ---- /* overwrite the existing statistics in the tuple */ if (VacAttrStatsEqValid(stats)) { + Buffer abuffer; + /* + * We manipulate the heap tuple in the + * buffer, so we fetch it to get the + * buffer number + */ + atup = heap_fetch(ad, SnapshotNow, &atup->t_ctid, &abuffer); vc_setpagelock(ad, ItemPointerGetBlockNumber(&atup->t_ctid)); + attp = (Form_pg_attribute) GETSTRUCT(atup); if (stats->nonnull_cnt + stats->null_cnt == 0 || (stats->null_cnt <= 1 && stats->best_cnt == 1)) *************** *** 1822,1828 **** if (selratio > 1.0) selratio = 1.0; attp->attdisbursion = selratio; ! WriteNoReleaseBuffer(ItemPointerGetBlockNumber(&atup->t_ctid)); /* DO PG_STATISTIC INSERTS */ --- 1830,1843 ---- if (selratio > 1.0) selratio = 1.0; attp->attdisbursion = selratio; ! ! /* ! * Invalidate the cache for the tuple ! * and write the buffer ! */ ! RelationInvalidateHeapTuple(ad, atup); ! WriteNoReleaseBuffer(abuffer); ! ReleaseBuffer(abuffer); /* DO PG_STATISTIC INSERTS */ *************** *** 1875,1884 **** heap_close(sd); } RelationInvalidateHeapTuple(rd, rtup); ! /* XXX -- after write, should invalidate relcache in other backends */ ! WriteBuffer(ItemPointerGetBlockNumber(&rtup->t_ctid)); heap_close(rd); } --- 1890,1904 ---- heap_close(sd); } + /* + * Invalidate the cached pg_class tuple and + * write the buffer + */ RelationInvalidateHeapTuple(rd, rtup); ! WriteNoReleaseBuffer(buffer); ! ! ReleaseBuffer(buffer); heap_close(rd); }
> > I think the simple way of modifying the tuple in the page > > does not work. I found that catalog/index.c does the same for > > relpages and reltuples in UpdateStats(). Calling > > UpdateStats() after vc_updstats() as a quick hack solved the > > problem. > > And now I know the way of modifying them in the buffer is the > only one to succeed. Doing it like index.c with > heap_replace() irritates vacuum itself. > > Sometimes it's good to check returncodes :-). > WriteNoReleaseBuffer() (as it's name says) takes a buffer > number as argument, not a disk block number. Thus, it > returned FALSE sometimes when called to write buffer # 0. > > The attdisbursion is also permanently saved on disk now. It > had the same problem. > > Anyway, the fix is below. > > Patch is regression tested. Applied. Thanks. I am glad you were able to find the actual cause, rather than calling that UpdateStats function. Vacuum is too strange in the way it modifies things, and calling a general-purpose function that calls all sorts of other functions could be a problem. Of course, you are right. I was passing in a block number, rather than a buffer number. I checked 6.3.2, and I had it right there, so somehow I got confused in the megapatch on that item. I just checked the mega-patch, and that is the only place I used ItemPointerGetBlockNumber as a replacement for Buffer. Glad to see you were able to make sense of the new heap_fetch() syntax to get the buffer. The old code was very unclear in these areas. Now that it is in the developers FAQ, it should be better. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Sorry to be nit-picky, but is pg_indexes proper English. Isn't it pg_indices? -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <maillist@candle.pha.pa.us> writes: > Sorry to be nit-picky, but is pg_indexes proper English. Isn't it > pg_indices? In proper English, "indices" is plural of the singular noun "index". In American, "indexes" is also allowed. (In proper English, "indexes" is a conjugation of the transitive verb "index", of course.) -tih -- Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier"
Bruce Momjian wrote: > > Sorry to be nit-picky, but is pg_indexes proper English. Isn't it > pg_indices? I thought that too. I named it just pg_indexes because Oracle has ALL_INDEXES, DBA_INDEXES and USER_INDEXES. I wonder why I'm allways copying those typos when implementing something similar to what they have. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #======================================== jwieck@debis.com (Jan Wieck) #