Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM) - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM) |
Date | |
Msg-id | CAA4eK1+2vvhzVveagojZDgwy96FBoTOCAaWBF-O=WUX_qZJzCg@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM) (Pavan Deolasee <pavan.deolasee@gmail.com>) |
Responses |
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM) |
List | pgsql-hackers |
On Tue, Mar 21, 2017 at 6:47 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote: > >> > > Please find attached rebased patches. > Few comments on 0005_warm_updates_v18.patch: 1. @@ -806,20 +835,35 @@ hashbucketcleanup(Relation rel, Bucket cur_bucket, Buffer bucket_buf, { .. - if (callback && callback(htup, callback_state)) + if(callback) { - kill_tuple = true; - if (tuples_removed) - *tuples_removed += 1; +result = callback(htup, is_warm, callback_state); + if (result== IBDCR_DELETE) + { + kill_tuple = true; + if (tuples_removed) +*tuples_removed += 1; + } + else if (result ==IBDCR_CLEAR_WARM) + { + clear_tuple= true; + } } else if (split_cleanup) .. } I think this will break the existing mechanism of split cleanup. We need to check for split cleanup if the tuple is tuple is not deletable by the callback. This is not merely an optimization but a must condition because we will clear the split cleanup flag after this bucket is scanned completely. 2. - PageIndexMultiDelete(page, deletable, ndeletable); + /* + * Clear the WARM pointers. + * + * We mustdo this before dealing with the dead items because + * PageIndexMultiDelete may move items around to compactify the + * array and hence offnums recorded earlierwon't make any sense + * after PageIndexMultiDelete is called. +*/ + if (nclearwarm > 0) + _hash_clear_items(page,clearwarm, nclearwarm); + + /* + * And delete the deletableitems + */ + if (ndeletable > 0) + PageIndexMultiDelete(page, deletable, ndeletable); I think this assumes that the items where we need to clear warm flag are not deletable, otherwise what is the need to clear the flag if we are going to delete the tuple. The deletable tuple can have a warm flag if it is deletable due to split cleanup. 3. + /* + * HASH indexes compute a hash value of the key and store that in the + * index. So we must first obtain the hash of the value obtained from the + * heap and then do a comparison +*/ + _hash_convert_tuple(indexRel, values, isnull, values2, isnull2); I think here, you need to handle the case where heap has a NULL value as the hash index doesn't contain NULL values, otherwise, the code in below function can return true which is not right. 4. +bool +hashrecheck(Relation indexRel, IndexInfo *indexInfo, IndexTuple indexTuple, + Relation heapRel, HeapTuple heapTuple) { .. + att = indexRel->rd_att->attrs[i - 1]; + if (!datumIsEqual(values2[i - 1], indxvalue, att->attbyval, + att->attlen)) + { + equal = false; + break; + } .. } Hash values are always uint32 and attlen can be different for different datatypes, so I think above doesn't seem to be the right way to do the comparison. 5. @@ -274,6 +301,8 @@ hashgettuple(IndexScanDesc scan, ScanDirection dir) OffsetNumber offnum; ItemPointer current; bool res; + IndexTuple itup; + /* Hash indexes are always lossy since we store only the hash code */ scan->xs_recheck = true; @@ -316,8 +345,6 @@ hashgettuple(IndexScanDesc scan, ScanDirection dir) offnum <= maxoffnum; offnum = OffsetNumberNext(offnum)) { -IndexTuple itup; - Why above change? 6. + *stats = index_bulk_delete(&ivinfo, *stats, +lazy_indexvac_phase1, (void *) vacrelstats); + ereport(elevel, +(errmsg("scanned index \"%s\" to remove %d row version, found " +"%0.f warm pointers, %0.f clear pointers, removed " +"%0.f warm pointers, removed %0.f clear pointers", +RelationGetRelationName(indrel), + vacrelstats->num_dead_tuples, + (*stats)->num_warm_pointers, +(*stats)->num_clear_pointers, +(*stats)->warm_pointers_removed, + (*stats)->clear_pointers_removed))); + + (*stats)->num_warm_pointers = 0; + (*stats)->num_clear_pointers = 0; + (*stats)->warm_pointers_removed = 0; + (*stats)->clear_pointers_removed = 0; + (*stats)->pointers_cleared = 0; + + *stats =index_bulk_delete(&ivinfo, *stats, + lazy_indexvac_phase2, (void *)vacrelstats); To convert WARM chains, we need to do two index passes for all the indexes. I think it can substantially increase the random I/O. I think this can help us in doing more WARM updates, but I don't see how the downside of that (increased random I/O) will be acceptable for all kind of cases. +exists. Since index vacuum may visit these pointers in any order, we will need +another index pass to remove dead index pointers. So in the first index pass we +check which WARM candidates have 2 index pointers. In the second pass, we +remove the dead pointer and clear the INDEX_WARM_POINTER flag if that's the +surviving index pointer. I think there is some mismatch between README and code. In README, it is mentioned that dead pointers will be removed in the second phase, but I think the first phase code lazy_indexvac_phase1() will also allow to delete the dead pointers (it can return IBDCR_DELETE which will allow index am to remove dead items.). Am I missing something here? 7. + * For CLEAR chains, we just kill the WARM pointer, if it exist,s and keep + * the CLEAR pointer. typo (exist,s) 8. +/* + * lazy_indexvac_phase2() -- run first pass of index vacuum Shouldn't this be -- run the second pass 9. - indexInfo); /* index AM may need this */ +indexInfo, /* index AM may need this */ +(modified_attrs != NULL)); /* type of uniqueness check to do */ comment for the last parameter seems to be wrong. 10. +follow the update chain everytime to the end to see check if this is a WARM +chain. "see check" - seems one of those words is sufficient to explain the meaning. 11. +chain. This simplifies the design and addresses certain issues around +duplicate scans. "duplicate scans" - shouldn't be duplicate key scans. 12. +index on the table, irrespective of whether the key pertaining to the +index changed or not. typo. /index changed/index is changed 13. +For example, if we have a table with two columns and two indexes on each +of the column. When a tuple is first inserted the table, we have exactly typo. /inserted the table/inserted in the table 14. + lp [1] [2] + [1111, aaaa]->[111, bbbb] Here, after the update, the first column should be 1111. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: