Re: [HACKERS] A design for amcheck heapam verification - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: [HACKERS] A design for amcheck heapam verification |
Date | |
Msg-id | 20180331012031.ftf4vbiwabsg7uvb@alap3.anarazel.de Whole thread Raw |
In response to | Re: [HACKERS] A design for amcheck heapam verification (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: [HACKERS] A design for amcheck heapam verification
|
List | pgsql-hackers |
On 2018-03-29 19:42:42 -0700, Peter Geoghegan wrote: > >> + /* > >> + * Aim for two bytes per element; this is sufficient to get a false > >> + * positive rate below 1%, independent of the size of the bitset or total > >> + * number of elements. Also, if rounding down the size of the bitset to > >> + * the next lowest power of two turns out to be a significant drop, the > >> + * false positive rate still won't exceed 2% in almost all cases. > >> + */ > >> + bitset_bytes = Min(bloom_work_mem * 1024L, total_elems * 2); > >> + /* Minimum allowable size is 1MB */ > >> + bitset_bytes = Max(1024L * 1024L, bitset_bytes); > > > > Some upcasting might be advisable, to avoid dangers of overflows? > > When it comes to sizing work_mem, using long literals to go from KB to > bytes is How It's Done™. I actually think that's silly myself, because > it's based on the assumption that long is wider than int, even though > it isn't on Windows. But that's okay because we have the old work_mem > size limits on Windows. > What would the upcasting you have in mind look like? Just use UINT64CONST()? Let's try not to introduce further code that'll need to get painfully fixed. > > >> + /* Use 64-bit hashing to get two independent 32-bit hashes */ > >> + hash = DatumGetUInt64(hash_any_extended(elem, len, filter->seed)); > > > > Hm. Is that smart given how some hash functions are defined? E.g. for > > int8 the higher bits aren't really that independent for small values: > > Robert suggested that I do this. I don't think that we need to make it > about the quality of the hash function that we have available. That > really seems like a distinct question to me. It seems clear that this > ought to be fine (or should be fine, if you prefer). I understand why > you're asking about this, but it's not scalable to ask every user of a > hash function to care that it might be a bit crap. Hash functions > aren't supposed to be a bit crap. Well, they're not supposed to be, but they are. Practical realities matter... But I think it's moot because we don't use any of the bad ones, I only got to that later... > >> +DROP FUNCTION bt_index_check(regclass); > >> +CREATE FUNCTION bt_index_check(index regclass, > >> + heapallindexed boolean DEFAULT false) > >> +RETURNS VOID > >> +AS 'MODULE_PATHNAME', 'bt_index_check' > >> +LANGUAGE C STRICT PARALLEL RESTRICTED; > > > > This breaks functions et al referencing the existing function. > > This sounds like a general argument against ever changing a function's > signature. It's not like we haven't done that a number of times in > extensions like pageinspect. Does it really matter? Yes, it does. And imo we shouldn't. > > Also, I > > don't quite recall the rules, don't we have to drop the function from > > the extension first? > > But...I did drop the function? I mean in the ALTER EXTENSION ... DROP FUNCTION sense. Greetings, Andres Freund
pgsql-hackers by date: