Re: [HACKERS] PATCH: multivariate histograms and MCV lists - Mailing list pgsql-hackers
From | Alvaro Herrera |
---|---|
Subject | Re: [HACKERS] PATCH: multivariate histograms and MCV lists |
Date | |
Msg-id | 20180310171906.gd6vyz6hnou2n3h7@alvherre.pgsql Whole thread Raw |
In response to | Re: [HACKERS] PATCH: multivariate histograms and MCV lists (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
|
List | pgsql-hackers |
On 0002: In terms of docs, I think it's better not to have anything user-facing in the README. Consider that users are going to be reading the HTML docs only, and many of them may not have the README available at all. So anything that could be useful to users must be in the XML docs only; keep in the README only stuff that would be useful to a developer (a section such as "not yet implemented" would belong there, for example). Stuff that's in the XML should not appear in the README (because DRY). For the same reason, having the XML docs end with "see the README" seems a bad idea to me. UPDATE_RESULT() is a bit weird to me. I think after staring at it for a while it looks okay, but why was it such a shock? In 0002 it's only used in one place so I would suggest to have it expanded, but I see you use it in 0003 also, three times I think. IMO for clarity it seems better to just have the expanded code rather than the macro. find_ext_attnums (and perhaps other places) have references to renamed columns, "starelid" and others. Also there is this comment: /* Prepare to scan pg_statistic_ext for entries having indrelid = this rel. */ which is outdated since it uses syscache, not a scan. Just remove the comment ... Please add a comment on what does build_attnums() do. pg_stats_ext_mcvlist_items is odd. I suppose you made it take oid to avoid having to deal with a malicious bytea? The query in docs is pretty odd-looking, SELECT * FROM pg_mcv_list_items((SELECT oid FROM pg_statistic_ext WHERE stxname = 'stts2')); If we keep the function as is, I would suggest to use LATERAL instead, SELECT m.* FROM pg_statistic_ext, pg_mcv_list_items(oid) m WHERE stxname = 'stts2'; but seems like it should be more like this instead: SELECT m.* FROM pg_statistic_ext, pg_mcv_list_items(stxmcv) m WHERE stxname = 'stts2'; and not have the output formatting function load the data again from the table. It'd be a bit like a type-specific UNNEST. There are a few elog(ERROR) messages. The vast majority seem to be just internal messages so they're okay, but there is one that should be ereport: + if (total_length > (1024 * 1024)) + elog(ERROR, "serialized MCV list exceeds 1MB (%ld)", total_length); I think we have some precedent for better wording, such as errmsg("index row size %zu exceeds maximum %zu for index \"%s\"" so I would say errmsg("serialized MCV list size %zu exceedes maximum %zu" ) though I wonder when is this error thrown -- if this is detected during analyze for example, what happens? There is this FIXME: + * FIXME Should skip already estimated clauses (using the estimatedclauses + * bitmap). Are you planning on implementing this before commit? There are other FIXMEs also. This in particular caught my attention: + /* merge the bitmap into the existing one */ + for (i = 0; i < mcvlist->nitems; i++) + { + /* + * Merge the result into the bitmap (Min for AND, Max for OR). + * + * FIXME this does not decrease the number of matches + */ + UPDATE_RESULT(matches[i], or_matches[i], is_or); + } We come back to UPDATE_RESULT again ... and note how the comment makes no sense unless you know what UPDATE_RESULT does internally. This is one more indication that the macro is not a great thing to have. Let's lose it. But while at it, what to do about the FIXME? You also have this + /* XXX syscache contains OIDs of deleted stats (not invalidated) */ + if (!HeapTupleIsValid(htup)) + return NULL; but what does it mean? Is it here to cover for some unknown bug? Should we maybe not have this at all? Another XXX comment says + * XXX All the memory is allocated in a single chunk, so that the caller + * can simply pfree the return value to release all of it. but I would say just remove the XXX and leave the rest of the comment. There is another XXX comment that says "this is useless", and I agree. Just take it all out ... -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: