Re: Hybrid Hash/Nested Loop joins and caching results from subplans - Mailing list pgsql-hackers

From Zhihong Yu
Subject Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date
Msg-id CALNJ-vTi+MNGjPJn8WDcY2DODEBbnxGYA4_CcG==-2JCA3BQFw@mail.gmail.com
Whole thread Raw
In response to Hybrid Hash/Nested Loop joins and caching results from subplans  (Zhihong Yu <zyu@yugabyte.com>)
Responses Re: Hybrid Hash/Nested Loop joins and caching results from subplans
List pgsql-hackers
There are two blocks with almost identical code (second occurrence in cache_store_tuple):

+   if (rcstate->mem_used > rcstate->mem_upperlimit)
+   {

It would be nice if the code can be extracted to a method and shared.

                    node->rc_status = RC_END_OF_SCAN;
                    return NULL;
                }
                else

There are several places where the else keyword for else block can be omitted because the if block ends with return.
This would allow the code in else block to move leftward (for easier reading).

       if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))

I noticed that right_hashfn isn't used. Would this cause some warning from the compiler (for some compiler the warning would be treated as error) ?
Maybe NULL can be passed as the last parameter. The return value of get_op_hash_functions would keep the current meaning (find both hash fn's).

    rcstate->mem_lowerlimit = rcstate->mem_upperlimit * 0.98;

Maybe (in subsequent patch) GUC variable can be introduced for tuning the constant 0.98.

For +paraminfo_get_equal_hashops :

+       else
+           Assert(false);

Add elog would be good for debugging.

Cheers

On Fri, Dec 4, 2020 at 5:09 PM Zhihong Yu <zyu@yugabyte.com> wrote:
Hi, David:
For nodeResultCache.c :

+#define SH_EQUAL(tb, a, b) ResultCacheHash_equal(tb, a, b) == 0

I think it would be safer if the comparison is enclosed in parentheses (in case the macro appears in composite condition).

+ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
+                     const ResultCacheKey *key2)

Since key2 is not used, maybe name it unused_key ?

+   /* Make a guess at a good size when we're not given a valid size. */
+   if (size == 0)
+       size = 1024;

Should the default size be logged ?

+   /* Update the memory accounting */
+   rcstate->mem_used -= freed_mem;

Maybe add an assertion that mem_used is >= 0 after the decrement (there is an assertion in remove_cache_entry however, that assertion is after another decrement).

+ * 'specialkey', if not NULL, causes the function to return false if the entry
+ * entry which the key belongs to is removed from the cache.

duplicate entry (one at the end of first line and one at the beginning of second line).

For cache_lookup(), new key is allocated before checking whether rcstate->mem_used > rcstate->mem_upperlimit. It seems new entries should probably have the same size.
Can we check whether upper limit is crossed (assuming the addition of new entry) before allocating new entry ?

+       if (unlikely(!cache_reduce_memory(rcstate, key)))
+           return NULL;

Does the new entry need to be released in the above case?

Cheers

pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Add Information during standby recovery conflicts
Next
From: Bruce Momjian
Date:
Subject: Re: Proposed patch for key managment