Re: A performance regression issue with Memoize - Mailing list pgsql-hackers

From Robert Haas
Subject Re: A performance regression issue with Memoize
Date
Msg-id CA+TgmoY6C=PrWRbHsQqCMWoHWPuYoFLKfpnryTpn_1fEDOqJLw@mail.gmail.com
In response to Re: A performance regression issue with Memoize  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, Jul 29, 2025 at 12:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
> > For the record, I 100% agree that there will always be cases where
> > statistics are just unable to represent what is discovered at
> > run-time, so having some sort of ability to adapt at run-time seems
> > like a natural progression on the evolutionary chain. I just don't
> > know if it's the best or best next step to make. I suspect we might be
> > skipping a few steps from what we have now if we went there in the
> > near future. We don't yet have extended statistics for joins yet, for
> > example.
>
> Yeah.  There is plenty remaining to be done towards collecting and
> applying traditional sorts of statistics.  I worry about ideas
> such as run-time plan changes because we will have exactly zero
> ability to predict what'll happen if the executor starts doing
> that.  Maybe it'll be great, but what do you do if it isn't?

Well, you already know that what you're doing isn't great. If the
currently-selected alternative is terrible, the other alternative
doesn't have to be that great to be a win.

I've thought about this mostly in the context of the decision between
a Nested Loop and a Hash Join. Subject to some conditions, these are
interchangeable: at any point you could decide that on the next
iteration you're going to put all the inner rows into a hash table and
just probe that. The "only" downside is that it could turn out that,
unluckily, the next iteration was also the last one that was ever
going to happen, and then the overhead to build the hash table was
wasted. If the Nested Loop is parameterized, the Hash Join requires a
complete scan of the inner side of the join, which requires a
different plan variant, and which is potentially quite expensive.
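
A minimal Python sketch of the unparameterized case described above, assuming the inner side is a plain list that can be rescanned. The name adaptive_join and the switch_after parameter are hypothetical, chosen for illustration only, and do not correspond to anything in the PostgreSQL executor:

```python
from collections import defaultdict


def adaptive_join(outer_rows, inner_rows, key_outer, key_inner, switch_after):
    """Join by rescanning inner_rows, then switch to probing a hash table.

    switch_after is a hypothetical knob: the number of outer rows handled
    by plain rescans before the hash table is built.
    """
    hash_table = None  # built lazily, at most once
    for i, outer in enumerate(outer_rows):
        if hash_table is None and i >= switch_after:
            # One full pass over the inner side. This is the cost that is
            # wasted if this outer row turns out to be the last one.
            hash_table = defaultdict(list)
            for inner in inner_rows:
                hash_table[key_inner(inner)].append(inner)
        if hash_table is not None:
            matches = hash_table.get(key_outer(outer), [])
        else:
            # Nested Loop behaviour: rescan the whole inner side.
            matches = [r for r in inner_rows if key_inner(r) == key_outer(outer)]
        for inner in matches:
            yield (outer, inner)
```

Both strategies return the same rows in the same order, so the switch can happen between any two outer rows. The sketch does not cover the parameterized case, where the inner scan depends on the current outer row and no single pass yields all inner rows.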

But switching from a plain Nested Loop to Nested Loop + Memoize
wouldn't have that problem. You never have to make a complete scan of
the inner side. You can just decide to start caching some results for
individual parameter values whenever you want, and if it turns out
that they're never useful, you haven't lost nearly as much. So a
strategy like "start memoizing when we exceed the expected loop count
by 20x" might be viable. I'm not really sure, since I haven't done the
experiments, but it seems to me that the downsides of this kind of
strategy switch might be pretty minimal even when things work out
anti-optimally.
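
The "start memoizing past 20x the expected loop count" idea could be sketched roughly as follows. This is an illustration under stated assumptions, not how the Memoize node is implemented: the class AdaptiveMemoize, its fields, and the inner_lookup callback are all hypothetical, and the sketch ignores memory limits and cache eviction entirely.

```python
class AdaptiveMemoize:
    """Wrap a parameterized inner lookup; start caching only after the
    observed loop count exceeds expected_loops * factor (hypothetical)."""

    def __init__(self, inner_lookup, expected_loops, factor=20):
        self.inner_lookup = inner_lookup      # callable: param -> list of rows
        self.threshold = expected_loops * factor
        self.loops = 0                        # loops actually observed
        self.inner_calls = 0                  # times the inner side really ran
        self.cache = None                     # None means caching is still off

    def lookup(self, param):
        self.loops += 1
        if self.cache is None and self.loops > self.threshold:
            # The estimate was badly off, so switch strategy. Nothing is
            # scanned up front; the cache fills as parameters arrive.
            self.cache = {}
        if self.cache is not None and param in self.cache:
            return self.cache[param]
        self.inner_calls += 1
        rows = self.inner_lookup(param)
        if self.cache is not None:
            self.cache[param] = rows
        return rows
```

With expected_loops=1 and 100 lookups cycling over 5 distinct parameter values, the inner side runs 20 times before the switch and 5 more times to fill the cache, and every later lookup is a cache hit. If the parameters never repeat, the only extra cost is storing results that are never read.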

--
Robert Haas
EDB: http://www.enterprisedb.com


