Thread: A modest proposal: make parser/rewriter/planner inputs read-only

A modest proposal: make parser/rewriter/planner inputs read-only

From

Tom Lane

Date:

05 April, 19:46:37

The parser, rewriter, and planner all have a bad habit of scribbling
on their input parsetrees.  At this point I've lost count of how many
bugs that's caused (but e33f2335a and 0f43083d1 are the two latest
examples), and of how many places there are that are brute-forcing a
solution to the problem by copying a whole parsetree before letting
one of these subsystems see it.

Years ago we fixed the executor to treat its input Plan trees as
read-only.  It seems like we really ought to do the same for these
upstream subsystems.  Surely, whatever benefit we get from changing
Node contents in-place is lost due to all these other hacks.

While I've had this thought in the back of my head for a long while,
I haven't pursued it because I couldn't think of a practical way
to find all the places that are violating such a policy, much less
prevent future violations.  However, today I thought of this sketch 
of a solution:

1. Invent a way to make a particular memory context read-only
after putting some data into it.

2. In debug builds, after we've built a tree that should be considered
read-only, copy it into such a context and make it read-only.  Or
perhaps build it there in the first place.

3. Fix the resulting crashes.

4. Profit!  (In particular, nuke a lot of no-longer-needed copyObject
calls.)

My first thought about implementing #1 was to seek Valgrind's help,
but so far as I can find out there's no VALGRIND_MAKE_MEM_READ_ONLY.
Step #3 would be pretty tedious anyway if it required running under
Valgrind.  However, all modern hardware has the ability to mark
memory read-only at the page level, and most platforms expose that
in some way or other.  So it doesn't seem unreasonable to invent
a memory context option (or whole new context type, if that seems
easier) that is careful to align its allocation blocks on page
boundaries and then can set or clear the hardware R/O flag on
demand.  It'd be enough if the R/O enforcement worked on popular
development platforms, we don't have to make it work absolutely
everywhere.

I'm not planning to pursue this idea Right Now, but it seems like
something that could happen for v19 or so.  In the meantime I wanted
to get the ideas down on electrons.

Thoughts?

            regards, tom lane

Re: A modest proposal: make parser/rewriter/planner inputs read-only

From

Andres Freund

Date:

06 April, 05:30:54

Hi,

On 2025-04-05 12:46:37 -0400, Tom Lane wrote:
> 1. Invent a way to make a particular memory context read-only
> after putting some data into it.
> 
> 2. In debug builds, after we've built a tree that should be considered
> read-only, copy it into such a context and make it read-only.  Or
> perhaps build it there in the first place.

> 3. Fix the resulting crashes.
> 
> 4. Profit!  (In particular, nuke a lot of no-longer-needed copyObject
> calls.)
> 
> My first thought about implementing #1 was to seek Valgrind's help,
> but so far as I can find out there's no VALGRIND_MAKE_MEM_READ_ONLY.
> Step #3 would be pretty tedious anyway if it required running under
> Valgrind.  However, all modern hardware has the ability to mark
> memory read-only at the page level, and most platforms expose that
> in some way or other.  So it doesn't seem unreasonable to invent
> a memory context option (or whole new context type, if that seems
> easier) that is careful to align its allocation blocks on page
> boundaries and then can set or clear the hardware R/O flag on
> demand.  It'd be enough if the R/O enforcement worked on popular
> development platforms, we don't have to make it work absolutely
> everywhere.

FWIW, while hacking on patch to making hint bit writes not happening while IO
is going on (so we don't need to copy the page anymore and don't cause
filesystem level issues with DIO), I hacked up protection for shared buffers
using mprotect() - it worked way better than I thought it would. The overhead
ended up surprisingly low:

base:
real    1m4.613s
user    4m31.409s
sys     3m20.445s

ENFORCE_BUFFER_PROT

real    1m11.912s
user    4m27.332s
sys     3m28.063s


See https://postgr.es/m/043c8b50-d183-46e5-b054-145cc0f6f908%40iki.fi


I'm mostly sharing that to say that
a) yes, mprotect() is viable and works surprisingly well
b) it might be worth inventing some common platform abstraction for mprotect

That prototype patch already worked on most platforms, windows should be
entirely doable.


Greetings,

Andres Freund

Re: A modest proposal: make parser/rewriter/planner inputs read-only

From

Richard Guo

Date:

15 April, 12:20:42

On Sun, Apr 6, 2025 at 1:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The parser, rewriter, and planner all have a bad habit of scribbling
> on their input parsetrees.  At this point I've lost count of how many
> bugs that's caused (but e33f2335a and 0f43083d1 are the two latest
> examples), and of how many places there are that are brute-forcing a
> solution to the problem by copying a whole parsetree before letting
> one of these subsystems see it.
>
> Years ago we fixed the executor to treat its input Plan trees as
> read-only.  It seems like we really ought to do the same for these
> upstream subsystems.  Surely, whatever benefit we get from changing
> Node contents in-place is lost due to all these other hacks.

While I'm in favor of this idea, it seems that it will require lots of
code changes.  In particular, within the planner, the parsetree goes
through considerable transformations during the preprocessing phase,
such as subquery pull-up, constant folding, and so on.  Would this
proposal mean we'd need to refactor all those places to treat the
input parsetrees as read-only?  Just trying to understand the scope of
what would be involved.

Thanks
Richard

Re: A modest proposal: make parser/rewriter/planner inputs read-only

From

Andrei Lepikhov

Date:

15 April, 13:00:53

On 4/5/25 18:46, Tom Lane wrote:
> I'm not planning to pursue this idea Right Now, but it seems like
> something that could happen for v19 or so.  In the meantime I wanted
> to get the ideas down on electrons.
> 
> Thoughts?
I generally like the idea because, for now, I need to be sure that no 
one touched the parse tree before copying it to do additional 
transformations before the optimisation phase.
But what is the way you are proposing here? Do you mean that one more 
entity will be explicitly introduced: a transformed parse tree?
It would open an opportunity for extensions to build a set of 
alternative transformed trees, pass them through the optimisation phase 
and choose the best plan.

-- 
regards, Andrei Lepikhov

Re: A modest proposal: make parser/rewriter/planner inputs read-only

From

Tom Lane

Date:

15 April, 20:31:44

Richard Guo <guofenglinux@gmail.com> writes:
> On Sun, Apr 6, 2025 at 1:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Years ago we fixed the executor to treat its input Plan trees as
>> read-only.  It seems like we really ought to do the same for these
>> upstream subsystems.  Surely, whatever benefit we get from changing
>> Node contents in-place is lost due to all these other hacks.

> While I'm in favor of this idea, it seems that it will require lots of
> code changes.  In particular, within the planner, the parsetree goes
> through considerable transformations during the preprocessing phase,
> such as subquery pull-up, constant folding, and so on.  Would this
> proposal mean we'd need to refactor all those places to treat the
> input parsetrees as read-only?  Just trying to understand the scope of
> what would be involved.

From a high-level point of view, I'd be satisfied to have this
property at the subsystem level: parse analysis can't modify the raw
parse tree output by the grammar, rewriter can't modify the analyzed
parse tree it's given, planner can't modify the parse tree it's given.
That would not preclude, say, portions of the planner doing
modify-in-place as long as it was certain that what they were
modifying was a planner-local copy.  (The proposed test infrastructure
would only be able to enforce things at the subsystem level anyway.)

As we get into the project we might decide that it'd be worth
enforcing similar rules on portions of those subsystems, eg successive
phases of the planner.  I don't have strong feelings about that at
present.  An example here is that it'd be nice to have some way of
verifying that a GEQO search step doesn't change any of the data
structures that are supposed to remain the same.

I do suspect that there are specific functions that will need to be
made strict about this, for example eval_const_expressions, but that
doesn't necessarily translate to a hard requirement across the board.

            regards, tom lane

Re: A modest proposal: make parser/rewriter/planner inputs read-only

From

Tom Lane

Date:

15 April, 20:34:55

Andrei Lepikhov <lepihov@gmail.com> writes:
> But what is the way you are proposing here? Do you mean that one more 
> entity will be explicitly introduced: a transformed parse tree?

No, I wasn't thinking of adding new concepts, just saying that the
inputs to certain operations need to be treated as read-only.
What would you envision a "transformed parse tree" being that's not
what we have today?

> It would open an opportunity for extensions to build a set of 
> alternative transformed trees, pass them through the optimisation phase 
> and choose the best plan.

Yeah, the ability to repeatedly operate on a tree without the overhead
of making copies would be a major benefit of being stricter here.

            regards, tom lane

Re: A modest proposal: make parser/rewriter/planner inputs read-only

From

Ashwin Agrawal

Date:

16 April, 03:06:13

On Sat, Apr 5, 2025 at 7:31 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2025-04-05 12:46:37 -0400, Tom Lane wrote:
> 1. Invent a way to make a particular memory context read-only
> after putting some data into it.
>
> 2. In debug builds, after we've built a tree that should be considered
> read-only, copy it into such a context and make it read-only. Or
> perhaps build it there in the first place.

> 3. Fix the resulting crashes.
>
> 4. Profit! (In particular, nuke a lot of no-longer-needed copyObject
> calls.)
>
> My first thought about implementing #1 was to seek Valgrind's help,
> but so far as I can find out there's no VALGRIND_MAKE_MEM_READ_ONLY.
> Step #3 would be pretty tedious anyway if it required running under
> Valgrind. However, all modern hardware has the ability to mark
> memory read-only at the page level, and most platforms expose that
> in some way or other. So it doesn't seem unreasonable to invent
> a memory context option (or whole new context type, if that seems
> easier) that is careful to align its allocation blocks on page
> boundaries and then can set or clear the hardware R/O flag on
> demand. It'd be enough if the R/O enforcement worked on popular
> development platforms, we don't have to make it work absolutely
> everywhere.

FWIW, while hacking on patch to making hint bit writes not happening while IO
is going on (so we don't need to copy the page anymore and don't cause
filesystem level issues with DIO), I hacked up protection for shared buffers
using mprotect() - it worked way better than I thought it would. The overhead
ended up surprisingly low:

base:
real 1m4.613s
user 4m31.409s
sys 3m20.445s

ENFORCE_BUFFER_PROT

real 1m11.912s
user 4m27.332s
sys 3m28.063s

See https://postgr.es/m/043c8b50-d183-46e5-b054-145cc0f6f908%40iki.fi

I'm mostly sharing that to say that
a) yes, mprotect() is viable and works surprisingly well
b) it might be worth inventing some common platform abstraction for mprotect

That prototype patch already worked on most platforms, windows should be
entirely doable.

Also, I would like to provide reference to this old thread [1] in favor of mprotect().

In that thread, and in Greenplum using mprotect helps detect Shared buffer access rule violations.

[1] https://www.postgresql.org/message-id/CANXE4Tfxmjv8Z-kcwWUCYxx6zCFS%3DGeBPm-ZjjLucvrGewjUtg%40mail.gmail.com

This electronic communication and the information and any files transmitted with it, or attached to it, are confidential and are intended solely for the use of the individual or entity to whom it is addressed and may contain information that is confidential, legally privileged, protected by privacy laws, or otherwise restricted from disclosure to anyone else. If you are not the intended recipient or the person responsible for delivering the e-mail to the intended recipient, you are hereby notified that any use, copying, distributing, dissemination, forwarding, printing, or copying of this e-mail is strictly prohibited. If you received this e-mail in error, please return the e-mail to the sender, delete it from your computer, and destroy any printed copy of it.

Re: A modest proposal: make parser/rewriter/planner inputs read-only

From

Andrei Lepikhov

Date:

16 April, 10:28:31

On 4/15/25 19:34, Tom Lane wrote:
> Andrei Lepikhov <lepihov@gmail.com> writes:
>> But what is the way you are proposing here? Do you mean that one more
>> entity will be explicitly introduced: a transformed parse tree?
> 
> No, I wasn't thinking of adding new concepts, just saying that the
> inputs to certain operations need to be treated as read-only.
> What would you envision a "transformed parse tree" being that's not
> what we have today?
I just want to understand how your idea will work. The query_planner 
does the job for subqueries separately. If a query is transformed in 
some way (let's say, an unnecessary join is deleted), we need to change 
references in the parse tree of another subquery, or it will not find 
the reference at the moment of planning, right?

-- 
regards, Andrei Lepikhov

Re: A modest proposal: make parser/rewriter/planner inputs read-only

From

Tom Lane

Date:

16 April, 19:22:42

Andrei Lepikhov <lepihov@gmail.com> writes:
> I just want to understand how your idea will work. The query_planner 
> does the job for subqueries separately. If a query is transformed in 
> some way (let's say, an unnecessary join is deleted), we need to change 
> references in the parse tree of another subquery, or it will not find 
> the reference at the moment of planning, right?

Don't see why.  If we're separately planning a subquery, we would
not dare to change anything about its outputs, just as we would
not make a change that affects the topmost level's outputs.  I don't
believe there's anything right now that requires a recursive
subquery_planner call to change the outer parsetree, and this idea
wouldn't affect that.

Now, subquery_planner does have side effects on the PlannerGlobal
struct, but that's planner-local data, not an input to the planner.

Maybe we would like to have some enforced contract about what
subquery_planner can and can't touch in the outer planner level's
data, but I'm not feeling a great need for that right now.

            regards, tom lane

Re: A modest proposal: make parser/rewriter/planner inputs read-only

From

Andrei Lepikhov

Date:

16 April, 21:07:45

On 4/16/25 18:22, Tom Lane wrote:
> Maybe we would like to have some enforced contract about what
> subquery_planner can and can't touch in the outer planner level's
> data, but I'm not feeling a great need for that right now.
I wouldn't say I'm entirely convinced that side effects are absent, but 
multiple copies of the same query tree look messy now. It makes sense to 
try this way and see what happens.

-- 
regards, Andrei Lepikhov