Re: dynamic shared memory - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: dynamic shared memory
Msg-id: 20130827140733.GD24807@alap2.anarazel.de
In response to: dynamic shared memory (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: dynamic shared memory
List: pgsql-hackers
Hi Robert,

[just sending an email which sat in my outbox for two weeks]

On 2013-08-13 21:09:06 -0400, Robert Haas wrote:
> ...

Nice to see this coming. I think it will actually be interesting for
quite some things outside parallel query, but we'll see.

I've not yet looked at the code, so I just have some highlevel comments
so far.

> To help solve these problems, I invented something called the "dynamic
> shared memory control segment".  This is a dynamic shared memory
> segment created at startup (or reinitialization) time by the
> postmaster before any user processes are created.  It is used to store
> a list of the identities of all the other dynamic shared memory
> segments we have outstanding and the reference count of each.  If the
> postmaster goes through a crash-and-reset cycle, it scans the control
> segment and removes all the other segments mentioned there, and then
> recreates the control segment itself.  If the postmaster is killed off
> (e.g. kill -9) and restarted, it locates the old control segment and
> proceeds similarly.

That way any corruption in that area will prevent restarts without a
reboot unless you use ipcrm, or such, right?

> Creating a shared memory segment is a somewhat operating-system
> dependent task.  I decided that it would be smart to support several
> different implementations and to let the user choose which one they'd
> like to use via a new GUC, dynamic_shared_memory_type.

I think we want that during development, but I'd rather not go there
when releasing. After all, we don't support a manual choice between
anonymous mmap/sysv shmem either.

> In addition, I've included an implementation based on mmap of a plain
> file.  As compared with a true shared memory implementation, this
> obviously has the disadvantage that the OS may be more likely to
> decide to write back dirty pages to disk, which could hurt
> performance.
> However, I believe it's worthy of inclusion all the same, because
> there are a variety of situations in which it might be more convenient
> than one of the other implementations.  One is debugging.

Hm. Not sure what the advantage over a corefile is here.

> On MacOS X, for example, there seems to be no way to list POSIX shared
> memory segments, and no easy way to inspect the contents of either
> POSIX or System V shared memory segments.

Shouldn't we ourselves know which segments are around?

> Another use case is working around an administrator-imposed or
> OS-imposed shared memory limit.  If you're not allowed to allocate
> shared memory, but you are allowed to create files, then this
> implementation will let you use whatever facilities we build on top of
> dynamic shared memory anyway.

I don't think we should try to work around limits like that.

> A third possible reason to use this implementation is
> compartmentalization.  For example, you can put the directory that
> stores the dynamic shared memory segments on a RAM disk - which
> removes the performance concern - and then do whatever you like with
> that directory: secure it, put filesystem quotas on it, or sprinkle
> magic pixie dust on it.  It doesn't even seem out of the question that
> there might be cases where there are multiple RAM disks present with
> different performance characteristics (e.g. on NUMA machines) and this
> would provide fine-grained control over where your shared memory
> segments get placed.  To make a long story short, I won't be crushed
> if the consensus is against including this, but I think it's useful.

-1 so far. Seems a bit handwavy to me.

> Other implementations are imaginable but not implemented here.  For
> example, you can imagine using the mmap() of an anonymous file.
> However, since the point is that these segments are created on the fly
> by individual backends and then shared with other backends, that gets
> a little tricky.
> In order for the second backend to map the same anonymous shared
> memory segment that the first one mapped, you'd have to pass the file
> descriptor from one process to the other.

It wouldn't even work. Several mappings of /dev/zero et al. do *not*
result in the same virtual memory being mapped. Not even when using the
same (passed around) fd. Believe me, I tried ;)

> There are quite a few problems that this patch does not solve.  First,
> while it does give you a shared memory segment, it doesn't provide you
> with any help at all in figuring out what to put in that segment.  The
> task of figuring out how to communicate usefully through shared memory
> is thus, for the moment, left entirely to the application programmer.
> While there may be cases where that's just right, I suspect there will
> be a wider range of cases where it isn't, and I plan to work on some
> additional facilities, sitting on top of this basic structure, next,
> though probably as a separate patch.

Agreed.

> Second, it doesn't make any policy decisions about what is sensible
> either in terms of the number of shared memory segments or the sizes
> of those segments, even though there are serious practical limits in
> both cases.  Actually, the total number of segments system-wide is
> limited by the size of the control segment, which is sized based on
> MaxBackends.  But there's nothing to keep a single backend from eating
> up all the slots, even though that's both pretty unfriendly and
> unportable, and there's no real limit to the amount of memory it can
> gobble up per slot, either.  In other words, it would be a bad idea to
> write a contrib module that exposes a relatively uncooked version of
> this layer to the user.

At this point I am rather unconcerned about this, to be honest.
> --- /dev/null
> +++ b/src/include/storage/dsm.h
> @@ -0,0 +1,40 @@
> +/*-------------------------------------------------------------------------
> + *
> + * dsm.h
> + *	  manage dynamic shared memory segments
> + *
> + * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
> + * Portions Copyright (c) 1994, Regents of the University of California
> + *
> + * src/include/storage/dsm.h
> + *
> + *-------------------------------------------------------------------------
> + */
> +#ifndef DSM_H
> +#define DSM_H
> +
> +#include "storage/dsm_impl.h"
> +
> +typedef struct dsm_segment dsm_segment;
> +
> +/* Initialization function. */
> +extern void dsm_postmaster_startup(void);
> +
> +/* Functions that create, update, or remove mappings. */
> +extern dsm_segment *dsm_create(uint64 size, char *preferred_address);
> +extern dsm_segment *dsm_attach(dsm_handle h, char *preferred_address);
> +extern void *dsm_resize(dsm_segment *seg, uint64 size,
> +			char *preferred_address);
> +extern void *dsm_remap(dsm_segment *seg, char *preferred_address);
> +extern void dsm_detach(dsm_segment *seg);

Why do we want to expose something as unreliable as preferred_address in
the external interface? I haven't read the code yet, so I might be
missing something here.

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services