Thread: memory strangeness (fwd)
Hiya. I've installed Postgres 7.2 on a dedicated FreeBSD system with 384 MB RAM. Because the system will be doing nothing except PG, I'd like to dump as much memory as possible into PG's shared memory. I rebuilt the kernel with very large limits: 330 MB on the MAXDSIZ and DFLDSIZ, and 330 MB for SHMMAXPAGES. This gives me: # sysctl kern.ipc.shmall kern.ipc.shmall: 42240 # sysctl kern.ipc.shmmax kern.ipc.shmmax: 346030080 I still cannot set PG's shared_buffers higher than 20000 (160 MB): IpcMemoryCreate: shmget(key=8971001, size=183017472, 03600) failed: Cannot allocate memory This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory or swap space. To reduce the request size (currently 183017472 bytes), reduce PostgreSQL's shared_buffers parameter (currently 22000) and/or its max_connections parameter (currently 5). That I'm out of RAM seems fishy, though, since there's very little running and vmstat reports plenty of free RAM: procs memory page disks faults cpu r b w avm fre flt re pi po fr sr ad0 ad3 in sy cs us sy id 0 0 0 8484 353908 1 0 0 0 2 0 2 0 231 5 5 0 0 100 Any ideas as to what I missed?
Gregor Mosheh <stigmata@blackangel.net> writes: > Hiya. I've installed Postgres 7.2 on a dedicated FreeBSD system with 384 > MB RAM. Because the system will be doing nothing except PG, I'd like to > dump as much memory as possible into PG's shared memory. > I rebuilt the kernel with very large limits: 330 MB on the MAXDSIZ and > DFLDSIZ, and 330 MB for SHMMAXPAGES. This gives me: Very likely the kernel has its own ideas on how much memory it needs to reserve for other purposes, and is unwilling to give you a shmem segment that represents the bulk of physical RAM. While I don't know FreeBSD well enough to speculate on exactly why it's limiting you, I do think that you are going in the wrong direction for Postgres anyhow. Pretty much everyone who has looked at the issue has concluded that it's a mistake to try to set shared memory that high. If you're trying to set it to more than a quarter of physical RAM you're off track IMHO. In practice, given that this isn't an especially huge system, I'd think a shared_buffers setting in the low thousands would be appropriate. Also, reducing max_connections as low as 5 seems the wrong direction too ... regards, tom lane
On Thu, 4 Jul 2002, Gregor Mosheh wrote: > Hiya. I've installed Postgres 7.2 on a dedicated FreeBSD system with 384 > MB RAM. Because the system will be doing nothing except PG, I'd like to > dump as much memory as possible into PG's shared memory. Well, see Tom's comments on this, 'cause that's what I always say too. (I'm a NetBSD developer, but FreeBSD's internals aren't so different, really.) However, if you wanted to do it both ways, and benchmark your application, I'd be really interested in hearing about the results. > I rebuilt the kernel with very large limits: 330 MB on the MAXDSIZ and > DFLDSIZ, and 330 MB for SHMMAXPAGES. This gives me: You don't need to rebuild for DFLDSIZ; you can always bump that up to MAXDSIZ with sysctl (assuming you're root--do it before you "su pgsql -c nadanada" in your startup script). And moving MAXDSIZ to 330 MB looks like a reduction to me; FreeBSD's default in 4.6-RELEASE is 512 MB. (NetBSD's is 1GB.) You probably want to check /usr/include/machine/vmparam.h for default settings before changing stuff like this, lest you accidently lower it instead. > I still cannot set PG's shared_buffers higher than 20000 (160 MB): Shared memory pages, IIRC, are locked, meaning that they cannot be swapped. There's always a limit on the number of locked pages you may have in the system in total, if only becuase the system has only so much physical RAM. But for some reasont he RLIMIT_MEMLOCK parameter on my nearest FreeBSD 4.6 system is a bit bogus: server2 $ ulimit -Ha | grep locked lockedmem(kbytes) unlimited (I certainly cannot lock an unlimited number of pages in this 1GB machine! And for some reason it's also saying that my RLIMIT_RSS is unlimited; again, this should return no more than the physical RAM available in the machine.) So I suspect you're running into some other, more secret, limit on the number of pages that the system or a process may lock into RAM. So you'll want to dig around the kernel to find out where this is, and tweak it. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're all light. --XTC
Curt Sampson <cjs@cynic.net> writes: >> I still cannot set PG's shared_buffers higher than 20000 (160 MB): > Shared memory pages, IIRC, are locked, meaning that they cannot be > swapped. Is that really how it works on *BSD? That's great if so --- it's exactly what Postgres wants --- but you'll pardon my paranoia about assuming that any part of the universe works the way I want it to... There are Unixen that will happily swap shared-memory segments just like anything else. You don't have to think hard to see that this would be a dead loss for Postgres' shared disk buffers. For a clean buffer page, if we don't have room for it we can: (a) drop it from shared mem, and re-read it from the database file next time we need it. Cost: one disk read. (b) swap it out to swap area, and swap it in next time we need it. Cost: one disk write and one read. For a dirty page that we no longer have room for, we can: (a) write it out to the database file and then drop it from shmem. Cost: one disk write, and possibly a disk read if we need the page again later. (b) swap it out to swap area, then sometime later swap it in so we can write it out to the database file. Cost: two writes and a read. So a swappable shared buffer always comes out behind. The prospect of this effect is one of the reasons that I discourage people from setting shared_buffers really high. If their kernel will let them set shared_buffers to a very large fraction of RAM, it probably means that the kernel is willing to swap shared memory, and so they are going to end up behind not ahead. What's more, exactly the same analysis applies to other chunks of memory that the kernel might choose to swap out, if it's forced to because it's under memory pressure due to PG shared_buffers chewing too much of system RAM. Unless your swap disk is way faster than your database disk, it's never a win to swap instead of sending pages to their final resting place on the database disk. Bottom line: you don't ever want to have PG shmem large enough that it pushes the kernel into swapping. Even if the kernel will lock shmem into RAM, you can still lose due to swapping other stuff. So I'm pretty conservative about how much of physical RAM to assign to shared buffers. regards, tom lane
Tom Lane wrote: > Curt Sampson <cjs@cynic.net> writes: > >> I still cannot set PG's shared_buffers higher than 20000 (160 MB): > > > Shared memory pages, IIRC, are locked, meaning that they cannot be > > swapped. > > Is that really how it works on *BSD? That's great if so --- it's > exactly what Postgres wants --- but you'll pardon my paranoia about > assuming that any part of the universe works the way I want it to... Sure does. FreeBSD has a sysctl option to enable it, BSD/OS and I bet NetBSD do it by default. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Thu, 4 Jul 2002, Tom Lane wrote: > Curt Sampson <cjs@cynic.net> writes: > > Shared memory pages, IIRC, are locked, meaning that they cannot be > > swapped. > > Is that really how it works on *BSD? That's great if so --- it's > exactly what Postgres wants --- but you'll pardon my paranoia about > assuming that any part of the universe works the way I want it to... Well, note the IIRC. :-) But as it happens, I'm wrong. (I was probably remembering the old behaviour, from before the new VM system in NetBSD.) Now, under NetBSD, the memory segment is an anonymous memory object that is indeed pagable, so anybody allocating tons and tons of SysV shared memory to postgres on a NetBSD box is going very, very wrong indeed. FreeBSD uses pagable as well, though it appears that you can set a kern.ipc.shm_use_phys sysctl to a value other than 0 to make it allocate non-pagable memory to all future requests. (Segments already created will still be pagable.) > You don't have to think hard to see that this > would be a dead loss for Postgres' shared disk buffers. Quite so. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're all light. --XTC ---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives? http://archives.postgresql.org
First off, thanks for your help, guys. I'm aware of the problems of over-allocating RAM, and I surely wouldn't want to force the buffers into swap. (thanks, Curt, for kern.ipc.shm_use_phys) On this particular system, though, it's doing nothing except PG. 384 MB of RAM, I can give PG 160 of it, which leaves me with some 170 MB of idle RAM. Here are the limits pulled from vmparam.h, by Curt's suggestion: #define MAXTSIZ (128UL*1024*1024) /* max text size */ #define DFLDSIZ (128UL*1024*1024) /* initial data size limit */ #define MAXDSIZ (512UL*1024*1024) /* max data size */ I'd read somewhere (obiously outdated) that the MAXDSIZ was 128 MB. I've since rebooted with the kernel.GENERIC... What's the sysctl setting I use to set/check the data size limits? Tom: You said that max_connections shouldn't be set as low as 5. I only intend to use 1 for the application, and I'll only need 1-2 for my own admin use, so a setting of 3 should work for my needs. Is there a technical reason it should be higher? On Fri, 5 Jul 2002, Curt Sampson wrote: > On Thu, 4 Jul 2002, Gregor Mosheh wrote: > > > Hiya. I've installed Postgres 7.2 on a dedicated FreeBSD system with 384 > > MB RAM. Because the system will be doing nothing except PG, I'd like to > > dump as much memory as possible into PG's shared memory. > > Well, see Tom's comments on this, 'cause that's what I always say > too. (I'm a NetBSD developer, but FreeBSD's internals aren't so > different, really.) > > However, if you wanted to do it both ways, and benchmark your > application, I'd be really interested in hearing about the results. > > > I rebuilt the kernel with very large limits: 330 MB on the MAXDSIZ and > > DFLDSIZ, and 330 MB for SHMMAXPAGES. This gives me: > > You don't need to rebuild for DFLDSIZ; you can always bump that > up to MAXDSIZ with sysctl (assuming you're root--do it before > you "su pgsql -c nadanada" in your startup script). And moving > MAXDSIZ to 330 MB looks like a reduction to me; FreeBSD's default in > 4.6-RELEASE is 512 MB. (NetBSD's is 1GB.) You probably want to check > /usr/include/machine/vmparam.h for default settings before changing > stuff like this, lest you accidently lower it instead. > > > I still cannot set PG's shared_buffers higher than 20000 (160 MB): > > Shared memory pages, IIRC, are locked, meaning that they cannot be > swapped. There's always a limit on the number of locked pages you > may have in the system in total, if only becuase the system has only > so much physical RAM. But for some reasont he RLIMIT_MEMLOCK parameter > on my nearest FreeBSD 4.6 system is a bit bogus: > > server2 $ ulimit -Ha | grep locked > lockedmem(kbytes) unlimited > > (I certainly cannot lock an unlimited number of pages in this 1GB > machine! And for some reason it's also saying that my RLIMIT_RSS > is unlimited; again, this should return no more than the physical > RAM available in the machine.) > > So I suspect you're running into some other, more secret, limit on > the number of pages that the system or a process may lock into RAM. > So you'll want to dig around the kernel to find out where this is, > and tweak it. > > cjs > -- > Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org > Don't you know, in this new Dark Age, we're all light. --XTC > > -- Gregor Mosheh, B.S. http://www.blackangel.net/ COMPUTER: An electronic entity which performs sequences of useful steps in a totally understandable, rigorously logical manner. If you believe this, see me about a bridge I have for sale in Manhattan.
On Fri, 5 Jul 2002, Gregor Mosheh wrote: > I'm aware of the problems of over-allocating RAM, and I surely wouldn't > want to force the buffers into swap. (thanks, Curt, for > kern.ipc.shm_use_phys) On this particular system, though, it's doing > nothing except PG. 384 MB of RAM, I can give PG 160 of it, which leaves me > with some 170 MB of idle RAM. No, that's not idle RAM; that's buffer cache. (FreeBSD, like most modern Unix operating systems, will use any spare physical memory to cache blocks read from the disk.) If you allocate 160/170, you now have pretty much maximized your chances that postgres and the operating system will be buffering the same data, and made your memory as minimally effective as possible. > I'd read somewhere (obiously outdated) that the MAXDSIZ was 128 MB. I've > since rebooted with the kernel.GENERIC... What's the sysctl setting I use > to set/check the data size limits? It's not a sysctl, because those limits are settable per-process. Use "ulimit -aS" to check the current (soft) limits, and "ulimit -aH" to check the hard limits (beyond which the soft limits may not be raised). Also see the getrlimit manpage. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're all light. --XTC
> > kern.ipc.shm_use_phys) On this particular system, though, it's doing > > nothing except PG. 384 MB of RAM, I can give PG 160 of it, which leaves me > > with some 170 MB of idle RAM. > No, that's not idle RAM; that's buffer cache. (FreeBSD, like most > modern Unix operating systems, will use any spare physical memory Oh, of course! How silly of me to forget that! You're entirely right. > It's not a sysctl, because those limits are settable per-process. > Use "ulimit -aS" to check the current (soft) limits, and "ulimit > -aH" to check the hard limits (beyond which the soft limits may > not be raised). Also see the getrlimit manpage. I shall. Thanks a lot!
On Sun, 7 Jul 2002 13:06:00 +0900 (JST) Curt Sampson <cjs@cynic.net> wrote: > > On this particular system, though, it's doing > > nothing except PG. 384 MB of RAM, I can give PG 160 of it, which > > leaves me with some 170 MB of idle RAM. > > No, that's not idle RAM; that's buffer cache. (FreeBSD, like most > modern Unix operating systems, will use any spare physical memory > to cache blocks read from the disk.) If you allocate 160/170, you > now have pretty much maximized your chances that postgres and the > operating system will be buffering the same data, and made your > memory as minimally effective as possible. Is there not a method of reading a disk block and instructing the OS not to buffer it, or am I hallucinating? (My main RDBMS experience experience is on Solaris, and I could swear that major RDBMSs on that OS will ask the OS not to buffer disk I/O on behalf of the database) -- Tim Ellis DBA, Gamet
On Mon, 8 Jul 2002, Tim Ellis wrote: > Is there not a method of reading a disk block and instructing the OS not > to buffer it, or am I hallucinating? Yes. Read the block from a raw device. Of course, it's up to you to deal with an filesystem issues or whatever. :-) FreeBSD also has the O_DIRECT flag, but that's not guaranteed not to buffer the read. A much smarter way to do this is just to let the OS deal with the caching, and use the mmap system call to map blocks you want to modify into your processes' address space. This will also let you buffer more than 2GB of data, since the number of blocks buffered will not be limited by the maximum address space of a single process. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're all light. --XTC
> > Is there not a method of reading a disk block and instructing the OS > > not to buffer it, or am I hallucinating? > > Yes. Read the block from a raw device. Of course, it's up to you to deal > with an filesystem issues or whatever. :-) > > FreeBSD also has the O_DIRECT flag, but that's not guaranteed not to > buffer the read. A much smarter way to do this is just to let the OS > deal with the caching, and use the mmap system call to map blocks you > want to modify into your processes' address space. This will also let > you buffer more than 2GB of data, since the number of blocks buffered > will not be limited by the maximum address space of a single process. A very diplomatic way to say: "Essentially you are hallucinating" :) I've always been a proponent of letting the OS do what it does best, so this letting-the-OS-buffer-for-you concept, to me, is very appealing. You still must know when the OS-buffered data has truly made it to disk while not needing to submit lots of sync() calls. Part of that nasty requirement that databases should guarantee data consistency and integrity, while still performing well... -- Tim Ellis DBA, Gamet
On Tue, 9 Jul 2002, Tim Ellis wrote: > I've always been a proponent of letting the OS do what it does best, so > this letting-the-OS-buffer-for-you concept, to me, is very appealing. Me too. :-) Especially since it's generally going to be doing some sort of buffering anyway, and has a whole bunch of code designed to optimzize this buffering as best as possible. > You still must know when the OS-buffered data has truly made it to disk > while not needing to submit lots of sync() calls. When you need to force blocks to disk, you just msync() them. The bigger problem, if you're using mmap, is how to make sure your modifications *don't* get written to disk until the WAL blocks have been written. There's no standard way to tell the OS that it may *not* yet write back certain blocks. So instead you'd probably have to reserve the space in a block, keep a separate list of changes to the block that any other processes must also examine, and only apply those changes to the block after the WAL entry is confirmed written to stable storage. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're all light. --XTC
On Thu, 4 Jul 2002, Tom Lane wrote: > Curt Sampson <cjs@cynic.net> writes: > > Shared memory pages, IIRC, are locked, meaning that they cannot be > > swapped. > > Is that really how it works on *BSD? That's great if so --- it's > exactly what Postgres wants --- but you'll pardon my paranoia about > assuming that any part of the universe works the way I want it to... Well, note the IIRC. :-) But as it happens, I'm wrong. (I was probably remembering the old behaviour, from before the new VM system in NetBSD.) Now, under NetBSD, the memory segment is an anonymous memory object that is indeed pagable, so anybody allocating tons and tons of SysV shared memory to postgres on a NetBSD box is going very, very wrong indeed. FreeBSD uses pagable as well, though it appears that you can set a kern.ipc.shm_use_phys sysctl to a value other than 0 to make it allocate non-pagable memory to all future requests. (Segments already created will still be pagable.) > You don't have to think hard to see that this > would be a dead loss for Postgres' shared disk buffers. Quite so. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're all light. --XTC ---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives? http://archives.postgresql.org