Re: [PATCH] Use MAP_HUGETLB where supported (v3) - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: [PATCH] Use MAP_HUGETLB where supported (v3) |
Date | |
Msg-id | 52861EEC.2090702@vmware.com |
In response to | Re: [PATCH] Use MAP_HUGETLB where supported (v3) (Andres Freund <andres@2ndquadrant.com>) |
Responses |
Re: [PATCH] Use MAP_HUGETLB where supported (v3)
Re: [PATCH] Use MAP_HUGETLB where supported (v3) Re: [PATCH] Use MAP_HUGETLB where supported (v3) Re: [PATCH] Use MAP_HUGETLB where supported (v3) |
List | pgsql-hackers |
On 30.10.2013 19:11, Andres Freund wrote:
> On 2013-10-30 22:39:20 +0530, Abhijit Menon-Sen wrote:
>> At 2013-10-30 11:04:36 -0400, tgl@sss.pgh.pa.us wrote:
>>>
>>>> As a compromise, perhaps we can unconditionally round the size up to be
>>>> a multiple of 2MB? […]
>>>
>>> That sounds reasonably painless to me.
>>
>> Here's a patch that does that and adds a DEBUG1 log message when we try
>> with MAP_HUGETLB and fail and fallback to ordinary mmap.
>
> But it's in no way guaranteed that the smallest hugepage size is
> 2MB. It'll be on current x86 hardware, but not on any other platform...

Sure, but there's no big harm done. We're just trying to avoid hitting a
kernel bug, and as a bonus, we avoid wasting some memory that would
otherwise be lost due to the kernel rounding the allocation. If the
smallest hugepage size is smaller than 2MB, we round up the allocation
unnecessarily, but that doesn't seem serious.

I spent some time whacking this around, new patch version attached. I
moved the mmap() code into a new function, which leaves
PGSharedMemoryCreate more readable.

I modified the patch so that it throws an error if you set
huge_tlb_pages=on and the platform doesn't support MAP_HUGETLB (i.e.
non-Linux, or EXEC_BACKEND). 'try' is the default, so this only affects
you if you explicitly set it to 'on'. I think that's the right behavior;
if you explicitly ask for it, and you don't get it, that should be an
error. But I'm not wedded to the idea if someone objects; a log message
might also be reasonable:

"LOG: huge TLB pages are not supported on this platform, but huge_tlb_pages was 'on'"

The error message on failed allocation, if huge_tlb_pages=on, needs
updating:

$ bin/postmaster -D data
FATAL: could not map anonymous shared memory: Cannot allocate memory
HINT: This error usually means that PostgreSQL's request for a shared
memory segment exceeded available memory or swap space. To reduce the
request size (currently 189390848 bytes), reduce PostgreSQL's shared
memory usage, perhaps by reducing shared_buffers or max_connections.

The reason the allocation failed in this case was that I used
huge_tlb_pages=on, but had not configured the kernel for huge pages. The
hint is quite misleading in that case; it should advise configuring the
kernel, or turning off huge_tlb_pages.

The documentation needs some work. I think it's pretty user-unfriendly
to link to https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt.
It gives a lot of details, and although it explains stuff that is
relevant, like setting the nr_hugepages sysctl, it also contains a lot
of stuff that is not relevant to us, like how to mount hugetlbfs. Can we
do better than that? Is there a better guide somewhere on how to set the
kernel settings? If not, we should include step-by-step instructions in
our manual.

The "Managing Kernel Resources" section in the user manual should also
be updated to mention how to enable huge pages.

Also, now that I changed huge_tlb_pages='on' to fail on platforms where
it's not supported at all, the docs need to be updated to reflect it.

- Heikki
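
For readers following the thread without the attachment, the behavior discussed in this message (round the request up to a multiple of 2MB, try mmap() with MAP_HUGETLB, fall back to an ordinary mmap() only when the setting is 'try') can be sketched roughly as below. This is an illustrative sketch, not the attached patch: the names HugePagesMode, round_up and create_anon_shmem are hypothetical, the 2MB constant is the assumption debated above, the fprintf() call merely stands in for a DEBUG1 log message, and using the unrounded size in the fallback path is a choice made for this sketch.

```c
/* Sketch only: hypothetical names, not the code from the attached patch. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes MAP_ANONYMOUS / MAP_HUGETLB on Linux */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical stand-in for the huge_tlb_pages setting (off / try / on). */
typedef enum { HUGE_OFF, HUGE_TRY, HUGE_ON } HugePagesMode;

/* Assumed huge page size; true on current x86, not guaranteed elsewhere. */
#define ASSUMED_HUGE_PAGE_SIZE ((size_t) 2 * 1024 * 1024)

/* Round size up to the next multiple of align, which must be a power of two. */
static size_t
round_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/*
 * Map anonymous shared memory. Returns the mapping, or MAP_FAILED.
 * On success *size is updated to the number of bytes actually mapped.
 */
static void *
create_anon_shmem(size_t *size, HugePagesMode mode)
{
    void   *ptr;

#ifdef MAP_HUGETLB
    if (mode != HUGE_OFF)
    {
        size_t  rounded = round_up(*size, ASSUMED_HUGE_PAGE_SIZE);

        ptr = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (ptr != MAP_FAILED)
        {
            *size = rounded;
            return ptr;
        }
        if (mode == HUGE_ON)
            return MAP_FAILED;  /* explicitly requested: treat as an error */

        /* 'try': note the failure and fall through to an ordinary mmap() */
        fprintf(stderr, "DEBUG1: mmap with MAP_HUGETLB failed, falling back\n");
    }
#else
    if (mode == HUGE_ON)
        return MAP_FAILED;      /* platform has no MAP_HUGETLB at all */
#endif

    /* Ordinary mapping; this sketch uses the original, unrounded size here. */
    return mmap(NULL, *size, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
}
```

With the request size from the error message above, round_up(189390848, ASSUMED_HUGE_PAGE_SIZE) gives 190840832, i.e. 91 pages of 2MB. A real caller would report the MAP_FAILED cases through the server's normal error reporting rather than returning silently.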