Re: Scalability in postgres - Mailing list pgsql-performance
From | Mark Mielke |
---|---|
Subject | Re: Scalability in postgres |
Date | |
Msg-id | 4A2892DD.3090809@mark.mielke.cc |
In response to | Re: Scalability in postgres (david@lang.hm) |
Responses | Re: Scalability in postgres |
List | pgsql-performance |
david@lang.hm wrote:
> On Thu, 4 Jun 2009, Mark Mielke wrote:
>> You should really only have 1X or 2X as many threads as there are
>> CPUs waiting on one monitor. Beyond that is waste. The idle threads
>> can be pooled away, and only activated (with individual monitors,
>> which can be far more easily and effectively optimized) when the
>> other threads become busy.
> sometimes the decrease in complexity in the client makes it worthwhile
> to 'brute force' things.
> this actually works well for the vast majority of services (including
> many databases)
> the question is how much complexity (if any) it adds to postgres to
> handle this condition better, and what those changes are.

Sure. Locks that are not generally contended, for example, don't deserve
the extra complexity. Locks that have any expected frequency of a
"context storm", though, probably make good candidates.

>> An alternative approach might be: 1) Idle processes not currently
>> running a transaction do not need to be consulted for their snapshot
>> (and other related expenses) - if they are idle for a period of time,
>> they "unregister" from the actively used processes list - if they
>> become active again, they "register" in the actively used process list.
> how expensive is this register/unregister process? if it's cheap
> enough, do it all the time and avoid the complexity of having another
> config option to tweak.

Not really relevant if you look at the "idle for a period of time" part.
An active process would not unregister/register. An inactive process,
though, once it is not in a commit and has been idle for many times the
cost of an unregister + register, would free the other processes from
having to take it into account, allowing for better scaling. For
example, let's say it doesn't unregister itself for 5 seconds.

>> and 2) Processes could be reusable across different connections -
>> they could stick around for a period after disconnect, and make
>> themselves available again to serve the next connection.
> depending on what criteria you have for the re-use, this could be a
> significant win (if you manage to re-use the per-process cache much),
> but this is far more complex.

Does it need to be? From a naive perspective - what's the benefit of a
PostgreSQL process dying and a new connection getting a new PostgreSQL
process? I suppose bugs in PostgreSQL don't have the opportunity to
affect later connections, but overall this seems like an unnecessary
cost. I was thinking of either: 1) the Apache model, where a PostgreSQL
process waits on accept(), or 2) when a PostgreSQL process is done with
a connection, it does connection cleanup and then waits for a new file
descriptor to be transferred to it through IPC (see the sketch below)
and just starts over using it. Too hand-wavy? :-)

>> Still heavy-weight in terms of memory utilization, but cheap in terms
>> of other impacts. Without the cost of connection "pooling" in the
>> sense of requests always being indirect through a proxy of some sort.
> it would seem to me that the cost of making the extra hop through the
> external pooler would be significantly more than the overhead of idle
> processes marking themselves as such so that they don't get consulted
> for MVCC decisions

They're separate ideas, to be considered separately on their
complexity-vs-benefit merits. For the first - I think we already have an
"external pooler", in the sense of the master process which forks to
manage a connection, so it already involves a possible context switch to
transfer control.
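To make the second idea above concrete, here is a minimal sketch of the
usual UNIX way to hand an already-accepted client socket to an idle,
reusable worker: the master sends the descriptor over a UNIX-domain
socket with sendmsg() and SCM_RIGHTS, and the worker picks it up with
recvmsg(). This is illustrative only - the function names and the
surrounding protocol are made up, and it is not how PostgreSQL works
today:

```c
/*
 * Illustrative sketch only: descriptor hand-off over a UNIX-domain
 * socket, so an idle worker process can be reused for the next
 * connection instead of forking a fresh one. Error handling trimmed.
 */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Master side: pass the connected client socket to an idle worker. */
int send_client_fd(int worker_sock, int client_fd)
{
    char dummy = 'F';                        /* must carry >= 1 data byte */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };

    union {                                  /* aligned control buffer */
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    memset(&ctrl, 0, sizeof(ctrl));

    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;            /* "these bytes are an fd" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &client_fd, sizeof(int));

    return sendmsg(worker_sock, &msg, 0) < 0 ? -1 : 0;
}

/* Worker side: block until the master hands over the next client. */
int recv_client_fd(int worker_sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };

    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;

    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(worker_sock, &msg, 0) <= 0)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;

    int client_fd;
    memcpy(&client_fd, CMSG_DATA(cmsg), sizeof(int));
    return client_fd;                        /* reuse this process for it */
}
```

The worker_sock here would be one end of a socketpair() created by the
master before forking the worker; that setup, and real error handling,
is left out of the sketch.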
I guess the question is whether or not we can do better than fork(). In
multi-threaded programs, it's definitely possible to outdo fork() using
thread pools. Does the same remain true of a multi-process program that
communicates using IPC? I'm not completely sure, although I believe
Apache achieves this by having the worker processes do accept() rather
than having a master process spawn a new process for each connection
(a rough sketch of that model follows below). Apache re-uses the
process.

Cheers,
mark

--
Mark Mielke <mark@mielke.cc>
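For reference, a minimal sketch of the Apache-style pre-fork model
mentioned above: a fixed pool of workers all block in accept() on the
same inherited listening socket, and each worker loops back for the
next client instead of exiting. Everything here (the port, pool size,
and handle_connection() hook) is a made-up placeholder, not PostgreSQL
or Apache code:

```c
/* Illustrative pre-fork sketch: workers accept() directly and are reused. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define NUM_WORKERS 8               /* fixed worker pool, placeholder size */

static void handle_connection(int client_fd)
{
    /* per-session work would go here */
    (void) client_fd;
}

static void worker_loop(int listen_fd)
{
    for (;;)
    {
        /* kernel hands the connection to one of the waiting workers */
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0)
            continue;
        handle_connection(client_fd);
        close(client_fd);           /* worker survives; loop back to accept() */
    }
}

int main(void)
{
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(5433);    /* placeholder port */

    if (bind(listen_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(listen_fd, 128) < 0)
        return 1;

    for (int i = 0; i < NUM_WORKERS; i++)
    {
        if (fork() == 0)            /* child: becomes a reusable worker */
        {
            worker_loop(listen_fd);
            _exit(0);               /* not reached */
        }
    }

    for (;;)                        /* parent: a real server would monitor */
        pause();                    /* and respawn workers here */
}
```

The contrast with the descriptor-passing sketch earlier is that here
the master never touches the client socket at all; the kernel decides
which idle worker gets each new connection.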