Thread: Re: Reasoning behind process instead of thread based
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of
> Thomas Hallgren
> Sent: Wednesday, October 27, 2004 11:16 AM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Reasoning behind process instead of
> thread based
>
> nd02tsk@student.hig.se wrote:
> >>Two: If a single process in a multi-process application crashes,
> >>that process alone dies. The buffer is flushed, and all the other
> >>child processes continue happily along. In a multi-threaded
> >>environment, when one thread dies, they all die.
> >
> > So this means that if a single connection thread dies in MySQL, all
> > connections die?
> >
> > Seems rather serious. I am doubtful that is how they have
> > implemented it.
>
> That all depends on how you define crash. If a thread causes an
> unhandled signal to be raised, such as an illegal memory access or a
> floating point exception, the process will die, hence killing all
> threads. But a more advanced multi-threaded environment will install
> handlers for such signals that will handle the error gracefully. It
> might not even be necessary to kill the offending thread.
>
> Some conditions are harder to handle than others, such as stack
> overflow and out of memory, but it can be done. So to state that
> multi-threaded environments in general kill all threads when one
> thread crashes is not true. Having said that, I have no clue as to
> how advanced MySQL is in this respect.

There are clear advantages to separate process space for servers.

1. Separate threads can stomp on each other's memory space (e.g.
   imagine a wild, home-brew C function gone bad).
2. Separate processes can have separate user ids, and [hence] different
   rights for file access. A threaded server will have to either be
   started at the level of the highest user who will attach or will
   have to impersonate the users in threads. Impersonation is very
   difficult to make portable.
3. Separate processes die when they finish, releasing all resources to
   the operating system. Imagine a threaded server with a teeny-tiny
   memory leak that stays up 24x7. Eventually, you will start using
   disk for RAM, or even use all available disk and simply crash.

Threaded servers have one main advantage: threads are lightweight
processes, and starting a new thread is faster than starting a new
executable.

The thread advantage can be partly mitigated by pre-launching a pool of
servers.
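
The isolation argument in the list above can be illustrated with a small sketch. It is not how PostgreSQL launches its backends; it only assumes a parent that starts each worker as a separate OS process (here through Python's `subprocess`, with the helper name `run_worker` made up for the example) and shows that a worker killed by an abort does not take the parent or the next worker down with it.

```python
import subprocess
import sys

def run_worker(code):
    """Run a snippet in its own interpreter process; return its exit status."""
    return subprocess.run([sys.executable, "-c", code]).returncode

# Hypothetical misbehaving worker: os.abort() kills only that process.
crashed_status = run_worker("import os; os.abort()")

# The parent is still alive and can start another worker afterwards.
healthy_status = run_worker("print('still serving')")

print("crashed worker status:", crashed_status)
print("healthy worker status:", healthy_status)
```

The crashed worker reports a non-zero status (the exact value depends on the platform), while the second worker exits normally. Everything the crashed process had allocated is returned to the operating system when it dies, which is the resource-release point made in item 3.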
Dann,
I'm not advocating a multi-threaded PostgreSQL server (been there, done
that :-). But I still must come to the defense of multi-threaded systems
in general.

You try to convince us that a single-threaded system is better because
it is more tolerant of buggy code. That argument is valid and I agree, a
multi-threaded environment is more demanding in terms of developer
skills and code quality.

But what if I don't write crappy code, or if I am prepared to take the
consequences of my bugs, what then? Maybe I really know what I'm doing
and really want to get the absolute best performance out of my server.

> There are clear advantages to separate process space for servers.
> 1. Separate threads can stomp on each other's memory space. (e.g.
> imagine a wild, home-brew C function gone bad).

Not all servers allow home-brewed C functions. And even when they do,
not all home-brewers will write crappy code. This is only a clear
advantage when buggy code is executed.

> 2. Separate processes can have separate user ids, and [hence] different
> rights for file access. A threaded server will have to either be
> started at the level of the highest user who will attach or will have to
> impersonate the users in threads. Impersonation is very difficult to
> make portable.

Yes, this is true and a valid advantage if you ever want to access
external and private files. Such access is normally discouraged though,
since you are outside of the boundaries of your transaction.

> 3. Separate processes die when they finish, releasing all resources to
> the operating system. Imagine a threaded server with a teeny-tiny
> memory leak, that stays up 24x7. Eventually, you will start using disk
> for ram, or even use all available disk and simply crash.

Sure, but a memory leak is a serious bug and most leaks will have a
negative impact on single-threaded systems as well. I'm sure you will
find memory leak examples that are fatal only in a multi-threaded 24x7
environment, but they are probably very few overall.

> Threaded servers have one main advantage:
> Threads are lightweight processes and starting a new thread is faster
> than starting a new executable.

A few more off the top of my head:

1. Threads communicate much faster than processes (applies to locking
   and parallel query processing).
2. All threads in a process can share a common set of optimized query
   plans.
3. All threads can share lots of data cached in memory (static but
   frequently accessed tables etc.).
4. In environments built using garbage collection, all threads can
   share the same heap of garbage collected data.
5. A multi-threaded system can apply in-memory heuristics for
   self-adjusting heaps and other optimizations.
6. And lastly, my favorite: a multi-threaded system can be easily
   integrated with, and make full use of, a multi-threaded virtual
   execution environment such as a Java VM.

...

Regards,
Thomas Hallgren
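
Item 2 in the list above (threads sharing one set of query plans) can be sketched as follows. This is only an illustration under assumptions: `plan_cache`, `get_plan`, and the tuple standing in for a plan are hypothetical names, not anything from PostgreSQL or MySQL. It shows eight "connection" threads asking for the same statement while the plan is built once, guarded by an ordinary in-process lock.

```python
import threading

plan_cache = {}                 # shared by every thread in the process
cache_lock = threading.Lock()   # in-process lock guarding the cache
compile_count = 0               # how many times a plan was actually built

def get_plan(sql):
    """Return the cached plan for sql, building it once if it is missing."""
    global compile_count
    with cache_lock:
        if sql not in plan_cache:
            compile_count += 1
            plan_cache[sql] = ("plan-for", sql)  # stand-in for a real plan
        return plan_cache[sql]

def connection(results, i):
    """Simulated connection thread: look up the plan for one statement."""
    results[i] = get_plan("SELECT * FROM t WHERE id = $1")

results = [None] * 8
threads = [threading.Thread(target=connection, args=(results, i))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("plans built:", compile_count)
```

Because every thread sees the same dictionary, no copying or serialization of the plan is needed. A multi-process design would have to place the cache in shared memory and coordinate access across processes.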
On Wed, Oct 27, 2004 at 10:07:48PM +0200, Thomas Hallgren wrote:
> >Threaded servers have one main advantage:
> >Threads are lightweight processes and starting a new thread is faster
> >than starting a new executable.
> >
> A few more off the top of my head:

A lot of these advantages are due to sharing an address space, right?
Well, the processes in PostgreSQL share address space, just not *all*
of it. They communicate via this shared memory.

> 1. Threads communicate much faster than processes (applies to locking
> and parallel query processing).
> 2. All threads in a process can share a common set of optimized query plans.

PostgreSQL could do this too, but I don't think anyone has looked into
sharing query plans. It is probably quite difficult.

> 3. All threads can share lots of data cached in memory (static but
> frequently accessed tables etc.).

Table data is already shared. If two backends are manipulating the same
table, they can lock directly via shared memory rather than some OS
primitive.

> 4. In environments built using garbage collection, all threads can share
> the same heap of garbage collected data.
> 5. A multi-threaded system can apply in-memory heuristics for self
> adjusting heaps and other optimizations.
> 6. And lastly, my favorite; a multi-threaded system can be easily
> integrated with, and make full use of, a multi-threaded virtual
> execution environment such as a Java VM.

I can't really comment on these.

I think PostgreSQL has nicely combined the benefits of shared memory
with the robustness of multiple processes...

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
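
The idea of processes that share address space, "just not *all* of it", can be sketched as below. This is a POSIX-only illustration (it relies on `os.fork`), not PostgreSQL's actual shared memory setup; the variable names are made up. One region is an anonymous shared mapping that both processes see, while an ordinary variable stays private to each process.

```python
import mmap
import os
import struct

shared = mmap.mmap(-1, 8)   # anonymous shared mapping, inherited across fork
private_value = 0           # ordinary variable: each process has its own copy

pid = os.fork()
if pid == 0:
    # Child process (the "backend"): write to both kinds of memory, then exit.
    struct.pack_into("q", shared, 0, 42)
    private_value = 99
    os._exit(0)

os.waitpid(pid, 0)          # parent waits for the child to finish
shared_value = struct.unpack_from("q", shared, 0)[0]

print("shared:", shared_value, "private:", private_value)
```

The parent sees the child's write to the shared mapping but not the change to the private variable. That is the trade being described: only what is deliberately placed in the shared region can be reached, or damaged, by another process.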
Martijn van Oosterhout wrote:
> A lot of these advantages are due to sharing an address space, right?
> Well, the processes in PostgreSQL share address space, just not *all*
> of it. They communicate via this shared memory.

Which is a different beast altogether. The inter-process mutex handling
that you need to synchronize shared memory access is much more expensive
than the mechanisms used to synchronize threads.

>>2. All threads in a process can share a common set of optimized query plans.
>
> PostgreSQL could do this too, but I don't think anyone's looked into
> sharing query plans, probably quite difficult.

Perhaps. It depends on the design. If the plans are immutable once
generated, it should not be that difficult. But managing the mutable
area where the plans are cached again calls for expensive inter-process
synchronization.

> Table data is already shared. If two backends are manipulating the same
> table, they can lock directly via shared memory rather than some OS
> primitive.

Sure, some functionality can be achieved using shared memory. But it
consumes more resources and the mutexes are a lot slower.

> I think PostgreSQL has nicely combined the benefits of shared memory
> with the robustness of multiple processes...

So do I. I've learned to really like PostgreSQL and the way it's built,
and as I said in my previous mail, I'm not advocating a switch. I am just
reacting to the unfair bashing of multi-threaded systems.

Regards,
Thomas Hallgren
On Thu, Oct 28, 2004 at 12:13:41AM +0200, Thomas Hallgren wrote:
> Martijn van Oosterhout wrote:
> >A lot of these advantages are due to sharing an address space, right?
> >Well, the processes in PostgreSQL share address space, just not *all*
> >of it. They communicate via this shared memory.
> >
> Which is a different beast altogether. The inter-process mutex handling
> that you need to synchronize shared memory access is much more expensive
> than the mechanisms used to synchronize threads.

Now you've piqued my curiosity. You have two threads of control (either
two processes or two threads) which share a piece of memory. How can
the threads synchronise more easily than processes? What other feature
is there? AFAIK the futexes used by Linux threads are just as applicable
and fast between two processes as between two threads. All that is
required is some shared memory.

Or are you suggesting the only difference is in switching time (which
is not that significant)?

Also, I admit that on some operating systems threads are much faster
than processes, but I'm talking specifically about Linux here.

Thanks in advance,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
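
The pattern under discussion, two processes synchronising on a piece of shared memory, can be sketched as below. It is a POSIX-only illustration that assumes `os.fork` is available; it uses Python's `multiprocessing` lock rather than raw futexes, and it measures nothing about cost. It only shows that the lock-then-update discipline familiar from threads carries over once the data and the lock are visible to both processes.

```python
import mmap
import multiprocessing
import os
import struct

ROUNDS = 2000
counter = mmap.mmap(-1, 8)                          # shared between processes
lock = multiprocessing.get_context("fork").Lock()   # usable across fork

def bump(n):
    """Increment the shared counter n times, holding the lock each time."""
    for _ in range(n):
        with lock:
            value = struct.unpack_from("q", counter, 0)[0]
            struct.pack_into("q", counter, 0, value + 1)

children = []
for _ in range(2):
    pid = os.fork()
    if pid == 0:
        bump(ROUNDS)        # child does its share of the work
        os._exit(0)
    children.append(pid)

for pid in children:
    os.waitpid(pid, 0)      # parent waits for both children

total = struct.unpack_from("q", counter, 0)[0]
print("total:", total)
```

With the lock held around each read-modify-write, no increment is lost and the total equals twice `ROUNDS`. Whether such a lock is as cheap as a thread mutex on a given platform is exactly the question raised in the message above, and this sketch does not answer it.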
Martijn van Oosterhout wrote:
>Now you've piqued my curiosity. You have two threads of control (either
>two processes or two threads) which shared a piece of memory. How can
>the threads synchronise easier than processes, what other feature is
>there? AFAIK the futexes used by Linux threads is just as applicable
>and fast between two processes as two threads. All that is required is
>some shared memory.

Agreed. On Linux, this is not a big issue. Linux is rather special
though, since the whole kernel is built in a way that more or less
equates a process with a thread. This is changing though. I don't know
what relevance that will have to this issue.

Shared memory and multiple processes have other negative impacts on
performance, since you force the CPU to jump between different memory
spaces. Switching between those address spaces will decrease the CPU
cache hit rate. You might think this is esoteric and irrelevant, but the
fact is, cache misses are extremely expensive and the problem is
increasing. While CPU speed has increased 152 times or so since the
80's, memory speed has only quadrupled.

>Or are you suggesting the only difference is in switching time (which
>is not that significant).

"Not that significant" all depends on how often you need to switch. On
most OSes, a process switch is significantly slower than switching
between threads (again, Linux may be an exception to the rule).

Regards,
Thomas Hallgren