Thread: Re: Reasoning behind process instead of thread based

Re: Reasoning behind process instead of thread based

From

"Dann Corbit"

Date:

27 October 2004, 19:34:48

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of
> Thomas Hallgren
> Sent: Wednesday, October 27, 2004 11:16 AM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Reasoning behind process instead of
> thread based
>
>
> nd02tsk@student.hig.se wrote:
> >>Two:  If a single process in a multi-process application crashes, that
> >>process alone dies.  The buffer is flushed, and all the other child
> >>processes continue happily along.  In a multi-threaded environment,
> >>when one thread dies, they all die.
> >
> >
> >
> > So this means that if a single connection thread dies in MySQL, all
> > connections die?
> >
> > Seems rather serious. I am doubtful that is how they have implemented
> > it.
> >
> That all depends on how you define crash. If a thread causes an
> unhandled signal to be raised such as an illegal memory access or a
> floating point exception, the process will die, hence killing all
> threads. But a more advanced multi-threaded environment will install
> handlers for such signals that will handle the error gracefully. It
> might not even be necessary to kill the offending thread.
>
> Some conditions are harder to handle than others, such as stack
> overflow and out of memory, but it can be done. So to state that
> multi-threaded environments in general kill all threads when one
> thread crashes is not true. Having said that, I have no clue as to how
> advanced MySQL is in this respect.

There are clear advantages to separate process space for servers.
1.  Threads in one process can stomp on each other's memory space;
separate processes cannot.  (e.g. imagine a wild, home-brew C function
gone bad).
2.  Separate processes can have separate user ids, and [hence] different
rights for file access.  A threaded server will have to either be
started at the level of the highest user who will attach or will have to
impersonate the users in threads.  Impersonation is very difficult to
make portable.
3.  Separate processes die when they finish, releasing all resources to
the operating system.  Imagine a threaded server with a teeny-tiny
memory leak, that stays up 24x7.  Eventually, you will start using disk
for ram, or even use all available disk and simply crash.
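
A minimal sketch of the crash-isolation argument, in Python for brevity
(PostgreSQL itself is written in C). It assumes a POSIX system, and the
function name run_crashing_child is made up for illustration. The child
is a separate process that kills itself with SIGSEGV, standing in for a
wild C function; the parent only observes the exit status and carries on.

```python
import os
import signal
import subprocess
import sys


def run_crashing_child():
    """Start a separate process that dies from SIGSEGV; return its exit status."""
    # The child sends itself SIGSEGV, which is what an illegal memory
    # access would trigger. Nothing in the child handles the signal.
    code = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
    result = subprocess.run([sys.executable, "-c", code])
    # On POSIX, a negative return code means "terminated by that signal".
    return result.returncode


if __name__ == "__main__":
    status = run_crashing_child()
    print("child exit status:", status)
    print("parent", os.getpid(), "is still running")
```

When the child exits, the operating system also reclaims everything it
had allocated, which is the point made in item 3.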

Threaded servers have one main advantage:
Threads are lightweight processes and starting a new thread is faster
than starting a new executable.

The thread advantage can be partly mitigated by pre-launching a pool of
servers.
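
One way to picture such a pool, as a rough sketch only: the workers are
started before any request exists, so no request pays the process
startup cost. This assumes a POSIX system (it uses the "fork" start
method), and worker_loop, serve and the squaring "request" are
hypothetical stand-ins, not how any real server does it.

```python
import multiprocessing as mp
import os


def worker_loop(tasks, results):
    """Serve requests from the queue until a None sentinel arrives."""
    for n in iter(tasks.get, None):
        # Stand-in for real work: report which process handled the request.
        results.put((os.getpid(), n * n))


def serve(requests, workers=4):
    """Pre-launch `workers` processes, then feed them a list of requests."""
    ctx = mp.get_context("fork")  # assumption: POSIX only
    tasks, results = ctx.Queue(), ctx.Queue()
    # Pre-launch: every worker is running before the first request is queued.
    pool = [ctx.Process(target=worker_loop, args=(tasks, results))
            for _ in range(workers)]
    for p in pool:
        p.start()
    for n in requests:
        tasks.put(n)
    answers = [results.get() for _ in requests]
    for _ in pool:
        tasks.put(None)  # one shutdown sentinel per worker
    for p in pool:
        p.join()
    return answers


if __name__ == "__main__":
    for pid, value in serve([1, 2, 3, 4, 5]):
        print("worker", pid, "answered", value)
```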

Re: Reasoning behind process instead of thread based

From

Thomas Hallgren

Date:

27 October 2004, 21:07:57

Dann,
I'm not advocating a multi-threaded PostgreSQL server (been there, done
that :-). But I still must come to the defense of multi-threaded systems
in general.

You try to convince us that a single-threaded system is better because
it is more tolerant of buggy code. That argument is valid and I agree: a
multi-threaded environment is more demanding in terms of developer
skills and code quality.

But what if I don't write crappy code or if I am prepared to take the
consequences of my bugs, what then? Maybe I really know what I'm doing
and really want to get the absolute best performance out of my server.

> There are clear advantages to separate process space for servers.
> 1.  Threads in one process can stomp on each other's memory space;
> separate processes cannot.  (e.g. imagine a wild, home-brew C function
> gone bad).

Not all servers allow home-brewed C functions. And even when they do,
not all home-brewers will write crappy code. This is only a clear
advantage when buggy code is executed.

> 2.  Separate processes can have separate user ids, and [hence] different
> rights for file access.  A threaded server will have to either be
> started at the level of the highest user who will attach or will have to
> impersonate the users in threads.  Impersonation is very difficult to
> make portable.

Yes, this is true and a valid advantage if you ever want to access external
and private files. Such access is normally discouraged though, since you
are outside of the boundaries of your transaction.

> 3.  Separate processes die when they finish, releasing all resources to
> the operating system.  Imagine a threaded server with a teeny-tiny
> memory leak, that stays up 24x7.  Eventually, you will start using disk
> for ram, or even use all available disk and simply crash.
>
Sure, but a memory leak is a serious bug and most leaks will have a
negative impact on single-threaded systems as well. I'm sure you will
find memory leak examples that are fatal only in a multi-threaded 24x7
environment but they are probably very few overall.

> Threaded servers have one main advantage:
> Threads are lightweight processes and starting a new thread is faster
> than starting a new executable.
>
A few more off the top of my head:
1. Threads communicate much faster than processes (applies to locking
and parallel query processing).
2. All threads in a process can share a common set of optimized query plans.
3. All threads can share lots of data cached in memory (static but
frequently accessed tables etc.).
4. In environments built using garbage collection, all threads can share
the same heap of garbage collected data.
5. A multi-threaded system can apply in-memory heuristics for self
adjusting heaps and other optimizations.
6. And lastly, my favorite: a multi-threaded system can be easily
integrated with, and make full use of, a multi-threaded virtual
execution environment such as a Java VM.
...

Regards,
Thomas Hallgren

Re: Reasoning behind process instead of thread based

From

Martijn van Oosterhout

Date:

27 October 2004, 21:36:30

On Wed, Oct 27, 2004 at 10:07:48PM +0200, Thomas Hallgren wrote:
> >Threaded servers have one main advantage:
> >Threads are lightweight processes and starting a new thread is faster
> >than starting a new executable.
> >
> A few more off the top of my head:

A lot of these advantages are due to sharing an address space, right?
Well, the processes in PostgreSQL share address space, just not *all*
of it. They communicate via this shared memory.
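
A small sketch of "share address space, just not all of it", assuming a
POSIX system with fork(). It is not PostgreSQL's actual mechanism, only
the general idea: an anonymous shared mapping created before the fork is
visible to both processes, while ordinary memory is copied and stays
private. The function name demo is made up.

```python
import mmap
import os


def demo():
    """Fork a child that writes to shared and to private memory."""
    shared = mmap.mmap(-1, 16)   # anonymous mapping, shared across fork on Unix
    private = bytearray(16)      # ordinary memory: the child gets its own copy
    pid = os.fork()
    if pid == 0:
        # Child process: write to both regions, then exit immediately.
        try:
            shared[:5] = b"hello"
            private[:5] = b"hello"
        finally:
            os._exit(0)
    os.waitpid(pid, 0)
    # Parent: only the write to the shared mapping is visible here.
    return bytes(shared[:5]), bytes(private[:5])


if __name__ == "__main__":
    seen_shared, seen_private = demo()
    print("shared region: ", seen_shared)
    print("private region:", seen_private)
```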

> 1. Threads communicate much faster than processes (applies to locking
> and parallel query processing).
> 2. All threads in a process can share a common set of optimized query plans.

PostgreSQL could do this too, but I don't think anyone's looked into
sharing query plans, probably quite difficult.

> 3. All threads can share lots of data cached in memory (static but
> frequently accessed tables etc.).

Table data is already shared. If two backends are manipulating the same
table, they can lock directly via shared memory rather than some OS
primitive.

> 4. In environments built using garbage collection, all threads can share
> the same heap of garbage collected data.
> 5. A multi-threaded system can apply in-memory heuristics for self
> adjusting heaps and other optimizations.
> 6. And lastly, my favorite: a multi-threaded system can be easily
> integrated with, and make full use of, a multi-threaded virtual
> execution environment such as a Java VM.

I can't really comment on these.

I think PostgreSQL has nicely combined the benefits of shared memory
with the robustness of multiple processes...
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: Reasoning behind process instead of thread based

From

Thomas Hallgren

Date:

27 October 2004, 23:13:57

Martijn van Oosterhout wrote:
> A lot of these advantages are due to sharing an address space, right?
> Well, the processes in PostgreSQL share address space, just not *all*
> of it. They communicate via this shared memory.
>
Which is a different beast altogether. The inter-process mutex handling
that you need to synchronize shared memory access is much more expensive
than the mechanisms used to synchronize threads.

>>2. All threads in a process can share a common set of optimized query plans.
>
>
> PostgreSQL could do this too, but I don't think anyone's looked into
> sharing query plans, probably quite difficult.
>
Perhaps. It depends on the design. If the plans are immutable once
generated, it should not be that difficult. But managing the mutable
area where the plans are cached again calls for expensive inter-process
synchronization.

> Table data is already shared. If two backends are manipulating the same
> table, they can lock directly via shared memory rather than some OS
> primitive.
>
Sure, some functionality can be achieved using shared memory. But it
consumes more resources and the mutexes are a lot slower.

> I think PostgreSQL has nicely combined the benefits of shared memory
> with the robustness of multiple processes...

So do I. I've learned to really like PostgreSQL and the way it's built,
and as I said in my previous mail, I'm not advocating a switch. I just
react to the unfair bashing of multi-threaded systems.

Regards,
Thomas Hallgren

Re: Reasoning behind process instead of thread based

From

Martijn van Oosterhout

Date:

28 October 2004, 10:58:36

On Thu, Oct 28, 2004 at 12:13:41AM +0200, Thomas Hallgren wrote:
> Martijn van Oosterhout wrote:
> >A lot of these advantages are due to sharing an address space, right?
> >Well, the processes in PostgreSQL share address space, just not *all*
> >of it. They communicate via this shared memory.
> >
> Which is a different beast altogether. The inter-process mutex handling
> that you need to synchronize shared memory access is much more expensive
> than the mechanisms used to synchronize threads.

Now you've piqued my curiosity. You have two threads of control (either
two processes or two threads) which share a piece of memory. How can
the threads synchronise more easily than processes; what other feature is
there? AFAIK the futexes used by Linux threads are just as applicable
and fast between two processes as two threads. All that is required is
some shared memory.
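
To illustrate the shape of the argument (not futexes themselves): in the
sketch below several processes update one counter that lives in shared
memory, serialised by a lock that works across processes. It assumes a
POSIX system and uses Python's multiprocessing module, whose Lock is
backed by an OS semaphore rather than a hand-written futex; bump and
count_with_processes are invented names.

```python
import multiprocessing as mp


def bump(counter, lock, times):
    """Increment the shared counter `times` times, holding the lock each time."""
    for _ in range(times):
        with lock:
            counter.value += 1  # read-modify-write, safe only under the lock


def count_with_processes(workers=4, times=1000):
    """Let `workers` processes increment one shared integer concurrently."""
    ctx = mp.get_context("fork")             # assumption: POSIX only
    counter = ctx.Value("i", 0, lock=False)  # raw C int placed in shared memory
    lock = ctx.Lock()                        # lock usable across processes
    procs = [ctx.Process(target=bump, args=(counter, lock, times))
             for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print("final count:", count_with_processes())
```

The only things the processes have in common are the counter and the
lock; everything else in each process stays private.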

Or are you suggesting the only difference is in switching time (which
is not that significant)?

Also, I admit that on some operating systems, threads are much faster
than processes, but I'm talking specifically about Linux here.

Thanks in advance,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: Reasoning behind process instead of thread based

From

Thomas Hallgren

Date:

28 October 2004, 12:02:50

Martijn van Oosterhout wrote:

>Now you've piqued my curiosity. You have two threads of control (either
>two processes or two threads) which share a piece of memory. How can
>the threads synchronise more easily than processes; what other feature is
>there? AFAIK the futexes used by Linux threads are just as applicable
>and fast between two processes as two threads. All that is required is
>some shared memory.
>
>
Agreed. On Linux, this is not a big issue. Linux is rather special
though, since the whole kernel is built in a way that more or less puts
an equal sign between a process and a thread. This is changing though.
I don't know what relevance that will have for this issue.

Shared memory and multiple processes have other negative impacts on
performance since you force the CPU to jump between different memory
spaces. Switching between those address spaces will reduce the CPU
cache hit rate. You might think this is esoteric and irrelevant, but the
fact is, cache misses are extremely expensive and the problem is
increasing. While CPU speed has increased 152 times or so since the
80's, the speed of memory has only quadrupled.

>Or are you suggesting the only difference is in switching time (which
>is not that significant)?
>
>
"not that significant" all depends on how often you need to switch. On
most OSes, a process switch is significantly slower than switching
between threads (again, Linux may be an exception to the rule).

Regards,
Thomas Hallgren