Home > mailing lists

Re: general PG network slowness (possible cure) (repost) - Mailing list pgsql-performance

From	Peter T. Breuer
Subject	Re: general PG network slowness (possible cure) (repost)
Date	May 25, 2007 10:44:50
Msg-id	200705251344.l4PDibQ00402@inv.it.uc3m.es Whole thread Raw
In response to	Re: general PG network slowness (possible cure) (repost) (Richard Huxton <dev@archonet.com>)
Responses	Re: general PG network slowness (possible cure) (repost)
List	pgsql-performance

Tree view

"Also sprach Richard Huxton:"
[Charset ISO-8859-1 unsupported, filtering to ASCII...]
> Peter T. Breuer wrote:
> > I set up pg to replace a plain gdbm database for my application.  But
> > even running to the same machine, via a unix socket
> >
> >    * the pg database ran 100 times slower
>
> For what operations? Bulk reads? 19-way joins?

The only operations being done are simple "find the row with this key",
or "update the row with this key". That's all. The queries are not an
issue (though why the PG thread choose to max out cpu when it gets the
chance to do so through a unix socket, I don't know).

> > Across the net it was
> >
> >   * about 500 to 1000 times slower than local gdbm
> >
> > with no cpu use to speak of.
>
> Disk-intensive or memory intensive?

There is no disk as such...  it's running on a ramdisk at the server
end.  But assuming you mean i/o, i/o was completely stalled.  Everything
was idle, all waiting on the net.

> > On a whim I mapped the network bandwidth per packet size with the NPtcp
> > suite, and got surprising answers ..  at 1500B, naturally, the bandwidth
> > was the full 10Mb/s (minus overheads, say 8.5Mb/s) of my pathetic little
> > local net.  At 100B the bandwidth available was only 25Kb/s.  At 10B,
> > you might as well use tin cans and taut string instead.
>
> This sounds like you're testing a single connection. You would expect
> "dead time" to dominate in that scenario. What happens when you have 50

Indeed, it is single, because that's my application. I don't have
50 simultaneous connections.  The use of the database is as a permanent
storage area for the results of previous analyses (static analysis of
the linux kernel codes) from a single client.

Multiple threads accessing at the same time might help keep the network
drivers busier, which would help. They would always see their buffers
filling at an even rate and be able to send out groups of packets at
once.

> simultaneous connections? Or do you think it's just packet overhead?

It's not quite overhead in the sense of the logical layer. It's a
physical layer thing.  I replied in another mail on this thread, but in
summary, tcp behaves badly with small packets on ethernet, even on a
dedicated line (as this was). One needs to keep it on a tight rein.

> > I also mapped the network flows using ntop, and yes, the average packet
> > size for both gdbm and pg in one direction was only about 100B or
> > so.  That's it!  Clearly there are a lot of short queries going out and
> > the answers were none too big either ( I had a LIMIT 1 in all my PG
> > queries).
>
> I'm not sure that 100B query-results are usually the bottleneck.
> Why would you have LIMIT 1 on all your queries?

Because there is always only one answer to the query, according to the
logic.  So I can always tell the database manager to stop looking after
one, which will always help it.

> > About 75% of traffic was in the 64-128B range while my application was
> > running, with the peak bandwidth in that range being about 75-125Kb/s
> > (and I do mean bits, not bytes).
>
> None of this sounds like typical database traffic to me. Yes, there are
> lots of small result-sets, but there are also typically larger (several
> kilobytes) to much larger (10s-100s KB).

There's none here.

> > Soooo ... I took a look at my implementation of remote gdbm, and did
> > a very little work to aggregate outgoing transmissions together into
> > lumps.  Three lines added in two places.  At the level of the protocol
> > where I could tell how long the immediate conversation segment would be,
> > I "corked" the tcp socket before starting the segment and "uncorked" it
> > after the segment (for "cork", see tcp(7), setsockopt(2) and TCP_CORK in
> > linux).
>
> I'm a bit puzzled, because I'd have thought the standard Nagle algorithm
> would manage this gracefully enough for short-query cases. There's no

On the contrary, Nagle is also often wrong here because it will delay
sending in order to accumulate more data into buffers when only a little
has arrived, then give up when no more data arrives to be sent out, then
send out the (short) packet anyway, late. There's no other traffic
apart from my (single thread) application.

What we want is to direct the sending exactly,n this situation saying
when to not send, and when to send.  Disable Nagle for a start, use
async read (noblock), and sync write, with sends from the socket blocked
from initiation of a message until the whole message is ready to be sent
out.  Sending the message piecemeal just hurts too.

> way (that I know of) for a backend to handle more than one query at a time.

That's not the scenario.

> > Surprise, ...  I got a speed up of hundreds of times.  The same application
> > that crawled under my original rgdbm implementation and under PG now
> > maxed out the network bandwidth at close to a full 10Mb/s and 1200
> > pkts/s, at 10% CPU on my 700MHz client, and a bit less on the 1GHz
> > server.
> >
> > So
> >
> >   * Is that what is holding up postgres over the net too?  Lots of tiny
> >     packets?
>
> I'm not sure your setup is typical, interesting though the figures are.
> Google a bit for pg_bench perhaps and see if you can reproduce the
> effect with a more typical load. I'd be interested in being proved wrong.

But the load is typical HERE. The application works well against gdbm
and I was hoping to see speedup from using a _real_ full-fledged DB
instead.

Well, at least it's very helpful for debugging.

> > And if so
> >
> >   * can one fix it the way I fixed it for remote gdbm?
> >
> > The speedup was hundreds of times. Can someone point me at the relevant
> > bits of pg code? A quick look seems to say that fe-*.c is
> > interesting. I need to find where the actual read and write on the
> > conn->sock is done.
>
> You'll want to look in backend/libpq and interfaces/libpq I think
> (although I'm not a developer).

I'll look around there. Specific directions are greatly
appreciated.

Thanks.

Peter

pgsql-performance by date:

From: "Peter T. Breuer"
Date: 25 May 2007, 10:23:28
Subject: Re: general PG network slowness (possible cure) (repost)

From: Richard Huxton
Date: 25 May 2007, 10:52:43
Subject: Re: general PG network slowness (possible cure) (repost)

Re: general PG network slowness (possible cure) (repost) - Mailing list pgsql-performance

Previous

Next