Re: When should I worry? - Mailing list pgsql-general
From: Tom Allison
Subject: Re: When should I worry?
Msg-id: 466DFA65.8080604@tacocat.net
In response to: Re: When should I worry? (Greg Smith <gsmith@gregsmith.com>)
Responses: Re: When should I worry?
List: pgsql-general
Greg Smith wrote:
>
> On Mon, 11 Jun 2007, Steve Crawford wrote:
>
>> In my experience the more common situation is to "go off a cliff."
>
> Yeah, I think the idea that you'll notice performance degrading and be
> able to extrapolate future trends using statistical techniques is a
> bit...optimistic.
>
> Anyway, back to the original question here. If you're worried about
> catching when performance starts becoming an issue, you need to do some
> sort of logging of how long statements are taking to execute. The main
> choice is whether to log everything, at which point the logging and
> sorting through all the data generated may become its own performance
> concern, or whether to just log statements that take a long time and
> then count how many of them show up. Either way will give you some sort
> of early warning once you get a baseline; it may take a bit of tweaking
> to figure out where to draw the line for what constitutes a "long"
> statement if you only want to see how many of those you get.
>
> There are two tools you should look at initially to help process the
> logging information you get back: pgFouine and PQA. Here are intros to
> each that also mention how to configure the postgresql.conf file:
>
> http://pgfouine.projects.postgresql.org/tutorial.html
> http://www.databasejournal.com/features/postgresql/article.php/3323561
>
> As they're similar programs, which would work better for you is hard to
> say; check out both and see which seems more practical or easier to get
> running. For example, if you only have one of PHP/Ruby installed, that
> may make one tool or the other preferable.
>
> If you can get yourself to the point where you can confidently say
> something like "yesterday we had 346 statements that took more than
> 200ms to execute, which is 25% above this month's average", you'll be
> in a position to catch performance issues before they completely
> blindside you; makes you look good in meetings, too.

Starting to sound like a sane idea.

I've been running a test job for almost 24 hours and have accumulated
only 8 million rows. That's another 125 days to get to the big 'B'. I
think by then I'll have blown a hard drive or worse. I'm running this
on some very old hardware that I have available (more on this at the
bottom).

However, at this point the machine is running all of the SQL at < 0.2
seconds each, which I consider just fine for 7,599,519 rows.
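If I do set up that kind of logging here, I assume the relevant
postgresql.conf knobs would look something like this (untested on my
end; the 200ms threshold is just a first guess for this box):

    # log any statement that takes longer than 200ms to execute
    log_min_duration_statement = 200

    # timestamp each log line (the tutorials above cover the exact
    # prefix format pgFouine/PQA want)
    log_line_prefix = '%t '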
Here's some specifics about the tables:

    count(*) from headers:      890300
    count(*) from tokens:       890000
    count(*) from header_token: 7599519

    CREATE TABLE header_token (
        header_idx integer NOT NULL,
        token_idx integer NOT NULL
    );

    CREATE TABLE headers (
        idx serial NOT NULL,
        hash character varying(64) NOT NULL
    );

    CREATE TABLE tokens (
        idx bigserial NOT NULL,
        hash character varying(64) NOT NULL
    );

    ALTER TABLE ONLY headers
        ADD CONSTRAINT headers_hash_key UNIQUE (hash);
    ALTER TABLE ONLY headers
        ADD CONSTRAINT headers_pkey PRIMARY KEY (idx);
    ALTER TABLE ONLY header_token
        ADD CONSTRAINT pkey_header_token PRIMARY KEY (header_idx, token_idx);
    ALTER TABLE ONLY tokens
        ADD CONSTRAINT tokens_hash_key UNIQUE (hash);
    ALTER TABLE ONLY tokens
        ADD CONSTRAINT tokens_pkey PRIMARY KEY (idx);
    ALTER TABLE ONLY header_token
        ADD CONSTRAINT header_token_header_idx_fkey
        FOREIGN KEY (header_idx) REFERENCES headers(idx) ON DELETE CASCADE;
    ALTER TABLE ONLY header_token
        ADD CONSTRAINT header_token_token_idx_fkey
        FOREIGN KEY (token_idx) REFERENCES tokens(idx) ON DELETE CASCADE;

The SQL statements I was timing were:

    select t.hash, h.hash
    from headers h, header_token ht, tokens t
    where h.idx = ht.header_idx
      and ht.token_idx = t.idx
      and h.idx = ?

    insert into header_token
    select $header, idx from tokens where idx in (...)

The SELECT was < 0.2 seconds. The INSERT was easily < 0.7 seconds most
of the time -- it ranged because the idx IN (...) list varied from 200
to 700 entries. From a few minutes of observation, the min was < 0.2
and the max was > 1.0.

All of this was run on a Pentium II 450 MHz with 412MB RAM and a
software linear 0 pair of UDMA 66 7200RPM 8MB-cache drives (really old)
on separate IDE channels, formatted with ReiserFS. The actual script
was running on a separate machine across a 100-base-T full duplex
network, through a firewall machine between the two subnets.

I can't imagine how long it would take to run:

    delete from tokens;

with the CASCADE option...
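If I ever do have to empty that table, my guess is I'd batch the delete
rather than fire one giant cascading statement -- something like this
(untested sketch; the 1000-row batch size is arbitrary):

    -- delete tokens a slice at a time; each batch cascades its
    -- header_token rows inside its own short transaction
    delete from tokens
    where idx in (select idx from tokens limit 1000);
    -- repeat until it reports "DELETE 0"

Whether that's faster overall I don't know, but at least no single
transaction has to touch all 7.5 million header_token rows at once.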