Re: When should I worry? - Mailing list pgsql-general
From: Tom Allison
Subject: Re: When should I worry?
Msg-id: 466DFA65.8080604@tacocat.net
In response to: Re: When should I worry? (Greg Smith <gsmith@gregsmith.com>)
Responses: Re: When should I worry?
List: pgsql-general
Greg Smith wrote:
>
> On Mon, 11 Jun 2007, Steve Crawford wrote:
>
>> In my experience the more common situation is to "go off a cliff."
>
> Yeah, I think the idea that you'll notice performance degrading and be
> able to extrapolate future trends using statistical techniques is a
> bit...optimistic.
>
> Anyway, back to the original question here. If you're worried about
> catching when performance starts becoming an issue, you need to do some
> sort of logging of how long statements are taking to execute. The main
> choice is whether to log everything, at which point the logging and
> sorting through all the data generated may become its own performance
> concern, or whether to just log statements that take a long time and
> then count how many of them show up. Either way will give you some sort
> of early warning once you get a baseline; it may take a bit of tweaking
> to figure out where to draw the line for what constitutes a "long"
> statement if you only want to see how many of those you get.
>
> There are two tools you should look at initially to help process the
> logging information you get back: pgFouine and PQA. Here are intros to
> each that also mention how to configure the postgresql.conf file:
>
> http://pgfouine.projects.postgresql.org/tutorial.html
> http://www.databasejournal.com/features/postgresql/article.php/3323561
>
> As they're similar programs, which would work better for you is hard to
> say; check out both and see which seems more practical or easier to get
> running. For example, if you only have one of PHP/Ruby installed, that
> may make one tool or the other preferable.
>
> If you can get yourself to the point where you can confidently say
> something like "yesterday we had 346 statements that took more than
> 200ms to execute, which is 25% above this month's average", you'll be
> in a position to catch performance issues before they completely
> blindside you; makes you look good in meetings, too.

Starting to sound like a sane idea.

I've been running a test job for almost 24 hours and have accumulated
only 8 million rows. That's another 125 days to get to the big 'B'. I
think by then I'll have blown a hard drive or worse. I'm running this
on some very old hardware that I have available (more on this at the
bottom).

However, at this point the machine is running all of the SQL at < 0.2
seconds each, which I consider just fine for 7,599,519 rows.
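If I do set up that kind of logging here, I assume the relevant
postgresql.conf knobs would look something like this (untested on my
end; the 200ms threshold is just a first guess for this box):

    # log any statement that takes longer than 200ms to execute
    log_min_duration_statement = 200

    # timestamp each log line (the tutorials above cover the exact
    # prefix format pgFouine/PQA want)
    log_line_prefix = '%t '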
Here's some specifics about the tables:

    count(*) from headers:      890300
    count(*) from tokens:       890000
    count(*) from header_token: 7599519

    CREATE TABLE header_token (
        header_idx integer NOT NULL,
        token_idx integer NOT NULL
    );

    CREATE TABLE headers (
        idx serial NOT NULL,
        hash character varying(64) NOT NULL
    );

    CREATE TABLE tokens (
        idx bigserial NOT NULL,
        hash character varying(64) NOT NULL
    );

    ALTER TABLE ONLY headers
        ADD CONSTRAINT headers_hash_key UNIQUE (hash);
    ALTER TABLE ONLY headers
        ADD CONSTRAINT headers_pkey PRIMARY KEY (idx);
    ALTER TABLE ONLY header_token
        ADD CONSTRAINT pkey_header_token PRIMARY KEY (header_idx, token_idx);
    ALTER TABLE ONLY tokens
        ADD CONSTRAINT tokens_hash_key UNIQUE (hash);
    ALTER TABLE ONLY tokens
        ADD CONSTRAINT tokens_pkey PRIMARY KEY (idx);
    ALTER TABLE ONLY header_token
        ADD CONSTRAINT header_token_header_idx_fkey
        FOREIGN KEY (header_idx) REFERENCES headers(idx) ON DELETE CASCADE;
    ALTER TABLE ONLY header_token
        ADD CONSTRAINT header_token_token_idx_fkey
        FOREIGN KEY (token_idx) REFERENCES tokens(idx) ON DELETE CASCADE;

The SQL statements I was timing were:

    select t.hash, h.hash
    from headers h, header_token ht, tokens t
    where h.idx = ht.header_idx
      and ht.token_idx = t.idx
      and h.idx = ?

    insert into header_token
    select $header, idx from tokens where idx in (...)

The SELECT was < 0.2 seconds. The INSERT was easily < 0.7 seconds most
of the time -- it ranged because the idx IN (...) list varied from 200
to 700 entries. From a few minutes of observation, the min was < 0.2
and the max was > 1.0.

All of this was run on a Pentium II 450 MHz with 412MB RAM and a
software linear 0 pair of UDMA 66 7200RPM 8MB-cache drives (really old)
on separate IDE channels, formatted with ReiserFS. The actual script
was running on a separate machine across a 100-base-T full duplex
network, through a firewall machine between the two subnets.

I can't imagine how long it would take to run:

    delete from tokens;

with the CASCADE option...
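If I ever do have to empty that table, my guess is I'd batch the delete
rather than fire one giant cascading statement -- something like this
(untested sketch; the 1000-row batch size is arbitrary):

    -- delete tokens a slice at a time; each batch cascades its
    -- header_token rows inside its own short transaction
    delete from tokens
    where idx in (select idx from tokens limit 1000);
    -- repeat until it reports "DELETE 0"

Whether that's faster overall I don't know, but at least no single
transaction has to touch all 7.5 million header_token rows at once.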