Re: postgresql meltdown on PlanetMath.org - Mailing list pgsql-performance
From | Logan Bowers
---|---
Subject | Re: postgresql meltdown on PlanetMath.org
Date |
Msg-id | Pine.LNX.4.53.0303171845190.8636@neo.magick.org
In response to | Re: postgresql meltdown on PlanetMath.org (Tom Lane <tgl@sss.pgh.pa.us>)
Responses | Re: postgresql meltdown on PlanetMath.org (3 replies)
List | pgsql-performance
All right, I performed a VACUUM FULL last night; after about 3 hours I tried running a select count(*) FROM tblwordidx, and that did help things considerably (it runs in ~20 seconds instead of 1-2 minutes). Not as good as originally, but close.

Here's the breakdown of the db: I'm using the database as a keyword-based file search engine (not the most efficient method, I know, but it works well for my purposes). The biggest and most relevant tables are a table of files and a table of words. Each file has a set of keywords associated with it; to find files I do a whole-word search on tblwordidx and join with tblfiles (I know, the naming scheme sucks, sorry!).

Three times a day I scan the network and update the database. I insert about 180,000 rows into a temporary table (tbltmp) and then use it to update tblfiles: with the aid of a few other tables, I update the timestamp in tblseen for rows that still exist and remove files whose timestamp is older than 1 day. Then I take the new rows in tblfiles and use a perl script to add more words to tblwordidx. After each update I do a VACUUM and VACUUM ANALYZE, which usually grinds for 10 to 15 minutes. (A rough sketch of these two operations is below.)

I'm running this db on a Celeron 450MHz with 256MB RAM and a 60GB HDD (7200 rpm). For the most part the db runs "well enough." Over time, however, performance degrades; the count(*) above is an example of a command that does worse over time (it gets run once an hour for stats collection). When I first migrated the db to v7.3.1 it would take about 5-10 seconds (which it is close to now, after a VACUUM FULL), but after a few weeks it would take over a minute of really intense HDD activity. Also of note: when I first loaded the data it cached very well, the query taking maybe 15 seconds if I had just started the db after a reboot, but in its "slow" state repeating the query didn't noticeably use the disk less (nor did it take less time).

I've attached a VACUUM VERBOSE and my conf file (which is pretty vanilla; I haven't tweaked it since updating). If you have any suggestions on how I can correct this situation through config changes, that would be ideal, and thanks for your help. If it is just a case of doing lots of VACUUM FULLs, I can definitely see that as a performance bottleneck for postgres. Fortunately I can afford the huge performance penalty of a VACUUM FULL, but I can certainly think of apps that can't.
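To make the above concrete, the two operations look roughly like this (schematic, not my literal queries; tblseen's column names here are simplified placeholders):

-- Whole-word search: equality lookup on tblwordidx joined back to tblfiles.
SELECT f.fid, f.name, f.size
  FROM tblwordidx w
  JOIN tblfiles f ON f.fid = w.fid
 WHERE w.word = 'some_keyword';

-- Refresh pass: bump the last-seen timestamp of files found in this scan,
-- then expire files not seen for over a day.
UPDATE tblseen SET lastseen = now()   -- "lastseen" is a placeholder name
 WHERE fid IN (SELECT fid FROM tbltmp);

DELETE FROM tblfiles
 WHERE fid IN (SELECT fid FROM tblseen
                WHERE lastseen < now() - interval '1 day');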
Logan Bowers

\d tblfiles: (219,248 rows)

       Table "public.tblfiles"
 Column |            Type             |                 Modifiers
--------+-----------------------------+-------------------------------------------
 fid    | integer                     | not null default nextval('fileids'::text)
 hid    | integer                     | not null
 pid    | integer                     | not null
 name   | character varying(256)      | not null
 size   | bigint                      | not null
Indexes: temp_fid_key unique btree (fid),
         filediridx btree (hid, pid, name, size, fid),
         fileidx btree (name, hid, pid, fid),
         fileidxfid btree (fid, name, pid)

\d tblwordidx: (1,739,481 rows)

      Table "public.tblwordidx"
 Column |          Type          | Modifiers
--------+------------------------+-----------
 fid    | integer                | not null
 word   | character varying(128) | not null
 todel  | boolean                |
Indexes: wordidxfid btree (fid, word),
         wordidxfidonly btree (fid),
         wordidxw btree (word, fid)

On Mon, 17 Mar 2003, Tom Lane wrote:

> "Neil Conway" <neilc@samurai.com> writes:
> > Sean Chittenden said:
> >>> A "select count(*) FROM tblwordidx" initially took about 1 second to
> >>> return a count of 2 million but after a few months it took several
> >>> minutes of really hard HDD grinding.
> >>
> >> That's because there are dead entries in the index that weren't being
> >> reused or cleaned up. As I said, this has been fixed.
>
> > That's doubtful: "select count(*) FROM foo" won't use an index.
>
> To know what's going on, as opposed to guessing about it, we'd need to
> know something about the physical sizes of the table and its indexes.
> "vacuum verbose" output would be instructive...
>
> But my best theorizing-in-advance-of-the-data guess is that Logan's
> FSM settings are too small, causing free space to be leaked over time.
> If a vacuum full restores the original performance then that's probably
> the right answer.
>
>             regards, tom lane
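P.S. For anyone else hitting this: the FSM settings Tom mentions live in postgresql.conf. A sketch of what raising them might look like (values are illustrative only, not tuned for my box; the defaults in the comments are 7.3's, if I read the docs right):

# postgresql.conf -- illustrative FSM bump, not a tuned recommendation.
# Plain VACUUM can only recycle free space that fits in the free space
# map; if more pages have reusable space than the FSM can remember,
# that space is effectively leaked until a VACUUM FULL.
max_fsm_relations = 1000    # tables tracked; 7.3 default is 100
max_fsm_pages = 100000      # pages with free space tracked; 7.3 default is 10000

The FSM lives in shared memory, so the postmaster needs a restart for these to take effect.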