Re: Critical performance problems on large databases - Mailing list pgsql-general
From           | Nigel J. Andrews
Subject        | Re: Critical performance problems on large databases
Date           |
Msg-id         | Pine.LNX.4.21.0204111346170.2690-100000@ponder.fairway2k.co.uk
In response to | Re: Critical performance problems on large databases (Bill Gribble <grib@linuxdevel.com>)
Responses      | Re: Critical performance problems on large databases
List           | pgsql-general
On 11 Apr 2002, Bill Gribble wrote:

> On Wed, 2002-04-10 at 17:39, Gunther Schadow wrote:
> > PS: we are seriously looking into using pgsql as the core
> > of a BIG medical record system, but we already know that
> > if we can't get quick online responses (< 2 s) on
> > large result sets (10000 records) at least at the first
> > page (~ 100 records) we are in trouble.
>
> There are a few tricks to getting fast results for pages of data in
> large tables. I have an application in which we have a scrolling window
> displaying data from a million-row table, and I have been able to make
> it fairly interactively responsive (enough that it's not a problem).
>
> We grab pages of a few screenfuls of data at a time using LIMIT /
> OFFSET, enough to scroll smoothly over a short range. For LIMIT /
> OFFSET queries to be fast, I found it was necessary to CREATE INDEX,
> CLUSTER and ORDER BY the key field.

I took a different approach in what I did. I considered selecting only the requested page of data. However, given that the query had been requested at all, it was likely that the user would view more than just that one page. On top of that, I always use a chronological time order for results. I could probably avoid specifying it explicitly in the query if I reloaded the data in time order to sort out the few blocks or rows that are out of sequence, but it isn't exactly correct to rely on the 'natural' order, is it?

This usage pattern means I will very rarely avoid a long-running query on this large table, and since my web server doesn't transmit the page data until the CGI-BIN process has finished, I decided to make the most of that first long-running request by caching the entire set of result pages it produces. I can do this because of the nature of the updates to the data: updates arrive in batches at known times, so the cached presentation pages can be held and then expired at the appropriate times.

This doesn't give the quick response time Gunther is looking for, but I think it is an alternative for some situations.

> Then the biggest slowdown is count(*), which we have to do in order to
> fake up the scrollbar (so we know what proportion of the data has been
> scrolled through). I have not completely foxed this yet. I want to
> keep a separate mini-table of how many records are in the big table and
> update it with a trigger (the table is mostly static). ATM, I just try
> hard to minimize the times I call count(*).

Now this idea I like. I do count some things in the big table, and despite there being an index on the column used in the GROUP BY it still does a seqscan. I can see that adding triggers on insert etc. to maintain counts in a side table is going to hit the data loading time, but it will speed up the count accesses tremendously.

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants
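
For concreteness, a minimal sketch of the LIMIT / OFFSET paging setup Bill describes might look like the following. The table and column names are invented for illustration, and the CLUSTER syntax shown is the old 7.x-era form (newer releases spell it CLUSTER tablename USING indexname):

    -- index on the key field used for ordering, then physically
    -- cluster the table on that index so sequential pages are cheap
    CREATE INDEX records_event_time_idx ON records (event_time);
    CLUSTER records_event_time_idx ON records;

    -- fetch one screenful at a time, e.g. rows 200..299
    SELECT *
      FROM records
     ORDER BY event_time
     LIMIT 100 OFFSET 200;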
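
Similarly, the trigger-maintained row count Bill mentions could be sketched roughly as below. All names are invented; older 7.x releases want RETURNS OPAQUE instead of RETURNS trigger, and a per-row trigger like this will slow bulk loads, as noted above:

    -- side table holding the current row count of big_table
    CREATE TABLE big_table_count (n bigint);
    INSERT INTO big_table_count VALUES (0);

    -- keep the count in step with inserts and deletes
    CREATE FUNCTION maintain_big_table_count() RETURNS trigger AS '
    BEGIN
        IF TG_OP = ''INSERT'' THEN
            UPDATE big_table_count SET n = n + 1;
        ELSIF TG_OP = ''DELETE'' THEN
            UPDATE big_table_count SET n = n - 1;
        END IF;
        RETURN NULL;   -- return value is ignored for AFTER triggers
    END;
    ' LANGUAGE 'plpgsql';

    CREATE TRIGGER big_table_count_trig
        AFTER INSERT OR DELETE ON big_table
        FOR EACH ROW EXECUTE PROCEDURE maintain_big_table_count();

    -- the scrollbar query then becomes a single-row lookup
    SELECT n FROM big_table_count;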