Re: Critical performance problems on large databases - Mailing list pgsql-general
From           | Nigel J. Andrews
Subject        | Re: Critical performance problems on large databases
Date           |
Msg-id         | Pine.LNX.4.21.0204111346170.2690-100000@ponder.fairway2k.co.uk
In response to | Re: Critical performance problems on large databases (Bill Gribble <grib@linuxdevel.com>)
Responses      | Re: Critical performance problems on large databases
List           | pgsql-general
On 11 Apr 2002, Bill Gribble wrote:

> On Wed, 2002-04-10 at 17:39, Gunther Schadow wrote:
> > PS: we are seriously looking into using pgsql as the core
> > of a BIG medical record system, but we already know that
> > if we can't get quick online responses (< 2 s) on
> > large result sets (10000 records) at least at the first
> > page (~ 100 records) we are in trouble.
>
> There are a few tricks to getting fast results for pages of data in
> large tables. I have an application in which we have a scrolling window
> displaying data from a million-row table, and I have been able to make
> it fairly interactively responsive (enough that it's not a problem).
>
> We grab pages of a few screenfuls of data at a time using LIMIT /
> OFFSET, enough to scroll smoothly over a short range. For LIMIT /
> OFFSET queries to be fast, I found it was necessary to CREATE INDEX,
> CLUSTER and ORDER BY the key field.

I took a different approach in what I did. I considered selecting only the requested page of data. However, given that the query had been requested at all, it was likely that the user would view more than just that one page. On top of that, I always use a chronological time order for results. I could probably avoid specifying it explicitly in the query if I reloaded the data in time order to sort out the few blocks or rows that are out of sequence, but it isn't exactly correct to rely on the 'natural' order, is it?

This usage pattern means I will very rarely avoid a long-running query on this large table, and since my web server doesn't transmit the page data until the CGI-BIN process has finished, I decided to make the most of that first long-running request by caching the entire set of result pages it produces. I can do this because of the nature of the updates to the data: updates arrive in batches at known times, so the cached presentation pages can be held and then expired at the appropriate times.

This doesn't give the quick response time Gunther is looking for, but I think it is an alternative for some situations.

> Then the biggest slowdown is count(*), which we have to do in order to
> fake up the scrollbar (so we know what proportion of the data has been
> scrolled through). I have not completely foxed this yet. I want to
> keep a separate mini-table of how many records are in the big table and
> update it with a trigger (the table is mostly static). ATM, I just try
> hard to minimize the times I call count(*).

Now this idea I like. I do count some things in the big table, and despite there being an index on the column used in the GROUP BY it still does a seqscan. I can see that adding triggers on insert etc. to maintain counts in a side table is going to hit the data loading time, but it will speed up the count accesses tremendously.

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants
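
For concreteness, a minimal sketch of the LIMIT / OFFSET paging setup Bill describes might look like the following. The table and column names are invented for illustration, and the CLUSTER syntax shown is the old 7.x-era form (newer releases spell it CLUSTER tablename USING indexname):

    -- index on the key field used for ordering, then physically
    -- cluster the table on that index so sequential pages are cheap
    CREATE INDEX records_event_time_idx ON records (event_time);
    CLUSTER records_event_time_idx ON records;

    -- fetch one screenful at a time, e.g. rows 200..299
    SELECT *
      FROM records
     ORDER BY event_time
     LIMIT 100 OFFSET 200;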
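
Similarly, the trigger-maintained row count Bill mentions could be sketched roughly as below. All names are invented; older 7.x releases want RETURNS OPAQUE instead of RETURNS trigger, and a per-row trigger like this will slow bulk loads, as noted above:

    -- side table holding the current row count of big_table
    CREATE TABLE big_table_count (n bigint);
    INSERT INTO big_table_count VALUES (0);

    -- keep the count in step with inserts and deletes
    CREATE FUNCTION maintain_big_table_count() RETURNS trigger AS '
    BEGIN
        IF TG_OP = ''INSERT'' THEN
            UPDATE big_table_count SET n = n + 1;
        ELSIF TG_OP = ''DELETE'' THEN
            UPDATE big_table_count SET n = n - 1;
        END IF;
        RETURN NULL;   -- return value is ignored for AFTER triggers
    END;
    ' LANGUAGE 'plpgsql';

    CREATE TRIGGER big_table_count_trig
        AFTER INSERT OR DELETE ON big_table
        FOR EACH ROW EXECUTE PROCEDURE maintain_big_table_count();

    -- the scrollbar query then becomes a single-row lookup
    SELECT n FROM big_table_count;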