PostgreSQL design question - Mailing list pgsql-jdbc
From:    Jeffrey Tenny
Subject: PostgreSQL design question
Msg-id:  3FF4C297.4070907@comcast.net
List:    pgsql-jdbc
I suppose this is really a general SQL question, but since I'm bound to JDBC and PostgreSQL I'll ask it here; there may be some PostgreSQL-specific tricks.

I have a schema with many tables, some of which are cached in memory by my Tomcat app server. To determine the cache coherency of my Java in-memory cache of database table content, I maintain an "update table" covering all tables to be cached. Whenever a transaction updates a cached table, it must also update the update table. The update table looks something like this:

    CREATE TABLE update_table
    (
        id          INTEGER,      -- unique id of table in database, primary key
        table_name  VARCHAR(64),  -- name of the table for which tuple applies, primary key
        table_type  CHAR(3),      -- type of table
        n_rows      INTEGER,      -- number of rows in table; if out of date we need to read
        update_time TIMESTAMP,    -- time of last update
        PRIMARY KEY (id, table_name)
    );

So if I add rows to a table, I update n_rows and the update_time timestamp. At the start of every transaction I consult this table to determine whether any cached table data is out of date, and if so, I lazily refresh it when the app requests the data. All table updates are under the control of my application, but there may be multiple app servers accessing the same database.

Transactions are currently SERIALIZABLE; at some point I'd like to move to READ COMMITTED, but that will be tricky (there are 50-60 tables in the schema, of which any given transaction probably uses 5-10, sometimes with multiple query passes over the same table for different attributes).

Many of my tables are append-only, so simply by comparing the number of rows in memory against the n_rows value I can update my cache incrementally. (The check and the incremental refresh are sketched below.) The memory cache is smart: it essentially runs a memory-resident MVCC transaction model of its own on the cached data, where the view is pinned to the underlying database view.

My problem is this. On the one hand, I can cache crucial table content in each app server's memory and get a great cache hit rate. On the other hand, the update table is now a huge concurrency bottleneck. The update_table is the last table updated in a transaction, once I have the counts of rows affected in all the other tables. So concurrent transactions battle it out to do their work with as much database concurrency as they can muster, only to collide at the end when they have touched the same table and therefore need to update the same row in the update_table.

I suppose one solution is to allow multiple rows per table id in the update_table, so a single table might have multiple records. I could then test cache coherency by looking at max(n_rows) and max(update_time), or whatever other test suits that table's semantics in my model. If I do that, I would probably want to merge the multiple rows for a table id periodically, and I also lose the primary-key uniqueness constraint (multiple rows for a given table with the same primary key). (This variant is also sketched below.)

So I'm looking for suggestions on how to get concurrent updates in this scenario. If I use the multiple-rows-per-table solution, how can I determine a good time to merge table records? If VACUUM hasn't figured out how to do cleanups without impacting apps, I suspect I won't figure it out either. Does anybody have alternative mechanisms to suggest for the update_table, or for maintaining memory cache coherency in general? I've avoided using any notifications from the server (I'm not sure JDBC even supports them; I haven't tried), since I'm unsure such mechanisms are reliable.
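For concreteness, here is roughly what the per-transaction staleness check and incremental refresh look like for an append-only table. This is a minimal sketch rather than my production code: the pk column, the helper names, and the assumption of a monotonically increasing primary key are all illustrative.

    import java.sql.*;

    public class CacheCoherencyCheck {

        /**
         * At transaction start: compare the update_table entry for a cached,
         * append-only table against what the cache has already seen, and
         * fetch only the missing rows if the cache is stale.
         */
        static void refreshIfStale(Connection conn, int tableId, String tableName,
                                   int cachedRowCount, long highestCachedPk)
                throws SQLException {
            int dbRowCount;
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT n_rows FROM update_table WHERE id = ? AND table_name = ?")) {
                ps.setInt(1, tableId);
                ps.setString(2, tableName);
                try (ResultSet rs = ps.executeQuery()) {
                    if (!rs.next()) return;       // table not tracked in update_table
                    dbRowCount = rs.getInt(1);
                }
            }
            if (dbRowCount == cachedRowCount) return;  // cache is current

            // Append-only: rows whose key exceeds the highest cached key are
            // exactly the rows the cache is missing. tableName comes from the
            // app's own metadata, never from user input.
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT * FROM " + tableName + " WHERE pk > ? ORDER BY pk")) {
                ps.setLong(1, highestCachedPk);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // ... install the row in the in-memory cache ...
                    }
                }
            }
        }
    }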
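And the multiple-rows-per-table variant would replace the contended end-of-transaction UPDATE with a plain INSERT, with readers aggregating over all rows for a table id and a periodic job collapsing them. Again a sketch with illustrative names, and one that leans on my current SERIALIZABLE isolation to keep the inserted n_rows values consistent with each other:

    import java.sql.*;

    public class MultiRowUpdateTable {

        // Writer, at transaction end: an INSERT contends far less than an
        // UPDATE of one shared row.
        static void recordUpdate(Connection conn, int tableId, String tableName,
                                 int newRowCount) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO update_table (id, table_name, n_rows, update_time) " +
                    "VALUES (?, ?, ?, now())")) {
                ps.setInt(1, tableId);
                ps.setString(2, tableName);
                ps.setInt(3, newRowCount);
                ps.executeUpdate();
            }
        }

        // Reader, at transaction start, aggregates the newest state:
        //   SELECT max(n_rows), max(update_time) FROM update_table WHERE id = ?

        // Periodic merge: keep only the newest row for the table id.
        static void mergeRows(Connection conn, int tableId) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                    "DELETE FROM update_table WHERE id = ? AND update_time < " +
                    "  (SELECT max(update_time) FROM update_table WHERE id = ?)")) {
                ps.setInt(1, tableId);
                ps.setInt(2, tableId);
                ps.executeUpdate();
            }
        }
    }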
On the notification front there's a further wrinkle: app servers in some hostile customer network environments have idle network connections dropped after some number of hours. If that happens, the app server will miss messages and may have to reinitialize its cache completely.

I'm also curious whether anybody has developed a sequence generator for PostgreSQL that doesn't permit gaps in the sequence when transactions fail. My transaction throughput is low enough, and my need for gap-free ID allocation in the face of failed transactions is high enough, that I've built my own sequence allocator table, and it, too, is a point of contention, much like the update_table. I could potentially use multiple records there as well to avoid contention, but then I'd need periodic record merges there too. (The allocator I have in mind is sketched at the end of this message; it is gap-free precisely because a rolled-back transaction rolls the counter back, which is also what makes it a bottleneck.)

Suggestions welcome, and thanks.
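Here is that allocator sketch, assuming a table like CREATE TABLE id_allocator (seq_name VARCHAR(64) PRIMARY KEY, next_id BIGINT); the table and method names are illustrative. It must run inside the caller's transaction, with autocommit off, so the row lock is held until commit or rollback:

    import java.sql.*;

    public class GapFreeSequence {

        /** Allocate a block of `count` consecutive ids; returns the first. */
        static long allocate(Connection conn, String seqName, int count)
                throws SQLException {
            long next;
            // FOR UPDATE serializes allocators on this one row; the lock (and
            // the gap-free property) lasts until the transaction ends.
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT next_id FROM id_allocator WHERE seq_name = ? FOR UPDATE")) {
                ps.setString(1, seqName);
                try (ResultSet rs = ps.executeQuery()) {
                    if (!rs.next()) {
                        throw new SQLException("unknown sequence: " + seqName);
                    }
                    next = rs.getLong(1);
                }
            }
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE id_allocator SET next_id = ? WHERE seq_name = ?")) {
                ps.setLong(1, next + count);
                ps.setString(2, seqName);
                ps.executeUpdate();
            }
            return next;  // ids [next, next + count) now belong to this transaction
        }
    }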