PostgreSQL design question - Mailing list pgsql-jdbc
From:    Jeffrey Tenny
Subject: PostgreSQL design question
Msg-id:  3FF4C297.4070907@comcast.net
List:    pgsql-jdbc
I suppose this is really a general SQL question, but since I'm bound to JDBC and PostgreSQL I'll ask it here; there may be some PostgreSQL-specific tricks.

I have a schema with many tables, some of which are cached in memory by my Tomcat app server. To determine the cache coherency of my Java in-memory cache of database table content, I maintain an "update table" covering all tables to be cached. Whenever a transaction updates a cached table, it must also update the update table. The update table looks something like this:

    CREATE TABLE update_table
    (
        id          INTEGER,      -- unique id of table in database, primary key
        table_name  VARCHAR(64),  -- name of the table for which tuple applies, primary key
        table_type  CHAR(3),      -- type of table
        n_rows      INTEGER,      -- number of rows in table; if out of date we need to read
        update_time TIMESTAMP,    -- time of last update
        PRIMARY KEY (id, table_name)
    );

So if I add rows to a table, I update n_rows and the update_time timestamp. At the start of every transaction I consult this table to determine whether any cached table data is out of date, and if so, I lazily refresh it when the app requests the data. All table updates are under the control of my application, but there may be multiple app servers accessing the same database.

Transactions are currently SERIALIZABLE; at some point I'd like to move to READ COMMITTED, but that will be tricky (there are 50-60 tables in the schema, of which any given transaction probably uses 5-10, sometimes with multiple query passes over the same table for different attributes).

Many of my tables are append-only, so simply by comparing the number of rows in memory against the n_rows value I can update my cache incrementally. (The check and the incremental refresh are sketched below.) The memory cache is smart: it essentially runs a memory-resident MVCC transaction model of its own on the cached data, where the view is pinned to the underlying database view.

My problem is this. On the one hand, I can cache crucial table content in each app server's memory and get a great cache hit rate. On the other hand, the update table is now a huge concurrency bottleneck. The update_table is the last table updated in a transaction, once I have the counts of rows affected in all the other tables. So concurrent transactions battle it out to do their work with as much database concurrency as they can muster, only to collide at the end when they have touched the same table and therefore need to update the same row in the update_table.

I suppose one solution is to allow multiple rows per table id in the update_table, so a single table might have multiple records. I could then test cache coherency by looking at max(n_rows) and max(update_time), or whatever other test suits that table's semantics in my model. If I do that, I would probably want to merge the multiple rows for a table id periodically, and I also lose the primary-key uniqueness constraint (multiple rows for a given table with the same primary key). (This variant is also sketched below.)

So I'm looking for suggestions on how to get concurrent updates in this scenario. If I use the multiple-rows-per-table solution, how can I determine a good time to merge table records? If VACUUM hasn't figured out how to do cleanups without impacting apps, I suspect I won't figure it out either. Does anybody have alternative mechanisms to suggest for the update_table, or for maintaining memory cache coherency in general? I've avoided using any notifications from the server (I'm not sure JDBC even supports them; I haven't tried), since I'm unsure such mechanisms are reliable.
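For concreteness, here is roughly what the per-transaction staleness check and incremental refresh look like for an append-only table. This is a minimal sketch rather than my production code: the pk column, the helper names, and the assumption of a monotonically increasing primary key are all illustrative.

    import java.sql.*;

    public class CacheCoherencyCheck {

        /**
         * At transaction start: compare the update_table entry for a cached,
         * append-only table against what the cache has already seen, and
         * fetch only the missing rows if the cache is stale.
         */
        static void refreshIfStale(Connection conn, int tableId, String tableName,
                                   int cachedRowCount, long highestCachedPk)
                throws SQLException {
            int dbRowCount;
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT n_rows FROM update_table WHERE id = ? AND table_name = ?")) {
                ps.setInt(1, tableId);
                ps.setString(2, tableName);
                try (ResultSet rs = ps.executeQuery()) {
                    if (!rs.next()) return;       // table not tracked in update_table
                    dbRowCount = rs.getInt(1);
                }
            }
            if (dbRowCount == cachedRowCount) return;  // cache is current

            // Append-only: rows whose key exceeds the highest cached key are
            // exactly the rows the cache is missing. tableName comes from the
            // app's own metadata, never from user input.
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT * FROM " + tableName + " WHERE pk > ? ORDER BY pk")) {
                ps.setLong(1, highestCachedPk);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // ... install the row in the in-memory cache ...
                    }
                }
            }
        }
    }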
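And the multiple-rows-per-table variant would replace the contended end-of-transaction UPDATE with a plain INSERT, with readers aggregating over all rows for a table id and a periodic job collapsing them. Again a sketch with illustrative names, and one that leans on my current SERIALIZABLE isolation to keep the inserted n_rows values consistent with each other:

    import java.sql.*;

    public class MultiRowUpdateTable {

        // Writer, at transaction end: an INSERT contends far less than an
        // UPDATE of one shared row.
        static void recordUpdate(Connection conn, int tableId, String tableName,
                                 int newRowCount) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO update_table (id, table_name, n_rows, update_time) " +
                    "VALUES (?, ?, ?, now())")) {
                ps.setInt(1, tableId);
                ps.setString(2, tableName);
                ps.setInt(3, newRowCount);
                ps.executeUpdate();
            }
        }

        // Reader, at transaction start, aggregates the newest state:
        //   SELECT max(n_rows), max(update_time) FROM update_table WHERE id = ?

        // Periodic merge: keep only the newest row for the table id.
        static void mergeRows(Connection conn, int tableId) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                    "DELETE FROM update_table WHERE id = ? AND update_time < " +
                    "  (SELECT max(update_time) FROM update_table WHERE id = ?)")) {
                ps.setInt(1, tableId);
                ps.setInt(2, tableId);
                ps.executeUpdate();
            }
        }
    }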
On the notification front there's a further wrinkle: app servers in some hostile customer network environments have idle network connections dropped after some number of hours. If that happens, the app server will miss messages and may have to reinitialize its cache completely.

I'm also curious whether anybody has developed a sequence generator for PostgreSQL that doesn't permit gaps in the sequence when transactions fail. My transaction throughput is low enough, and my need for gap-free ID allocation in the face of failed transactions is high enough, that I've built my own sequence allocator table, and it, too, is a point of contention, much like the update_table. I could potentially use multiple records there as well to avoid contention, but then I'd need periodic record merges there too. (The allocator I have in mind is sketched at the end of this message; it is gap-free precisely because a rolled-back transaction rolls the counter back, which is also what makes it a bottleneck.)

Suggestions welcome, and thanks.
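Here is that allocator sketch, assuming a table like CREATE TABLE id_allocator (seq_name VARCHAR(64) PRIMARY KEY, next_id BIGINT); the table and method names are illustrative. It must run inside the caller's transaction, with autocommit off, so the row lock is held until commit or rollback:

    import java.sql.*;

    public class GapFreeSequence {

        /** Allocate a block of `count` consecutive ids; returns the first. */
        static long allocate(Connection conn, String seqName, int count)
                throws SQLException {
            long next;
            // FOR UPDATE serializes allocators on this one row; the lock (and
            // the gap-free property) lasts until the transaction ends.
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT next_id FROM id_allocator WHERE seq_name = ? FOR UPDATE")) {
                ps.setString(1, seqName);
                try (ResultSet rs = ps.executeQuery()) {
                    if (!rs.next()) {
                        throw new SQLException("unknown sequence: " + seqName);
                    }
                    next = rs.getLong(1);
                }
            }
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE id_allocator SET next_id = ? WHERE seq_name = ?")) {
                ps.setLong(1, next + count);
                ps.setString(2, seqName);
                ps.executeUpdate();
            }
            return next;  // ids [next, next + count) now belong to this transaction
        }
    }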