Re: Matching unique primary keys - Mailing list pgsql-general
From | Richard Huxton |
---|---|
Subject | Re: Matching unique primary keys |
Date | |
Msg-id | 200210231126.39242.dev@archonet.com |
In response to | Matching unique primary keys (Kevin Old <kold@carolina.rr.com>) |
Responses | Re: Matching unique primary keys |
List | pgsql-general |
On Tuesday 22 Oct 2002 9:12 pm, Kevin Old wrote:
> Hello all,
>
> I have a table of data that is very large (502 columns) and whenever I
> query it, I only query on up to 4 columns and on average only 20 fields
> are returned. I've come up with a way to increase the speed of the
> queries. I could put the 20 fields that are searched and returned
> into a separate table and then link them by a unique id. Sounds
> easy... but I'm stuck on something.
>
> I am dealing with about 1 million records per day.

Just to clarify - that's a table that's expanding by 1 million rows a day?

> One option is to put a sequence on the tables, but dealing with so many
> records I'd have to use a BIGINT and with a ceiling of
> 9223372036854775807 it seems to me that numbers this large might slow
> down the query process. I realize it would take quite a while to get to
> this point, but would like others' opinions.

Well, an int4 would give you enough unique ids for about 4000 days, which may
or may not be enough for you. One thing I'd say is that querying a table with
4 billion rows of 500 columns is going to require a hefty machine.

Using a bigint (int8) means practically unlimited expansion, but handling
int8 is slower than int4 on a 32-bit system. Details will depend on your
hardware, and on 64-bit systems I'd expect no difference (but test - I
haven't). Having said that, I can't believe the impact is relevant when
compared with the overhead of 500 columns.

> Another option is that I have 3 fields that, when combined, make each
> record unique. Is there some way I can combine these dynamically and
> then use "views" of them to reference my records and display them from
> the larger table.

So - those 3 fields are a candidate key, and in the absence of any other info
they are your primary key. A first design would be to repeat these 3 fields
in each table (if you split the 500-column one) and then, if performance
sucks, add a sequence to the "core" table and use that as a key instead.

I'd think it's unlikely that your table should (in design terms) be over 500
columns wide. If lots of these values are defaults or nulls, that's a sign
that things need to be split up. If the 20 columns you mention being queried
against are logically similar, then they should form your foo_core table. If
splitting the columns into logical groups can be done, that's where I'd
start. If nothing else, a logical split is less likely to need changes in the
future - an important consideration once you have a few hundred million
records.

From a performance point of view, I'd think it would be helpful to have your
4 query columns in one table and your 20 result columns in one table
(possibly the same table). However, if they aren't logically related, I'd be
tempted to still split things logically and set up a separate foo_query table
and maintain it using triggers.

> I realize this is very confusing, please get back with me if I should
> elaborate on something.

It's difficult for anyone on the list to give specific advice without knowing
your hardware, performance requirements and schema details. If my post hasn't
helped, try another message with more details.

Bruce Momjian has written some notes on the internals of PG - how it accesses
disk blocks and interacts with the OS disk cache. They're either in the
developers section of the website or on techdocs.postgresql.org - can't
remember which, sorry. Might give you something to think about while you're
looking at this problem.

--
Richard Huxton
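For reference, here is a rough sketch of the split described above, in
PostgreSQL DDL. The table and column names (foo_core, foo_detail, site_id,
device_id, recorded_at, and the metric/raw columns) are invented for
illustration - the original post gives no schema - and the real tables would
carry the ~20 frequently-queried columns and the remaining ~480 columns
respectively.

```sql
-- Narrow "core" table holding only the columns that are searched on and
-- returned.  The three fields that together identify a record form the
-- composite primary key.
CREATE TABLE foo_core (
    site_id     integer   NOT NULL,   -- hypothetical key part 1
    device_id   integer   NOT NULL,   -- hypothetical key part 2
    recorded_at timestamp NOT NULL,   -- hypothetical key part 3
    metric_a    integer,              -- stand-ins for the ~20 queried columns
    metric_b    text,
    PRIMARY KEY (site_id, device_id, recorded_at)
);

-- Wide table keeping the rarely-touched columns, repeating the same three
-- key fields so the two halves can be joined.
CREATE TABLE foo_detail (
    site_id     integer   NOT NULL,
    device_id   integer   NOT NULL,
    recorded_at timestamp NOT NULL,
    raw_field_1 text,                 -- stand-ins for the remaining columns
    raw_field_2 text,
    PRIMARY KEY (site_id, device_id, recorded_at),
    FOREIGN KEY (site_id, device_id, recorded_at)
        REFERENCES foo_core (site_id, device_id, recorded_at)
);

-- A view can present the original wide shape when it is wanted; USING
-- collapses the shared key columns so SELECT * has no duplicate names.
CREATE VIEW foo_all AS
SELECT *
FROM foo_core
JOIN foo_detail USING (site_id, device_id, recorded_at);
```

If the three-column key ever proves too slow, a serial or bigserial column on
foo_core could replace it as the join key, as suggested above. On the
int4-versus-int8 question: the full int4 range is about 4.3 billion values,
which at a million rows a day works out to roughly 4,300 days; a default,
positive-only sequence covers about half of that.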
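And a sketch of the trigger-maintained foo_query idea, using current PL/pgSQL
syntax rather than what was available at the time of this thread. Again every
name is invented, and status_code stands in for whichever fourth search
column applies.

```sql
-- Stand-in for the existing wide table (only the columns the trigger touches
-- are shown; the real one has ~500).
CREATE TABLE foo_big (
    site_id     integer   NOT NULL,
    device_id   integer   NOT NULL,
    recorded_at timestamp NOT NULL,
    status_code integer
    -- ... hundreds more columns in reality ...
);

-- Narrow table holding only the columns used in WHERE clauses.
CREATE TABLE foo_query (
    site_id     integer   NOT NULL,
    device_id   integer   NOT NULL,
    recorded_at timestamp NOT NULL,
    status_code integer,
    PRIMARY KEY (site_id, device_id, recorded_at)
);

-- Copy the search columns across whenever a row lands in the wide table.
CREATE FUNCTION foo_query_sync() RETURNS trigger AS $$
BEGIN
    INSERT INTO foo_query (site_id, device_id, recorded_at, status_code)
    VALUES (NEW.site_id, NEW.device_id, NEW.recorded_at, NEW.status_code);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foo_query_sync_trg
    AFTER INSERT ON foo_big
    FOR EACH ROW EXECUTE PROCEDURE foo_query_sync();
```

For UPDATEs and DELETEs the trigger would need corresponding branches, but
the insert path covers the append-only, million-rows-a-day pattern described
in the thread.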