strategies for optimizing read on rather large tables - Mailing list pgsql-performance
From | hubert lubaczewski |
---|---|
Subject | strategies for optimizing read on rather large tables |
Date | |
Msg-id | 20050604081742.GA29123@eo.pl Whole thread Raw |
Responses |
Re: strategies for optimizing read on rather large tables
Re: strategies for optimizing read on rather large tables |
List | pgsql-performance |
hi first let me draw the outline. we have a database which stores "adverts". each advert is in one category, and one or more "region". regions and categories form (each) tree structure. assume category tree: a / \ b c / \ d e if any given advert is in category "e". it means it is also in "b" and "a". same goes for regions. as for now we have approx. 400 categories, 1300 regions, and 1000000 adverts. since checking always over the tress of categories and regions we created acr_cache table (advert/category/region) which stores information on all adverts and all categories and regions this particular region is in. plus some more information for sorting purposes. this table is ~ 11 milion records. now. we query this in more or less this manner: select advert_id from acr_cache where category_id = ? and region_id = ? order by XXX {asc|desc} limit 20; where XXX is one of 5 possible fields, timestamp, timestamp, text, text, numeric we created index on acr_cache (category_id, region_id) and it works rather well. usually. if a given "crossing" (category + region) has small amount of ads (less then 10000) - the query is good enough (up to 300 miliseconds). but when we enter the crossings which result in 50000 ads - the query takes up to 10 seconds. which is almost "forever". we thought about creating indices like this: index on acr_cache (effective_date); where effective_dateis on of the timestamp fields. it worked well for the crossings with lots of ads, but when we asked for small crossing (like 1000 ads) it took > 120 seconds! it appears that postgresql was favorizing this new advert instead of using much better index on category_id and region_id. actually - i'm not sure what to do next. i am even thinkinh about createing special indices (partial) for big crossings, but that's just weird. plus the fact that already the acr_cache vacuum time exceeds 3 hours!. any suggestions? hardware is dual xeon 3 ghz, 4G ram, hardware scsi raid put into raid 1. settings in postgresql.conf: listen_addresses = '*' port = 5800 max_connections = 300 superuser_reserved_connections = 50 shared_buffers = 131072 work_mem = 4096 maintenance_work_mem = 65536 fsync = false commit_delay = 100 commit_siblings = 5 checkpoint_segments = 10 effective_cache_size = 10000 random_page_cost = 1.1 log_destination = 'stderr' redirect_stderr = true log_directory = '/home/pgdba/logs' log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' log_truncate_on_rotation = false log_rotation_age = 1440 log_rotation_size = 502400 log_min_duration_statement = -1 log_connections = true log_duration = true log_line_prefix = '[%t] [%p] <%u@%d> ' log_statement = 'all' stats_start_collector = true stats_command_string = true stats_block_level = true stats_row_level = true stats_reset_on_server_start = true lc_messages = 'en_US.UTF-8' lc_monetary = 'en_US.UTF-8' lc_numeric = 'en_US.UTF-8' lc_time = 'en_US.UTF-8' actual max numer of connection is 120 plus some administrative connections (psql sessions). postgresql version 8.0.2 on linux debian sarge. best regards, depesz -- hubert lubaczewski Network Operations Center eo Networks Sp. z o.o.
Attachment
pgsql-performance by date: