Re: UPDATEDs slowing SELECTs in a fully cached database - Mailing list pgsql-performance
From: lars
Subject: Re: UPDATEDs slowing SELECTs in a fully cached database
Date:
Msg-id: 4E1CB56E.7080801@yahoo.com
In response to: Re: UPDATEDs slowing SELECTs in a fully cached database (lars <lhofhansl@yahoo.com>)
Responses:
  Re: UPDATEDs slowing SELECTs in a fully cached database
  Re: UPDATEDs slowing SELECTs in a fully cached database
List: pgsql-performance
On 07/12/2011 01:04 PM, lars wrote:
> On 07/12/2011 12:08 PM, Kevin Grittner wrote:
>> lars<lhofhansl@yahoo.com> wrote:
>>
>>> select count(*) from test where tenant = $1 and created_date = $2
>>
>> Ah, that might be a major clue -- prepared statements.
>>
>> What sort of a plan do you get for that as a prepared statement?
>> (Note, it is very likely *not* to be the same plan as you get if you
>> run with literal values!)  It is not at all unlikely that it could
>> resort to a table scan if you have one tenant which is five or ten
>> percent of the table, which would likely trigger the pruning as it
>> passed over the modified pages.
>>
>> -Kevin
>
> So a read of a row *will* trigger dead tuple pruning, and that
> requires WAL logging, and this is known/expected?
> This is actually the only answer I am looking for. :) I have not seen
> this documented anywhere.
>
> I know that Postgres will generate generic plans for prepared
> statements (how could it do otherwise?),
> and I also know that it sometimes chooses a sequential scan.
> This can always be tweaked to touch fewer rows and/or use a different
> plan. That's not my objective, though!
>
> The fact that a select (maybe a big analytical query we'll run)
> touching many rows will update the WAL and wait
> (apparently) for that IO to complete makes a fully cached database
> far less useful.
> I just artificially created this scenario.
>
> ... Just dropped the table to test something, so I can't get the plan
> right now. Will send an update as soon as I get
> it set up again.
>
> Thanks again.
>
> -- Lars

Ok... Slightly changed the indexes:

\d test
          Table "lars.test"
    Column    |     Type      | Modifiers
--------------+---------------+-----------
 tenant       | character(15) |
 created_by   | character(15) |
 created_date | date          |
Indexes:
    "i1" btree (tenant)

So just a simple index on tenant.
prepare x as select count(*) from test where tenant = $1 and created_date = $2;
PREPARE

explain execute x('000000000000001','2011-6-30');
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Aggregate  (cost=263301.40..263301.41 rows=1 width=0)
   ->  Bitmap Heap Scan on test  (cost=3895.99..263299.28 rows=847 width=0)
         Recheck Cond: (tenant = $1)
         Filter: (created_date = $2)
         ->  Bitmap Index Scan on i1  (cost=0.00..3895.77 rows=169372 width=0)
               Index Cond: (tenant = $1)
(6 rows)

-- this is when the WAL rows are written:
explain (analyze on, buffers on) execute x('000000000000001','2011-6-30');
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=263301.40..263301.41 rows=1 width=0) (actual time=191.150..191.151 rows=1 loops=1)
   Buffers: shared hit=3716
   ->  Bitmap Heap Scan on test  (cost=3895.99..263299.28 rows=847 width=0) (actual time=1.966..188.221 rows=3712 loops=1)
         Recheck Cond: (tenant = $1)
         Filter: (created_date = $2)
         Buffers: shared hit=3716
         ->  Bitmap Index Scan on i1  (cost=0.00..3895.77 rows=169372 width=0) (actual time=1.265..1.265 rows=3712 loops=1)
               Index Cond: (tenant = $1)
               Buffers: shared hit=20
 Total runtime: 191.243 ms
(10 rows)

-- this is when no WAL is written:
explain (analyze on, buffers on) execute x('000000000000001','2011-6-30');
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=263301.40..263301.41 rows=1 width=0) (actual time=11.529..11.530 rows=1 loops=1)
   Buffers: shared hit=3715
   ->  Bitmap Heap Scan on test  (cost=3895.99..263299.28 rows=847 width=0) (actual time=1.341..9.187 rows=3712 loops=1)
         Recheck Cond: (tenant = $1)
         Filter: (created_date = $2)
         Buffers: shared hit=3715
         ->  Bitmap Index Scan on i1  (cost=0.00..3895.77 rows=169372 width=0) (actual time=0.756..0.756 rows=3712 loops=1)
               Index Cond: (tenant = $1)
               Buffers: shared hit=19
 Total runtime: 11.580 ms
(10 rows)

If you wanted to recreate this scenario, I created a simple script to create the table:

create table test(tenant char(15), created_by char(15), created_date date);
insert into test values('x', 'y', '2011-6-30');
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
-- 256k rows
update test set
    tenant = lpad((random()*10000)::int::text,15,'0'),
    created_by = lpad((random()*10000)::int::text,15,'0');
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
-- 32m rows
create index i1 on test(tenant);
vacuum analyze;

I use JDBC to perform the updates and selects, but this will do too:

prepare x as select count(*) from test where tenant = $1 and created_date = $2;
prepare y as update test set created_by = $1 where tenant = $2 and created_date = $3;

execute y('000000000000001', '000000000000001','2011-6-30');
execute x('000000000000001','2011-6-30');

I'll probably compile Postgres with WAL debug on next, and try the pageinspect module.
On the face of it, though, this looks like Postgres would not be that useful as a database that resides (mostly) in the cache.
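Instead of compiling with WAL debugging, a lighter-weight way to confirm that the read-only EXECUTE generates WAL is to compare the WAL insert location before and after the query. A minimal sketch, assuming 9.1 where the function is named pg_current_xlog_location() (later releases renamed it pg_current_wal_lsn()); note that concurrent background activity can also advance the location, so this is only indicative on an otherwise idle server:

```sql
-- Sketch: observe WAL written by a "read-only" query.
-- Run the UPDATE first so the SELECT has dead tuples to prune.
execute y('000000000000001', '000000000000001', '2011-6-30');

select pg_current_xlog_location();   -- note the location
execute x('000000000000001', '2011-6-30');
select pg_current_xlog_location();   -- advances if the SELECT wrote WAL
```

On a second run of x the location should be (nearly) unchanged, mirroring the 191 ms vs. 11 ms EXPLAIN output above.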
Here is what the query mentioned here returns (note that shared_buffers is very large, but I observed the same with smaller settings):
http://wiki.postgresql.org/wiki/Server_Configuration

             name             |                                                 current_setting
------------------------------+------------------------------------------------------------------------------------------------------------------
 version                      | PostgreSQL 9.1beta2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.4 20100726 (Red Hat 4.4.4-13), 64-bit
 autovacuum                   | off
 bgwriter_delay               | 10ms
 bgwriter_lru_maxpages        | 1000
 checkpoint_completion_target | 0.9
 checkpoint_segments          | 128
 client_encoding              | UTF8
 effective_cache_size         | 64GB
 lc_collate                   | en_US.UTF-8
 lc_ctype                     | en_US.UTF-8
 log_checkpoints              | on
 log_line_prefix              | %m
 maintenance_work_mem         | 2GB
 max_connections              | 100
 max_stack_depth              | 2MB
 server_encoding              | UTF8
 shared_buffers               | 20GB
 TimeZone                     | UTC
 wal_buffers                  | 16MB
 work_mem                     | 1GB
(20 rows)

Thanks.
-- Lars
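For anyone reproducing this, the wiki query is roughly the following (a hedged reconstruction of the non-default-settings query on that page; the exact text on the wiki may differ):

```sql
-- List the server version plus every setting changed from its default.
select 'version' as name, version() as current_setting
union all
select name, current_setting(name)
  from pg_settings
 where source not in ('default', 'override')
 order by 1;
```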