Re: UPDATEDs slowing SELECTs in a fully cached database - Mailing list pgsql-performance
From: lars
Subject: Re: UPDATEDs slowing SELECTs in a fully cached database
Date:
Msg-id: 4E1CB56E.7080801@yahoo.com
In response to: Re: UPDATEDs slowing SELECTs in a fully cached database (lars <lhofhansl@yahoo.com>)
Responses:
  Re: UPDATEDs slowing SELECTs in a fully cached database
  Re: UPDATEDs slowing SELECTs in a fully cached database
List: pgsql-performance
On 07/12/2011 01:04 PM, lars wrote:
> On 07/12/2011 12:08 PM, Kevin Grittner wrote:
>> lars<lhofhansl@yahoo.com> wrote:
>>
>>> select count(*) from test where tenant = $1 and created_date = $2
>>
>> Ah, that might be a major clue -- prepared statements.
>>
>> What sort of a plan do you get for that as a prepared statement?
>> (Note, it is very likely *not* to be the same plan as you get if you
>> run with literal values!)  It is not at all unlikely that it could
>> resort to a table scan if you have one tenant which is five or ten
>> percent of the table, which would likely trigger the pruning as it
>> passed over the modified pages.
>>
>> -Kevin
>
> So a read of a row *will* trigger dead tuple pruning, and that
> requires WAL logging, and this is known/expected?
> This is actually the only answer I am looking for. :) I have not seen
> this documented anywhere.
>
> I know that Postgres will generate generic plans for prepared
> statements (how could it do otherwise?),
> and I also know that it sometimes chooses a sequential scan.
> This can always be tweaked to touch fewer rows and/or use a different
> plan. That's not my objective, though!
>
> The fact that a select (maybe a big analytical query we'll run)
> touching many rows will update the WAL and wait
> (apparently) for that IO to complete makes a fully cached database
> far less useful.
> I just artificially created this scenario.
>
> ... Just dropped the table to test something, so I can't get the plan
> right now. Will send an update as soon as I get
> it set up again.
>
> Thanks again.
>
> -- Lars

Ok... Slightly changed the indexes:

\d test
          Table "lars.test"
    Column    |     Type      | Modifiers
--------------+---------------+-----------
 tenant       | character(15) |
 created_by   | character(15) |
 created_date | date          |
Indexes:
    "i1" btree (tenant)

So just a simple index on tenant.
prepare x as select count(*) from test where tenant = $1 and created_date = $2;
PREPARE

explain execute x('000000000000001','2011-6-30');
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Aggregate  (cost=263301.40..263301.41 rows=1 width=0)
   ->  Bitmap Heap Scan on test  (cost=3895.99..263299.28 rows=847 width=0)
         Recheck Cond: (tenant = $1)
         Filter: (created_date = $2)
         ->  Bitmap Index Scan on i1  (cost=0.00..3895.77 rows=169372 width=0)
               Index Cond: (tenant = $1)
(6 rows)

-- this is when the WAL rows are written:
explain (analyze on, buffers on) execute x('000000000000001','2011-6-30');
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=263301.40..263301.41 rows=1 width=0) (actual time=191.150..191.151 rows=1 loops=1)
   Buffers: shared hit=3716
   ->  Bitmap Heap Scan on test  (cost=3895.99..263299.28 rows=847 width=0) (actual time=1.966..188.221 rows=3712 loops=1)
         Recheck Cond: (tenant = $1)
         Filter: (created_date = $2)
         Buffers: shared hit=3716
         ->  Bitmap Index Scan on i1  (cost=0.00..3895.77 rows=169372 width=0) (actual time=1.265..1.265 rows=3712 loops=1)
               Index Cond: (tenant = $1)
               Buffers: shared hit=20
 Total runtime: 191.243 ms
(10 rows)

-- this is when no WAL is written:
explain (analyze on, buffers on) execute x('000000000000001','2011-6-30');
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=263301.40..263301.41 rows=1 width=0) (actual time=11.529..11.530 rows=1 loops=1)
   Buffers: shared hit=3715
   ->  Bitmap Heap Scan on test  (cost=3895.99..263299.28 rows=847 width=0) (actual time=1.341..9.187 rows=3712 loops=1)
         Recheck Cond: (tenant = $1)
         Filter: (created_date = $2)
         Buffers: shared hit=3715
         ->  Bitmap Index Scan on i1  (cost=0.00..3895.77 rows=169372 width=0) (actual time=0.756..0.756 rows=3712 loops=1)
               Index Cond: (tenant = $1)
               Buffers: shared hit=19
 Total runtime: 11.580 ms
(10 rows)

If you wanted to recreate this scenario, I created a simple script to create the table:

create table test(tenant char(15), created_by char(15), created_date date);
insert into test values('x', 'y', '2011-6-30');
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
-- 256k rows
update test set
    tenant = lpad((random()*10000)::int::text,15,'0'),
    created_by = lpad((random()*10000)::int::text,15,'0');
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
-- 32m rows
create index i1 on test(tenant);
vacuum analyze;

I use JDBC to perform the updates and selects, but this will do too:

prepare x as select count(*) from test where tenant = $1 and created_date = $2;
prepare y as update test set created_by = $1 where tenant = $2 and created_date = $3;

execute y('000000000000001', '000000000000001','2011-6-30');
execute x('000000000000001','2011-6-30');

I'll probably compile Postgres with WAL debug on next, and try the pageinspect module.
On the face of it, though, this looks like Postgres would not be that useful as a database that resides (mostly) in the cache.
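Instead of compiling with WAL debugging, a lighter-weight way to confirm that the read-only EXECUTE generates WAL is to compare the WAL insert location before and after the query. A minimal sketch, assuming 9.1 where the function is named pg_current_xlog_location() (later releases renamed it pg_current_wal_lsn()); note that concurrent background activity can also advance the location, so this is only indicative on an otherwise idle server:

```sql
-- Sketch: observe WAL written by a "read-only" query.
-- Run the UPDATE first so the SELECT has dead tuples to prune.
execute y('000000000000001', '000000000000001', '2011-6-30');

select pg_current_xlog_location();   -- note the location
execute x('000000000000001', '2011-6-30');
select pg_current_xlog_location();   -- advances if the SELECT wrote WAL
```

On a second run of x the location should be (nearly) unchanged, mirroring the 191 ms vs. 11 ms EXPLAIN output above.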
Here is what the query mentioned here returns (note that shared_buffers is very large, but I observed the same with smaller settings):
http://wiki.postgresql.org/wiki/Server_Configuration

             name             |                                                 current_setting
------------------------------+------------------------------------------------------------------------------------------------------------------
 version                      | PostgreSQL 9.1beta2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.4 20100726 (Red Hat 4.4.4-13), 64-bit
 autovacuum                   | off
 bgwriter_delay               | 10ms
 bgwriter_lru_maxpages        | 1000
 checkpoint_completion_target | 0.9
 checkpoint_segments          | 128
 client_encoding              | UTF8
 effective_cache_size         | 64GB
 lc_collate                   | en_US.UTF-8
 lc_ctype                     | en_US.UTF-8
 log_checkpoints              | on
 log_line_prefix              | %m
 maintenance_work_mem         | 2GB
 max_connections              | 100
 max_stack_depth              | 2MB
 server_encoding              | UTF8
 shared_buffers               | 20GB
 TimeZone                     | UTC
 wal_buffers                  | 16MB
 work_mem                     | 1GB
(20 rows)

Thanks.
-- Lars
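For anyone reproducing this, the wiki query is roughly the following (a hedged reconstruction of the non-default-settings query on that page; the exact text on the wiki may differ):

```sql
-- List the server version plus every setting changed from its default.
select 'version' as name, version() as current_setting
union all
select name, current_setting(name)
  from pg_settings
 where source not in ('default', 'override')
 order by 1;
```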