Thread: Slow query, possibly not using index

Slow query, possibly not using index

From
Les
Date:
I have this table:

CREATE TABLE media.block (
    id uuid NOT NULL,
    "size" int8 NOT NULL,
    nrefs int8 NOT NULL DEFAULT 0,
    block bytea NOT NULL,
    hs256 bytea NOT NULL,
    CONSTRAINT block_pkey PRIMARY KEY (id),
    CONSTRAINT chk_nrefs CHECK ((nrefs >= 0))
)
WITH (
    toast_tuple_target=8160
)
TABLESPACE data_slow;

alter table media.block alter column block set storage main;
alter table media.block alter column hs256 set storage main;

CREATE INDEX idx_block_unused ON media.block USING btree (id) WHERE (nrefs = 0);
CREATE UNIQUE INDEX uidx_block_hs256 ON media.block USING btree (hs256);


The number of rows in this table is about 40M, and most of the rows occupy a full 8K block (in most cases, the "block" field contains 7500 bytes).

The idx_block_unused index should be used to find blocks that are unused, so they can be deleted at some point.
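
The cleanup job itself has not been written yet; I imagine it would be something along these lines (just a sketch, the batch size is arbitrary):

-- hypothetical cleanup batch; idx_block_unused should let it find candidates
-- without scanning the whole table
DELETE FROM media.block
WHERE id IN (
    SELECT id
    FROM media.block
    WHERE nrefs = 0
    LIMIT 1000
);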

The idx_block_unused index is less than 400MB:


SELECT i.relname "Table Name", indexrelname "Index Name",
    pg_size_pretty(pg_total_relation_size(relid)) As "Total Size",
    pg_size_pretty(pg_indexes_size(relid)) as "Total Size of all Indexes",
    pg_size_pretty(pg_relation_size(relid)) as "Table Size",
    pg_size_pretty(pg_relation_size(indexrelid)) "Index Size",
    reltuples::bigint "Estimated table row count"
FROM pg_stat_all_indexes i JOIN pg_class c ON i.relid=c.oid
where i.relid ='media.block'::regclass;


Table Name|Index Name      |Total Size|Total Size of all Indexes|Table Size|Index Size|Estimated table row count|
----------+----------------+----------+-------------------------+----------+----------+-------------------------+
block     |block_pkey      |352 GB    |5584 MB                  |347 GB    |1986 MB   |                 38958848|
block     |uidx_block_hs256|352 GB    |5584 MB                  |347 GB    |3226 MB   |                 38958848|
block     |idx_block_unused|352 GB    |5584 MB                  |347 GB    |372 MB    |                 38958848|


If I try to select a single unused block this way:

explain analyze select id from media.block b where nrefs =0 limit 1


then it runs for more than 10 minutes (I'm not sure how long, I cancelled the query after 10 minutes).

If I run this without analyze:

explain select id from media.block b where nrefs =0 limit 1


QUERY PLAN                                                                                     |
-----------------------------------------------------------------------------------------------+
Limit  (cost=0.38..0.76 rows=1 width=16)                                                       |
  ->  Index Only Scan using idx_block_unused on block b  (cost=0.38..869.83 rows=2274 width=16)|

I believe it is not actually using the index, because reading a single (random?) entry from an index should not run for >10 minutes.

What am I doing wrong?

Thank you,

     Laszlo


Re: Slow query, possibly not using index

From
Tom Lane
Date:
Les <nagylzs@gmail.com> writes:
> If I try to select a single unused block this way:
> explain analyze select id from media.block b where nrefs =0 limit 1
> then it runs for more than 10 minutes (I'm not sure how long, I cancelled
> the query after 10 minutes).

Are you sure it isn't blocked on a lock?

Another theory is that the index contains many thousands of references
to now-dead rows, and the query is vainly searching for a live entry.
Given that EXPLAIN thinks there are only about 2300 live entries,
and yet you say the index is 400MB, this seems pretty plausible.
Have you disabled autovacuum, or something like that?  (REINDEX
could help here, at least till the index gets bloated again.)

You might think that even so, it shouldn't take that long ... but
indexes on UUID columns are a well known performance antipattern.
The index entry order is likely to have precisely zip to do with
the table's physical order, resulting in exceedingly-random access
to the table, which'll be horribly expensive when the table is so
much bigger than RAM.  Can you replace the UUID column with a simple
serial (identity) column?
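
A minimal sketch of that kind of change (assuming a bigint surrogate key added
alongside the existing uuid, which on a table this size would mean rewriting it):

-- sketch only; foreign keys and the application would have to follow
ALTER TABLE media.block ADD COLUMN id_seq bigint GENERATED ALWAYS AS IDENTITY;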

> I believe it is not actually using the index, because reading a single
> (random?) entry from an index should not run for >10 minutes.

You should believe what EXPLAIN tells you about the plan shape.
(Its rowcount estimates are only estimates, though.)

            regards, tom lane



Re: Slow query, possibly not using index

From
Les
Date:



> If I try to select a single unused block this way:
> explain analyze select id from media.block b where nrefs =0 limit 1
> then it runs for more than 10 minutes (I'm not sure how long, I cancelled
> the query after 10 minutes).

> Are you sure it isn't blocked on a lock?

Yes, I'm sure. I have created a single database instance from a zfs snapshot and tried the query on that database. It was the only client.


> Another theory is that the index contains many thousands of references
> to now-dead rows, and the query is vainly searching for a live entry.
> Given that EXPLAIN thinks there are only about 2300 live entries,
> and yet you say the index is 400MB, this seems pretty plausible.

Nobody ever deleted anything from this table. Since it was created, this has been a write-only table.
 
> Have you disabled autovacuum, or something like that?  (REINDEX
> could help here, at least till the index gets bloated again.)

I did not disable autovacuum.

> You might think that even so, it shouldn't take that long ... but
> indexes on UUID columns are a well known performance antipattern.
> The index entry order is likely to have precisely zip to do with
> the table's physical order, resulting in exceedingly-random access
> to the table, which'll be horribly expensive when the table is so
> much bigger than RAM.  Can you replace the UUID column with a simple
> serial (identity) column?

I'm aware of the problems with random UUID values. I was using this function to create ulids from the beginning:

CREATE OR REPLACE FUNCTION public.gen_ulid()
RETURNS uuid
LANGUAGE sql
AS $function$
SELECT (lpad(to_hex(floor(extract(epoch FROM clock_timestamp()) * 1000)::bigint), 12, '0') || encode(gen_random_bytes(10), 'hex'))::uuid;
$function$;


If I order some rows by id values, I can see that their creation times are strictly ascending. I did not write this function, it was taken from this website:


They have a benchmark section where they show that these ULID values are slower to generate (at least with this implementation) but much faster to insert. 

I might be able to replace these with int8 values, I need to check. 
 

> I believe it is not actually using the index, because reading a single
> (random?) entry from an index should not run for >10 minutes.

> You should believe what EXPLAIN tells you about the plan shape.
> (Its rowcount estimates are only estimates, though.)

All of the 40M rows in this table are live. I'm 100% sure about this, because nobody ever deleted rows from this table.

I can try to do VACUUM on this table, but I'm limited on resources. I think it will take days to do this. Maybe I can try to dump the whole database and restore it on another machine. Would that eliminate dead rows? (Is there a way to check the number of dead rows?)

Regards,

    Laszlo
 

Re: Slow query, possibly not using index

From
Wael Khobalatte
Date:
> Nobody ever deleted anything from this table. Since it was created, this has been a write-only table.

Does write-only include updates? That would create the dead rows Tom is referring to.

> I believe it is not actually using the index, because reading a single (random?) entry from an index should not run for >10 minutes.

It is using the index. You can disable it and see how long your query takes ("set enable_indexonlyscan = off", optionally enable_indexscan too); chances are it will take even longer. If somehow it does not, by luck, then you know that no vacuum has run for a while, for some reason (perhaps canceled due to frequent table activity?). That too you can check. 
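
A sketch of that comparison (session-local settings only, so nothing persists):

-- force the planner away from the partial index for one run
BEGIN;
SET LOCAL enable_indexonlyscan = off;
SET LOCAL enable_indexscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM media.block b WHERE nrefs = 0 LIMIT 1;
ROLLBACK;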

Re: Slow query, possibly not using index

From
Tom Lane
Date:
Les <nagylzs@gmail.com> writes:
>>> If I try to select a single unused block this way:
>>> explain analyze select id from media.block b where nrefs =0 limit 1
>>> then it runs for more than 10 minutes (I'm not sure how long, I cancelled
>>> the query after 10 minutes).

>> You might think that even so, it shouldn't take that long ... but
>> indexes on UUID columns are a well known performance antipattern.

> I'm aware of the problems with random UUID values. I was using this
> function to create ulids from the beginning:

Oh, well that would have been useful information to provide at the
outset.  Now that we know the index order is correlated with creation
time, I wonder if it is also correlated with nrefs, in such a way that
scanning in index order is disastrous because all the desired rows are
at the end of the index.

Also, you deny deleting any rows, but that wasn't the point.  Do you
ever update nrefs from zero to nonzero?  That'd also leave dead
entries behind in this index.  If that is a routine event that is
correlated with creation time, it gets easier to believe that your
index could have lots of dead entries at the front.

We'd still have to figure out why autovacuum is failing to clean out
those entries in a timely fashion, but that seems like a plausible
way for the performance problem to exist.

> I can try to do VACUUM on this table, but I'm limited on resources. I think
> it will take days to do this. Maybe I can try to dump the whole database
> and restore it on another machine.

Pretty hard to believe that dump-and-restore would be faster than
VACUUM.

> (Is there a way to check the number of dead rows?)

I think contrib/pgstattuple might help.
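
Roughly (the extension has to be created in the database first, and the
functions need superuser or suitable privileges):

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('media.block');             -- dead_tuple_count, dead_tuple_percent
SELECT * FROM pgstatindex('media.idx_block_unused');  -- leaf_pages, deleted_pages, density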

            regards, tom lane



Re: Slow query, possibly not using index

From
Les
Date:


> I'm aware of the problems with random UUID values. I was using this
> function to create ulids from the beginning:

> Oh, well that would have been useful information to provide at the
> outset.

I'm sorry, I left this out.

> Now that we know the index order is correlated with creation
> time, I wonder if it is also correlated with nrefs, in such a way that
> scanning in index order is disastrous because all the desired rows are
> at the end of the index.

Possibly, I have no idea.

> Also, you deny deleting any rows, but that wasn't the point.  Do you
> ever update nrefs from zero to nonzero?  That'd also leave dead
> entries behind in this index.  If that is a routine event that is
> correlated with creation time, it gets easier to believe that your
> index could have lots of dead entries at the front.

I have checked the trigger that is maintaining the nrefs field. Blocks are referenced from a "file_block" table. Every time a block is created, it first has an initial value of nrefs=0, then a file_block row (reference) is inserted, and nrefs is incremented to one. It means that every block has shown up once in the index, and then disappeared. If the index was never vacuumed, then it is very plausible that it is full of dead rows.

CREATE OR REPLACE FUNCTION media.trg_aiud_file_block()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
begin
    if TG_OP='INSERT' then
        update media.block set nrefs = nrefs + 1 where id = new.block_id;
        return new;
    end if;
    if TG_OP='UPDATE' then
        if old.block_id is distinct from new.block_id then
            update media.block set nrefs = nrefs + 1 where id = new.block_id;
            update media.block set nrefs = nrefs - 1 where id = old.block_id;
        end if;
        return new;
    end if;
    if TG_OP='DELETE' then
        update media.block set nrefs = nrefs - 1 where id = old.block_id;
        return old;
    end if;
end;
$function$;


The idea was to create an index that can help in quickly removing unused blocks, to free up disk space. It would be much better to keep the initially inserted (not yet referenced) blocks out of the index, but I don't know how to do this.
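
To illustrate the lifecycle with made-up values (this is only a sketch of the
mechanism, not real data): every block matches the nrefs = 0 predicate when it
is inserted, and stops matching it when the first reference arrives, so each
row leaves one dead entry behind in idx_block_unused until vacuum removes it.

-- made-up row, only to show the partial index churn
INSERT INTO media.block (id, "size", nrefs, block, hs256)
VALUES (public.gen_ulid(), 4, 0, '\xdeadbeef', sha256('\xdeadbeef'));
-- at this point the row has an entry in idx_block_unused (nrefs = 0)

-- when the first media.file_block reference is inserted, the trigger above does:
--     update media.block set nrefs = nrefs + 1 where id = ...;
-- the updated row no longer satisfies the index predicate, so its old entry
-- in idx_block_unused becomes dead and stays there until (auto)vacuum runs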


> We'd still have to figure out why autovacuum is failing to clean out
> those entries in a timely fashion, but that seems like a plausible
> way for the performance problem to exist.

Yes, that would be very good to know. I could drop and recreate the index now, but after some time I would be facing the same situation again.

I double checked, and the "autovacuum launcher" process is running.

Here are the current settings:

=# select name, setting, unit, min_val, max_val, boot_val, reset_val, pending_restart from pg_settings where name like '%vacuum%';
                 name                  |  setting   | unit | min_val |  max_val   |  boot_val  | reset_val  | pending_restart
---------------------------------------+------------+------+---------+------------+------------+------------+-----------------
 autovacuum                            | on         |      |         |            | on         | on         | f
 autovacuum_analyze_scale_factor       | 0.1        |      | 0       | 100        | 0.1        | 0.1        | f
 autovacuum_analyze_threshold          | 50         |      | 0       | 2147483647 | 50         | 50         | f
 autovacuum_freeze_max_age             | 200000000  |      | 100000  | 2000000000 | 200000000  | 200000000  | f
 autovacuum_max_workers                | 3          |      | 1       | 262143     | 3          | 3          | f
 autovacuum_multixact_freeze_max_age   | 400000000  |      | 10000   | 2000000000 | 400000000  | 400000000  | f
 autovacuum_naptime                    | 60         | s    | 1       | 2147483    | 60         | 60         | f
 autovacuum_vacuum_cost_delay          | 2          | ms   | -1      | 100        | 2          | 2          | f
 autovacuum_vacuum_cost_limit          | -1         |      | -1      | 10000      | -1         | -1         | f
 autovacuum_vacuum_insert_scale_factor | 0.2        |      | 0       | 100        | 0.2        | 0.2        | f
 autovacuum_vacuum_insert_threshold    | 1000       |      | -1      | 2147483647 | 1000       | 1000       | f
 autovacuum_vacuum_scale_factor        | 0.2        |      | 0       | 100        | 0.2        | 0.2        | f
 autovacuum_vacuum_threshold           | 50         |      | 0       | 2147483647 | 50         | 50         | f
 autovacuum_work_mem                   | -1         | kB   | -1      | 2147483647 | -1         | -1         | f
 log_autovacuum_min_duration           | 600000     | ms   | -1      | 2147483647 | 600000     | 600000     | f
 vacuum_cost_delay                     | 0          | ms   | 0       | 100        | 0          | 0          | f
 vacuum_cost_limit                     | 200        |      | 1       | 10000      | 200        | 200        | f
 vacuum_cost_page_dirty                | 20         |      | 0       | 10000      | 20         | 20         | f
 vacuum_cost_page_hit                  | 1          |      | 0       | 10000      | 1          | 1          | f
 vacuum_cost_page_miss                 | 2          |      | 0       | 10000      | 2          | 2          | f
 vacuum_defer_cleanup_age              | 0          |      | 0       | 1000000    | 0          | 0          | f
 vacuum_failsafe_age                   | 1600000000 |      | 0       | 2100000000 | 1600000000 | 1600000000 | f
 vacuum_freeze_min_age                 | 50000000   |      | 0       | 1000000000 | 50000000   | 50000000   | f
 vacuum_freeze_table_age               | 150000000  |      | 0       | 2000000000 | 150000000  | 150000000  | f
 vacuum_multixact_failsafe_age         | 1600000000 |      | 0       | 2100000000 | 1600000000 | 1600000000 | f
 vacuum_multixact_freeze_min_age       | 5000000    |      | 0       | 1000000000 | 5000000    | 5000000    | f
 vacuum_multixact_freeze_table_age     | 150000000  |      | 0       | 2000000000 | 150000000  | 150000000  | f
(27 rows)


I think I did not change the defaults.


> I can try to do VACUUM on this table, but I'm limited on resources. I think
> it will take days to do this. Maybe I can try to dump the whole database
> and restore it on another machine.

> Pretty hard to believe that dump-and-restore would be faster than
> VACUUM.

> (Is there a way to check the number of dead rows?)

> I think contrib/pgstattuple might help.

 
All right, I started pgstattuple()  and I'll also do pgstatindex(), but it takes a while. I'll get back with the results.

Thank you for your help!

Regards,

    Laszlo


Re: Slow query, possibly not using index

From
Les
Date:


 
> All right, I started pgstattuple() and I'll also do pgstatindex(), but it takes a while. I'll get back with the results.

=# select * from pgstattuple('media.block');

  table_len   | tuple_count |  tuple_len   | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space  | free_percent
--------------+-------------+--------------+---------------+------------------+----------------+--------------------+-------------+--------------
 372521984000 |    39652836 | 299148572428 |          80.3 |          3578319 |    26977942540 |               7.24 | 44638265312 |        11.98
(1 row)

=# select * from pgstatindex('media.idx_block_unused');
 version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
       4 |          2 |  389677056 |           546 |            114 |      23069 |           0 |         24384 |            90.03 |                  0
(1 row)


As far as I understand these numbers, the media.block table itself is in good shape, but the index is not. Should I vacuum the whole table? Or would it be better to REINDEX INDEX CONCURRENTLY media.idx_block_unused?
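
I assume that would be one of these (REINDEX ... CONCURRENTLY needs PostgreSQL 12 or later):

-- rebuild only the bloated index, without blocking writes
REINDEX INDEX CONCURRENTLY media.idx_block_unused;

-- or vacuum the whole table; this also prunes dead heap tuples and updates the
-- visibility map, but will take much longer on a ~350 GB table
VACUUM (VERBOSE) media.block;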

The more important question is: how can I find out why the index was not autovacuumed?

Thank you,

   Laszlo

Re: Slow query, possibly not using index

From
Les
Date:



=# select * from pgstatindex('media.idx_block_unused');
 version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
       4 |          2 |  389677056 |           546 |            114 |      23069 |           0 |         24384 |            90.03 |                  0
(1 row)


After reindex:

=# select * from pgstatindex('media.idx_block_unused');
 version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
       4 |          0 |       8192 |             0 |              0 |          0 |           0 |             0 |              NaN |                NaN
(1 row)

 explain analyze select id from media.block b where nrefs =0 limit 1

QUERY PLAN                                                                                                                               |
-----------------------------------------------------------------------------------------------------------------------------------------+
Limit  (cost=0.14..0.46 rows=1 width=16) (actual time=0.010..0.011 rows=0 loops=1)                                                       |
  ->  Index Only Scan using idx_block_unused on block b  (cost=0.14..698.91 rows=2231 width=16) (actual time=0.008..0.009 rows=0 loops=1)|
        Heap Fetches: 0                                                                                                                  |
Planning Time: 0.174 ms                                                                                                                  |
Execution Time: 0.030 ms                                                                                                                 |

It is actually empty.

Now I only need to figure out why autovacuum did not work on the index.

Thank you

    Laszlo

Re: Slow query, possibly not using index

From
Pavel Stehule
Date:


On Mon, 28 Aug 2023 at 13:00, Les <nagylzs@gmail.com> wrote:



> Now I only need to figure out why autovacuum did not work on the index.

Autovacuum doesn't reindex.

Regards

Pavel
 


Re: Slow query, possibly not using index

From
David Rowley
Date:
On Mon, 28 Aug 2023 at 19:21, Les <nagylzs@gmail.com> wrote:
> The more important question is: how can I find out why the index was not autovacuumed?

You should have a look at pg_stat_user_tables.  It'll let you know if
the table is being autovacuumed and how often.  If you're concerned
about autovacuum not running properly, then you might want to lower
log_autovacuum_min_duration.  Generally, anything that takes a
conflicting lock will cause autovacuum to cancel so that the
conflicting locker can get through.  Things like ALTER TABLE or even
an ANALYZE running will cancel most autovacuum runs on tables.
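
For example (the value is in milliseconds, 0 logs every run; it can be changed
without a restart):

ALTER SYSTEM SET log_autovacuum_min_duration = 0;
SELECT pg_reload_conf();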

Also, this is a fairly large table and you do have the standard
autovacuum settings.  Going by pgstattuple, the table has 39652836
tuples.  Autovacuum will trigger when the statistics indicate that 20%
of tuples are dead, which is about 8 million tuples. Perhaps that's
enough for the index scan to have to skip over a large enough number
of dead tuples to make it slow. You might want to consider lowering
the autovacuum scale factor for this table.
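
For example, with per-table storage parameters (the numbers here are only
illustrative and would need tuning):

-- trigger autovacuum at roughly 1% dead tuples (~400k rows) instead of 20% (~8M)
ALTER TABLE media.block SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 10000
);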

Also, ensure you're not doing anything like calling pg_stat_reset();

It might be worth showing us the output of:

select * from pg_stat_user_tables where relid = 'media.block'::regclass;

David



Re: Slow query, possibly not using index

From
Les
Date:


> The more important question is: how can I find out why the index was not autovacuumed?

> You should have a look at pg_stat_user_tables.  It'll let you know if
> the table is being autovacuumed and how often.  If you're concerned
> about autovacuum not running properly, then you might want to lower
> log_autovacuum_min_duration.  Generally, anything that takes a
> conflicting lock will cause autovacuum to cancel so that the
> conflicting locker can get through.  Things like ALTER TABLE or even
> an ANALYZE running will cancel most autovacuum runs on tables.
>
> Also, this is a fairly large table and you do have the standard
> autovacuum settings.  Going by pgstattuple, the table has 39652836
> tuples.  Autovacuum will trigger when the statistics indicate that 20%
> of tuples are dead, which is about 8 million tuples. Perhaps that's
> enough for the index scan to have to skip over a large enough number
> of dead tuples to make it slow. You might want to consider lowering
> the autovacuum scale factor for this table.
>
> Also, ensure you're not doing anything like calling pg_stat_reset();
>
> It might be worth showing us the output of:
>
> select * from pg_stat_user_tables where relid = 'media.block'::regclass;

Thank you for your suggestion, this is really very helpful.

select * from pg_stat_user_tables where relid = 'media.block'::regclass;



Name               |Value                        |
-------------------+-----------------------------+
relid              |25872                        |
schemaname         |media                        |
relname            |block                        |
seq_scan           |8                            |
seq_tup_read       |139018370                    |
idx_scan           |45023556                     |
idx_tup_fetch      |37461539                     |
n_tup_ins          |7556051                      |
n_tup_upd          |7577720                      |
n_tup_del          |0                            |
n_tup_hot_upd      |0                            |
n_live_tup         |39782042                     |
n_dead_tup         |5938057                      |
n_mod_since_analyze|1653427                      |
n_ins_since_vacuum |5736676                      |
last_vacuum        |                             |
last_autovacuum    |2023-08-17 22:39:29.383 +0200|
last_analyze       |                             |
last_autoanalyze   |2023-08-22 16:02:56.093 +0200|
vacuum_count       |0                            |
autovacuum_count   |1                            |
analyze_count      |0                            |
autoanalyze_count  |4                            | 

Regards,

    Laszlo