Re: Indirect indexes - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Indirect indexes
Date
Msg-id CAPpHfduZ_E5fPnUVQsB11yHMjE9wd4r=_j_Ct4FNcbya4tZU7g@mail.gmail.com
Whole thread Raw
In response to Re: Indirect indexes  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Responses Re: Indirect indexes
List pgsql-hackers
On Wed, Oct 19, 2016 at 12:21 AM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Tue, Oct 18, 2016 at 9:28 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Vacuuming presents an additional challenge: in order to remove index
items from an indirect index, it's critical to scan the PK index first
and collect the PK values that are being removed.  Then scan the
indirect index and remove any items that match the PK items removed.
This is a bit problematic because of the additional memory needed to
store the array of PK values.  I haven't implemented this yet.

Imagine another situation: PK column was not updated, but indirect indexed column was updated.
Thus, for single heap tuple we would have single PK tuple and two indirect index tuples (correct me if I'm wrong).
How are we going to delete old indirect index tuple?

Let me explain it in more details.

There is a table with two columns and indirect index on it.

CREATE TABLE tbl (id integer primary key, val integer);
CREAET INDIRECT INDEX tbl_val_indirect_idx ON tbl (val);

Then do insert and update.

INSERT INTO tbl VALUES (1, 1);
UPDATE tbl SET val = 2 WHERE id = 1;

Then heap would contain two tuples.

 ctid  | id | val
-------+----+-----
 (0;1) |  1 |   1
 (0;2) |  1 |   2

tbl_pk_idx would contain another two tuples

 id | item_pointer
----+--------------
  1 |        (0;1)
  1 |        (0;2)

And tbl_val_indirect_idx would have also two tuples

 val | id
-----+----
   1 |  1
   2 |  1

Then vacuum removes (0;1) from heap, reference to (0;1) from tbl_pk_idx.  But how will it remove (1,1) tuple from tbl_val_indirect_idx?  Thus, before vacuuming tbl_val_indirect_idx we should know not only values of id which are being removed, but actually (id, val) pairs which are being removed.  Should we collect those paris while scanning heap?  But we should also take into account that multiple heap tuples might have same (id, val) pair values (assuming there could be other columns being updated).  Therefore, we should take into account when last pair of particular (id, val) pair value was deleted from heap.  That would be very huge change to vacuum, may be even writing way more complex vacuum algorithm from scratch.  Probably, you see the better solution of this problem.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

pgsql-hackers by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Remove vacuum_defer_cleanup_age
Next
From: Michael Paquier
Date:
Subject: Re: Mention column name in error messages