Thread: BRIN indexes
With regard to BRIN indexes:
http://www.postgresql.org/docs/9.5/interactive/brin-intro.html
62.1. Introduction
....
"A block range is a group of pages that are physically adjacent in the table; for each block range, some summary info is stored by the index."
From the above, may I presume that it is best to cluster (or sort), the table based on the intended
BRIN column(s) before actually creating the index to insure the pages are adjacent? If so, should
that not be included in the documentation, instead of implied?
--
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

Melvin Davidson wrote: > With regard to BRIN indexes: > > http://www.postgresql.org/docs/9.5/interactive/brin-intro.html > > 62.1. Introduction > .... > "A block range is a group of pages that are physically adjacent in the > table; for each block range, some summary info is stored by the index." > > From the above, may I presume that it is best to cluster (or sort), the > table based on the intended > BRIN column(s) before actually creating the index to insure the pages are > adjacent? If so, should > that not be included in the documentation, instead of implied? The issue is that you cannot normally afford to cluster a table every once in a while; if the natural order in which data is loaded isn't good for BRIN, then perhaps you shouldn't consider BRIN at all. If you're bulk-loading and then create a BRIN index, then it's better to load the data in order of the columns. But perhaps you have reasons to have the table sorted in some other order, in which case trying to satisfy BRIN would be worse. All in all, I think there are enough caveats about this that I'm not sure about putting it up in the doc. I don't have faith in CLUSTER anyway. Taking exclusive locks and all. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 29 January 2016 at 06:10, Melvin Davidson <melvin6925@gmail.com> wrote: > With regard to BRIN indexes: > > http://www.postgresql.org/docs/9.5/interactive/brin-intro.html > > 62.1. Introduction > .... > "A block range is a group of pages that are physically adjacent in the table; for each block range, some summary infois stored by the index." > > From the above, may I presume that it is best to cluster (or sort), the table based on the intended > BRIN column(s) before actually creating the index to insure the pages are adjacent? If so, should > that not be included in the documentation, instead of implied? I personally think the second sentence of the link to the documentation covers this quite well. Namely "BRIN is designed for handling very large tables in which certain columns have some natural correlation with their physical location within the table." Examples of this might be something like an "orders" table, where you have an orderdate column, probably you'll insert into this table as orders are received, so quite possibly the table will be naturally ordered in ascending orderdate order. Although UPDATEs might create new tuples in some free space elsewhere in the relation, but it's not hard to imagine other cases where there's no updates and "natural correlation" is persisted. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 29 January 2016 at 06:10, Melvin Davidson <melvin6925@gmail.com> wrote:
> With regard to BRIN indexes:
>
> http://www.postgresql.org/docs/9.5/interactive/brin-intro.html
>
> 62.1. Introduction
> ....
> "A block range is a group of pages that are physically adjacent in the table; for each block range, some summary info is stored by the index."
>
> From the above, may I presume that it is best to cluster (or sort), the table based on the intended
> BRIN column(s) before actually creating the index to insure the pages are adjacent? If so, should
> that not be included in the documentation, instead of implied?
I personally think the second sentence of the link to the
documentation covers this quite well. Namely "BRIN is designed for
handling very large tables in which certain columns have some natural
correlation with their physical location within the table."
Examples of this might be something like an "orders" table, where you
have an orderdate column, probably you'll insert into this table as
orders are received, so quite possibly the table will be naturally
ordered in ascending orderdate order. Although UPDATEs might create
new tuples in some free space elsewhere in the relation, but it's not
hard to imagine other cases where there's no updates and "natural
correlation" is persisted.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

On 01/28/2016 09:41 AM, Melvin Davidson wrote: > So, IOW, and the answer to my question is yes, it should be insured that > all pages involved are physically adjacent (by design or by pre-sort) > before creating a BRIN on them. > Further to the point, it is self defeating to have more than one BRIN > index on the table if the columns involved would have mutually > non-adjacent pages. > Therefore, it actually would be good to state that in the documentation, > even it were just a comment. BRIN indexes are best used on INSERT only tables with a sequence of numbers as a PK or indexed column that will be queried against. At least as I understand it. JD -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development.
On 01/28/2016 09:41 AM, Melvin Davidson wrote:So, IOW, and the answer to my question is yes, it should be insured that
all pages involved are physically adjacent (by design or by pre-sort)
before creating a BRIN on them.
Further to the point, it is self defeating to have more than one BRIN
index on the table if the columns involved would have mutually
non-adjacent pages.
Therefore, it actually would be good to state that in the documentation,
even it were just a comment.
BRIN indexes are best used on INSERT only tables with a sequence of numbers as a PK or indexed column that will be queried against. At least as I understand it.
JD
--
Command Prompt, Inc. http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Felipe Santos
Sent: Thursday, January 28, 2016 1:17 PM
To: Joshua D. Drake <jd@commandprompt.com>
Cc: Melvin Davidson <melvin6925@gmail.com>; David Rowley <david.rowley@2ndquadrant.com>; pgsql-general@postgresql.org; Thomas Kellerer <spam_eater@gmx.net>
Subject: Re: [GENERAL] BRIN indexes
"Further to the point, it is self defeating to have more than one BRIN index on the table if the columns involved would have mutually non-adjacent pages."
Not really, if both columns are ordered, BRIN will work
"Therefore, it actually would be good to state that in the documentation, even it were just a comment."
It is = "BRIN is designed for handling very large tables in which certain columns have some natural correlation with their physical location within the table"
Also, I did some tests and here are the results I got:
Query with no index = completion time 43s
Same Query with BRIN = completion time 14s / index size 0,5 MB
Same Query without BRIN and with BTREE = completion time 10s / index size 5.000,00 MB
As you can see, BRIN can save 99% of disk space for just a slightly worse performance.
It seems like a huge improvement, given that your data fits BRIN's use case.
Felipe,
What kind of queries you used in your test?
Where they based on clustering columns?
Regards
Igor Neyman
>> From the above, may I presume that it is best to cluster (or sort), the >> table based on the intended >> BRIN column(s) before actually creating the index to insure the pages are >> adjacent? If so, should >> that not be included in the documentation, instead of implied? The same question is asked to me at PGConf.DE. I think it would be nice to address it in the documentation somehow. Maybe, we should also explain how the table is physically organised. It is not clear to users what kind of operations would make BRIN more useful. > I don't have faith in CLUSTER anyway. Taking exclusive locks and all. It also requires a btree index. If you can afford to have btree, you probably don't need BRIN anyway. Something lighter than CLUSTER which can use BRIN would be useful.
Emre Hasegeli wrote: > >> From the above, may I presume that it is best to cluster (or sort), the > >> table based on the intended > >> BRIN column(s) before actually creating the index to insure the pages are > >> adjacent? If so, should > >> that not be included in the documentation, instead of implied? > > The same question is asked to me at PGConf.DE. I think it would be > nice to address it in the documentation somehow. Maybe, we should > also explain how the table is physically organised. It is not clear > to users what kind of operations would make BRIN more useful. Grumble. > > I don't have faith in CLUSTER anyway. Taking exclusive locks and all. > > It also requires a btree index. If you can afford to have btree, you > probably don't need BRIN anyway. Something lighter than CLUSTER which > can use BRIN would be useful. What I think would be useful is a way for the BRIN index to guide location of a new tuple, so that it's put in the right spot right from the start, instead of having it be moved later. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Felipe Santos
Sent: Thursday, January 28, 2016 1:17 PM
To: Joshua D. Drake <jd@commandprompt.com>
Cc: Melvin Davidson <melvin6925@gmail.com>; David Rowley <david.rowley@2ndquadrant.com>; pgsql-general@postgresql.org; Thomas Kellerer <spam_eater@gmx.net>
Subject: Re: [GENERAL] BRIN indexes
"Further to the point, it is self defeating to have more than one BRIN index on the table if the columns involved would have mutually non-adjacent pages."
Not really, if both columns are ordered, BRIN will work
"Therefore, it actually would be good to state that in the documentation, even it were just a comment."
It is = "BRIN is designed for handling very large tables in which certain columns have some natural correlation with their physical location within the table"
Also, I did some tests and here are the results I got:
Query with no index = completion time 43s
Same Query with BRIN = completion time 14s / index size 0,5 MB
Same Query without BRIN and with BTREE = completion time 10s / index size 5.000,00 MB
As you can see, BRIN can save 99% of disk space for just a slightly worse performance.
It seems like a huge improvement, given that your data fits BRIN's use case.
Felipe,
What kind of queries you used in your test?
Where they based on clustering columns?
Regards
Igor Neyman
Felipe Santos wrote: > brin_db=# EXPLAIN ANALYSE SELECT count(*) FROM orders WHERE order_date > BETWEEN '2012-01-04 09:00:00' and '2014-01-04 14:30:00'; > > QUERY PLAN > > ------------------------------------------------------------------------------------------------------------------------------------------------------ > ----------------- > Aggregate (cost=2408269.34..2408269.35 rows=1 width=0) (actual > time=14164.923..14164.923 rows=1 loops=1) > -> Bitmap Heap Scan on orders (cost=326808.28..2328609.76 > rows=31863832 width=0) (actual time=42.742..11430.562 rows=31589101 loops=1) > Recheck Cond: ((order_date >= '2012-01-04 09:00:00-02'::timestamp > with time zone) AND (order_date <= '2014-01-04 14:30:00-02'::timestamp with > time zone)) > Rows Removed by Index Recheck: 21907 > Heap Blocks: lossy=201344 > -> Bitmap Index Scan on idx_order_date_brin > (cost=0.00..318842.32 rows=31863832 width=0) (actual time=36.151..36.151 > rows=2013440 loops=1) > Index Cond: ((order_date >= '2012-01-04 > 09:00:00-02'::timestamp with time zone) AND (order_date <= '2014-01-04 > 14:30:00-02'::timestamp > with time zone)) > Planning time: 0.297 ms > Execution time: 14164.985 ms > (9 rows) The number of blocks read from the heap is a key number to watch for, because when you get concurrency that's what going to matter the most. Here you have 201k buffer reads, versus, uh, I don't know how many pages read (because you didn't use the VERBOSE explain option). I think it's worth comparing how many buffer accesses your query had to read. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services