From: Jeremy Schneider
Subject: Re: Proposal: Global Index
Msg-id: 8cd6b125-2f3f-01bc-f41b-5c8343685a21@amazon.com
In response to: Re: Proposal: Global Index (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On 10/30/19 10:27, Andres Freund wrote:
> On 2019-10-30 13:05:57 -0400, Tom Lane wrote:
>> Peter Geoghegan <pg@bowt.ie> writes:
>>> On Wed, Oct 30, 2019 at 9:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Well, the *effects* of the feature seem desirable, but that doesn't
>>>> mean that we want an implementation that actually has a shared index.
>>>> As soon as you do that, you've thrown away most of the benefits of
>>>> having a partitioned data structure in the first place.
>>
>>> Right, but that's only the case for the global index. Global indexes
>>> are useful when used judiciously.
>>
>> But ... why bother with partitioning then? To me, the main reasons
>> why you might want a partitioned table are
>
> Quite commonly there's a lot of *other* indexes, often on a lot wider
> data than the primary key, that don't need to be global. And whereas in
> a lot of cases the primary key in a partitioned table has pretty good
> locality (and thus will be mostly buffered IO), other indexes will often
> not have that property (i.e. not have much correlation with table
> position).

I asked around a little bit and got some interesting responses. Thought
I'd pass two of them along.

One person worked on a payments network (150,000+ installed readers).
The transaction table was date-partitioned (one partition per day) on
insert timestamp, but lookups and updates were typically by the unique
transaction id. It was an Oracle database; they kept 180 daily
partitions with several million rows per day. Transactions did not
arrive in order: they could be delayed when some part of the network was
slow (they opted to allow the $2 charge rather than reject sales), and
again when the cash transaction records were uploaded.

Step one of their PG conversion created a read replica on PG 9.6, and
the cost of doing the individual index lookups across 180 partitions
(and 180 indexes) was very high. So they stored the max and min txn id
per partition and would generate a query listing all the dates that a
txn id could have been in, so that only a small number of partition
indexes would be accessed. They wanted a global index on txn id for
performance, not for uniqueness: the id was generated on the reader with
guid-like semantics. Roughly, the workaround looked like the sketch
below.
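For concreteness, here's a minimal sketch of that pruning trick. All
table and column names are hypothetical (my reconstruction of the
approach, not their actual schema):

    -- One row per daily partition, maintained by the load process,
    -- recording the range of txn ids that landed in that partition.
    CREATE TABLE txn_id_ranges (
        part_date   date PRIMARY KEY,
        min_txn_id  text NOT NULL,
        max_txn_id  text NOT NULL
    );

    -- Step 1: find the handful of dates whose recorded id range could
    -- contain the target id (ids didn't arrive in order, so several
    -- partitions may match).
    SELECT part_date
    FROM txn_id_ranges
    WHERE 'c0ffee-1234' BETWEEN min_txn_id AND max_txn_id;

    -- Step 2: generate the real lookup with those dates inlined as
    -- literals, so the planner prunes to a few partitions instead of
    -- probing all 180 partition indexes.
    SELECT *
    FROM transactions
    WHERE txn_id = 'c0ffee-1234'
      AND part_date IN ('2019-10-02', '2019-10-03');  -- from step 1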
A second person worked on several large-scale systems. He relayed that
in some cases where they used Oracle global indexes on partitioned
tables, they ended up reversing that decision as things scaled, because
of restrictive locking during partition maintenance (this is the exact
issue Tom points out). So even on a database _with_ the option of using
a global index, they've sometimes opted for "workaround" design patterns
instead:

* To solve uniqueness, manage serialization at the application level.
Isolate operations (e.g. using a queue) and use that to make sure that
two sessions don't try to insert the same record at the same time. From
the RDBMS's point of view, this looks like a separate, smaller table
that is being used to manage work activity.

* To solve the additional IO for a global table scan ... we often don't
need to, because the load in this pattern is not typically highly
concurrent. If we are looking for higher concurrency, we can usually add
a hack/workaround that filters on a partition key to provide "pretty
good" pruning. The net result is that you get 2-3x the IO due to the
lack of a global index (the same workaround as in the first story
above).

Quote: "So ... I don't actually like the idea of introducing this.
Unless someone can solve the ugly challenges we have had [around
partition maintenance operations]."

I actually don't think those challenges are so unsolvable. I think that
global indexes will be irrelevant to most workloads, but I'm not
convinced they won't be useful to a few people with specific workloads
and large amounts of data in PostgreSQL, where the benefits outweigh the
costs. I definitely agree that care needs to be taken around index
maintenance operations if there's an effort here.

>> * ability to cheaply add and remove partitions, primarily so that
>> you can cheaply do things like "delete the oldest month's data".
>
> You can still do that to some degree with a global index. Imagine
> e.g. keeping a 'partition id' as a sort-of column in the global
> index. That allows you to drop the partition, without having to
> immediately rebuild the index, by checking the partition id against the
> live partitions during lookup. So sure, you're wasting space for a bit
> in the global index, but it'll also be space that is likely to be
> fairly efficiently reclaimed the next time vacuum touches the index.
> And if not, the global index can be rebuilt concurrently without
> blocking writes.

Another idea might be to leverage PostgreSQL's partial indexes. If the
index is created "where date>2020" and you're dropping a partition from
2019, then you can entirely ignore the index. Not a panacea for every
index maintenance operation, but for the super-common case of dropping
the oldest partition you can now:

1) create a new index concurrently "where dt>2020"
2) drop the old index
3) drop the 2019 partition

It doesn't solve world hunger, but there's lots of benefit in such a
simple hack. A sketch of the sequence is below.
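Since global indexes don't exist in PostgreSQL today, this is only a
sketch of how the dance might look if a (hypothetical) global index
supported a predicate; the table and index names are made up:

    -- 1) Build the replacement index, excluding the partition that is
    --    about to be dropped. CONCURRENTLY keeps writers unblocked.
    CREATE INDEX CONCURRENTLY txns_id_2020 ON transactions (txn_id)
        WHERE dt > '2020-01-01';

    -- 2) Retire the old index, whose predicate still covered 2019.
    DROP INDEX CONCURRENTLY txns_id_2019;

    -- 3) Drop the oldest partition. The new index never referenced it,
    --    so no global-index cleanup or rebuild is needed.
    DROP TABLE transactions_2019;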
>> * ability to scale past our limits on the physical size of one table
>> --- both the hard BlockNumber-based limit, and the performance
>> constraints of e.g. vacuuming a very large table.
>
> For that to be a problem for a global index the global index (which will
> often be something like two int4 or int8 columns) itself would need to
> be above the block number based limit - which doesn't seem that close.
>
> WRT vacuuming - based on my observations the table itself isn't a
> performance problem for vacuuming all that commonly anymore, it's the
> associated index scans. So yea, that's a problem.

I'm sure zheap will make all our dreams come true, right? :D

-Jeremy

--
Jeremy Schneider
Database Engineer
Amazon Web Services