Home > mailing lists

Re: [HACKERS] Declarative partitioning - another take - Mailing list pgsql-hackers

From	Venkata B Nagothi
Subject	Re: [HACKERS] Declarative partitioning - another take
Date	December 9, 2016 09:16:12
Msg-id	CAEyp7J-iw2-z8zeibP0ipgtdAc-+d_euwj=H0k8Nfz5gnz5H=Q@mail.gmail.com Whole thread
In response to	Re: [HACKERS] Declarative partitioning - another take (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Responses	Re: [HACKERS] Declarative partitioning - another take
List	pgsql-hackers

Tree view

Hi,

I am testing the partitioning feature from the latest master and got the following error while loading the data -

db01=# create table orders_y1993 PARTITION OF orders FOR VALUES FROM ('1993-01-01') TO ('1993-12-31');

CREATE TABLE

db01=# copy orders from '/data/orders-1993.csv' delimiter '|';

ERROR: could not read block 6060 in file "base/16384/16412": read only 0 of 8192 bytes

CONTEXT: COPY orders, line 376589: "9876391|374509|O|54847|1997-07-16|3-MEDIUM |Clerk#000001993|0|ithely regular pack"

Am i doing something wrong ?

Regards,

Venkata B N

Database Consultant

On Fri, Dec 9, 2016 at 3:58 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

On 2016/12/09 0:25, Robert Haas wrote:
> On Wed, Dec 7, 2016 at 11:42 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>>> Congrats to everyone working on this! This is a large step forward.
>>
>> Congratulations to all! It was a long way to this result.
>
> Yes. The last effort in this area which I can remember was by Itagaki
> Takahiro in 2010, so we've been waiting for this for more than 6
> years. It's really good that Amit was able to put in the effort to
> produce a committable patch, and I think he deserves all of our thanks
> for getting that done - and NTT deserves our thanks for paying him to
> do it.
>
> Even though I know he put in a lot more work than I did, let me just
> say: phew, even reviewing that was a ton of work.

Absolutely! Your review comments and design suggestions have been
instrumental in improving (and cutting down on the size of) the patches.

> Of course, this is the beginning, not the end.

+1000!

> I've been thinking
> about next steps -- here's an expanded list:
>
> - more efficient plan-time partition pruning (constraint exclusion is too slow)
> - run-time partition pruning
> - partition-wise join (Ashutosh Bapat is already working on this)
> - try to reduce lock levels
> - hash partitioning
> - the ability to create an index on the parent and have all of the
> children inherit it; this should work something like constraint
> inheritance. you could argue that this doesn't add any real new
> capability but it's a huge usability feature.
> - teaching autovacuum enough about inheritance hierarchies for it to
> update the parent statistics when they get stale despite the lack of
> any actual inserts/updates/deletes to the parent. this has been
> pending for a long time, but it's only going to get more important
> - row movement (aka avoiding the need for an ON UPDATE trigger on each
> partition)
> - insert (and eventually update) tuple routing for foreign partitions
> - not scanning the parent
> - fixing the insert routing so that we can skip tuple conversion where possible
> - fleshing out the documentation

I would definitely want to contribute to some of these items. It's great
that many others plan to contribute toward this as well.

> One thing I'm wondering is whether we can optimize away some of the
> heavyweight locks. For example, if somebody does SELECT * FROM ptab
> WHERE id = 1, they really shouldn't need to lock the entire
> partitioning hierarchy, but right now they do. If the root knows
> based on its own partitioning key that only one child is relevant, it
> would be good to lock *only that child*. For this feature to be
> competitive, it needs to scale to at least a few thousand partitions,
> and locking thousands of objects instead of one or two is bound to be
> slow. Similarly, you can imagine teaching COPY to lock partitions
> only on demand; if no tuples are routed to a particular partition, we
> don't need to lock it. There's a manageability component here, too:
> not locking partitions unnecessarily makes ti easier to get DDL on
> other partitions through. Alternatively, maybe we could rewrite the
> lock manager to be hierarchical, so that you can take a single lock
> that represents an AccessShareLock on all partitions and only need to
> make one entry in the lock table to do it. That means that attempts
> to lock individual partitions need to check not only for a lock on
> that partition but also on anything further up in the hierarchy, but
> that might be a good trade if it gives us O(1) locking on the parent.
> And maybe we could also have a level of the hierarchy that represents
> every-table-in-the-database, for the benefit of pg_dump. Of course,
> rewriting the lock manager is a big project not for the faint of
> heart, but I think if we don't it's going to be a scaling bottleneck.

Hierarchical lock manager stuff is interesting. Are you perhaps alluding
to a new *intention* lock mode as described in the literature on multiple
granularity locking [1]?

> We also need to consider other parts of the system that may not scale,
> like pg_dump. For a long time, we've been sorta-kinda willing to fix
> the worst of the scalability problems with pg_dump, but that's really
> no longer an adequate response. People want 1000 partitions. Heck,
> people want 1,000,000 partitions, but getting to where 1000 partitions
> works well would help PostgreSQL a lot. Our oft-repeated line that
> inheritance isn't designed for large numbers of inheritance children
> is basically just telling people who have the use case where they need
> that to go use some other product. Partitioning, like replication, is
> not an optional feature for a world-class database. And, from a
> technical point of view, I think we've now got an infrastructure that
> really should be able to be scaled up considerably higher than what
> we've been able to do in the past. When we were stuck with
> inheritance + constraint exclusion, we could say "well, there's not
> really any point because you'll hit these other limits anyway". But I
> think now that's not really true. This patch eliminates one of the
> core scalability problems in this area, and provides infrastructure
> for attacking some of the others. I hope that people will step up and
> do that. There's a huge opportunity here for PostgreSQL to become
> relevant in use cases where it currently falters badly, and we should
> try to take advantage of it. This patch is a big step by itself, but
> if we ignore the potential to do more with this as the base we will be
> leaving a lot of "win" on the table.

Agreed on all counts.

Thanks,
Amit

[1] https://en.wikipedia.org/wiki/Multiple_granularity_locking

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

From: Junseok Yang
Date: 09 December 2016, 09:14:43
Subject: [HACKERS] `array_position...()` causes SIGSEGV

From: Michael Paquier
Date: 09 December 2016, 09:23:49
Subject: Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

Re: [HACKERS] Declarative partitioning - another take - Mailing list pgsql-hackers

Previous

Next