Re: brininsert optimization opportunity - Mailing list pgsql-hackers
From: Soumyadeep Chakraborty
Subject: Re: brininsert optimization opportunity
Msg-id: CAE-ML+-9K2h2R7zcvQGq8FarxvSVhgAGZkgH57iQCnWQoxOFww@mail.gmail.com
In response to: Re: brininsert optimization opportunity (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses: Re: brininsert optimization opportunity
List: pgsql-hackers
On Tue, Jul 4, 2023 at 2:54 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

> I'm not sure if memory context callbacks are the right way to rely on
> for this purpose. The primary purpose of memory contexts is to track
> memory, so using them for this seems a bit weird.

Yeah, this just kept getting dirtier and dirtier.

> There are cases that do something similar, like expandedrecord.c where
> we track a refcounted tuple slot, but IMHO there's a big difference
> between tracking one slot allocated right there, and an unknown number
> of buffers allocated much later.

Yeah, the following code in ER_mc_callback is there, I think, to prevent
double-freeing the tupdesc (since it might already have been freed in
ResourceOwnerReleaseInternal()) - the part below the
/* Ditto for tupdesc references */ comment:

if (tupdesc->tdrefcount > 0)
{
    if (--tupdesc->tdrefcount == 0)
        FreeTupleDesc(tupdesc);
}

Plus, the above code doesn't touch any resource owner bookkeeping, whereas
releasing a buffer means:

ReleaseBuffer() -> UnpinBuffer() ->
    ResourceOwnerForgetBuffer(CurrentResourceOwner, b);

> The fact that even with the extra context it still doesn't handle query
> cancellations is another argument against that approach (I wonder how
> expandedrecord.c handles that, but I haven't checked).
>
> > Maybe there is a better way of doing our cleanup? I'm not sure. Would
> > love your input!
> >
> > The other alternative for all this is to introduce new AM callbacks for
> > insert_begin and insert_end. That might be a tougher sell?
>
> That's the approach I wanted to suggest, more or less - to do the
> cleanup from ExecCloseIndices() before index_close(). I wonder if it's
> even correct to do that later, once we release the locks etc.

I'll try this out and introduce a couple of new index AM callbacks. I
think it's best to do the cleanup before releasing the locks - otherwise
it might be weird to manipulate buffers of an index relation without
holding some sort of lock on it. I'll think about it some more; a rough
sketch of what I have in mind is a bit further down.

> I don't think ii_AmCache was intended for stuff like this - GIN and GiST
> only use it to cache stuff that can be just pfree-d, but for buffers
> that's not enough. It's not surprising we need to improve this.

Hmmm, yes. Although the docs state:

"If the index AM wishes to cache data across successive index insertions
within an SQL statement, it can allocate space in indexInfo->ii_Context
and store a pointer to the data in indexInfo->ii_AmCache (which will be
NULL initially)."

they don't mention anything about buffer usage. Well, we will fix it!

PS: It should be possible to make GIN and GiST use the new index AM APIs
as well.

> FWIW while debugging this (breakpoint on MemoryContextDelete) I was
> rather annoyed the COPY keeps dropping and recreating the two BRIN
> contexts - brininsert cxt / brin dtuple. I wonder if we could keep and
> reuse those too, but I don't know how much it'd help.

Interesting, I will investigate that.
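To make the callback idea a bit more concrete, here is a rough sketch of
the shape I have in mind. All names and signatures below (aminsertcleanup,
BrinInsertState, etc.) are illustrative placeholders, not an actual patch:
a new optional callback in IndexAmRoutine lets an AM release whatever it
has stashed in indexInfo->ii_AmCache, and ExecCloseIndices() invokes it
just before index_close().

/*
 * Sketch only - assumes the usual backend headers (postgres.h,
 * access/amapi.h, access/brin_revmap.h, executor/executor.h).
 */

/* amapi.h (sketch): optional per-statement insert cleanup callback */
typedef void (*aminsertcleanup_function) (struct IndexInfo *indexInfo);

/* brin.c (sketch): state cached across brininsert() calls in ii_AmCache */
typedef struct BrinInsertState
{
    BrinRevmap *bis_rmAccess;       /* holds a pin on the revmap buffer */
    BrinDesc   *bis_desc;
    BlockNumber bis_pages_per_range;
} BrinInsertState;

static void
brininsertcleanup(IndexInfo *indexInfo)
{
    BrinInsertState *bistate = (BrinInsertState *) indexInfo->ii_AmCache;

    if (bistate == NULL)
        return;

    /* drop the buffer pin taken when the revmap access was set up */
    brinRevmapTerminate(bistate->bis_rmAccess);
    indexInfo->ii_AmCache = NULL;
}

/* execIndexing.c (sketch): run the cleanup while we still hold the lock */
void
ExecCloseIndices(ResultRelInfo *resultRelInfo)
{
    int         i;

    for (i = 0; i < resultRelInfo->ri_NumIndices; i++)
    {
        Relation    indexRelation = resultRelInfo->ri_IndexRelationDescs[i];
        IndexInfo  *indexInfo = resultRelInfo->ri_IndexRelationInfo[i];

        if (indexRelation == NULL)
            continue;

        /* let the AM release buffers etc. before we give up the lock */
        if (indexRelation->rd_indam->aminsertcleanup &&
            indexInfo->ii_AmCache != NULL)
            indexRelation->rd_indam->aminsertcleanup(indexInfo);

        index_close(indexRelation, RowExclusiveLock);
    }
}

The point is really just the ordering: the cleanup callback runs before
index_close() releases the RowExclusiveLock, so the AM is still entitled
to touch the index's buffers.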
> > Now, to finally answer your question about the speedup without
> > generate_series(). We do see an even higher speedup!
> >
> > seq 1 200000000 > /tmp/data.csv
> >
> > \timing
> > DROP TABLE heap;
> > CREATE TABLE heap(i int);
> > CREATE INDEX ON heap USING brin(i) WITH (pages_per_range=1);
> > COPY heap FROM '/tmp/data.csv';
> >
> > -- 3 runs (master 29cf61ade3f245aa40f427a1d6345287ef77e622)
> > COPY 200000000
> > Time: 205072.444 ms (03:25.072)
> > Time: 215380.369 ms (03:35.380)
> > Time: 203492.347 ms (03:23.492)
> >
> > -- 3 runs (branch v2)
> > COPY 200000000
> > Time: 135052.752 ms (02:15.053)
> > Time: 135093.131 ms (02:15.093)
> > Time: 138737.048 ms (02:18.737)
>
> That's nice, but it still doesn't say how much of that is reading the
> data. If you do just copy into a table without any indexes, how long
> does it take?

So, I loaded the same heap table without any indexes and at the same
scale. I got:

postgres=# COPY heap FROM '/tmp/data.csv';
COPY 200000000
Time: 116161.545 ms (01:56.162)
Time: 114182.745 ms (01:54.183)
Time: 114975.368 ms (01:54.975)

A perf diff between the three - no indexes (baseline), master, and v2 -
is also attached.

Regards,
Soumyadeep (VMware)