Re: GIN data corruption bug(s) in 9.6devel - Mailing list pgsql-hackers
| From | Tomas Vondra |
|---|---|
| Subject | Re: GIN data corruption bug(s) in 9.6devel |
| Date | |
| Msg-id | 568B8F15.9000407@2ndquadrant.com Whole thread Raw |
| In response to | Re: GIN data corruption bug(s) in 9.6devel (Jeff Janes <jeff.janes@gmail.com>) |
| Responses |
Re: GIN data corruption bug(s) in 9.6devel
|
| List | pgsql-hackers |
Hi,
On 12/23/2015 09:33 PM, Jeff Janes wrote:
> On Mon, Dec 21, 2015 at 11:51 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>>
>>
>> On 12/21/2015 07:41 PM, Jeff Janes wrote:
>>>
>>> On Sat, Dec 19, 2015 at 3:19 PM, Tomas Vondra
>>> <tomas.vondra@2ndquadrant.com> wrote:
>>
>>
>> ...
>>
>>>> So both patches seem to do the trick, but (2) is faster. Not sure
>>>> if this is expected. (BTW all the results are without asserts
>>>> enabled).
>>>
>>>
>>> Do you know what the size of the pending list was at the end of each
>>> test?
>>>
>>> I think last one may be faster because it left a large mess behind
>>> that someone needs to clean up later.
>>
>>
>> No. How do I measure it?
>
> pageinspect's gin_metapage_info, or pgstattuple's pgstatginindex
Hmmm, so this turns out not very useful, because at the end the data I
get from gin_metapage_info is almost exactly the same for both patches
(more details below).
>>
>>>
>>> Also, do you have the final size of the indexes in each case?
>>
>>
>> No, I haven't realized the patches do affect that, so I haven't measured it.
>
> There shouldn't be a difference between the two approaches (although I
> guess there could be if one left a larger pending list than the other,
> as pending lists is very space inefficient), but since you included
> 9.5 in your test I thought it would be interesting to see how either
> patched version under 9.6 compared to 9.5.
Well, turns out there's a quite significant difference, actually. The
index sizes I get (quite stable after multiple runs):
9.5 : 2428 MB 9.6 + alone cleanup : 730 MB 9.6 + pending lock : 488 MB
So that's quite a significant difference, I guess. The load duration for
each version look like this:
9.5 : 1415 seconds 9.6 + alone cleanup : 1310 seconds 9.6 + pending lock : 1380 seconds
I'd say I'm happy with sacrificing ~5% of time in exchange for ~35%
reduction of index size.
The size of the index on 9.5 after VACUUM FULL (so pretty much the
smallest index possible) is 440MB, which suggests the "pending lock"
patch does a quite good job.
The gin_metapage_info at the end of one of the runs (pretty much all the
runs look exactly the same) looks like this:
pending lock alone cleanup 9.5
-------------------------------------------------------- pending_head 2 2 310460
pending_tail 338 345 310806 tail_free_size 812 812 812
n_pending_pages 330 339 347 n_pending_tuples 1003 1037 1059
n_total_pages 2 2 2 n_entry_pages 1 1 1
n_data_pages 0 0 0 n_entries 0 0 0 version
2 2 2
So almost no difference, except for the pending_* attributes, and even
in that case the values are only different for 9.5 branch. Not sure what
conclusion to draw from this - maybe it's necessary to collect the
function input while the load is running (but that'd be tricky to
process, I guess).
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: