Home > mailing lists

Re: GiST penalty functions [PoC] - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: GiST penalty functions [PoC]
Date	September 6, 2016 16:57:36
Msg-id	236d7d60-4415-f4c9-34d7-656a8314d1bd@iki.fi Whole thread Raw
In response to	GiST penalty functions [PoC] (Andrew Borodin <borodin@octonica.com>)
Responses	Re: GiST penalty functions [PoC]
List	pgsql-hackers

Tree view

On 08/29/2016 09:04 AM, Andrew Borodin wrote:
> In this message I address only first problem. I want to construct a
> penalty function that will choose:
> 1.    Entry with a zero volume and smallest margin, that can
> accommodate insertion tuple without extension, if one exists;
> 2.    Entry with smallest volume, that can accommodate insertion tuple
> without extension, if one exists;
> 3.    Entry with zero volume increase after insert and smallest margin
> increase, if one exists;
> 4.    Entry with minimal volume increase after insert.

Looking at the code, by "margin", you mean the sum of all "edges", i.e. 
of each dimension, of the cube. I guess the point of that is to 
differentiate between cubes that have the same volume, but a different 
shape.

So, let's try to come up with a scheme that doesn't require IEEE 754 
floats. Those above cases can be split into two categories, depending on 
whether the new value has zero volume or not. We can use a completely 
different scheme for the two cases, if we want, because when we're 
choosing a target page to insert to, penalty gets called for every 
original tuple, but with the same new tuple.

Here's a scheme I just came up with. There might be better ones, but 
it's something.

Let's have:

oX: Original tuple's edge sum
nX: New tuple's edge sum
dX: Edge increase

oV: Original tuple's volume
nX: New tuple's volume
dX: Volume increase

Case A: New entry has zero volume.
------

Two sub-cases:
A1: if dE > 0, use dE. dE must be in the range [0, nE]
A2: otherwise, use oE.

So how do we cram those two into a single float?

If we offset case A1 by n, we can use the range between [0, nE] for A2. 
Something like this pseudocode:

if (dE > 0)    return nE + dE;       /* A1, offset dE into range [nE, inf] */
else    return nE - (nE/oE);  /* A2, scale oE into range [0, nE] */

Case B: New entry has non-zero volume
------
B1: if dV > 0. use dV. dV must be in the range [0, nV].
B2: if dE > 0, use dE. dE must be in the range [0, nE].
B3: oV, otherwise.

By offsetting cases B1 and B2, we can again divide the range into three 
parts, with pseudocode like:

if (dV > 0)    return nV + nE + dV;   /* B1, offset dV into range [nV+nE, inf] */
else if (dE > 0)    return nV + dE;        /* B2, offset dE into range [nV, nV+nE] */
else    return nV - (nV/oV)    /* B3, scale oV into range [0, nV]

> I’ve tested patch performance with attached test. On my machine patch
> slows index construction time from 60 to 76 seconds, reduces size of
> the index from 85Mb to 82Mb, reduces time of aggregates computation
> from 36 seconds to 29 seconds.

Hmm. That's a pretty large increase in construction time. Have you done 
any profiling on where the time goes?

> I do not know: should I continue this work in cube, or it’d be better
> to fork cube?

Should definitely continue work within cube, IMHO. I don't think forking 
it to a new datatype would make this any easier.

- Heikki

pgsql-hackers by date:

From: Anastasia Lubennikova
Date: 06 September 2016, 16:48:39
Subject: Re: WIP: Covering + unique indexes.

From: Claudio Freire
Date: 06 September 2016, 16:58:33
Subject: Re: Vacuum: allow usage of more than 1GB of work mem

Re: GiST penalty functions [PoC] - Mailing list pgsql-hackers

Previous

Next