Re: PATCH: Using BRIN indexes for sorted output - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: PATCH: Using BRIN indexes for sorted output |
Date | |
Msg-id | 79133b5e-5244-4059-1d4a-3748ad4a19bc@enterprisedb.com Whole thread Raw |
In response to | Re: PATCH: Using BRIN indexes for sorted output (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
Responses |
Re: PATCH: Using BRIN indexes for sorted output
|
List | pgsql-hackers |
On 7/14/23 16:42, Matthias van de Meent wrote: > On Fri, 14 Jul 2023 at 16:21, Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> >> >> >> >> On 7/11/23 13:20, Matthias van de Meent wrote: >>> On Mon, 10 Jul 2023 at 22:04, Tomas Vondra >>> <tomas.vondra@enterprisedb.com> wrote: >>>> Approximation in what sense? My understanding was we'd get a range of >>>> distances that we know covers all rows in that range. So the results >>>> should be accurate, no? >>> >>> The distance is going to be accurate only to the degree that the >>> summary can produce accurate distances for the datapoints it >>> represents. That can be quite imprecise due to the nature of the >>> contained datapoints: a summary of the points (-1, -1) and (1, 1) will >>> have a minimum distance of 0 to the origin, where the summary (-1, 0) >>> and (-1, 0.5) would have a much more accurate distance of 1. >> >> Ummm, I'm probably missing something, or maybe my mental model of this >> is just wrong, but why would the distance for the second summary be more >> accurate? Or what does "more accurate" mean? >> >> Is that about the range of distances for the summary? For the first >> range the summary is a bounding box [(-1,1), (1,1)] so all we know the >> points may have distance in range [0, sqrt(2)]. While for the second >> summary it's [1, sqrt(1.25)]. > > Yes; I was trying to refer to the difference between what results you > get from the summary vs what results you get from the actual > datapoints: In this case, for finding points which are closest to the > origin, the first bounding box has a less accurate estimate than the > second. > OK. I think regular minmax indexes have a similar issue with non-distance ordering, because we don't know if the min/max values are still in the page range (or deleted/updated). >>> The point I was making is that the summary can only approximate the >>> distance, and that approximation is fine w.r.t. the BRIN sort >>> algoritm. >>> >> >> I think as long as the approximation (whatever it means) does not cause >> differences in results (compared to not using an index), it's OK. > I haven't written any code yet, but I think if we don't try to find the exact min/max distances for the summary (e.g. by calculating the closest point exactly) but rather "estimates" that are guaranteed to bound the actual min/max, that's good enough for the sorting. For the max, this probably is not an issue, as we can just calculate distance for the corners and use a maximum of that. At least with reasonable euclidean distance ... in 2D I'm imagining the bounding box summary as a rectangle, with the "max distance" being a minimum radius of a circle containing it (the rectangle). For min we're looking for the largest radius not intersecting with the box, which seems harder to calculate I think. However, now that I'm thinking about it - don't (SP-)GiST indexes already do pretty much exactly this? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: