Thread: improving speed of query that uses a multi-column "filter" ?
I'm trying to improve the speed of a suite of queries that go across a few million rows.

They use 2 main "filters" across a variety of columns:

    WHERE (col_1 IS NULL) AND (col_2 IS NULL) AND ((col_3 IS NULL) OR (col_3 = col_1))

    WHERE (col_1 IS True) AND (col_2 IS True) AND (col_3 IS True) OR (col_4 IS NULL)

I created a dedicated multi-column index for each query to speed them up. That was great.

I still don't have the performance where I want it to be - the size of the index seems to be an issue. If the index were on one column, instead of 4, I think the scans would complete in time.

I looked online and in the archives, and couldn't find much information on good strategies to deal with this.

It looks like my best option is to somehow index on the "interpretation" of these criteria, and not the criteria themselves.

The two ways that come to mind are:

1. alter the table: add a boolean column for each filter-test, index that, then query on that field
2. leave the table as-is: write a custom function for each filter, and then use a function index

Has anyone else encountered a need like this? Are there any tips / tricks / things I should look out for? Are there better ways to handle this?
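For option 2, I'm picturing something like this - just a sketch, with a made-up table name, and assuming the wrapper function can be declared IMMUTABLE:

    -- wrap the first filter in an IMMUTABLE function, then index its result
    CREATE FUNCTION fn_filter_1(col_1 boolean, col_2 boolean, col_3 boolean)
    RETURNS boolean AS $$
        SELECT (col_1 IS NULL) AND (col_2 IS NULL)
               AND ((col_3 IS NULL) OR (col_3 = col_1));
    $$ LANGUAGE sql IMMUTABLE;

    CREATE INDEX idx_fn_filter_1 ON my_table (fn_filter_1(col_1, col_2, col_3));

    -- queries have to invoke the same expression for the index to match:
    SELECT id FROM my_table WHERE fn_filter_1(col_1, col_2, col_3);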
On 9/30/2014 4:50 PM, Jonathan Vanasco wrote:
> WHERE (col_1 IS NULL) AND (col_2 IS NULL) AND ((col_3 IS NULL) OR (col_3 = col_1))

if col_1 IS NULL, then that OR condition doesn't make much sense. just saying...

these 4 columns are all nullable booleans, so they can be TRUE, FALSE, or NULL?

with 4 columns, there are 3^4 = 81 possible combinations of these values... you might get better speeds encoding this as a single SHORT INTEGER, enumerating those 81 states, and then just doing equals or IN (set of values) conditions... of course, this might make a lot of OTHER code more complicated.

it might be easier to make each col_X 2 bits of this integer, such that one bit indicates the value was 'NULL', and the other bit is the true/false state if that first bit isn't set. this would make testing individual bits somewhat better.

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast
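Concretely, that 2-bits-per-column encoding might look like this (a sketch only; the table and column names are placeholders, and note the explicit parentheses, since Postgres gives | and << the same precedence):

    ALTER TABLE my_table ADD COLUMN col_state smallint;

    -- bit 0 of each pair = "was NULL", bit 1 = the TRUE/FALSE state
    UPDATE my_table SET col_state =
          (CASE WHEN col_1 IS NULL THEN 1 WHEN col_1 THEN 2 ELSE 0 END)
        | ((CASE WHEN col_2 IS NULL THEN 1 WHEN col_2 THEN 2 ELSE 0 END) << 2)
        | ((CASE WHEN col_3 IS NULL THEN 1 WHEN col_3 THEN 2 ELSE 0 END) << 4)
        | ((CASE WHEN col_4 IS NULL THEN 1 WHEN col_4 THEN 2 ELSE 0 END) << 6);

    -- e.g. "col_2 IS TRUE": bits 2-3 with the NULL bit clear and TRUE bit set
    SELECT count(*) FROM my_table WHERE (col_state & 12) = 8;

    -- note: a plain btree on col_state serves equals / IN over the enumerated
    -- states; the bit-mask test above would need its own expression index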
On Sep 30, 2014, at 8:04 PM, John R Pierce <pierce@hogranch.com> wrote:
> if col_1 IS NULL, then that OR condition doesn't make much sense. just saying...

I was just making a quick example. There are two commonly used "filter sets"; each is mostly on Bool columns that allow NULL -- but one checks to see if the row references itself in a particular column.

> these 4 columns are all nullable booleans, so they can be TRUE, FALSE, or NULL ?

Most of them, yes.

> with 4 columns, there's 3^4 = 81 possible combinations of these values... you might get better speeds encoding this as a single SHORT INTEGER, and enumerating those 81 states, then just do equals or IN (set of values) conditions... of course, this might make a lot of OTHER code more complicated. It might be easier to make each col_X 2 bits of this integer, such that one bit indicates the value was 'NULL', and the other bit is the true/false state if that first bit isn't set, this would make testing individual bits somewhat better.

That's interesting. I never thought of how Postgres processes the data.

For legacy reasons, I can't change the data types -- but I can add additional columns. So I could do a trigger/function that manages a filter_test column that is an int, give each filter a bit value, and then just run a scan on that. It wouldn't be much more work to test that against dedicated Bool columns for each filter.
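Something like this is what I have in mind - a sketch only, all names made up:

    ALTER TABLE my_table ADD COLUMN filter_test int NOT NULL DEFAULT 0;

    -- bit 1 = filter 1 matches, bit 2 = filter 2 matches
    CREATE FUNCTION compute_filter_test() RETURNS trigger AS $$
    BEGIN
        NEW.filter_test :=
              (CASE WHEN (NEW.col_1 IS NULL) AND (NEW.col_2 IS NULL)
                         AND ((NEW.col_3 IS NULL) OR (NEW.col_3 = NEW.col_1))
                    THEN 1 ELSE 0 END)
            | (CASE WHEN (NEW.col_1 IS TRUE) AND (NEW.col_2 IS TRUE)
                         AND (NEW.col_3 IS TRUE) OR (NEW.col_4 IS NULL)
                    THEN 2 ELSE 0 END);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER my_table_filter_test
        BEFORE INSERT OR UPDATE ON my_table
        FOR EACH ROW EXECUTE PROCEDURE compute_filter_test();

    CREATE INDEX idx_filter_test ON my_table (filter_test);

    -- filter 1 = the states where bit 1 is set, which the btree can serve:
    SELECT id FROM my_table WHERE filter_test IN (1, 3);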
On Wednesday, October 1, 2014, Jonathan Vanasco <postgres@2xlp.com> wrote:
> I'm trying to improve the speed of a suite of queries that go across a few million rows.
>
> They use 2 main "filters" across a variety of columns:
>
>     WHERE (col_1 IS NULL) AND (col_2 IS NULL) AND ((col_3 IS NULL) OR (col_3 = col_1))
>
>     WHERE (col_1 IS True) AND (col_2 IS True) AND (col_3 IS True) OR (col_4 IS NULL)
>
> I created a dedicated multi-column index for each query to speed them up. That was great.
>
> I still don't have the performance where I want it to be - the size of the index seems to be an issue. If the index were on one column, instead of 4, I think the scans would complete in time.
>
> I looked online and in the archives, and couldn't find much information on good strategies to deal with this.
>
> It looks like my best option is to somehow index on the "interpretation" of these criteria, and not the criteria themselves.
>
> The two ways that come to mind are:
>
> 1. alter the table: add a boolean column for each filter-test, index that, then query on that field
> 2. leave the table as-is: write a custom function for each filter, and then use a function index
>
> Has anyone else encountered a need like this? Are there any tips / tricks / things I should look out for? Are there better ways to handle this?
Have you considered partial indexes?

I.e., idx1 on the PK column of the table, with the index's WHERE clause exactly the same as your first WHERE, and idx2 on the PK column with your second WHERE as the index's WHERE clause.
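E.g. (a sketch - the table and PK names are placeholders):

    CREATE INDEX idx_partial_1 ON my_table (id)
        WHERE (col_1 IS NULL) AND (col_2 IS NULL)
              AND ((col_3 IS NULL) OR (col_3 = col_1));

    CREATE INDEX idx_partial_2 ON my_table (id)
        WHERE (col_1 IS TRUE) AND (col_2 IS TRUE) AND (col_3 IS TRUE)
              OR (col_4 IS NULL);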
Cheers,
Misa
On Oct 1, 2014, at 12:34 AM, Misa Simic wrote:
> Have you considered partial indexes?
>
> I.e., idx1 on the PK column of the table, with the index's WHERE clause exactly the same as your first WHERE, and idx2 on the PK column with your second WHERE as the index's WHERE clause.
That was actually my first attempt, and I was hoping it would work.
Unfortunately, there is always something in the queries that keeps Postgres trying to use other (slower) indexes or jumping to a sequential scan.
I haven't been able to trick the planner into using the partial index, and most online resources suggested it wasn't possible.
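From what I understand, the planner will only consider a partial index when it can prove that the query's WHERE clause implies the index's predicate, and that proof is fairly literal - extra AND-ed conditions are fine, but extra OR branches are not, and even a matching query may lose to another plan on cost. The form I tested looked like this (against the idx_partial_1 sketch above):

    EXPLAIN
    SELECT id FROM my_table
    WHERE (col_1 IS NULL) AND (col_2 IS NULL)
          AND ((col_3 IS NULL) OR (col_3 = col_1));
    -- ideally this shows an Index Scan (or Index Only Scan) on idx_partial_1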