Home > mailing lists

Re: Query on indexed table too slow - Mailing list pgsql-novice

From	Fábio Moreira
Subject	Re: Query on indexed table too slow
Date	March 8, 2016 00:20:31
Msg-id	CANQddpOTBZz+HJm0ttugYPAMKmyro6OA+YWNe0XrhYkvydN9Sg@mail.gmail.com Whole thread Raw
In response to	Query on indexed table too slow (Shmagi Kavtaradze <kavtaradze.s@gmail.com>)
Responses	Re: Query on indexed table too slow
List	pgsql-novice

Tree view

Hi Shmagi,

Your table is tiny -- tiny enough that the actual working set (the rows of chunks2, without the "doc" field) easily fits in memory; you're unlikely to get a large improvement here, specially considering how, by the looks of the query planner, you have relatively few distinct chunkid (is that so? can you post SELECT COUNT(DISTINCT chunkid) FROM chunks2?).

Two things come to mind here:

1) that similarity comparison you're doing looks expensive, with a cast to TEXT and string manipulation and what not. I'm not sure if PostgreSQL has a native bit count method to speed this part of the code up; you could implement one in C to see if that improves things, but it might be easier (if indeed you only have 10 bits) to do instead:

CREATE TEMPORARY TABLE popcount AS

SELECT

i::BIT(10) AS value,
Length(Replace(Cast(i::BIT(10) AS TEXT), '0', '')) AS popcount

FROM

generate_series(0, 1023, 1) AS temp(i);

and join against that table (again, it's tiny, so no real need for an index here).

2) do you really need all 100M rows of the output? I assume this is some document similarity metric you're after? What will be done with this data after it is generated?

[]s, Fábio.

On Tue, Mar 8, 2016 at 2:12 AM, Shmagi Kavtaradze <kavtaradze.s@gmail.com> wrote:

I have a table with 46230 rows(1 doc, 4623 sentences and 10 chunks for each sentence):

create table chunks(
doc varchar,
sentenceid int,
chunkid int,
chunk bit(10)
);

With the query I want to compare sentence chunks with other sentence chunks that have same chunkid:

SELECT
a.sentenceid,
b.sentenceid, a.chunkid,
Length(Replace(Cast(a.chunk & b.chunk AS TEXT), '0', ''))::float / Length(a.chunk)::float
FROM chunks2 a
INNER JOIN chunks2 b
ON a.sentenceid < b.sentenceid and a.chunkid = b.chunkid;

I ran explain analyze on unindexed table, composite index and both indexed separately,but time is the same for all:

Indexed on (sentenceid, chunkid):

Hash Join (cost=1335.17..4549476.28 rows=71249559 width=26) (actual time=144.376..1156178.110 rows=106837530 loops=1)
Hash Cond: (a.chunkid = b.chunkid)
Join Filter: (a.sentenceid < b.sentenceid)
Rows Removed by Join Filter: 106883760
-> Seq Scan on chunks2 a (cost=0.00..757.30 rows=46230 width=15) (actual time=0.039..77.275 rows=46230 loops=1)
-> Hash (cost=757.30..757.30 rows=46230 width=15) (actual time=142.954..142.954 rows=46230 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 2680kB
-> Seq Scan on chunks2 b (cost=0.00..757.30 rows=46230 width=15) (actual time=0.031..64.340 rows=46230 loops=1)
Planning time: 1.209 ms
Execution time: 1212779.012 ms

I know they have the same operations and no index was used. Where is my mistake and how to speed up query with index? Or how to use indexes efficiently in my case?

Fábio Dias Moreira

pgsql-novice by date:

From: Shmagi Kavtaradze
Date: 07 March 2016, 17:12:35
Subject: Query on indexed table too slow

From: Jozef Riha
Date: 08 March 2016, 22:42:29
Subject: Re: help needed with error during upgrade (9.4 -> 9.5)

Re: Query on indexed table too slow - Mailing list pgsql-novice

Previous

Next