Re: Poor performance on seq scan - Mailing list pgsql-performance
From | Guido Neitzer |
---|---|
Subject | Re: Poor performance on seq scan |
Date | |
Msg-id | fbbe50e0609180150k66e1462fg6591db841e490dc1@mail.gmail.com Whole thread Raw |
In response to | Re: Poor performance on seq scan (Piotr Kołaczkowski <P.Kolaczkowski@elka.pw.edu.pl>) |
List | pgsql-performance |
Because there is no MVCC information in the index. cug 2006/9/12, Piotr Kołaczkowski <P.Kolaczkowski@elka.pw.edu.pl>: > On Tuesday 12 September 2006 12:47, Heikki Linnakangas wrote: > > Laszlo Nagy wrote: > > > I made another test. I create a file with the identifiers and names of > > > the products: > > > > > > psql#\o products.txt > > > psql#select id,name from product; > > > > > > Then I can search using grep: > > > > > > grep "Mug" products.txt | cut -f1 -d\| > > > > > > There is a huge difference. This command runs within 0.5 seconds. That > > > is, at least 76 times faster than the seq scan. It is the same if I > > > vacuum, backup and restore the database. I thought that the table is > > > stored in one file, and the seq scan will be actually faster than > > > grepping the file. Can you please tell me what am I doing wrong? I'm > > > not sure if I can increase the performance of a seq scan by adjusting > > > the values in postgresql.conf. I do not like the idea of exporting the > > > product table periodically into a txt file, and search with grep. :-) > > > > Is there any other columns besides id and name in the table? How big is > > products.txt compared to the heap file? > > > > > Another question: I have a btree index on product(name). It contains > > > all product names and the identifiers of the products. Wouldn't it be > > > easier to seq scan the index instead of seq scan the table? The index > > > is only 66MB, the table is 1123MB. > > > > Probably, but PostgreSQL doesn't know how to do that. Even if it did, it > > depends on how many matches there is. If you scan the index and then > > fetch the matching rows from the heap, you're doing random I/O to the > > heap. That becomes slower than scanning the heap sequentially if you're > > going to get more than a few hits. > > Why match rows from the heap if ALL required data are in the index itself? > Why look at the heap at all? > > This is the same performance problem in PostgreSQL I noticed when doing > some "SELECT count(*)" queries. Look at this: > > explain analyze select count(*) from transakcja where data > '2005-09-09' and > miesiac >= (9 + 2005 * 12) and kwota < 50; > > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------------------ > Aggregate (cost=601557.86..601557.87 rows=1 width=0) (actual > time=26733.479..26733.484 rows=1 loops=1) > -> Bitmap Heap Scan on transakcja (cost=154878.00..596928.23 rows=1851852 > width=0) (actual time=9974.208..18796.060 rows=1654218 loops=1) > Recheck Cond: ((miesiac >= 24069) AND (kwota < 50::double precision)) > Filter: (data > '2005-09-09 00:00:00'::timestamp without time zone) > -> Bitmap Index Scan on idx_transakcja_miesiac_kwota > (cost=0.00..154878.00 rows=5555556 width=0) (actual time=9919.967..9919.967 > rows=1690402 loops=1) > Index Cond: ((miesiac >= 24069) AND (kwota < 50::double > precision)) > Total runtime: 26733.980 ms > (7 rows) > > The actual time retrieving tuples from the index is less than 10 seconds, but > the system executes needless heap scan that takes up additional 16 seconds. > > Best regards, > Peter > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > -- PostgreSQL Bootcamp, Big Nerd Ranch Europe, Nov 2006 http://www.bignerdranch.com/news/2006-08-21.shtml
pgsql-performance by date: