Full text indexing preformance! (long) - Mailing list pgsql-hackers
From | Mitch Vincent |
---|---|
Subject | Full text indexing preformance! (long) |
Date | |
Msg-id | 008e01bfc8d4$7b75c180$0300000a@doot.org Whole thread Raw |
Responses |
Re: Full text indexing preformance! (long)
|
List | pgsql-hackers |
Well I have to say that I'm pretty impressed with PostgreSQL after this.. Still, I'd like some input from the experts as I may not be doing the best I can.. I setup the full text indexing (as described in contrib/fulltextindex) and see some amazing results but like I said earlier, I might be able to do better. I took Tom's advice and turned on like planning (as described in contrib/likeplanning) and it made a world of difference by itself.. Here is a quick run-down of the table structure. Table : resumes_fti (coorespnds to the cds-fti table in the example) 25370953 rows. Table applicants_resumes (cooresponds to the cds table in the example) Table applicants (63 fields) 11039 rows. Table applicants_states (2 fields) 276255 rows The most complex query I use is this : select a.appcode, a.firstname, a.middlename, a.lastname, a.state, a.degree1, a.d1date, a.degree2, a.d2date, a.salary, a.skill1, a.skill2, a.skill3, a.objective, a.employer, a.sic1, a.sic2, a.sic3, a.prefs1, a.prefs2, a.sells from applicants as a,applicants_states as s, applicants_resumes as ar, resumes_fti as rf where a.status = 'A' and s.rstate='AL' and s.app_id=a.app_id and rf.string ~'engineer' and rf.id = ar.oid limit 10 offset 0 -- BUT - I forgot one crucial thing. To qualify the results from the applicants_resume table bases on the applicants table (ie ar.app_id = a.app_id) I did this and the query went from just over 3 seconds to over 25 seconds! I changes the above query to select a.appcode, a.firstname, a.middlename, a.lastname, a.state, a.degree1, a.d1date, a.degree2, a.d2date, a.salary, a.skill1, a.skill2, a.skill3, a.objective, a.employer, a.sic1, a.sic2, a.sic3, a.prefs1, a.prefs2, a.sells from applicants as a,applicants_states as s, applicants_resumes as ar, resumes_fti as rf where a.status = 'A' and s.rstate='AL' and rf.string ~'engineer' and rf.id = ar.oid and s.app_id=a.app_id and ar.app_id=a.app_id limit 10 offset 0 (Listed below again) Hopefully it's just something else stupid I am doing and someone will beat me with a clue stick. All of this was done on a PostgreSQL 7.0 backend run as "/usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data -B 4096 -o '-S 16384' -i >/usr/local/pgsql/postgres.log 2>&1&" on a FreeBSD 4.0, Dual Celeron 600 box with an ATA/66 30 gig drive and 256 megs of RAM. Here are some stats : Without the extra condition : NOTICE: QUERY PLAN: Nested Loop (cost=0.00..487951.02 rows=1644 width=204) -> Nested Loop (cost=0.00..478489.01 rows=2641 width=12) -> Nested Loop (cost=0.00..474081.63 rows=1 width=8) -> Seq Scan on resumes_fti rf (cost=0.00..474076.91rows=1 width=4) -> Index Scan using resumes_oid_index on applicants_resumes ar (cost=0.00..4.70 rows=1 width=4) -> Index Scan using applicants_states_rstate on applicants_states s (cost=0.00..4380.98 rows=2641 width=4) -> Index Scan using applicants_app_id on applicants a (cost=0.00..3.57 rows=1 width=192) EXPLAIN StartTransactionCommand query: select a.appcode, a.firstname, a.middlename, a.lastname, a.state, a.degree1, a.d1date, a.degree2, a.d2date, a.salary, a.skill1, a.skill2, a.skill3, a.objective, a.employer, a.sic1, a.sic2, a.sic3, a.prefs1, a.prefs2, a.sells from applicants as a,applicants_states as s, applicants_resumes as ar, resumes_fti as rf where a.status = 'A' and s.rstate='AL' and rf.string ~'engineer' and rf.id = ar.oid and s.app_id=a.app_id limit 10 offset 0; ProcessQuery ! system usage stats: ! 3.386697 elapsed 2.599174 user 0.787057 system sec ! [2.617929 user 0.797100 sys total] ! 0/0 [0/0] filesystem blocks in/out ! 0/8330 [0/8569] page faults/reclaims, 0 [0] swaps ! 0 [0] signals rcvd, 0/0 [3/3] messages rcvd/sent ! 0/43 [3/47] voluntary/involuntary context switches ! postgres usage stats: ! Shared blocks: 4950 read, 0 written, buffer hit rate = 20.03% ! Local blocks: 0 read, 0 written, buffer hit rate = 0.00% ! Direct blocks: 0 read, 0 written CommitTransactionCommand proc_exit(0) shmem_exit(0) exit(0) With the extra condition : NOTICE: QUERY PLAN: Nested Loop (cost=0.00..474194.76 rows=1 width=208) -> Nested Loop (cost=0.00..474085.21 rows=1 width=204) -> NestedLoop (cost=0.00..474081.63 rows=1 width=12) -> Seq Scan on resumes_fti rf (cost=0.00..474076.91 rows=1 width=4) -> Index Scan using resumes_oid_index on applicants_resumes ar (cost=0.00..4.70 rows=1 width=8) -> Index Scan using applicants_app_id on applicants a (cost=0.00..3.57 rows=1 width=192) -> Index Scan using applicants_states_app_id on applicants_states s (cost=0.00..109.54 rows=1 width=4) EXPLAIN StartTransactionCommand query: select a.appcode, a.firstname, a.middlename, a.lastname, a.state, a.degree1, a.d1date, a.degree2, a.d2date, a.salary, a.skill1, a.skill2, a.skill3, a.objective, a.employer, a.sic1, a.sic2, a.sic3, a.prefs1, a.prefs2, a.sells from applicants as a,applicants_states as s, applicants_resumes as ar, resumes_fti as rf where a.status = 'A' and s.rstate='AL' and rf.string ~'engineer' and rf.id = ar.oid and s.app_id=a.app_id and ar.app_id=a.app_id limit 10 offset 0 ProcessQuery ! system usage stats: ! 25.503341 elapsed 18.564543 user 5.599631 system sec ! [18.564543 user 5.627987 sys total] ! 2029/0 [2029/0] filesystem blocks in/out ! 0/8335 [0/8571] page faults/reclaims, 0 [0] swaps ! 0 [0] signals rcvd, 0/0 [3/3] messages rcvd/sent ! 149/342 [152/346] voluntary/involuntary context switches ! postgres usage stats: ! Shared blocks: 35118 read, 0 written, buffer hit rate = 4.98% ! Local blocks: 0 read, 0 written, buffer hit rate = 0.00% ! Direct blocks: 0 read, 0 written CommitTransactionCommand proc_exit(0) shmem_exit(0) exit(0) Sorry about the length. Thanks! -Mitch
pgsql-hackers by date: