Thread: GiST indexing problems...
I've been doing some research work using the GiST indexes, but I persistently develop a problem where the system doesn't make use of the indexes during the execution of a query. If I use the examples provided here: http://wit.mcs.anl.gov/~selkovjr/pg_extensions/ For instance, and I place an elog( DEBUG, "functionname" ) in each of the GiST accessor functions, I can witness when the database is making use of the index. During the construction of the index, I never have a problem, although during query execution, it seems that my indices aren't getting used at all, and the database is simply searching through all of the entries in the database. This is a terribly frustrating problem that I encountered once before, but which mysteriously went away after fiddling with the problem for a while. This time, the problem isn't going away, however. When I trace through the postgres application I can see that it at least examines the opclass for my specialized data types, and detects that there exists an index that could be used, but it seems to decide not to make use of it regardless. Is there an easy way that I can force the use of an index during a query? -David ----------------------[=========]------------------------ David T. McWherter udmcwher@mcs.drexel.edu vdiff =====/vee'dif/ v.,n. Visual diff. The operation offinding differences between two files by {eyeball search}. Theterm `optical diff' has also been reported, and is sometimes more specifically used for the act of superimposing two nearly identical printouts on one another and holding them up to a light to spot differences. Though this method is poor for detecting omissions in the `rear' file, it can also be used with printouts of graphics, a claim few if any diff programs can make. See {diff}.
David, could you provide more info (scheme, query, postgresql version) Regards, Oleg On Sat, 5 May 2001, David McWherter wrote: > > I've been doing some research work using the GiST indexes, > but I persistently develop a problem where the system doesn't > make use of the indexes during the execution of a query. If > I use the examples provided here: > > http://wit.mcs.anl.gov/~selkovjr/pg_extensions/ > > For instance, and I place an elog( DEBUG, "functionname" ) > in each of the GiST accessor functions, I can witness when > the database is making use of the index. During the construction > of the index, I never have a problem, although during query > execution, it seems that my indices aren't getting used at > all, and the database is simply searching through all of > the entries in the database. > > This is a terribly frustrating problem that I encountered > once before, but which mysteriously went away after fiddling > with the problem for a while. This time, the problem isn't > going away, however. When I trace through the postgres > application I can see that it at least examines the opclass > for my specialized data types, and detects that there exists > an index that could be used, but it seems to decide not to > make use of it regardless. > > Is there an easy way that I can force the use of an index > during a query? > > -David > > ----------------------[=========]------------------------ > David T. McWherter udmcwher@mcs.drexel.edu > > vdiff > ===== > /vee'dif/ v.,n. Visual diff. The operation offinding > differences between two files by {eyeball search}. Theterm > `optical diff' has also been reported, and is sometimes more > specifically used for the act of superimposing two nearly identical > printouts on one another and holding them up to a light to spot > differences. Though this method is poor for detecting omissions in > the `rear' file, it can also be used with printouts of graphics, a > claim few if any diff programs can make. See {diff}. > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Sure. My postgresql version is 7.0.2. My database has a datatype called graph that looks like this: CREATE TYPE graph ( internallength = VARIABLE, input = graph_in, output = graph_out ); CREATE OPERATOR ~ ( leftarg = graph, rightarg = graph, procedure = graph_distance, commutator =~ ); And it has a datatype 'graphrange': CREATE FUNCTION graph_inrange(graph, graphrange) RETURNS bool AS '/usr/remote/home_u/udmcwher/myprojs/pg_graph.2/graph.so' language 'c'; CREATE TYPE graphrange ( internallength = VARIABLE, input = graphrange_in, output = graphrange_out ); CREATE OPERATOR << ( leftarg = graph, rightarg = graphrange, procedure = graph_inrange ); I have a bunch of GiST operators that are created like this: CREATE FUNCTION gist_graph_consistent(opaque,graphrange) RETURNS bool AS '/usr/remote/home_u/udmcwher/myprojs/pg_graph.2/graph.so' language 'c'; /* the same for gist_graph_{compress,decompress,penalty,picksplit,union,same}*/ I've tried adding the parameters 'restrict = eqsel' and 'join = eqjoinsel' to the datatype operators, but that doesn't seem to change anything. I construct a new opclass like this: INSERT INTO pg_opclass (opcname,opcdeftype) values ( 'gist_graphrange_ops' ); SELECT o.oid AS opoid, o.oprname INTO TABLE graph_ops_tmp FROM pg_operator o, pg_type t WHERE o.oprleft = t.oid and t.typname = 'graph'; INSERT INTO pg_amop (amopid, amopclaid, amopopr, amopstrategy) SELECT am.oid, opcl.oid, c.opoid, 1 FROM pg_amam, pg_opclass opcl, graph_ops_tmp c WHERE amname = 'gist' and opcname = 'gist_graphrange_ops' andc.oprname = '<<'; INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum) SELECT am.oid, opcl.oid, pro.oid, 1 FROM pg_am am, pg_opclassopcl, pg_proc pro WHERE amname = 'gist' and opcname = 'gist_graphrange_ops' and proname = 'gist_graph_consistent'; INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum) SELECT am.oid, opcl.oid, pro.oid, 2 FROM pg_am am, pg_opclassopcl, pg_proc pro WHERE amname = 'gist' and opcname = 'gist_graphrange_ops' and proname = 'gist_graph_union'; INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum) SELECT am.oid, opcl.oid, pro.oid, 3 FROM pg_am am, pg_opclassopcl, pg_proc pro WHERE amname = 'gist' and opcname = 'gist_graphrange_ops' and proname = 'gist_graph_compress'; INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum) SELECT am.oid, opcl.oid, pro.oid, 4 FROM pg_am am, pg_opclassopcl, pg_proc pro WHERE amname = 'gist' and opcname = 'gist_graphrange_ops' and proname = 'gist_graph_decompress'; INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum) SELECT am.oid, opcl.oid, pro.oid, 5 FROM pg_am am, pg_opclassopcl, pg_proc pro WHERE amname = 'gist' and opcname = 'gist_graphrange_ops' and proname = 'gist_graph_penalty'; INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum) SELECT am.oid, opcl.oid, pro.oid, 6 FROM pg_am am, pg_opclassopcl, pg_proc pro WHERE amname = 'gist' and opcname = 'gist_graphrange_ops' and proname = 'gist_graph_picksplit'; INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum) SELECT am.oid, opcl.oid, pro.oid, 7 FROM pg_am am, pg_opclassopcl, pg_proc pro WHERE amname = 'gist' and opcname = 'gist_graphrange_ops' and proname = 'gist_graphrange_same'; I construct a table like this: CREATE TABLE repos ( a graph, file varchar(512) ); INSERT INTO repos VALUES ( import_graphfile('/tmp/test1'), '/tmp/test1' ); INSERT INTO repos VALUES ( import_graphfile('/tmp/test2'), '/tmp/test2' ); What this does is a little bit weird, it reads in the test1 and test2 datafiles into the database, storing them as large objects. Then, it constructs graph objects which have their oid's, and returns them from import_graphfile. I then try to construct an index like this: CREATE INDEX repos_index ON repos USING gist ( a gist_graphrange_ops ) ; I've also tried a:graph and a:graphrange, but I don't think it changes anything. My queries look like: SELECT * from repos where a << '(oid-num,int-num)'::graphrange; The function operator returns a boolean if a particular relation holds between the graph object and the graphrange object. The GiST compress operator will convert leaf GRAPH keys into graphrange keys for internal use. Each of my GiST operators call elog( DEBUG, "function-name" ) as they're called. When constructing the index, compress,decompress,picksplit,union are called as expected. During the execution of the query, however, nothing happens. I've found the same exact results using the 'pggist' examples (a suite including intproc,boxproc,polyproc,textproc), and the examples found here: http://wit.mcs.anl.gov/~selkovjr/pg_extensions/contrib-7.0.tgz. The 'cube' test suite at that site is somewhat straightforward to invoke, and shows the same results. -david Oleg Bartunov writes:> David,> > could you provide more info (scheme, query, postgresql version)> > Regards,> > Oleg> On Sat, 5 May 2001, David McWherter wrote:> > >> > I've been doing some research work using the GiST indexes,>> but I persistently develop a problem where the system doesn't> > make use of the indexes during the executionof a query. If> > I use the examples provided here:> >> > http://wit.mcs.anl.gov/~selkovjr/pg_extensions/> >>> For instance, and I place an elog( DEBUG, "functionname" )> > in each of the GiST accessor functions, I can witness when>> the database is making use of the index. During the construction> > of the index, I never have a problem, althoughduring query> > execution, it seems that my indices aren't getting used at> > all, and the database is simply searchingthrough all of> > the entries in the database.> >> > This is a terribly frustrating problem that I encountered>> once before, but which mysteriously went away after fiddling> > with the problem for a while. This time, theproblem isn't> > going away, however. When I trace through the postgres> > application I can see that it at least examinesthe opclass> > for my specialized data types, and detects that there exists> > an index that could be used, but itseems to decide not to> > make use of it regardless.> >> > Is there an easy way that I can force the use of an index> >during a query?> >> > -David> >> > ----------------------[=========]------------------------> > David T. McWherter udmcwher@mcs.drexel.edu> >> > vdiff> > =====> > /vee'dif/ v.,n. Visual diff. The operation offinding> > differencesbetween two files by {eyeball search}. Theterm> > `optical diff' has also been reported, and is sometimes more>> specifically used for the act of superimposing two nearly identical> > printouts on one another and holding them upto a light to spot> > differences. Though this method is poor for detecting omissions in> > the `rear' file, it can alsobe used with printouts of graphics, a> > claim few if any diff programs can make. See {diff}.> >> > ---------------------------(endof broadcast)---------------------------> > TIP 1: subscribe and unsubscribe commands go tomajordomo@postgresql.org> >> > Regards,> Oleg> _____________________________________________________________>Oleg Bartunov, sci.researcher, hostmaster of AstroNet,> SternbergAstronomical Institute, Moscow University (Russia)> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/> phone:+007(095)939-16-83, +007(095)939-23-83 ----------------------[=========]------------------------ David T. McWherter udmcwher@mcs.drexel.edu If God had meant for us to be in the Army, we would have been born with green, baggy skin.
David, GiST prior 7.1 was broken in several respects. Please, try 7.1 and examples from contrib/intarray. It should works. btw, you'll have compress function actually works. Regards, Oleg On Sat, 5 May 2001, David McWherter wrote: > > Sure. My postgresql version is 7.0.2. > > My database has a datatype called graph that looks like this: > > CREATE TYPE graph ( > internallength = VARIABLE, > input = graph_in, > output = graph_out > ); > CREATE OPERATOR ~ ( > leftarg = graph, > rightarg = graph, > procedure = graph_distance, > commutator = ~ > ); > > And it has a datatype 'graphrange': > > CREATE FUNCTION graph_inrange(graph, graphrange) > RETURNS bool > AS '/usr/remote/home_u/udmcwher/myprojs/pg_graph.2/graph.so' > language 'c'; > > CREATE TYPE graphrange ( > internallength = VARIABLE, > input = graphrange_in, > output = graphrange_out > ); > CREATE OPERATOR << ( > leftarg = graph, > rightarg = graphrange, > procedure = graph_inrange > ); > > I have a bunch of GiST operators that are created like this: > CREATE FUNCTION gist_graph_consistent(opaque,graphrange) > RETURNS bool > AS '/usr/remote/home_u/udmcwher/myprojs/pg_graph.2/graph.so' > language 'c'; > /* the same for gist_graph_{compress,decompress,penalty,picksplit,union,same} */ > > > > I've tried adding the parameters 'restrict = eqsel' and 'join = eqjoinsel' > to the datatype operators, but that doesn't seem to change anything. > > > I construct a new opclass like this: > > INSERT INTO pg_opclass (opcname,opcdeftype) > values ( 'gist_graphrange_ops' ); > > SELECT o.oid AS opoid, o.oprname > INTO TABLE graph_ops_tmp > FROM pg_operator o, pg_type t > WHERE o.oprleft = t.oid > and t.typname = 'graph'; > INSERT INTO pg_amop (amopid, amopclaid, amopopr, amopstrategy) > SELECT am.oid, opcl.oid, c.opoid, 1 > FROM pg_am am, pg_opclass opcl, graph_ops_tmp c > WHERE amname = 'gist' and opcname = 'gist_graphrange_ops' > and c.oprname = '<<'; > > > > INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum) > SELECT am.oid, opcl.oid, pro.oid, 1 > FROM pg_am am, pg_opclass opcl, pg_proc pro > WHERE amname = 'gist' and opcname = 'gist_graphrange_ops' > and proname = 'gist_graph_consistent'; > > INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum) > SELECT am.oid, opcl.oid, pro.oid, 2 > FROM pg_am am, pg_opclass opcl, pg_proc pro > WHERE amname = 'gist' and opcname = 'gist_graphrange_ops' > and proname = 'gist_graph_union'; > > INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum) > SELECT am.oid, opcl.oid, pro.oid, 3 > FROM pg_am am, pg_opclass opcl, pg_proc pro > WHERE amname = 'gist' and opcname = 'gist_graphrange_ops' > and proname = 'gist_graph_compress'; > > INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum) > SELECT am.oid, opcl.oid, pro.oid, 4 > FROM pg_am am, pg_opclass opcl, pg_proc pro > WHERE amname = 'gist' and opcname = 'gist_graphrange_ops' > and proname = 'gist_graph_decompress'; > > INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum) > SELECT am.oid, opcl.oid, pro.oid, 5 > FROM pg_am am, pg_opclass opcl, pg_proc pro > WHERE amname = 'gist' and opcname = 'gist_graphrange_ops' > and proname = 'gist_graph_penalty'; > > INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum) > SELECT am.oid, opcl.oid, pro.oid, 6 > FROM pg_am am, pg_opclass opcl, pg_proc pro > WHERE amname = 'gist' and opcname = 'gist_graphrange_ops' > and proname = 'gist_graph_picksplit'; > > INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum) > SELECT am.oid, opcl.oid, pro.oid, 7 > FROM pg_am am, pg_opclass opcl, pg_proc pro > WHERE amname = 'gist' and opcname = 'gist_graphrange_ops' > and proname = 'gist_graphrange_same'; > > I construct a table like this: > > CREATE TABLE repos ( a graph, file varchar(512) ); > INSERT INTO repos VALUES ( import_graphfile('/tmp/test1'), '/tmp/test1' ); > INSERT INTO repos VALUES ( import_graphfile('/tmp/test2'), '/tmp/test2' ); > > What this does is a little bit weird, it reads in the test1 and test2 datafiles > into the database, storing them as large objects. Then, it constructs > graph objects which have their oid's, and returns them from import_graphfile. > > I then try to construct an index like this: > > CREATE INDEX repos_index ON repos > USING gist ( a gist_graphrange_ops ) ; > > I've also tried a:graph and a:graphrange, but I don't think it changes anything. > > My queries look like: > > SELECT * from repos where a << '(oid-num,int-num)'::graphrange; > > The function operator returns a boolean if a particular relation holds between > the graph object and the graphrange object. > > The GiST compress operator will convert leaf GRAPH keys into > graphrange keys for internal use. Each of my GiST operators > call elog( DEBUG, "function-name" ) as they're called. When > constructing the index, compress,decompress,picksplit,union > are called as expected. During the execution of the query, > however, nothing happens. > > I've found the same exact results using the 'pggist' examples > (a suite including intproc,boxproc,polyproc,textproc), > and the examples found here: http://wit.mcs.anl.gov/~selkovjr/pg_extensions/contrib-7.0.tgz. > The 'cube' test suite at that site is somewhat straightforward > to invoke, and shows the same results. > > -david > > > Oleg Bartunov writes: > > David, > > > > could you provide more info (scheme, query, postgresql version) > > > > Regards, > > > > Oleg > > On Sat, 5 May 2001, David McWherter wrote: > > > > > > > > I've been doing some research work using the GiST indexes, > > > but I persistently develop a problem where the system doesn't > > > make use of the indexes during the execution of a query. If > > > I use the examples provided here: > > > > > > http://wit.mcs.anl.gov/~selkovjr/pg_extensions/ > > > > > > For instance, and I place an elog( DEBUG, "functionname" ) > > > in each of the GiST accessor functions, I can witness when > > > the database is making use of the index. During the construction > > > of the index, I never have a problem, although during query > > > execution, it seems that my indices aren't getting used at > > > all, and the database is simply searching through all of > > > the entries in the database. > > > > > > This is a terribly frustrating problem that I encountered > > > once before, but which mysteriously went away after fiddling > > > with the problem for a while. This time, the problem isn't > > > going away, however. When I trace through the postgres > > > application I can see that it at least examines the opclass > > > for my specialized data types, and detects that there exists > > > an index that could be used, but it seems to decide not to > > > make use of it regardless. > > > > > > Is there an easy way that I can force the use of an index > > > during a query? > > > > > > -David > > > > > > ----------------------[=========]------------------------ > > > David T. McWherter udmcwher@mcs.drexel.edu > > > > > > vdiff > > > ===== > > > /vee'dif/ v.,n. Visual diff. The operation offinding > > > differences between two files by {eyeball search}. Theterm > > > `optical diff' has also been reported, and is sometimes more > > > specifically used for the act of superimposing two nearly identical > > > printouts on one another and holding them up to a light to spot > > > differences. Though this method is poor for detecting omissions in > > > the `rear' file, it can also be used with printouts of graphics, a > > > claim few if any diff programs can make. See {diff}. > > > > > > ---------------------------(end of broadcast)--------------------------- > > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > > > > > > Regards, > > Oleg > > _____________________________________________________________ > > Oleg Bartunov, sci.researcher, hostmaster of AstroNet, > > Sternberg Astronomical Institute, Moscow University (Russia) > > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > > phone: +007(095)939-16-83, +007(095)939-23-83 > > ----------------------[=========]------------------------ > David T. McWherter udmcwher@mcs.drexel.edu > > If God had meant for us to be in the Army, we would have been born with > green, baggy skin. > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
David McWherter <udmcwher@mcs.drexel.edu> writes: > I've tried adding the parameters 'restrict = eqsel' and 'join = eqjoinsel' > to the datatype operators, but that doesn't seem to change anything. You might have better luck if you use area-related selectivity estimators. Your problem seems to be that the optimizer doesn't think the index is worth using, and the cause almost certainly is overly pessimistic selectivity estimates for the indexable operators. areasel and friends are completely bogus, but at least they deliver small enough numbers to encourage use of the index ;-) As Oleg says, the GiST support in 7.0.* is in pretty poor shape (it had been suffering from neglect for a long time). Try 7.1. regards, tom lane
So, I've migrated my code to do the TOAST'ing thing required of 7.1 clients, and I've updated my operator to use the areaselectors: CREATE OPERATOR = ( leftarg = graph, rightarg = graphrange,procedure = graph_inrange, commutator = '=', restrict = areasel, join = areajoinsel ); But I still get the issue that my queries don't seem to trigger the GiST indexes to be used. Perhaps the problem is that the system just thinks that the query doesn't need an index to increase performance, i've only got about a dozen elements in the database right now for testing purposes. -David Tom Lane writes:> David McWherter <udmcwher@mcs.drexel.edu> writes:> > I've tried adding the parameters 'restrict = eqsel'and 'join = eqjoinsel'> > to the datatype operators, but that doesn't seem to change anything.> > You might have betterluck if you use area-related selectivity> estimators. Your problem seems to be that the optimizer doesn't> think theindex is worth using, and the cause almost certainly is> overly pessimistic selectivity estimates for the indexable operators.>areasel and friends are completely bogus, but at least they deliver> small enough numbers to encourage use ofthe index ;-)> > As Oleg says, the GiST support in 7.0.* is in pretty poor shape> (it had been suffering from neglect fora long time). Try 7.1.> > regards, tom lane ----------------------[=========]------------------------ David T. McWherter udmcwher@mcs.drexel.edu The truth is rarely pure, and never simple. -- Oscar Wilde
David McWherter <udmcwher@mcs.drexel.edu> writes: > But I still get the issue that my queries don't seem to trigger the > GiST indexes to be used. Perhaps the problem is that the system > just thinks that the query doesn't need an index to increase > performance, i've only got about a dozen elements in the database > right now for testing purposes. Ah, so. You're right, you need more data. You could trySET ENABLE_SEQSCAN TO OFF if you just want to force use of the index for testing purposes. regards, tom lane
Beautiful! That fixed my problem. One thing that might be useful is to update the Index method-extension documentation on the web site to reflect this problem a bit...if somebody just wants to get a working index, it can be a bit misleading. I'll probably go and see if I can construct a few words on it after my current workload subsides a bit. -David Tom Lane writes:> David McWherter <udmcwher@mcs.drexel.edu> writes:> > But I still get the issue that my queries don't seemto trigger the> > GiST indexes to be used. Perhaps the problem is that the system> > just thinks that the query doesn'tneed an index to increase > > performance, i've only got about a dozen elements in the database> > right now for testingpurposes.> > Ah, so. You're right, you need more data.> > You could try> SET ENABLE_SEQSCAN TO OFF> if you justwant to force use of the index for testing purposes.> > regards, tom lane ----------------------[=========]------------------------ David T. McWherter udmcwher@mcs.drexel.edu Never pay a compliment as if expecting a receipt.