Thread: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose
"GIN and GiST Index Types" page is about usage in full text search, but looks general purpose
From
PG Doc comments form
Date:
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/14/textsearch-indexes.html
Description:

Hey,

when you google for "postgresql gist gin index" you will most probably see this page (or an older version of it) as #1 and the only result from postgresql.org: https://www.postgresql.org/docs/current/textsearch-indexes.html

This led me and others in our team to initially assume that GiST and GIN indexes are purely a full text search thing in PostgreSQL. They are of course so much more, but from this page you would not be able to discover that. (It is interesting that even searching for `GiST` on postgresql.org lists that page first, and that for example https://www.postgresql.org/docs/14/sql.html only lists that page if you Ctrl+F for `gin` or `gist`).

It would probably be a good idea to link to https://www.postgresql.org/docs/14/gin.html and https://www.postgresql.org/docs/14/gist.html (or whatever are the best pages to explain GIN and GiST indexes) in the introduction of this article to lead people in the right direction. (Bonus points if this can be added to older versions of the docs as well, as those are ranking on Google and not everyone clicks through to `current` I guess, including me sometimes.)

Even more effective would be to update the page title and/or headline to make clear that it is about using GIN and GiST indexes in the context of full text search only.

For the page content itself, it might be beneficial to highlight that the code example is a shorthand that skips the definition of an operator class, which is implied by the column type (although it is possible that I do not understand the full picture here right now; docs are pretty scarce or hard to find after all).

Of course, let me know if there is a public GH repo where I could send PRs to suggest these changes.

Best
Jan Piotrowski
Re: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose
From
Peter Geoghegan
Date:
On Tue, Apr 12, 2022 at 12:12 PM PG Doc comments form <noreply@postgresql.org> wrote:
> Even more effective would be to update the page title and/or headline to
> make clear that it is about using GIN and GiST indexes in context of full
> text search only.

I agree that the overall structure is unclear, and seems to be more of an accident than a deliberate choice.

The page in question is "12.9. GIN and GiST Index Types", but it's really supplementary information for "12.2.2. Creating Indexes". The fact that the former has greater prominence than the latter (a general discussion of FTS indexing) seems like a problem in itself.

At one point GiST was competitive with GIN for full text search performance (or at least more competitive). These days use of GiST for FTS should be rare. So the title should suggest that GiST FTS indexing is the nonstandard choice.

--
Peter Geoghegan
Re: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose
From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> The page in question is "12.9. GIN and GiST Index Types", but it's
> really supplementary information for "12.2.2. Creating Indexes". The
> fact that the former has greater prominence than the latter (a general
> discussion of FTS indexing) seems like a problem in itself.

> At one point GiST was competitive with GIN for full text search
> performance (or at least more competitive). These days use of GiST for
> FTS should be rare. So the title should suggest that GiST FTS indexing
> is the nonstandard choice.

I think we should take the index type names out of the section title entirely, and name it something generic like "Preferred Index Types for Full Text Search".

Unfortunately, with the EOL'd documentation versions being pretty much frozen in time, it's not clear that we can prevent Google from continuing to find that 9.1 page when the search terms include GIN and GIST. I suspect it's keying off those terms appearing in the page title :-(

After the recent changes discussed on the -www list, it's possible that Google will eventually stop indexing the 9.1 page altogether, but I'm not holding my breath.

regards, tom lane
Re: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose
From
Peter Geoghegan
Date:
On Tue, Apr 12, 2022 at 12:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think we should take the index type names out of the section title
> entirely, and name it something generic like "Preferred Index Types for
> Full Text Search".

Agreed.

> After the recent changes discussed on the -www list, it's possible
> that Google will eventually stop indexing the 9.1 page altogether,
> but I'm not holding my breath.

There is always the extreme option of excluding older versions in robots.txt. I bet that would work.

Do you see any downside with that solution, Jonathan?

--
Peter Geoghegan
Re: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose
From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> On Tue, Apr 12, 2022 at 12:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think we should take the index type names out of the section title
>> entirely, and name it something generic like "Preferred Index Types for
>> Full Text Search".

> Agreed.

Proposed patch attached. The existing text already says "GIN indexes are the preferred text search index type", so I'm not sure we need to go further than that about guiding people which one to use. In particular, since GIN can't support included columns, we can't really deprecate GiST altogether here.

> There is always the extreme option of excluding older versions in
> robots.txt. I bet that would work.

Yeah, I was wondering about that too. It's sort of the nuclear option, but if we don't want to modify EOL'd versions then we may not have any other way to keep Google from glomming onto them.

regards, tom lane

diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index 20db7b7afe..6afaf9e62c 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -3618,7 +3618,7 @@ SELECT plainto_tsquery('supernovae stars');
 </sect1>
 <sect1 id="textsearch-indexes">
- <title>GIN and GiST Index Types</title>
+ <title>Preferred Index Types for Text Search</title>
 <indexterm zone="textsearch-indexes">
 <primary>text search</primary>
@@ -3627,10 +3627,16 @@ SELECT plainto_tsquery('supernovae stars');
 <para>
 There are two kinds of indexes that can be used to speed up full text
- searches.
+ searches:
+ <link linkend="gin"><acronym>GIN</acronym></link> and
+ <link linkend="gist"><acronym>GiST</acronym></link>.
 Note that indexes are not mandatory for full text searching, but in
 cases where a column is searched on a regular basis, an index is
 usually desirable.
+ </para>
+
+ <para>
+ To create such an index, do one of:
 <variablelist>
Re: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose
From
Peter Geoghegan
Date:
On Tue, Apr 12, 2022 at 1:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Proposed patch attached. The existing text already says "GIN indexes are
> the preferred text search index type", so I'm not sure we need to go
> further than that about guiding people which one to use. In particular,
> since GIN can't support included columns, we can't really deprecate GiST
> altogether here.

LGTM.

> > There is always the extreme option of excluding older versions in
> > robots.txt. I bet that would work.
>
> Yeah, I was wondering about that too. It's sort of the nuclear option,
> but if we don't want to modify EOL'd versions then we may not have any
> other way to keep Google from glomming onto them.

I think that our recent decision to just live with the downsides that go with making the most recent stable release docs canonical was a wise one, on balance. The reality is that we have very few ways of influencing search results from Google.

I don't know enough about the topic to be able to claim that the robots.txt solution would also work out well, in about the same way. But I suspect that it might, and know that it's a reversible process.

--
Peter Geoghegan
Re: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose
From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> On Tue, Apr 12, 2022 at 1:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Proposed patch attached. The existing text already says "GIN indexes are
>> the preferred text search index type", so I'm not sure we need to go
>> further than that about guiding people which one to use. In particular,
>> since GIN can't support included columns, we can't really deprecate GiST
>> altogether here.

> LGTM.

Done that way, then.

> I don't know enough about the topic to be able to claim that the
> robots.txt solution would also work out well, in about the same way.
> But I suspect that it might, and know that it's a reversible process.

Yeah, it's outside my expertise too.

regards, tom lane