Re: Fixing Google Search on the docs (redux) - Mailing list pgsql-www
From | Magnus Hagander |
---|---|
Subject | Re: Fixing Google Search on the docs (redux) |
Msg-id | CABUevExZ_mfbZK=9XmBaxk5osNXNzewfUV=obC-vzTSMa9Xo2Q@mail.gmail.com |
In response to | Re: Fixing Google Search on the docs (redux) ("Jonathan S. Katz" <jkatz@postgresql.org>) |
Responses | Re: Fixing Google Search on the docs (redux); Re: Fixing Google Search on the docs (redux); Re: Fixing Google Search on the docs (redux) |
List | pgsql-www |
On Wed, Nov 18, 2020 at 5:44 PM Jonathan S. Katz <jkatz@postgresql.org> wrote:
>
> On 11/18/20 11:20 AM, Dave Page wrote:
> > I was looking at our analytic data, and saw that the vast majority of
> > inbound traffic to the docs, hits the 9.1 version. We've known this has
> > been an issue for years and have tried various remedies, clearly none of
> > which are working.
> >
> > Should we try an experiment for a couple of months, in which we simply
> > block anything that matches \/docs\/((\d+)|(\d.\d))\/ in robots.txt?
> > It's a much more drastic option, but at least it might force Google into
> > indexing the latest doc version with the highest priority.
>
> If we're going down this road, I would suggest borrowing a concept from
> the Django Project documentation which has a similar issue to us. In
> their codebase, use a <link> tag with rel="canonical" to point to the
> latest version of docs on their page[1].
>
> So for example, given 3.1 is their latest release, you will find
> something similar to this:
>
> <link rel="canonical"
> href="https://docs.djangoproject.com/en/3.1/ref/templates/builtins/">
>
> From a quick test of searching various Django concepts, it seems that
> the 3.1 pages tend to turn up first.
>
> Our equivalent would be "current".
>
> Jonathan
>
> [1]
> https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls

We've discussed this many times before, and I think so far every discussion has bogged down at "google suck" :)

The problem is that they don't even consider a case like ours, where the pages *aren't* identical but are still related. What it usually comes down to is that if we do that, you will no longer be able to, say, search for something in the old docs *at all*. A good example right now might be the recovery.conf material, which would simply disappear from the results, even if you explicitly search for "postgresql recovery.conf 11".
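To make the two proposals quoted above concrete, here is a minimal Python sketch. It assumes the docs live under https://www.postgresql.org/docs/<version>/; the `BASE` constant and all helper names are illustrative and not taken from the postgresql.org site code. It checks which paths Dave's pattern would cover, spells the block out as robots.txt rules, and builds the rel="canonical" tag Jonathan describes, pointing at "current". Note that robots.txt rules are path prefixes with only the `*` and `$` wildcards, not regular expressions, so the pattern has to be expanded one version at a time.

```python
import re

# Dave's pattern from the thread: a major version ("13") or an X.Y version
# ("9.1") as a path segment right under /docs/. The "." in (\d.\d) is
# unescaped, so it matches any single character, not only a literal dot.
VERSIONED_DOCS = re.compile(r"/docs/((\d+)|(\d.\d))/")

# Assumed site root, for illustration only.
BASE = "https://www.postgresql.org"


def is_blocked(path: str) -> bool:
    """True if the proposed block would cover this URL path."""
    return VERSIONED_DOCS.search(path) is not None


def robots_disallow_lines(versions: list[str]) -> list[str]:
    """robots.txt matches path prefixes (plus the * and $ wildcards), not
    regular expressions, so the block is spelled out one version at a time."""
    return [f"Disallow: /docs/{v}/" for v in versions]


def canonical_link(path: str) -> str:
    """Build a rel="canonical" tag pointing a versioned page at /docs/current/.

    Paths that don't match the pattern are kept as-is (with BASE prefixed).
    """
    current_path = VERSIONED_DOCS.sub("/docs/current/", path, count=1)
    return f'<link rel="canonical" href="{BASE}{current_path}">'


if __name__ == "__main__":
    for p in ("/docs/9.1/sql-select.html", "/docs/13/sql-select.html",
              "/docs/current/sql-select.html"):
        print(p, is_blocked(p))
    print("User-agent: *")
    print("\n".join(robots_disallow_lines(["9.1", "13"])))
    print(canonical_link("/docs/9.1/sql-select.html"))
```

As written, the pattern also matches a path like "/docs/9x1/", so `\d\.\d` is probably what was intended. The canonical sketch also illustrates the objection in the reply: every versioned page points at the same "current" URL, whether or not that page still says the same thing.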
And I'd guess the majority of people are actually looking for things in versions that are NOT the latest (though an even bigger majority will be looking for things in versions that are not 9.1).

FWIW, I find the Django example absolutely terrible -- in fact, it's a great example of how the canonical URL handling sucks. There is AFAICT no way to actually search for information about old versions. You have to search for it in the new version, hope that the same info happens to be on the same page in the earlier version, and then manually browse your way back to that version (also through very annoying js popover stuff, but that's a different thing).

I don't know of any way to actually tell Google to prioritise the new versions. You used to be able to do this with sitemap.xml, which is why we do that, but at some point they just stopped caring about those priorities, even in cases where we're *lowering* our own priority, on the argument that sites shouldn't be able to increase their own.

It's not that what we have now for this is especially great, and going down that route might still be the least bad option. But we have to make that decision knowing that it means *nobody* will be able to search for things in our older documentation even if they explicitly ask for it. At all. Their only chance is to search for something else that might hit our docs, click over from there to the version they actually asked for, and then search *again* using our site search and hope that it shows up there. I'm willing to bet very few users will figure that part out...

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
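The sitemap.xml priority approach mentioned in the reply above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `docs_priority` is a hypothetical policy (1.0 for "current", 0.1 for everything else), and the page names are examples. Per the thread, Google no longer acts on `<priority>`, so this only shows what the site would emit, not how anything would be ranked.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def docs_priority(version: str) -> float:
    """Hypothetical policy: full priority for "current", low for old versions."""
    return 1.0 if version == "current" else 0.1


def build_sitemap(pages: list[tuple[str, str]], base: str) -> str:
    """Return a sitemap <urlset> with one <url> per (version, page) pair."""
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for version, page in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = f"{base}/docs/{version}/{page}"
        # The sitemap protocol allows priorities from 0.0 to 1.0 (default 0.5),
        # relative only to other URLs on the same site.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}priority").text = f"{docs_priority(version):.1f}"
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([("current", "sql-select.html"), ("9.1", "sql-select.html")],
                        "https://www.postgresql.org"))
```

Lowering old versions rather than raising the current one is the case described above, and it is exactly the one that is now ignored along with everything else.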