uima-dev mailing list archives

From Richard Eckart de Castilho <...@apache.org>
Subject Re: Avoid indexing of old UIMA documentation
Date Thu, 07 Apr 2016 20:40:03 GMT
We can just disallow /d and then explicitly allow all the *-current
folders under it. The only difference I see is that we'd have
a couple more entries in the robots.txt.
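
A minimal sketch of what that could look like, checked with Python's standard
urllib.robotparser. The folder names (uimaj-current, ruta-current, uimaj-2.4.0)
are only illustrative assumptions, not a complete list of what sits under /d.
The Allow lines come first because Python's parser applies the first matching
rule; crawlers that pick the longest matching rule should give the same result
in either order.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: folder names are assumptions for illustration.
ROBOTS_TXT = """\
User-agent: *
Allow: /d/uimaj-current/
Allow: /d/ruta-current/
Disallow: /d/
"""


def build_parser(text):
    """Parse robots.txt content given as a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_indexable(path, parser=None):
    """Return True if a generic crawler may fetch the given site path."""
    parser = parser or build_parser(ROBOTS_TXT)
    return parser.can_fetch("*", path)


if __name__ == "__main__":
    for path in ("/d/uimaj-current/index.html",
                 "/d/uimaj-2.4.0/index.html",
                 "/index.html"):
        print(path, is_indexable(path))
```

With these rules only the explicitly listed *-current folders under /d stay
crawlable, and no folder has to move.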

-- Richard

> On 07.04.2016, at 22:36, Marshall Schor <msa@schor.com> wrote:
> Hi,
> This sounds like a good idea to me :-)
> There's possibly one small issue with changing the folder structure.  The DOCBOOK
> schemes have some fancy way to link between docbooks; these require that the
> books be kept relative to one another in some file tree structure.  As long as
> that's not changed, I think there will be no problem.
> If anyone's curious, the relevant bits of config info are in the
> uima-docbook-olink project, in the various "site.xml" files.  You can see refs
> to the famous "d" folder there.  There may be a dependency on the "books" being
> just one directory layer under d/, so putting an extra layer might break things
> (but I'm not sure...).
> Maybe there's a way to do this without introducing a new level in the directory?
> -Marshall
> On 4/6/2016 4:43 PM, Richard Eckart de Castilho wrote:
>> Hi all,
>> I believe some time back we were talking about a strategy to avoid search engines
>> pointing to ancient versions of the UIMA documentation.
>> I have read a bit on rel="canonical" and robots.txt.
>> 1) per webpage - Apparently, one can place a `link rel="canonical"` element on any
>> HTML page. Search engines seeing this tag will then not index this page because it
>> is considered to be a duplicate of whatever other page the link points to.
>> 2) via http header/htaccess - Since we probably don't want to patch up all our JavaDoc
>> files, the information about a canonical source can also be sent in the HTTP header, e.g.
>> via a suitable htaccess file.
>> I guess the idea would be that for any old documentation page, we would want it to
>> point to its latest version as its canonical source. I mean for every page, not only for the
>> index page. This seems a bit tedious.
>> My suggestion would be an alternative that exploits the website folder structure
>> and uses robots.txt.
>> We disallow indexing of the "d" folder on the UIMA website.
>> We place all the "*-current" folders (svn copies of the latest documentation versions)
>> under a dedicated folder (e.g. "d/current") and allow indexing that.
>> In that way, the outdated versions of the documentation should be hidden from the
>> search engines and the respective latest versions should be indexed.
>> Opinions? Does anybody have experience with SEO?
>> Cheers,
>> -- Richard
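
For the canonical-link option discussed in the quoted message, the per-page
mapping from an old documentation page to its latest version could be computed
rather than maintained by hand. The sketch below assumes that versioned folders
are named "<product>-<version>" directly under /d and that a matching
"<product>-current" folder exists; the base URL https://example.org and the
function name are placeholders. It only builds the value of an HTTP Link
header; how that header would be emitted (e.g. via htaccess) is not covered.

```python
import re

# Assumed layout: /d/<product>-<version>/<rest>, where the version starts with a digit.
_VERSIONED = re.compile(r"^/d/(?P<product>[^/]+?)-(?P<version>\d[^/]*)/(?P<rest>.*)$")


def canonical_link_header(path, base="https://example.org"):
    """Return a Link header value pointing at the *-current copy of a page.

    Returns None when the path is not inside a versioned folder under /d,
    for example when it already lives in a *-current folder.
    """
    match = _VERSIONED.match(path)
    if match is None:
        return None
    target = "{}/d/{}-current/{}".format(base, match.group("product"), match.group("rest"))
    return '<{}>; rel="canonical"'.format(target)


if __name__ == "__main__":
    print(canonical_link_header("/d/uimaj-2.4.0/apidocs/index.html"))
    print(canonical_link_header("/d/uimaj-current/apidocs/index.html"))
```

This still has to be applied to every old page, and a page that no longer
exists in the latest version would end up pointing at a missing target.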
