uima-dev mailing list archives

From "Robert Burrell Donkin" <robertburrelldon...@gmail.com>
Subject Re: Where to put large generated things for our website to access
Date Mon, 31 Dec 2007 11:46:47 GMT
On Dec 31, 2007 3:43 AM, Marshall Schor <msa@schor.com> wrote:
> Robert Burrell Donkin wrote:
> > (apologies for not jumping in promptly)
> >
> > On Dec 24, 2007 3:48 AM, Marshall Schor <msa@schor.com> wrote:
> >
> >> I updated our website download page and documentation page.  I made the
> >> download page work with mirrors, and changed the format for accessing
> >> previous archived files to follow the common practice on other sites,
> >> referring to the archive.apache.org site.
> >>
> >> I made our documentation page refer to apache.org/dist/incubator/uima
> >> for the doc files - and didn't put any of these into our SVN for our
> >> website.
> >>
> >
> > after feeling a little uncertain about this, i asked the
> > infrastructure team who gave some good arguments against storing docs in
> > dist:
> >
> > 1. rsync is good for large files but struggles with lots of small files
> > 2. mirrored documentation is not supported so pushing all that content to
> > the mirrors is wasteful
> > 3. released documentation should have an unchanging URL. when a
> > release is archived, the documentation URL would need to change (a
> > redirect would help people but not all robots).
> >
> > having release documentation permanently stored and archived is a good
> > idea but it's strongly recommended that subversion is used. the zip'd
> > archive is fine where it is but it would be better for the contents of
> > the folders to be committed to subversion and then checked out to an
> > appropriate place on the website.
> >
> > - robert
> >
> I felt uncertain about all of this, too.  It seems to me that the right
> way to do this would be to have something like w.a.o/dist-not-mirrored/
> ... etc. where the same "archive" mechanism could be used as is used for
> /dist/, but which doesn't do mirroring.  Has this come up before in
> discussions - a way to have things that are not to be mirrored, but
> which would reasonably be "archived"?  You might say that the docs don't
> need to be archived (because they can always be extracted from an
> archived "release" zip/tar), but I find having at least some older
> versions of the docs quite useful in helping users running on a specific
> level - I can say things like "see xxx on page yyy" and know it matches
> their documentation.

agreed. IMHO retaining documentation is best practice.

quite a number of other projects do this, e.g. httpd and most
(ex-jakarta-)commons libraries

httpd stores generated release documentation in subversion. commons
just manually transferred the documentation onto minotaur. IIRC it was
a lot of hassle recovering unarchived release documentation so i now
recommend subversion.

> It seems inefficient to store large generated things in SVN, such as the
> javadocs (these are large numbers of small files) -- but I would be
> happy to learn if I'm worrying about this unnecessarily.

subversion is very different from CVS under the hood. apache uses a
file system backing subversion. this configuration copes very well
with large numbers of files. apache runs a single repository
containing every revision of every apache project past and present.
so, in that context, even a well-documented project (such as httpd)
will only have a relatively small number of documentation files.

subversion is archived. the file system storage copes fine with lots
of files. it is recommended that the documentation is checked out from
subversion (rather than using the subversion URLs directly). website
URLs are mirrored (and worldwide mirroring will be rolled out in 2008)
whereas svn.apache.org is not (currently). it is possible that once
the next version of subversion is released, read-only mirroring of
http://svn.apache.org will be rolled out.

> I can see an argument against something like w.a.o/dist-not-mirrored/ -
> avoiding creating even more "infrastructure stuff".

apache is moving away from dist towards a formal release staging process
(rather than just sync'ing). this will not only improve security but
will also allow scripts to enforce policy. i expect this to happen
sometime in 2008. another directory would just be another problem to
be dealt with.

subversion really is the way to go for release documentation

- robert
