nutch-dev mailing list archives

From "Lewis John McGibbney (JIRA)" <>
Subject [jira] [Commented] (NUTCH-881) Good quality documentation for Nutch
Date Tue, 09 Aug 2011 14:04:27 GMT


Lewis John McGibbney commented on NUTCH-881:

In Nutch trunk we currently have only the wiki as a repository for Nutch 2.0 information.
Is this satisfactory?

As far as I can tell, the documentation for Gora trunk is produced using Apache Forrest. I
am reasonably familiar with Forrest, and it would be a great benefit, as well as lessening
the burden on the mailing lists, if we could maintain a clean distribution of documentation
bundled into a /trunk/docs and/or branch-1.4/docs directory from now on and for all
future official releases.

I think the only addition we require to the documentation on the website is a formal tutorial
(available as part of the Apache Nutch website). We would need to add it to the /site resources,
maintain it, and direct users to it as a one-stop resource for Nutch branch/tags, with a
similar but separate resource for trunk.

With specific reference to Nutch trunk: by comparison, the Gora team has provided a
quick-start guide followed by a more in-depth tutorial, an approach we could apply to
both branch-1.4 and 2.0 trunk. The quick-start guide would only show users how to get trunk
up and running, and the formal tutorial would then provide in-depth documentation on completing
a crawl with either Nutch 1.4 or trunk 2.0. Does this sound reasonable?

Andrzej provided some good comments in the correspondence on NUTCH-881 which should be addressed
in any comprehensive documentation. I am very happy, and keen, to get this issue resolved,
but I think we first need to agree on the specific tasks to be addressed, laying out a path
that covers everything this issue encompasses.

> Good quality documentation for Nutch
> ------------------------------------
>                 Key: NUTCH-881
>                 URL:
>             Project: Nutch
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 2.0
>            Reporter: Andrzej Bialecki 
> This is, and has been, a long-standing request from Nutch users. It becomes an acute
> need as we redesign Nutch 2.0, because the collective knowledge and the Wiki will no longer
> be useful without a massive amount of editing.
> IMHO the reference documentation should be in SVN, and not on the Wiki: the Wiki is
> good for casual information and recipes, but I think it's too messy and not reliable enough
> as a reference.
> I propose to start with the following:
>  1. let's decide on the format of the docs. Each format has its own pros and cons:
>   * HTML: easy to work with, but formatting may be messy unless we edit it by hand, at
which point it's no longer so easy... Good toolchains to convert to other formats, but limited
expressiveness of larger structures (e.g. book, chapters, TOC, multi-column layouts, etc).
>   * DocBook: learning curve is higher, but not insurmountable... Naturally yields very
>     good structure. Figures/diagrams may be problematic: different renderers (HTML, PDF) tend
>     to treat scaling and placement somewhat differently.
>   * Wiki-style (Confluence or TWiki): easy to use, but limited control over larger structures.
>     Maven Doxia can convert cwiki, twiki, and a host of other formats to, e.g., HTML and PDF.
>   * other?
>  2. start documenting the main tools and the main APIs (e.g. the plugins and all the
extension points). We can of course reuse material from the Wiki and from various presentations
(e.g. the ApacheCon slides).
