nutch-dev mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Nutch Wiki] Trivial Update of "GORA_HBase" by LewisJohnMcgibbney
Date Wed, 13 Jun 2012 10:57:43 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "GORA_HBase" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/GORA_HBase?action=diff&rev1=11&rev2=12

- This document describes how to get Nutch to use HBase as a backend for GORA and is based
on the revision 993857 of the Nutch trunk
+ = Nutch 2.0 Tutorial =
+ {{http://www.interadvertising.co.uk/files/nutch_logo_medium.gif}} {{http://gora.apache.org/images/gora-logo.png}}
{{http://hbase.apache.org/images/hbase_logo.png}}
  
-  * Install and configure HBase 0.20.6. You can check it out from [[http://svn.apache.org/repos/asf/hbase/tags/0.20.6/|here]]
('''N.B.''' It is important that you grab HBase version 0.20.6, as this is the version supported by Gora)
+ This document describes how to get Nutch 2.0 to use HBase as a storage backend for Gora.
+ 
+  * Install and configure HBase. You can get it [[http://www.apache.org/dyn/closer.cgi/hbase/|here]]
('''N.B.''' Gora 0.2 uses HBase 0.90.4; however, the setup is known to work with more recent
versions of HBase.)
   * Specify the GORA backend in nutch-site.xml
  
  {{{
@@ -12, +15 @@

   <description>Default class for storing data</description>
  </property>
  }}}
- Note: Currently HBaseStore is NOT YET THREAD-SAFE, so all processes should have single threaded
settings (i.e. set number of fetchers to 1). Work to make it thread-safe is in progress.
  
   * Compile Nutch by running ant runtime
   * Make sure HBase is started and working properly as per the quick start tutorial [[http://hbase.apache.org/book/quickstart.html|here]]
@@ -24, +26 @@

    nutch readdb
  }}}
  
- You should find more details in the logs on ''$NUTCH_HOME/runtime/local/logs/hadoop.log''
+ You should find more details in the logs at ''$NUTCH_HOME/runtime/local/logs/hadoop.log''.
  
+ For more details of the command line interface options, please see [[http://wiki.apache.org/nutch/CommandLineOptions|here]],
or run ./bin/nutch, which prints usage to stdout.
+ Finally, for a more detailed Nutch (1.X) tutorial, please see [[http://wiki.apache.org/nutch/NutchTutorial|here]].
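
The nutch-site.xml step above is cut off by the diff context, so the property name and value are not visible in this message. As a rough sketch only, assuming the usual Nutch 2.x property name storage.data.store.class and Gora's HBaseStore class (neither appears in this message, so check both against nutch-default.xml in your own checkout), the full entry might look like this:

```xml
<!-- Sketch of the nutch-site.xml entry; the name and value below are
     assumptions and are not shown in the diff. Only the description
     line is taken from this message. -->
<property>
  <name>storage.data.store.class</name>
  <value>org.apache.gora.hbase.store.HBaseStore</value>
  <description>Default class for storing data</description>
</property>
```

The element belongs inside the top-level configuration element of nutch-site.xml and should be in place before running ant runtime, so that the built runtime picks it up.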
+ 
+ '''back to FrontPage'''
+ 
