nutch-dev mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Nutch Wiki] Update of "NutchFileFormats" by LewisJohnMcgibbney
Date Fri, 25 Sep 2015 02:31:07 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "NutchFileFormats" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/NutchFileFormats?action=diff&rev1=5&rev2=6

  ./src/java/org/apache/nutch/scoring/webgraph/Node.java
  }}}
  
- = CrawlDB =
+ With the above in mind, let's look at how some of these custom Writables are composed.
  
+ = Writable Composition =
- TODO
- 
- = LinkDB = 
- 
- TODO
- 
- = Segments = 
  
  == org.apache.hadoop.io.Text ==
  
@@ -66, +60 @@

  [[http://hadoop.apache.org/docs/current2/api/index.html?org/apache/hadoop/io/ArrayFile.html|ArrayFile]]
is a specialization of MapFile: a dense file-based mapping from long integer keys to values.
Finally, there is [[http://hadoop.apache.org/docs/current2/api/index.html?org/apache/hadoop/io/SetFile.html|SetFile]],
a file-based set of keys.
  
  Additional files in the [[http://hadoop.apache.org/docs/current2/api/index.html?org/apache/hadoop/io/package-summary.html|org.apache.hadoop.io.*]]
package contain the actual Writer, Reader and Sorter implementations as well.
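
To make the idea of a dense mapping from long integer keys to values more concrete, here is a minimal conceptual sketch in plain Java. It is not the Hadoop ArrayFile implementation and does not use the Hadoop API: the class name `DenseArraySketch`, its methods, and the in-memory byte buffer standing in for the on-disk data file are all assumptions made for illustration. It only shows the general shape, values appended in order, with the key being the position of the value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Conceptual sketch only: hypothetical class, NOT Hadoop's ArrayFile. */
public class DenseArraySketch {
    // In-memory buffer standing in for the data file (an assumption of this sketch).
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(data);
    // Index: position i holds the byte offset of the value stored under key i.
    private final List<Integer> offsets = new ArrayList<>();

    /** Appends a value and returns its key, which is simply the next long in sequence. */
    public long append(String value) {
        try {
            long key = offsets.size();
            offsets.add(out.size());   // remember where this record starts
            out.writeUTF(value);       // serialize the record
            return key;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the value stored under the given key, or null if the key is out of range. */
    public String get(long key) {
        if (key < 0 || key >= offsets.size()) {
            return null;
        }
        try {
            byte[] bytes = data.toByteArray();
            int offset = offsets.get((int) key);
            DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(bytes, offset, bytes.length - offset));
            return in.readUTF();       // deserialize the record at that offset
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        DenseArraySketch file = new DenseArraySketch();
        file.append("http://example.org/a");
        file.append("http://example.org/b");
        System.out.println(file.get(1)); // prints "http://example.org/b"
    }
}
```

Because the keys are dense, the caller never supplies a key when writing; this is the main difference from a general sorted key/value mapping, where each entry carries its own key.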
+ 
+ = CrawlDB =
+ 
+ Content here is under construction.
+ 
+ = LinkDB =
+ 
+ Content here is under construction.
+ 
+ = Segments =
  
  When Nutch crawls the web, each resulting segment has four subdirectories, each containing
an ArrayFile (a MapFile having keys that are long integers):
  
