nutch-dev mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Nutch Wiki] Trivial Update of "NutchFileFormats" by LewisJohnMcgibbney
Date Fri, 25 Sep 2015 02:35:03 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "NutchFileFormats" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/NutchFileFormats?action=diff&rev1=6&rev2=7

  
  To economize the handling of large data volumes, [[http://hadoop.apache.org/docs/current2/api/index.html?org/apache/hadoop/io/MapFile.html|MapFile]]
manages a mapping as two separate files in a subdirectory of its own. The large "data" file
stores all keys and values, sorted by the key. The much smaller "index" file points to byte
offsets in the data file for a small sample of keys. Only the index file is read into memory.
  
- [[http://hadoop.apache.org/docs/current2/api/index.html?org/apache/hadoop/io/ArrayFile.html|ArrayFile]]
is a specialization of MapFile, specifically a dense file-based mapping from integers to values
where the keys are long integers. Finally you can also see [[http://hadoop.apache.org/docs/current2/api/index.html?org/apache/hadoop/io/SetFile.html|SetFile]
which is a file representing a file-based set of keys.
+ [[http://hadoop.apache.org/docs/current2/api/index.html?org/apache/hadoop/io/ArrayFile.html|ArrayFile]]
is a specialization of MapFile, specifically a dense file-based mapping from integers to values
where the keys are long integers. Finally you can also see [[http://hadoop.apache.org/docs/current2/api/index.html?org/apache/hadoop/io/SetFile.html|SetFile]]
which is a file representing a file-based set of keys.
  
  Additional files in [[http://hadoop.apache.org/docs/current2/api/index.html?org/apache/hadoop/io/package-summary.html|org.apache.hadoop.io.*]]
package contains the actual Writer, Reader and Sorter implementations as well.
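
To illustrate the data/index split described for MapFile above, here is a minimal Python sketch. It does not use the Hadoop API and does not reproduce the real on-disk format. The names `write_mapfile`, `lookup` and `INDEX_INTERVAL`, the tab-separated record layout, and the sampling rate are all assumptions made for the example. It only shows the idea: a sorted data stream plus a small in-memory index of byte offsets for a sample of keys.

```python
import bisect
import io

# Hypothetical sampling rate: every 4th key gets an index entry.
INDEX_INTERVAL = 4


def write_mapfile(records, interval=INDEX_INTERVAL):
    """Serialize (key, value) string pairs that are already sorted by key.

    Returns (data_bytes, index), where index is a list of
    (key, byte_offset) entries for every `interval`-th record.
    """
    data = io.BytesIO()
    index = []
    previous_key = None
    for i, (key, value) in enumerate(records):
        if previous_key is not None and key < previous_key:
            raise ValueError("keys must be appended in sorted order")
        if i % interval == 0:
            # Remember where this sampled record starts in the data stream.
            index.append((key, data.tell()))
        data.write(f"{key}\t{value}\n".encode("utf-8"))
        previous_key = key
    return data.getvalue(), index


def lookup(data, index, key):
    """Return the value stored for `key`, or None if it is absent."""
    sampled_keys = [k for k, _ in index]
    # Find the last sampled key that is <= the requested key.
    pos = bisect.bisect_right(sampled_keys, key) - 1
    if pos < 0:
        return None  # key sorts before the first record
    stream = io.BytesIO(data)
    stream.seek(index[pos][1])
    # Scan forward from the sampled offset until the key is found or passed.
    for line in stream:
        k, v = line.decode("utf-8").rstrip("\n").split("\t", 1)
        if k == key:
            return v
        if k > key:
            break
    return None


records = [(f"url{i:03d}", f"page{i}") for i in range(10)]
data, index = write_mapfile(records)
print(len(index), lookup(data, index, "url005"))
```

In this sketch only the three sampled keys are held in the index. A lookup seeks to the nearest preceding sample and then reads at most `INDEX_INTERVAL` records from the data stream.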
  
