cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Trivial Update of "HadoopSupport" by jeremyhanna
Date Wed, 16 Jun 2010 23:02:27 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "HadoopSupport" page has been changed by jeremyhanna.
http://wiki.apache.org/cassandra/HadoopSupport?action=diff&rev1=10&rev2=11

--------------------------------------------------

              SlicePredicate predicate = new SlicePredicate().setColumn_names(Arrays.asList(columnName.getBytes()));
              ConfigHelper.setSlicePredicate(job.getConfiguration(), predicate);
  }}}
- Cassandra's splits are location-aware (this is the nature of the Hadoop InputSplit design).
 Cassandra  gives the Hadoop !JobTracker a list of locations with each split of data.  That
way, the !JobTracker can try to preserve data locality when  assigning tasks to !TaskTrackers.
 Therefore, when using Hadoop alongside  Cassandra, it is best to have a !TaskTracker running
on the same node as  the Cassandra nodes, if data locality while processing is desired and
to  minimize copying data between Cassandra and Hadoop nodes.
+ Cassandra's splits are location-aware (this is the nature of the Hadoop [[http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/InputSplit.html|InputSplit]]
design).  Cassandra  gives the Hadoop [[http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/JobTracker.html|JobTracker]]
a list of locations with each split of data.  That way, the !JobTracker can try to preserve
data locality when  assigning tasks to [[http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/TaskTracker.html|TaskTracker]]s.
 Therefore, when using Hadoop alongside  Cassandra, it is best to have a !TaskTracker running
on the same node as  the Cassandra nodes, if data locality while processing is desired and
to  minimize copying data between Cassandra and Hadoop nodes.
  
  As of 0.7, there will be a basic mechanism included in Cassandra for  outputting data to
cassandra.  See [[https://issues.apache.org/jira/browse/CASSANDRA-1101|CASSANDRA-1101]]  for
details.
  
- Releases before  0.6.2/0.7 are affected by a small  resource leak that may cause jobs to
fail (connections are not released  properly, causing a resource leak). Depending on your
local setup you  may hit this issue, and workaround it by raising the limit of open file 
descriptors for the process (e.g. in linux/bash using `ulimit -n 32000`).  The error will
be reported on  the hadoop job side as a thrift TimedOutException.
+ Releases before  0.6.2/0.7 are affected by a small  resource leak that may cause jobs to
fail (connections are not released  properly, causing a resource leak). Depending on your
local setup you  may hit this issue, and workaround it by raising the limit of open file 
descriptors for the process (e.g. in linux/bash using `ulimit -n 32000`).  The error will
be reported on  the hadoop job side as a thrift !TimedOutException.
  
  If you are testing the integration against a single node and you obtain  some failures,
this may be normal: you are probably overloading the  single machine, which may again result
in timeout errors. You can  workaround it by reducing the number of concurrent tasks
  
@@ -31, +31 @@

               ConfigHelper.setRangeBatchSize(job.getConfiguration(), 1000);
  }}}
  == Pig ==
- Cassandra 0.6 also adds support for [[http://hadoop.apache.org/pig/|Pig]] with its own implementation
of LoadFunc.  This allows Pig queries to be run against data stored in Cassandra.  For an
example of this, see the contrib/pig example in 0.6 and later.
+ Cassandra 0.6 also adds support for [[http://hadoop.apache.org/pig/|Pig]] with its own implementation
of [[http://hadoop.apache.org/pig/docs/r0.7.0/api/org/apache/pig/LoadFunc.html|LoadFunc]].
 This allows Pig queries to be run against data stored in Cassandra.  For an example of this,
see the contrib/pig example in 0.6 and later.
  
  == Hive ==
  Hive is currently not supported in Cassandra but there has been thought given to support
Hive in the future - [[https://issues.apache.org/jira/browse/CASSANDRA-913|CASSANDRA-913]]
