cassandra-commits mailing list archives

From "Brandon Williams (Commented) (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-3740) While using BulkOutputFormat unneccessarily look for the cassandra.yaml file.
Date Thu, 02 Feb 2012 13:41:53 GMT


Brandon Williams commented on CASSANDRA-3740:

bq. What is the significance of "INPUT_INITIAL_THRIFT_ADDRESS" for BulkOutputFormat?

For an output format, this won't be used; it's only for input formats.

bq. Is there any need to provide the listen address of the Hadoop nodes for BulkOutputFormat?
If yes, how do we provide it?

I'm not sure what you mean. Hadoop nodes themselves won't have a listen address, and BOF will
discover the Cassandra nodes' listen addresses via Thrift.

bq. Actually, we are experiencing the problem while loading the data: it fails to connect
if the host the M/R job is running on is dual-stack, i.e. has both IPv4 and IPv6. It also works
when cassandra.yaml is provided; maybe it is reading the listen address or something else from cassandra.yaml.

Hmm, I can't think of any reason it would work with the yaml. Can you give more details
about the setup?
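
One way to gather those details is to check which addresses the JVM resolves for the host running the M/R task. The sketch below is only a diagnostic under stated assumptions: it is not part of BulkOutputFormat or the attached patches, the class name `DualStackCheck` is hypothetical, and it uses only standard `java.net` calls. It prints the value of the standard JVM property `java.net.preferIPv4Stack` and lists each resolved address with its family.

```java
import java.net.Inet4Address;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of Cassandra or Hadoop.
public class DualStackCheck {

    /** Classifies an address as "IPv4", "IPv6", or "unknown". */
    public static String family(InetAddress addr) {
        if (addr instanceof Inet4Address) return "IPv4";
        if (addr instanceof Inet6Address) return "IPv6";
        return "unknown";
    }

    /** Returns one "family address" line per address the resolver reports for the host. */
    public static List<String> describe(String host) throws UnknownHostException {
        List<String> lines = new ArrayList<>();
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            lines.add(family(addr) + " " + addr.getHostAddress());
        }
        return lines;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Default to the local host name when no host is given on the command line.
        String host = args.length > 0 ? args[0] : InetAddress.getLocalHost().getHostName();
        // Prints "null" when the property has not been set on this JVM.
        System.out.println("java.net.preferIPv4Stack="
                + System.getProperty("java.net.preferIPv4Stack"));
        for (String line : describe(host)) {
            System.out.println(line);
        }
    }
}
```

Running it on the task host with and without `-Djava.net.preferIPv4Stack=true` and comparing the output may show whether address resolution differs on the dual-stack machine. It does not by itself explain why supplying cassandra.yaml changes the behaviour.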
> While using BulkOutputFormat  unneccessarily look for the cassandra.yaml file.
> ------------------------------------------------------------------------------
>                 Key: CASSANDRA-3740
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>    Affects Versions: 1.1
>            Reporter: Samarth Gahire
>            Assignee: Brandon Williams
>              Labels: cassandra, hadoop, mapreduce
>             Fix For: 1.1
>         Attachments: 0001-Make-DD-the-canonical-partitioner-source.txt, 0002-Prevent-loading-from-yaml.txt, 0003-use-output-partitioner.txt, 0004-update-BOF-for-new-dir-layout.txt
> I am trying to use BulkOutputFormat to stream data from the map of a Hadoop job. I have
> set the Cassandra-related configuration using ConfigHelper. I have also looked into the Cassandra
> code, and it seems Cassandra has taken care not to look for the cassandra.yaml file.
> But when I run the job, I still get the following error:
> {
> 12/01/13 11:30:04 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 12/01/13 11:30:04 INFO input.FileInputFormat: Total input paths to process : 1
> 12/01/13 11:30:04 INFO mapred.JobClient: Running job: job_201201130910_0015
> 12/01/13 11:30:05 INFO mapred.JobClient:  map 0% reduce 0%
> 12/01/13 11:30:23 INFO mapred.JobClient: Task Id : attempt_201201130910_0015_m_000000_0, Status : FAILED
> java.lang.Throwable: Child Error
>         at
> Caused by: Task process exit with nonzero status of 1.
>         at
> attempt_201201130910_0015_m_000000_0: Cannot locate cassandra.yaml
> attempt_201201130910_0015_m_000000_0: Fatal configuration error; unable to start server.
> }
> Also, please let me know how I can make this cassandra.yaml file available to Hadoop mapreduce

This message is automatically generated by JIRA.