spark-user mailing list archives

From Ankur Srivastava <ankur.srivast...@gmail.com>
Subject Re: Issue writing to Cassandra from Spark
Date Mon, 12 Jan 2015 17:04:16 GMT
Hi Akhil,

Thank you for the pointers. Below is how we are saving data to Cassandra.

javaFunctions(rddToSave).writerBuilder(datapipelineKeyspace,
  datapipelineOutputTable, mapToRow(Sample.class))

The data we are saving at this stage is ~200 million rows.

How do we control application threads in Spark so that they do not exceed
"rpc_max_threads"? We are running with the default value of this property in
cassandra.yaml. I have already set these two properties for the
Spark-Cassandra connector:

spark.cassandra.output.batch.size.rows=1
spark.cassandra.output.concurrent.writes=1

Thanks
- Ankur
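
For readers following the thread, here is a minimal sketch of how the two connector properties mentioned above might be collected in one place and rendered as `--conf` flags for spark-submit. The class name `WriterSettings` and both helper methods are hypothetical names chosen for illustration; only the two property keys and their values come from this message. It uses only the Java standard library, so it runs without Spark on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Spark or the Cassandra connector.
class WriterSettings {

    // The two writer properties quoted in this thread, in a stable order.
    static Map<String, String> cassandraWriterSettings() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("spark.cassandra.output.batch.size.rows", "1");
        settings.put("spark.cassandra.output.concurrent.writes", "1");
        return settings;
    }

    // Renders each entry as a "--conf key=value" flag, separated by spaces.
    static String toSubmitFlags(Map<String, String> settings) {
        StringBuilder flags = new StringBuilder();
        for (Map.Entry<String, String> e : settings.entrySet()) {
            if (flags.length() > 0) {
                flags.append(' ');
            }
            flags.append("--conf ").append(e.getKey()).append('=').append(e.getValue());
        }
        return flags.toString();
    }

    public static void main(String[] args) {
        // With Spark available, each entry could instead be applied to a
        // SparkConf via its set(key, value) method before creating the context.
        System.out.println(toSubmitFlags(cassandraWriterSettings()));
    }
}
```

Keeping the settings in a single map is only a convenience; it makes it easier to confirm that the values the job actually runs with match the ones quoted in the thread.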


On Sun, Jan 11, 2015 at 10:16 PM, Akhil Das <akhil@sigmoidanalytics.com>
wrote:

> I see, can you paste the piece of code? It's probably because you are
> exceeding the number of connections specified in the
> property rpc_max_threads. Make sure you close all the connections properly.
>
> Thanks
> Best Regards
>
> On Mon, Jan 12, 2015 at 7:45 AM, Ankur Srivastava <
> ankur.srivastava@gmail.com> wrote:
>
>> Hi Akhil, thank you for your response.
>>
>> Actually, we are first reading from Cassandra and then writing back after
>> doing some processing. All the reader stages succeed with no error, and many
>> writer stages also succeed, but many fail as well.
>>
>> Thanks
>> Ankur
>>
>> On Sat, Jan 10, 2015 at 10:15 PM, Akhil Das <akhil@sigmoidanalytics.com>
>> wrote:
>>
>>> Just make sure you are not connecting to the old RPC port (9160); the new
>>> binary port is running on 9042.
>>>
>>> What is your rpc_address listed in cassandra.yaml? Also make sure you
>>> have start_native_transport: true in the yaml file.
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Sat, Jan 10, 2015 at 8:44 AM, Ankur Srivastava <
>>> ankur.srivastava@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We are currently using Spark to join data in Cassandra and then write
>>>> the results back into Cassandra. While reads happen without any error,
>>>> during the writes we see many exceptions like the one below. Our
>>>> environment details are:
>>>>
>>>> - Spark v 1.1.0
>>>> - spark-cassandra-connector-java_2.10 v 1.1.0
>>>>
>>>> We are using the settings below for the writer:
>>>>
>>>> spark.cassandra.output.batch.size.rows=1
>>>> spark.cassandra.output.concurrent.writes=1
>>>>
>>>> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: [] - use getErrors() for details)
>>>>     at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108)
>>>>     at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179)
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>
>>>> Thanks
>>>>
>>>> Ankur
>>>>
>>>
>>>
>>
>
