spark-user mailing list archives

From Ankur Srivastava <ankur.srivast...@gmail.com>
Subject Re: Issue writing to Cassandra from Spark
Date Tue, 13 Jan 2015 17:05:02 GMT
I realized that I was running the cluster with
spark.cassandra.output.concurrent.writes=2; changing it to 1 did the trick.
The issue was that Spark was producing data at a much higher rate than our
small Cassandra cluster could absorb, so lowering the property to 1 fixed
the issue for us.
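The mechanism behind this fix can be sketched with plain `java.util.concurrent`: a semaphore with N permits caps in-flight writes the same way `spark.cassandra.output.concurrent.writes` caps concurrent batches per task. This is an illustrative model, not connector code; the class and method names below are made up.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a write throttle: no matter how fast the producer
// side submits work, at most `permits` writes are in flight at once.
public class WriteThrottle {
    public static int maxObservedConcurrency(int permits, int writes)
            throws InterruptedException {
        Semaphore slots = new Semaphore(permits);   // plays the role of concurrent.writes
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(8); // fast "producer" side
        for (int i = 0; i < writes; i++) {
            pool.submit(() -> {
                try {
                    slots.acquire();                // block until a write slot frees up
                    int now = inFlight.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(5);                // pretend the slow cluster is writing
                    inFlight.decrementAndGet();
                    slots.release();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return peak.get();                          // highest concurrency ever observed
    }

    public static void main(String[] args) throws InterruptedException {
        // With 1 permit, writes are fully serialized regardless of producer speed.
        System.out.println(maxObservedConcurrency(1, 20));
    }
}
```

With one permit the writes serialize completely, which is exactly the effect the setting had on the overloaded cluster: throughput drops, but the server is never asked to absorb more than it can handle at once.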

Thanks
Ankur

On Mon, Jan 12, 2015 at 9:04 AM, Ankur Srivastava <
ankur.srivastava@gmail.com> wrote:

> Hi Akhil,
>
> Thank you for the pointers. Below is how we are saving data to Cassandra.
>
> javaFunctions(rddToSave)
>     .writerBuilder(datapipelineKeyspace, datapipelineOutputTable,
>                    mapToRow(Sample.class))
>     .saveToCassandra();
>
> The data we are saving at this stage is ~200 million rows.
>
> How do we control application threads in Spark so that it does not exceed
> "rpc_max_threads"? We are running with the default value of this property
> in cassandra.yaml. I have already set these
> two properties for the Spark-Cassandra connector:
>
> spark.cassandra.output.batch.size.rows=1
> spark.cassandra.output.concurrent.writes=1
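For context, the cap being discussed lives on the Cassandra side, in cassandra.yaml; stock configs ship it commented out. A sketch (the value shown is the commented-out default from shipped 2.x configs; verify against your own yaml):

```yaml
# cassandra.yaml -- server-side cap on concurrent RPC request threads.
# Shipped configs leave this commented out; uncomment to enforce a hard cap.
rpc_max_threads: 2048
```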
>
> Thanks
> - Ankur
>
>
> On Sun, Jan 11, 2015 at 10:16 PM, Akhil Das <akhil@sigmoidanalytics.com>
> wrote:
>
>> I see, can you paste the piece of code? It's probably because you are
>> exceeding the number of connections specified in the rpc_max_threads
>> property. Make sure you close all the connections properly.
>>
>> Thanks
>> Best Regards
>>
>> On Mon, Jan 12, 2015 at 7:45 AM, Ankur Srivastava <
>> ankur.srivastava@gmail.com> wrote:
>>
>>> Hi Akhil, thank you for your response.
>>>
>>> Actually we are first reading from Cassandra and then writing back after
>>> doing some processing. All the reader stages succeed with no errors, and
>>> many writer stages also succeed, but many fail as well.
>>>
>>> Thanks
>>> Ankur
>>>
>>> On Sat, Jan 10, 2015 at 10:15 PM, Akhil Das <akhil@sigmoidanalytics.com>
>>> wrote:
>>>
>>>> Just make sure you are not connecting to the old RPC port (9160); the
>>>> new binary port runs on 9042.
>>>>
>>>> What is your rpc_address listed in cassandra.yaml? Also make sure you
>>>> have start_native_transport: true in the yaml file.
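The settings in question sit together in cassandra.yaml and look roughly like this (the address below is a placeholder, not a recommended value):

```yaml
# cassandra.yaml -- native (CQL binary) transport, used by the Spark connector
start_native_transport: true
native_transport_port: 9042   # the port the connector should target
rpc_address: 192.168.1.10     # placeholder: the interface clients connect to
rpc_port: 9160                # legacy Thrift port, not the one to use here
```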
>>>>
>>>> Thanks
>>>> Best Regards
>>>>
>>>> On Sat, Jan 10, 2015 at 8:44 AM, Ankur Srivastava <
>>>> ankur.srivastava@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We are currently using Spark to join data in Cassandra and then write
>>>>> the results back into Cassandra. While reads happen without any error,
>>>>> during the writes we see many exceptions like the one below. Our
>>>>> environment details are:
>>>>>
>>>>> - Spark v 1.1.0
>>>>> - spark-cassandra-connector-java_2.10 v 1.1.0
>>>>>
>>>>> We are using below settings for the writer
>>>>>
>>>>> spark.cassandra.output.batch.size.rows=1
>>>>>
>>>>> spark.cassandra.output.concurrent.writes=1
>>>>>
>>>>> com.datastax.driver.core.exceptions.NoHostAvailableException: All
>>>>> host(s) tried for query failed (tried: [] - use getErrors() for details)
>>>>>   at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108)
>>>>>   at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179)
>>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>   at java.lang.Thread.run(Thread.java:745)
>>>>>
>>>>> Thanks
>>>>>
>>>>> Ankur
>>>>>
>>>>
>>>>
>>>
>>
>
