spark-user mailing list archives

From Oleg Ruchovets <oruchov...@gmail.com>
Subject Re: pyspark and cassandra
Date Wed, 10 Sep 2014 15:31:56 GMT
Hi,
  I am evaluating different options for Spark + Cassandra and have a couple
of additional questions.
  My aim is to use Cassandra only, without Hadoop:
  1) Is it possible to use only Cassandra as the input/output for PySpark?
  2) In case I use Spark (Java, Scala), is it possible to use only
Cassandra for input/output, without Hadoop?
  3) I know there are a couple of strategies for the storage level. In case
my data set is quite big and I don't have enough memory to process it, can
I use the DISK_ONLY option without Hadoop (having only Cassandra)?

Thanks
Oleg

On Wed, Sep 3, 2014 at 3:08 AM, Kan Zhang <kzhang@apache.org> wrote:

> In Spark 1.1, it is possible to read from Cassandra using Hadoop jobs. See
> examples/src/main/python/cassandra_inputformat.py for an example. You may
> need to write your own key/value converters.
>
>
> On Tue, Sep 2, 2014 at 11:10 AM, Oleg Ruchovets <oruchovets@gmail.com>
> wrote:
>
>> Hi All ,
>>    Is it possible to have Cassandra as input data for PySpark? I found an
>> example for Java -
>> http://java.dzone.com/articles/sparkcassandra-stack-perform?page=0,0 and
>> I am looking for something similar for Python.
>>
>> Thanks
>> Oleg.
>>
>
>
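For reference, the approach Kan points to - reading a Cassandra column family into PySpark through the Hadoop InputFormat API - looks roughly like the sketch below. It loosely follows Spark 1.1's examples/src/main/python/cassandra_inputformat.py and assumes the Thrift-based CqlPagingInputFormat plus the key/value converters shipped in the Spark examples jar; the host, keyspace, and column-family names are placeholders.

```python
def make_cassandra_conf(host, keyspace, column_family):
    """Build the Hadoop job configuration that CqlPagingInputFormat reads.

    The keys below follow Cassandra's Hadoop integration; the port is the
    default Thrift port used by Spark 1.1's cassandra_inputformat.py example.
    """
    return {
        "cassandra.input.thrift.address": host,
        "cassandra.input.thrift.port": "9160",
        "cassandra.input.keyspace": keyspace,
        "cassandra.input.columnfamily": column_family,
        "cassandra.input.partitioner.class": "Murmur3Partitioner",
        "cassandra.input.page.row.size": "3",
    }


def load_cassandra_rdd(sc, host, keyspace, column_family):
    """Return an RDD of (key, value) dicts read from Cassandra.

    `sc` is an already-created pyspark.SparkContext. The converter class
    names assume the Spark examples jar is on the driver/executor classpath;
    as Kan notes, you may need to write your own converters.
    """
    conf = make_cassandra_conf(host, keyspace, column_family)
    return sc.newAPIHadoopRDD(
        "org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat",
        "java.util.Map",
        "java.util.Map",
        keyConverter="org.apache.spark.examples.pythonconverters."
                     "CassandraCQLKeyConverter",
        valueConverter="org.apache.spark.examples.pythonconverters."
                       "CassandraCQLValueConverter",
        conf=conf)
```

Submitted with spark-submit (adding the examples jar via --jars so the converters resolve), this reads rows without any HDFS involvement - Hadoop is used only as the InputFormat API, which also answers question 3: storage levels such as DISK_ONLY spill to the workers' local disks, not to HDFS.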
