Hello Thunder,

We don't use the hive branch underneath the current Calliope release, as that release focuses on Spark and Cassandra integration. In the next EA release, coming later this month, we plan to bring in the cas-handler to support Shark on Cassandra.

Regards,
Rohit


On Mon, Oct 28, 2013 at 9:53 PM, Thunder Stumpges <thunder.stumpges@gmail.com> wrote:
This is great. I've been following this thread quietly, very interested!

We are using Cassandra (v2.0.1) with CQL3 and composite primary keys,
with good success, from our application servers. We also have
Hadoop/Hive, but haven't been able to get Spark into production yet,
given how busy we have been.

Just Friday I found https://github.com/milliondreams/hive.git, which
appears to be a current connector for C* with Hadoop. Rohit, it looks
like you're active on that project as well. Does Calliope use this
library underneath?

Thanks, great group here. Very excited to use Spark and Spark
Streaming in the very near future!

-Thunder



On Sun, Oct 27, 2013 at 11:53 PM, Rohit Rai <rohit@tuplejump.com> wrote:
> Gary,
>
> As Patrick suggests, you can read from HDFS to create an RDD and then
> output that RDD to C*.
>
> On writing to C*, look at the Cassandra example here -
> https://github.com/apache/incubator-spark/blob/master/examples/src/main/scala/org/apache/spark/examples/CassandraTest.scala
>
> Of interest are lines 104 to 127, which show how to transform an RDD
> into C* mutations.
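>
> Condensed from that example, the core of the write path looks roughly
> like this (a sketch, not a verbatim copy: countsRDD, the column name
> and the job object are placeholders, and the keyspace/column family
> really come from the Hadoop job config):
>
> import java.nio.ByteBuffer
> import scala.collection.JavaConverters._
> import org.apache.spark.SparkContext._
> import org.apache.cassandra.hadoop.ColumnFamilyOutputFormat
> import org.apache.cassandra.thrift.{Column, ColumnOrSuperColumn, Mutation}
> import org.apache.cassandra.utils.ByteBufferUtil
>
> // Map each (word, count) pair to (row key, Thrift mutations for that row).
> val mutationsRDD = countsRDD.map { case (word, count) =>
>   val col = new Column()
>   col.setName(ByteBufferUtil.bytes("count"))
>   col.setValue(ByteBufferUtil.bytes(count.toLong))
>   col.setTimestamp(System.currentTimeMillis)
>
>   val m = new Mutation()
>   m.setColumn_or_supercolumn(new ColumnOrSuperColumn().setColumn(col))
>
>   (ByteBufferUtil.bytes(word), Seq(m).asJava)
> }
>
> // ColumnFamilyOutputFormat reads the keyspace/CF from the job config
> // (ConfigHelper.setOutputColumnFamily etc., set up as in the example).
> mutationsRDD.saveAsNewAPIHadoopFile(
>   "casDemo",                          // keyspace name, used as the "path"
>   classOf[ByteBuffer],
>   classOf[java.util.List[Mutation]],
>   classOf[ColumnFamilyOutputFormat],
>   job.getConfiguration())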
>
> <shameless_plug>
> If you would like your analytics team to be able to do the transforms
> without worrying about mutations and such, I'll again suggest taking a
> look at Calliope, where you can provide the transforms as implicits in
> the shell so they don't even need to know about them.
>
> You can additionally provide the Cassandra config as predefined
> variables, so all the analytics guys need to know is that they are
> writing to C*.
>
> Of course, you can already do all that without Calliope too; it just
> makes your work easier. ;)
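>
> To make that concrete, here is a toy sketch of the shell-setup idea.
> This is NOT Calliope's actual API (the real thing is in the docs
> linked below); every name in it is made up for illustration:
>
> import java.nio.ByteBuffer
> import org.apache.cassandra.utils.ByteBufferUtil.bytes
> import org.apache.spark.rdd.RDD
>
> // Hypothetical type class: how to turn a T into a (row key, columns) pair.
> trait CasWriter[T] {
>   def toRow(t: T): (ByteBuffer, Map[String, ByteBuffer])
> }
>
> object ShellPredefs {
>   // Defined once by whoever prepares the shell; analysts never see it.
>   implicit val wordCountWriter: CasWriter[(String, Long)] =
>     new CasWriter[(String, Long)] {
>       def toRow(wc: (String, Long)) =
>         (bytes(wc._1), Map("count" -> bytes(wc._2)))
>     }
>
>   // All an analyst ever calls; the implicit supplies the transform.
>   def saveToCas[T](rdd: RDD[T])(implicit w: CasWriter[T]): Unit = {
>     val rows = rdd.map(w.toRow)
>     // ... build Thrift mutations from rows and save them via
>     // ColumnFamilyOutputFormat, as in the snippet further up ...
>   }
> }
>
> After import ShellPredefs._, an analyst in the shell just writes
> saveToCas(myRdd) and never touches a mutation.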
>
> If you want to use Calliope, you can read about writing with it here -
> http://tuplejump.github.io/calliope/show-me-the-code.html
>
> And if you really don't want to sign up for the early access release,
> you can get the G.A. release, along with the source and instructions
> for getting the binaries, from here -
> https://github.com/tuplejump/calliope-release
>
> </shameless_plug>
>
> Regards,
> Rohit
> founder @ tuplejump
>
>
>
>
> On Sun, Oct 27, 2013 at 10:44 AM, Patrick Wendell <pwendell@gmail.com>
> wrote:
>>
>> Hey Rohit,
>>
>> A single SparkContext can be used to read and write data in different
>> storage systems, including HDFS and Cassandra. For instance, you could
>> do this:
>>
>> val rdd1 = sc.textFile(XXX)  // some text file in HDFS
>> // save into a Cassandra column family (see the Cassandra example)
>> rdd1.saveAsNewAPIHadoopFile(..., classOf[ColumnFamilyOutputFormat], ...)
>>
>> This is a common pattern when using Spark for ETL between different
>> storage systems.
>>
>> - Patrick
>>
>>
>> On Sat, Oct 26, 2013 at 7:31 PM, Gary Malouf <malouf.gary@gmail.com>
>> wrote:
>>>
>>> Hi Rohit,
>>>
>>> We are big users of the Spark Shell - it is used by our analytics team
>>> for the same purposes that Hive used to be. The SparkContext that is
>>> provided at startup would, I guess, have to be tied to either HDFS or
>>> Cassandra - I take it we would then manually create a second context?
>>>
>>> Thanks,
>>>
>>> Gary
>>>
>>>
>>> On Sat, Oct 26, 2013 at 1:07 PM, Rohit Rai <rohit@tuplejump.com> wrote:
>>>>
>>>> Hello Gary,
>>>>
>>>> This is very easy to do. You can read your data from HDFS using a
>>>> FileInputFormat, transform it into the desired rows, and write to
>>>> Cassandra using ColumnFamilyOutputFormat.
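>>>>
>>>> The OutputFormat needs a Hadoop job configuration telling it where to
>>>> write. Roughly, following the CassandraTest example in the Spark repo
>>>> (a sketch: host, port, keyspace and column family are placeholders):
>>>>
>>>> import org.apache.hadoop.mapreduce.Job
>>>> import org.apache.cassandra.hadoop.{ColumnFamilyOutputFormat, ConfigHelper}
>>>>
>>>> val job = new Job()
>>>> job.setOutputFormatClass(classOf[ColumnFamilyOutputFormat])
>>>> ConfigHelper.setOutputInitialAddress(job.getConfiguration(), "cas-host")
>>>> ConfigHelper.setOutputRpcPort(job.getConfiguration(), "9160")
>>>> ConfigHelper.setOutputColumnFamily(job.getConfiguration(),
>>>>   "myKeyspace", "myColumnFamily")
>>>> ConfigHelper.setOutputPartitioner(job.getConfiguration(),
>>>>   "Murmur3Partitioner")
>>>>
>>>> // job.getConfiguration() is then what you pass to
>>>> // saveAsNewAPIHadoopFile along with an RDD of (row key, mutations).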
>>>>
>>>> Our library Calliope (Apache licensed),
>>>> http://tuplejump.github.io/calliope/, can make the task of writing to
>>>> C* easier.
>>>>
>>>>
>>>> In case you don't want to convert the data to rows and would rather
>>>> keep it as files in Cassandra, our lightweight, Cassandra-backed,
>>>> HDFS-compatible filesystem SnackFS can help you. SnackFS will be part
>>>> of the next Calliope release later this month, but we can provide you
>>>> early access if you would like to try it out.
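>>>>
>>>> Since SnackFS is HDFS-compatible, the intent is that existing Spark
>>>> code keeps working with only the URI changing, along the lines of
>>>> (the snackfs:// scheme and path here are illustrative, not a
>>>> documented API):
>>>>
>>>> // hypothetical: read files that live in Cassandra through SnackFS
>>>> val logs = sc.textFile("snackfs://cas-host:9000/data/logs")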
>>>>
>>>> Feel free to mail me directly in case you need any assistance.
>>>>
>>>>
>>>> Regards,
>>>> Rohit
>>>> founder @ tuplejump
>>>>
>>>>
>>>>
>>>>
>>>> On Sat, Oct 26, 2013 at 5:45 AM, Gary Malouf <malouf.gary@gmail.com>
>>>> wrote:
>>>>>
>>>>> We have a use case in which much of our raw data is stored in HDFS
>>>>> today.  We'd like to write our Spark jobs such that they read/aggregate data
>>>>> from HDFS and can output to our Cassandra cluster.
>>>>>
>>>>> Is there any way of doing this in Spark 0.7.3?
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> ____________________________
>>>> www.tuplejump.com
>>>> The Data Engineering Platform
>>>
>>>
>>
>
>
>
> --
>
> ____________________________
> www.tuplejump.com
> The Data Engineering Platform



--

____________________________
www.tuplejump.com
The Data Engineering Platform