spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sanjay Subramanian <sanjaysubraman...@yahoo.com.INVALID>
Subject Re: A spark newbie question
Date Sun, 04 Jan 2015 20:41:34 GMT
val sconf = new SparkConf().setMaster("local").setAppName("MedicalSideFx-CassandraLogsMessageTypeCount")
val sc = new SparkContext(sconf)val inputDir = "/path/to/cassandralogs.txt"

sc.textFile(inputDir).map(line => line.replace("\"", "")).map(line => (line.split('
')(0) + " " + line.split(' ')(2), 1)).reduceByKey((v1,v2) => v1+v2).collect().foreach(println)
If u want to save the file ==========================val outDir = "/path/to/output/dir/cassandra_logs"
var outFile = outDir+"/"+"sparkout_" + System.currentTimeMillis

sc.textFile(inputDir).map(line => line.replace("\"", "")).map(line => (line.split('
')(0) + " " + line.split(' ')(2), 1)).reduceByKey((v1,v2) => v1+v2).saveToTextFile(outFile)
The code is here (not elegant :-) but works) https://raw.githubusercontent.com/sanjaysubramanian/msfx_scala/master/src/main/scala/org/medicalsidefx/common/utils/CassandraLogsMessageTypeCount.scala
OUTPUT=======(2014-06-27 PAUSE,1)(2014-06-27 START,2)(2014-06-27 STOP,1)(2014-06-25 STOP,1)(2014-06-27
RESTART,1)(2014-06-27 REWIND,2)(2014-06-25 START,3)(2014-06-25 PAUSE,1)
hope this helps. 
Since u r new to Spark , it may help to learn using an IDE. I use IntelliJ and have many examples
posted here.https://github.com/sanjaysubramanian/msfx_scala.git 
These are simple silly examples of my learning process :-)
Plus IMHO , if u r planning on learning Spark, I would say YES to Scala and NO to Java. Yes
its a diff paradigm but being a Java and Hadoop programmer for many years, I am excited to
learn Scala as the language and use Spark. Its exciting.  
regards
sanjay
      From: Aniket Bhatnagar <aniket.bhatnagar@gmail.com>
 To: Dinesh Vallabhdas <dineshkv@yahoo.com>; "user@spark.apache.org" <user@spark.apache.org>

 Sent: Sunday, January 4, 2015 11:07 AM
 Subject: Re: A spark newbie question
   
Go through spark API documentation. Basically you have to do group by (date, message_type)
and then do a count. 


On Sun, Jan 4, 2015, 9:58 PM Dinesh Vallabhdas <dineshkv@yahoo.com.invalid> wrote:

A spark cassandra newbie question. Thanks in advance for the help.I have a cassandra table
with 2 columns message_timestamp(timestamp) and message_type(text). The data is of the form2014-06-25
12:01:39 "START"
2014-06-25 12:02:39 "START"
2014-06-25 12:02:39 "PAUSE"
2014-06-25 14:02:39 "STOP"
2014-06-25 15:02:39 "START"
2014-06-27 12:01:39 "START"
2014-06-27 11:03:39 "STOP"
2014-06-27 12:03:39 "REWIND"
2014-06-27 12:04:39 "RESTART"
2014-06-27 12:05:39 "PAUSE"
2014-06-27 13:03:39 "REWIND"
2014-06-27 14:03:39 "START"
I want to use spark(using java) to calculate counts of a message_type on a per day basis and
store it back in cassandra in a new table with 3 columns (date,message_type,count).The result
table should look like this2014-06-25 START 3
2014-06-25 STOP 1
2014-06-25 PAUSE 1
2014-06-27 START 2
2014-06-27 STOP 1
2014-06-27 PAUSE 1
2014-06-27 REWIND 2
2014-06-27 RESTART 1
I'm not proficient in scala and would like to use java.




  
Mime
View raw message