spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject twitter data analysis
Date Fri, 03 Jun 2016 08:26:49 GMT
I am experimenting with Twitter data using Spark Streaming.
Basic stuff:

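    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.twitter.TwitterUtils
    val ssc = new StreamingContext(sparkConf, Seconds(2))  // 2-second batches
    val tweets = TwitterUtils.createStream(ssc, None)      // None: default twitter4j OAuth settings
    val statuses = tweets.map(status => status.getText())  // keep only the tweet text
    statuses.print()                                       // print the first few of each batch
    ssc.start()
    ssc.awaitTermination()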


An alternative is to use Apache Flume to collect the Twitter data and
store it as log files in HDFS.



I notice that these log files are stored in a binary format.

I presume the log files can be read and converted to JSON by another
process, or used as input for machine learning.
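
As a rough sketch of that conversion step, the snippet below reads the
files back with Spark. It assumes the binary files are Hadoop
SequenceFiles with LongWritable keys and BytesWritable values (the Flume
HDFS sink default), and that each event body is a JSON tweet. Neither is
confirmed above; if the Flume source emits Avro instead, the decoding
step would have to change. The HDFS path and the object name are
hypothetical.

```scala
import org.apache.hadoop.io.{BytesWritable, LongWritable}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ReadFlumeTweets {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ReadFlumeTweets"))
    val sqlContext = new SQLContext(sc)

    // Hypothetical directory written by the Flume HDFS sink.
    val path = "hdfs:///path/to/flume/tweets"

    // Assumption: SequenceFile with (LongWritable, BytesWritable) records.
    val raw = sc.sequenceFile(path, classOf[LongWritable], classOf[BytesWritable])

    // Copy the bytes out of the reused Writable and decode them as UTF-8 text.
    val json = raw.map { case (_, body) => new String(body.copyBytes(), "UTF-8") }

    // Assumption: each decoded string is one JSON document.
    val tweets = sqlContext.read.json(json)
    tweets.printSchema()

    sc.stop()
  }
}
```

Once the data is in a DataFrame it can be queried with SQL or passed on
to a machine learning library.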

I know this question may not be directly relevant, but which is the main
approach: real-time analysis of Twitter data using Spark Streaming, or
storing the data in HDFS and using it later?
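
The two approaches need not be exclusive. As a minimal sketch, assuming
the same TwitterUtils stream as above, one job can analyse each batch in
real time and also write the raw text to HDFS for later use. The output
prefix, the hashtag count, and the object name are illustrative
assumptions, not something taken from an existing setup.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

object TweetsBothWays {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("TweetsBothWays")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // None: OAuth credentials are taken from the default twitter4j settings.
    val tweets = TwitterUtils.createStream(ssc, None)

    // Flatten embedded newlines so each tweet stays on one line when saved.
    val statuses = tweets.map(_.getText.replace('\n', ' '))

    // Real-time path: count hashtags within each 2-second batch.
    val hashtags = statuses.flatMap(_.split("\\s+")).filter(_.startsWith("#"))
    hashtags.countByValue().print()

    // Storage path: one output directory per batch under a hypothetical prefix.
    statuses.saveAsTextFiles("hdfs:///tmp/tweets/batch", "txt")

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that saveAsTextFiles writes a new directory for every batch, so
short batch intervals produce many small files in HDFS.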

Thanks

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com
