spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: twitter data analysis
Date Fri, 03 Jun 2016 08:40:26 GMT
Or combine both! With Spark Streaming it is possible to combine streaming data with data stored on HDFS.
In the end it always depends on what you want to do and which data you need when.
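
To make "combine both" a bit more concrete, here is a rough sketch, not a definitive implementation. It assumes the spark-streaming-twitter module is on the classpath, that the twitter4j OAuth credentials are supplied as system properties (which is what passing None to createStream relies on), and Spark 1.3 or later. The object name, the HDFS path and the hashtag counting are made up for illustration only. Each 2-second batch of live hashtag counts is joined with counts computed once from tweet text already stored on HDFS.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

// Hypothetical example object; names and paths are assumptions.
object CombineStreamAndHdfs {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("CombineStreamAndHdfs")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Historical tweet text already on HDFS, one tweet per line.
    // The path is a placeholder, not taken from this thread.
    val historicalCounts = ssc.sparkContext
      .textFile("hdfs:///data/tweets/text")
      .flatMap(_.split("\\s+"))
      .filter(_.startsWith("#"))
      .map(tag => (tag, 1L))
      .reduceByKey(_ + _)
      .cache() // reused by every micro-batch

    // Live hashtag counts for each 2-second batch.
    val tweets = TwitterUtils.createStream(ssc, None)
    val liveCounts = tweets
      .flatMap(_.getText.split("\\s+"))
      .filter(_.startsWith("#"))
      .map(tag => (tag, 1L))
      .reduceByKey(_ + _)

    // Join each batch with the static HDFS data:
    // (hashtag, (count in this batch, count in the history or 0)).
    val combined = liveCounts.transform { rdd =>
      rdd.leftOuterJoin(historicalCounts).mapValues {
        case (live, hist) => (live, hist.getOrElse(0L))
      }
    }
    combined.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The transform call is what lets ordinary RDD operations, such as a join against a dataset loaded from HDFS, run on every batch of the stream. Whether a join like this is the right way to combine the two depends on the use case.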

> On 03 Jun 2016, at 10:26, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
> 
> I use Spark Streaming to experiment with Twitter data. Basic stuff:
> 
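>     import org.apache.spark.streaming.{Seconds, StreamingContext}
>     import org.apache.spark.streaming.twitter.TwitterUtils
>     val ssc = new StreamingContext(sparkConf, Seconds(2))  // 2-second batches
>     val tweets = TwitterUtils.createStream(ssc, None)      // None: twitter4j default credentials
>     val statuses = tweets.map(status => status.getText())
>     statuses.print()
>     ssc.start()             // nothing is received until the context is started
>     ssc.awaitTermination()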
> 
> 
> Another alternative is to use Apache Flume to get the Twitter data and store it as log files in HDFS.
> 
> <image.png>
> 
> 
> I notice that these log files are stored in a binary format.
> 
> I presume the log files can be read and converted to JSON by another process, or used for machine learning.
> 
> I know this question may not be directly relevant, but what are the main approaches: real-time analysis of Twitter data using Spark Streaming, or storing the data in HDFS for later use?
> 
> Thanks
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  
