spark-user mailing list archives

From "" <>
Subject Re: Re: Need help in setting up spark cluster
Date Thu, 23 Jul 2015 06:03:12 GMT
Hi there,

For your analytics and real-time recommendation requirements, I would recommend using Spark SQL
and the Hive Thrift server to store and process your Spark Streaming data. The Thrift server runs
as a long-lived application, so it is quite feasible to consume data periodically and serve
analytical queries through it.

On the other hand, HBase or Cassandra would also be sufficient, and you may want to integrate
Spark SQL with HBase / Cassandra for data ingestion. You could deploy a CDH or HDP platform to
support your production environment. I suggest you first deploy a Spark standalone cluster to run
some integration tests; you can also consider running Spark on YARN for later development use
cases.
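
As a rough illustration of the Spark SQL suggestion above, here is a minimal PySpark sketch that
appends each consumed micro-batch to a table and runs an analytical query against it. It assumes
a current Spark release with the SparkSession API (not necessarily the release in use when this
thread was written). The table name, column names, and JSON event layout are hypothetical.
Whether a Thrift server sees the table depends on both sharing a metastore, which is assumed here
rather than shown.

```python
import json

TABLE = "user_activity"  # hypothetical table name
COLUMNS = ["user_id", "listing_id", "action"]  # hypothetical event fields


def parse_event(message):
    """Turn one JSON message (as it might arrive from RabbitMQ) into a row tuple."""
    event = json.loads(message)
    return tuple(event[name] for name in COLUMNS)


def append_batch(spark, messages):
    """Append one micro-batch of JSON messages to a Spark SQL table.

    `spark` is an existing pyspark.sql.SparkSession.
    """
    rows = [parse_event(m) for m in messages]
    if not rows:
        return  # nothing to write; avoids schema inference on an empty batch
    df = spark.createDataFrame(rows, COLUMNS)
    df.write.mode("append").saveAsTable(TABLE)


def top_listings(spark, limit=10):
    """Example analytical query: the most-viewed listings so far."""
    return spark.sql(
        f"SELECT listing_id, COUNT(*) AS views FROM {TABLE} "
        f"WHERE action = 'view' "
        f"GROUP BY listing_id ORDER BY views DESC LIMIT {int(limit)}"
    )
```

Given a session `spark`, calling `append_batch(spark, messages)` once per micro-batch and then
`top_listings(spark).show()` would exercise the sketch.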

From: Jeetendra Gangele
Date: 2015-07-23 13:39
To: user
Subject: Re: Need help in setting up spark cluster
Can anybody help here?

On 22 July 2015 at 10:38, Jeetendra Gangele <> wrote:
Hi All, 

I am trying to capture user activity for a real estate portal.

I am using a combination of RabbitMQ and Spark Streaming: I push all events to RabbitMQ and
consume them with Spark Streaming in 1-second micro-batches.

Later on, I plan to store the consumed data for analytics or near-real-time recommendations.

Where should I store this data? Should I keep it in Spark RDDs, so that people can query it
with Spark SQL for analytics or real-time recommendations? The data is not huge; currently it
is 10 GB per day.

Another alternative would be either HBase or Cassandra; which one would be better?

Any suggestions?

Also, for this use case, should I use an existing big data platform like Hortonworks, or can I
deploy a standalone Spark cluster?
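
To put the 10 GB per day figure in perspective, the following back-of-the-envelope sketch
converts it into an average size per 1-second micro-batch and a yearly storage footprint. It
assumes decimal gigabytes and evenly spread traffic. The one-year retention and the replication
factor of 3 are hypothetical inputs, not numbers from this thread.

```python
GB = 10**9  # assumption: decimal gigabytes
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_batch(gb_per_day, batch_seconds=1.0):
    """Average bytes arriving in one micro-batch, assuming uniform traffic."""
    return gb_per_day * GB / SECONDS_PER_DAY * batch_seconds


def stored_tb(gb_per_day, days, replication=1):
    """Raw storage in decimal TB after `days`, including replicas."""
    return gb_per_day * days * replication / 1000


# 10 GB/day consumed in 1-second micro-batches
print(round(bytes_per_batch(10) / 1000, 1))  # 115.7 (KB per batch)

# Hypothetical: keep one year of data with 3 replicas
print(stored_tb(10, 365, replication=3))  # 10.95 (TB)
```

Real traffic on a portal is likely to be bursty, so peak batches could be much larger than this
average.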
