spark-user mailing list archives

From Akhil <>
Subject Re: Is Spark the right tool?
Date Tue, 28 Oct 2014 07:18:57 GMT
You can use Spark Streaming to pull the transactions from those TCP connections
and periodically push the data into HBase. For the querying part, you can use a
key-value store such as Redis to hold the records that have not yet been written
to HBase. You can still run queries over the data as RDDs; since RDDs are
immutable, concurrent reads are safe, but fast insertion and deletion are better
served by the external store.
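Setting the Spark APIs aside, the periodic-upload pattern described above (buffer incoming transactions, then flush them to HBase in batches) can be sketched in plain Python. The `sink` callback here is a hypothetical stand-in for a real HBase writer, and the batch-size trigger stands in for a timer:

```python
from collections import deque

class TransactionBuffer:
    """Buffers incoming transactions and flushes them in batches,
    mimicking a periodic upload to a store such as HBase."""

    def __init__(self, flush_size, sink):
        self.flush_size = flush_size  # flush once this many records accumulate
        self.sink = sink              # callback standing in for an HBase writer
        self.pending = deque()        # records not yet uploaded

    def add(self, record):
        self.pending.append(record)
        if len(self.pending) >= self.flush_size:
            self.flush()

    def flush(self):
        batch = list(self.pending)
        self.pending.clear()
        if batch:
            self.sink(batch)

# Usage: a plain list stands in for HBase.
uploaded = []
buf = TransactionBuffer(flush_size=3, sink=uploaded.append)
for txn in [("GOOG", 540.1), ("AAPL", 105.2), ("GOOG", 540.3), ("MSFT", 46.5)]:
    buf.add(txn)
# One batch of three records was flushed; one record is still pending.
```

In the real pipeline the flush would run on a timer (every few hours, per the question below) rather than on a record count, and the pending records are exactly what the query server must keep serving.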

kc66 wrote
> I am very new to Spark.
> I am working on a project that involves reading stock transactions off a
> number of TCP connections and
> 1. periodically (once every few hours) uploads the transaction records to
> HBase
> 2. maintains the records that are not yet written into HBase and acts as an
> HTTP query server for these records. An example query would be to
> return all transactions between 1-2pm for Google stocks for the current
> trading day.
> I am thinking of using Kafka to receive all the transaction records. Spark
> will be the consumers of Kafka output.
> In particular, I need to create an RDD hashmap with string (stock ticker
> symbol) as key and a list (or vector) of transaction records as data.
> This RDD needs to be "thread (or process) safe" since different threads and
> processes will be reading and modifying it. I need insertion, deletion,
> and lookup to be fast.
> Is this something that can be done with Spark and is Spark the right tool
> to use in terms of latency and throughput?
> Pardon me if I don't know what I am talking about. All these are very new
> to me.
> Thanks!
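Spark aside, the in-memory structure the question asks for (a map from ticker symbol to that ticker's transaction records, with fast time-range lookups like "all Google trades between 1 and 2 pm") can be sketched in plain Python. The record shape and minutes-since-midnight timestamps are illustrative assumptions:

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

class TransactionStore:
    """Maps ticker symbol -> time-sorted list of (timestamp, price) records,
    so a time-range query is two binary searches plus a slice."""

    def __init__(self):
        self.by_ticker = defaultdict(list)

    def insert(self, ticker, timestamp, price):
        records = self.by_ticker[ticker]
        # Keep each ticker's records sorted by timestamp on insert.
        i = bisect_right([t for t, _ in records], timestamp)
        records.insert(i, (timestamp, price))

    def query(self, ticker, start, end):
        """Return all records for `ticker` with start <= timestamp < end."""
        records = self.by_ticker[ticker]
        times = [t for t, _ in records]
        return records[bisect_left(times, start):bisect_left(times, end)]

store = TransactionStore()
# Timestamps are minutes since midnight, so 1-2 pm is the range [780, 840).
store.insert("GOOG", 790, 539.8)
store.insert("GOOG", 820, 540.1)
store.insert("GOOG", 850, 541.0)
store.insert("AAPL", 795, 105.2)
print(store.query("GOOG", 780, 840))  # → [(790, 539.8), (820, 540.1)]
```

A per-record store like Redis (as suggested in the reply) gives the same key-value access pattern with the thread/process safety the question asks for; a single driver-side RDD is not a natural fit for concurrent in-place inserts and deletes, since RDDs are immutable.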

