spark-user mailing list archives

From Tathagata Das <t...@databricks.com>
Subject Re: Is IndexedRDD available in Spark 1.4.0?
Date Wed, 15 Jul 2015 00:55:23 GMT
I do not recommend using IndexedRDD for state management in Spark Streaming.
What it does not solve out of the box is checkpointing of IndexedRDDs, which is
important because long-running streaming jobs can otherwise build up an
ever-growing chain of RDD lineage. Spark Streaming solves this for the
updateStateByKey operation, which you can use for state management. For the
most flexible arbitrary lookups, though, it's better to use a dedicated system
that is designed and optimized for long-term storage of data, such as a
key-value store or a database.
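A minimal local sketch of the per-key update function that updateStateByKey expects: it receives the new values for one key in the current batch (possibly empty) plus the previous state (None for a new key), and returning None drops the key. The ("put", value) / ("delete", None) record format and the helper names update_state and run_batch are hypothetical, chosen to match the lookups/updates/deletes use case; run_batch only simulates one micro-batch in plain Python and is not a Spark API.

```python
from typing import Dict, Hashable, List, Optional, Tuple

# Hypothetical record format (not a Spark API): ("put", value) or ("delete", None).
Op = Tuple[str, Optional[int]]


def update_state(ops: List[Op], last: Optional[int]) -> Optional[int]:
    """Per-key update function in the shape updateStateByKey expects.

    ops:  new values for this key in the current batch (may be empty).
    last: previous state for this key, or None if the key is new.
    Returning None removes the key from the state.
    Note: in Spark the order of values within a batch is not guaranteed;
    this sketch simply applies them in the order given.
    """
    state = last
    for kind, value in ops:
        if kind == "put":
            state = value      # overwrite with the latest value
        elif kind == "delete":
            state = None       # mark the key for removal
    return state


def run_batch(state: Dict[Hashable, int],
              batch: List[Tuple[Hashable, Op]]) -> Dict[Hashable, int]:
    """Local stand-in for one micro-batch: the update function is called for
    every key that has new data or existing state."""
    grouped: Dict[Hashable, List[Op]] = {}
    for key, op in batch:
        grouped.setdefault(key, []).append(op)

    new_state: Dict[Hashable, int] = {}
    for key in set(state) | set(grouped):
        result = update_state(grouped.get(key, []), state.get(key))
        if result is not None:  # None means "drop this key"
            new_state[key] = result
    return new_state


if __name__ == "__main__":
    s = run_batch({}, [("a", ("put", 1)), ("b", ("put", 2))])
    s = run_batch(s, [("a", ("delete", None))])
    print(s)  # {'b': 2}
```

With PySpark, the same function would be passed as pairs.updateStateByKey(update_state), where pairs is a hypothetical DStream of (key, op) tuples, after enabling checkpointing with ssc.checkpoint(directory) on the StreamingContext; that checkpointing is what keeps the state lineage from growing without bound.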

On Tue, Jul 14, 2015 at 5:44 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> Please take a look at SPARK-2365 which is in progress.
>
> On Tue, Jul 14, 2015 at 5:18 PM, swetha <swethakasireddy@gmail.com> wrote:
>
>> Hi,
>>
>> Is IndexedRDD available in Spark 1.4.0? We would like to use this in Spark
>> Streaming to do lookups/updates/deletes in RDDs using keys by storing them
>> as key/value pairs.
>>
>> Thanks,
>> Swetha
>>
>
