spark-user mailing list archives

From Davies Liu <dav...@databricks.com>
Subject Re: Is there a way to create key based on counts in Spark
Date Tue, 18 Nov 2014 19:26:05 GMT
On Tue, Nov 18, 2014 at 9:06 AM, Debasish Das <debasish.das83@gmail.com> wrote:
> Use zipWithIndex, but cache the data before you run zipWithIndex... that way
> your ordering will be consistent (unless the bug has been fixed, in which case
> you don't have to cache the data)...

Could you point me to a link about the bug?

> Normally these operations are used for dictionary building, so I am
> hoping you can cache the dictionary RDD[String] before you run
> zipWithIndex...
>
> The indices run from 0 to maxIndex-1... if you want them to start at 1, you
> have to later map each index to index + 1.
>
> On Tue, Nov 18, 2014 at 8:56 AM, Blind Faith <person.of.book@gmail.com>
> wrote:
>>
>> As it is difficult to explain this, I will show what I want. Let us say
>> I have an RDD A with the following value:
>>
>> A = ["word1", "word2", "word3"]
>>
>> I want to have an RDD with the following value
>>
>> B = [(1, "word1"), (2, "word2"), (3, "word3")]
>>
>> That is, it assigns a unique number to each entry as its key. Can we do
>> such a thing with Python or Scala?
>
>

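A minimal sketch of the zipWithIndex approach described above, in PySpark
(the SparkContext setup and variable names are illustrative, not from the
original thread):

    from pyspark import SparkContext

    sc = SparkContext(appName="zip-with-index-example")

    # A = ["word1", "word2", "word3"]
    a = sc.parallelize(["word1", "word2", "word3"])

    # Cache before zipWithIndex so the partition contents (and hence the
    # assigned indices) stay consistent if the RDD is recomputed.
    a.cache()

    # zipWithIndex produces (element, index) pairs with indices starting at 0;
    # swap each pair and shift by one to get (1, "word1"), (2, "word2"), ...
    b = a.zipWithIndex().map(lambda kv: (kv[1] + 1, kv[0]))

    print(b.collect())   # [(1, 'word1'), (2, 'word2'), (3, 'word3')]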

