spark-user mailing list archives

From Shishir Anshuman <shishiranshu...@gmail.com>
Subject Re: Error using collectAsMap() in scala
Date Mon, 21 Mar 2016 05:01:39 GMT
I have stored the contents of two csv files in separate RDDs.

file1.csv format: (column1, column2, column3)
file2.csv format: (column1, column2)

column1 of file1 and column2 of file2 contain similar data. I want to
compare the two columns and, if a match is found:

   - Replace the data at column1 (file1) with column1 (file2)


For this reason, I am not using a normal RDD.
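
Roughly, this is the lookup-and-replace I have in mind (the file paths,
column positions, and comma delimiter below are just placeholders, and sc is
the existing SparkContext):

    // Build a lookup map from file2: column2 -> column1
    val lookup = sc.textFile("path/to/file2.csv")
      .map(_.split(","))
      .map(cols => (cols(1).trim, cols(0).trim))   // key = column2, value = column1
      .collectAsMap()                              // defined here: this is a pair RDD
    val lookupBroadcast = sc.broadcast(lookup)

    // For each row of file1, replace column1 when it matches a column2 of file2
    val replaced = sc.textFile("path/to/file1.csv")
      .map(_.split(","))
      .map { cols =>
        val newCol1 = lookupBroadcast.value.getOrElse(cols(0).trim, cols(0))
        (newCol1, cols(1), cols(2))
      }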

I am still new to Apache Spark, so any suggestions will be greatly
appreciated.

On Mon, Mar 21, 2016 at 10:09 AM, Prem Sure <premsure542@gmail.com> wrote:

> Any specific reason you would like to use collectAsMap only? You could
> probably move to a normal RDD instead of a Pair.
>
>
> On Monday, March 21, 2016, Mark Hamstra <mark@clearstorydata.com> wrote:
>
>> You're not getting what Ted is telling you.  Your `dict` is an
>> RDD[String]  -- i.e. it is a collection of a single value type, String.
>> But `collectAsMap` is only defined for PairRDDs that have key-value pairs
>> for their data elements.  Both a key and a value are needed to collect into
>> a Map[K, V].
>>
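>> As a rough illustration (the comma delimiter and column positions here are
>> only assumptions about your data), turning the RDD[String] into a pair RDD
>> is what makes `collectAsMap` available:
>>
>>   val dict: RDD[String] = sc.textFile("path/to/csv/file")
>>   // dict.collectAsMap()              // does not compile: no key-value pairs
>>
>>   val pairs: RDD[(String, String)] = dict.map { line =>
>>     val cols = line.split(",")
>>     (cols(0), cols(1))                // key = first column, value = second
>>   }
>>   val dictMap = pairs.collectAsMap()  // defined, since pairs is a pair RDD
>>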
>> On Sun, Mar 20, 2016 at 8:19 PM, Shishir Anshuman <
>> shishiranshuman@gmail.com> wrote:
>>
>>> Yes, I have included that class in my code.
>>> I guess it's something to do with the RDD format. I am not able to figure
>>> out the exact reason.
>>>
>>> On Fri, Mar 18, 2016 at 9:27 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>
>>>> It is defined in:
>>>> core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala
>>>>
>>>> On Thu, Mar 17, 2016 at 8:55 PM, Shishir Anshuman <
>>>> shishiranshuman@gmail.com> wrote:
>>>>
>>>>> I am using the following code snippet in Scala:
>>>>>
>>>>>
>>>>> val dict: RDD[String] = sc.textFile("path/to/csv/file")
>>>>> val dict_broadcast=sc.broadcast(dict.collectAsMap())
>>>>>
>>>>> On compiling, it generates this error:
>>>>>
>>>>> scala:42: value collectAsMap is not a member of
>>>>> org.apache.spark.rdd.RDD[String]
>>>>>
>>>>> val dict_broadcast=sc.broadcast(dict.collectAsMap())
>>>>>                     ^
>>>>>
>>>>
>>>>
>>>
>>
