spark-user mailing list archives

From "Liu, Raymond" <raymond....@intel.com>
Subject RE: how to filter value in spark
Date Mon, 01 Sep 2014 01:23:13 GMT
You could use cogroup to combine RDDs in one RDD for cross reference processing.

e.g.

a.cogroup(b).filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }.map { case (k, (l, r)) => (k, l) }
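
As a minimal, self-contained sketch of that approach (assuming a local SparkContext and in-memory data standing in for the files under /sparktest/1/ and /sparktest/2/; the object and helper names are only illustrative): cogroup yields, for each key, the values from both RDDs as a pair of Iterables, so the map above returns (k, Iterable) pairs. The sketch swaps that final map for a flatMap so the result is flat pairs such as (3,"a").

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; required on Spark 1.x
import org.apache.spark.rdd.RDD

object CogroupFilterSketch {
  // Hypothetical helper: keep the entries of `a` whose key also occurs in `b`.
  def keepCommonKeys(a: RDD[(String, String)],
                     b: RDD[(String, String)]): RDD[(String, String)] =
    a.cogroup(b)
      // l and r hold the values for this key from a and b respectively
      .filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }
      // flatten (k, Iterable(v1, v2, ...)) back into individual (k, v) pairs
      .flatMap { case (k, (l, _)) => l.map(v => (k, v)) }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, just for trying this out
    val conf = new SparkConf().setAppName("cogroup-filter-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Stand-ins for sc.textFile("/sparktest/1/") and sc.textFile("/sparktest/2/")
      val a = sc.parallelize(Seq("1", "2", "3", "4")).map((_, "a"))
      val b = sc.parallelize(Seq("3", "4", "5", "6")).map((_, "b"))

      keepCommonKeys(a, b).collect().sorted.foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Unlike intersection, this compares keys only, so the values attached to a and b may differ, and duplicate entries in a are preserved.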

Best Regards,
Raymond Liu

-----Original Message-----
From: marylucy [mailto:qaz163wsx_001@hotmail.com] 
Sent: Friday, August 29, 2014 9:26 PM
To: Matthew Farrellee
Cc: user@spark.apache.org
Subject: Re: how to filter value in spark

I see, it works well. Thank you!

But how should I handle the following situation?

var a = sc.textFile("/sparktest/1/").map((_,"a"))
var b = sc.textFile("/sparktest/2/").map((_,"b"))
How do I get (3,"a") and (4,"a")?


On Aug 28, 2014, at 19:54, "Matthew Farrellee" <matt@redhat.com> wrote:

> On 08/28/2014 07:20 AM, marylucy wrote:
>> fileA = 1 2 3 4, one number per line, saved in /sparktest/1/
>> fileB = 3 4 5 6, one number per line, saved in /sparktest/2/
>> I want to get 3 and 4.
>> 
>> var a = sc.textFile("/sparktest/1/").map((_,1))
>> var b = sc.textFile("/sparktest/2/").map((_,1))
>> 
>> a.filter(param=>{b.lookup(param._1).length>0}).map(_._1).foreach(println)
>> 
>> Error thrown:
>> Scala.MatchError:Null
>> PairRDDFunctions.lookup...
> 
> the issue is nesting of the b rdd inside a transformation of the a rdd
> 
> consider using intersection, it's more idiomatic
> 
> a.intersection(b).foreach(println)
> 
> but note that intersection will remove duplicates
> 
> best,
> 
> 
> matt
> 