spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Praseetha <prasikris...@gmail.com>
Subject Verifying if DStream is empty
Date Mon, 20 Jun 2016 12:30:05 GMT
Hi Experts,

I have 2 inputs, where first input is stream (say input1) and the second
one is batch (say input2). I want to figure out if the keys in first input
matches single row or more than one row in the second input. The further
transformations/logic depends on the number of rows matching, whether
single row matches or multiple rows match (for atleast one key in the first
input)

if(single row matches){
     // do some tranformation
}else{
     // do some transformation
}

Code that i tried so far

val input1Pair = streamData.map(x => (x._1, x))
val input2Pair = input2.map(x => (x._1, x))
val joinData = input1Pair.transform{ x => input2Pair.leftOuterJoin(x)}
val result = joinData.mapValues{
    case(v, Some(a)) => 1L
    case(v, None) => 0
 }.reduceByKey(_ + _).filter(_._2 > 1)

I have done the above coding. When I do result.print, it prints nothing if
all the keys matches only one row in the input2. With the fact that the
DStream may have multiple RDDs, not sure how to figure out if the DStream
is empty or not.

I tried using foreachRDD, but the streaming app stops abruptly.

Inside foreachRDD i was performing transformations with other RDDs. like,

result.foreachRDD{ x=>

if(x.isEmpty){

val out = input1Pair.lefOuterJoin(input2Pair)

}else{

val out = input1Pair.rightOuterJoin(input2Pair)

}

Can you please suggest.


Regds,
--Praseetha

Mime
View raw message