spark-user mailing list archives

From Iulian Dragoș <iulian.dra...@typesafe.com>
Subject Re: Optimal way to avoid processing null returns in Spark Scala
Date Thu, 08 Oct 2015 10:34:38 GMT
On Wed, Oct 7, 2015 at 6:42 PM, swetha <swethakasireddy@gmail.com> wrote:

> Hi,
>
> I have the following functions that I am using for my job in Scala. If you
> look at the getSessionId function, I sometimes return null. If I return
> null, the only way I can avoid processing those records is by filtering
> out the null records. I wanted to avoid a second pass for filtering, so I
> tried returning None. But that causes problems, because it requires the
> return type to be an Option. What is the optimal way to avoid processing
> null records while also avoiding Option as the return type? Using Option[]
> and Some(()) seems to cause type issues in subsequent function calls.
>
You should use RDD.flatMap; that way you can map and filter in a single
pass. Something like:

rdd.flatMap { case (x, y) =>
  val sessionId = getSessionId(y)
  if (sessionId != null)
    Seq((sessionId, (getTimeStamp(y).toLong, y)))  // keep the record
  else
    Seq()                                          // drop the record
}

I didn’t try to compile that method, but you’ll figure out the types, if
need be.
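
If it helps, here is a minimal sketch of the same idea that wraps the
null in an Option at the call site, so getSessionId can keep returning a
plain String. It runs on an ordinary Seq rather than an RDD so that it is
self-contained; as far as I know RDD.flatMap accepts an Option-returning
function the same way. The bodies of getSessionId and getTimeStamp and the
"sessionId,timestamp" record layout are made-up stand-ins, not the real
beacon parsing.

```scala
object FlatMapOptionSketch {
  // Hypothetical stand-in: returns null when the record has no session id,
  // mimicking the behaviour of getSessionId in the original post.
  def getSessionId(eventRecord: String): String = {
    val parts = eventRecord.split(",")
    if (parts.length == 2 && parts(0).nonEmpty) parts(0) else null
  }

  // Hypothetical stand-in: the timestamp is assumed to be the last field.
  def getTimeStamp(eventRecord: String): String = eventRecord.split(",").last

  // Option(null) is None and Option(value) is Some(value), so the null
  // check and the mapping happen in one expression.
  def toSession(y: String): Option[(String, (Long, String))] =
    Option(getSessionId(y)).map(id => (id, (getTimeStamp(y).toLong, y)))

  def main(args: Array[String]): Unit = {
    // Assumed sample data: the second record has an empty session id.
    val records = Seq(("k1", "s1,100"), ("k2", ",200"), ("k3", "s2,300"))

    // flatMap drops every None and unwraps every Some in a single pass.
    val sessions = records.flatMap { case (_, y) => toSession(y) }
    sessions.foreach(println)
  }
}
```

The element type that comes out of the flatMap is still
(String, (Long, String)), so the Option never leaks into the functions
that run afterwards.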

iulian


>
>     val sessions = filteredStream.transform(rdd=>getBeaconMap(rdd))
>
>   def getBeaconMap(rdd: RDD[(String, String)]): RDD[(String, (Long, String))] = {
>     rdd.map[(String, (Long, String))]{ case (x, y) =>
>       ((getSessionId(y), (getTimeStamp(y).toLong,y)))
>     }
>   }
>
>   def getSessionId(eventRecord:String): String = {
>     val beaconTestImpl: BeaconTestLoader = new BeaconTestImpl // This needs to be changed.
>     val beaconEvent: BeaconEventData = beaconTestImpl.getBeaconEventData(eventRecord)
>
>     if(beaconEvent!=null){
>       beaconEvent.getSessionID // This might be in Set Cookie header
>     }else{
>       null
>     }
>   }
>
>     val groupedAndSortedSessions =
>       sessions.transform(rdd=>ExpoJobCommonNew.getGroupedAndSortedSessions(rdd))
--
Iulian Dragos

------
Reactive Apps on the JVM
www.typesafe.com
