spark-user mailing list archives

From Silvio Fiorito <silvio.fior...@granturing.com>
Subject Re: DataFrame defined within conditional IF ELSE statement
Date Sun, 18 Sep 2016 20:04:55 GMT
Hi Mich,

That’s because df2 is only in scope inside the if/else branches.

Try this:

val df = option match {
  case 1 => {
    println("option = 1")
    val df = spark.read.option("header", false).csv("hdfs://rhes564:9000/data/prices/prices.*")
    val df2 = df.map(p => columns(p(0).toString.toInt,p(1).toString, p(2).toString,p(3).toString))
    df2
  }
  case 2 => spark.table("test.marketData").select('TIMECREATED,'SECURITY,'PRICE)
  case 3 => spark.table("test.marketDataParquet").select('TIMECREATED,'SECURITY,'PRICE)
  case _ => sys.error("no valid option provided")
}

df.printSchema()
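The key point is that match (like if/else) is an expression in Scala, so each branch yields a value and the whole result can be bound to a single val outside the branches. A minimal Spark-free sketch of the same pattern (pickSource is a hypothetical helper, not from the code above):

```scala
// match is an expression: every case produces a value, and the whole
// expression can be bound to a val in the enclosing scope.
// A val defined inside a case body is visible only inside that case.
def pickSource(option: Int): String = option match {
  case 1 =>
    val label = "csv"          // scoped to this case only
    s"reading $label source"
  case 2 => "reading ORC table"
  case 3 => "reading Parquet table"
  case _ => sys.error("no valid option provided")
}
```

Calling pickSource(1) yields "reading csv source"; an unknown option throws via sys.error, so every reachable branch still agrees on the result type String.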


Thanks,
Silvio

From: Mich Talebzadeh <mich.talebzadeh@gmail.com>
Date: Saturday, September 17, 2016 at 4:18 PM
To: "user @spark" <user@spark.apache.org>
Subject: DataFrame defined within conditional IF ELSE statement

In Spark 2 this gives me an error in a conditional IF ELSE statement.

I recall seeing the same in standard SQL.

I am running a test that reads from different sources (text file, ORC or Parquet) depending on the value of var option.

I wrote this

import org.apache.spark.sql.functions._
import java.util.Calendar
import org.joda.time._
var option = 1
val today = new DateTime()
val minutes = -15
val minutesago = today.plusMinutes(minutes).toString.substring(11,19)
val date = java.time.LocalDate.now.toString
val hour = java.time.LocalTime.now.toString
case class columns(INDEX: Int, TIMECREATED: String, SECURITY: String, PRICE: String)

if(option == 1 ) {
   println("option = 1")
   val df = spark.read.option("header", false).csv("hdfs://rhes564:9000/data/prices/prices.*")
   val df2 = df.map(p => columns(p(0).toString.toInt,p(1).toString, p(2).toString,p(3).toString))
   df2.printSchema
} else if (option == 2) {
    val df2 = spark.table("test.marketData").select('TIMECREATED,'SECURITY,'PRICE)
} else if (option == 3) {
    val df2 = spark.table("test.marketDataParquet").select('TIMECREATED,'SECURITY,'PRICE)
} else {
    println("no valid option provided")
    sys.exit(0)
}

With option 1 selected it goes through and shows this

option = 1
root
 |-- INDEX: integer (nullable = true)
 |-- TIMECREATED: string (nullable = true)
 |-- SECURITY: string (nullable = true)
 |-- PRICE: string (nullable = true)

But when I try to do df2.printSchema OUTSIDE of the IF ELSE block, it comes back with an error

scala> df2.printSchema
<console>:31: error: not found: value df2
       df2.printSchema
       ^
I can define a stub df2 before the IF ELSE statement. Is that the best way of dealing with it?
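The "stub before the if" approach would look roughly like this in plain Scala (illustrative String values stand in for the DataFrames; not the actual Spark code):

```scala
// Pre-declare a var before the branches so the name survives the block.
// This compiles, but it forces a mutable var and a placeholder value;
// binding the whole if/else (or match) expression to a val avoids both.
var source: String = ""            // stub defined before the branches
val option = 2                     // illustrative value
if (option == 1) {
  source = "csv"
} else if (option == 2) {
  source = "orc"
} else {
  source = "parquet"
}
println(source)                    // the name is still in scope here
```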

Thanks


Dr Mich Talebzadeh



LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction
of data or any other property which may arise from relying on this email's technical content
is explicitly disclaimed. The author will in no case be liable for any monetary damages arising
from such loss, damage or destruction.

