spark-user mailing list archives

From Gastón Schabas <gasto...@batangamedia.com>
Subject Apache Spark Is Hanging when fetch data from SQL Server 2008
Date Wed, 29 Jun 2016 17:53:20 GMT
Hi everyone. I'm experiencing an issue when I try to fetch data from SQL
Server. This is my environment:
Ubuntu 14.04 LTS
Apache Spark 1.4.0
SQL Server 2008
Scala 2.10.5
Sbt 0.13.11

I'm trying to fetch data from a table in SQL Server 2008 that has
85,000,000 records. I only need around 200,000 of them. This is my code:
val df = sqlContext.read.jdbc(
  "anUrl",
  "aTableName",
  Array(s"timestamp >= '2016-06-21T00:00:00'", s"timestamp < '2016-06-22T00:00:00'"),
  new Properties)
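
One thing worth checking in the call above: in the jdbc(url, table, predicates, connectionProperties) overload, each element of the predicates array is used as the WHERE clause of its own partition, so the two conditions are not ANDed together. Below is a minimal sketch of building non-overlapping predicates for a single day instead. The helper name hourlyPredicates and its parameters are hypothetical (they are not part of Spark), and the one-hour split is only an assumption about a reasonable partition size.

```scala
object JdbcPredicates {
  // Hypothetical helper, not part of Spark: builds one WHERE clause per hour
  // of `day`. The 24 ranges do not overlap and together cover the whole day.
  // `nextDay` is passed in explicitly to avoid date arithmetic in this sketch.
  def hourlyPredicates(day: String, nextDay: String, column: String): Array[String] =
    (0 until 24).map { h =>
      val start = "%sT%02d:00:00".format(day, h)
      val end =
        if (h == 23) "%sT00:00:00".format(nextDay)
        else "%sT%02d:00:00".format(day, h + 1)
      s"$column >= '$start' AND $column < '$end'"
    }.toArray

  // Intended use (assumes an existing sqlContext; not executed here):
  //   val predicates = hourlyPredicates("2016-06-21", "2016-06-22", "timestamp")
  //   val df = sqlContext.read.jdbc("anUrl", "aTableName", predicates, new Properties)
}
```

If a single partition is enough, the same idea reduces to a one-element array such as Array("timestamp >= '2016-06-21T00:00:00' AND timestamp < '2016-06-22T00:00:00'").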

If I do this:
df.take(5).foreach(println)
it works without any trouble.

If I do this:
println(df.count()) // this should return around 200,000
the application hangs.

I opened the Spark UI at http://localhost:4040/ to check what Spark is doing.
The job details page shows that the count method is running, with this call
site:
org.apache.spark.sql.DataFrame.count(DataFrame.scala:1269)
SkipOverPlaysInWeek$.main(SkipOverPlaysInWeek.scala:88)

Thanks,

Gastón
