spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jörn Franke <jornfra...@gmail.com>
Subject Re: How to use Spark-Scala to download a CSV file from the web?
Date Sun, 25 Sep 2016 09:14:21 GMT
Use a tool like flume and/or oozie to reliable download files from http and do error handling
(e.g. Requests time out). Afterwards process the data with spark.

> On 25 Sep 2016, at 10:27, Dan Bikle <bikle101@gmail.com> wrote:
> 
> hello spark-world,
> 
> How to use Spark-Scala to download a CSV file from the web and load the file into a spark-csv
DataFrame?
> 
> Currently I depend on curl in a shell command to get my CSV file.
> 
> Here is the syntax I want to enhance:
> 
> /* fb_csv.scala
> This script should load FB prices from Yahoo.
> 
> Demo:
> spark-shell -i fb_csv.scala
> */
> 
> // I should get prices:
> import sys.process._
> "/usr/bin/curl -o /tmp/fb.csv http://ichart.finance.yahoo.com/table.csv?s=FB"!
> 
> import org.apache.spark.sql.SQLContext
> 
> val sqlContext = new SQLContext(sc)
> 
> val fb_df = sqlContext.read.format("com.databricks.spark.csv").option("header","true").option("inferSchema","true").load("/tmp/fb.csv")
> 
> fb_df.head(9)
> 
> I want to enhance the above script so it is pure Scala with no shell syntax inside.
> 

Mime
View raw message