spark-user mailing list archives

From Artemis User <arte...@dtechspace.com>
Subject Re: How can transform RDD[Seq[String]] to RDD[ROW]
Date Thu, 05 Aug 2021 13:58:09 GMT
I am not sure why you need to create an RDD first. You can create a
DataFrame directly from a CSV file, for instance:

spark.read.format("csv").option("header","true").schema(yourSchema).load(ftpUrl)
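
For context, here is a slightly fuller sketch of that one-liner. It assumes the four-column layout from the quoted message (id, name, year, city) and a header row in the file. The object name, the local master setting, and taking the path from the first program argument are all placeholders for illustration. Whether an ftp:// URL can be read this way depends on the Hadoop filesystem support in your environment, so treat that part as an assumption too.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object CsvDirectSketch {
  // Schema mirroring the columns in the quoted message.
  val yourSchema: StructType = StructType(List(
    StructField("id", StringType, nullable = true),
    StructField("name", StringType, nullable = true),
    StructField("year", IntegerType, nullable = true),
    StructField("city", StringType, nullable = true)))

  def main(args: Array[String]): Unit = {
    // local[*] is only for trying this out on one machine (assumption).
    val spark = SparkSession.builder()
      .appName("csv-direct-sketch")
      .master("local[*]")
      .getOrCreate()

    // Path or URL of the CSV data, passed in by the caller.
    val csvPath = args(0)

    // The CSV reader parses each line and applies the schema,
    // so no intermediate RDD is needed.
    val df = spark.read
      .format("csv")
      .option("header", "true")
      .schema(yourSchema)
      .load(csvPath)

    df.printSchema()
    df.show(5)

    spark.stop()
  }
}
```

Supplying the schema up front also skips schema inference, so Spark does not need an extra pass over the data to guess column types.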

-- ND

On 8/5/21 3:14 AM, igyu wrote:
> val ftpUrl = "ftp://test:test@ip:21/upload/test/_temporary/0/_temporary/task_20191211114756_0002_m_000000_0/*"
>
> val rdd = spark.sparkContext.wholeTextFiles(ftpUrl)
> val value = rdd.map(_._2).map(csv => csv.split(",").toSeq)
>
> val schemas = StructType(List(
>   new StructField("id", DataTypes.StringType, true),
>   new StructField("name", DataTypes.StringType, true),
>   new StructField("year", DataTypes.IntegerType, true),
>   new StructField("city", DataTypes.StringType, true)))
> val DF = spark.createDataFrame(value, schemas)
> How can I create the DataFrame?
>
> ------------------------------------------------------------------------
> igyu
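
If the RDD route is still needed, note that createDataFrame(rdd, schema) expects an RDD[Row] whose values match the schema types, not an RDD[Seq[String]]. Also, wholeTextFiles yields one string per file, so the content has to be split into lines before it is split on commas. Below is a minimal sketch of that conversion. It assumes no header line, no quoted or embedded commas, and a year column that always parses as an integer. The object name, the toRow helper, and the in-memory sample data are made up for illustration and stand in for rdd.map(_._2).

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object SeqToRowSketch {
  val schema: StructType = StructType(List(
    StructField("id", StringType, nullable = true),
    StructField("name", StringType, nullable = true),
    StructField("year", IntegerType, nullable = true),
    StructField("city", StringType, nullable = true)))

  // Hypothetical helper: turn one parsed line into a Row whose value
  // types line up with the schema (year becomes an Int).
  def toRow(fields: Seq[String]): Row =
    Row(fields(0), fields(1), fields(2).trim.toInt, fields(3))

  def main(args: Array[String]): Unit = {
    // local[*] is only for trying this out on one machine (assumption).
    val spark = SparkSession.builder()
      .appName("seq-to-row-sketch")
      .master("local[*]")
      .getOrCreate()

    // Made-up stand-in for rdd.map(_._2): each element is the
    // full text of one file.
    val fileContents: RDD[String] =
      spark.sparkContext.parallelize(Seq("1,Ann,2019,Paris\n2,Bob,2020,Oslo"))

    val rows: RDD[Row] = fileContents
      .flatMap(_.split("\\r?\\n"))   // one element per line
      .filter(_.trim.nonEmpty)       // drop blank lines
      .map(_.split(",", -1).toSeq)   // RDD[Seq[String]]; -1 keeps trailing empty fields
      .map(toRow)                    // RDD[Row]

    val df: DataFrame = spark.createDataFrame(rows, schema)
    df.printSchema()
    df.show()

    spark.stop()
  }
}
```

This hand-rolled parsing is fragile compared with the built-in CSV reader, which already handles headers, quoting, and type conversion.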

