spark-user mailing list archives

From Sun Rui <sunrise_...@163.com>
Subject Re: get and append file name in record being reading
Date Thu, 02 Jun 2016 07:03:09 GMT
You can use SparkContext.wholeTextFiles(), which returns an RDD of (path, content) pairs, one pair per file.

For example, suppose all your files are under /tmp/ABC_input/,

val rdd  = sc.wholeTextFiles("file:///tmp/ABC_input")
val rdd1 = rdd.flatMap { case (path, content) => 
      val fileName = new java.io.File(path).getName
      content.split("\n").map { line => (line, fileName) }
    }
val df = sqlContext.createDataFrame(rdd1).toDF("line", "file")
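
If you want the file name appended to the record itself rather than kept as a separate column, the same per-file logic might look roughly like the sketch below. It runs on plain Scala collections so it can be tried without a cluster. The object name AppendFileName, the helper tagLines, and the path file:/tmp/ABC_input/sample.csv are made up for illustration; only the two sample records come from your message. Note also that wholeTextFiles reads each file fully into memory, so this approach suits many small files rather than a few very large ones.

```scala
import java.io.File

object AppendFileName {
  // Hypothetical helper: same idea as the flatMap body above, but it
  // appends the file name to each record instead of returning a pair.
  def tagLines(path: String, content: String): Seq[String] = {
    val fileName = new File(path).getName      // keep only the last path segment
    content.split("\n").toSeq
      .map(_.stripSuffix("\r"))                // tolerate Windows line endings
      .filter(_.nonEmpty)                      // skip blank lines
      .map(line => s"$line,$fileName")
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for the (path, content) pairs that wholeTextFiles would return.
    // The path is an assumed example, not taken from the original question.
    val files = Seq(
      ("file:/tmp/ABC_input/sample.csv", "100,abc,299\n200,xyz,499\n")
    )
    files.flatMap { case (path, content) => tagLines(path, content) }
      .foreach(println)
  }
}
```

With the real RDD you would pass the same function to flatMap: rdd.flatMap { case (path, content) => AppendFileName.tagLines(path, content) }.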
> On Jun 2, 2016, at 03:13, Vikash Kumar <vikashspark@gmail.com> wrote:
> 
> 100,abc,299
> 200,xyz,499

