spark-user mailing list archives

From Vikash Kumar <vikashsp...@gmail.com>
Subject how to get the file name of the record being read in Spark
Date Tue, 31 May 2016 17:32:10 GMT
I have a requirement in which I need to read the input files from a
directory and append the file name to each record in the output.

e.g. I have the directory /input/files/ which contains the following files:
ABC_input_0528.txt
ABC_input_0531.txt

Suppose input file ABC_input_0528.txt contains:
111,abc,234
222,xyz,456

Suppose input file ABC_input_0531.txt contains:
100,abc,299
200,xyz,499

I need to create one final output with the file name in each record,
using DataFrames. My output file should look like this:
111,abc,234,ABC_input_0528.txt
222,xyz,456,ABC_input_0528.txt
100,abc,299,ABC_input_0531.txt
200,xyz,499,ABC_input_0531.txt

I am trying to use the inputFileName function, but it is coming out blank:
https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/sql/functions.html#inputFileName()
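Here is roughly what I am doing (a minimal sketch against Spark 1.6; the directory path is the example path from above, and I am assuming the plain-text DataFrameReader):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.inputFileName

val conf = new SparkConf().setAppName("filename-test").setMaster("local[*]")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Read every file under the input directory; each line of each file
// becomes one row with a single "value" column.
val df = sqlContext.read.text("/input/files/")

// Append the source file name as a new column -- this is the part
// that comes out blank for me.
val withName = df.withColumn("file_name", inputFileName())
withName.show(false)
```

I expected the file_name column to hold the full path of the file each row came from, but every value is an empty string.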

Can anybody help me?
