spark-user mailing list archives

From Rex X <dnsr...@gmail.com>
Subject How to concatenate two csv files into one RDD?
Date Fri, 26 Jun 2015 18:00:59 GMT
With Python pandas, it is easy to concatenate DataFrames
by combining pandas.concat
<http://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html>
and pandas.read_csv:

import os
import pandas as pd

pd.concat([pd.read_csv(os.path.join(Path_to_csv_files, f)) for f in csvfiles])

where "csvfiles" is the list of csv file names and "Path_to_csv_files" is
the directory that contains them.


How can we do this in Spark?
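
One possible approach, sketched here under assumptions rather than taken from a confirmed answer: SparkContext.textFile accepts a comma-separated list of paths, so several csv files can be read into a single RDD of lines in one call. The helper names csv_paths and load_csvs_as_rdd are hypothetical, and "sc" is assumed to be an already created SparkContext. Unlike pandas.read_csv, this does not parse the files, so each file's header line (if any) ends up in the RDD as an ordinary row and would need to be filtered out separately.

```python
import os


def csv_paths(directory, csvfiles):
    """Build one comma-separated path string from a directory and file names.

    Hypothetical helper: mirrors the os.path.join call in the pandas snippet.
    """
    return ",".join(os.path.join(directory, f) for f in csvfiles)


def load_csvs_as_rdd(sc, directory, csvfiles):
    """Read all listed csv files into one RDD of text lines.

    Assumes `sc` is an existing SparkContext. textFile takes a
    comma-separated list of paths and returns a single RDD covering
    every file; lines are raw strings, not parsed columns.
    """
    return sc.textFile(csv_paths(directory, csvfiles))
```

With a live SparkContext, load_csvs_as_rdd(sc, Path_to_csv_files, csvfiles) would return the combined RDD. Building one RDD per file and combining them with sc.union is an alternative when the files need different per-file handling.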
