spark-user mailing list archives

From 孫澤恩 <gn00710...@gmail.com>
Subject How to read LZO file in Spark?
Date Wed, 27 Sep 2017 10:36:02 GMT
Hi All,

Currently, I am following this blog post http://blog.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
which shows that I can use hdfs dfs -text to read an LZO file.
But I want to know how to read an LZO file with Spark.
I put hadoop-lzo.jar into spark/jars and followed this guide: https://github.com/awslabs/emr-bootstrap-actions/blob/master/spark/examples/reading-lzo-files.md

Here is my script:
val files = sc.newAPIHadoopFile("hdfs://<my_path_to_file>",
  classOf[com.hadoop.mapreduce.LzoTextInputFormat],
  classOf[org.apache.hadoop.io.LongWritable],
  classOf[org.apache.hadoop.io.Text])
val lzoRDD = files.map(_._2.toString)

The result is null.
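(For reference, the approach from the guide above can be sketched end-to-end as follows. This is a minimal spark-shell sketch, not the poster's exact setup: the HDFS path is a placeholder, and it assumes hadoop-lzo.jar plus the native LZO library are on Spark's classpath and library path.)

import org.apache.hadoop.io.{LongWritable, Text}
import com.hadoop.mapreduce.LzoTextInputFormat

// Read the LZO file as (byte offset, line) pairs; the path is a placeholder.
val files = sc.newAPIHadoopFile(
  "hdfs://namenode:8020/path/to/file.lzo",
  classOf[LzoTextInputFormat],
  classOf[LongWritable],
  classOf[Text])

// Keep only the line text from each (offset, line) record.
val lzoRDD = files.map(_._2.toString)

// RDDs are lazy: nothing is actually read until an action runs.
lzoRDD.take(5).foreach(println)

Note that newAPIHadoopFile returns a lazy RDD, so an action such as take or count is needed before any records are materialized.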

Does anyone have experience with this?

Sean Sun


