spark-user mailing list archives

From Ivoirians
Subject Empty RDD after LzoTextInputFormat in newAPIHadoopFile
Date Tue, 29 Jul 2014 18:05:47 GMT

There seems to be very little documentation on the usage of newAPIHadoopFile
and even less of it in conjunction with opening LZO compressed files. I've
hit a wall with some unexpected behavior that I don't know how to interpret.

This is a test program I'm running in an effort to get this working, after
finding previous threads on this subject.

The job runs on a YARN cluster, and input is the path of a decidedly
non-empty LZO file sitting in HDFS, which I can manually decompress and read
as a text file, with a count of ~3 million. What I don't know how to
interpret is that the program runs without complaints and prints 0. I
would appreciate some guidance on where to go next; there are no error
messages to point me anywhere, just an empty RDD.
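
The test program itself is not preserved in this archived copy. For readers
following along, the kind of call being described usually looks something
like the sketch below. It is only an illustration under assumptions, not the
original program: the object name LzoLineCount and the use of args(0) for
the path are made up here, and com.hadoop.mapreduce.LzoTextInputFormat is
assumed to be the new-API input format from the hadoop-lzo library, which
must be on the classpath along with its native LZO bindings.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.spark.{SparkConf, SparkContext}
// Assumption: the new-API (mapreduce) input format shipped with hadoop-lzo.
import com.hadoop.mapreduce.LzoTextInputFormat

// Hypothetical driver, named for illustration only.
object LzoLineCount {
  def main(args: Array[String]): Unit = {
    // Path of the LZO file in HDFS, passed as the first argument.
    val inputPath = args(0)

    val sc = new SparkContext(new SparkConf().setAppName("LzoLineCount"))

    // Keys are byte offsets (LongWritable), values are lines (Text).
    val records = sc.newAPIHadoopFile(
      inputPath,
      classOf[LzoTextInputFormat],
      classOf[LongWritable],
      classOf[Text],
      sc.hadoopConfiguration)

    // Convert each Text to a String right away, since Hadoop record
    // readers reuse the same Writable instances between records.
    val lines = records.map { case (_, text) => text.toString }

    println(lines.count())

    sc.stop()
  }
}
```

With a program of this shape, the printed number is the count of records the
input format produced, so a result of 0 means no records came out of the
input format at all, rather than that a later transformation dropped them.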

