spark-user mailing list archives

From Issac Buenrostro <buenros...@ooyala.com>
Subject Re: Lzo + Protobuf
Date Wed, 22 Jan 2014 21:37:21 GMT
You need a core-site.xml file on the classpath with these lines:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>

</configuration>
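If copying core-site.xml onto every node's classpath is awkward, the same two properties can, as far as I know, also be set on the Hadoop Configuration that is passed to newAPIHadoopFile, because the LZO record reader builds its codec lookup from the job configuration it receives. The Scala sketch below only collects the property names and values from the XML above; the object LzoCodecSettings and its applyTo helper are hypothetical names, not part of Hadoop or Spark.

```scala
// Hypothetical helper: the two core-site.xml properties above as plain data.
object LzoCodecSettings {
  // The codec list from the XML, one class name per entry.
  val codecs: Seq[String] = Seq(
    "org.apache.hadoop.io.compress.DefaultCodec",
    "org.apache.hadoop.io.compress.GzipCodec",
    "org.apache.hadoop.io.compress.BZip2Codec",
    "com.hadoop.compression.lzo.LzoCodec",
    "com.hadoop.compression.lzo.LzopCodec",
    "org.apache.hadoop.io.compress.SnappyCodec"
  )

  // Property name -> value, exactly as they appear in core-site.xml.
  val properties: Map[String, String] = Map(
    "io.compression.codecs" -> codecs.mkString(","),
    "io.compression.codec.lzo.class" -> "com.hadoop.compression.lzo.LzoCodec"
  )

  /** Passes every property to a setter, e.g. Hadoop's Configuration.set(name, value). */
  def applyTo(set: (String, String) => Unit): Unit =
    properties.foreach { case (name, value) => set(name, value) }
}
```

With an org.apache.hadoop.conf.Configuration named conf (for example the one from new Job().getConfiguration()), this would be LzoCodecSettings.applyTo((k, v) => conf.set(k, v)) before calling newAPIHadoopFile. I haven't tried this route on a cluster, so treat it as an assumption rather than a tested recipe.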


I also added both the native library path and the path to the LZO library to
JAVA_LIBRARY_PATH, but I don't know if this is necessary. This is the
command I used on Mac:

export JAVA_LIBRARY_PATH=/Users/*/hadoop-lzo/target/native/Mac_OS_X-x86_64-64/lib:/usr/local/Cellar/lzo/2.06/lib
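To check whether that variable actually points at something useful, a small Scala sketch like the following splits it on the platform path separator, reports entries that are not existing directories, and looks for a given native library file in the rest. Assumptions: the object name NativeLibPathCheck is hypothetical, and the hadoop-lzo JNI library is assumed to be named gplcompression (so libgplcompression.dylib on a Mac); adjust the name if your build differs.

```scala
import java.io.File

// Hypothetical diagnostic for a path-list variable such as JAVA_LIBRARY_PATH.
object NativeLibPathCheck {
  /** Splits a path list on the platform separator (":" on Mac/Linux), dropping empty entries. */
  def entries(pathList: String): Seq[String] =
    pathList.split(File.pathSeparator).toSeq.map(_.trim).filter(_.nonEmpty)

  /** Entries that are not existing directories (typos, wrong platform folder, ...). */
  def missingDirs(pathList: String): Seq[String] =
    entries(pathList).filterNot(dir => new File(dir).isDirectory)

  /** Entries that contain the platform-specific file name for `libName`. */
  def dirsContaining(pathList: String, libName: String): Seq[String] = {
    val fileName = System.mapLibraryName(libName) // e.g. libfoo.so on Linux
    entries(pathList).filter(dir => new File(dir, fileName).isFile)
  }

  def main(args: Array[String]): Unit = {
    val pathList = sys.env.getOrElse("JAVA_LIBRARY_PATH", "")
    println("Missing directories: " + missingDirs(pathList).mkString(", "))
    // Assumption: hadoop-lzo's JNI library is called "gplcompression".
    println("gplcompression found in: " + dirsContaining(pathList, "gplcompression").mkString(", "))
  }
}
```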


On Wed, Jan 22, 2014 at 12:28 PM, Vipul Pandey <vipandey@gmail.com> wrote:

>
>> Have you tried looking at the HBase and Cassandra examples under the spark
>> example project? These use custom InputFormats and may provide guidance as
>> to how to go about using the relevant Protobuf inputformat.
>>
>
> Thanks for the pointer, Nick. I will look at it once I get past the LZO
> stage.
>
>
> Issac,
>
> How did you get Spark to use the LZO native libraries? I have a fully
> functional Hadoop deployment with Pig and Scalding crunching the LZO files.
> But even after adding the LZO library folder to SPARK_CLASSPATH, I get the
> following error:
>
> java.io.IOException: No codec for file hdfs://abc.xxx.com:8020/path/to/lzo/file.lzo found, cannot run
>     at com.twitter.elephantbird.mapreduce.input.LzoRecordReader.initialize(LzoRecordReader.java:80)
>     at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:86)
>
>
>
> Thanks
> Vipul
>
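To my understanding, this error comes from elephant-bird's LzoRecordReader asking Hadoop's CompressionCodecFactory for a codec based on the file name's extension. The factory only knows the codecs listed in io.compression.codecs, so if the configuration Spark hands to the reader doesn't list com.hadoop.compression.lzo.LzopCodec, a .lzo file gets no codec; that would also explain why adding the library folder to SPARK_CLASSPATH alone isn't enough. The Scala sketch below approximates that suffix lookup with plain strings. CodecLookupSketch is a hypothetical name, and the extension table is written from memory rather than read from the codec classes.

```scala
// Hypothetical approximation of Hadoop's extension-based codec lookup.
object CodecLookupSketch {
  // Default extensions of the codecs named in this thread (from memory; the real
  // codecs report theirs via getDefaultExtension()).
  val defaultExtensions: Map[String, String] = Map(
    "org.apache.hadoop.io.compress.DefaultCodec" -> ".deflate",
    "org.apache.hadoop.io.compress.GzipCodec" -> ".gz",
    "org.apache.hadoop.io.compress.BZip2Codec" -> ".bz2",
    "org.apache.hadoop.io.compress.SnappyCodec" -> ".snappy",
    "com.hadoop.compression.lzo.LzoCodec" -> ".lzo_deflate",
    "com.hadoop.compression.lzo.LzopCodec" -> ".lzo"
  )

  /** Codec whose extension is the longest suffix of `path`, considering only the
    * codecs listed in the comma-separated `codecsProperty`. None means "no codec". */
  def codecFor(path: String, codecsProperty: String): Option[String] =
    codecsProperty.split(",").map(_.trim).filter(_.nonEmpty)
      .flatMap(c => defaultExtensions.get(c).map(ext => (c, ext)))
      .filter { case (_, ext) => path.endsWith(ext) }
      .sortBy { case (_, ext) => -ext.length }
      .headOption
      .map(_._1)

  def main(args: Array[String]): Unit = {
    val file = "hdfs://abc.xxx.com:8020/path/to/lzo/file.lzo"
    val withoutLzo = "org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec"
    val withLzo = withoutLzo + ",com.hadoop.compression.lzo.LzopCodec"
    println(codecFor(file, withoutLzo)) // None: the "No codec ... found" situation
    println(codecFor(file, withLzo))    // Some(com.hadoop.compression.lzo.LzopCodec)
  }
}
```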
> On Jan 21, 2014, at 9:32 AM, Issac Buenrostro <buenrostro@ooyala.com> wrote:
>
> Hi Vipul,
>
> I use something like this to read from LZO-compressed text files; it may
> be helpful:
>
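> import com.twitter.elephantbird.mapreduce.input.LzoTextInputFormat
> import org.apache.hadoop.io.{LongWritable, Text}
> import org.apache.hadoop.mapreduce.Job
> import org.apache.spark.SparkContext
>
> // sparkMaster, sparkDir, config and logFile are defined elsewhere in the app
> val sc = new SparkContext(sparkMaster, "lzoreader", sparkDir,
>   List(config.getString("spark.jar")))
> // Read (byte offset, line) pairs and keep only the line text. Hadoop reuses
> // the same Text instance per record, so copy it out with toString.
> val lines = sc.newAPIHadoopFile(logFile, classOf[LzoTextInputFormat],
>   classOf[LongWritable], classOf[Text], new Job().getConfiguration())
>   .map(pair => pair._2.toString)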
>
> Additionally, I had to compile the LZO native libraries, so keep that in mind.
>
>
> On Tue, Jan 21, 2014 at 6:57 AM, Nick Pentreath <nick.pentreath@gmail.com> wrote:
>
>> Hi Vipul
>>
>> Have you tried looking at the HBase and Cassandra examples under the
>> spark example project? These use custom InputFormats and may provide
>> guidance as to how to go about using the relevant Protobuf inputformat.
>>
>>
>>
>>
>> On Mon, Jan 20, 2014 at 11:48 PM, Vipul Pandey <vipandey@gmail.com> wrote:
>>
>>> Any suggestions, anyone?
>>> Core team / contributors / spark-developers - any thoughts?
>>>
>>> On Jan 17, 2014, at 4:45 PM, Vipul Pandey <vipandey@gmail.com> wrote:
>>>
>>> Hi All,
>>>
>>> Can someone please share (sample) code to read LZO-compressed protobufs
>>> from HDFS (using Elephant Bird)? I'm trying whatever I see in the forum and
>>> on the web, but it doesn't seem comprehensive to me.
>>>
>>> I'm using Spark 0.8.0. My Pig scripts are able to read protobufs just
>>> fine, so the Hadoop layer is set up correctly. It would be really helpful if
>>> someone could list what needs to be done with/in Spark.
>>>
>>> ~Vipul
>>>
>>>
>>>
>>
>
>
> --
> --
> *Issac Buenrostro*
> Software Engineer |
> buenrostro@ooyala.com | (617) 997-3350
> www.ooyala.com | blog <http://www.ooyala.com/blog> | @ooyala<http://www.twitter.com/ooyala>
>
>
>


