spark-user mailing list archives

From Issac Buenrostro <buenros...@ooyala.com>
Subject Re: Lzo + Protobuf
Date Tue, 21 Jan 2014 17:32:19 GMT
Hi Vipul,

I use something like this to read from LZO-compressed text files; it may be
helpful:

import com.twitter.elephantbird.mapreduce.input.LzoTextInputFormat
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.SparkContext

val sc = new SparkContext(sparkMaster, "lzoreader", sparkDir,
  List(config.getString("spark.jar")))
sc.newAPIHadoopFile(logFile, classOf[LzoTextInputFormat],
    classOf[LongWritable], classOf[Text],
    new Job().getConfiguration())
  .map(line => line._2)

Additionally, I had to compile the LZO native libraries, so keep that in mind.
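For protobuf payloads specifically, the same pattern should work with
elephant-bird's LzoProtobufBlockInputFormat in place of LzoTextInputFormat.
Here is a rough sketch; I haven't run this exact code, and MyProto.Event
stands in for whatever your generated protobuf class is, so treat it as a
starting point rather than a tested recipe:

import com.twitter.elephantbird.mapreduce.input.LzoProtobufBlockInputFormat
import com.twitter.elephantbird.mapreduce.io.ProtobufWritable
import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.mapreduce.Job

// Tell elephant-bird which protobuf class to deserialize into.
// MyProto.Event is a placeholder for your own generated Message class.
val job = new Job()
LzoProtobufBlockInputFormat.setClassConf(classOf[MyProto.Event],
  job.getConfiguration)

// Values come back as ProtobufWritable wrappers; .get unwraps the message.
val events = sc.newAPIHadoopFile(logFile,
    classOf[LzoProtobufBlockInputFormat[MyProto.Event]],
    classOf[LongWritable],
    classOf[ProtobufWritable[MyProto.Event]],
    job.getConfiguration)
  .map(_._2.get)

The setClassConf call is the piece that's easy to miss: without it the
input format doesn't know which Message type to instantiate.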


On Tue, Jan 21, 2014 at 6:57 AM, Nick Pentreath <nick.pentreath@gmail.com> wrote:

> Hi Vipul
>
> Have you tried looking at the HBase and Cassandra examples under the Spark
> examples project? These use custom InputFormats and may provide guidance on
> how to go about using the relevant protobuf InputFormat.
>
>
>
>
> On Mon, Jan 20, 2014 at 11:48 PM, Vipul Pandey <vipandey@gmail.com> wrote:
>
>> Any suggestions, anyone?
>> Core team / contributors / spark-developers - any thoughts?
>>
>> On Jan 17, 2014, at 4:45 PM, Vipul Pandey <vipandey@gmail.com> wrote:
>>
>> Hi All,
>>
>> Can someone please share (sample) code to read LZO-compressed protobufs
>> from HDFS (using elephant-bird)? I've been trying whatever I find in the
>> forum and on the web, but none of it seems comprehensive to me.
>>
>> I'm using Spark 0.8.0. My Pig scripts are able to read protobuf just fine,
>> so the Hadoop layer is set up correctly. It would be really helpful if
>> someone could list out what needs to be done with/in Spark.
>>
>> ~Vipul
>>
>>
>>
>


-- 
*Issac Buenrostro*
Software Engineer |
buenrostro@ooyala.com | (617) 997-3350
www.ooyala.com | blog <http://www.ooyala.com/blog> |
@ooyala<http://www.twitter.com/ooyala>
