spark-user mailing list archives

From Raghavendra Pandey <raghavendra.pan...@gmail.com>
Subject Re: StorageLevel.MEMORY_AND_DISK_SER
Date Wed, 01 Jul 2015 16:00:10 GMT
For that you need to change the serialization and deserialization behavior of
your class.
Preferably, use Kryo serializers and override the behavior there.
For details, see
https://github.com/EsotericSoftware/kryo/blob/master/README.md
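A rough sketch of the Kryo route, assuming Spark 1.x to 3.x (which bundle Kryo 2.x to 4.x, where `Serializer.read` takes a `Class[T]`). `Reading`, `ReadingSerializer`, `MyRegistrator` and `kryoConf` are hypothetical names for illustration, not part of Spark or Kryo. Once `spark.serializer` points at Kryo, Spark uses the registered serializer wherever it serializes a `Reading`, including the `_SER` storage levels.

```scala
import com.esotericsoftware.kryo.{Kryo, Serializer}
import com.esotericsoftware.kryo.io.{Input, Output}
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical record type standing in for whatever class the RDD holds.
class Reading(val sensorId: String, val value: Double)

// Decides exactly which bytes Kryo writes for a Reading and how they are read back.
// Assumes Kryo 2.x-4.x; in Kryo 5 the last read() parameter is Class[_ <: Reading].
class ReadingSerializer extends Serializer[Reading] {
  override def write(kryo: Kryo, output: Output, r: Reading): Unit = {
    output.writeString(r.sensorId)
    output.writeDouble(r.value)
  }

  override def read(kryo: Kryo, input: Input, cls: Class[Reading]): Reading =
    new Reading(input.readString(), input.readDouble())
}

// Spark calls registerClasses on each Kryo instance it creates.
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit =
    kryo.register(classOf[Reading], new ReadingSerializer)
}

// Switches Spark from Java serialization to Kryo and points it at the registrator.
def kryoConf(): SparkConf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", classOf[MyRegistrator].getName)
```

Spark instantiates the registrator by class name, so in a real job `MyRegistrator` should be a top-level class with a no-argument constructor; pass `kryoConf()` (plus an app name and master) to `new SparkContext(...)`.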
On Jul 1, 2015 9:26 PM, "ÐΞ€ρ@Ҝ (๏̯͡๏)" <deepujain@gmail.com> wrote:

I originally assumed that persisting is similar to writing, but it's not.
Hence I want to change the behavior of intermediate persists.

On Wed, Jul 1, 2015 at 8:46 AM, Raghavendra Pandey <
raghavendra.pandey@gmail.com> wrote:

> So do you want to change the behavior of the persist API, or write the RDD
> to disk?
> On Jul 1, 2015 9:13 PM, "ÐΞ€ρ@Ҝ (๏̯͡๏)" <deepujain@gmail.com> wrote:
>
>> I think I want to use persist then, and write my intermediate RDDs to
>> memory and disk.
>>
>> On Wed, Jul 1, 2015 at 8:28 AM, Raghavendra Pandey <
>> raghavendra.pandey@gmail.com> wrote:
>>
>>> I think the persist API is internal to the RDD (a cache Spark manages for
>>> your application), whereas the write API is for saving content to disk.
>>> RDD persist will dump your objects' serialized bytes to disk. If you
>>> want to change that behavior, you need to override the serialization of
>>> the class that you are storing in the RDD.
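If you stay on Spark's default serializer (`JavaSerializer`), the same idea can be expressed with standard Java serialization hooks. A minimal sketch, with `Measurement` as a hypothetical record type: because `JavaSerializer` writes objects through `ObjectOutputStream`, the private `writeObject`/`readObject` methods below decide what is stored for each instance under a `_SER` storage level.

```scala
import java.io.{ObjectInputStream, ObjectOutputStream}

// Hypothetical record type. The two private hooks replace Java's default
// field-by-field encoding with an explicit, hand-chosen byte layout.
class Measurement(var sensorId: String, var value: Double) extends Serializable {

  // Called by ObjectOutputStream instead of default field serialization.
  private def writeObject(out: ObjectOutputStream): Unit = {
    out.writeUTF(sensorId)
    out.writeDouble(value)
  }

  // Called by ObjectInputStream; reads the fields back in the same order.
  private def readObject(in: ObjectInputStream): Unit = {
    sensorId = in.readUTF()
    value = in.readDouble()
  }
}
```

The JVM only picks these hooks up if they are private and have exactly these signatures; a round trip through `ObjectOutputStream` and `ObjectInputStream` restores the same `sensorId` and `value`.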
>>>  On Jul 1, 2015 8:50 PM, "ÐΞ€ρ@Ҝ (๏̯͡๏)" <deepujain@gmail.com> wrote:
>>>
>>>> This is my write API. How do I integrate it here?
>>>>
>>>>
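>>>> import org.apache.avro.generic.GenericRecord
>>>> import org.apache.avro.mapred.AvroKey
>>>> import org.apache.avro.mapreduce.{AvroJob, AvroKeyOutputFormat}
>>>> import org.apache.hadoop.io.NullWritable
>>>> import org.apache.hadoop.mapreduce.Job
>>>> import org.apache.spark.rdd.RDD
>>>>
>>>>   // Writes the detail records to outputDir as Avro files, in at most 100 parts.
>>>>   // DetailOutputRecord, SchemaUtil and _detail are defined elsewhere in the app.
>>>>   protected def writeOutputRecords(
>>>>       detailRecords: RDD[(AvroKey[DetailOutputRecord], NullWritable)],
>>>>       outputDir: String): Unit = {
>>>>     val writeJob = Job.getInstance()
>>>>     // Register the Avro writer schema in the Hadoop job configuration.
>>>>     val schema = SchemaUtil.outputSchema(_detail)
>>>>     AvroJob.setOutputKeySchema(writeJob, schema)
>>>>     val outputRecords = detailRecords.coalesce(100)
>>>>     outputRecords.saveAsNewAPIHadoopFile(outputDir,
>>>>       classOf[AvroKey[GenericRecord]],
>>>>       classOf[NullWritable],
>>>>       classOf[AvroKeyOutputFormat[GenericRecord]],
>>>>       writeJob.getConfiguration)
>>>>   }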
>>>>
>>>> On Wed, Jul 1, 2015 at 8:11 AM, Koert Kuipers <koert@tresata.com>
>>>> wrote:
>>>>
>>>>> rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)
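A minimal self-contained version of that one-liner, assuming Spark is on the classpath; the app name, the `local[2]` master and the `squares` data are illustrative only. Note that `persist` is lazy: it only marks the RDD, and partitions are serialized and cached when the first action runs. Serialized levels use whatever `spark.serializer` is configured (Java serialization by default).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Local-mode context for illustration; app name and master are placeholders.
val sc = new SparkContext(
  new SparkConf().setAppName("persist-ser-example").setMaster("local[2]"))

val squares = sc.parallelize(1 to 1000).map(i => i.toLong * i)

// Keep partitions as serialized bytes in memory, spilling to local disk
// when they don't fit. Nothing is computed or stored yet.
squares.persist(StorageLevel.MEMORY_AND_DISK_SER)

// The first action computes and caches the partitions; the second reuses them.
val total = squares.reduce(_ + _)
val n = squares.count()

// Record the level the RDD was marked with, then shut the context down.
val level = squares.getStorageLevel
sc.stop()
```

`total` is the sum of squares of 1 to 1000, i.e. 333833500; call `squares.unpersist()` before stopping if the cached blocks should be released earlier.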
>>>>>
>>>>> On Wed, Jul 1, 2015 at 11:01 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> How do I persist an RDD using StorageLevel.MEMORY_AND_DISK_SER?
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Deepak
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Deepak
>>>>
>>>>
>>
>>
>> --
>> Deepak
>>
>>


-- 
Deepak
