spark-user mailing list archives

From ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>
Subject Re: StorageLevel.MEMORY_AND_DISK_SER
Date Wed, 01 Jul 2015 15:43:00 GMT
I think I want to use persist then, and write my intermediate RDDs to
disk+mem.
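Putting the thread's two pieces together, a minimal sketch of the combined flow might look like the following. `detailRecords`, `writeOutputRecords`, and `outputDir` are the names from the quoted message below; the storage level is the one Koert suggests. This is an illustration, not tested code, and it assumes `detailRecords` has already been built from a live SparkContext:

```scala
import org.apache.spark.storage.StorageLevel

// Persist the intermediate RDD serialized in memory, spilling to disk
// when memory runs out (sketch; `detailRecords` comes from the quoted code).
val cached = detailRecords.persist(StorageLevel.MEMORY_AND_DISK_SER)

// The first action over `cached` materializes the persisted blocks;
// the existing write method is used unchanged.
writeOutputRecords(cached, outputDir)

// Any further jobs over `cached` now read the cached blocks instead of
// recomputing the lineage. Release the storage when done:
cached.unpersist()
```

Note that `persist` only marks the RDD; the blocks are actually stored the first time an action (here, the save inside `writeOutputRecords`) runs over it.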

On Wed, Jul 1, 2015 at 8:28 AM, Raghavendra Pandey <
raghavendra.pandey@gmail.com> wrote:

> I think the persist API is internal to the RDD, whereas the write API is for
> saving content on disk.
> RDD persist will dump your object bytes serialized on the disk. If you want
> to change that behavior, you need to override the serialization of the class
> you are storing in the RDD.
>  On Jul 1, 2015 8:50 PM, "ÐΞ€ρ@Ҝ (๏̯͡๏)" <deepujain@gmail.com> wrote:
>
>> This is my write API. How do I integrate it here?
>>
>>
>>  protected def writeOutputRecords(detailRecords:
>> RDD[(AvroKey[DetailOutputRecord], NullWritable)], outputDir: String) {
>>     // The Job() constructor is deprecated; Job.getInstance() is the
>>     // supported factory.
>>     val writeJob = Job.getInstance()
>>     val schema = SchemaUtil.outputSchema(_detail)
>>     AvroJob.setOutputKeySchema(writeJob, schema)
>>     // Coalesce to limit the number of output files to 100.
>>     val outputRecords = detailRecords.coalesce(100)
>>     outputRecords.saveAsNewAPIHadoopFile(outputDir,
>>       classOf[AvroKey[GenericRecord]],
>>       classOf[org.apache.hadoop.io.NullWritable],
>>       classOf[AvroKeyOutputFormat[GenericRecord]],
>>       writeJob.getConfiguration)
>>   }
>>
>> On Wed, Jul 1, 2015 at 8:11 AM, Koert Kuipers <koert@tresata.com> wrote:
>>
>>> rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)
>>>
>>> On Wed, Jul 1, 2015 at 11:01 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com>
>>> wrote:
>>>
>>>> How do i persist an RDD using StorageLevel.MEMORY_AND_DISK_SER ?
>>>>
>>>>
>>>> --
>>>> Deepak
>>>>
>>>>
>>>
>>
>>
>> --
>> Deepak
>>
>>


-- 
Deepak
