spark-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Fine control with sc.sequenceFile
Date Mon, 29 Jun 2015 04:46:17 GMT
There isn't a setter for sc.hadoopConfiguration, but you can change the parameter's value directly on sc.hadoopConfiguration.

However, see the note in scaladoc:
   * '''Note:''' As it will be reused in all Hadoop RDDs, it's better not
to modify it unless you
   * plan to set some global configurations for all Hadoop RDDs.
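
Concretely, a minimal sketch (assuming an existing SparkContext `sc` and a SequenceFile of Text key/value pairs at a placeholder path):

```scala
import org.apache.hadoop.io.Text

// Mutate the shared Hadoop Configuration held by the SparkContext.
// Per the scaladoc note above, this affects ALL Hadoop RDDs created
// afterwards, not just this one.
sc.hadoopConfiguration.set(
  "mapreduce.input.fileinputformat.split.maxsize", "67108864") // 64 MB

// sequenceFile() picks up the modified configuration when it
// broadcasts a SerializableConfiguration internally.
val rdd = sc.sequenceFile("path", classOf[Text], classOf[Text])
```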

Cheers

On Sun, Jun 28, 2015 at 9:34 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com> wrote:

>     val hadoopConf = new Configuration(sc.hadoopConfiguration)
>
>     hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize",
> "67108864")
>
>
>     sc.hadoopConfiguration(hadoopConf)
>
> or
>
>     sc.hadoopConfiguration = hadoopConf
>
> both threw an error.
>
> On Sun, Jun 28, 2015 at 9:32 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
>> sequenceFile() calls hadoopFile() where:
>>     val confBroadcast = broadcast(new
>> SerializableConfiguration(hadoopConfiguration))
>>
>> You can set the parameter in sc.hadoopConfiguration before calling
>> sc.sequenceFile().
>>
>> Cheers
>>
>> On Sun, Jun 28, 2015 at 9:23 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com>
>> wrote:
>>
>>> I can do this
>>>
>>>     val hadoopConf = new Configuration(sc.hadoopConfiguration)
>>>
>>>     hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize",
>>> "67108864")
>>>
>>>     sc.newAPIHadoopFile(
>>>
>>>       path + "/*.avro",
>>>
>>>       classOf[AvroKeyInputFormat[GenericRecord]],
>>>
>>>       classOf[AvroKey[GenericRecord]],
>>>
>>>       classOf[NullWritable],
>>>
>>>       hadoopConf)
>>>
>>>
>>> But I can't do the same with
>>>
>>> sc.sequenceFile("path", classOf[Text], classOf[Text])
>>> How can I achieve the same with sequenceFile?
>>> --
>>> Deepak
>>>
>>>
>>
>
>
> --
> Deepak
>
>
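
If mutating the shared configuration is undesirable, the SequenceFile can instead be read through the generic newAPIHadoopFile entry point, which accepts a per-call Configuration. A sketch, assuming Text keys and values (SequenceFileInputFormat here is the new-API class from org.apache.hadoop.mapreduce.lib.input, and "path" is a placeholder):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat

// Copy the context's configuration so the change stays local to this read
// and does not leak into other Hadoop RDDs.
val hadoopConf = new Configuration(sc.hadoopConfiguration)
hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize", "67108864")

val rdd = sc.newAPIHadoopFile(
  "path",
  classOf[SequenceFileInputFormat[Text, Text]],
  classOf[Text],
  classOf[Text],
  hadoopConf)
```

This mirrors the Avro snippet quoted above, swapping in the SequenceFile input format.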
