spark-user mailing list archives

From ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>
Subject Re: Fine control with sc.sequenceFile
Date Mon, 29 Jun 2015 04:48:32 GMT
    sc.hadoopConfiguration.set(
      "mapreduce.input.fileinputformat.split.maxsize", "67108864")

    sc.sequenceFile(
      getMostRecentDirectory(tablePath, _.startsWith("_")).get + "/*",
      classOf[Text], classOf[Text])

works
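
For readers who would rather not change the shared sc.hadoopConfiguration, here is a minimal sketch of a per-RDD alternative. It mirrors the newAPIHadoopFile call quoted further down, but with the new-API SequenceFileInputFormat. The names SplitSizeSketch, maxSplitBytes and readWithSplitCap are hypothetical and used only for illustration; the snippet assumes Spark and Hadoop are on the classpath and has not been checked against any particular version.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object SplitSizeSketch {
  // 64 MiB expressed in bytes, i.e. the "67108864" used in this thread.
  val maxSplitBytes: Long = 64L * 1024 * 1024

  // Hypothetical helper: reads Text/Text sequence files with a split cap
  // that applies to this RDD only.
  def readWithSplitCap(sc: SparkContext, path: String): RDD[(Text, Text)] = {
    // Copy the shared configuration so the global one is left untouched.
    val hadoopConf = new Configuration(sc.hadoopConfiguration)
    hadoopConf.set(
      "mapreduce.input.fileinputformat.split.maxsize", maxSplitBytes.toString)

    // newAPIHadoopFile accepts an explicit Configuration as its last argument.
    sc.newAPIHadoopFile(
      path,
      classOf[SequenceFileInputFormat[Text, Text]],
      classOf[Text],
      classOf[Text],
      hadoopConf)
  }
}
```

It could be called as SplitSizeSketch.readWithSplitCap(sc, dir + "/*"), where dir is whatever directory is being read.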

On Sun, Jun 28, 2015 at 9:46 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> There is no setter for sc.hadoopConfiguration.
> You can directly change the value of a parameter in sc.hadoopConfiguration.
>
> However, see the note in scaladoc:
>    * '''Note:''' As it will be reused in all Hadoop RDDs, it's better not to modify it unless you
>    * plan to set some global configurations for all Hadoop RDDs.
>
> Cheers
>
> On Sun, Jun 28, 2015 at 9:34 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com>
> wrote:
>
>>     val hadoopConf = new Configuration(sc.hadoopConfiguration)
>>
>>     hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize", "67108864")
>>
>>
>>     sc.hadoopConfiguration(hadoopConf)
>>
>> or
>>
>>     sc.hadoopConfiguration = hadoopConf
>>
>> threw an error.
>>
>> On Sun, Jun 28, 2015 at 9:32 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>
>>> sequenceFile() calls hadoopFile() where:
>>>     val confBroadcast = broadcast(new SerializableConfiguration(hadoopConfiguration))
>>>
>>> You can set the parameter in sc.hadoopConfiguration before calling
>>> sc.sequenceFile().
>>>
>>> Cheers
>>>
>>> On Sun, Jun 28, 2015 at 9:23 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com>
>>> wrote:
>>>
>>>> I can do this:
>>>>
>>>>     val hadoopConf = new Configuration(sc.hadoopConfiguration)
>>>>     hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize", "67108864")
>>>>
>>>>     sc.newAPIHadoopFile(
>>>>       path + "/*.avro",
>>>>       classOf[AvroKeyInputFormat[GenericRecord]],
>>>>       classOf[AvroKey[GenericRecord]],
>>>>       classOf[NullWritable],
>>>>       hadoopConf)
>>>>
>>>> But I can't do the same with
>>>>
>>>>     sc.sequenceFile("path", classOf[Text], classOf[Text])
>>>>
>>>> How can I achieve the same with sequenceFile?
>>>> --
>>>> Deepak
>>>>
>>>>
>>>
>>
>>
>> --
>> Deepak
>>
>>
>


-- 
Deepak
