spark-user mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject Re: Fine control with sc.sequenceFile
Date Mon, 29 Jun 2015 14:02:29 GMT
see also:
https://github.com/apache/spark/pull/6848

On Mon, Jun 29, 2015 at 12:48 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com> wrote:

> sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.split.maxsize",
> "67108864")
>
>     sc.sequenceFile(getMostRecentDirectory(tablePath, _.startsWith("_")).get
> + "/*", classOf[Text], classOf[Text])
>
> works
>
> On Sun, Jun 28, 2015 at 9:46 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
>> There isn't a setter for sc.hadoopConfiguration, but you can directly
>> change the value of a parameter in sc.hadoopConfiguration.
>>
>> However, see the note in scaladoc:
>>    * '''Note:''' As it will be reused in all Hadoop RDDs, it's better not
>> to modify it unless you
>>    * plan to set some global configurations for all Hadoop RDDs.
>>
>> Cheers
>>
>> On Sun, Jun 28, 2015 at 9:34 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com>
>> wrote:
>>
>>>     val hadoopConf = new Configuration(sc.hadoopConfiguration)
>>>
>>>     hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize",
>>> "67108864")
>>>
>>>
>>>     sc.hadoopConfiguration(hadoopConf)
>>>
>>> or
>>>
>>>     sc.hadoopConfiguration = hadoopConf
>>>
>>> threw an error.
>>>
>>> On Sun, Jun 28, 2015 at 9:32 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>
>>>> sequenceFile() calls hadoopFile() where:
>>>>     val confBroadcast = broadcast(new
>>>> SerializableConfiguration(hadoopConfiguration))
>>>>
>>>> You can set the parameter in sc.hadoopConfiguration before calling
>>>> sc.sequenceFile().
>>>>
>>>> Cheers
>>>>
>>>> On Sun, Jun 28, 2015 at 9:23 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com>
>>>> wrote:
>>>>
>>>>> I can do this
>>>>>
>>>>>     val hadoopConf = new Configuration(sc.hadoopConfiguration)
>>>>>
>>>>>     hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize",
>>>>> "67108864")
>>>>>
>>>>>     sc.newAPIHadoopFile(
>>>>>
>>>>>       path + "/*.avro",
>>>>>
>>>>>       classOf[AvroKeyInputFormat[GenericRecord]],
>>>>>
>>>>>       classOf[AvroKey[GenericRecord]],
>>>>>
>>>>>       classOf[NullWritable],
>>>>>
>>>>>       hadoopConf)
>>>>>
>>>>>
>>>>> But I can't do the same with
>>>>>
>>>>> sc.sequenceFile("path", classOf[Text], classOf[Text])
>>>>>
>>>>> How can I achieve the same with sequenceFile?
>>>>> --
>>>>> Deepak
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Deepak
>>>
>>>
>>
>
>
> --
> Deepak
>
>
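
For context on why the setting above works: Hadoop's FileInputFormat derives each input split size from the configured minimum, the configured maximum, and the file's block size, so capping `mapreduce.input.fileinputformat.split.maxsize` at 67108864 (64 MB) forces splits smaller than a typical 128 MB block and hence more input partitions. The following is a minimal sketch of that formula in plain Scala (no Spark dependency; the object and value names are illustrative, not from Spark or Hadoop source):

```scala
// Sketch of Hadoop FileInputFormat's split-size rule:
//   splitSize = max(minSize, min(maxSize, blockSize))
// Names here are illustrative; in a real job the max size is set via
// sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.split.maxsize", "67108864")
// before calling sc.sequenceFile(...).
object SplitSizeSketch {
  def computeSplitSize(minSize: Long, maxSize: Long, blockSize: Long): Long =
    math.max(minSize, math.min(maxSize, blockSize))

  def main(args: Array[String]): Unit = {
    val blockSize = 128L * 1024 * 1024 // e.g. a 128 MB HDFS block

    // With no max configured, the split size follows the block size.
    val default = computeSplitSize(1L, Long.MaxValue, blockSize)

    // With maxsize capped at 67108864 (64 MB), splits shrink to 64 MB,
    // doubling the number of partitions for this block size.
    val capped = computeSplitSize(1L, 67108864L, blockSize)

    println(s"default split: $default bytes, capped split: $capped bytes")
  }
}
```

Under these assumptions, halving the split size roughly doubles the partition count, which is the fine-grained control asked about in this thread.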
