spark-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: No overwrite flag for saveAsXXFile
Date Fri, 06 Mar 2015 17:20:37 GMT
Since we already have the "spark.hadoop.validateOutputSpecs" config, I think
there is not much need to expose disableOutputSpecValidation.
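
For reference, a minimal sketch of the config-based route. It assumes Spark is on the classpath; the app name, the local master, and the output path /tmp/overwrite-demo are placeholders chosen for illustration, not anything from this thread. Note that the setting applies to every save in the application, not to a single call.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object OverwriteViaConfig {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("overwrite-via-config")   // placeholder name
      .setMaster("local[*]")                // placeholder master for a local run
      // Skip the "output directory already exists" check for the whole program.
      .set("spark.hadoop.validateOutputSpecs", "false")

    val sc = new SparkContext(conf)
    try {
      // With validation disabled, this no longer fails if the directory exists.
      // Files left over from an earlier run with more partitions are NOT removed.
      sc.parallelize(1 to 10, 2).saveAsTextFile("/tmp/overwrite-demo")
    } finally {
      sc.stop()
    }
  }
}
```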

Cheers

On Fri, Mar 6, 2015 at 7:34 AM, Nan Zhu <zhunanmcgill@gmail.com> wrote:

> Actually, besides setting spark.hadoop.validateOutputSpecs to false, which
> disables output validation for the whole program, the Spark implementation
> internally uses a DynamicVariable (in object PairRDDFunctions) to disable
> it on a case-by-case basis:
>
> val disableOutputSpecValidation: DynamicVariable[Boolean] = new DynamicVariable[Boolean](false)
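
To show how such a variable behaves, here is a small standalone sketch using only scala.util.DynamicVariable from the standard library. The shouldValidate helper is hypothetical and only stands in for whatever check Spark performs internally; it is not Spark's actual code.

```scala
import scala.util.DynamicVariable

object DynamicVariableSketch {
  // Same shape as the declaration quoted above; the default leaves validation on.
  val disableOutputSpecValidation: DynamicVariable[Boolean] =
    new DynamicVariable[Boolean](false)

  // Hypothetical stand-in for a save routine consulting both the global
  // setting and the per-call override.
  def shouldValidate(globalValidate: Boolean): Boolean =
    globalValidate && !disableOutputSpecValidation.value

  def main(args: Array[String]): Unit = {
    println(shouldValidate(true)) // true: validation is on by default

    // withValue rebinds the variable only for the duration of the block.
    disableOutputSpecValidation.withValue(true) {
      println(shouldValidate(true)) // false: disabled for this call only
    }

    println(shouldValidate(true)) // true: previous value restored
  }
}
```

The override is scoped to the block passed to withValue and is restored afterwards, which is what makes a case-by-case switch possible without touching the program-wide setting.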
>
>
> I'm not sure the benefit is large enough to make it worth exposing this variable to the user…
>
>
> Best,
>
>
> --
> Nan Zhu
> http://codingcat.me
>
> On Friday, March 6, 2015 at 10:22 AM, Ted Yu wrote:
>
> Found this thread:
> http://search-hadoop.com/m/JW1q5HMrge2
>
> Cheers
>
> On Fri, Mar 6, 2015 at 6:42 AM, Sean Owen <sowen@cloudera.com> wrote:
>
> This was discussed in the past and viewed as dangerous to enable. The
> biggest problem, by far, comes when you have a job that outputs M
> partitions, 'overwriting' a directory of data containing N > M old
> partitions. You suddenly have a mix of new and old data.
>
> It doesn't match Hadoop's semantics either, which won't let you do
> this. You can of course simply remove the output directory.
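
The N > M problem described above can be reproduced without Spark. The sketch below only simulates it with plain files in a temporary directory; the part-00000 style names imitate Spark's output layout, and the helper names are made up for illustration. In a real job the directory would typically be removed through the Hadoop FileSystem API (for example FileSystem.delete(path, true)) before saving.

```scala
import java.nio.file.{Files, Path}

object StalePartitionsSketch {
  // Write `n` part files containing `label`; same-named files are overwritten.
  def writeParts(dir: Path, n: Int, label: String): Unit =
    (0 until n).foreach { i =>
      Files.write(dir.resolve(f"part-$i%05d"), label.getBytes("UTF-8"))
    }

  // Contents of every part file, ordered by file name.
  def readParts(dir: Path): Seq[String] =
    dir.toFile.listFiles().toSeq.sortBy(_.getName).map { f =>
      new String(Files.readAllBytes(f.toPath), "UTF-8")
    }

  // Remove a flat directory: its files first, then the directory itself.
  def removeDir(dir: Path): Unit = {
    dir.toFile.listFiles().foreach(_.delete())
    Files.delete(dir)
  }

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("overwrite-demo")

    writeParts(dir, 4, "old") // earlier job: N = 4 partitions
    writeParts(dir, 2, "new") // naive overwrite: M = 2 partitions
    println(readParts(dir).mkString(", ")) // new, new, old, old

    // Removing the directory first leaves only the new data.
    removeDir(dir)
    Files.createDirectory(dir)
    writeParts(dir, 2, "new")
    println(readParts(dir).mkString(", ")) // new, new

    removeDir(dir)
  }
}
```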
>
> On Fri, Mar 6, 2015 at 2:20 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > Adding support for an overwrite flag would make saveAsXXFile more
> > user-friendly.
> >
> > Cheers
> >
> >
> >
> >> On Mar 6, 2015, at 2:14 AM, Jeff Zhang <zjffdu@gmail.com> wrote:
> >>
> >> Hi folks,
> >>
> >> I found that RDD:saveXXFile has no overwrite flag, which I think would be
> >> very helpful. Is there any reason for this?
> >>
> >>
> >>
> >> --
> >> Best Regards
> >>
> >> Jeff Zhang
> >
