spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sean Owen <so...@cloudera.com>
Subject Re: What about removing TaskContext#getPartitionId?
Date Sun, 15 Jan 2017 11:20:44 GMT
As you mentioned, it's called in ForeachSink. I don't know that the
scaladoc is wrong. You're saying something else, that there's no such thing
as local execution. I confess I don't know if that's true? but the doc
isn't wrong in that case, really.

More broadly, I just don't think this type of thing is worth this small
amount of attention.

On Sat, Jan 14, 2017 at 8:05 PM Jacek Laskowski <jacek@japila.pl> wrote:

> Hi Sean,
>
> Can you elaborate on " it's actually used by Spark"? Where exactly?
> I'd like to be corrected.
>
> What about the scaladoc? Since the method's a public API, I think it
> should be fixed, shouldn't it?
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Sat, Jan 14, 2017 at 6:03 PM, Sean Owen <sowen@cloudera.com> wrote:
> > It doesn't strike me as something that's problematic to use. There are a
> > thousand things in the API that, maybe in hindsight, could have been done
> > differently, but unless something is bad practice or superseded by
> another
> > superior mechanism, it's probably not worth the bother for maintainers or
> > users. I don't see much problem with this, and it's actually used by
> Spark.
> >
> > On Sat, Jan 14, 2017 at 3:55 PM Jacek Laskowski <jacek@japila.pl> wrote:
> >>
> >> Hi,
> >>
> >> Yes, correct. I was too forceful in discouraging people using it. I
> think
> >> @deprecated would be a right direction.
> >>
> >> What should be the next step? I think I should file an JIRA so it's in a
> >> release notes. Correct?
> >>
> >> I was very surprised to have noticed its resurrection in the very latest
> >> module of Spark - Structured Streaming - that will be an inspiration for
> >> others to learn Spark.
> >>
> >> Jacek
> >>
> >> On 14 Jan 2017 12:48 p.m., "Mridul Muralidharan" <mridul@gmail.com>
> wrote:
> >>>
> >>> Since TaskContext.getPartitionId is part of the public api, it cant be
> >>> removed as user code can be depending on it (unless we go through a
> >>> deprecation process for it).
> >>>
> >>> Regards,
> >>> Mridul
> >>>
> >>>
> >>> On Sat, Jan 14, 2017 at 2:02 AM, Jacek Laskowski <jacek@japila.pl>
> wrote:
> >>> > Hi,
> >>> >
> >>> > Just noticed that TaskContext#getPartitionId [1] is not used and
> >>> > moreover the scaladoc is incorrect:
> >>> >
> >>> > "It will return 0 if there is no active TaskContext for cases like
> >>> > local execution."
> >>> >
> >>> > since there are no local execution. (I've seen the comment in the
> code
> >>> > before but can't find it now).
> >>> >
> >>> > The reason to remove it is that Structured Streaming is giving new
> >>> > birth to the method in ForeachSink [2] which may look like a
> >>> > "resurrection".
> >>> >
> >>> > There's simply TaskContext.get.partitionId.
> >>> >
> >>> > What do you think?
> >>> >
> >>> > [1]
> >>> >
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/TaskContext.scala#L41
> >>> > [2]
> >>> >
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ForeachSink.scala#L50
> >>> >
> >>> > Pozdrawiam,
> >>> > Jacek Laskowski
> >>> > ----
> >>> > https://medium.com/@jaceklaskowski/
> >>> > Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
> >>> > Follow me at https://twitter.com/jaceklaskowski
> >>> >
> >>> > ---------------------------------------------------------------------
> >>> > To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
> >>> >
>

Mime
View raw message