spark-dev mailing list archives

From Jacek Laskowski <ja...@japila.pl>
Subject Re: Why's ds.foreachPartition(println) not possible?
Date Wed, 06 Jul 2016 09:53:52 GMT
Thanks Cody, Reynold, and Ryan! Learnt a lot and feel "corrected".

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Wed, Jul 6, 2016 at 2:46 AM, Shixiong(Ryan) Zhu
<shixiong@databricks.com> wrote:
> I asked this question in Scala user group two years ago:
> https://groups.google.com/forum/#!topic/scala-user/W4f0d8xK1nk
>
> Take a look if you are interested.
>
> On Tue, Jul 5, 2016 at 1:31 PM, Reynold Xin <rxin@databricks.com> wrote:
>>
>> You can file it here: https://issues.scala-lang.org/secure/Dashboard.jspa
>>
>> Perhaps "bug" is not the right word, but "limitation". println accepts a
>> single argument of type Any and returns Unit, and it appears that Scala
>> fails to infer the correct overloaded method in this case.
>>
>>   def println() = Console.println()
>>   def println(x: Any) = Console.println(x)
>>
>>
>>
>> On Tue, Jul 5, 2016 at 1:27 PM, Cody Koeninger <cody@koeninger.org> wrote:
>>>
>>> I don't think that's a scala compiler bug.
>>>
>>> println is a valid expression that returns unit.
>>>
>>> Unit is not a single-argument function, and does not match any of the
>>> overloads of foreachPartition
>>>
>>> You may be used to a conversion taking place when println is passed to
>>> a method expecting a function, but that's not a safe thing to do
>>> silently for multiple overloads.
>>>
>>> tl;dr:
>>>
>>> just use
>>>
>>> ds.foreachPartition(x => println(x))
>>>
>>> you don't need any type annotations
>>>
>>>
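A minimal, Spark-free sketch of the behaviour discussed above. TinyDataset and its single foreachPartition method are hypothetical stand-ins, not Spark's API. With only one candidate method, the expected parameter type lets the compiler turn println into a function value; the explicit lambda is the form the thread recommends once a second overload is involved.

```scala
import scala.collection.mutable.ListBuffer

object EtaExpansionSketch {
  // Hypothetical stand-in for a Dataset: one foreachPartition, no Java overload.
  final class TinyDataset[T](partitions: Seq[Seq[T]]) {
    def foreachPartition(f: Iterator[T] => Unit): Unit =
      partitions.foreach(p => f(p.iterator))
  }

  // Helper for illustration: records how many elements each partition holds.
  def partitionSizes[T](ds: TinyDataset[T]): List[Int] = {
    val sizes = ListBuffer.empty[Int]
    ds.foreachPartition(it => { sizes += it.size; () })
    sizes.toList
  }

  def main(args: Array[String]): Unit = {
    val ds = new TinyDataset(Seq(Seq(0L, 1L), Seq(2L, 3L)))

    // Single candidate method: the expected type Iterator[Long] => Unit
    // drives the conversion of println into a function value.
    ds.foreachPartition(println)

    // Explicit lambda: no type annotation needed, as suggested in the thread.
    ds.foreachPartition(x => println(x))

    println(partitionSizes(ds))
  }
}
```

Both foreachPartition calls receive the Iterator itself, so they print its toString rather than the individual elements.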
>>> On Tue, Jul 5, 2016 at 2:53 PM, Jacek Laskowski <jacek@japila.pl> wrote:
>>> > Hi Reynold,
>>> >
>>> > Is this already reported and tracked somewhere? I'm quite sure that
>>> > people will be asking about the reasons Spark does this. Where are
>>> > such issues reported usually?
>>> >
>>> > Pozdrawiam,
>>> > Jacek Laskowski
>>> > ----
>>> > https://medium.com/@jaceklaskowski/
>>> > Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> > Follow me at https://twitter.com/jaceklaskowski
>>> >
>>> >
>>> > On Tue, Jul 5, 2016 at 6:19 PM, Reynold Xin <rxin@databricks.com>
>>> > wrote:
>>> >> This seems like a Scala compiler bug.
>>> >>
>>> >>
>>> >> On Tuesday, July 5, 2016, Jacek Laskowski <jacek@japila.pl> wrote:
>>> >>>
>>> >>> Well, there is foreach for Java and another foreach for Scala. That's
>>> >>> what I can understand. But while supporting two language-specific APIs
>>> >>> -- Scala and Java -- Dataset API lost support for such simple calls
>>> >>> without type annotations so you have to be explicit about the variant
>>> >>> (since I'm using Scala I want to use Scala API right). It appears that
>>> >>> any single-argument-function operators in Datasets are affected :(
>>> >>>
>>> >>> My question was whether there is any work underway to fix it (if
>>> >>> possible -- I don't know if it is).
>>> >>>
>>> >>> Pozdrawiam,
>>> >>> Jacek Laskowski
>>> >>> ----
>>> >>> https://medium.com/@jaceklaskowski/
>>> >>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> >>> Follow me at https://twitter.com/jaceklaskowski
>>> >>>
>>> >>>
>>> >>> On Tue, Jul 5, 2016 at 4:21 PM, Sean Owen <sowen@cloudera.com>
>>> >>> wrote:
>>> >>> > Right, should have noticed that in your second mail. But foreach
>>> >>> > already does what you want, right? It would be identical here.
>>> >>> >
>>> >>> > However, these two methods do conceptually different things on
>>> >>> > different arguments. I don't think I'd expect them to accept the same
>>> >>> > functions.
>>> >>> >
>>> >>> > On Tue, Jul 5, 2016 at 3:18 PM, Jacek Laskowski <jacek@japila.pl>
>>> >>> > wrote:
>>> >>> >> ds is a Dataset and the problem is that println (or any other
>>> >>> >> one-argument function) would not work here (and perhaps other
>>> >>> >> methods with two variants - Java's and Scala's).
>>> >>> >>
>>> >>> >> Pozdrawiam,
>>> >>> >> Jacek Laskowski
>>> >>> >> ----
>>> >>> >> https://medium.com/@jaceklaskowski/
>>> >>> >> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> >>> >> Follow me at https://twitter.com/jaceklaskowski
>>> >>> >>
>>> >>> >>
>>> >>> >> On Tue, Jul 5, 2016 at 3:53 PM, Sean Owen <sowen@cloudera.com>
>>> >>> >> wrote:
>>> >>> >>> A DStream is a sequence of RDDs, not of elements. I don't think
>>> >>> >>> I'd expect to express an operation on a DStream as if it were
>>> >>> >>> elements.
>>> >>> >>>
>>> >>> >>> On Tue, Jul 5, 2016 at 2:47 PM, Jacek Laskowski <jacek@japila.pl>
>>> >>> >>> wrote:
>>> >>> >>>> Sort of. Your example works, but could you do a mere
>>> >>> >>>> ds.foreachPartition(println)? Why not? Why should I even see
>>> >>> >>>> the Java version?
>>> >>> >>>>
>>> >>> >>>> scala> val ds = spark.range(10)
>>> >>> >>>> ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]
>>> >>> >>>>
>>> >>> >>>> scala> ds.foreachPartition(println)
>>> >>> >>>> <console>:26: error: overloaded method value foreachPartition with alternatives:
>>> >>> >>>>   (func: org.apache.spark.api.java.function.ForeachPartitionFunction[Long])Unit <and>
>>> >>> >>>>   (f: Iterator[Long] => Unit)Unit
>>> >>> >>>>  cannot be applied to (Unit)
>>> >>> >>>>        ds.foreachPartition(println)
>>> >>> >>>>           ^
>>> >>> >>>>
>>> >>> >>>> Pozdrawiam,
>>> >>> >>>> Jacek Laskowski
>>> >>> >>>> ----
>>> >>> >>>> https://medium.com/@jaceklaskowski/
>>> >>> >>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> >>> >>>> Follow me at https://twitter.com/jaceklaskowski
>>> >>> >>>>
>>> >>> >>>>
>>> >>> >>>> On Tue, Jul 5, 2016 at 3:32 PM, Sean Owen <sowen@cloudera.com>
>>> >>> >>>> wrote:
>>> >>> >>>>> Do you not mean ds.foreachPartition(_.foreach(println)) or
>>> >>> >>>>> similar?
>>> >>> >>>>>
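To illustrate the per-element form suggested above without a Spark dependency, here is a small sketch. runPerPartition and collectElements are hypothetical helpers that only mimic the shape of a per-partition call over plain Scala collections; they are not part of Spark.

```scala
import scala.collection.mutable.ListBuffer

object PerElementSketch {
  // Hypothetical helper: applies f once per partition, handing it an Iterator,
  // which is the shape of function the Scala overload in the thread expects.
  def runPerPartition[T](partitions: Seq[Seq[T]])(f: Iterator[T] => Unit): Unit =
    partitions.foreach(p => f(p.iterator))

  // Visits every element of every partition using the _.foreach(...) form.
  def collectElements(partitions: Seq[Seq[Long]]): List[Long] = {
    val seen = ListBuffer.empty[Long]
    runPerPartition(partitions)(_.foreach(seen += _))
    seen.toList
  }

  def main(args: Array[String]): Unit = {
    val parts = Seq(Seq(0L, 1L, 2L), Seq(3L, 4L))

    // The function receives each partition's Iterator and prints its elements,
    // one per line.
    runPerPartition(parts)(_.foreach(println))

    println(collectElements(parts))
  }
}
```

The function passed in works on the whole Iterator, so any per-element action has to be applied inside it, which is what the inner foreach does.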
>>> >>> >>>>> On Tue, Jul 5, 2016 at 2:22 PM, Jacek Laskowski
>>> >>> >>>>> <jacek@japila.pl>
>>> >>> >>>>> wrote:
>>> >>> >>>>>> Hi,
>>> >>> >>>>>>
>>> >>> >>>>>> It's with the master built today. Why can't I call
>>> >>> >>>>>> ds.foreachPartition(println)? Is using type annotation the
>>> >>> >>>>>> only way to go forward? I'd be so sad if that's the case.
>>> >>> >>>>>>
>>> >>> >>>>>> scala> ds.foreachPartition(println)
>>> >>> >>>>>> <console>:28: error: overloaded method value foreachPartition with alternatives:
>>> >>> >>>>>>   (func: org.apache.spark.api.java.function.ForeachPartitionFunction[Record])Unit <and>
>>> >>> >>>>>>   (f: Iterator[Record] => Unit)Unit
>>> >>> >>>>>>  cannot be applied to (Unit)
>>> >>> >>>>>>        ds.foreachPartition(println)
>>> >>> >>>>>>           ^
>>> >>> >>>>>>
>>> >>> >>>>>> scala> sc.version
>>> >>> >>>>>> res9: String = 2.0.0-SNAPSHOT
>>> >>> >>>>>>
>>> >>> >>>>>> Pozdrawiam,
>>> >>> >>>>>> Jacek Laskowski
>>> >>> >>>>>> ----
>>> >>> >>>>>> https://medium.com/@jaceklaskowski/
>>> >>> >>>>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> >>> >>>>>> Follow me at https://twitter.com/jaceklaskowski
>>> >>> >>>>>>
>>> >>> >>>>>>
>>> >>> >>>>>>
>>> >>> >>>>>> ---------------------------------------------------------------------
>>> >>> >>>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>> >>> >>>>>>

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org

