spark-dev mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: Why's ds.foreachPartition(println) not possible?
Date Tue, 05 Jul 2016 14:21:02 GMT
Right, should have noticed that in your second mail. But foreach
already does what you want, right? It would be identical here.

However, these two methods do conceptually different things on different
arguments. I don't think I'd expect them to accept the same functions.
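
For illustration, here is a minimal sketch of the type-annotation route asked about in the original question quoted below. It does not use Spark: MiniDataset and this ForeachPartitionFunction are hypothetical stand-ins that only mimic the two alternatives listed in the compiler error, and printAll and sumViaPartitions are made-up names. The idea, under those assumptions, is that a function value with an explicit Iterator[Long] => Unit type can only match the Scala overload.

```scala
// Hypothetical stand-ins that mimic the two overloads named in the error
// message. These are NOT Spark's real classes.
trait ForeachPartitionFunction[T] {
  def call(it: java.util.Iterator[T]): Unit
}

class MiniDataset[T](partitions: Seq[Seq[T]]) {
  // Scala-style overload: each partition arrives as a Scala Iterator.
  def foreachPartition(f: Iterator[T] => Unit): Unit =
    partitions.foreach(p => f(p.iterator))

  // Java-style overload: each partition arrives as a java.util.Iterator.
  def foreachPartition(func: ForeachPartitionFunction[T]): Unit =
    partitions.foreach { p =>
      val it = p.iterator
      func.call(new java.util.Iterator[T] {
        def hasNext(): Boolean = it.hasNext
        def next(): T = it.next()
      })
    }
}

object ForeachPartitionSketch {
  // Sums all elements by visiting one partition at a time.
  def sumViaPartitions(ds: MiniDataset[Long]): Long = {
    var total = 0L
    // The explicit function type leaves only the Scala overload applicable.
    val addAll: Iterator[Long] => Unit = it => it.foreach(x => total += x)
    ds.foreachPartition(addAll)
    total
  }

  def main(args: Array[String]): Unit = {
    val ds = new MiniDataset[Long](Seq(Seq(0L, 1L, 2L), Seq(3L, 4L)))

    // Per-partition function: it receives an Iterator, not a single element.
    val printAll: Iterator[Long] => Unit = it => it.foreach(println)
    ds.foreachPartition(printAll)

    println(sumViaPartitions(ds))
  }
}
```

The function passed in still has to consume an iterator, which is the conceptual difference from foreach: a one-element function such as println has to be applied inside the iterator's own foreach.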

On Tue, Jul 5, 2016 at 3:18 PM, Jacek Laskowski <jacek@japila.pl> wrote:
> ds is Dataset and the problem is that println (or any other
> one-element function) would not work here (and perhaps other methods
> with two variants - Java's and Scala's).
>
> Regards,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Tue, Jul 5, 2016 at 3:53 PM, Sean Owen <sowen@cloudera.com> wrote:
>> A DStream is a sequence of RDDs, not of elements. I don't think I'd
>> expect to express an operation on a DStream as if it were elements.
>>
>> On Tue, Jul 5, 2016 at 2:47 PM, Jacek Laskowski <jacek@japila.pl> wrote:
>>> Sort of. Your example works, but could you do a mere
>>> ds.foreachPartition(println)? Why not? Why should I even see the Java
>>> version?
>>>
>>> scala> val ds = spark.range(10)
>>> ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]
>>>
>>> scala> ds.foreachPartition(println)
>>> <console>:26: error: overloaded method value foreachPartition with alternatives:
>>>   (func: org.apache.spark.api.java.function.ForeachPartitionFunction[Long])Unit
>>> <and>
>>>   (f: Iterator[Long] => Unit)Unit
>>>  cannot be applied to (Unit)
>>>        ds.foreachPartition(println)
>>>           ^
>>>
>>> Regards,
>>> Jacek Laskowski
>>> ----
>>> https://medium.com/@jaceklaskowski/
>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> Follow me at https://twitter.com/jaceklaskowski
>>>
>>>
>>> On Tue, Jul 5, 2016 at 3:32 PM, Sean Owen <sowen@cloudera.com> wrote:
>>>> Do you not mean ds.foreachPartition(_.foreach(println)) or similar?
>>>>
>>>> On Tue, Jul 5, 2016 at 2:22 PM, Jacek Laskowski <jacek@japila.pl> wrote:
>>>>> Hi,
>>>>>
>>>>> It's with the master built today. Why can't I call
>>>>> ds.foreachPartition(println)? Is using type annotation the only way to
>>>>> go forward? I'd be so sad if that's the case.
>>>>>
>>>>> scala> ds.foreachPartition(println)
>>>>> <console>:28: error: overloaded method value foreachPartition with alternatives:
>>>>>   (func: org.apache.spark.api.java.function.ForeachPartitionFunction[Record])Unit
>>>>> <and>
>>>>>   (f: Iterator[Record] => Unit)Unit
>>>>>  cannot be applied to (Unit)
>>>>>        ds.foreachPartition(println)
>>>>>           ^
>>>>>
>>>>> scala> sc.version
>>>>> res9: String = 2.0.0-SNAPSHOT
>>>>>
>>>>> Regards,
>>>>> Jacek Laskowski
>>>>> ----
>>>>> https://medium.com/@jaceklaskowski/
>>>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>>>> Follow me at https://twitter.com/jaceklaskowski
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>>>