spark-dev mailing list archives

From Jacek Laskowski <ja...@japila.pl>
Subject Re: Why's ds.foreachPartition(println) not possible?
Date Tue, 05 Jul 2016 14:47:12 GMT
Well, there is a foreach for Java and another foreach for Scala. That
much I understand. But in supporting two language-specific APIs
-- Scala and Java -- the Dataset API lost support for such simple calls
without type annotations, so you have to be explicit about the variant
(since I'm using Scala, I want to use the Scala API, right?). It appears
that any single-argument-function operators on Datasets are affected :(

My question was whether there is work underway to fix it (if a fix is
even possible -- I don't know if it is).
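For now, the only workaround I see is to annotate the parameter type so
the compiler can pick the Scala overload. A minimal sketch of the
ambiguity, reproduced outside Spark (FakeDataset and the trait below are
made-up stand-ins for the real API, just to illustrate the overload
resolution problem):

```scala
// Hypothetical stand-ins for the two foreachPartition overloads a
// Dataset exposes in 2.0 -- names here are made up for illustration.
trait ForeachPartitionFunction[T] { def call(it: java.util.Iterator[T]): Unit }

class FakeDataset[T](data: Seq[T]) {
  // Scala variant: takes a Scala function over an Iterator
  def foreachPartition(f: Iterator[T] => Unit): Unit = f(data.iterator)
  // Java variant: takes a Java-style functional interface
  def foreachPartition(func: ForeachPartitionFunction[T]): Unit = {
    import scala.collection.JavaConverters._
    func.call(data.iterator.asJava)
  }
}

val ds = new FakeDataset(Seq(1L, 2L, 3L))

// ds.foreachPartition(println)  // fails to compile: overloaded method, like in spark-shell

// Annotating the parameter type disambiguates in favour of the Scala overload:
ds.foreachPartition((it: Iterator[Long]) => it.foreach(println))
```

So the call itself is expressible, just not as tersely as before the
Java overloads were added.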

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Tue, Jul 5, 2016 at 4:21 PM, Sean Owen <sowen@cloudera.com> wrote:
> Right, I should have noticed that in your second mail. But foreach
> already does what you want, right? It would be identical here.
>
> But these two methods do conceptually different things on different
> arguments. I don't think I'd expect them to accept the same functions.
>
> On Tue, Jul 5, 2016 at 3:18 PM, Jacek Laskowski <jacek@japila.pl> wrote:
>> ds is Dataset and the problem is that println (or any other
>> one-element function) would not work here (and perhaps other methods
>> with two variants - Java's and Scala's).
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> ----
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Tue, Jul 5, 2016 at 3:53 PM, Sean Owen <sowen@cloudera.com> wrote:
>>> A DStream is a sequence of RDDs, not of elements. I don't think I'd
>>> expect to express an operation on a DStream as if it were elements.
>>>
>>> On Tue, Jul 5, 2016 at 2:47 PM, Jacek Laskowski <jacek@japila.pl> wrote:
>>>> Sort of. Your example works, but could you do a mere
>>>> ds.foreachPartition(println)? Why not? Why should I even see the Java
>>>> version?
>>>>
>>>> scala> val ds = spark.range(10)
>>>> ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]
>>>>
>>>> scala> ds.foreachPartition(println)
>>>> <console>:26: error: overloaded method value foreachPartition with alternatives:
>>>>   (func: org.apache.spark.api.java.function.ForeachPartitionFunction[Long])Unit
>>>> <and>
>>>>   (f: Iterator[Long] => Unit)Unit
>>>>  cannot be applied to (Unit)
>>>>        ds.foreachPartition(println)
>>>>           ^
>>>>
>>>> Pozdrawiam,
>>>> Jacek Laskowski
>>>> ----
>>>> https://medium.com/@jaceklaskowski/
>>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>>> Follow me at https://twitter.com/jaceklaskowski
>>>>
>>>>
>>>> On Tue, Jul 5, 2016 at 3:32 PM, Sean Owen <sowen@cloudera.com> wrote:
>>>>> Do you not mean ds.foreachPartition(_.foreach(println)) or similar?
>>>>>
>>>>> On Tue, Jul 5, 2016 at 2:22 PM, Jacek Laskowski <jacek@japila.pl> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> It's with the master built today. Why can't I call
>>>>>> ds.foreachPartition(println)? Is using type annotation the only way to
>>>>>> go forward? I'd be so sad if that's the case.
>>>>>>
>>>>>> scala> ds.foreachPartition(println)
>>>>>> <console>:28: error: overloaded method value foreachPartition with alternatives:
>>>>>>   (func: org.apache.spark.api.java.function.ForeachPartitionFunction[Record])Unit
>>>>>> <and>
>>>>>>   (f: Iterator[Record] => Unit)Unit
>>>>>>  cannot be applied to (Unit)
>>>>>>        ds.foreachPartition(println)
>>>>>>           ^
>>>>>>
>>>>>> scala> sc.version
>>>>>> res9: String = 2.0.0-SNAPSHOT
>>>>>>
>>>>>> Pozdrawiam,
>>>>>> Jacek Laskowski
>>>>>> ----
>>>>>> https://medium.com/@jaceklaskowski/
>>>>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>>>>> Follow me at https://twitter.com/jaceklaskowski
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>>>>
