spark-user mailing list archives

From Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr>
Subject Re: Have I done everything correctly when subscribing to Spark User List
Date Mon, 08 Aug 2016 18:23:12 GMT
The yellow warning message is probably even more confusing than not receiving an answer or opinion
on his post.

Best,
Ovidiu
> On 08 Aug 2016, at 20:10, Sean Owen <sowen@cloudera.com> wrote:
> 
> I also don't know what's going on with the "This post has NOT been
> accepted by the mailing list yet" message, because actually the
> messages always do post. In fact this has been sent to the list 4
> times:
> 
> https://www.mail-archive.com/search?l=user%40spark.apache.org&q=dueckm&submit.x=0&submit.y=0
> 
> On Mon, Aug 8, 2016 at 3:03 PM, Chris Mattmann <mattmann@apache.org> wrote:
>> 
>> On 8/8/16, 2:03 AM, "Matthias.Dueck@fiduciagad.de" <Matthias.Dueck@fiduciagad.de> wrote:
>> 
>>> Hello,
>>> 
>>> I am writing to you because I am not sure whether I did everything right when registering and subscribing to the Spark User list.
>>> 
>>> I posted the appended question to the Spark User list after subscribing and receiving the "WELCOME to user@spark.apache.org" mail from "user-help@spark.apache.org".
>>> But this post is still in the state "This post has NOT been accepted by the mailing list yet.".
>>> 
>>> Is this because I forgot to do something or did something wrong with my user account (dueckm)? Or is it because no member of the Spark User List has reacted to that post yet?
>>> 
>>> Thanks a lot for your help.
>>> 
>>> Matthias
>>> 
>>> Fiducia & GAD IT AG | www.fiduciagad.de
>>> Local Court (Amtsgericht) Frankfurt a. M., HRB 102381 | Registered office: Hahnstr. 48, 60528 Frankfurt a. M. | VAT ID No. DE 143582320
>>> Management Board: Klaus-Peter Bruns (Chairman), Claus-Dieter Toben (Deputy Chairman), Jens-Olaf Bartels, Martin Beyer, Jörg Dreinhöfer, Wolfgang Eckert, Carsten Pfläging, Jörg Staff
>>> Chairman of the Supervisory Board: Jürgen Brinkmann
>>> 
>>> ----- Forwarded by Matthias Dück/M/FAG/FIDUCIA/DE on 08.08.2016 10:57 -----
>>> 
>>> From: dueckm <matthias.dueck@fiduciagad.de>
>>> To: user@spark.apache.org
>>> Date: 04.08.2016 13:27
>>> Subject: Are join/groupBy operations with wide Java Beans using Dataset API much slower than using RDD API?
>>> 
>>> ________________________________________
>>> 
>>> 
>>> 
>>> Hello,
>>> 
>>> I built a prototype that uses join and groupBy operations via Spark RDD API.
>>> Recently I migrated it to the Dataset API. Now it runs much slower than with
>>> the original RDD implementation.
>>> Did I do something wrong here? Or is this a price I have to pay for the more
>>> convenient API?
>>> Is there a known solution to deal with this effect (e.g. configuration via
>>> "spark.sql.shuffle.partitions" - but how could I determine the correct
>>> value)?
>>> In my prototype I use Java Beans with a lot of attributes. Does this slow
>>> down Spark-operations with Datasets?
>>> 
>>> Here is a simple example that shows the difference:
>>> JoinGroupByTest.zip
>>> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n27473/JoinGroupByTest.zip>
>>> - I build 2 RDDs and join and group them. Afterwards I count and display the
>>> joined RDDs (method de.testrddds.JoinGroupByTest.joinAndGroupViaRDD()).
>>> - When I do the same actions with Datasets it takes approximately 40 times
>>> as long (method de.testrddds.JoinGroupByTest.joinAndGroupViaDatasets()).
>>> 
>>> Thank you very much for your help.
>>> Matthias
>>> 
>>> PS1: excuse me for sending this post more than once, but I am new to this
>>> mailing list and probably did something wrong when registering/subscribing,
>>> so my previous postings have not been accepted ...
>>> 
>>> PS2: See the appended screenshots taken from Spark UI (jobs 0/1 belong to
>>> RDD implementation, jobs 2/3 to Dataset):
>>> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n27473/jobs.png>
>>> 
>>> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n27473/Job_RDD_Details.png>
>>> 
>>> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n27473/Job_Dataset_Details.png>
>>> 
>>> 
>>> 
>>> 
>>> --
>>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Are-join-groupBy-operations-with-wide-Java-Beans-using-Dataset-API-much-slower-than-using-RDD-API-tp27473.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>> 
>> 
>> 
>> 
> 
> 



