spark-user mailing list archives

From Utkarsh Sengar <utkarsh2...@gmail.com>
Subject Re: spark.mesos.coarse impacts memory performance on mesos
Date Thu, 01 Oct 2015 23:21:00 GMT
Not sure what you mean by that; I shared the data I see in the Spark UI.
Can you point me to where I can get precisely the data you need?

When I run the job in fine-grained mode, I see tons of tasks created and
destroyed under a Mesos "framework". I have about 80k Spark tasks, which I
think translate directly to independent Mesos tasks.
https://dl.dropboxusercontent.com/u/2432670/Screen%20Shot%202015-10-01%20at%204.14.34%20PM.png

When I run the job in coarse-grained mode, I just see 1-4 tasks with 1-4
executors (it varies with what Mesos allocates). These Mesos tasks try to
complete the 80k Spark tasks and eventually run out of memory (see the stack
trace in the gist shared above).


On Thu, Oct 1, 2015 at 4:07 PM, Tim Chen <tim@mesosphere.io> wrote:

> Hi Utkarsh,
>
> I replied earlier asking: what does your task assignment look like in fine
> vs. coarse grain mode?
>
> Tim
>
> On Thu, Oct 1, 2015 at 4:05 PM, Utkarsh Sengar <utkarsh2012@gmail.com>
> wrote:
>
>> Bumping it up; it's not really a blocking issue.
>> But fine grain mode eats up an uncertain amount of resources in Mesos and
>> launches tons of tasks, so I would prefer the coarse-grained mode if
>> only it didn't run out of memory.
>>
>> Thanks,
>> -Utkarsh
>>
>> On Mon, Sep 28, 2015 at 2:24 PM, Utkarsh Sengar <utkarsh2012@gmail.com>
>> wrote:
>>
>>> Hi Tim,
>>>
>>> 1. spark.mesos.coarse:false (fine grain mode)
>>> This is the data dump for config and executors assigned:
>>> https://gist.github.com/utkarsh2012/6401d5526feccab14687
>>>
>>> 2. spark.mesos.coarse:true (coarse grain mode)
>>> Dump for coarse mode:
>>> https://gist.github.com/utkarsh2012/918cf6f8ed5945627188
>>>
>>> As you can see, exactly the same code works fine in fine-grained mode but
>>> goes out of memory in coarse-grained mode. First an executor was lost, and
>>> then the driver went out of memory.
>>> So I am trying to understand what is different in fine-grained vs. coarse
>>> mode, other than the allocation of multiple Mesos tasks vs. one Mesos task.
>>> Clearly Spark is not managing memory the same way.
>>>
>>> Thanks,
>>> -Utkarsh
>>>
>>>
>>> On Fri, Sep 25, 2015 at 9:17 AM, Tim Chen <tim@mesosphere.io> wrote:
>>>
>>>> Hi Utkarsh,
>>>>
>>>> What is your job placement like when you run fine grain mode? You said
>>>> coarse grain mode only ran on one node, right?
>>>>
>>>> And when the job is running could you open the Spark webui and get
>>>> stats about the heap size and other java settings?
>>>>
>>>> Tim
>>>>
>>>> On Thu, Sep 24, 2015 at 10:56 PM, Utkarsh Sengar <utkarsh2012@gmail.com>
>>>> wrote:
>>>>
>>>>> Bumping this one up: any suggestions on the stack trace?
>>>>> spark.mesos.coarse=true is not working, and the driver crashed with that
>>>>> error.
>>>>>
>>>>> On Wed, Sep 23, 2015 at 3:29 PM, Utkarsh Sengar <utkarsh2012@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I missed doing a reply-all earlier.
>>>>>>
>>>>>> Tim,
>>>>>>
>>>>>> spark.mesos.coarse = true doesn't work and spark.mesos.coarse = false
>>>>>> works (sorry there was a typo in my last email, I meant "when I do
>>>>>> "spark.mesos.coarse=false", the job works like a charm. ").
>>>>>>
>>>>>> I get this exception with spark.mesos.coarse = true:
>>>>>>
>>>>>> 15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af4bf26750ad38a444d7cf"}, max= { "_id" : "55af5a61e8a42806f47546c1"}
>>>>>> 15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af5a61e8a42806f47546c1"}, max= null
>>>>>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>>>>> 	at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
>>>>>> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>>>> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>>>> 	at scala.Option.getOrElse(Option.scala:120)
>>>>>> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>>>> 	at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
>>>>>> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>>>> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>>>> 	at scala.Option.getOrElse(Option.scala:120)
>>>>>> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>>>> 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>>>>>> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>>>> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>>>> 	at scala.Option.getOrElse(Option.scala:120)
>>>>>> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>>>> 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>>>>>> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>>>> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>>>> 	at scala.Option.getOrElse(Option.scala:120)
>>>>>> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>>>> 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>>>>>> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>>>> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>>>> 	at scala.Option.getOrElse(Option.scala:120)
>>>>>> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>>>> 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>>>>>> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>>>> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>>>> 	at scala.Option.getOrElse(Option.scala:120)
>>>>>> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>>>> 	at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:82)
>>>>>> 	at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:78)
>>>>>> 15/09/22 20:18:17 INFO SparkContext: Invoking stop() from shutdown hook
>>>>>> 15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_2_piece0 on some-ip-here:37706 in memory (size: 1964.0 B, free: 2.8 GB)
>>>>>> 15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_2_piece0 on mesos-slave10 in memory (size: 1964.0 B, free: 5.2 GB)
>>>>>> 15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on some-ip-here:37706 in memory (size: 17.2 KB, free: 2.8 GB)
>>>>>> 15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on mesos-slave105 in memory (size: 17.2 KB, free: 5.2 GB)
>>>>>> 15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on mesos-slave1 in memory (size: 17.2 KB, free: 5.2 GB)
>>>>>> 15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on mesos-slave9 in memory (size: 17.2 KB, free: 5.2 GB)
>>>>>> 15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on mesos-slave3 in memory (size: 17.2 KB, free: 5.2 GB)
>>>>>> 15/09/22 20:18:17 INFO SparkUI: Stopped Spark web UI at http://some-ip-here:4040
>>>>>> 15/09/22 20:18:17 INFO DAGScheduler: Stopping DAGScheduler
>>>>>> 15/09/22 20:18:17 INFO CoarseMesosSchedulerBackend: Shutting down all executors
>>>>>> 15/09/22 20:18:17 INFO CoarseMesosSchedulerBackend: Asking each executor to shut down
>>>>>> I0922 20:18:17.794598 171 sched.cpp:1591] Asked to stop the driver
>>>>>> I0922 20:18:17.794739 143 sched.cpp:835] Stopping framework '20150803-224832-1577534986-5050-1614-0016'
>>>>>> 15/09/22 20:18:17 INFO CoarseMesosSchedulerBackend: driver.run() returned with code DRIVER_STOPPED
>>>>>> 15/09/22 20:18:17 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
>>>>>> 15/09/22 20:18:17 INFO Utils: path = /tmp/spark-98801318-9c49-473b-bf2f-07ea42187252/blockmgr-0e0e1a1c-894e-4e79-beac-ead0dff43166, already present as root for deletion.
>>>>>> 15/09/22 20:18:17 INFO MemoryStore: MemoryStore cleared
>>>>>> 15/09/22 20:18:17 INFO BlockManager: BlockManager stopped
>>>>>> 15/09/22 20:18:17 INFO BlockManagerMaster: BlockManagerMaster stopped
>>>>>> 15/09/22 20:18:17 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
>>>>>> 15/09/22 20:18:17 INFO SparkContext: Successfully stopped SparkContext
>>>>>> 15/09/22 20:18:17 INFO Utils: Shutdown hook called
>>>>>> 15/09/22 20:18:17 INFO Utils: Deleting directory /tmp/spark-98801318-9c49-473b-bf2f-07ea42187252
>>>>>> 15/09/22 20:18:17 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
>>>>>> 15/09/22 20:18:17 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 22, 2015 at 1:26 AM, Tim Chen <tim@mesosphere.io> wrote:
>>>>>>
>>>>>>> Hi Utkarsh,
>>>>>>>
>>>>>>> Just to be sure: you originally set coarse to false but then changed
>>>>>>> it to true? Or is it the other way around?
>>>>>>>
>>>>>>> Also what's the exception/stack trace when the driver crashed?
>>>>>>>
>>>>>>> Coarse grain mode pre-starts all the Spark executor backends, so it
>>>>>>> has the least overhead compared to fine grain. There is no single
>>>>>>> answer for which mode you should use (otherwise we would have removed
>>>>>>> one of the modes); it depends on your use case.
>>>>>>>
>>>>>>> There are quite a few factors that can cause huge GC pauses, but I
>>>>>>> don't think your GC pauses would go away if you switched to standalone.
>>>>>>>
>>>>>>> Tim
>>>>>>>
>>>>>>> On Mon, Sep 21, 2015 at 5:18 PM, Utkarsh Sengar <utkarsh2012@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I am running Spark 1.4.1 on mesos.
>>>>>>>>
>>>>>>>> The Spark job does a "cartesian" of 4 RDDs (aRdd, bRdd, cRdd, dRdd)
>>>>>>>> of size 100, 100, 7 and 1 respectively. Let's call it productRDD.
>>>>>>>>
>>>>>>>> Creating "aRdd" needs data pulled from multiple data sources and
>>>>>>>> merged into a tuple JavaRDD; finally aRdd looks something like this:
>>>>>>>> JavaRDD<Tuple4<A1, A2>>.
>>>>>>>> bRdd, cRdd and dRdd are just List<> of values.
>>>>>>>>
>>>>>>>> Then I apply a transformation on productRDD and finally call
>>>>>>>> "saveAsTextFile" to save the result of the transformation.
>>>>>>>>
>>>>>>>> Problem:
>>>>>>>> By setting "spark.mesos.coarse=true", creation of "aRdd" works fine
>>>>>>>> but the driver crashes while doing the cartesian, but when I do
>>>>>>>> "spark.mesos.coarse=true", the job works like a charm. I am running
>>>>>>>> Spark on Mesos.
>>>>>>>>
>>>>>>>> Comments:
>>>>>>>> So I wanted to understand what role "spark.mesos.coarse=true" plays
>>>>>>>> in terms of memory and compute performance. My findings look
>>>>>>>> counter-intuitive, since:
>>>>>>>>
>>>>>>>>    1. "spark.mesos.coarse=true" runs on just 1 Mesos task, so it
>>>>>>>>    should avoid the overhead of spinning up many Mesos tasks, which
>>>>>>>>    should help performance.
>>>>>>>>    2. What config for "spark.mesos.coarse" is recommended for running
>>>>>>>>    Spark on Mesos? Or is there no best answer and it depends on the
>>>>>>>>    use case?
>>>>>>>>    3. Also, by setting "spark.mesos.coarse=true", I notice that I get
>>>>>>>>    huge GC pauses even with a small dataset but a long-running job
>>>>>>>>    (but this can be a separate discussion).
>>>>>>>>
>>>>>>>> Let me know if I am missing something obvious; we are learning
>>>>>>>> Spark tuning as we move forward :)
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks,
>>>>>>>> -Utkarsh
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Thanks,
>>>>>> -Utkarsh
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> -Utkarsh
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Thanks,
>>> -Utkarsh
>>>
>>
>>
>>
>> --
>> Thanks,
>> -Utkarsh
>>
>
>


-- 
Thanks,
-Utkarsh
