spark-user mailing list archives

From Aaron Davidson <ilike...@gmail.com>
Subject Re: hadoopRDD stalls reading entire directory
Date Mon, 02 Jun 2014 20:19:03 GMT
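You may have to run "sudo jps": jps only lists JVMs owned by the current
user, and your master and worker are running as root, so with sudo it should
definitely list your processes.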

What does hivecluster2:8080 look like? My guess is it says there are 2
applications registered, and one has taken all the executors. There must be
two applications running, since only a running application's web UI would
keep ports 4040 and 4041 open.
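A quick way to test this theory from the driver host is to check whether anything is still listening on 4040 and 4041. Going by the "port already in use" exception on 4040 reported further down this thread, a second application's web UI moves on to the next port, so a listener on 4041 suggests a second live application. The Scala sketch below uses only JDK sockets; `UiPortProbe` and its methods are hypothetical names, and "the port cannot be bound" is only an approximation of "something is listening". Note also that the ps output quoted below shows the master started with `--webui-port 18080`, so the master's web UI may be at hivecluster2:18080 rather than 8080.

```scala
import java.net.{InetSocketAddress, ServerSocket}
import scala.util.Try

// Hypothetical helper: probes local TCP ports by trying to bind them.
object UiPortProbe {
  /** True if `port` can be bound on all local interfaces, i.e. nothing holds it. */
  def isFree(port: Int): Boolean =
    Try {
      val socket = new ServerSocket()
      try socket.bind(new InetSocketAddress(port))
      finally socket.close()
    }.isSuccess

  /** First free port in [start, start + maxRetries], modelling a "try the next port" fallback. */
  def firstFreePort(start: Int, maxRetries: Int): Option[Int] =
    (start to start + maxRetries).find(isFree)
}
```

If `UiPortProbe.isFree(4040)` and `UiPortProbe.isFree(4041)` both return `false` on hivecluster2, that is consistent with two applications still holding their UIs; `sudo jps` then gives the process IDs to stop.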


On Mon, Jun 2, 2014 at 11:32 AM, Russell Jurney <russell.jurney@gmail.com>
wrote:

> If it matters, I have servers running at
> http://hivecluster2:4040/stages/ and http://hivecluster2:4041/stages/
>
> When I run rdd.first, I see an item at
> http://hivecluster2:4041/stages/ but no tasks are running. Stage ID 1,
> first at <console>:46, Tasks: Succeeded/Total 0/16.
>
> On Mon, Jun 2, 2014 at 10:09 AM, Russell Jurney
> <russell.jurney@gmail.com> wrote:
> > Looks like just worker and master processes are running:
> >
> > [hivedata@hivecluster2 ~]$ jps
> >
> > 10425 Jps
> >
> > [hivedata@hivecluster2 ~]$ ps aux|grep spark
> >
> > hivedata 10424  0.0  0.0 103248   820 pts/3    S+   10:05   0:00 grep spark
> >
> > root     10918  0.5  1.4 4752880 230512 ?      Sl   May27  41:43 java -cp
> > :/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/conf:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/core/lib/*:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/repl/lib/*:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/examples/lib/*:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/bagel/lib/*:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/mllib/lib/*:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/streaming/lib/*:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/lib/*:/etc/hadoop/conf:/opt/cloudera/parcels/CDH/lib/hadoop/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-hdfs/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-yarn/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-mapreduce/*:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/lib/scala-library.jar:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/lib/scala-compiler.jar:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/lib/jline.jar
> > -Dspark.akka.logLifecycleEvents=true
> > -Djava.library.path=/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/lib:/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
> > -Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip hivecluster2
> > --port 7077 --webui-port 18080
> >
> > root     12715  0.0  0.0 148028   656 ?        S    May27   0:00 sudo
> > /opt/cloudera/parcels/SPARK/lib/spark/bin/spark-class
> > org.apache.spark.deploy.worker.Worker spark://hivecluster2:7077
> >
> > root     12716  0.3  1.1 4155884 191340 ?      Sl   May27  30:21 java -cp
> > :/opt/cloudera/parcels/SPARK/lib/spark/conf:/opt/cloudera/parcels/SPARK/lib/spark/core/lib/*:/opt/cloudera/parcels/SPARK/lib/spark/repl/lib/*:/opt/cloudera/parcels/SPARK/lib/spark/examples/lib/*:/opt/cloudera/parcels/SPARK/lib/spark/bagel/lib/*:/opt/cloudera/parcels/SPARK/lib/spark/mllib/lib/*:/opt/cloudera/parcels/SPARK/lib/spark/streaming/lib/*:/opt/cloudera/parcels/SPARK/lib/spark/lib/*:/etc/hadoop/conf:/opt/cloudera/parcels/CDH/lib/hadoop/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-hdfs/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-yarn/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-mapreduce/*:/opt/cloudera/parcels/SPARK/lib/spark/lib/scala-library.jar:/opt/cloudera/parcels/SPARK/lib/spark/lib/scala-compiler.jar:/opt/cloudera/parcels/SPARK/lib/spark/lib/jline.jar
> > -Dspark.akka.logLifecycleEvents=true
> > -Djava.library.path=/opt/cloudera/parcels/SPARK/lib/spark/lib:/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
> > -Xms512m -Xmx512m org.apache.spark.deploy.worker.Worker
> > spark://hivecluster2:7077
> >
> >
> >
> >
> > On Sun, Jun 1, 2014 at 7:41 PM, Aaron Davidson <ilikerps@gmail.com> wrote:
> >>
> >> Sounds like you have two shells running, and the first one is taking all
> >> your resources. Do a "jps" and kill the other guy, then try again.
> >>
> >> By the way, you can look at http://localhost:8080 (replace localhost with
> >> the server your Spark Master is running on) to see which applications are
> >> currently running, and what resource allocations they have.
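The point about one shell taking all the resources can be illustrated with a toy model; it is not Spark's scheduler code. It assumes, as in Spark's standalone mode, that applications are served in FIFO order and that an application without a core cap (`spark.cores.max` unset) is granted every core still free when its turn comes. The object `FifoCores` and all core counts are hypothetical.

```scala
// Toy model only (not Spark code): FIFO core allocation across applications.
// An app with maxCores = None behaves like one with spark.cores.max unset and
// takes every core that is still free when its turn comes.
object FifoCores {
  final case class App(name: String, maxCores: Option[Int])

  /** Cores granted to each app, in submission order, out of `totalCores`. */
  def allocate(totalCores: Int, apps: Seq[App]): Seq[(String, Int)] =
    apps.foldLeft((totalCores, Vector.empty[(String, Int)])) {
      case ((free, granted), app) =>
        val n = math.min(free, app.maxCores.getOrElse(free))
        (free - n, granted :+ (app.name -> n))
    }._2
}
```

With five workers of 4 cores each (20 cores, a made-up figure), `FifoCores.allocate(20, Seq(FifoCores.App("shell-1", None), FifoCores.App("shell-2", None)))` returns `Vector((shell-1,20), (shell-2,0))`: the second shell gets nothing, consistent with the "Initial job has not accepted any resources" warning quoted below. Capping the first shell with `Some(8)` leaves 12 cores for the second.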
> >>
> >>
> >> On Sun, Jun 1, 2014 at 6:47 PM, Russell Jurney <russell.jurney@gmail.com> wrote:
> >>>
> >>> Thanks again. Run results here:
> >>> https://gist.github.com/rjurney/dc0efae486ba7d55b7d5
> >>>
> >>> This time I get a "port already in use" exception on 4040, but it isn't
> >>> fatal. Then when I run rdd.first, I get this over and over:
> >>>
> >>> 14/06/01 18:35:40 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
> >>>
> >>> On Sun, Jun 1, 2014 at 3:09 PM, Aaron Davidson <ilikerps@gmail.com>
> >>> wrote:
> >>>>
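> >>>> You can avoid needing $SPARK_HOME by using the constructor that takes a
> >>>> SparkConf, à la:
> >>>>
> >>>> import org.apache.spark.{SparkConf, SparkContext}
> >>>> val conf = new SparkConf()
> >>>>   .setMaster("spark://hivecluster2:7077") // the standalone master in this thread
> >>>>   .setAppName("avro-reader")              // hypothetical application name
> >>>>   .setJars(Seq("avro.jar", ...))          // setJars takes a Seq[String] of jar paths
> >>>> val sc = new SparkContext(conf)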
> >>>>
> >>>>
> >>>> On Sun, Jun 1, 2014 at 2:32 PM, Russell Jurney
> >>>> <russell.jurney@gmail.com> wrote:
> >>>>>
> >>>>> Followup question: the docs to make a new SparkContext require that I
> >>>>> know where $SPARK_HOME is. However, I have no idea. Any idea where that
> >>>>> might be?
> >>>>>
> >>>>>
> >>>>> On Sun, Jun 1, 2014 at 10:28 AM, Aaron Davidson <ilikerps@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> Gotcha. The easiest way to get your dependencies to your Executors
> >>>>>> would probably be to construct your SparkContext with all necessary jars
> >>>>>> passed in (as the "jars" parameter), or inside a SparkConf with setJars().
> >>>>>> Avro is a "necessary jar", but it's possible your application also needs
> >>>>>> to distribute other ones to the cluster.
> >>>>>>
> >>>>>> An easy way to make sure all your dependencies get shipped to the
> >>>>>> cluster is to create an assembly jar of your application; then you just
> >>>>>> need to tell Spark about that one jar, which includes all your
> >>>>>> application's transitive dependencies. Maven (e.g. via the Shade plugin)
> >>>>>> and sbt (via the sbt-assembly plugin) both have pretty straightforward
> >>>>>> ways of producing assembly jars.
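Before passing an assembly jar to `setJars()`, it can help to confirm that Avro actually ended up inside it. The sketch below uses only `java.util.jar` from the JDK; `JarCheck` is a hypothetical helper name, and the class to look for depends on which Avro input format the job uses (`org.apache.avro.mapred.AvroInputFormat` from avro-mapred is one example).

```scala
import java.util.jar.JarFile

// Hypothetical helper: checks whether a jar (e.g. an assembly jar) bundles a given class.
object JarCheck {
  /** True if `jarPath` contains a .class entry for the fully qualified `className`. */
  def containsClass(jarPath: String, className: String): Boolean = {
    val entryName = className.replace('.', '/') + ".class"
    val jar = new JarFile(jarPath)
    try jar.getEntry(entryName) != null
    finally jar.close()
  }
}
```

For example, `JarCheck.containsClass("target/app-assembly.jar", "org.apache.avro.mapred.AvroInputFormat")`, where the jar path is a placeholder for wherever the Maven or sbt build writes its assembly.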
> >>>>>>
> >>>>>>
> >>>>>> On Sat, May 31, 2014 at 11:23 PM, Russell Jurney
> >>>>>> <russell.jurney@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Thanks for the fast reply.
> >>>>>>>
> >>>>>>> I am running CDH 4.4 with the Cloudera Parcel of Spark 0.9.0, in
> >>>>>>> standalone mode.
> >>>>>>>
> >>>>>>>
> >>>>>>> On Saturday, May 31, 2014, Aaron Davidson <ilikerps@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> The first issue was because your cluster was configured incorrectly.
> >>>>>>>> You could probably read one file because that read ran on the driver
> >>>>>>>> node, but when it tried to run a job on the cluster, it failed.
> >>>>>>>>
> >>>>>>>> As for the second issue, it seems that the jar containing Avro is not
> >>>>>>>> getting propagated to the Executors. What version of Spark are you
> >>>>>>>> running? What deployment mode (YARN, standalone, Mesos)?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Sat, May 31, 2014 at 9:37 PM, Russell Jurney
> >>>>>>>> <russell.jurney@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Now I get this:
> >>>>>>>>
> >>>>>>>> scala> rdd.first
> >>>>>>>>
> >>>>>>>> 14/05/31 21:36:28 INFO spark.SparkContext: Starting job: first at <console>:41
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Got job 4 (first at <console>:41) with 1 output partitions (allowLocal=true)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Final stage: Stage 4 (first at <console>:41)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Parents of final stage: List()
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Missing parents: List()
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Computing the requested partition locally
> >>>>>>>> 14/05/31 21:36:28 INFO rdd.HadoopRDD: Input split: hdfs://hivecluster2/securityx/web_proxy_mef/2014/05/29/22/part-m-00000.avro:0+3864
> >>>>>>>> 14/05/31 21:36:28 INFO spark.SparkContext: Job finished: first at <console>:41, took 0.037371256 s
> >>>>>>>> 14/05/31 21:36:28 INFO spark.SparkContext: Starting job: first at <console>:41
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Got job 5 (first at <console>:41) with 16 output partitions (allowLocal=true)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Final stage: Stage 5 (first at <console>:41)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Parents of final stage: List()
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Missing parents: List()
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Submitting Stage 5 (HadoopRDD[0] at hadoopRDD at <console>:37), which has no missing parents
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Submitting 16 missing tasks from Stage 5 (HadoopRDD[0] at hadoopRDD at <console>:37)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSchedulerImpl: Adding task set 5.0 with 16 tasks
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:0 as TID 92 on executor 2: hivecluster3 (NODE_LOCAL)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:0 as 1294 bytes in 1 ms
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:3 as TID 93 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:3 as 1294 bytes in 0 ms
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:1 as TID 94 on executor 4: hivecluster4 (NODE_LOCAL)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:1 as 1294 bytes in 1 ms
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:2 as TID 95 on executor 0: hivecluster6.labs.lan (NODE_LOCAL)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:2 as 1294 bytes in 0 ms
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:4 as TID 96 on executor 3: hivecluster1.labs.lan (NODE_LOCAL)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:4 as 1294 bytes in 0 ms
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:6 as TID 97 on executor 2: hivecluster3 (NODE_LOCAL)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:6 as 1294 bytes in 0 ms
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:5 as TID 98 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:5 as 1294 bytes in 0 ms
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:8 as TID 99 on executor 4: hivecluster4 (NODE_LOCAL)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:8 as 1294 bytes in 0 ms
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:7 as TID 100 on executor 0: hivecluster6.labs.lan (NODE_LOCAL)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:7 as 1294 bytes in 0 ms
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:10 as TID 101 on executor 3: hivecluster1.labs.lan (NODE_LOCAL)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:10 as 1294 bytes in 0 ms
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:14 as TID 102 on executor 2: hivecluster3 (NODE_LOCAL)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:14 as 1294 bytes in 0 ms
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:9 as TID 103 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:9 as 1294 bytes in 0 ms
> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:11 as TID 104 on executor 4: hivecluster4 (N
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
> >>>>>>> datasyndrome.com
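The truncated log above shows the tasks of stage 5 being launched on executors across five hosts, so executors were registered and the failure happens once tasks reach them, which fits the diagnosis that the Avro jar is not being propagated. Tallying the "Starting task" lines per host makes the spread explicit. The regex below is written against the log lines quoted in this thread, not against any Spark API, and `TaskLaunches` is a hypothetical name.

```scala
// Hypothetical log helper: counts "Starting task ... on executor N: host (" lines per host.
object TaskLaunches {
  // Groups: task id, TID, executor id, host. Unanchored so quote prefixes and timestamps are skipped.
  private val Starting =
    """Starting task (\S+) as TID (\d+) on executor (\d+): (\S+) \(""".r.unanchored

  /** Number of task launches per host among the given log lines. */
  def perHost(lines: Seq[String]): Map[String, Int] =
    lines
      .collect { case Starting(_, _, _, host) => host }
      .groupBy(identity)
      .map { case (host, hits) => host -> hits.size }
}
```

Fed the 13 "Starting task" lines quoted above, `perHost` would report 3 launches each for hivecluster3, hivecluster4 and hivecluster5.labs.lan, and 2 each for hivecluster1.labs.lan and hivecluster6.labs.lan.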
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>
> >>
> >
> >
> >
>
>
>
>
