spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-1967) Using parallelize method to create RDD, wordcount app just hanging there without errors or warnings
Date Fri, 23 Jan 2015 13:59:34 GMT

    [ https://issues.apache.org/jira/browse/SPARK-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289266#comment-14289266
] 

Sean Owen commented on SPARK-1967:
----------------------------------

I can't reproduce this. Is it still a problem? Can you provide the program and output?

> Using parallelize method to create RDD, wordcount app just hanging there without errors or warnings
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-1967
>                 URL: https://issues.apache.org/jira/browse/SPARK-1967
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 0.9.1
>         Environment: Ubuntu-12.04, single machine spark standalone, 8 core, 8G mem, spark 0.9.1, java-1.7
>            Reporter: Min Li
>
> I was trying the parallelize method to create an RDD, in Java. It's a simple wordcount program, except that I first read the input into memory and then use the parallelize method to create the RDD, rather than the default textFile method used in the given example.
> Pseudo code:
> JavaSparkContext ctx = new JavaSparkContext($SparkMasterURL, $NAME, $SparkHome, $jars);
> List<String> input = // read lines from the input file into an ArrayList<String>
> JavaRDD<String> lines = ctx.parallelize(input);
> // followed by wordcount
> ---- the above is not working.
> JavaRDD<String> lines = ctx.textFile(file);
> // followed by wordcount
> ---- this is working.
> The log is:
> 14/05/29 16:18:43 INFO Slf4jLogger: Slf4jLogger started
> 14/05/29 16:18:43 INFO Remoting: Starting remoting
> 14/05/29 16:18:43 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@spark:55224]
> 14/05/29 16:18:43 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@spark:55224]
> 14/05/29 16:18:43 INFO SparkEnv: Registering BlockManagerMaster
> 14/05/29 16:18:43 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140529161843-836a
> 14/05/29 16:18:43 INFO MemoryStore: MemoryStore started with capacity 1056.0 MB.
> 14/05/29 16:18:43 INFO ConnectionManager: Bound socket to port 42942 with id = ConnectionManagerId(spark,42942)
> 14/05/29 16:18:43 INFO BlockManagerMaster: Trying to register BlockManager
> 14/05/29 16:18:43 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager spark:42942 with 1056.0 MB RAM
> 14/05/29 16:18:43 INFO BlockManagerMaster: Registered BlockManager
> 14/05/29 16:18:43 INFO HttpServer: Starting HTTP Server
> 14/05/29 16:18:43 INFO HttpBroadcast: Broadcast server started at http://10.227.119.185:43522
> 14/05/29 16:18:43 INFO SparkEnv: Registering MapOutputTracker
> 14/05/29 16:18:43 INFO HttpFileServer: HTTP File server directory is /tmp/spark-3704a621-789c-4d97-b1fc-9654236dba3e
> 14/05/29 16:18:43 INFO HttpServer: Starting HTTP Server
> 14/05/29 16:18:43 INFO SparkUI: Started Spark Web UI at http://spark:4040
> 14/05/29 16:18:44 INFO SparkContext: Added JAR /home/maxmin/tmp/spark-test-1.0-SNAPSHOT-jar-with-dependencies.jar at http://10.227.119.185:55286/jars/spark-test-1.0-SNAPSHOT-jar-with-dependencies.jar with timestamp 1401394724045
> 14/05/29 16:18:44 INFO AppClient$ClientActor: Connecting to master spark://spark:7077...
> 14/05/29 16:18:44 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20140529161844-0001
> 14/05/29 16:18:44 INFO AppClient$ClientActor: Executor added: app-20140529161844-0001/0 on worker-20140529155406-spark-59658 (spark:59658) with 8 cores
> The app hangs here forever, and spark:8080 / spark:4040 show nothing strange. The Spark Stages page shows the active stage is reduceByKey, with tasks Succeeded/Total at 0/2. I've also tried calling lines.count directly after parallelize, and the app gets stuck at the count stage.
> I've also tried a hard-coded static string list with parallelize to create the RDD. This time the app still hangs, but the stages page shows nothing active, and the log is similar.
> I used spark-0.9.1 with the default spark-env.sh. The slaves file has only one host. I used Maven to compile a fat jar with Spark marked as provided, and I modified the run-example script to submit the jar.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
