spark-dev mailing list archives

From StanZhai <m...@zhaishidan.cn>
Subject Re: The driver hangs at DataFrame.rdd in Spark 2.1.0
Date Thu, 23 Feb 2017 07:11:04 GMT
Could this be related to https://issues.apache.org/jira/browse/SPARK-17733 ?




------------------ Original ------------------
From:  "Cheng Lian-3 [via Apache Spark Developers List]";<ml-node+s1001551n21053h3@n3.nabble.com>;
Send time: Thursday, Feb 23, 2017 9:43 AM
To: "Stan Zhai"<mail@zhaishidan.cn>; 

Subject:  Re: The driver hangs at DataFrame.rdd in Spark 2.1.0



 	                   
Just from the thread dump you provided, it seems that this particular query plan jams
our optimizer. However, it's also possible that the driver just happened to be running
optimizer rules at that particular point in time.
     
     
Since query planning doesn't touch any actual data, could you please try to minimize
this query by replacing the actual relations with temporary views derived from Scala
local collections? That way, it would be much easier for others to reproduce
the issue.
     
Cheng
     
     
     On 2/22/17 5:16 PM, Stan Zhai wrote:
     
            Thanks for Lian's reply.
       
       
       Here is the query plan generated by Spark 1.6.2 (I can't get it in Spark 2.1.0):
                ...       
        
                
         
         ------------------ Original ------------------
         Subject: Re: The driver hangs at DataFrame.rdd in Spark 2.1.0
         
         
         
         
What is the query plan? We once observed query plans that grew exponentially
in iterative ML workloads, causing the query planner to hang forever. For example, each
iteration combines 4 plan trees from the previous iteration into a larger plan tree.
The size of the plan tree can easily reach billions of nodes after 15
iterations.
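
To put a rough number on the growth described above, here is a minimal sketch in plain Python (no Spark involved). It is not taken from the thread: the function name `plan_size` is hypothetical, and it assumes the initial plan is a single node and that each iteration adds one new root node on top of 4 copies of the previous plan.

```python
def plan_size(iterations, fan_in=4):
    """Node count of a plan tree after `iterations` rounds.

    Assumptions (not from the thread): the initial plan is a single node,
    and each round adds one new root over `fan_in` copies of the previous plan.
    """
    size = 1
    for _ in range(iterations):
        # fan_in copies of the previous tree plus one node that combines them
        size = fan_in * size + 1
    return size


if __name__ == "__main__":
    for n in (5, 10, 15):
        print(n, plan_size(n))
```

Under these assumptions, 15 iterations give roughly 1.4 billion nodes, which is consistent with the "billions of nodes" figure above.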
         
         
         On 2/22/17 9:29 AM, Stan Zhai wrote:
         
                    Hi all,
           
           
           The driver hangs at DataFrame.rdd in Spark 2.1.0 when the DataFrame (SQL)
is complex. The following is a thread dump of my driver:
           ...
                  
       
          
    	 	 	 	
 	
 	


