spark-dev mailing list archives

From StanZhai <m...@zhaishidan.cn>
Subject Re: The driver hangs at DataFrame.rdd in Spark 2.1.0
Date Fri, 24 Feb 2017 03:41:15 GMT
Thanks for Cheng's help.


Something must be wrong with InferFiltersFromConstraints. I removed InferFiltersFromConstraints
from org/apache/spark/sql/catalyst/optimizer/Optimizer.scala to work around this issue. I will
analyze the issue further using the method you provided.




------------------ Original ------------------
From:  "Cheng Lian [via Apache Spark Developers List]";<ml-node+s1001551n21069h99@n3.nabble.com>;
Send time: Friday, Feb 24, 2017 2:28 AM
To: "Stan Zhai"<mail@zhaishidan.cn>; 

Subject:  Re: The driver hangs at DataFrame.rdd in Spark 2.1.0



 	                   
SPARK-17733 seems to be relevant, but it's already fixed in 2.1.0.

One way to debug this is to turn on trace logging and check how the analyzer/optimizer behaves.
     
     
     On 2/22/17 11:11 PM, StanZhai wrote:
     
            Could this be related to https://issues.apache.org/jira/browse/SPARK-17733 ?
                
         
         
         
         ------------------ Original ------------------
         From: "Cheng Lian-3 [via Apache Spark Developers List]" <[hidden email]>
         Send time: Thursday, Feb 23, 2017 9:43 AM
         To: "Stan Zhai" <[hidden email]>
         Subject: Re: The driver hangs at DataFrame.rdd in Spark 2.1.0
         
         
         
         
Just from the thread dump you provided, it seems that this particular query plan
jams our optimizer. However, it's also possible that the driver just happened to
be running optimizer rules at that particular point in time.

Since query planning doesn't touch any actual data, could you please try to minimize
this query by replacing the actual relations with temporary views derived from Scala
local collections? That way, it would be much easier for others to reproduce the issue.
         
Cheng
         
         
         On 2/22/17 5:16 PM, Stan Zhai wrote:

           Thanks for Lian's reply.

           Here is the query plan generated by Spark 1.6.2 (I can't get it in Spark 2.1.0):
           ...
                        
                        
             
             ------------------ Original ------------------
             Subject: Re: The driver hangs at DataFrame.rdd in Spark 2.1.0
             
             
             
             
What is the query plan? We once observed query plans that grow exponentially
in iterative ML workloads, causing the query planner to hang forever. For example,
each iteration combines 4 plan trees of the last iteration and forms a larger
plan tree. The size of the plan tree can easily reach billions of nodes after
15 iterations.
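
To make the growth above concrete, here is a minimal Python sketch of the arithmetic. It assumes a simplified model that the thread does not spell out: iteration 0 is a single leaf relation, and each iteration places 4 copies of the previous tree under one new combining node. The function name `plan_tree_size` and the parameter `fan_in` are hypothetical and are not part of any Spark API.

```python
def plan_tree_size(iterations: int, fan_in: int = 4) -> int:
    """Node count of a plan tree under the assumed model: every iteration
    combines `fan_in` copies of the previous tree under one new root."""
    size = 1  # iteration 0: a single leaf relation (assumption)
    for _ in range(iterations):
        # fan_in subtrees from the last iteration plus one combining node
        size = fan_in * size + 1
    return size


if __name__ == "__main__":
    for i in (5, 10, 15):
        print(i, plan_tree_size(i))
```

Under this model the tree has 1,365 nodes after 5 iterations, 1,398,101 after 10, and 1,431,655,765 after 15, which is consistent with the "billions of nodes" order of magnitude mentioned above. A real plan may add more than one node per combination, so this is best read as a lower-bound illustration.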
             
             
             On 2/22/17 9:29 AM, Stan Zhai wrote:

               Hi all,

               The driver hangs at DataFrame.rdd in Spark 2.1.0 when the DataFrame (SQL)
               is complex. The following is the thread dump of my driver:
               ...
--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Re-The-driver-hangs-at-DataFrame-rdd-in-Spark-2-1-0-tp21052p21073.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.