spark-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: While Loop
Date Sat, 24 Jan 2015 10:09:16 GMT
Please check the ulimit setting. 

Cheers 
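A quick way to inspect the open-file limits from the shell that launches the driver (a sketch; the values shown are system-dependent):

```shell
# Soft limit on open file descriptors for this shell (and its children).
ulimit -n

# Hard limit: the ceiling an unprivileged user may raise the soft limit to.
ulimit -Hn
```

If the soft limit is low, raising it in the launching shell before spark-submit (e.g. `ulimit -n 4096`, up to the hard limit) is a common fix for "Too many open files" errors in shuffle-heavy jobs.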


> On Jan 23, 2015, at 11:19 PM, Deep Pradhan <pradhandeep1991@gmail.com> wrote:
> 
> Ted, when I added --driver-memory 2g to my Spark submit command, I got an error which says "Too many files open"
> 
>> On Sat, Jan 24, 2015 at 10:59 AM, Deep Pradhan <pradhandeep1991@gmail.com> wrote:
>> Version of Spark: 1.0.0
>> spark.executor.memory 2g
>> Code Snippet:
>> /*Spark conf and Spark context, which I have not pasted here*/
>> 
>> val lines = sc.textFile(args(0)) // load the edge list file from HDFS
>> val edges = lines.map { s => // create (src_id, dst_id) key-value pairs
>>   val fields = s.split("\\s+")
>>   (fields(0).toLong, fields(1).toLong) // convert to Long to avoid a type mismatch, since fields are Strings
>> }.distinct().cache()
>> 
>> var distances = edges.map(pair => (pair, 1)).cache() // initialize each edge's distance to 1
>> 
>> var prevDistsSize = 0L
>> var distsSize = distances.count()
>> 
>> while (distsSize > prevDistsSize) {
>>   val newDists = distances.map { case ((a, b), dist) => (b, (a, dist)) }.join(edges)
>>     .map { case (b, ((a, dist), c)) => ((a, c), dist + 1) }
>> 
>>   distances = distances.union(newDists).reduceByKey((a, b) => math.min(a, b)).cache() // keep the minimum distance for each pair of vertices
>> 
>>   prevDistsSize = distsSize // remember the current count as the previous count
>>   distsSize = distances.count() // update the count of pairs
>> }
>> 
>> 
>> 
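The loop in the snippet above is a fixpoint computation: keep joining the current distances with the edges until no new vertex pair appears, keeping the minimum distance per pair. A minimal plain-Scala sketch of the same logic (no Spark; the toy edge list is made up) can help check the algorithm locally:

```scala
// Fixpoint over plain collections, mirroring the RDD loop above.
// Hypothetical toy edge list: a chain 1 -> 2 -> 3 -> 4.
val edges = Set((1L, 2L), (2L, 3L), (3L, 4L))

// Each edge starts at distance 1, as in distances = edges.map(pair => (pair, 1)).
var distances: Map[(Long, Long), Int] = edges.map(e => e -> 1).toMap
var prevSize = -1

while (distances.size > prevSize) {
  prevSize = distances.size

  // Join (a, b) -> dist with edges (b, c) to derive (a, c) -> dist + 1.
  val newDists = for {
    ((a, b), dist) <- distances.toSeq
    (b2, c)        <- edges if b2 == b
  } yield ((a, c), dist + 1)

  // Union with the existing pairs, keeping the minimum distance per pair
  // (the reduceByKey(math.min) step in the Spark version).
  distances = (distances.toSeq ++ newDists)
    .groupBy(_._1)
    .map { case (pair, ds) => pair -> ds.map(_._2).min }
}

println(distances((1L, 4L))) // → 3: the shortest 1 -> 4 path has three hops
```

The loop terminates once an iteration adds no new pair, exactly as the `distsSize > prevDistsSize` test does in the Spark version.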
>>> On Sat, Jan 24, 2015 at 10:51 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>> Can you provide more information?
>>> 
>>> Version of Spark
>>> Snippet of your code
>>> Heap size
>>> 
>>> Etc
>>> 
>>> 
>>> 
>>>> On Jan 23, 2015, at 9:11 PM, Deep Pradhan <pradhandeep1991@gmail.com> wrote:
>>>> 
>>>> When I read the thread, I understand that a while loop is the best available construct.
>>>> I am getting an OutOfMemoryError (Java heap space); that's why I was asking.
>>>> 
>>>> thank you
>>>> 
>>>>> On Sat, Jan 24, 2015 at 10:37 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>> Can you tell us the problem you're facing?
>>>>> 
>>>>> Please see this thread:
>>>>> http://search-hadoop.com/m/JW1q5SsB5m
>>>>> 
>>>>> Cheers
>>>>> 
>>>>>> On Fri, Jan 23, 2015 at 9:02 PM, Deep Pradhan <pradhandeep1991@gmail.com> wrote:
>>>>>> Hi,
>>>>>> Is there a better programming construct than while loop in Spark?
>>>>>> 
>>>>>> Thank You
> 
