spark-user mailing list archives

From Jianshi Huang <jianshi.hu...@gmail.com>
Subject Re: spark.shuffle.consolidateFiles seems not working
Date Fri, 01 Aug 2014 16:34:03 GMT
I see. I'll try spark 1.1.


On Fri, Aug 1, 2014 at 9:58 AM, Aaron Davidson <ilikerps@gmail.com> wrote:

> Make sure to set it before you start your SparkContext -- it cannot be
> changed afterwards. Be warned that there are some known issues with shuffle
> file consolidation, which should be fixed in 1.1.
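A minimal sketch of what Aaron describes above: the flag must be set on the SparkConf before the SparkContext is constructed (the app name below is a placeholder):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// The property must be set *before* the SparkContext is created --
// changing it afterwards has no effect.
val conf = new SparkConf()
  .setAppName("my-app") // placeholder
  .set("spark.shuffle.consolidateFiles", "true")

val sc = new SparkContext(conf)
```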
>
>
> On Thu, Jul 31, 2014 at 12:40 PM, Jianshi Huang <jianshi.huang@gmail.com>
> wrote:
>
>> I got the number from the Hadoop admin. It's 1M actually. I suspect the
>> consolidation didn't work as expected? Any other reason?
>>
>>
>> On Thu, Jul 31, 2014 at 11:01 AM, Shao, Saisai <saisai.shao@intel.com>
>> wrote:
>>
>>>  I don’t think it’s a bug in consolidated shuffle; it’s a Linux
>>> configuration problem. The default open-file limit on Linux is 1024, and
>>> once your process has more files open than that you will get the error
>>> you mentioned below. You can raise the limit with ulimit -n xxx, or make
>>> it permanent by editing /etc/security/limits.conf (e.g. on Ubuntu).
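Jerry's two options above can be sketched as shell commands (the "sparkuser" account name in the limits.conf example is a placeholder):

```shell
# Show the current soft limit on open files for this shell.
ulimit -n

# Raise the soft limit up to the hard limit for this session only.
ulimit -S -n "$(ulimit -H -n)"

# To make the change permanent, add lines like these to
# /etc/security/limits.conf (pam_limits format) and log in again:
#   sparkuser  soft  nofile  65536
#   sparkuser  hard  nofile  65536
```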
>>>
>>>
>>>
>>> Shuffle consolidation reduces the total number of shuffle files, but the
>>> number of files open concurrently is the same as with basic hash-based
>>> shuffle.
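The distinction Jerry draws above can be illustrated with back-of-the-envelope arithmetic (all job sizes below are made up for illustration):

```scala
val mapTasks         = 10000
val reducers         = 200
val executors        = 50
val coresPerExecutor = 16 // concurrently running map tasks per executor

// Hash-based shuffle: every map task writes one file per reducer.
val totalFilesHash = mapTasks * reducers // 2,000,000 files on disk

// Consolidated shuffle: map tasks reuse per-core file groups, so total
// files scale with cores rather than map tasks.
val totalFilesConsolidated =
  executors * coresPerExecutor * reducers // 160,000 files on disk

// But the files *open at once* on one executor is still bounded by
// cores * reducers in both modes -- consolidation does not lower it.
val openAtOnce = coresPerExecutor * reducers // 3,200 open files
```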
>>>
>>>
>>>
>>> Thanks
>>>
>>> Jerry
>>>
>>>
>>>
>>> *From:* Jianshi Huang [mailto:jianshi.huang@gmail.com]
>>> *Sent:* Thursday, July 31, 2014 10:34 AM
>>> *To:* user@spark.apache.org
>>> *Cc:* xiaodi@sjtu.edu.cn
>>> *Subject:* Re: spark.shuffle.consolidateFiles seems not working
>>>
>>>
>>>
>>> Ok... but my question is why isn't spark.shuffle.consolidateFiles
>>> working (or is it)? Is this a bug?
>>>
>>>
>>>
>>> On Wed, Jul 30, 2014 at 4:29 PM, Larry Xiao <xiaodi@sjtu.edu.cn> wrote:
>>>
>>> Hi Jianshi,
>>>
>>> I've met a similar situation before.
>>> My solution was 'ulimit'; you can use
>>>
>>> -a to see your current settings
>>> -n to set the open files limit
>>> (and other limits as well)
>>>
>>> I set -n to 10240.
>>>
>>> As I understand it, spark.shuffle.consolidateFiles helps by reusing open
>>> files, so I don't know to what extent it helps here.
>>>
>>> Hope it helps.
>>>
>>> Larry
>>>
>>>
>>>
>>> On 7/30/14, 4:01 PM, Jianshi Huang wrote:
>>>
>>> I'm using Spark 1.0.1 on Yarn-Client mode.
>>>
>>> sortByKey always fails with a FileNotFoundException whose message says
>>> "too many open files".
>>>
>>> I already set spark.shuffle.consolidateFiles to true:
>>>
>>>   conf.set("spark.shuffle.consolidateFiles", "true")
>>>
>>> But it doesn't seem to be working. What are the other possible reasons?
>>> How can I fix it?
>>>
>>> Jianshi
>>>
>>> --
>>> Jianshi Huang
>>>
>>> LinkedIn: jianshi
>>> Twitter: @jshuang
>>> Github & Blog: http://huangjs.github.com/
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Jianshi Huang
>>>
>>> LinkedIn: jianshi
>>> Twitter: @jshuang
>>> Github & Blog: http://huangjs.github.com/
>>>
>>
>>
>>
>> --
>> Jianshi Huang
>>
>> LinkedIn: jianshi
>> Twitter: @jshuang
>> Github & Blog: http://huangjs.github.com/
>>
>
>


-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
