spark-user mailing list archives

From Mark Hamstra <m...@clearstorydata.com>
Subject Re: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Date Fri, 03 Mar 2017 23:09:29 GMT
Cores aren't the only resource. You haven't got enough memory left for the
second Job to start:

   - *Memory in use:* 12.6 GB Total, 12.0 GB Used
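
In standalone mode the master only launches an executor for an application on a worker that has both enough free cores and enough free memory (spark.executor.memory) for it, which is why the second application keeps waiting even though two cores are idle. A minimal sketch, with illustrative values only, of capping each application so that two of them fit on a ~12.6 GB / 4-core cluster:

from pyspark.sql import SparkSession

# Illustrative caps: each application takes at most 2 cores in total and
# reserves 2 GB per executor, leaving room for a second application.
spark = (SparkSession.builder
         .appName("DataProcessSimple")
         .config("spark.cores.max", "2")          # total cores this app may claim
         .config("spark.executor.memory", "2g")   # memory reserved per executor
         .getOrCreate())

The same caps can be passed to spark-submit (--executor-memory, --total-executor-cores) or set in spark-defaults.conf.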


On Fri, Mar 3, 2017 at 3:05 PM, Marco Mistroni <mmistroni@gmail.com> wrote:

> Thanks Mark, agreed.
>
> But in my case the UI clearly shows that there are two cores available
> (2/4 are used) and 2 apps running, but the second one never gets a chance
> to run until I kill the other one:
>
> 17/03/03 23:02:41 WARN TaskSchedulerImpl: Initial job has not accepted any
> resources; check your cluster UI to ensure that workers are registered and
> have sufficient resources
> 17/03/03 23:02:56 WARN TaskSchedulerImpl: Initial job has not accepted any
> resources; check your cluster UI to ensure that workers are registered and
> have sufficient resources
>
> The first application uses 2/4 cores, and the other 0/4
>
> Indeed, the command I submitted from each node was this one (the previous
> one was an incorrect copy and paste):
>
> ./spark-submit --master spark://ec2-54-218-113-119.us-west-2.compute.amazonaws.com:7077 --driver-cores 1 --executor-cores 1 /root/pyscripts/dataprocessing_Sample.py file:///root/pyscripts/tree_addhealth.csv
>
> And this is what I get in the UI:
>
>
>    - *URL:* spark://ip-172-31-14-137.us-west-2.compute.internal:7077
>    - *REST URL:* spark://ip-172-31-14-137.us-west-2.compute.internal:6066 (cluster mode)
>    - *Alive Workers:* 2
>    - *Cores in use:* 4 Total, 2 Used
>    - *Memory in use:* 12.6 GB Total, 12.0 GB Used
>    - *Applications:* 2 Running, 0 Completed
>
> I am curious to see, if you have a Spark standalone cluster, whether you can
> reproduce this. As I mentioned, when I do a similar thing on EMR I have two
> programs running in parallel.
>
> kr
>
>
>
> On Fri, Mar 3, 2017 at 10:51 PM, Mark Hamstra <mark@clearstorydata.com>
> wrote:
>
>> Aseem's screenshots clearly show that he has 1 worker with 4 cores, and
>> that there is an application running that has claimed those 4 cores. It is
>> hardly surprising, then, that another application will not receive any
>> resource offers when it tries to start up.
>>
>> On Fri, Mar 3, 2017 at 2:30 PM, Marco Mistroni <mmistroni@gmail.com>
>> wrote:
>>
>>> I forgot to attach the jpgs; they are half a GB in total.
>>> The first shows 4 cores (2 per node), none in use.
>>> The second shows 1 core per node in use.
>>> The third shows 2 cores in use and 2 available, but the second job never
>>> makes it to the cluster. Indeed, it only makes it to the cluster if I kill
>>> the other job.
>>>
>>>
>>>
>>>
>>> On Fri, Mar 3, 2017 at 10:27 PM, Marco Mistroni <mmistroni@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>    I'd like to disagree with that.
>>>> Here's my use case (similar to Aseem's):
>>>>
>>>> 1 - Set up a Spark standalone cluster with 2 nodes (2 cores each)
>>>> 2 - Check resources on the cluster (see  Spark Cluster.jpg)
>>>>
>>>> 3 - Run a script from node1 with the following command:
>>>>
>>>>  ./spark-submit --driver-cores 1 --executor-cores 1 /root/pyscripts/dataprocessing_Sample.py file:///root/pyscripts/tree_addhealth.csv
>>>>
>>>> 4 - Check the status of the cluster when submitting 1 job (see SparkCluster 1job)
>>>>
>>>> 5 - Run exactly the same script from node2 with the following command:
>>>>
>>>>      ./spark-submit --driver-cores 1 --executor-cores 1 /root/pyscripts/dataprocessing_Sample.py file:///root/pyscripts/tree_addhealth.csv
>>>> 6 - This job ends up getting "Initial job has not accepted any resources"
>>>> (but you can see from SparkCluster 1job that only 2 of the cores have been
>>>> used)
>>>>
>>>> 7 - Check the status of the cluster when 2 jobs are running (see Spark
>>>> Cluster 2 job)
>>>>
>>>> The script below is the simple script I am running. It reads the csv file
>>>> provided as input 6 times, sleeping for a random interval between reads;
>>>> it does not do any magic or tricks.
>>>>
>>>>
>>>>
>>>> Perhaps my spark-submit settings are wrong?
>>>> Perhaps I need to override how I instantiate the SparkContext?
>>>>
>>>> I am curious to see, if you have a standalone cluster, whether you can
>>>> reproduce the same problem.
>>>> When I run it on EMR on YARN, everything works fine.
>>>>
>>>> kr
>>>>  marco
>>>>
>>>>
>>>> from pyspark.sql import SQLContext
>>>> from random import randint
>>>> from time import sleep
>>>> from pyspark.sql.session import SparkSession
>>>> import logging
>>>> logger = logging.getLogger(__name__)
>>>> logger.setLevel(logging.INFO)
>>>> ch = logging.StreamHandler()
>>>> logger.addHandler(ch)
>>>>
>>>>
>>>> import sys
>>>> def dataprocessing(filePath, count, sqlContext):
>>>>     logger.info( 'Iter count is:%s' , count)
>>>>     if count == 0:
>>>>         print 'exiting'
>>>>     else:
>>>>         df_traffic_tmp = sqlContext.read.format("csv").option("header", 'true').load(filePath)
>>>>         logger.info('#############################DataSet has: %s', df_traffic_tmp.count())
>>>>         df_traffic_tmp.repartition(5)
>>>>         sleepInterval = randint(10,100)
>>>>         logger.info('#############################Sleeping for %s', sleepInterval)
>>>>         sleep(sleepInterval)
>>>>         dataprocessing(filePath, count-1, sqlContext)
>>>>
>>>> if __name__ == '__main__':
>>>>
>>>>     if len(sys.argv) < 2:
>>>>         print 'Usage dataProcessingSample <filename>'
>>>>         sys.exit(0)
>>>>
>>>>     filename = sys.argv[-1]
>>>>     iterations = 6
>>>>     logger.info('----------------------')
>>>>     logger.info('Filename:%s', filename)
>>>>     logger.info('Iterations:%s', iterations )
>>>>     logger.info('----------------------')
>>>>
>>>>     logger.info('........Starting spark..........Loading from %s for %s iterations', filename, iterations)
>>>>     logger.info(  'Starting up....')
>>>>     sc = SparkSession.builder.appName("DataProcessSimple").getOrCreate()
>>>>     logger.info ('Initializing sqlContext')
>>>>     sqlContext = SQLContext(sc)
>>>>     dataprocessing(filename, iterations, sqlContext)
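>>>>     # Note: sc.stop() is never called here, so this application keeps its
>>>>     # executors (and the cores and memory they reserve) until the process exits.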
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Mar 3, 2017 at 4:03 PM, Mark Hamstra <mark@clearstorydata.com>
>>>> wrote:
>>>>
>>>>> Removing dev. This is a basic user question; please don't add noise to
>>>>> the development list.
>>>>>
>>>>> If your jobs are not accepting any resources, then it is almost
>>>>> certainly because no resource offers are being received. Check the status
>>>>> of your workers and their reachability from the driver.
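>>>>>
>>>>> A quick way to do that check is the master web UI (port 8080 by default),
>>>>> which also serves its status as JSON under /json. A minimal sketch, assuming
>>>>> the Spark 2.x field names of that endpoint (verify them against your version):
>>>>>
>>>>> import json
>>>>> try:
>>>>>     from urllib.request import urlopen   # Python 3
>>>>> except ImportError:
>>>>>     from urllib2 import urlopen          # Python 2
>>>>>
>>>>> # Master UI address from earlier in this thread; substitute your own.
>>>>> MASTER_UI = "http://ec2-54-218-113-119.us-west-2.compute.amazonaws.com:8080/json"
>>>>>
>>>>> status = json.load(urlopen(MASTER_UI))
>>>>> for worker in status.get("workers", []):
>>>>>     print(worker)   # state, free cores and free memory of each worker
>>>>> for app in status.get("activeapps", []):
>>>>>     print(app)      # cores and memory granted to each running application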
>>>>>
>>>>> On Fri, Mar 3, 2017 at 1:14 AM, Aseem Bansal <asmbansal2@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> When initial jobs have not accepted any resources, what can be wrong?
>>>>>> Going through Stack Overflow and various blogs does not help.
>>>>>> Maybe we need better logging for this? Adding dev.
>>>>>>
>>>>>> On Thu, Mar 2, 2017 at 5:03 PM, Marco Mistroni <mmistroni@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>  I have found exactly the same issue... I even have a script which
>>>>>>> simulates a random file read.
>>>>>>> 2 nodes, 4 cores. I am submitting code from each node passing max
>>>>>>> cores 1, but one of the programs occupies 2 of the 4 cores and the
>>>>>>> other is in waiting state.
>>>>>>> I am creating a standalone cluster for Spark 2.0. Can send sample code
>>>>>>> if someone can help.
>>>>>>> Kr
>>>>>>>
>>>>>>> On 2 Mar 2017 11:04 am, "Aseem Bansal" <asmbansal2@gmail.com> wrote:
>>>>>>>
>>>>>>> I have been trying to get a basic Spark cluster up on a single machine.
>>>>>>> I know it should be distributed, but I want to get something running
>>>>>>> before I go distributed in a higher environment.
>>>>>>>
>>>>>>> So I used sbin/start-master.sh and sbin/start-slave.sh
>>>>>>>
>>>>>>> I keep on getting *WARN TaskSchedulerImpl: Initial job has not
>>>>>>> accepted any resources; check your cluster UI to ensure that workers
>>>>>>> are registered and have sufficient resources*
>>>>>>>
>>>>>>> I read up and changed /opt/spark-2.1.0-bin-hadoop2.7/conf/spark-defaults.conf
>>>>>>> to contain this:
>>>>>>>
>>>>>>> spark.executor.cores               2
>>>>>>> spark.cores.max                    8
>>>>>>>
>>>>>>> I changed /opt/spark-2.1.0-bin-hadoop2.7/conf/spark-env.sh to contain:
>>>>>>>
>>>>>>> SPARK_WORKER_CORES=4
>>>>>>>
>>>>>>> My understanding is that after this Spark will use 8 cores in total,
>>>>>>> with the worker using 4 cores and hence being able to support 2
>>>>>>> executors on that worker.
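>>>>>>>
>>>>>>> Note that spark.cores.max is a per-application ceiling, not a cluster-wide
>>>>>>> budget: with spark.cores.max 8 the first application is still free to claim
>>>>>>> all 4 cores of the single worker, leaving none for a second one, and each
>>>>>>> executor also needs spark.executor.memory free on the worker. A minimal
>>>>>>> sketch of checking, from inside the application, which settings it actually
>>>>>>> picked up (illustrative; prints only core- and executor-related keys):
>>>>>>>
>>>>>>> from pyspark.sql import SparkSession
>>>>>>>
>>>>>>> spark = SparkSession.builder.appName("resource-check").getOrCreate()
>>>>>>> # If spark-defaults.conf was read, spark.cores.max and spark.executor.cores
>>>>>>> # show up here with the values set above.
>>>>>>> for key, value in spark.sparkContext.getConf().getAll():
>>>>>>>     if key.startswith("spark.cores") or key.startswith("spark.executor"):
>>>>>>>         print("%s=%s" % (key, value))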
>>>>>>>
>>>>>>> But I still keep on getting the same error
>>>>>>>
>>>>>>> For my master I have
>>>>>>> [image: Inline image 1]
>>>>>>>
>>>>>>> For my slave I have
>>>>>>> [image: Inline image 2]
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
