spark-user mailing list archives

From Marco Mistroni <mmistr...@gmail.com>
Subject Re: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Date Fri, 03 Mar 2017 23:05:07 GMT
Thanks Mark, agreed.

But in my case the UI clearly shows that there are two cores available
(2/4 are used), 2 apps running, but the second one never gets any chance to
run until I kill the other one:

17/03/03 23:02:41 WARN TaskSchedulerImpl: Initial job has not accepted any
resources; check your cluster UI to ensure that workers are registered and
have sufficient resources
17/03/03 23:02:56 WARN TaskSchedulerImpl: Initial job has not accepted any
resources; check your cluster UI to ensure that workers are registered and
have sufficient resources

The first application uses 2/4 cores, and the other 0/4

Indeed the command I submitted from each node was this one (the previous
one was an incorrect copy and paste):

./spark-submit --master spark://ec2-54-218-113-119.us-west-2.compute.amazonaws.com:7077 \
  --driver-cores 1 --executor-cores 1 \
  /root/pyscripts/dataprocessing_Sample.py \
  file:///root/pyscripts/tree_addhealth.csv

And this is what I get in the UI:


   - *URL:* spark://ip-172-31-14-137.us-west-2.compute.internal:7077
   - *REST URL:* spark://ip-172-31-14-137.us-west-2.compute.internal:6066 (cluster mode)
   - *Alive Workers:* 2
   - *Cores in use:* 4 Total, 2 Used
   - *Memory in use:* 12.6 GB Total, 12.0 GB Used
   - *Applications:* 2 Running, 0 Completed
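
For reference, here is a minimal sketch of how the second submission could be
capped explicitly on standalone, using the standard spark-submit options
--total-executor-cores and --executor-memory (values only illustrative, not a
confirmed fix):

./spark-submit --master spark://ec2-54-218-113-119.us-west-2.compute.amazonaws.com:7077 \
  --total-executor-cores 1 --executor-cores 1 --executor-memory 1g \
  /root/pyscripts/dataprocessing_Sample.py \
  file:///root/pyscripts/tree_addhealth.csv

I mention memory because the UI above shows 12.0 of 12.6 GB already in use, so
it may be worth checking what spark.executor.memory the first application
requested.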

I am curious to see whether you can reproduce this if you have a Spark
standalone cluster. As I mentioned, when I do the same thing on EMR, I have
two programs running in parallel.

kr



On Fri, Mar 3, 2017 at 10:51 PM, Mark Hamstra <mark@clearstorydata.com>
wrote:

> Aseem's screenshots clearly show that he has 1 worker with 4 cores, and
> that there is an application running that has claimed those 4 cores. It is
> hardly surprising, then, that another application will not receive any
> resource offers when it tries to start up.
>
> On Fri, Mar 3, 2017 at 2:30 PM, Marco Mistroni <mmistroni@gmail.com>
> wrote:
>
>> I forgot to attach the jpgs; they are half a GB in total.
>> The first shows 4 cores (2 per node), none in use.
>> The second shows 1 core per node in use.
>> The third shows 2 cores in use and 2 available, but the second job never
>> makes it to the cluster. Indeed, it only makes it to the cluster if I kill
>> the other job.
>>
>>
>>
>>
>> On Fri, Mar 3, 2017 at 10:27 PM, Marco Mistroni <mmistroni@gmail.com>
>> wrote:
>>
>>> Hello
>>>    I'd like to disagree with that.
>>> Here's my use case (similar to Aseem's):
>>>
>>> 1 - Set up a Spark Standalone cluster with 2 nodes (2 cores each)
>>> 2 - Check resources on the cluster (see Spark Cluster.jpg)
>>>
>>> 3 - Run a script from node1 with the following command:
>>>
>>>  ./spark-submit --driver-cores 1 --executor-cores 1 \
>>>    /root/pyscripts/dataprocessing_Sample.py file:///root/pyscripts/tree_addhealth.csv
>>>
>>> 4 - Check status of the cluster when submitting 1 job (see SparkCluster 1job)
>>>
>>> 5 - Run exactly the same script from node2 with the following command:
>>>
>>>      ./spark-submit --driver-cores 1 --executor-cores 1 \
>>>        /root/pyscripts/dataprocessing_Sample.py file:///root/pyscripts/tree_addhealth.csv
>>> 6 - This job ends up getting "Initial job has not accepted any
>>> resources" (but you can see from SparkCluster 1job that only 2 of the cores
>>> have been used)
>>>
>>> 7 - Check status of the cluster when 2 jobs are running (see Spark Cluster 2
>>> job; a command-line check is sketched just below)
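>>>
>>> Since the jpgs may not come through, the same numbers can also be read from
>>> the command line, assuming the master web UI exposes its JSON view at /json
>>> as stock builds do (the hostname below is just my master):
>>>
>>>  curl -s http://ec2-54-218-113-119.us-west-2.compute.amazonaws.com:8080/json | python -m json.tool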
>>>
>>> The script below is the simple script I am running. It reads the csv file
>>> provided as input 6 times, sleeping for a random interval between reads, and
>>> it does not do any magic or tricks.
>>>
>>>
>>>
>>> Perhaps my spark-submit settings are wrong?
>>> Perhaps I need to override how I instantiate the Spark context?
>>>
>>> I am curious to see whether you can reproduce the same problem if you have
>>> a standalone cluster.
>>> When I run it on EMR on YARN, everything works fine.
>>>
>>> kr
>>>  marco
>>>
>>>
>>> from pyspark.sql import SQLContext
>>> from random import randint
>>> from time import sleep
>>> from pyspark.sql.session import SparkSession
>>> import logging
>>> logger = logging.getLogger(__name__)
>>> logger.setLevel(logging.INFO)
>>> ch = logging.StreamHandler()
>>> logger.addHandler(ch)
>>>
>>>
>>> import sys
>>> def dataprocessing(filePath, count, sqlContext):
>>>     logger.info('Iter count is:%s', count)
>>>     if count == 0:
>>>         print('exiting')
>>>     else:
>>>         df_traffic_tmp = sqlContext.read.format("csv").option("header", 'true').load(filePath)
>>>         logger.info('#############################DataSet has:%s', df_traffic_tmp.count())
>>>         # repartition returns a new DataFrame, so keep the result
>>>         df_traffic_tmp = df_traffic_tmp.repartition(5)
>>>         sleepInterval = randint(10, 100)
>>>         logger.info('#############################Sleeping for %s', sleepInterval)
>>>         sleep(sleepInterval)
>>>         dataprocessing(filePath, count - 1, sqlContext)
>>>
>>> if __name__ == '__main__':
>>>
>>>     if len(sys.argv) < 2:
>>>         print('Usage: dataProcessingSample <filename>')
>>>         sys.exit(0)
>>>
>>>     filename = sys.argv[-1]
>>>     iterations = 6
>>>     logger.info('----------------------')
>>>     logger.info('Filename:%s', filename)
>>>     logger.info('Iterations:%s', iterations)
>>>     logger.info('----------------------')
>>>
>>>     logger.info('........Starting spark..........Loading from %s for %s iterations', filename, iterations)
>>>     logger.info('Starting up....')
>>>     spark = SparkSession.builder.appName("DataProcessSimple").getOrCreate()
>>>     logger.info('Initializing sqlContext')
>>>     # SQLContext expects a SparkContext rather than the SparkSession itself
>>>     sqlContext = SQLContext(spark.sparkContext)
>>>     dataprocessing(filename, iterations, sqlContext)
>>>
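>>> For a quick sanity check outside the cluster, the same script can also be
>>> run with a local master (just a sketch, assuming the csv is readable on the
>>> local filesystem):
>>>
>>>  ./spark-submit --master local[2] /root/pyscripts/dataprocessing_Sample.py \
>>>    file:///root/pyscripts/tree_addhealth.csv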
>>>
>>>
>>>
>>> On Fri, Mar 3, 2017 at 4:03 PM, Mark Hamstra <mark@clearstorydata.com>
>>> wrote:
>>>
>>>> Removing dev. This is a basic user question; please don't add noise to
>>>> the development list.
>>>>
>>>> If your jobs are not accepting any resources, then it is almost
>>>> certainly because no resource offers are being received. Check the status
>>>> of your workers and their reachability from the driver.
>>>>
>>>> On Fri, Mar 3, 2017 at 1:14 AM, Aseem Bansal <asmbansal2@gmail.com>
>>>> wrote:
>>>>
>>>>> When initial jobs have not accepted any resources, what could be wrong?
>>>>> Going through Stack Overflow and various blogs does not help. Maybe we
>>>>> need better logging for this? Adding dev.
>>>>>
>>>>> On Thu, Mar 2, 2017 at 5:03 PM, Marco Mistroni <mmistroni@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi
>>>>>>  I have found exactly the same issue.... I even have a script which
>>>>>> simulates a random file read.
>>>>>> 2 nodes, 4 cores. I am submitting code from each node passing max cores 1,
>>>>>> but one of the programmes occupies 2/4 cores and the other is in waiting
>>>>>> state.
>>>>>> I am creating a standalone cluster for Spark 2.0. Can send sample code if
>>>>>> someone can help.
>>>>>> Kr
>>>>>>
>>>>>> On 2 Mar 2017 11:04 am, "Aseem Bansal" <asmbansal2@gmail.com> wrote:
>>>>>>
>>>>>> I have been trying to get a basic Spark cluster up on a single machine.
>>>>>> I know it should be distributed, but I want to get something running before
>>>>>> I do distributed in a higher environment.
>>>>>>
>>>>>> So I used sbin/start-master.sh and sbin/start-slave.sh
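>>>>>>
>>>>>> For completeness, start-slave.sh takes the master URL shown on the master's
>>>>>> web UI, so the invocation is roughly this (the hostname is just a placeholder):
>>>>>>
>>>>>> ./sbin/start-master.sh
>>>>>> ./sbin/start-slave.sh spark://<master-host>:7077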
>>>>>>
>>>>>> I keep on getting *WARN TaskSchedulerImpl: Initial job has not accepted
>>>>>> any resources; check your cluster UI to ensure that workers are registered
>>>>>> and have sufficient resources*
>>>>>>
>>>>>> I read up and changed /opt/spark-2.1.0-bin-hadoop2.7/conf/spark-defaults.conf
>>>>>> to contain this
>>>>>>
>>>>>> spark.executor.cores               2
>>>>>> spark.cores.max                    8
>>>>>>
>>>>>> I changed /opt/spark-2.1.0-bin-hadoop2.7/conf/spark-env.sh to contain
>>>>>>
>>>>>> SPARK_WORKER_CORES=4
>>>>>>
>>>>>> My understanding is that after this Spark will use 8 cores in total, with
>>>>>> the worker using 4 cores and hence being able to support 2 executors on
>>>>>> that worker.
>>>>>>
>>>>>> But I still keep on getting the same error
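>>>>>>
>>>>>> Is spark.cores.max a per-application cap? If so, then with it set to 8 and
>>>>>> only 4 worker cores, the first application could claim all 4 and leave
>>>>>> nothing for a second one. Would something like this (a sketch, values only
>>>>>> illustrative) leave room for two applications to run side by side?
>>>>>>
>>>>>> spark.executor.cores               2
>>>>>> spark.cores.max                    2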
>>>>>>
>>>>>> For my master I have
>>>>>> [image: Inline image 1]
>>>>>>
>>>>>> For my slave I have
>>>>>> [image: Inline image 2]
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
