spark-user mailing list archives

From Aseem Bansal <asmbans...@gmail.com>
Subject Re: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Date Sat, 04 Mar 2017 16:38:08 GMT
That one application is the only thing present, and it is the application
giving that error. There is no second application; that is why it is
confusing. I submit just one application, which takes the resources but then
complains that it does not have the resources.
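
Two things worth ruling out here, assuming standard standalone-mode behaviour:
the application asking for more executor memory than the worker advertises, or
for more cores than exist; either leaves the scheduler with nothing it can
offer. A submit command that caps both explicitly (all values below are only
illustrative) would be:

    ./spark-submit --master spark://<master-host>:7077 \
        --executor-memory 1g --total-executor-cores 2 <your-app.py>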

On Sat, Mar 4, 2017 at 4:21 AM, Mark Hamstra <mark@clearstorydata.com>
wrote:

> Aseem's screenshots clearly show that he has 1 worker with 4 cores, and
> that there is an application running that has claimed those 4 cores. It is
> hardly surprising, then, that another application will not receive any
> resource offers when it tries to start up.
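>
> If the goal is to run two applications side by side on a cluster this small,
> the usual knob (the value below is only illustrative) is to cap what any one
> application may claim, e.g. in conf/spark-defaults.conf:
>
>     spark.cores.max    2
>
> Without such a cap, a standalone-mode application is offered every available
> core, which is what the screenshots show.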
>
> On Fri, Mar 3, 2017 at 2:30 PM, Marco Mistroni <mmistroni@gmail.com>
> wrote:
>
>> I forgot to attach the jpgs; they are half a GB in total.
>> The first shows 4 cores (2 per node), none in use.
>> The second shows 1 core per node in use.
>> The third shows 2 cores in use and 2 available, but the second job never
>> makes it to the cluster. Indeed, it only makes it to the cluster if I kill
>> the other job.
>>
>>
>>
>>
>> On Fri, Mar 3, 2017 at 10:27 PM, Marco Mistroni <mmistroni@gmail.com>
>> wrote:
>>
>>> Hello,
>>>    I'd like to disagree with that.
>>> Here's my use case (similar to Aseem's):
>>>
>>> 1 - Set up a Spark Standalone cluster with 2 nodes (2 cores each)
>>> 2 - Check resources on the cluster (see Spark Cluster.jpg)
>>>
>>> 3 - Run a script from node1 with the following command
>>>
>>>  ./spark-submit --driver-cores 1 --executor-cores 1 /root/pyscripts/dataprocessing_Sample.py file:///root/pyscripts/tree_addhealth.csv
>>>
>>> 4 - Check the status of the cluster when submitting 1 job (see SparkCluster 1job)
>>>
>>> 5 - Run exactly the same script from node2 with the following command
>>>
>>>      ./spark-submit --driver-cores 1 --executor-cores 1 /root/pyscripts/dataprocessing_Sample.py file:///root/pyscripts/tree_addhealth.csv
>>>
>>> 6 - This job ends up getting "Initial job has not accepted any resources"
>>> (but you can see from SparkCluster 1 job that only 2 of the cores have
>>> been used); see also the note after step 7.
>>>
>>> 7 - Check the status of the cluster when 2 jobs are running (see Spark
>>> Cluster 2 job)
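>>>
>>> One thing worth ruling out (a sketch only; whether it explains the
>>> screenshots I am not sure): in standalone mode --executor-cores only limits
>>> the cores per executor, and an application with no --total-executor-cores
>>> (or spark.cores.max) set can be offered every free core in the cluster.
>>> Capping each submission would look like:
>>>
>>>  ./spark-submit --driver-cores 1 --executor-cores 1 --total-executor-cores 1 /root/pyscripts/dataprocessing_Sample.py file:///root/pyscripts/tree_addhealth.csv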
>>>
>>> The script below is the simple script I am running. It reads the CSV file
>>> provided as input 6 times, sleeping for a random interval between reads; it
>>> does not do any magic or tricks.
>>>
>>>
>>>
>>> Perhaps my spark-submit settings are wrong?
>>> Perhaps I need to override how I instantiate the Spark context?
>>>
>>> I am curious to see, if you have a standalone cluster, whether you can
>>> reproduce the same problem.
>>> When I run it on EMR on YARN, everything works fine.
>>>
>>> kr
>>>  marco
>>>
>>>
>>> from pyspark.sql import SQLContext
>>> from random import randint
>>> from time import sleep
>>> from pyspark.sql.session import SparkSession
>>> import logging
>>> logger = logging.getLogger(__name__)
>>> logger.setLevel(logging.INFO)
>>> ch = logging.StreamHandler()
>>> logger.addHandler(ch)
>>>
>>>
>>> import sys
>>> def dataprocessing(filePath, count, sqlContext):
>>>     logger.info( 'Iter count is:%s' , count)
>>>     if count == 0:
>>>         print('exiting')
>>>     else:
>>>         df_traffic_tmp = sqlContext.read.format("csv").option("header", 'true').load(filePath)
>>>         logger.info('#############################DataSet has:%s', df_traffic_tmp.count())
>>>         df_traffic_tmp.repartition(5)
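>>>         # note: repartition() above returns a new DataFrame that is never
>>>         # assigned, so the call has no effect on the rest of this run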
>>>         sleepInterval = randint(10,100)
>>>         logger.info('#############################Sleeping for %s', sleepInterval)
>>>         sleep(sleepInterval)
>>>         dataprocessing(filePath, count-1, sqlContext)
>>>
>>> if __name__ == '__main__':
>>>
>>>     if len(sys.argv) < 2:
>>>         print('Usage: dataProcessingSample <filename>')
>>>         sys.exit(0)
>>>
>>>     filename = sys.argv[-1]
>>>     iterations = 6
>>>     logger.info('----------------------')
>>>     logger.info('Filename:%s', filename)
>>>     logger.info('Iterations:%s', iterations )
>>>     logger.info('----------------------')
>>>
>>>     logger.info('........Starting spark..........Loading from %s for %s iterations', filename, iterations)
>>>     logger.info(  'Starting up....')
>>>     sc = SparkSession.builder.appName("DataProcessSimple").getOrCreate()
>>>     logger.info ('Initializing sqlContext')
>>>     sqlContext = SQLContext(sc)
>>>     dataprocessing(filename, iterations, sqlContext)
>>>
>>>
>>>
>>>
>>> On Fri, Mar 3, 2017 at 4:03 PM, Mark Hamstra <mark@clearstorydata.com>
>>> wrote:
>>>
>>>> Removing dev. This is a basic user question; please don't add noise to
>>>> the development list.
>>>>
>>>> If your jobs are not accepting any resources, then it is almost
>>>> certainly because no resource offers are being received. Check the status
>>>> of your workers and their reachability from the driver.
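>>>>
>>>> A quick way to do that, assuming the default ports: open the master UI at
>>>> http://<master-host>:8080 and confirm the worker shows up as ALIVE with
>>>> the expected cores and memory; the same page can be fetched as JSON with
>>>>
>>>>     curl http://<master-host>:8080/json
>>>>
>>>> and the application side is visible in the driver log and in the
>>>> application UI at http://<driver-host>:4040.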
>>>>
>>>> On Fri, Mar 3, 2017 at 1:14 AM, Aseem Bansal <asmbansal2@gmail.com>
>>>> wrote:
>>>>
>>>>> When the initial job has not accepted any resources, what can be wrong?
>>>>> Going through Stack Overflow and various blogs does not help. Maybe we
>>>>> need better logging for this? Adding dev.
>>>>>
>>>>> On Thu, Mar 2, 2017 at 5:03 PM, Marco Mistroni <mmistroni@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I have found exactly the same issue... I even have a script which
>>>>>> simulates a random file read.
>>>>>> 2 nodes, 4 cores in total. I am submitting code from each node, passing
>>>>>> max cores 1, but one of the programs occupies 2 of the 4 cores and the
>>>>>> other is in a waiting state.
>>>>>> I am creating a standalone cluster for Spark 2.0. I can send sample code
>>>>>> if someone can help.
>>>>>> Kr
>>>>>>
>>>>>> On 2 Mar 2017 11:04 am, "Aseem Bansal" <asmbansal2@gmail.com> wrote:
>>>>>>
>>>>>> I have been trying to get a basic Spark cluster up on a single machine.
>>>>>> I know it should be distributed, but I want to get something running
>>>>>> before I do distributed in a higher environment.
>>>>>>
>>>>>> So I used sbin/start-master.sh and sbin/start-slave.sh
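>>>>>>
>>>>>> (For the record, the invocation I mean is of this shape, with the host
>>>>>> name only a placeholder:
>>>>>>
>>>>>>     sbin/start-master.sh
>>>>>>     sbin/start-slave.sh spark://<master-host>:7077
>>>>>>
>>>>>> where the master URL passed to start-slave.sh should match the spark://
>>>>>> URL shown at the top of the master UI, or the worker will not register.)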
>>>>>>
>>>>>> I keep on getting *WARN TaskSchedulerImpl: Initial job has not accepted
>>>>>> any resources; check your cluster UI to ensure that workers are
>>>>>> registered and have sufficient resources*
>>>>>>
>>>>>> I read up and changed /opt/spark-2.1.0-bin-hadoop2.7/conf/spark-defaults.conf
>>>>>> to contain this:
>>>>>>
>>>>>> spark.executor.cores               2
>>>>>> spark.cores.max                    8
>>>>>>
>>>>>> I changed /opt/spark-2.1.0-bin-hadoop2.7/conf/spark-env.sh to contain
>>>>>>
>>>>>> SPARK_WORKER_CORES=4
>>>>>>
>>>>>> My understanding is that after this Spark will use up to 8 cores in
>>>>>> total, with the worker using 4 cores and hence being able to support 2
>>>>>> executors on that worker.
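>>>>>>
>>>>>> (Spelling the arithmetic out under these settings: the one worker
>>>>>> advertises SPARK_WORKER_CORES=4 and each executor asks for
>>>>>> spark.executor.cores=2, so at most 4 / 2 = 2 executors can start, i.e.
>>>>>> 4 cores for the application; spark.cores.max=8 is a cap that is never
>>>>>> reached because only 4 cores exist. Memory is a separate constraint:
>>>>>> each executor's spark.executor.memory must also fit inside the memory
>>>>>> the worker advertises.)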
>>>>>>
>>>>>> But I still keep on getting the same error
>>>>>>
>>>>>> For my master I have
>>>>>> [image: Inline image 1]
>>>>>>
>>>>>> For my slave I have
>>>>>> [image: Inline image 2]
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
