airflow-dev mailing list archives

From Iván Robla Albarrán <ivanro...@gmail.com>
Subject Problem with Spark submit operator - Some Example
Date Fri, 22 Sep 2017 11:14:15 GMT
Hi,

I am trying to use the SparkSubmitOperator.

This is my DAG:

from __future__ import print_function
from airflow.contrib.operators import SparkSubmitOperator
from airflow.models import DAG
from datetime import datetime, timedelta
import os

DAG_ID = os.path.basename(__file__).replace(".pyc", "").replace(".py", "")
APPLICATION_FILE_PATH = '/opt/cloudera/parcels/CDH/lib/spark/examples/lib/spark-examples-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar'

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'retries': 0,
}

dag = DAG(DAG_ID, default_args=default_args, schedule_interval=None,
          start_date=(datetime.now() - timedelta(minutes=1)))

dummy = SparkSubmitOperator(
    task_id='spark-submit-scala',
    application=APPLICATION_FILE_PATH,
    java_class='org.apache.spark.examples.SparkPi',
    application_args='10',
    dag=dag)
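
In case it matters, I think application_args is supposed to be a list rather than a plain string (the hook iterates over it), so the task may need to look more like the variant below. This is just for illustration; it does not explain the missing --class in the log.

dummy = SparkSubmitOperator(
    task_id='spark-submit-scala',
    application=APPLICATION_FILE_PATH,
    java_class='org.apache.spark.examples.SparkPi',
    # application_args is iterated by the hook, so pass a list, not a string
    application_args=['10'],
    dag=dag)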


The operator is not passing the java_class or the application args: in the log below, the spark-submit command it builds has no --class flag and no arguments (a hand-written version of the command is sketched after the log for comparison).

[2017-09-15 13:24:39,177] {base_task_runner.py:95} INFO - Subtask:
--------------------------------------------------------------------------------
[2017-09-15 13:24:39,177] {base_task_runner.py:95} INFO - Subtask:
Starting attempt 1 of 1
[2017-09-15 13:24:39,177] {base_task_runner.py:95} INFO - Subtask:
--------------------------------------------------------------------------------
[2017-09-15 13:24:39,177] {base_task_runner.py:95} INFO - Subtask:
[2017-09-15 13:24:39,191] {base_task_runner.py:95} INFO - Subtask:
[2017-09-15 13:24:39,190] {models.py:1342} INFO - Executing
<Task(SparkSubmitOperator): spark-submit-scala> on 2017-09-15 00:00:00
[2017-09-15 13:24:39,217] {base_task_runner.py:95} INFO - Subtask:
[2017-09-15 13:24:39,216] {base_hook.py:67} INFO - Using connection
to: yarn
[2017-09-15 13:24:39,992] {base_task_runner.py:95} INFO - Subtask:
[2017-09-15 13:24:39,991] {spark_submit_hook.py:214} INFO - Error: No
main class set in JAR; please specify one with --class
[2017-09-15 13:24:39,992] {base_task_runner.py:95} INFO - Subtask:
[2017-09-15 13:24:39,992] {spark_submit_hook.py:214} INFO - Run with
--help for usage help or --verbose for debug output
[2017-09-15 13:24:39,993] {base_task_runner.py:95} INFO - Subtask:
[2017-09-15 13:24:39,992] {models.py:1417} ERROR - Cannot execute:
['spark-submit', '--master', u'yarn', '--name', 'airflow-spark',
'--queue', u'root.default',
'/opt/cloudera/parcels/CDH/lib/spark/examples/lib/spark-examples-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar'].
Error code is: 1. Output: , Stderr:
[2017-09-15 13:24:39,993] {base_task_runner.py:95} INFO - Subtask:
Traceback (most recent call last):
[2017-09-15 13:24:39,993] {base_task_runner.py:95} INFO - Subtask:
File "/opt/conda/2/airflow/lib/python2.7/site-packages/airflow/models.py",
line 1374, in run
[2017-09-15 13:24:39,993] {base_task_runner.py:95} INFO - Subtask:
result = task_copy.execute(context=context)
[2017-09-15 13:24:39,994] {base_task_runner.py:95} INFO - Subtask:
File "/opt/conda/2/airflow/lib/python2.7/site-packages/airflow/contrib/operators/spark_submit_operator.py",
line 109, in execute
[2017-09-15 13:24:39,994] {base_task_runner.py:95} INFO - Subtask:
self._hook.submit(self._application)
[2017-09-15 13:24:39,994] {base_task_runner.py:95} INFO - Subtask:
File "/opt/conda/2/airflow/lib/python2.7/site-packages/airflow/contrib/hooks/spark_submit_hook.py",
line 193, in submit
[2017-09-15 13:24:39,995] {base_task_runner.py:95} INFO - Subtask:
spark_submit_cmd, self._sp.returncode, output, stderr
[2017-09-15 13:24:39,995] {base_task_runner.py:95} INFO - Subtask:
AirflowException: Cannot execute: ['spark-submit', '--master',
u'yarn', '--name', 'airflow-spark', '--queue', u'root.default',
'/opt/cloudera/parcels/CDH/lib/spark/examples/lib/spark-examples-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar'].
Error code is: 1. Output: , Stderr:
[2017-09-15 13:24:39,995] {base_task_runner.py:95} INFO - Subtask:
[2017-09-15 13:24:39,993] {models.py:1441} INFO - Marking task as
FAILED.
[2017-09-15 13:24:40,020] {base_task_runner.py:95} INFO - Subtask:
[2017-09-15 13:24:40,009] {models.py:1462} ERROR - Cannot execute:
['spark-submit', '--master', u'yarn', '--name', 'airflow-spark',
'--queue', u'root.default',
'/opt/cloudera/parcels/CDH/lib/spark/examples/lib/spark-examples-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar'].
Error code is: 1. Output: , Stderr:
[2017-09-15 13:24:40,021] {base_task_runner.py:95} INFO - Subtask:
Traceback (most recent call last):
[2017-09-15 13:24:40,021] {base_task_runner.py:95} INFO - Subtask:
File "/opt/conda/2/airflow/bin/airflow", line 28, in <module>
[2017-09-15 13:24:40,021] {base_task_runner.py:95} INFO - Subtask:
args.func(args)
[2017-09-15 13:24:40,022] {base_task_runner.py:95} INFO - Subtask:
File "/opt/conda/2/airflow/lib/python2.7/site-packages/airflow/bin/cli.py",
line 422, in run
[2017-09-15 13:24:40,022] {base_task_runner.py:95} INFO - Subtask:
pool=args.pool,
[2017-09-15 13:24:40,022] {base_task_runner.py:95} INFO - Subtask:
File "/opt/conda/2/airflow/lib/python2.7/site-packages/airflow/utils/db.py",
line 53, in wrapper
[2017-09-15 13:24:40,022] {base_task_runner.py:95} INFO - Subtask:
result = func(*args, **kwargs)
[2017-09-15 13:24:40,023] {base_task_runner.py:95} INFO - Subtask:
File "/opt/conda/2/airflow/lib/python2.7/site-packages/airflow/models.py",
line 1374, in run
[2017-09-15 13:24:40,023] {base_task_runner.py:95} INFO - Subtask:
result = task_copy.execute(context=context)
[2017-09-15 13:24:40,023] {base_task_runner.py:95} INFO - Subtask:
File "/opt/conda/2/airflow/lib/python2.7/site-packages/airflow/contrib/operators/spark_submit_operator.py",
line 109, in execute
[2017-09-15 13:24:40,024] {base_task_runner.py:95} INFO - Subtask:
self._hook.submit(self._application)
[2017-09-15 13:24:40,024] {base_task_runner.py:95} INFO - Subtask:
File "/opt/conda/2/airflow/lib/python2.7/site-packages/airflow/contrib/hooks/spark_submit_hook.py",
line 193, in submit
[2017-09-15 13:24:40,024] {base_task_runner.py:95} INFO - Subtask:
spark_submit_cmd, self._sp.returncode, output, stderr
[2017-09-15 13:24:40,025] {base_task_runner.py:95} INFO - Subtask:
airflow.exceptions.AirflowException: Cannot execute: ['spark-submit',
'--master', u'yarn', '--name', 'airflow-spark', '--queue',
u'root.default',
'/opt/cloudera/parcels/CDH/lib/spark/examples/lib/spark-examples-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar'].
Error code is: 1. Output: , Stderr:
[2017-09-15 13:24:42,631] {jobs.py:2083} INFO - Task exited with return code
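
For comparison, this is roughly the spark-submit call I expected the operator to build, written out as a BashOperator sketch (assuming spark-submit is on the worker's PATH; the task_id is just illustrative):

from airflow.operators.bash_operator import BashOperator

# Illustrative only: spell out the full spark-submit call, including the
# --class flag and the application argument that the operator is dropping.
explicit_submit = BashOperator(
    task_id='spark-submit-scala-bash',
    bash_command=('spark-submit --master yarn --queue root.default '
                  '--class org.apache.spark.examples.SparkPi '
                  + APPLICATION_FILE_PATH + ' 10'),
    dag=dag)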

Could you help me?

Thanks!!

Iván Robla
