spark-user mailing list archives

From Vladimir Grigor <>
Subject spark-submit --py-files remote: "Only local additional python files are supported"
Date Tue, 20 Jan 2015 15:38:39 GMT
Hi all!

I ran into this problem when I tried running a Python application on Amazon's
EMR YARN cluster.

It is possible to run the bundled example applications on EMR, but I cannot
figure out how to run a slightly more complex Python application that
depends on some other Python scripts. I tried adding those files with
'--py-files', and it works fine in local mode, but when run on EMR it fails
with the following message:
"Error: Only local python files are supported"

Simplest way to reproduce locally:
bin/spark-submit --py-files s3://

Actual commands to run it on EMR:
#launch cluster
aws emr create-cluster --name SparkCluster --ami-version 3.3.1
--instance-type m1.medium --instance-count 2  --ec2-attributes
KeyName=key20141114 --log-uri s3://pathtomybucket/cluster_logs
--enable-debugging --use-default-roles  --bootstrap-action
#   "ClusterId": "j-2Y58DME79MPQJ"

#run application
aws emr add-steps --cluster-id "j-2Y58DME79MPQJ" --steps
#    "StepIds": [
#        "s-2UP4PP75YX0KU"
#    ]
In the stderr of that step I get "Error: Only local python files are
supported: s3://pathtomybucket/tasks/demo/".

What is the workaround or the correct way to do this? Should I use Hadoop's
distcp to copy the dependency files from S3 to the nodes as a separate pre-step?
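For reference, here is a sketch of the pre-step workaround I have in mind: copy each S3 dependency to the local filesystem with `aws s3 cp`, then pass only the local copies to `--py-files`. The bucket, key, and script names below are hypothetical placeholders, and this assumes the step runs on a node with the AWS CLI configured.

```python
# Sketch of a wrapper pre-step: copy Python dependencies from S3 to the
# local filesystem, then invoke spark-submit with local --py-files paths.
# Bucket, key, and script names are hypothetical; run the commands with
# subprocess.check_call (or an equivalent) on the cluster node.
import os


def build_commands(s3_deps, main_script, work_dir="/tmp/pydeps"):
    """Return (copy_cmds, submit_cmd): one `aws s3 cp` command per
    dependency, then a spark-submit referencing only local files."""
    copy_cmds = []
    local_paths = []
    for uri in s3_deps:
        local = os.path.join(work_dir, os.path.basename(uri))
        copy_cmds.append(["aws", "s3", "cp", uri, local])
        local_paths.append(local)
    submit_cmd = ["spark-submit",
                  "--py-files", ",".join(local_paths),
                  main_script]
    return copy_cmds, submit_cmd


# Example with hypothetical names:
copies, submit = build_commands(
    ["s3://mybucket/tasks/demo/helpers.py"], "main.py")
# for cmd in copies: subprocess.check_call(cmd)
# subprocess.check_call(submit)
```

Whether this is the intended approach, or whether distcp in a bootstrap action is preferred, is exactly my question.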

Regards, Vladimir
