beam-commits mailing list archives

From "Mike Lambert (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-1790) Failure to build --requirements.txt when it uses google protobuf
Date Fri, 24 Mar 2017 06:52:41 GMT

    [ https://issues.apache.org/jira/browse/BEAM-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15939925#comment-15939925 ]

Mike Lambert commented on BEAM-1790:
------------------------------------

I believe it is the execution phase; this happened while it was downloading all the packages to build a "source" repo to upload to the server:
{noformat}
INFO:root:Starting GCS upload to gs://dancedeets-hrd.appspot.com/staging/beamapp-lambert-0324064754-553331.1490338074.553460/requirements.txt...
INFO:root:Completed GCS upload to gs://dancedeets-hrd.appspot.com/staging/beamapp-lambert-0324064754-553331.1490338074.553460/requirements.txt
INFO:root:Executing command: ['/usr/local/opt/python/bin/python2.7', '-m', 'pip', 'install', '--download', '/var/folders/94/wngs1jw91_n2_jjjrfljtqrc0000gn/T/dataflow-requirements-cache', '-r', 'requirements.txt', '--no-binary', ':all:']
DEPRECATION: pip install --download has been deprecated and will be removed in the future. Pip now has a download command that should be used instead.
Collecting google-cloud-datastore (from -r requirements.txt (line 1))
  File was already downloaded /var/folders/94/wngs1jw91_n2_jjjrfljtqrc0000gn/T/dataflow-requirements-cache/google-cloud-datastore-0.23.0.tar.gz
...
Collecting proto-google-cloud-datastore-v1[grpc]<0.91dev,>=0.90.3 (from gapic-google-cloud-datastore-v1<0.16dev,>=0.15.0->google-cloud-datastore->-r requirements.txt (line 1))
  File was already downloaded /var/folders/94/wngs1jw91_n2_jjjrfljtqrc0000gn/T/dataflow-requirements-cache/proto-google-cloud-datastore-v1-0.90.3.tar.gz
Collecting setuptools (from protobuf>=3.0.0->google-cloud-core<0.24dev,>=0.23.1->google-cloud-datastore->-r requirements.txt (line 1))
  File was already downloaded /var/folders/94/wngs1jw91_n2_jjjrfljtqrc0000gn/T/dataflow-requirements-cache/setuptools-34.3.2.zip
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "setuptools/__init__.py", line 12, in <module>
        import setuptools.version
      File "setuptools/version.py", line 1, in <module>
        import pkg_resources
      File "pkg_resources/__init__.py", line 70, in <module>
        import packaging.version
    ImportError: No module named packaging.version
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/94/wngs1jw91_n2_jjjrfljtqrc0000gn/T/pip-build-tplMt1/setuptools/
Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.12_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py",
line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/local/Cellar/python/2.7.12_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py",
line 72, in _run_code
    exec code in run_globals
  File "/.../dataflow/popular_people.py", line 255, in <module>
    run()
  File "/.../dataflow/popular_people.py", line 252, in run
    read_from_datastore('dancedeets-hrd', gcloud_options)
  File "/.../dataflow/popular_people.py", line 243, in read_from_datastore
    result = p.run()
  File "/.../dataflow/lib/apache_beam/pipeline.py", line 163, in run
    return self.runner.run(self)
  File "/.../dataflow/lib/apache_beam/runners/dataflow/dataflow_runner.py", line 175, in run
    self.dataflow_client.create_job(self.job), self)
  File "/.../dataflow/lib/apache_beam/utils/retry.py", line 174, in wrapper
    return fun(*args, **kwargs)
  File "/.../dataflow/lib/apache_beam/runners/dataflow/internal/apiclient.py", line 411, in
create_job
    self.create_job_description(job)
  File "/.../dataflow/lib/apache_beam/runners/dataflow/internal/apiclient.py", line 432, in
create_job_description
    job.options, file_copy=self._gcs_file_copy)
  File "/.../dataflow/lib/apache_beam/runners/dataflow/internal/dependency.py", line 290,
in stage_job_resources
    setup_options.requirements_file, requirements_cache_path)
  File "/.../dataflow/lib/apache_beam/runners/dataflow/internal/dependency.py", line 226,
in _populate_requirements_cache
    processes.check_call(cmd_args)
  File "/.../dataflow/lib/apache_beam/utils/processes.py", line 40, in check_call
    return subprocess.check_call(*args, **kwargs)
  File "/usr/local/Cellar/python/2.7.12_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py",
line 541, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/local/opt/python/bin/python2.7', '-m', 'pip',
'install', '--download', '/var/folders/94/wngs1jw91_n2_jjjrfljtqrc0000gn/T/dataflow-requirements-cache',
'-r', 'requirements.txt', '--no-binary', ':all:']' returned non-zero exit status 1
{noformat}
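
The traceback shows the failure comes from apache_beam/runners/dataflow/internal/dependency.py shelling out to pip to populate the requirements cache. A minimal sketch of that call, reconstructed purely from the command and traceback above (an illustration, not the actual Beam source):
{noformat}
# Sketch of the pip invocation made by dependency.py's
# _populate_requirements_cache, reconstructed from the log above.
import subprocess
import sys

def populate_requirements_cache(requirements_file, cache_dir):
    # --no-binary :all: forces source distributions for every requirement,
    # which is what makes pip try to build setuptools from an sdist.
    cmd_args = [
        sys.executable, '-m', 'pip', 'install',
        '--download', cache_dir,  # deprecated; see pip's warning above
        '-r', requirements_file,
        '--no-binary', ':all:',
    ]
    subprocess.check_call(cmd_args)  # raises CalledProcessError on failure

populate_requirements_cache(
    'requirements.txt',
    '/var/folders/94/wngs1jw91_n2_jjjrfljtqrc0000gn/T/dataflow-requirements-cache')
{noformat}
Because --no-binary :all: applies to every transitive requirement, pip downloads setuptools as a source distribution and runs its setup.py egg_info, which is where the ImportError surfaces.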

And I'm running the latest pip:
{noformat}
$ pip --version
pip 9.0.1 from /usr/local/lib/python2.7/site-packages (python 2.7)
{noformat}
Or, more specifically, using the interpreter from the command above:
{noformat}
$ /usr/local/opt/python/bin/python2.7
Python 2.7.12 (default, Oct 10 2016, 02:02:45) 
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pip
>>> pip.__version__
'9.0.1'
{noformat}
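
The deprecation notice above points at pip download as the replacement. A hedged sketch of the equivalent call (an assumption about pip 9's CLI, not something Beam currently runs); since --no-binary :all: still makes pip run egg_info on the setuptools sdist while resolving dependencies, I would expect it to fail the same way:
{noformat}
# Sketch of the 'pip download' equivalent of the deprecated
# 'pip install --download' invocation. The destination directory
# is a placeholder.
import subprocess
import sys

subprocess.check_call([
    sys.executable, '-m', 'pip', 'download',
    '--dest', '/tmp/dataflow-requirements-cache',  # placeholder cache dir
    '-r', 'requirements.txt',
    '--no-binary', ':all:',  # still forces a source build of setuptools
])
{noformat}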



> Failure to build --requirements.txt when it uses google protobuf
> ----------------------------------------------------------------
>
>                 Key: BEAM-1790
>                 URL: https://issues.apache.org/jira/browse/BEAM-1790
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py
>            Reporter: Mike Lambert
>            Assignee: Ahmet Altay
>              Labels: build, requirements
>
> I am running with {{--requirements_file requirements.txt}}, which contains:
> {noformat}
> google-cloud-datastore
> {noformat}
> Unfortunately, when attempting to run this on Cloud Dataflow, I get the following error trying to build the requirements:
> {noformat}
> Collecting setuptools (from protobuf>=3.0.0->google-cloud-core<0.24dev,>=0.23.1->google-cloud-datastore->-r requirements.txt (line 3))
>   File was already downloaded /var/folders/94/wngs1jw91_n2_jjjrfljtqrc0000gn/T/dataflow-requirements-cache/setuptools-34.3.2.zip
>     Complete output from command python setup.py egg_info:
>     Traceback (most recent call last):
>       File "<string>", line 1, in <module>
>       File "setuptools/__init__.py", line 12, in <module>
>         import setuptools.version
>       File "setuptools/version.py", line 1, in <module>
>         import pkg_resources
>       File "pkg_resources/__init__.py", line 70, in <module>
>         import packaging.version
>     ImportError: No module named packaging.version
> {noformat}
> Looking online (https://github.com/pypa/setuptools/issues/937), it appears this is due to "pip asking setuptools to build itself (from source dist), which is no longer supported."
> I'm not sure what the correct fix is here, since protobuf depends on setuptools, and a lot of Google libraries depend on protobuf. There seems to be no way to whitelist protobuf/setuptools as being "provided" by the Beam runtime (see https://github.com/pypa/pip/issues/3090).
> I'm going to try using my own setup.py next to see if I can skirt around the issue, but this definitely seems like a bug, with Beam's requirements packager asking for too much.
> In the case of GCE, I compile my dependencies into a Docker image that extends the base GCE images (which also lets me use binary installs); I'm not sure whether something like that would work here.
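
For reference, the setup.py workaround mentioned in the issue might look something like this minimal sketch, passed to the pipeline via the SDK's --setup_file option instead of --requirements_file (the package name and version below are placeholders, not taken from the report):
{noformat}
# setup.py -- minimal sketch for use with Beam's --setup_file option.
import setuptools

setuptools.setup(
    name='popular-people-pipeline',  # hypothetical package name
    version='0.0.1',
    install_requires=['google-cloud-datastore'],
    packages=setuptools.find_packages(),
)
{noformat}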



