beam-commits mailing list archives

From "Zdenko Hrcek (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-3134) cannot write data to BigQuery with Dataflow
Date Wed, 01 Nov 2017 21:04:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234748#comment-16234748 ]

Zdenko Hrcek commented on BEAM-3134:
------------------------------------

Thanks for the advice; disabling save_main_session worked. In my real-life case I also had to deal
with imports of other libraries/packages, but this helped solve those issues: https://cloud.google.com/dataflow/faq#how-do-i-handle-nameerrors
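The FAQ linked in the comment covers the NameErrors that can appear once save_main_session is off: by default, imports, functions, and variables defined in the main session are not saved when the job is serialized, so code running on the workers cannot rely on them, and one remedy is to import modules inside the function that uses them. The stdlib-only sketch below simulates that situation by re-creating a function with an empty globals dict. The helpers `encode_row_global`, `encode_row_local`, `run_without_main_globals`, and `outcome` are hypothetical, and this is only an approximation (Python 3), not how Beam or Dataflow actually ship code to workers.

```python
import builtins
import json
import types


def encode_row_global(row):
    """Relies on the module-level `import json` above."""
    return json.dumps(row, sort_keys=True)


def encode_row_local(row):
    """Imports its dependency inside the body, as the FAQ suggests."""
    import json  # resolved at call time; needs no module-level globals
    return json.dumps(row, sort_keys=True)


def run_without_main_globals(func, *args):
    """Call func with an empty globals dict (hypothetical worker stand-in).

    Only builtins are kept, roughly mimicking a worker on which the
    main session's module-level names were never restored.
    """
    bare = types.FunctionType(func.__code__, {"__builtins__": builtins})
    return bare(*args)


def outcome(func, row):
    """Return the function's result, or the NameError text if it fails."""
    try:
        return run_without_main_globals(func, row)
    except NameError as exc:
        return "NameError: %s" % exc


row = {"name": "a", "value": 1}
print(outcome(encode_row_global, row))  # prints a NameError: json is not a global there
print(outcome(encode_row_local, row))
```

The alternative the FAQ also mentions is keeping save_main_session on; importing inside the function simply avoids depending on restored globals at all.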

> cannot write data to BigQuery with Dataflow
> -------------------------------------------
>
>                 Key: BEAM-3134
>                 URL: https://issues.apache.org/jira/browse/BEAM-3134
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Zdenko Hrcek
>            Assignee: Ahmet Altay
>            Priority: Normal
>
> (sample code with description is here [https://github.com/zdenulo/dataflow_bigquery_error])
> I was running a Dataflow job for the first time (with version 2.1.1) to read data from BigQuery, make some modifications, and then write the data to a different table in BigQuery. When I ran it locally (on a small subset) it was OK, but when I tried to run it on Dataflow I got the following exception:
> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
> (ade3180ffa878a6b): Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 706, in run
>     self._load_main_session(self.local_staging_directory)
>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 446, in _load_main_session
>     pickler.load_session(session_file)
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 247, in load_session
>     return dill.load_session(file_path)
>   File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 363, in load_session
>     module = unpickler.load()
>   File "/usr/lib/python2.7/pickle.py", line 858, in load
>     dispatch[key](self)
>   File "/usr/lib/python2.7/pickle.py", line 1182, in load_append
>     list.append(value)
>   File "/usr/local/lib/python2.7/dist-packages/apitools/base/protorpclite/messages.py", line 1142, in append
>     self.__field.validate_element(value)
> AttributeError: 'FieldList' object has no attribute '_FieldList__field'
> In my opinion, it looks like it has something to do with pickling the schema definition for the output table.
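The reporter's hunch about pickling can be reproduced in plain Python. With save_main_session enabled, the worker restores the main module's globals by unpickling them (`load_session` in the traceback), so a schema object defined at module level is unpickled along with them. With pickle protocol 2 or higher, the default reduction of a list subclass re-adds its elements (through the subclass's own `append`/`extend`) before restoring its instance attributes, so an `append` that reads a private attribute runs too early. This is a minimal sketch under those assumptions: `ValidatingList` is a hypothetical class that only mimics the shape visible in the traceback (it is not the apitools `FieldList` source), the standard `pickle` module stands in for dill, and the code targets Python 3. `ReducibleValidatingList` is also hypothetical and shows, for contrast, that an object which pickles itself as a constructor call survives the round trip.

```python
import pickle


class ValidatingList(list):
    """Hypothetical list subclass that validates every element it stores.

    Loosely mimics the shape seen in the traceback (append() reads a
    name-mangled attribute set in __init__). It is NOT the apitools source.
    """

    def __init__(self, field_type, items=()):
        super().__init__()
        self.__field = field_type  # stored as self._ValidatingList__field
        self.extend(items)

    def append(self, value):
        if not isinstance(value, self.__field):
            raise TypeError("expected %s, got %r" % (self.__field.__name__, value))
        super().append(value)

    def extend(self, values):
        # Route bulk inserts through append() so they are validated too.
        for value in values:
            self.append(value)


class ReducibleValidatingList(ValidatingList):
    """Hypothetical variant that pickles itself as a constructor call."""

    def __reduce__(self):
        # Unpickling calls cls(field_type, items), so __init__ runs first.
        return (type(self), (self._ValidatingList__field, list(self)))


def try_roundtrip(obj):
    """Pickle and unpickle obj; return the copy or the AttributeError text."""
    try:
        return pickle.loads(pickle.dumps(obj, protocol=2))
    except AttributeError as exc:
        return "AttributeError: %s" % exc


# Default reduction: items are re-added via extend()/append() before the
# instance __dict__ is restored, so append() cannot find its field yet.
plain_result = try_roundtrip(ValidatingList(str, ["name", "age"]))
# Custom reduction: __init__ sets the field before any items are added.
fixed_result = try_roundtrip(ReducibleValidatingList(str, ["name", "age"]))

print(plain_result)
print(fixed_result)
```

The failure for the plain class appears consistent with the traceback above (`load_append` calling `FieldList.append`), and it suggests why disabling save_main_session helped: the module-level schema object is then no longer part of the pickled session.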



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
