beam-commits mailing list archives

From "Mike Lambert (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BEAM-1800) Can't save datastore objects
Date Fri, 24 Mar 2017 08:01:41 GMT
Mike Lambert created BEAM-1800:
----------------------------------

             Summary: Can't save datastore objects
                 Key: BEAM-1800
                 URL: https://issues.apache.org/jira/browse/BEAM-1800
             Project: Beam
          Issue Type: Bug
          Components: sdk-py
            Reporter: Mike Lambert
            Assignee: Ahmet Altay


I can't save my Datastore objects using {{WriteToDatastore}}: it fails with a unicode-related
{{TypeError}} when it tries to write a batch. The stack trace follows:

{noformat}
File "apache_beam/runners/common.py", line 195, in apache_beam.runners.common.DoFnRunner.receive (apache_beam/runners/common.c:5142)
  self.process(windowed_value)
File "apache_beam/runners/common.py", line 267, in apache_beam.runners.common.DoFnRunner.process (apache_beam/runners/common.c:7201)
  self.reraise_augmented(exn)
File "apache_beam/runners/common.py", line 279, in apache_beam.runners.common.DoFnRunner.reraise_augmented (apache_beam/runners/common.c:7590)
  raise type(exn), args, sys.exc_info()[2]
File "apache_beam/runners/common.py", line 263, in apache_beam.runners.common.DoFnRunner.process (apache_beam/runners/common.c:7090)
  self._dofn_simple_invoker(element)
File "apache_beam/runners/common.py", line 198, in apache_beam.runners.common.DoFnRunner._dofn_simple_invoker (apache_beam/runners/common.c:5262)
  self._process_outputs(element, self.dofn_process(element.value))
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/datastoreio.py", line 354, in process
  self._flush_batch()
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/datastoreio.py", line 363, in _flush_batch
  helper.write_mutations(self._datastore, self._project, self._mutations)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/helper.py", line 187, in write_mutations
  commit(commit_request)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 174, in wrapper
  return fun(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/helper.py", line 185, in commit
  datastore.commit(req)
File "/usr/local/lib/python2.7/dist-packages/googledatastore/connection.py", line 140, in commit
  datastore_pb2.CommitResponse)
File "/usr/local/lib/python2.7/dist-packages/googledatastore/connection.py", line 199, in _call_method
  method='POST', body=payload, headers=headers)
File "/usr/local/lib/python2.7/dist-packages/oauth2client/client.py", line 631, in new_request
  redirections, connection_type)
File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1609, in request
  (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1351, in _request
  (response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1273, in _conn_request
  conn.request(method, request_uri, body, headers)
File "/usr/lib/python2.7/httplib.py", line 1039, in request
  self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 1073, in _send_request
  self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 1035, in endheaders
  self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 877, in _send_output
  msg += message_body
TypeError: must be str, not unicode
[while running 'write to datastore/Convert to Mutation']
{noformat}
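
For illustration only, the last frame ({{msg += message_body}}) suggests that a text value and a byte string are being combined while the HTTP request is assembled. The sketch below is a Python 3 analogue of that kind of failure, not a reproduction of the Python 2.7 error above, and it does not establish which value in the pipeline is unicode. The names {{to_bytes}}, {{build_message}} and {{naive_build_message}} are hypothetical and are not part of Beam, httplib or googledatastore.

```python
def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding text if needed (hypothetical helper)."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


def naive_build_message(header, body):
    # Same shape as `msg += message_body` in the last frame of the trace:
    # no normalization, so mixing bytes and text raises TypeError in Python 3.
    msg = header
    msg += body
    return msg


def build_message(header, body):
    # Normalize both parts to bytes before concatenating.
    msg = to_bytes(header)
    msg += to_bytes(body)
    return msg


if __name__ == "__main__":
    header = b"POST /v1/projects HTTP/1.1\r\n\r\n"
    body = "caf\u00e9"  # text value containing a non-ASCII character
    try:
        naive_build_message(header, body)
    except TypeError as exc:
        print("naive concatenation failed:", exc)
    print(build_message(header, body))
```

If something like this is the cause, checking which of the values handed to the request (body, headers, URL parts) is text rather than bytes would narrow it down.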

My code is basically:
{noformat}
        | 'convert from entity' >> beam.Map(ConvertFromEntity)
        | 'write to datastore' >> WriteToDatastore(client.project)
{noformat}

Where {{ConvertFromEntity}} converts a {{google.cloud.datastore}} entity (which has a nice
API/interface) into the underlying protobuf (which is what the Beam gcp/datastore library
expects):
{noformat}
from google.cloud.datastore import helpers
def ConvertFromEntity(entity):
    return helpers.entity_to_protobuf(entity)
{noformat}

I assume {{entity_to_protobuf}} works correctly, since {{google/cloud/datastore/batch.py}} also uses it
to write {{entity_pb2.Entity}} objects into {{datastore_pb2.CommitRequest.mutations[n].upsert}}:

In batch.py: {{put() -> _assign_entity_to_pb() -> entity_to_protobuf()}}.

In datastoreio.py: {{WriteToDatastore -> DatastoreWriteFn.to_upsert_mutation -> _Mutate.DatastoreWriteFn -> helper.write_mutations}}.

Any idea what's going on here and why this doesn't work? Yes, I may have some unicode in my
objects, but the same data works in my App Engine DB/NDB usage.

I will attempt to skip WriteToDatastore and just put unbatched entities using the datastore
library and see if that goes any better for me...
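
A minimal sketch of what that unbatched attempt could look like, with per-entity error reporting so that a failing entity can be identified. The helper name {{put_unbatched}} is hypothetical; the {{put}} argument is assumed to be any callable that writes a single entity, for example the {{put}} method of a {{google.cloud.datastore}} client. Only {{TypeError}} is caught, since that is the error in the trace above.

```python
def put_unbatched(entities, put):
    """Write entities one at a time.

    `put` is assumed to be a callable that writes a single entity.
    Returns (written_count, failures), where failures is a list of
    (index, error_message) pairs for entities that raised TypeError.
    """
    written = 0
    failures = []
    for index, entity in enumerate(entities):
        try:
            put(entity)
            written += 1
        except TypeError as exc:
            # Record which entity failed instead of aborting the whole run.
            failures.append((index, str(exc)))
    return written, failures
```

Assuming {{client}} is a {{google.cloud.datastore.Client}}, this would be called as {{put_unbatched(entities, client.put)}}. Writing one entity per call is slower than batching, but it shows whether the failure depends on specific entities or happens on every write.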



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
