Can Cloud Dataflow not use the "google.cloud.datastore" package? - python

I want to write to Datastore with a transaction from Cloud Dataflow, so I wrote the following:
def exe_dataflow():
    ....
    from google.cloud import datastore

    # called from the pipeline
    def ds_test(content):
        datastore_client = datastore.Client()
        kind = 'test_out'
        name = 'change'
        task_key = datastore_client.key(kind, name)
        for _ in range(3):
            with datastore_client.transaction():
                # read the current entity, update it, and write it back
                current_value = datastore_client.get(task_key)
                current_value['v'] += content['v']
                datastore_client.put(current_value)

    # pipeline
    ....
    | 'datastore test' >> beam.Map(ds_test)
But an error occurred and the following log message was displayed:
(7b75e0ef2db229da): Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 582, in do_work
work_executor.execute()
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 167, in execute
op.start()
...(SNIP)...
File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 767, in _import_module
return getattr(__import__(module, None, None, [obj]), obj)
AttributeError: 'module' object has no attribute 'datastore'
Can Cloud Dataflow not use the "google.cloud.datastore" package?
Added 2018/2/28:
I added --requirements_file to MyOptions:
options = MyOptions(flags = ["--requirements_file", "./requirements.txt"])
and I created requirements.txt containing:
google-cloud-datastore==1.5.0
But another error occurred:
(366397598dcf7f02): Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 582, in do_work
work_executor.execute()
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 167, in execute
op.start()
...(SNIP)...
File "my_dataflow.py", line 66, in to_entity
File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore/__init__.py", line 60, in <module>
from google.cloud.datastore.batch import Batch
File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore/batch.py", line 24, in <module>
from google.cloud.datastore import helpers
File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore/helpers.py", line 29, in <module>
from google.cloud.datastore_v1.proto import datastore_pb2
File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore_v1/__init__.py", line 17, in <module>
from google.cloud.datastore_v1 import types
File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore_v1/types.py", line 21, in <module>
from google.cloud.datastore_v1.proto import datastore_pb2
File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore_v1/proto/datastore_pb2.py", line 17, in <module>
from google.cloud.datastore_v1.proto import entity_pb2 as google_dot_cloud_dot_datastore__v1_dot_proto_dot_entity__pb2
File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore_v1/proto/entity_pb2.py", line 28, in <module>
dependencies=[google_dot_api_dot_annotations__pb2.DESCRIPTOR,google_dot_protobuf_dot_struct__pb2.DESCRIPTOR,google_dot_protobuf_dot_timestamp__pb2.DESCRIPTOR,google_dot_type_dot_latlng__pb2.DESCRIPTOR,])
File "/usr/local/lib/python2.7/dist-packages/google/protobuf/descriptor.py", line 824, in __new__
return _message.default_pool.AddSerializedFile(serialized_pb)
TypeError: Couldn't build proto file into descriptor pool!
Invalid proto descriptor for file "google/cloud/datastore_v1/proto/entity.proto":
google.datastore.v1.PartitionId.project_id: "google.datastore.v1.PartitionId.project_id" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
...(SNIP)...
google.datastore.v1.Entity.properties: "google.datastore.v1.Entity.PropertiesEntry" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/datastore_v1/proto/entity.proto". To use it here, please add the necessary import.

The recommended way to interact with Cloud Datastore from a Cloud Dataflow pipeline is to use the Datastore I/O API, which is available through the Dataflow SDK and provides methods to read and write data to a Cloud Datastore database.
You can find detailed documentation for the Datastore I/O package for Dataflow SDK 2.x for Python in this other link. The datastore.v1.datastoreio module is the specific module you want to use. There is plenty of information in the links I am sharing, but in short, it is a connector to Datastore that uses PTransforms to read / write / delete a PCollection from Datastore using the classes ReadFromDatastore() / WriteToDatastore() / DeleteFromDatastore(), respectively.
You should try using it instead of implementing the calls yourself. I suspect this may be the reason for the error you are seeing, since a Datastore implementation already exists in the Dataflow SDK:
"google.datastore.v1.PartitionId.project_id" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
UPDATE:
It looks like those three classes collect several mutations and execute them in a single transaction. You can check that in the code describing the classes.
If the aim is to retrieve (get()) and then update (put()) a Datastore entity, you can probably work with the write_mutations() function, which is described in the documentation, and work with a full batch of mutations performing the operations you are interested in.
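For illustration, here is a minimal, untested sketch of writing entities through the connector instead of calling the client library inside a Map. The module path follows the newer v1new layout of the Beam SDK; older SDK 2.x releases expose a similarly named WriteToDatastore under datastore.v1.datastoreio. The project id, kind, and key names are placeholders, not the poster's actual values:

import apache_beam as beam
from apache_beam.io.gcp.datastore.v1new.datastoreio import WriteToDatastore
from apache_beam.io.gcp.datastore.v1new.types import Entity, Key

PROJECT = 'my-project'  # placeholder project id

def to_entity(content):
    # build a Datastore entity from a pipeline element
    key = Key(['test_out', 'change-%d' % content['v']], project=PROJECT)
    entity = Entity(key)
    entity.set_properties({'v': content['v']})
    return entity

with beam.Pipeline() as p:
    (p
     | 'create' >> beam.Create([{'v': 1}, {'v': 2}])
     | 'to entity' >> beam.Map(to_entity)
     | 'write to datastore' >> WriteToDatastore(PROJECT))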

Related

Chatterbot installation working, but broken calls to variables

I installed chatterbot earlier today using Visual Studio Code's terminal. I saw that both chatterbot and chatterbot_corpus installed successfully. Then, I made the following Python document:
EDIT: Turns out I should define a chatbot variable first.
from chatterbot import ChatBot
from chatterbot.trainers import ChatterBotCorpusTrainer
from chatterbot.trainers import ListTrainer
conversation = [
    "Hello",
    "Hi there!",
    "How are you doing?",
    "I'm doing great.",
    "That is good to hear",
    "Thank you.",
    "You're welcome."
]
bot = ChatBot('Maya')
trainer = ListTrainer(bot)
trainer.train(conversation)
This was my code.
However, it says this:
bot = ChatBot('Maya')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Subha\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\chatterbot\chatterbot.py", line 34, in __init__
self.storage = utils.initialize_class(storage_adapter, **kwargs)
File "C:\Users\Subha\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\chatterbot\utils.py", line 54, in initialize_class
return Class(*args, **kwargs)
File "C:\Users\Subha\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\chatterbot\storage\sql_storage.py", line 22, in __init__
from sqlalchemy import create_engine
File "C:\Users\Subha\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\sqlalchemy\__init__.py", line 8, in <module>
from . import util as _util # noqa
File "C:\Users\Subha\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\sqlalchemy\util\__init__.py", line 14, in <module>
from ._collections import coerce_generator_arg # noqa
File "C:\Users\Subha\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\sqlalchemy\util\_collections.py", line 16, in <module>
from .compat import binary_types
File "C:\Users\Subha\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\sqlalchemy\util\compat.py", line 264, in <module>
time_func = time.clock
AttributeError: module 'time' has no attribute 'clock'
Does anyone know how to fix this easily?
EDIT: I just updated Python using pip install --upgrade ipython in the terminal, but it didn't fix the issue.
EDIT 2: Well, now I tried updating a package using pip install sqlalchemy --upgrade, but now it gives:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Subha\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\chatterbot\chatterbot.py", line 34, in __init__
self.storage = utils.initialize_class(storage_adapter, **kwargs)
File "C:\Users\Subha\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\chatterbot\utils.py", line 54, in initialize_class
return Class(*args, **kwargs)
File "C:\Users\Subha\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\chatterbot\storage\sql_storage.py", line 46, in __init__
if not self.engine.dialect.has_table(self.engine, 'Statement'):
File "C:\Users\Subha\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\sqlalchemy\dialects\sqlite\base.py", line 2009, in has_table
self._ensure_has_table_connection(connection)
File "C:\Users\Subha\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\sqlalchemy\engine\default.py", line 341, in _ensure_has_table_connection
raise exc.ArgumentError(
sqlalchemy.exc.ArgumentError: The argument passed to Dialect.has_table() should be a <class 'sqlalchemy.engine.base.Connection'>,
got <class 'sqlalchemy.engine.base.Engine'>. Additionally, the Dialect.has_table() method is for internal dialect use only; please use ``inspect(some_engine).has_table(<tablename>>)`` for public API use.
I am on the latest version, though:
PS C:\Users\Subha> Python --version
Python 3.9.6
EDIT 3: Now it comes up with
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Subha\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\chatterbot\chatterbot.py", line 34, in __init__
self.storage = utils.initialize_class(storage_adapter, **kwargs)
File "C:\Users\Subha\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\chatterbot\utils.py", line 54, in initialize_class
return Class(*args, **kwargs)
File "C:\Users\Subha\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\chatterbot\storage\sql_storage.py", line 46, in __init__
# if not self.engine.dialect.has_table(self.engine, 'Statement'):
File "C:\Users\Subha\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\sqlalchemy\dialects\sqlite\base.py", line 2009, in has_table
self._ensure_has_table_connection(connection)
File "C:\Users\Subha\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\sqlalchemy\engine\default.py", line 341, in _ensure_has_table_connection
raise exc.ArgumentError(
sqlalchemy.exc.ArgumentError: The argument passed to Dialect.has_table() should be a <class 'sqlalchemy.engine.base.Connection'>, got <class 'sqlalchemy.engine.base.Engine'>. Additionally, the Dialect.has_table() method is for internal dialect use only; please use ``inspect(some_engine).has_table(<tablename>>)`` for public API use.
It seems the Python upgrade worked with regard to the initial error on time.clock. This new error you are seeing is quite different from the previous one. In this case you need to go into chatterbot/storage/sql_storage.py, comment out if not self.engine.dialect.has_table(self.engine, 'Statement'):, and leave only self.create_database(). This means that instead of checking whether the database table has already been created (which is what triggers the error), it will just create the DB every time, which I expect to work fine.
# if not self.engine.dialect.has_table(self.engine, 'Statement'):
self.create_database()
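If you would rather keep the check than drop it, the error message itself points to the public API. A rough sketch of the same spot in sql_storage.py using sqlalchemy.inspect (assumes SQLAlchemy 1.4+; the surrounding code is approximate):

from sqlalchemy import inspect

# inside SQLStorageAdapter.__init__, replacing the internal dialect call
if not inspect(self.engine).has_table('Statement'):
    self.create_database()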

"Private key is missing or invalid. It should be service " when making BigQuery call with credential

I'm following this "data prediction using Cloud ML Engine with scikit-learn" tutorial for GCP AI Platform. I tried to make an API call to BigQuery with:
def query_to_dataframe(query):
    import pandas as pd
    import pkgutil
    privatekey = pkgutil.get_data('trainer', 'privatekey.json')
    print(privatekey[:200])
    return pd.read_gbq(query,
                       project_id=PROJECT,
                       dialect='standard',
                       private_key=privatekey)
but got the following error:
Traceback (most recent call last):
[...]
TypeError: a bytes-like object is required, not 'str'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/root/.local/lib/python3.7/site-packages/trainer/task.py", line 66, in <module>
arguments['numTrees']
File "/root/.local/lib/python3.7/site-packages/trainer/model.py", line 119, in train_and_evaluate
train_df, eval_df = create_dataframes(frac)
File "/root/.local/lib/python3.7/site-packages/trainer/model.py", line 95, in create_dataframes
train_df = query_to_dataframe(train_query)
File "/root/.local/lib/python3.7/site-packages/trainer/model.py", line 82, in query_to_dataframe
private_key=privatekey)
File "/usr/local/lib/python3.7/dist-packages/pandas/io/gbq.py", line 149, in read_gbq
credentials=credentials, verbose=verbose, private_key=private_key)
File "/root/.local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 846, in read_gbq
dialect=dialect, auth_local_webserver=auth_local_webserver)
File "/root/.local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 184, in __init__
self.credentials = self.get_credentials()
File "/root/.local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 193, in get_credentials
return self.get_service_account_credentials()
File "/root/.local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 413, in get_service_account_credentials
"Private key is missing or invalid. It should be service "
pandas_gbq.gbq.InvalidPrivateKeyFormat: Private key is missing or invalid. It should be service account private key JSON (file path or string contents) with at least two keys: 'client_email' and 'private_key'. Can be obtained from: https://console.developers.google.com/permissions/serviceaccounts
When the package runs in a local environment, the private key loads fine, but when submitted as an ml-engine training job, the error occurs. Note that the private key fails to load only when I use GCP RUNTIME_VERSION="1.15" and PYTHON_VERSION="3.7", but loads with no problem when I use PYTHON_VERSION="2.7".
In case it's useful, the structure of my package is:
/babyweight
  - setup.py
  - trainer
    - __init__.py
    - model.py
    - privatekey.json
    - task.py
I'm not sure if the problem is due to a bug in Python, or where I placed privatekey.json.
I was able to solve the problem after I changed the read_gbq argument that reads the BigQuery access key from private_key to credentials, as recommended by @rmesteves and as shown here. I then set the value using the absolute path to privatekey.json, as shown here. Now the job is able to run without error.
Note: I only encountered this problem with Python 3+, but not with Python 2.7. I'm not sure why. It could possibly be due to the implementation of read_gbq.
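For illustration, a minimal sketch of what that change probably looks like (the key path is a placeholder, PROJECT is the same constant used above, and it assumes a pandas/pandas-gbq version new enough to accept a credentials argument):

from google.oauth2 import service_account
import pandas as pd

def query_to_dataframe(query):
    # build explicit service-account credentials instead of passing private_key
    credentials = service_account.Credentials.from_service_account_file(
        '/absolute/path/to/privatekey.json')  # placeholder path
    return pd.read_gbq(query,
                       project_id=PROJECT,
                       dialect='standard',
                       credentials=credentials)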

assert group is None

I am trying to run a preprocessing pipeline using nipype and I get the following error message:
Traceback (most recent call last):
File "preprocscript.py", line 211, in <module>
preproc.run('MultiProc', plugin_args={'n_procs': 8})
File "/sw/anaconda/3/lib/python3.6/site-packages/nipype/pipeline/engine/workflows.py", line 579, in run
runner = plugin_mod(plugin_args=plugin_args)
File "/sw/anaconda/3/lib/python3.6/site-packages/nipype/pipeline/plugins/multiproc.py", line 162, in __init__
initargs=(self._cwd,)
File "/sw/anaconda/3/lib/python3.6/multiprocessing/pool.py", line 175, in __init__
self._repopulate_pool()
File "/sw/anaconda/3/lib/python3.6/multiprocessing/pool.py", line 236, in _repopulate_pool
self._wrap_exception)
File "/sw/anaconda/3/lib/python3.6/multiprocessing/pool.py", line 250, in _repopulate_pool_static
wrap_exception)
File "/sw/anaconda/3/lib/python3.6/multiprocessing/process.py", line 73, in __init__
assert group is None, 'group argument must be None for now'
AssertionError: group argument must be None for now
and I am not sure what exactly might be wrong in my code that leads to this or if this is an issue with my software.
I am on a linux system and use python 3.6.
The module you are using relies on a ProcessPoolExecutor. In Python 3.7 some additional arguments were added to that class, namely initargs, which is what the nipype multiprocessing module you are using passes. Unfortunately this is not backwards compatible with 3.6, and nipype did not provide another way to use that module.
Your options are to upgrade Python or to avoid the multiprocessing portion of nipype.
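For illustration, a rough sketch of the second option: fall back to nipype's serial 'Linear' plugin when running on Python 3.6 (the workflow object and arguments follow the call shown in the traceback):

import sys

if sys.version_info >= (3, 7):
    # parallel execution is fine here
    preproc.run('MultiProc', plugin_args={'n_procs': 8})
else:
    # single-process fallback that avoids the incompatible pool arguments
    preproc.run('Linear')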

Revit API collector: Could not cast value:spaces to target_type:

I don't understand why my code, which works fine with other Revit categories:
# -*- coding: utf-8 -*-
import rpw
from rpw import revit, db, ui, DB, UI
dd1 = rpw.db.Collector(of_category='Spaces')
produces this error:
IronPython Traceback:
Traceback (most recent call last):
File "C:\Users\USTL02870\Dropbox\WSP Project local folders\PyRevit custom extensions folder\BTS-NY-BETA.extension\BTS-NY-BETA.tab\Beta Tools.panel\test1.pushbutton\beta1_script.py", line 16, in
File "C:\Users\USTL02870\AppData\Roaming\pyRevit-Master\pyrevitlib\rpw\db\collector.py", line 445, in __init__
File "C:\Users\USTL02870\AppData\Roaming\pyRevit-Master\pyrevitlib\rpw\db\collector.py", line 464, in _collect
File "C:\Users\USTL02870\AppData\Roaming\pyRevit-Master\pyrevitlib\rpw\db\collector.py", line 78, in apply
File "C:\Users\USTL02870\AppData\Roaming\pyRevit-Master\pyrevitlib\rpw\db\collector.py", line 190, in process_value
File "C:\Users\USTL02870\AppData\Roaming\pyRevit-Master\pyrevitlib\rpw\utils\coerce.py", line 149, in to_category
File "C:\Users\USTL02870\AppData\Roaming\pyRevit-Master\pyrevitlib\rpw\db\builtins.py", line 134, in fuzzy_get
File "C:\Users\USTL02870\AppData\Roaming\pyRevit-Master\pyrevitlib\rpw\db\builtins.py", line 107, in get
rpw.exceptions.RpwCoerceError: Could not cast value:spaces to target_type:
What target type? If your target type is rooms, the explanation is provided by The Building Coder discussion of Collecting all Rooms on a Given Level: You can't collect Room elements directly, because they are an artificial construct of the Revit API and do not exist natively inside of Revit. Therefore, you need to collect SpatialElement objects instead, the Room parent class, and post-process the results, e.g., cast them to rooms. See also Accessing Room Data and Filtering for a Non-Native Class.
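For illustration, a rough, untested sketch of that approach in a pyRevit/rpw script like the one above: collect SpatialElement instances and post-process them into spaces (or rooms). The class names come from the standard Revit API namespaces exposed through rpw's DB:

from rpw import revit, DB

# collect all spatial elements, then keep only MEP spaces
spatial = DB.FilteredElementCollector(revit.doc) \
            .OfClass(DB.SpatialElement) \
            .ToElements()

spaces = [e for e in spatial if isinstance(e, DB.Mechanical.Space)]
# for rooms, filter on DB.Architecture.Room instead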

DeleteResource from Google Docs with Python

I am trying to delete a spreadsheet in Google Docs with this function:
def f_DeleteResource(xls_name):
    """Delete a resource"""
    client = Auth()
    for e1 in client.GetResources().entry:
        e2 = client.GetResource(e1)
        if xls_name == e2.title.text:
            client.DeleteResource(e2.resource_id.text, True)
And I obtain different errors when I change the first parameter of client.DeleteResource(p1,p2):
client.DeleteResource(e2.resource_id.text,True):
Traceback (most recent call last):
File "C:\xmp\D6GDocsDeleteUpload.py", line 164, in <module> main()
File "C:\xmp\D6GDocsDeleteUpload.py", line 157, in main f_DeleteResource(sys.argv[2])
File "C:\xmp\D6GDocsDeleteUpload.py", line 144, in f_DeleteResource client.DeleteResource(e2.resource_id.text,True)
File "C:\Python27\lib\site-packages\gdata\docs\client.py", line 540, in delete_resource uri = entry.GetEditLink().href
AttributeError: 'str' object has no attribute 'GetEditLink'
client.DeleteResource(e2,True):
Traceback (most recent call last):
File "C:\xmp\D6GDocsDeleteUpload.py", line 164, in <module> main()
File "C:\xmp\D6GDocsDeleteUpload.py", line 157, in main f_DeleteResource(sys.argv[2])
File "C:\xmp\D6GDocsDeleteUpload.py", line 144, in f_DeleteResource client.DeleteResource(e2,True)
File "C:\Python27\lib\site-packages\gdata\docs\client.py", line 543, in delete_resource return super(DocsClient, self).delete(uri, **kwargs)
File "C:\Python27\lib\site-packages\gdata\client.py", line 748, in delete **kwargs)
File "C:\Python27\lib\site-packages\gdata\docs\client.py", line 66, in request return super(DocsClient, self).request(method=method, uri=uri, **kwargs)
File "C:\Python27\lib\site-packages\gdata\client.py", line 319, in request RequestError)
gdata.client.RequestError: Server responded with: 403, <errors xmlns='http://schemas.google.com/g/2005'><error><domain>GData</domain><code>matchHeaderRequired</code><location type='header'>If-Match|If-None-Match</location><internalReason>If-Match or If-None-Match header or entry etag attribute required</internalReason></error></errors>
Can anyone help me?
It seems to be a bug in the Google API Python library. I checked gdata-2.0.16 and noticed that the DeleteResource() function uses only the URL of the resource (gdata/docs/client.py lines 540-543), but later checks for hasattr(entry_or_uri, 'etag') (gdata/client.py lines 737-741), and of course a string value (uri) doesn't have an etag attribute.
You may work around it using the force keyword argument:
import gdata.docs.data
import gdata.docs.client

client = gdata.docs.client.DocsClient()
client.ClientLogin('xxxxxx@gmail.com', 'xxxxxx', 'XxX')

for doc in client.GetAllResources():
    if doc.title.text == 'qpqpqpqpqpqp':
        client.DeleteResource(doc, force=True)
        break
If you want you may report an error to library maintainers (if it isn't already reported).
This issue has been fixed in this revision:
http://code.google.com/p/gdata-python-client/source/detail?r=f98fff494fb89fca12deede00c3567dd589e5f97
If you sync your client to the repository, you should be able to delete a resource without having to specify 'force=True'.
