I wanted to explore Google BigQuery with Python, and as per this tutorial I set up a Google Cloud account (free tier) and generated a key. The JSON file is stored in D:\keys\quixotic-folio-318907-64bfdccfb050.json.
I have also added GOOGLE_APPLICATION_CREDENTIALS under System Variables in the Windows 10 Environment Variables:
However, whenever I try to initialize the client, it throws a File Not Found error:
> from google.cloud import storage
> storage.Client(project = "quixotic-folio-318907")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\Anaconda3\lib\site-packages\google\cloud\storage\client.py", line 123, in __init__
super(Client, self).__init__(
File "D:\Anaconda3\lib\site-packages\google\cloud\client.py", line 319, in __init__
Client.__init__(
File "D:\Anaconda3\lib\site-packages\google\cloud\client.py", line 178, in __init__
credentials, _ = google.auth.default(scopes=scopes)
File "D:\Anaconda3\lib\site-packages\google\auth\_default.py", line 454, in default
credentials, project_id = checker()
File "D:\Anaconda3\lib\site-packages\google\auth\_default.py", line 221, in _get_explicit_environ_credentials
credentials, project_id = load_credentials_from_file(
File "D:\Anaconda3\lib\site-packages\google\auth\_default.py", line 107, in load_credentials_from_file
raise exceptions.DefaultCredentialsError(
google.auth.exceptions.DefaultCredentialsError: File D:\keys\quixotic-folio-318907-64bfdccfb050.json; was not found.
I've tried the os.environ method, as suggested here, and it works perfectly:
> import os
> from google.cloud import storage
> os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "D:\keys\quixotic-folio-318907-64bfdccfb050.json"
> storage.Client(project = "quixotic-folio-318907")
<google.cloud.storage.client.Client object at 0x000002448A4E8AF0>
I have the following questions:
Is this expected behavior, and why?
How do I ensure that I do not have to explicitly set os.environ['GOOGLE_APPLICATION_CREDENTIALS'], since it is already defined under System Variables?
Remove the ; at the end of your path in the environment variable.
Edit: User AKS was faster than me. @AKS: if you write your comment as an answer, it can be marked as the solution.
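If you want to double-check what the process actually sees, a quick diagnostic along these lines (my own sketch, not from the original posts) makes a stray trailing character easy to spot; note that the variable is read at interpreter start-up, so open a fresh session after editing it:
import os

path = os.environ.get('GOOGLE_APPLICATION_CREDENTIALS')
print(repr(path))                            # a trailing ';' will show up here
if path:
    print(os.path.exists(path))              # False while the ';' is still present
    print(os.path.exists(path.rstrip(';')))  # True once the ';' is removed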
I am trying to push data to GCP Datastore. The code snippet below works fine in a Jupyter Notebook, but it throws an error in VS Code.
def load_data_json(self, kind_name, data_with_qp_ID, qp_id):
    # Load the data in JSON format to upload into the DataStore
    data_with_qp_ID_as_JSON = self.convert_DF_to_JSON(data_with_qp_ID, qp_id)
    # Loop to iterate through the JSON format and upload into the GCS Storage
    for data in data_with_qp_ID_as_JSON.keys():
        with self.client.transaction():
            incomplete_key = self.client.key(kind_name)
            task = datastore.Entity(key=incomplete_key)
            task.update(data_with_qp_ID_as_JSON[data])
            self.client.put(task)
    return 'Ingestion Successful - Data Store Repository'
I have defined the name of the bucket in kind_name, data_with_qp_ID is a pandas DataFrame, and qp_id is the name of a column in that DataFrame. Please see the error message I get below:
Traceback (most recent call last):
File "/Users/ajaykrishnan/Desktop/Projects/Sprint 3/Data Migration/DataMigration_v1.1/main2.py", line 139, in <module>
write_datastore_db.load_data_json(ds_kindname, bookmarks_data_with_qp_ID, qp_id)
File "/Users/ajaykrishnan/Desktop/Projects/Sprint 3/Data Migration/DataMigration_v1.1/pkg/repository/ds_repository.py", line 50, in load_data_json
self.client.put(task)
File "/opt/anaconda3/lib/python3.9/site-packages/google/cloud/datastore/client.py", line 597, in put
self.put_multi(entities=[entity], retry=retry, timeout=timeout)
File "/opt/anaconda3/lib/python3.9/site-packages/google/cloud/datastore/client.py", line 634, in put_multi
current.put(entity)
File "/opt/anaconda3/lib/python3.9/site-packages/google/cloud/datastore/transaction.py", line 315, in put
super(Transaction, self).put(entity)
File "/opt/anaconda3/lib/python3.9/site-packages/google/cloud/datastore/batch.py", line 227, in put
_assign_entity_to_pb(entity_pb, entity)
File "/opt/anaconda3/lib/python3.9/site-packages/google/cloud/datastore/batch.py", line 373, in _assign_entity_to_pb
bare_entity_pb = helpers.entity_to_protobuf(entity)
File "/opt/anaconda3/lib/python3.9/site-packages/google/cloud/datastore/helpers.py", line 208, in entity_to_protobuf
key_pb = entity.key.to_protobuf()
File "/opt/anaconda3/lib/python3.9/site-packages/google/cloud/datastore/key.py", line 298, in to_protobuf
key.path.append(element)
TypeError: Parameter to MergeFrom() must be instance of same class: expected google.datastore.v1.Key.PathElement got PathElement.
My environment is as follows:
macOS Monterey 12.0.6
Python 3.9.12 (Conda)
I was able to clear this error. It was an issue with the protobuf library my environment was using: I downgraded protobuf from 4.x.x to 3.20.1 and it worked.
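For anyone hitting the same thing, a small check like the following (my own sketch; the downgrade itself is done in the shell, e.g. pip install protobuf==3.20.1) confirms which protobuf version the interpreter that VS Code launches is actually importing:
import google.protobuf

# Should report 3.20.1 after the downgrade; a 4.x.x value here means VS Code is
# still picking up the old environment.
print(google.protobuf.__version__)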
I am starting to look into using Python for some Cisco CUCM automation. I found the ciscoaxl plugin here, installed it, and wrote the following script:
from ciscoaxl import axl
cucm = "10.10.20.1"
username = "axlusr"
password = "password1"
version = "12.5"
ucm = axl(username, password, cucm, version)
for phone in ucm.get_phones():
    print(phone.name)
I am connected to Cisco's DevNet Sandbox, and all of the login and configuration details for the AXL user appear to be correct; however, I get the following output when I attempt to run the script:
Traceback (most recent call last):
File "%home%\AppData\Local\Programs\Python\Python39\axl-test.py", line 7, in <module>
for phone in ucm.get_phones():
File "%home%\AppData\Local\Programs\Python\Python39\lib\site-packages\ciscoaxl\axl.py", line 1877, in get_phones
for each in inner(skip):
File "%home%\AppData\Local\Programs\Python\Python39\lib\site-packages\ciscoaxl\axl.py", line 1869, in inner
res = self.client.listPhone(
File "%home%\AppData\Local\Programs\Python\Python39\lib\site-packages\zeep\proxy.py", line 40, in __call__
return self._proxy._binding.send(
File "%home%\AppData\Local\Programs\Python\Python39\lib\site-packages\zeep\wsdl\bindings\soap.py", line 130, in send
return self.process_reply(client, operation_obj, response)
File "%home%\AppData\Local\Programs\Python\Python39\lib\site-packages\zeep\wsdl\bindings\soap.py", line 195, in process_reply
return self.process_error(doc, operation)
File "%home%\AppData\Local\Programs\Python\Python39\lib\site-packages\zeep\wsdl\bindings\soap.py", line 283, in process_error
raise Fault(
zeep.exceptions.Fault: Unknown fault occured
I have run it on Windows 10 in an IDLE environment and from the Windows Subsystem for Linux (Ubuntu 20.04) via python and ipython3.
After some additional research, this is a known issue with CUCM 12.5. It should be fixed in CU1; see here: https://github.com/mvantellingen/python-zeep/issues/989
I still receive this error on 12.5.1.12900-115, but only when I don't have the appropriate permissions. Fixing the user's permissions for AXL access resolves it.
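As a side note, zeep's generic "Unknown fault occured" message often hides the real cause. A rough sketch like the one below (reusing the sandbox credentials from the question) at least surfaces the fault text returned by CUCM, which in my case pointed at the missing AXL permissions:
from ciscoaxl import axl
from zeep.exceptions import Fault

ucm = axl("axlusr", "password1", "10.10.20.1", "12.5")
try:
    for phone in ucm.get_phones():
        print(phone.name)
except Fault as fault:
    # Print whatever detail CUCM returned instead of the bare "Unknown fault occured"
    print(fault.message)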
I'm following this data prediction using Cloud ML Engine with scikit-learn tutorial for GCP AI Platform. I tried to make an API call to BigQuery with:
def query_to_dataframe(query):
    import pandas as pd
    import pkgutil
    privatekey = pkgutil.get_data('trainer', 'privatekey.json')
    print(privatekey[:200])
    return pd.read_gbq(query,
                       project_id=PROJECT,
                       dialect='standard',
                       private_key=privatekey)
but got the following error:
Traceback (most recent call last):
[...]
TypeError: a bytes-like object is required, not 'str'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/root/.local/lib/python3.7/site-packages/trainer/task.py", line 66, in <module>
arguments['numTrees']
File "/root/.local/lib/python3.7/site-packages/trainer/model.py", line 119, in train_and_evaluate
train_df, eval_df = create_dataframes(frac)
File "/root/.local/lib/python3.7/site-packages/trainer/model.py", line 95, in create_dataframes
train_df = query_to_dataframe(train_query)
File "/root/.local/lib/python3.7/site-packages/trainer/model.py", line 82, in query_to_dataframe
private_key=privatekey)
File "/usr/local/lib/python3.7/dist-packages/pandas/io/gbq.py", line 149, in read_gbq
credentials=credentials, verbose=verbose, private_key=private_key)
File "/root/.local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 846, in read_gbq
dialect=dialect, auth_local_webserver=auth_local_webserver)
File "/root/.local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 184, in __init__
self.credentials = self.get_credentials()
File "/root/.local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 193, in get_credentials
return self.get_service_account_credentials()
File "/root/.local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 413, in get_service_account_credentials
"Private key is missing or invalid. It should be service "
pandas_gbq.gbq.InvalidPrivateKeyFormat: Private key is missing or invalid. It should be service account private key JSON (file path or string contents) with at least two keys: 'client_email' and 'private_key'. Can be obtained from: https://console.developers.google.com/permissions/serviceaccounts
When the package runs in the local environment, the private key loads fine, but when it is submitted as an ml-engine training job, the error occurs. Note that the private key fails to load only when I use GCP RUNTIME_VERSION="1.15" and PYTHON_VERSION="3.7"; it loads with no problem when I use PYTHON_VERSION="2.7".
In case it's useful, the structure of my package is:
/babyweight
  - setup.py
  - trainer
    - __init__.py
    - model.py
    - privatekey.json
    - task.py
I'm not sure if the problem is due to a bug in Python, or where I placed privatekey.json.
I was able to solve the problem after I changed the read_gbq argument that takes the BigQuery access key from private_key to credentials, as recommended by @rmesteves and as shown here. I then set the value using the absolute path to privatekey.json, as shown here. Now the job runs without error.
Note: I only encountered this problem with Python 3+, not with Python 2.7. I'm not sure why; it could possibly be due to the implementation of read_gbq.
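For reference, a minimal sketch of the change (the key path below is a placeholder; point it at wherever privatekey.json ends up, and PROJECT is assumed to be defined elsewhere in the trainer package as in the original code):
import pandas as pd
from google.oauth2 import service_account

# Build a Credentials object from the service account file and pass it via the
# credentials argument instead of private_key.
credentials = service_account.Credentials.from_service_account_file(
    '/path/to/privatekey.json')  # placeholder path

def query_to_dataframe(query):
    return pd.read_gbq(query,
                       project_id=PROJECT,
                       dialect='standard',
                       credentials=credentials)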
I want to write to Datastore in a transaction from Cloud Dataflow.
So, I wrote the code below.
def exe_dataflow():
    ....

from google.cloud import datastore

# call from pipeline
def ds_test(content):
    datastore_client = datastore.Client()
    kind = 'test_out'
    name = 'change'
    task_key = datastore_client.key(kind, name)
    for _ in range(3):
        with datastore_client.transaction():
            current_value = datastore_client.get(task_key)
            current_value['v'] += content['v']
            datastore_client.put(current_value)

# pipeline
....
| 'datastore test' >> beam.Map(ds_test)
But an error occurred, and the log message displayed was as below:
(7b75e0ef2db229da): Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 582, in do_work
work_executor.execute()
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 167, in execute
op.start()
...(SNIP)...
File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 767, in _import_module
return getattr(__import__(module, None, None, [obj]), obj)
AttributeError: 'module' object has no attribute 'datastore'
Can Cloud Dataflow not use the "google.cloud.datastore" package?
Added 2018/2/28:
I added --requirements_file to MyOptions
options = MyOptions(flags = ["--requirements_file", "./requirements.txt"])
and created requirements.txt with:
google-cloud-datastore==1.5.0
But another error occurred:
(366397598dcf7f02): Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 582, in do_work
work_executor.execute()
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 167, in execute
op.start()
...(SNIP)...
File "my_dataflow.py", line 66, in to_entity
File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore/__init__.py", line 60, in <module>
from google.cloud.datastore.batch import Batch
File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore/batch.py", line 24, in <module>
from google.cloud.datastore import helpers
File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore/helpers.py", line 29, in <module>
from google.cloud.datastore_v1.proto import datastore_pb2
File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore_v1/__init__.py", line 17, in <module>
from google.cloud.datastore_v1 import types
File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore_v1/types.py", line 21, in <module>
from google.cloud.datastore_v1.proto import datastore_pb2
File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore_v1/proto/datastore_pb2.py", line 17, in <module>
from google.cloud.datastore_v1.proto import entity_pb2 as google_dot_cloud_dot_datastore__v1_dot_proto_dot_entity__pb2
File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore_v1/proto/entity_pb2.py", line 28, in <module>
dependencies=[google_dot_api_dot_annotations__pb2.DESCRIPTOR,google_dot_protobuf_dot_struct__pb2.DESCRIPTOR,google_dot_protobuf_dot_timestamp__pb2.DESCRIPTOR,google_dot_type_dot_latlng__pb2.DESCRIPTOR,])
File "/usr/local/lib/python2.7/dist-packages/google/protobuf/descriptor.py", line 824, in __new__
return _message.default_pool.AddSerializedFile(serialized_pb)
TypeError: Couldn't build proto file into descriptor pool!
Invalid proto descriptor for file "google/cloud/datastore_v1/proto/entity.proto":
google.datastore.v1.PartitionId.project_id: "google.datastore.v1.PartitionId.project_id" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
...(SNIP)...
google.datastore.v1.Entity.properties: "google.datastore.v1.Entity.PropertiesEntry" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/datastore_v1/proto/entity.proto". To use it here, please add the necessary import.
The recommended way to interact with Cloud Datastore from a Cloud Dataflow Pipeline is to use the Datastore I/O API, which is available through the Dataflow SDK and provides some methods to read and write data to a Cloud Datastore database.
You can find detailed documentation for the Datastore I/O package for Dataflow SDK 2.x for Python in this other link. The datastore.v1.datastoreio module is the specific module that you want to use. There is plenty of information in the links I am sharing, but in short, it is a connector to Datastore that uses PTransform to read / write / delete a PCollection from Datastore using the classes ReadFromDatastore() / WriteToDatastore() / DeleteFromDatastore() respectively.
You should try using it instead of implementing the calls yourself. I suspect this may be the reason for the error you are seeing, as a Datastore implementation already exists in the Dataflow SDK:
"google.datastore.v1.PartitionId.project_id" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
UPDATE:
It looks like those three classes collect several mutations and execute them in a single transaction. You can check that in the code describing the classes.
If the aim is to retrieve (get()) and then update (put()) a Datastore entity, you can probably work with the write_mutations() function, which is described in the documentation, and build a full batch of mutations performing the operations you are interested in.
I'm trying to use the Azure Storage Emulator in my python app but getting the following error:
from azure.storage.table import TableService, Entity
#Added after error
global DEV_ACCOUNT_NAME
DEV_ACCOUNT_NAME = "devstoreaccount1"
table_service = TableService(account_name='devstoreaccount1', account_key='Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==',)
table_service.use_local_storage = True
table_service.is_emulated = True
table_service.create_table("test")
Error:
Traceback (most recent call last):
File "C:\Program Files (x86)\Microsoft Visual Studio 12.0\Common7\IDE\Extensions\Microsoft\Python Tools for Visual Studio\2.0\visualstudio_py_util.py", line 76, in exec_file
exec(code_obj, global_variables)
File "c:\ScratchApp\ScratchApp.py", line 17, in <module>
table_service.create_table("test")
File "C:\Python27\lib\site-packages\azure\storage\table\tableservice.py", line 274, in create_table
request, self.use_local_storage)
File "C:\Python27\lib\site-packages\azure\storage\_common_serialization.py", line 212, in _update_request_uri_query_local_storage
return '/' + DEV_ACCOUNT_NAME + uri, query
NameError: global name 'DEV_ACCOUNT_NAME' is not defined
Any ideas?
Declaring a variable as global in one module won't magically make it visible to other modules.
To address your specific problem, though, it looks like this may be a known issue that was fixed some time ago. Do you have an up-to-date version of the azure storage module?
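If it helps, after updating (pip install --upgrade azure-storage at the time), the newer library lets you target the emulator directly from the constructor. A minimal sketch, assuming one of those newer releases:
from azure.storage.table import TableService

# is_emulated points the client at devstoreaccount1 on the local emulator,
# so no globals or hard-coded account key are needed.
table_service = TableService(is_emulated=True)
table_service.create_table("test")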