OAuth2 service account credentials for Google Drive upload script - python

Apologies in advance if anyone has asked this before, but I've got some basic questions about a server-side Python script I'm writing that does a nightly upload of CSV files to a folder in our Google Drive account. The folder owner has created a Google API project, enabled the Drive API for it, and created a credentials object which I've downloaded as a JSON file. The folder has been shared with the service account email, so I assume that the script will have access to this folder once it is authorized.
The JSON file contains the following fields: private_key_id, private_key, client_email, client_id, auth_uri, token_uri, auth_provider_x509_cert_url, client_x509_cert_url.
I'm guessing that my script will not need all of these - which are the essential or compulsory fields for OAuth2 authorization?
The example Python script given here
https://developers.google.com/drive/web/quickstart/python
seems to assume that the credentials are retrieved directly from a JSON file:
...
home_dir = os.path.expanduser('~')
credential_dir = os.path.join(home_dir, '.credentials')
if not os.path.exists(credential_dir):
    os.makedirs(credential_dir)
credential_path = os.path.join(credential_dir,
                               'drive-python-quickstart.json')
store = oauth2client.file.Storage(credential_path)
credentials = store.get()
...
but in our setup we are storing them in our own DB and the script accesses them via a dict. How would the authorization be done if the credentials were in a dict?
Thanks in advance.

After browsing the source code, it seems that oauth2client's Storage was designed to accept only JSON files; it uses simplejson to encode and decode. Take a look at the source code.
This is where it gets the content from the file you provide:
def locked_get(self):
    """Retrieve Credential from file.

    Returns:
        oauth2client.client.Credentials

    Raises:
        CredentialsFileSymbolicLinkError if the file is a symbolic link.
    """
    credentials = None
    self._validate_file()
    try:
        f = open(self._filename, 'rb')
        content = f.read()
        f.close()
    except IOError:
        return credentials

    try:
        credentials = Credentials.new_from_json(content)  # <---- !!!
        credentials.set_store(self)
    except ValueError:
        pass

    return credentials
In new_from_json they attempt to decode the content using simplejson.
def new_from_json(cls, s):
    """Utility class method to instantiate a Credentials subclass from a JSON
    representation produced by to_json().

    Args:
        s: string, JSON from to_json().

    Returns:
        An instance of the subclass of Credentials that was serialized with
        to_json().
    """
    data = simplejson.loads(s)
Long story short, it seems you'll have to construct a JSON file from your dict. See this. Basically, you'll need to json.dumps(your_dictionary) and create a file.
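As a rough illustration of that round trip, here is a minimal sketch, assuming the dict (called creds_dict here, a placeholder name) holds the same JSON structure that Credentials.to_json() produces, which is what Storage.locked_get() expects:
import json
import tempfile
from oauth2client.file import Storage

# creds_dict is assumed to be loaded from your DB and to mirror the JSON
# produced by Credentials.to_json().
with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False) as f:
    json.dump(creds_dict, f)
    credential_path = f.name

store = Storage(credential_path)
credentials = store.get()  # returns None if the JSON cannot be parsed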

Actually it looks like SignedJwtAssertionCredentials may be one answer (see the sketch further below). Another possibility is
from oauth2client import service_account
...
credentials = service_account._ServiceAccountCredentials(
    service_account_id=client.clid,
    service_account_email=client.email,
    private_key_id=client.private_key_id,
    private_key_pkcs8_text=client.private_key,
    scopes=client.auth_scopes,
)
....
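For completeness, here is a minimal sketch of the SignedJwtAssertionCredentials route with the old oauth2client API, assuming a creds_dict pulled from your DB with the same fields as the downloaded JSON (creds_dict and drive_service are placeholder names):
import httplib2
from apiclient.discovery import build
from oauth2client.client import SignedJwtAssertionCredentials

# creds_dict is assumed to mirror the downloaded service account JSON.
credentials = SignedJwtAssertionCredentials(
    service_account_name=creds_dict['client_email'],
    private_key=creds_dict['private_key'],
    scope='https://www.googleapis.com/auth/drive',
)

http = credentials.authorize(httplib2.Http())
drive_service = build('drive', 'v2', http=http)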

Related

Error reading a zip file from a GCP bucket into a pandas DataFrame: "Project not passed and could not be determined from the environment"

I'm trying to read a zip file from a GCP bucket into a pandas DataFrame in a Jupyter notebook. All I'm currently doing is creating a storage client:
import pandas as pd
from io import BytesIO
from google.cloud import storage
import os
storage_client = storage.Client()
which throws the following error:
OSError Traceback (most recent call last)
/tmp/ipykernel_1/3298573438.py in <module>
4 import os
5
----> 6 storage_client = storage.Client()
7 bucket = storage_client.get_bucket('dev-lbrm-bucket')
8 # blob = bucket.blob('my.csv')
/opt/conda/lib/python3.7/site-packages/google/cloud/storage/client.py in __init__(self, project, credentials, _http, client_info, client_options)
163 credentials=credentials,
164 client_options=client_options,
--> 165 _http=_http,
166 )
167
/opt/conda/lib/python3.7/site-packages/google/cloud/client/__init__.py in __init__(self, project, credentials, client_options, _http)
318
319 def __init__(self, project=None, credentials=None, client_options=None, _http=None):
--> 320 _ClientProjectMixin.__init__(self, project=project, credentials=credentials)
321 Client.__init__(
322 self, credentials=credentials, client_options=client_options, _http=_http
/opt/conda/lib/python3.7/site-packages/google/cloud/client/__init__.py in __init__(self, project, credentials)
270 if project is None:
271 raise EnvironmentError(
--> 272 "Project was not passed and could not be "
273 "determined from the environment."
274 )
OSError: Project was not passed and could not be determined from the environment.
I'm not sure how to set up the client. How do I make sure it has a project?
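A minimal sketch of the two usual ways to give the client a project, where your-project-id is a placeholder for a project you actually have access to:
import os
from google.cloud import storage

# Option 1: pass the project explicitly (placeholder project ID)
storage_client = storage.Client(project='your-project-id')

# Option 2: set the environment variable before constructing the client
os.environ['GOOGLE_CLOUD_PROJECT'] = 'your-project-id'
storage_client = storage.Client()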

IBM text to speech Python DecodeError

I just tried the basic example of IBM's text to speech with Python:
!pip install ibm_watson
from ibm_watson import TextToSpeechV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
apikey = 'my API KEY'
url = 'my SERVICE URL'
authenticator = IAMAuthenticator(apikey)
tts = TextToSpeechV1(authenticator=authenticator)
tts.set_service_url(url)
with open('./speech.mp3', 'wb') as audio_file:
    res = tts.synthesize('Hello World!', accept='audio/mp3', voice='en-US_AllisonV3Voice').get_result()
    audio_file.write(res.content)
But I get an error message:
DecodeError: It is required that you pass in a value for the "algorithms" argument when calling decode().
DecodeError Traceback (most recent call last)
<ipython-input-5-53b9d398591b> in <module>
1 with open('./speech.mp3', 'wb') as audio_file:
----> 2 res = tts.synthesize('Hello World!', accept='audio/mp3', voice='en-US_AllisonV3Voice').get_result()
3 audio_file.write(res.content)
c:\users\chris\appdata\local\programs\python\python39\lib\site-packages\ibm_watson\text_to_speech_v1.py in synthesize(self, text, accept, voice, customization_id, **kwargs)
275
276 url = '/v1/synthesize'
--> 277 request = self.prepare_request(method='POST',
278 url=url,
279 headers=headers,
c:\users\chris\appdata\local\programs\python\python39\lib\site-packages\ibm_cloud_sdk_core\base_service.py in prepare_request(self, method, url, headers, params, data, files, **kwargs)
295 request['data'] = data
296
--> 297 self.authenticator.authenticate(request)
298
299 # Next, we need to process the 'files' argument to try to fill in
c:\users\chris\appdata\local\programs\python\python39\lib\site-packages\ibm_cloud_sdk_core\authenticators\iam_authenticator.py in authenticate(self, req)
104 """
105 headers = req.get('headers')
--> 106 bearer_token = self.token_manager.get_token()
107 headers['Authorization'] = 'Bearer {0}'.format(bearer_token)
108
c:\users\chris\appdata\local\programs\python\python39\lib\site-packages\ibm_cloud_sdk_core\jwt_token_manager.py in get_token(self)
77 """
78 if self._is_token_expired():
---> 79 self.paced_request_token()
80
81 if self._token_needs_refresh():
c:\users\chris\appdata\local\programs\python\python39\lib\site-packages\ibm_cloud_sdk_core\jwt_token_manager.py in paced_request_token(self)
122 if not request_active:
123 token_response = self.request_token()
--> 124 self._save_token_info(token_response)
125 self.request_time = 0
126 return
c:\users\chris\appdata\local\programs\python\python39\lib\site-packages\ibm_cloud_sdk_core\jwt_token_manager.py in _save_token_info(self, token_response)
189
190 # The time of expiration is found by decoding the JWT access token
--> 191 decoded_response = jwt.decode(access_token, verify=False)
192 # exp is the time of expire and iat is the time of token retrieval
193 exp = decoded_response.get('exp')
c:\users\chris\appdata\local\programs\python\python39\lib\site-packages\jwt\api_jwt.py in decode(self, jwt, key, algorithms, options, **kwargs)
111 **kwargs,
112 ) -> Dict[str, Any]:
--> 113 decoded = self.decode_complete(jwt, key, algorithms, options, **kwargs)
114 return decoded["payload"]
115
c:\users\chris\appdata\local\programs\python\python39\lib\site-packages\jwt\api_jwt.py in decode_complete(self, jwt, key, algorithms, options, **kwargs)
77
78 if options["verify_signature"] and not algorithms:
---> 79 raise DecodeError(
80 'It is required that you pass in a value for the "algorithms" argument when calling decode().'
81 )
DecodeError: It is required that you pass in a value for the "algorithms" argument when calling decode().
The issue may be with the newest version of the PyJWT package (2.0.0). Use
pip install PyJWT==1.7.1
to downgrade to the previous version, and your project may work now. (This worked for me.)
Yes, there is definitely an issue with PyJWT > 2.0. As mentioned earlier, you need to uninstall the newer version and install a more stable one like 1.7.1. It worked for me.
I was having this problem with PyJWT==2.1.0. My existing code:
import jwt
data = {'some_key': 'your_data'}   # the payload must be a dict (a JSON object)
key = 'your_secret_key'
encoded_data = jwt.encode(data, key)            # worked
decoded_data = jwt.decode(encoded_data, key)    # did not work
I passed the algorithm explicitly and afterwards it worked fine.
Solution:
import jwt
data = {'some_key': 'your_data'}   # payload as a dict
key = 'your_secret_key'
encoded_data = jwt.encode(data, key, algorithm="HS256")
decoded_data = jwt.decode(encoded_data, key, algorithms=["HS256"])
More information is available in the PyJWT docs.
I faced the same error when I was working with speech-to-text using ibm_watson. I solved my issue by installing PyJWT version 1.7.1. To do that, try:
pip install PyJWT==1.7.1
or
python -m pip install PyJWT==1.7.1
Good luck.
For those who want to use the latest version of PyJWT (v2.1.0 as of writing):
If you don't want to stay on the older version (i.e. PyJWT==1.7.1) and want to upgrade for some reason, you need to use the verify_signature parameter and set it to False (it is True by default). In older versions (before 2.0.0) the parameter was called verify and you could pass it directly, but in the newer version it has to go inside the options parameter, which is a dict:
jwt.decode(...., options={"verify_signature": False})
If you do want signature verification, simply pass the algorithms parameter instead:
jwt.decode(...., algorithms=['HS256'])
This is from the official changelog:
Dropped deprecated verify param in jwt.decode(...). Use jwt.decode(encoded, key, options={"verify_signature": False}) instead.
Require explicit algorithms in jwt.decode(...) by default. Example: jwt.decode(encoded, key, algorithms=["HS256"]).
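Putting the two options together, a small self-contained sketch against PyJWT 2.x (the payload, secret, and HS256 choice are just illustrative):
import jwt

payload = {"user": "alice"}   # illustrative payload
key = "your_secret_key"       # illustrative secret

token = jwt.encode(payload, key, algorithm="HS256")

# Option 1: verify the signature and name the algorithm explicitly
claims = jwt.decode(token, key, algorithms=["HS256"])

# Option 2: skip signature verification (PyJWT >= 2.0 syntax)
unverified = jwt.decode(token, options={"verify_signature": False})

print(claims, unverified)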
I had this problem with version 2.3.0 of PyJWT.
I solved it like this:
jwt.decode(token, MY_SECRET, algorithms=['HS256'])
You must use algorithms (plural) instead of algorithm.

Trouble querying public BigQuery data from a local workstation

I am trying to query public data from the BigQuery API (the Ethereum dataset) in my Colab notebook.
I have tried this
from google.colab import auth
auth.authenticate_user()
from google.cloud import bigquery
eth_project_id = 'crypto_ethereum_classic'
client = bigquery.Client(project=eth_project_id)
and receive this error message:
WARNING:google.auth._default:No project ID could be determined. Consider running `gcloud config set project` or setting the GOOGLE_CLOUD_PROJECT environment variable
I have also tried using the BigQueryHelper library and receive a similar error message
from bq_helper import BigQueryHelper
eth_dataset = BigQueryHelper(active_project="bigquery-public-data",dataset_name="crypto_ethereum_classic")
Error:
WARNING:google.auth._default:No project ID could be determined. Consider running `gcloud config set project` or setting the GOOGLE_CLOUD_PROJECT environment variable
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-21-53ac8b2901e1> in <module>()
1 from bq_helper import BigQueryHelper
----> 2 eth_dataset = BigQueryHelper(active_project="bigquery-public-data",dataset_name="crypto_ethereum_classic")
/content/src/bq-helper/bq_helper.py in __init__(self, active_project, dataset_name, max_wait_seconds)
23 self.dataset_name = dataset_name
24 self.max_wait_seconds = max_wait_seconds
---> 25 self.client = bigquery.Client()
26 self.__dataset_ref = self.client.dataset(self.dataset_name, project=self.project_name)
27 self.dataset = None
/usr/local/lib/python3.6/dist-packages/google/cloud/bigquery/client.py in __init__(self, project, credentials, _http, location, default_query_job_config)
140 ):
141 super(Client, self).__init__(
--> 142 project=project, credentials=credentials, _http=_http
143 )
144 self._connection = Connection(self)
/usr/local/lib/python3.6/dist-packages/google/cloud/client.py in __init__(self, project, credentials, _http)
221
222 def __init__(self, project=None, credentials=None, _http=None):
--> 223 _ClientProjectMixin.__init__(self, project=project)
224 Client.__init__(self, credentials=credentials, _http=_http)
/usr/local/lib/python3.6/dist-packages/google/cloud/client.py in __init__(self, project)
176 if project is None:
177 raise EnvironmentError(
--> 178 "Project was not passed and could not be "
179 "determined from the environment."
180 )
OSError: Project was not passed and could not be determined from the environment.
Just to reiterate, I am using Colab. I know how to query the data on Kaggle, but I need to do it in my Colab notebook.
In Colab, you need to authenticate first.
from google.colab import auth
auth.authenticate_user()
That will authenticate your user account to a project.
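A minimal end-to-end sketch of that flow; your-billing-project-id is a placeholder for a GCP project you own (the public data itself lives under bigquery-public-data), and the transactions table name is assumed from the public Ethereum Classic dataset:
from google.colab import auth
from google.cloud import bigquery

auth.authenticate_user()

# The client needs a project you own for billing/quota; the dataset you query
# can still live in another project (bigquery-public-data).
client = bigquery.Client(project='your-billing-project-id')

query = """
    SELECT COUNT(*) AS tx_count
    FROM `bigquery-public-data.crypto_ethereum_classic.transactions`
"""
df = client.query(query).to_dataframe()
print(df)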

ValueError: Invalid endpoint: s3-api.xxxx.objectstorage.service.networklayer.com

I'm trying to access a csv file in my Watson Data Platform catalog. I used the code generation functionality from my DSX notebook: Insert to code > Insert StreamingBody object.
The generated code was:
import os
import types
import pandas as pd
import boto3
def __iter__(self): return 0
# #hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share your notebook.
os.environ['AWS_ACCESS_KEY_ID'] = '******'
os.environ['AWS_SECRET_ACCESS_KEY'] = '******'
endpoint = 's3-api.us-geo.objectstorage.softlayer.net'
bucket = 'catalog-test'
cos_12345 = boto3.resource('s3', endpoint_url=endpoint)
body = cos_12345.Object(bucket,'my.csv').get()['Body']
# add missing __iter__ method so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType(__iter__, body)
df_data_2 = pd.read_csv(body)
df_data_2.head()
When I try to run this code, I get:
/usr/local/src/conda3_runtime.v27/4.1.1/lib/python3.5/site-packages/botocore/endpoint.py in create_endpoint(self, service_model, region_name, endpoint_url, verify, response_parser_factory, timeout, max_pool_connections)
270 if not is_valid_endpoint_url(endpoint_url):
271
--> 272 raise ValueError("Invalid endpoint: %s" % endpoint_url)
273 return Endpoint(
274 endpoint_url,
ValueError: Invalid endpoint: s3-api.us-geo.objectstorage.service.networklayer.com
What is strange is that if I generate the code for SparkSession setup instead, the same endpoint is used but the spark code runs ok.
How can I fix this issue?
I'm presuming the same issue will be encountered for the other SoftLayer endpoints, so I'm listing them here as well to make sure this question also applies to the other SoftLayer locations:
s3-api.us-geo.objectstorage.softlayer.net
s3-api.dal-us-geo.objectstorage.softlayer.net
s3-api.sjc-us-geo.objectstorage.softlayer.net
s3-api.wdc-us-geo.objectstorage.softlayer.net
s3.us-south.objectstorage.softlayer.net
s3.us-east.objectstorage.softlayer.net
s3.eu-geo.objectstorage.softlayer.net
s3.ams-eu-geo.objectstorage.softlayer.net
s3.fra-eu-geo.objectstorage.softlayer.net
s3.mil-eu-geo.objectstorage.softlayer.net
s3.eu-gb.objectstorage.softlayer.net
The solution was to prefix the endpoint with https://, changing this
endpoint = 's3-api.us-geo.objectstorage.softlayer.net'
to
endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
For IBM Cloud Object Storage, it should be import ibm_boto3 rather than import boto3. The original boto3 is for accessing AWS, which uses different authentication. Maybe those two have a different interpretation of the endpoint value.
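A rough sketch of that suggestion, reusing the HMAC-style keys and bucket from the generated code (the credential values and bucket name are placeholders, and the https:// prefix from the accepted answer is still applied):
from io import BytesIO
import ibm_boto3
import pandas as pd

# ibm_boto3 is IBM's fork of boto3, so HMAC keys can be passed the same way.
cos = ibm_boto3.resource(
    's3',
    aws_access_key_id='******',
    aws_secret_access_key='******',
    endpoint_url='https://s3-api.us-geo.objectstorage.softlayer.net',
)

body = cos.Object('catalog-test', 'my.csv').get()['Body']
df_data_2 = pd.read_csv(BytesIO(body.read()))
df_data_2.head()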

Azure service principal and storage.blob with Python

I'm trying to authenticate with a service principal through Python and then access azure.storage.blob.
I used to do it with:
NAME = '****'
KEY = '****'
block_blob_service = BlockBlobService(account_name=NAME, account_key=KEY, protocol='https')
But I can't make it work with a service principal:
TENANT_ID = '****'
CLIENT = '****'
KEY_SERVICE = '****'
credentials = ServicePrincipalCredentials(
    client_id = CLIENT,
    secret = KEY_SERVICE,
    tenant = TENANT_ID
)
I'm a little confused about how to pair those two, and whatever I try just gives me a timeout when I try to upload a blob.
I don't think Azure Storage Service supports service principal credentials. Actually it only accepts two kinds of credentials currently: shared keys and Shared Access Signature (SAS).
Azure Storage works specifically with account name + key (whether primary or secondary). There is no notion of service principal / AD-based access.
Your first example (setting up the blob service endpoint) with account name + key is the correct way to operate.
Note: As Zhaoxing mentioned, you can also use SAS. But from a programmatic standpoint, assuming you are the storage account owner, that doesn't really buy you much.
The only place Service Principals (and AD in general) comes into play is managing the resource itself (e.g. the storage account, from a deployment/management/deletion standpoint).
This is highly confusing.
Once a service principal (SP) is registered in the Azure portal with a new secret created...
...and assigned the Contributor role at the resource group level...
...any storage accounts and containers therein inherit this role.
In your application, you can create ServicePrincipalCredentials() and a ClientSecretCredential() from the registered SP...
service_credential = ServicePrincipalCredentials(
    tenant = '<yourTenantID>',
    client_id = '<yourClientID>',
    secret = '<yourClientSecret>'
)
client_credential = ClientSecretCredential(
    '<yourTenantID>',
    '<yourClientID>',
    '<yourClientSecret>'
)
From here, create a ResourceManagementClient()...
resource_client = ResourceManagementClient(service_credential, subscription_id)
...to list RG's, Resources and Storage Accounts.
for item in resource_client.resource_groups.list():
    print(item.name)
for item in resource_client.resources.list():
    print(item.name + " " + item.type)
for item in resource_client.resources.list_by_resource_group('azureStorage'):
    print(item.name)
BUT... from my research, you cannot list blob containers or blobs within a given container using ResourceManagementClient(). So we move to a BlobServiceClient():
blob_service_client = BlobServiceClient(account_url = url, credential=client_credential)
From here, you can list blob containers...
blob_list = blob_service_client.list_containers()
for blob in blob_list:
    print(blob.name + " " + str(blob.last_modified))
BUT... From my research, you cannot list blobs within a container!!!
container_client = blob_service_client.get_container_client('testcontainer')
blob_list = container_client.list_blobs()
for blob in blob_list:
    print("\t" + blob.name)
---------------------------------------------------------------------------
StorageErrorException Traceback (most recent call last)
~/anaconda3_501/lib/python3.6/site-packages/azure/storage/blob/_models.py in _get_next_cb(self, continuation_token)
599 cls=return_context_and_deserialized,
--> 600 use_location=self.location_mode)
601 except StorageErrorException as error:
~/anaconda3_501/lib/python3.6/site-packages/azure/storage/blob/_generated/operations/_container_operations.py in list_blob_flat_segment(self, prefix, marker, maxresults, include, timeout, request_id, cls, **kwargs)
1142 map_error(status_code=response.status_code, response=response, error_map=error_map)
-> 1143 raise models.StorageErrorException(response, self._deserialize)
1144
StorageErrorException: Operation returned an invalid status 'This request is not authorized to perform this operation using this permission.'
During handling of the above exception, another exception occurred:
HttpResponseError Traceback (most recent call last)
<ipython-input-104-7517e7a6a19f> in <module>
1 container_client = blob_service_client.get_container_client('testcontainer')
2 blob_list = container_client.list_blobs()
----> 3 for blob in blob_list:
4 print("\t" + blob.name)
~/anaconda3_501/lib/python3.6/site-packages/azure/core/paging.py in __next__(self)
120 if self._page_iterator is None:
121 self._page_iterator = itertools.chain.from_iterable(self.by_page())
--> 122 return next(self._page_iterator)
123
124 next = __next__ # Python 2 compatibility.
~/anaconda3_501/lib/python3.6/site-packages/azure/core/paging.py in __next__(self)
72 raise StopIteration("End of paging")
73
---> 74 self._response = self._get_next(self.continuation_token)
75 self._did_a_call_already = True
76
~/anaconda3_501/lib/python3.6/site-packages/azure/storage/blob/_models.py in _get_next_cb(self, continuation_token)
600 use_location=self.location_mode)
601 except StorageErrorException as error:
--> 602 process_storage_error(error)
603
604 def _extract_data_cb(self, get_next_return):
~/anaconda3_501/lib/python3.6/site-packages/azure/storage/blob/_shared/response_handlers.py in process_storage_error(storage_error)
145 error.error_code = error_code
146 error.additional_info = additional_data
--> 147 raise error
148
149
HttpResponseError: This request is not authorized to perform this operation using this permission.
RequestId:e056fe39-b01e-0007-425c-20a63f000000
Time:2020-05-02T08:32:02.2204809Z
ErrorCode:AuthorizationPermissionMismatch
Error:None
The only way I've found to list blobs within a container (and to do other things like copy blobs, etc.) is to create the BlobServiceClient using a connection string rather than tenant ID, client ID, and client secret (see the sketch below).
There is some more info about using a token to access blob resources here, here and here, but I haven't been able to test it yet.
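A minimal sketch of the connection-string route described above; the connection string (copied from the storage account's Access keys blade) and the container name testcontainer are placeholders:
from azure.storage.blob import BlobServiceClient

# Placeholder: paste the connection string from the storage account's Access keys.
conn_str = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"

blob_service_client = BlobServiceClient.from_connection_string(conn_str)
container_client = blob_service_client.get_container_client('testcontainer')

for blob in container_client.list_blobs():
    print("\t" + blob.name)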
