NoCredentialsError: Unable to locate credentials in Hugging Face Library - python

So I don't use Hugging Face a lot for my AI, but I discovered that you can train your AI with it, so I tried to use my machine to train it, and I kept getting this error:
PS C:\Users\gboss\OneDrive\Bureau\Ai training> & C:/Users/gboss/AppData/Local/Programs/Python/Python310/python.exe "c:/Users/gboss/OneDrive/Bureau/Ai training/AiTraining.py"
Traceback (most recent call last):
File "c:\Users\gboss\OneDrive\Bureau\Ai training\AiTraining.py", line 8, in <module>
role = iam_client.get_role(RoleName='{IAM_ROLE_WITH_SAGEMAKER_PERMISSIONS}')['Role']['Arn']
File "C:\Users\gboss\AppData\Local\Programs\Python\Python310\lib\site-packages\botocore\client.py", line 514, in _api_call
return self._make_api_call(operation_name, kwargs)
File "C:\Users\gboss\AppData\Local\Programs\Python\Python310\lib\site-packages\botocore\client.py", line 921, in _make_api_call
http, parsed_response = self._make_request(
File "C:\Users\gboss\AppData\Local\Programs\Python\Python310\lib\site-packages\botocore\client.py", line 944, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "C:\Users\gboss\AppData\Local\Programs\Python\Python310\lib\site-packages\botocore\endpoint.py", line 119, in make_request
return self._send_request(request_dict, operation_model)
File "C:\Users\gboss\AppData\Local\Programs\Python\Python310\lib\site-packages\botocore\endpoint.py", line 198, in _send_request
request = self.create_request(request_dict, operation_model)
File "C:\Users\gboss\AppData\Local\Programs\Python\Python310\lib\site-packages\botocore\endpoint.py", line 134, in create_request
self._event_emitter.emit(
File "C:\Users\gboss\AppData\Local\Programs\Python\Python310\lib\site-packages\botocore\hooks.py", line 412, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "C:\Users\gboss\AppData\Local\Programs\Python\Python310\lib\site-packages\botocore\hooks.py", line 256, in emit
return self._emit(event_name, kwargs)
File "C:\Users\gboss\AppData\Local\Programs\Python\Python310\lib\site-packages\botocore\hooks.py", line 239, in _emit
response = handler(**kwargs)
File "C:\Users\gboss\AppData\Local\Programs\Python\Python310\lib\site-packages\botocore\signers.py", line 105, in handler
return self.sign(operation_name, request)
File "C:\Users\gboss\AppData\Local\Programs\Python\Python310\lib\site-packages\botocore\signers.py", line 189, in sign
auth.add_auth(request)
File "C:\Users\gboss\AppData\Local\Programs\Python\Python310\lib\site-packages\botocore\auth.py", line 418, in add_auth
raise NoCredentialsError()
botocore.exceptions.NoCredentialsError: Unable to locate credentials
I can't figure out why, because I don't use Hugging Face much.
The code:
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFace

# gets role for executing training job
iam_client = boto3.client('iam')
role = iam_client.get_role(RoleName='{IAM_ROLE_WITH_SAGEMAKER_PERMISSIONS}')['Role']['Arn']

hyperparameters = {
    'model_name_or_path': 'ZipperXYZ/DialoGPT-medium-TheWorldMachineExpressive2',
    'output_dir': '/opt/ml/model'
    # add your remaining hyperparameters
    # more info here https://github.com/huggingface/transformers/tree/v4.17.0/examples/pytorch/language-modeling
}

# git configuration to download our fine-tuning script
git_config = {'repo': 'https://github.com/huggingface/transformers.git', 'branch': 'v4.17.0'}

# creates Hugging Face estimator
huggingface_estimator = HuggingFace(
    entry_point='run_clm.py',
    source_dir='./examples/pytorch/language-modeling',
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    role=role,
    git_config=git_config,
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',
    hyperparameters=hyperparameters
)

# starting the train job
huggingface_estimator.fit()

You would first need to configure your credentials; it's not an error in your code. Follow this thread.
TLDR:
NoCredentialsError: Unable to locate credentials
Reply: Yes, let me explain this; it is a bit complicated to get started. The IAM role you created will be used inside the SageMaker training job/inference, meaning this role is used, e.g., to download your data from S3, and is needed to start the underlying machine.
The error NoCredentialsError: Unable to locate credentials is shown when you don't have credentials configured on your machine. You need to run aws configure to set up IAM credentials (a user) on your machine. You need these credentials to start your training job.
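For example, once credentials exist you can point boto3 at them explicitly. This is a minimal sketch (not from the answer above), assuming you created a named profile with aws configure; the profile name is a placeholder:
import boto3

# Load credentials from a named profile in ~/.aws/credentials
# ('default' is a placeholder; use whichever profile you configured).
session = boto3.Session(profile_name='default')
iam_client = session.client('iam')
role = iam_client.get_role(RoleName='{IAM_ROLE_WITH_SAGEMAKER_PERMISSIONS}')['Role']['Arn']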

Related

Azure ML: Upload File to Step Run's Output - Authentication Error

During a PythonScriptStep in an Azure ML Pipeline, I'm saving a model as joblib pickle dump to a directory in a Blob Container in the Azure Blob Storage which I've created during the setup of the Azure ML Workspace. Afterwards I'm trying to upload this model file to the step run's output directory using
Run.upload_file (name, path_or_stream)
(for the function's documentation, see https://learn.microsoft.com/en-us/python/api/azureml-core/azureml.core.run(class)?view=azure-ml-py#upload-file-name--path-or-stream--datastore-name-none-)
Some time ago, when I created the script using azureml-sdk version 1.18.0, everything worked fine. Now I've updated the script's functionality and upgraded the azureml-sdk to version 1.33.0 in the process, and the upload function runs into the following error:
Traceback (most recent call last):
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_file_utils/upload.py", line 64, in upload_blob_from_stream
validate_content=True)
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_restclient/clientbase.py", line 93, in execute_func_with_reset
return ClientBase._execute_func_internal(backoff, retries, module_logger, func, reset_func, *args, **kwargs)
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_restclient/clientbase.py", line 367, in _execute_func_internal
left_retry = cls._handle_retry(back_off, left_retry, total_retry, error, logger, func)
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_restclient/clientbase.py", line 399, in _handle_retry
raise error
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_restclient/clientbase.py", line 358, in _execute_func_internal
response = func(*args, **kwargs)
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_vendor/azure_storage/blob/blockblobservice.py", line 614, in create_blob_from_stream
initialization_vector=iv
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_vendor/azure_storage/blob/_upload_chunking.py", line 98, in _upload_blob_chunks
range_ids = [f.result() for f in futures]
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_vendor/azure_storage/blob/_upload_chunking.py", line 98, in <listcomp>
range_ids = [f.result() for f in futures]
File "/opt/miniconda/lib/python3.7/concurrent/futures/_base.py", line 435, in result
return self.__get_result()
File "/opt/miniconda/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/opt/miniconda/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_vendor/azure_storage/blob/_upload_chunking.py", line 210, in process_chunk
return self._upload_chunk_with_progress(chunk_offset, chunk_bytes)
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_vendor/azure_storage/blob/_upload_chunking.py", line 224, in _upload_chunk_with_progress
range_id = self._upload_chunk(chunk_offset, chunk_data)
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_vendor/azure_storage/blob/_upload_chunking.py", line 269, in _upload_chunk
timeout=self.timeout,
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_vendor/azure_storage/blob/blockblobservice.py", line 1013, in _put_block
self._perform_request(request)
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_vendor/azure_storage/common/storageclient.py", line 432, in _perform_request
raise ex
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_vendor/azure_storage/common/storageclient.py", line 357, in _perform_request
raise ex
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_vendor/azure_storage/common/storageclient.py", line 343, in _perform_request
HTTPError(response.status, response.message, response.headers, response.body))
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_vendor/azure_storage/common/_error.py", line 115, in _http_error_handler
raise ex
azure.common.AzureHttpError: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. ErrorCode: AuthenticationFailed
<?xml version="1.0" encoding="utf-8"?><Error><Code>AuthenticationFailed</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:5d4e1b7e-c01e-0070-0d47-9bf8a0000000
Time:2021-08-27T13:30:02.2685991Z</Message><AuthenticationErrorDetail>Signature did not match. String to sign used was rcw
2021-08-27T13:19:56Z
2021-08-28T13:29:56Z
/blob/mystorage/azureml/ExperimentRun/dcid.98d11a7b-2aac-4bc0-bd64-bb4d72e0e0be/outputs/models/Model.pkl
2019-07-07
b
</AuthenticationErrorDetail></Error>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/mnt/batch/tasks/shared/LS_root/jobs/.../azureml-setup/context_manager_injector.py", line 243, in execute_with_context
runpy.run_path(sys.argv[0], globals(), run_name="__main__")
File "/opt/miniconda/lib/python3.7/runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "/opt/miniconda/lib/python3.7/runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "/opt/miniconda/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "401_AML_Pipeline_Time_Series_Model_Training_Azure_ML_CPU.py", line 318, in <module>
main()
File "401_AML_Pipeline_Time_Series_Model_Training_Azure_ML_CPU.py", line 286, in main
path_or_stream=model_path)
File "/opt/miniconda/lib/python3.7/site-packages/azureml/core/run.py", line 53, in wrapped
return func(self, *args, **kwargs)
File "/opt/miniconda/lib/python3.7/site-packages/azureml/core/run.py", line 1989, in upload_file
datastore_name=datastore_name)
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_restclient/artifacts_client.py", line 114, in upload_artifact
return self.upload_artifact_from_path(artifact, *args, **kwargs)
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_restclient/artifacts_client.py", line 107, in upload_artifact_from_path
return self.upload_artifact_from_stream(stream, *args, **kwargs)
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_restclient/artifacts_client.py", line 99, in upload_artifact_from_stream
content_type=content_type, session=session)
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_restclient/artifacts_client.py", line 88, in upload_stream_to_existing_artifact
timeout=TIMEOUT, backoff=BACKOFF_START, retries=RETRY_LIMIT)
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_file_utils/upload.py", line 71, in upload_blob_from_stream
raise AzureMLException._with_error(azureml_error, inner_exception=e)
azureml._common.exceptions.AzureMLException: AzureMLException:
Message: Encountered authorization error while uploading to blob storage. Please check the storage account attached to your workspace. Make sure that the current user is authorized to access the storage account and that the request is not blocked by a firewall, virtual network, or other security setting.
StorageAccount: mystorage
ContainerName: azureml
StatusCode: 403
InnerException Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. ErrorCode: AuthenticationFailed
<?xml version="1.0" encoding="utf-8"?><Error><Code>AuthenticationFailed</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:5d4e1b7e-c01e-0070-0d47-9bf8a0000000
Time:2021-08-27T13:30:02.2685991Z</Message><AuthenticationErrorDetail>Signature did not match. String to sign used was rcw
2021-08-27T13:19:56Z
2021-08-28T13:29:56Z
/blob/mystorage/azureml/ExperimentRun/dcid.98d11a7b-2aac-4bc0-bd64-bb4d72e0e0be/outputs/models/Model.pkl
2019-07-07
b
</AuthenticationErrorDetail></Error>
ErrorResponse
{
"error": {
"code": "UserError",
"message": "Encountered authorization error while uploading to blob storage. Please check the storage account attached to your workspace. Make sure that the current user is authorized to access the storage account and that the request is not blocked by a firewall, virtual network, or other security setting.\n\tStorageAccount: mystorage\n\tContainerName: azureml\n\tStatusCode: 403",
"inner_error": {
"code": "Auth",
"inner_error": {
"code": "Authorization"
}
}
}
}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "401_AML_Pipeline_Time_Series_Model_Training_Azure_ML_CPU.py", line 318, in <module>
main()
File "401_AML_Pipeline_Time_Series_Model_Training_Azure_ML_CPU.py", line 286, in main
path_or_stream=model_path)
File "/opt/miniconda/lib/python3.7/site-packages/azureml/core/run.py", line 53, in wrapped
return func(self, *args, **kwargs)
File "/opt/miniconda/lib/python3.7/site-packages/azureml/core/run.py", line 1989, in upload_file
datastore_name=datastore_name)
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_restclient/artifacts_client.py", line 114, in upload_artifact
return self.upload_artifact_from_path(artifact, *args, **kwargs)
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_restclient/artifacts_client.py", line 107, in upload_artifact_from_path
return self.upload_artifact_from_stream(stream, *args, **kwargs)
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_restclient/artifacts_client.py", line 99, in upload_artifact_from_stream
content_type=content_type, session=session)
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_restclient/artifacts_client.py", line 88, in upload_stream_to_existing_artifact
timeout=TIMEOUT, backoff=BACKOFF_START, retries=RETRY_LIMIT)
File "/opt/miniconda/lib/python3.7/site-packages/azureml/_file_utils/upload.py", line 71, in upload_blob_from_stream
raise AzureMLException._with_error(azureml_error, inner_exception=e)
UserScriptException: UserScriptException:
Message: Encountered authorization error while uploading to blob storage. Please check the storage account attached to your workspace. Make sure that the current user is authorized to access the storage account and that the request is not blocked by a firewall, virtual network, or other security setting.
StorageAccount: mystorage
ContainerName: azureml
StatusCode: 403
InnerException AzureMLException:
Message: Encountered authorization error while uploading to blob storage. Please check the storage account attached to your workspace. Make sure that the current user is authorized to access the storage account and that the request is not blocked by a firewall, virtual network, or other security setting.
StorageAccount: mystorage
ContainerName: azureml
StatusCode: 403
InnerException Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. ErrorCode: AuthenticationFailed
<?xml version="1.0" encoding="utf-8"?><Error><Code>AuthenticationFailed</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:5d4e1b7e-c01e-0070-0d47-9bf8a0000000
Time:2021-08-27T13:30:02.2685991Z</Message><AuthenticationErrorDetail>Signature did not match. String to sign used was rcw
2021-08-27T13:19:56Z
2021-08-28T13:29:56Z
/blob/mystorage/azureml/ExperimentRun/dcid.98d11a7b-2aac-4bc0-bd64-bb4d72e0e0be/outputs/models/Model.pkl
2019-07-07
b
</AuthenticationErrorDetail></Error>
ErrorResponse
{
"error": {
"code": "UserError",
"message": "Encountered authorization error while uploading to blob storage. Please check the storage account attached to your workspace. Make sure that the current user is authorized to access the storage account and that the request is not blocked by a firewall, virtual network, or other security setting.\n\tStorageAccount: verovisionstorage\n\tContainerName: azureml\n\tStatusCode: 403",
"inner_error": {
"code": "Auth",
"inner_error": {
"code": "Authorization"
}
}
}
}
ErrorResponse
{
"error": {
"code": "UserError",
"message": "Encountered authorization error while uploading to blob storage. Please check the storage account attached to your workspace. Make sure that the current user is authorized to access the storage account and that the request is not blocked by a firewall, virtual network, or other security setting.\n\tStorageAccount: mystorage\n\tContainerName: azureml\n\tStatusCode: 403"
}
}
As far as I can tell from the code of the azureml.core.Run class and the subsequent function calls, the Run object tries to upload the file to the step run's output directory using SAS-Token-Authentication (which fails). This documentation article is linked in the code (but I don't know if this relates to the issue): https://learn.microsoft.com/en-us/rest/api/storageservices/create-service-sas#service-sas-example
Did anybody encounter this error as well and knows what causes it or how it can be resolved?
Best,
Jonas
We've seen this before; it's annoying. I think the answer is to go to the datastores page of the AML Studio UI and manually enter the storage account key again.
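If you prefer to do that from the SDK instead of the Studio UI, a hedged sketch of re-registering the workspace blob datastore with the account key might look like this (the datastore, container, and account names and the key are placeholders):
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()

# Re-register the workspace's blob datastore with the storage account key
# so uploads from runs are authorized again (names and key are placeholders).
Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name='workspaceblobstore',
    container_name='azureml',
    account_name='mystorage',
    account_key='<storage-account-key>',
    overwrite=True,
)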

Error trying to connect Celery through SQS using STS

I'm trying to use Celery with SQS as the broker. In order to use SQS from my container I need to assume a role, and for that I'm using STS. My code looks like this:
import os

import boto3
from celery import Celery

role_info = {
    'RoleArn': 'arn:aws:iam::xxxxxxx:role/my-role-execution',
    'RoleSessionName': 'roleExecution'
}

# Assume the role and export the temporary credentials to the environment.
sts_client = boto3.client('sts', region_name='eu-central-1')
credentials = sts_client.assume_role(**role_info)

aws_access_key_id = credentials["Credentials"]['AccessKeyId']
aws_secret_access_key = credentials["Credentials"]['SecretAccessKey']
aws_session_token = credentials["Credentials"]["SessionToken"]

os.environ["AWS_ACCESS_KEY_ID"] = aws_access_key_id
os.environ["AWS_SECRET_ACCESS_KEY"] = aws_secret_access_key
os.environ["AWS_DEFAULT_REGION"] = 'eu-central-1'
os.environ["AWS_SESSION_TOKEN"] = aws_session_token

broker = "sqs://"
backend = 'redis://redis-service:6379/0'

celery = Celery('tasks', broker=broker, backend=backend)
celery.conf["task_default_queue"] = 'my-queue'
celery.conf["broker_transport_options"] = {
    'region': 'eu-central-1',
    'predefined_queues': {
        'my-queue': {
            'url': 'https://sqs.eu-central-1.amazonaws.com/xxxxxxx/my-queue'
        }
    }
}
In the same file I have the following task:
@celery.task(name='my-queue.my_task')
def my_task(content) -> int:
    print("hello")
    return 0
When I execute the following code I get an error:
[2020-09-24 10:38:03,602: CRITICAL/MainProcess] Unrecoverable error: ClientError('An error occurred (AccessDenied) when calling the ListQueues operation: Access to the resource https://eu-central-1.queue.amazonaws.com/ is denied.',)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/kombu/transport/virtual/base.py", line 921, in create_channel
return self._avail_channels.pop()
IndexError: pop from empty list
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/celery/worker/worker.py", line 208, in start
self.blueprint.start(self)
File "/usr/local/lib/python3.6/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/usr/local/lib/python3.6/site-packages/celery/bootsteps.py", line 369, in start
return self.obj.start()
File "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 318, in start
blueprint.start(self)
File "/usr/local/lib/python3.6/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/connection.py", line 23, in start
c.connection = c.connect()
File "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 405, in connect
conn = self.connection_for_read(heartbeat=self.amqheartbeat)
File "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 412, in connection_for_read
self.app.connection_for_read(heartbeat=heartbeat))
File "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 439, in ensure_connected
callback=maybe_shutdown,
File "/usr/local/lib/python3.6/site-packages/kombu/connection.py", line 422, in ensure_connection
callback, timeout=timeout)
File "/usr/local/lib/python3.6/site-packages/kombu/utils/functional.py", line 341, in retry_over_time
return fun(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/kombu/connection.py", line 275, in connect
return self.connection
File "/usr/local/lib/python3.6/site-packages/kombu/connection.py", line 823, in connection
self._connection = self._establish_connection()
File "/usr/local/lib/python3.6/site-packages/kombu/connection.py", line 778, in _establish_connection
conn = self.transport.establish_connection()
File "/usr/local/lib/python3.6/site-packages/kombu/transport/virtual/base.py", line 941, in establish_connection
self._avail_channels.append(self.create_channel(self))
File "/usr/local/lib/python3.6/site-packages/kombu/transport/virtual/base.py", line 923, in create_channel
channel = self.Channel(connection)
File "/usr/local/lib/python3.6/site-packages/kombu/transport/SQS.py", line 100, in __init__
self._update_queue_cache(self.queue_name_prefix)
File "/usr/local/lib/python3.6/site-packages/kombu/transport/SQS.py", line 105, in _update_queue_cache
resp = self.sqs.list_queues(QueueNamePrefix=queue_name_prefix)
File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 337, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 656, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListQueues operation: Access to the resource https://eu-central-1.queue.amazonaws.com/ is denied.
If I use boto3 directly without Celery, I'm able to connect to the queue and retrieve data without this error. I don't know why Celery/Kombu try to list queues when I specify the predefined_queues configuration, which is meant to avoid this behavior (from the docs):
If you want Celery to use a set of predefined queues in AWS, and to never attempt to list SQS queues, nor attempt to create or delete them, pass a map of queue names to URLs using the predefined_queue_urls setting
Source here
Does anyone know what is happening? How should I modify my code to make it work? It seems that Celery is not using the credentials at all.
The versions I'm using:
celery==4.4.7
boto3==1.14.54
kombu==4.5.0
Thanks!
PS: I created an issue on GitHub to track whether this might be a library error.
I solved the problem by updating the dependencies to the latest versions:
celery==5.0.0
boto3==1.14.54
kombu==5.0.2
pycurl==7.43.0.6
I was able to get celery==4.4.7 and kombu==4.6.11 working by setting the following configuration option:
celery.conf["task_create_missing_queues"] = False

client.get_bucket() fails, but only from cloud dataflow (compute engine)

The application has been working normally; now, on a re-deploy, Google Storage is giving strange errors.
MissingSchema: Invalid URL 'None/storage/v1/b/my-bucket-name?projection=noAcl': No schema supplied. Perhaps you meant http://None/storage/v1/b/my-bucket-name?projection=noAcl?
File "/usr/local/lib/python2.7/dist-packages/lib/file_store.py", line 11, in __init__
self.bucket = self.client.get_bucket(parts[0])
File "/usr/local/lib/python2.7/dist-packages/google/cloud/storage/client.py", line 301, in get_bucket
bucket.reload(client=self)
File "/usr/local/lib/python2.7/dist-packages/google/cloud/storage/_helpers.py", line 130, in reload
_target_object=self,
File "/usr/local/lib/python2.7/dist-packages/google/cloud/_http.py", line 392, in api_request
target_object=_target_object,
File "/usr/local/lib/python2.7/dist-packages/google/cloud/_http.py", line 269, in _make_request
return self._do_request(method, url, headers, data, target_object)
File "/usr/local/lib/python2.7/dist-packages/google/cloud/_http.py", line 298, in _do_request
return self.http.request(url=url, method=method, headers=headers, data=data)
File "/usr/local/lib/python2.7/dist-packages/google/auth/transport/requests.py", line 208, in request
method, url, data=data, headers=request_headers, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 519, in request
prep = self.prepare_request(req)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 462, in prepare_request
hooks=merge_hooks(request.hooks, self.hooks),
File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 313, in prepare
self.prepare_url(url, params)
File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 387, in prepare_url
raise MissingSchema(error)
MissingSchema: Invalid URL 'None/storage/v1/b/my-bucket-name?projection=noAcl': No schema supplied. Perhaps you meant http://None/storage/v1/b/my-bucket-name?projection=noAcl? [while running 'generatedPtransform-51']
The code causing the error is below. I can run this locally using the same service account and it works, no error. I am using $env:GOOGLE_APPLICATION_CREDENTIALS to export my service account credentials at deploy time. All other services are working normally.
# My test is:
# fs = FileStore("gs://my-bucket-name/models/", "development", "general")

class FileStore():
    # modelPath - must be a gs:// style google storage resource path containing everything but the file extension
    def __init__(self, modelPath, env, modelName):
        from google.cloud import storage
        parts = modelPath[5:].split('/', 1)
        self.client = storage.Client()
        self.bucket = self.client.get_bucket(parts[0])  # <- error here
Why would the google-cloud-core client fail to build a URL? Based on 'None/storage/v1/b/my-bucket-name?projection=noAcl', the missing part of the URL should be something like "https://www.googleapis.com".
This error is apparently caused by a version mismatch between google-cloud-storage and google-cloud-core. I had specified google-cloud-core >= 1.0.3 in my setup.py, but when I looked at the Docker image on the Compute Engine VM I found it had an earlier version.
After rebuilding my venv from setup.py, I also had to run:
C:\Python27\python.exe -m pipenv install google-cloud-core>=1.0.3 --skip-lock
Then I was able to deploy and the application started working again.
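As a quick sanity check (my addition, not part of the original answer), you can print the versions actually installed on the worker image so a mismatch like this shows up in the job logs:
import pkg_resources

# Print the versions that actually got installed in the environment.
print(pkg_resources.get_distribution("google-cloud-core").version)
print(pkg_resources.get_distribution("google-cloud-storage").version)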

Using Google BigQuery through a Python script

I want to perform some very simple tasks on BigQuery via a Python script. I found this package, which does not work well. Indeed, when I try this code:
from bigquery import get_client
project_id = 'txxxxxxxxxxxxxxxxxx9'
# Service account email address as listed in the Google Developers Console.
service_account = '7xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com'
# PKCS12 or PEM key provided by Google.
key = '/home/fxxxxxxxxxxxx/Dropbox/access_keys/google_storage/xxxxxxxxxxxxxxxxxxxxx.pem'
client = get_client(project_id, service_account=service_account, private_key_file=key, readonly=True)
# Submit an async query.
results = client.get_table_schema('newdataset', 'newtable2')
print(results)
I get this error:
/home/xxxxxx/anaconda3/envs/snakes/bin/python2.7 /home/xxxxxx/Dropbox/Prog/bigQuery_daily_import/src/main.py
Traceback (most recent call last):
File "/home/xxxxxx/Dropbox/Prog/bigQuery_daily_import/src/main.py", line 9, in <module>
client = get_client(project_id, service_account=service_account, private_key_file=key, readonly=True)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/bigquery/client.py", line 83, in get_client
readonly=readonly)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/bigquery/client.py", line 101, in _get_bq_service
service = build('bigquery', 'v2', http=http)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/util.py", line 142, in positional_wrapper
return wrapped(*args, **kwargs)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/googleapiclient/discovery.py", line 196, in build
cache)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/googleapiclient/discovery.py", line 242, in _retrieve_discovery_doc
resp, content = http.request(actual_url)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 565, in new_request
self._refresh(request_orig)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 835, in _refresh
self._do_refresh_request(http_request)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 862, in _do_refresh_request
body = self._generate_refresh_request_body()
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 1541, in _generate_refresh_request_body
assertion = self._generate_assertion()
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 1670, in _generate_assertion
private_key, self.private_key_password), payload)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/_pycrypto_crypt.py", line 121, in from_string
pkey = RSA.importKey(parsed_pem_key)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/Crypto/PublicKey/RSA.py", line 665, in importKey
return self._importKeyDER(der)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/Crypto/PublicKey/RSA.py", line 588, in _importKeyDER
raise ValueError("RSA key format is not supported")
ValueError: RSA key format is not supported
Process finished with exit code 1
My question: is there a Python tutorial which shows how to communicate easily with BigQuery: importing a dataset from Google Storage or S3, querying something, and exporting the result to Google Storage?
A lot depends on your environment, and once you've figured that out, everything should be super simple. The only problem I see in the error log you pasted is authentication.
Python pandas has had support for BigQuery for a while:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.gbq.read_gbq.html
And I did a video with the creators of the module:
https://www.youtube.com/watch?v=gLeTDUMb7HY
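As a rough sketch of that route (the project ID is a placeholder, and this assumes a recent pandas with the pandas-gbq backend installed):
import pandas as pd

# Run a query against a public dataset and load the result into a DataFrame.
df = pd.read_gbq(
    "SELECT name, SUM(number) AS total "
    "FROM `bigquery-public-data.usa_names.usa_1910_2013` "
    "GROUP BY name ORDER BY total DESC LIMIT 10",
    project_id="my-project-id",
)
print(df.head())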
Now, the simplest and fastest way these days to launch a Jupyter notebook with all of the Google Cloud goodies you mention is our new Google Datalab project:
https://cloud.google.com/datalab/
The only Datalab caveat is that it works on cloud servers, but if you want a fully managed Jupyter/IPython environment, totally secure, persistent, and ready to handle BigQuery, storage, etc... try it out.
Meanwhile, if you are writing a web application, look at how other web applications solve this task.
For example, re:dash code to connect to BigQuery:
https://github.com/EverythingMe/redash/blob/master/redash/query_runner/big_query.py
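As another option not mentioned in this answer, the newer google-cloud-bigquery client accepts a service-account JSON key directly, which sidesteps the PEM/PKCS12 handling that failed above; the key path and query here are placeholders:
from google.cloud import bigquery

# Authenticate with a service-account JSON key file and run a trivial query.
client = bigquery.Client.from_service_account_json("/path/to/service-account.json")
for row in client.query("SELECT 1 AS x").result():
    print(row.x)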

Using a service account with boto to access Cloud Storage in GAE with gcs_oauth2_boto_plugin

I am wondering if anyone knows a way to use a service account to authenticate if I want to access data in Cloud Storage by:
1. Using boto library (and gcs_oauth2_boto_plugin)
2. Running in Google App Engine (GAE)
Following https://developers.google.com/storage/docs/gspythonlibrary I am using boto and gcs_oauth2_boto_plugin to authenticate and perform actions against Cloud Storage (upload/download files). I am using a service account to authenticate so that we don't have to authenticate with a Google account periodically (the thought being that if we run this in GCE, it'll run with the GCE service account -- haven't actually done that yet). Locally, I've set up my boto config file to use the service account and point to a p12 key file. This runs fine locally.
I would like to use the same code to interact with Cloud Storage from within Google App Engine (GAE). We are running a lightweight ETL process that transforms and loads the data into BigQuery. We want to run this code in an App Engine task queue (the task is triggered by an Object Change Notification from Cloud Storage).
Since we're currently relying on the boto config (~/.boto), I adapted http://thurloat.com/2010/06/07/google-storage-and-app-engine to put the relevant config items for a service account.
When I finally run the code from App Engine (dev_appserver.py), I get the below stack trace:
Traceback (most recent call last):
File "/home/some-user/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.1/webapp2.py", line 1536, in __call__
rv = self.handle_exception(request, response, e)
File "/home/some-user/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.1/webapp2.py", line 1530, in __call__
rv = self.router.dispatch(request, response)
File "/home/some-user/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.1/webapp2.py", line 1278, in default_dispatcher
return route.handler_adapter(request, response)
File "/home/some-user/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.1/webapp2.py", line 1102, in __call__
return handler.dispatch()
File "/home/some-user/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.1/webapp2.py", line 572, in dispatch
return self.handle_exception(e, self.app.debug)
File "/home/some-user/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.1/webapp2.py", line 570, in dispatch
return method(*args, **kwargs)
File "/home/some-user/dev/myApp/main.py", line 247, in post
gs.download(fname, fp)
File "/home/some-user/dev/myApp/cloudstorage.py", line 107, in download
bytes = src_uri.get_key().get_contents_to_file(fp)
File "/home/some-user/dev/myApp/boto/storage_uri.py", line 336, in get_key
bucket = self.get_bucket(validate, headers)
File "/home/some-user/dev/myApp/boto/storage_uri.py", line 181, in get_bucket
conn = self.connect()
File "/home/some-user/dev/myApp/boto/storage_uri.py", line 140, in connect
**connection_args)
File "/home/some-user/dev/myApp/boto/gs/connection.py", line 47, in __init__
suppress_consec_slashes=suppress_consec_slashes)
File "/home/some-user/dev/myApp/boto/s3/connection.py", line 190, in __init__
validate_certs=validate_certs, profile_name=profile_name)
File "/home/some-user/dev/myApp/boto/connection.py", line 568, in __init__
host, config, self.provider, self._required_auth_capability())
File "/home/some-user/dev/myApp/boto/auth.py", line 929, in get_auth_handler
ready_handlers.append(handler(host, config, provider))
File "/home/some-user/dev/myApp/gcs_oauth2_boto_plugin/oauth2_plugin.py", line 56, in __init__
cred_type=oauth2_client.CredTypes.OAUTH2_SERVICE_ACCOUNT)
File "/home/some-user/dev/myApp/gcs_oauth2_boto_plugin/oauth2_helper.py", line 48, in OAuth2ClientFromBotoConfig
token_cache = oauth2_client.FileSystemTokenCache()
File "/home/some-user/dev/myApp/gcs_oauth2_boto_plugin/oauth2_client.py", line 175, in __init__
tempfile.gettempdir(), 'oauth2_client-tokencache.%(uid)s.%(key)s')
File "/home/some-user/google-cloud-sdk/platform/google_appengine/google/appengine/dist/tempfile.py", line 61, in PlaceHolder
raise NotImplementedError("Only tempfile.TemporaryFile is available for use")
NotImplementedError: Only tempfile.TemporaryFile is available for use
Looks like the problem is just with gcs_oauth2_boto_plugin trying to use a temporary directory when caching the oauth credentials (App Engine only supports tempfile.TemporaryFile).
Rather than try and patch gcs_oauth2_boto_plugin, is there potentially another solution? Can we use a service account with gcs_oauth2_boto_plugin/boto on App Engine to access Cloud Storage resources?
Or, am I using the wrong authentication method here?
This doesn't quite answer the question directly, but instead of using boto and gcs_oauth2_boto_plugin, I am using the "Google Cloud Storage Python Client Library", GoogleAppEngineCloudStorageClient, from pip.
https://developers.google.com/appengine/docs/python/googlecloudstorageclient/
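A minimal sketch of how that library is used (the bucket and object paths are placeholders; on App Engine it authenticates as the app's default service account):
import cloudstorage as gcs

filename = '/my-bucket-name/some-folder/data.txt'  # hypothetical object path

# Write an object to Cloud Storage.
with gcs.open(filename, 'w', content_type='text/plain') as fh:
    fh.write('hello')

# Read it back.
with gcs.open(filename, 'r') as fh:
    data = fh.read()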
