I'm trying to connect to Google Cloud Storage following these instructions:
https://googleapis.github.io/google-cloud-python/latest/storage/index.html
However, I keep getting an error that the storage module has no Client attribute.
from google.cloud import storage
# Instantiates a client
storage_client = storage.Client(credentials=creds, project='name')
# The name for the new bucket
bucket_name = 'my-new-bucket'
# Creates the new bucket
bucket = storage_client.create_bucket(bucket_name)
print('Bucket {} created.'.format(bucket.name))
This is a problem I've seen several times, and it happens in other google.cloud modules as well. Most of the time it is related to a broken installation.
Try uninstalling and then reinstalling the google.cloud packages. If that doesn't help, try using them in a newly created virtual environment (this will work for sure).
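As a quick sanity check (a sketch, not part of the original answer), you can confirm which installation of the package Python is actually importing and whether it exposes Client:
import google.cloud.storage as storage

print(storage.__file__)            # path of the package actually being imported
print(hasattr(storage, 'Client'))  # should print True on a healthy install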
Related git issue with same solution
I want to access string data from a blob with a Python function app.
The function app works fine locally, but doesn't return anything once published (even though the Configuration section in the portal is updated with all the environment variables needed in local.settings.json).
The part I am returning is data.readall(), which doesn't return anything once published:
from azure.storage.blob import BlobServiceClient
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
data = blob_client.download_blob()
data.readall()
Any idea why I can't access the content of the blob once the app is published?
Or any other idea/method that would help me debug this would be greatly appreciated.
Thanks
So it would appear that the behavior differs according to the version of azure-storage-blob that is pip installed.
data.readall() seemed to work only for one version of the package; once upgraded to azure-storage-blob==12.14.1, the prior method no longer works, but this one does (both in local development and in the published function):
data.content_as_text()
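For reference, here is a minimal sketch of the working variant under azure-storage-blob==12.14.1; the connection string setting name, container, and blob names are placeholders standing in for the values from the question:
import os
from azure.storage.blob import BlobServiceClient

connect_str = os.environ['AzureWebJobsStorage']  # placeholder setting name
container_name = 'my-container'                  # placeholder
blob_name = 'my-blob.txt'                        # placeholder

blob_service_client = BlobServiceClient.from_connection_string(connect_str)
blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
downloader = blob_client.download_blob()
text = downloader.content_as_text()  # decodes the blob contents into a string
print(text)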
So, I think I'm running up against an issue with out-of-date documentation. According to the documentation here, I should be able to use list_schemas() to get a list of schemas defined in the Hive Data Catalog: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.list_schemas
However, this method doesn't seem to exist:
import boto3
glue = boto3.client('glue')
glue.list_schemas()
AttributeError: 'Glue' object has no attribute 'list_schemas'
Other methods (e.g. list_crawlers()) still appear to be present and work just fine. Has this method been moved? Do I need to install some additional boto3 libraries for this to work?
Based on the comments:
The issue was caused by using an old version of boto3. Upgrading to a newer version solved the issue.
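If you want to verify this before and after upgrading, a quick check like the following (just a sketch) prints the installed boto3 version and whether the Glue client exposes list_schemas:
import boto3

print(boto3.__version__)              # list_schemas requires a reasonably recent boto3/botocore
glue = boto3.client('glue')
print(hasattr(glue, 'list_schemas'))  # prints True once the upgrade is in place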
You should create a session first and use the session's client method; then it should work:
import boto3
session = boto3.session.Session()
glue_client = session.client('glue')
schemas_name = glue_client.list_schemas()
How do I set up direct private bucket access for TensorFlow?
After running
from tensorflow.python.lib.io import file_io
and running print file_io.stat('s3://my/private/bucket/file.json'), I end up with an error:
NotFoundError: Object s3://my/private/bucket/file.json does not exist
However, the same line on a public object works without an error:
print file_io.stat('s3://ryft-public-sample-data/wikipedia-20150518.bin')
There appears to be an article on support here: https://github.com/tensorflow/examples/blob/master/community/en/docs/deploy/s3.md
However, I end up with the same error after exporting the variables shown.
I have awscli set up with all credentials, and boto3 can view and download the file in question. I am wondering how I can get TensorFlow to access S3 directly when the bucket is private.
I had the same problem when trying to access files in a private S3 bucket from a SageMaker notebook. The mistake I made was to try using credentials obtained from boto3, which do not seem to be valid outside of it.
The solution was not to specify credentials (in which case the role attached to the machine is used), but instead just to specify the region name (for some reason it didn't read it from the ~/.aws/config file), as follows:
import boto3
import os
session = boto3.Session()
os.environ['AWS_REGION']=session.region_name
NOTE: when debugging this error, it was useful to look at the CloudWatch logs, as the S3 client's logs were printed only there and not in the Jupyter notebook.
There I first saw that:
when I did specify the credentials obtained from boto3, the error was: The AWS Access Key Id you provided does not exist in our records.
when I accessed the bucket without the AWS_REGION environment variable set, I got: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint. This is apparently common when the bucket's region is not specified (see 301 Moved Permanently after S3 uploading).
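Putting the above together, here is a minimal sketch of how the workaround looks end to end, assuming the machine's attached role already has read access to the bucket and that a default region is configured for the boto3 session; the object path is the one from the question:
import os
import boto3

# Rely on the role attached to the machine; only the region needs to be exported
session = boto3.Session()
os.environ['AWS_REGION'] = session.region_name  # assumes the session has a region configured

from tensorflow.python.lib.io import file_io
print(file_io.stat('s3://my/private/bucket/file.json'))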
I'm trying to upload large files to Amazon S3 without using credentials. I'm creating a plugin for OctoPrint, and I can't put any sort of credentials into the code because it is public. Currently my code for uploads looks like this:
import boto3
from botocore import UNSIGNED
from botocore.client import Config
# Create an S3 client
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))
filename = 'file.txt'
bucket_name = 'BUCKET_HERE'
s3.upload_file(filename, bucket_name, filename)
However, it gives me the following error:
S3UploadFailedError: Failed to upload largefiletest.mp4 to BUCKETNAMEHERE/largefiletest.mp4: An error occurred (AccessDenied) when calling the CreateMultipartUpload operation: Anonymous users cannot initiate multipart uploads. Please authenticate.
Is there any way to work around this, or are there any suggestions for alternative libraries? Anything is appreciated.
Do you mean that the repository is public but the runtime environment is private? If so, the standard practice is to read the credentials from environment variables, like this:
# first: pip install django-environ (assumed here; it provides the environ module used below)
import environ
env = environ.Env()
SOME_KEY = env('SOME_KEY', default='')
This way, you can easily update your credentials without changing your code or compromising security.
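For example, here is a sketch of how the upload from the question could then look, assuming the credentials are exported under the standard AWS variable names (boto3 also picks these up automatically, so passing them explicitly is only to make the dependency visible):
import os
import boto3

s3 = boto3.client(
    's3',
    aws_access_key_id=os.environ.get('AWS_ACCESS_KEY_ID'),
    aws_secret_access_key=os.environ.get('AWS_SECRET_ACCESS_KEY'),
)
s3.upload_file('file.txt', 'BUCKET_HERE', 'file.txt')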
Edit:
Then, on the machine where this code will run, you can set the environment variables as described here:
macOS: https://natelandau.com/my-mac-osx-bash_profile/
Linux: https://www.cyberciti.biz/faq/set-environment-variable-linux/
Windows: http://www.dowdandassociates.com/blog/content/howto-set-an-environment-variable-in-windows-command-line-and-registry/
I have my data on Google Cloud Platform and I want to be able to download it locally; this is my first time trying that, and eventually I'll use the downloaded data with my Python code.
I have checked the docs, like https://cloud.google.com/genomics/downloading-credentials-for-api-access and https://cloud.google.com/storage/docs/cloud-console. I have successfully got the JSON file from the first link; the second one is where I'm struggling. I'm using Python 3.5, and assuming my JSON file's name is data.json, I have added the following code:
os.environ["file"] = "data.json"
urllib.request.urlopen('https://storage.googleapis.com/[bucket_name]/[filename]')
First of all, I don't even know what I should call the value passed to os.environ, so I just called it "file", and I'm not sure how I'm supposed to fill it in. I also got access denied on the second line. Obviously this is not how to download my file, as there is no local destination or anything in that call. Any guidance will be appreciated.
Edit:
from google.cloud.storage import Blob
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "credentials/client_secret.json"
storage_client = storage.Client.from_service_account_json('service_account.json')
client = storage.Client(project='my-project')
bucket = client.get_bucket('my-bucket')
blob = Blob('path/to/my-object', bucket)
download_to_filename('local/path/to/my-file')
I'm getting unresolved references for storage and download_to_filename. Should I replace service_account.json with credentials/client_secret.json? Also, I tried to print the content of os.environ["GOOGLE_APPLICATION_CREDENTIALS"]['installed'] the way I would with any parsed JSON, but it just complained that I should give numbers, meaning it read the value as plain text only.
You should use the idiomatic Google Cloud client library to run operations on GCS.
Following the example there, and knowing that the client library will pick up the application default credentials, first we have to set the application default credentials with:
gcloud auth application-default login
===EDIT===
That was the old way. Now you should use the instructions in this link.
This means downloading a service account key file from the console, and setting the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the downloaded JSON.
Also, make sure that this service account has the proper permissions on the project of the bucket.
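A minimal sketch of that setup (the key file path and project id are placeholders):
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/path/to/service_account.json'

from google.cloud import storage
client = storage.Client(project='project-id')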
Or you can create the client with explicit credentials. You'll need to download the key file all the same, but when creating the client, use:
storage_client = storage.Client.from_service_account_json('service_account.json')
==========
And then, following the example code:
from google.cloud import storage
client = storage.Client(project='project-id')
bucket = client.get_bucket('bucket-id')
blob = storage.Blob('bucket/file/path', bucket)
blob.download_to_filename('/path/to/local/save')
Or, if this is a one-off download, just install the SDK and use gsutil to download:
gsutil cp gs://bucket/file .