I want to access string data from a blob with a Python function app.
The function app works fine locally but doesn't return anything once published, even though the Configuration section in the portal has been updated with all the environment variables needed in local.settings.json.
The value I return is data.readall(), and it comes back empty once published:
from azure.storage.blob import BlobServiceClient

# connect_str, container_name and blob_name are read from the app settings
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
data = blob_client.download_blob()
data.readall()  # this is the value the function returns
Any idea why I can't access the content of the blob once the app is published?
Any other idea or method that would help me debug this would be greatly appreciated.
Thanks
So it would appear that the behavior differs depending on which version of azure-storage-blob is installed with pip.
data.readall() seemed to work with only one version of the package; after upgrading to azure-storage-blob==12.14.1, that method no longer worked for me, but this one does, both in local development and in the published function:
data.content_as_text()
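For reference, a minimal sketch of the call that worked for me under azure-storage-blob==12.14.1; connect_str, container_name and blob_name stand in for the values coming from your app settings:
from azure.storage.blob import BlobServiceClient

blob_service_client = BlobServiceClient.from_connection_string(connect_str)
blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
downloader = blob_client.download_blob()

text = downloader.content_as_text()  # worked locally and in the published function
# raw = downloader.readall()         # returns bytes; behavior varied between versions for me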
Related
The Google Cloud Storage client library is returning a 500 error when I attempt to upload via the development server.
ServerError: Expect status [200] from Google Storage. But got status 500.
I haven't changed anything in the project, and the code still works correctly in production.
I've attempted gcloud components update to get the latest dev_server, and I've updated to the latest Google Cloud Storage client library.
I've run gcloud init again to make sure credentials are loaded, and I've confirmed I'm using the correct bucket.
The project is running on Windows 10.
Python version: 2.7
Any idea why this is happening?
Thanks
Turns out this has been a problem for a while.
It has to do with how blobstore filenames are generated.
https://issuetracker.google.com/issues/35900575
The fix is to monkeypatch this file:
google-cloud-sdk\platform\google_appengine\google\appengine\api\blobstore\file_blob_storage.py
def _FileForBlob(self, blob_key):
  """Calculate full filename to store blob contents in.

  This method does not check to see if the file actually exists.

  Args:
    blob_key: Blob key of blob to calculate file for.

  Returns:
    Complete path for file used for storing blob.
  """
  blob_key = self._BlobKey(blob_key)
  # Remove bad characters.
  import re
  blob_fname = re.sub(r"[^\w\./\\]", "_", str(blob_key))
  # Make sure it's a relative directory.
  if blob_fname and blob_fname[0] in "/\\":
    blob_fname = blob_fname[1:]
  return os.path.join(self._DirectoryForBlob(blob_key), blob_fname)
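To illustrate what the patched regex does, here is a quick standalone sketch; the blob key value below is made up purely for demonstration:
import re

# Hypothetical blob key containing a character (":") that is invalid in Windows filenames.
blob_key = "encoded_gs_file:c29tZS1idWNrZXQvb2JqZWN0"
blob_fname = re.sub(r"[^\w\./\\]", "_", blob_key)
print(blob_fname)  # encoded_gs_file_c29tZS1idWNrZXQvb2JqZWN0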
I'm trying to simply connect to google-cloud-storage using these instructions:
https://googleapis.github.io/google-cloud-python/latest/storage/index.html
However, I keep getting an error saying the storage module has no Client attribute.
from google.cloud import storage
# Instantiates a client
storage_client = storage.Client(credentials=creds, project='name')
# The name for the new bucket
bucket_name = 'my-new-bucket'
# Creates the new bucket
bucket = storage_client.create_bucket(bucket_name)
print('Bucket {} created.'.format(bucket.name))
This is a problem I've seen several times, and it happens in other google.cloud modules as well. Most of the time it is related to a broken installation.
Try to uninstall and then reinstall the google.cloud packages. If that doesn't help, try using them in a newly created virtual environment (this will work for sure).
There is a related GitHub issue with the same solution.
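As a quick sanity check after reinstalling, a minimal sketch that assumes nothing beyond the package itself:
from google.cloud import storage

# If the installation is intact, this prints the Client class
# instead of raising an AttributeError.
print(storage.Client)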
I have a Python 3.6 application that uses scikit-learn, deployed to IBM Cloud (Cloud Foundry). It works fine. My local development environment is Mac OS High Sierra.
Recently, I added IBM Cloud Object Storage functionality (ibm_boto3) to the app. The COS functionality itself works fine. I'm able to upload, download, list, and delete objects just fine using the ibm_boto3 library.
Strangely, the part of the app that uses scikit-learn now freezes up.
If I comment out the ibm_boto3 import statements (and corresponding code), the scikit-learn code works fine.
More perplexingly, the issue only happens on the local development machine running OS X. When the app is deployed to IBM Cloud, it works fine -- both scikit-learn and ibm_boto3 work well side-by-side.
Our only hypothesis at this point is that somehow the ibm_boto3 library surfaces a known issue in scikit-learn (see this: the parallel version of the K-means algorithm is broken when numpy uses Accelerate on OS X).
Note that we only face this issue once we add ibm_boto3 to the project.
However, we need to be able to test on localhost before deploying to IBM Cloud. Are there any known compatibility issues between ibm_boto3 and scikit-learn on Mac OS?
Any suggestions on how we can avoid this on the dev machine?
Cheers.
Up until now, there weren't any known compatibility issues. :)
At some point there were some issues with the vanilla SSL libraries that come with OS X, but if you're able to read and write data, that isn't the problem.
Are you using HMAC credentials? If so, I'm curious if the behavior continues if you use the original boto3 library instead of the IBM fork.
Here's a simple example that shows how you might use pandas with the original boto3:
import boto3 # package used to connect to IBM COS using the S3 API
import io # python package used to stream data
import pandas as pd # lightweight data analysis package
access_key = '<access key>'
secret_key = '<secret key>'
pub_endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
pvt_endpoint = 'https://s3-api.us-geo.objectstorage.service.networklayer.com'
bucket = 'demo' # the bucket holding the objects being worked on.
object_key = 'demo-data' # the name of the data object being analyzed.
result_key = 'demo-data-results' # the name of the output data object.
# First, we need to open a session and create a client that can connect to IBM COS.
# This client needs to know where to connect, the credentials to use,
# and what signature protocol to use for authentication. The endpoint
# can be specified to be public or private.
cos = boto3.client('s3', endpoint_url=pub_endpoint,
                   aws_access_key_id=access_key,
                   aws_secret_access_key=secret_key,
                   region_name='us',
                   config=boto3.session.Config(signature_version='s3v4'))
# Since we've already uploaded the dataset to be worked on into cloud storage,
# now we just need to identify which object we want to use. get_object returns a
# dict containing the response headers/metadata plus a streaming body.
obj = cos.get_object(Bucket=bucket, Key=object_key)
# Now, because this is all REST API based, the actual contents of the file are
# transported in the request body, so we need to identify where to find the
# data stream containing the actual CSV file we want to analyze.
data = obj['Body'].read()
# Now we can read that data stream into a pandas dataframe.
df = pd.read_csv(io.BytesIO(data))
# This is just a trivial example, but we'll take that dataframe and just
# create a JSON document that contains the mean values for each column.
output = df.mean(axis=0, numeric_only=True).to_json()
# Now we can write that JSON file to COS as a new object in the same bucket.
cos.put_object(Bucket=bucket, Key=result_key, Body=output)
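To read the results back later, a short follow-up sketch reusing the same client and names from the example above:
import json

# Fetch the results object we just wrote and parse the JSON body.
result_obj = cos.get_object(Bucket=bucket, Key=result_key)
column_means = json.loads(result_obj['Body'].read().decode('utf-8'))
print(column_means)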
I have my data on Google Cloud Platform and I want to be able to download it locally; this is my first time trying that, and eventually I'll use the downloaded data with my Python code.
I have checked the docs, like https://cloud.google.com/genomics/downloading-credentials-for-api-access and https://cloud.google.com/storage/docs/cloud-console. I successfully got the JSON file for the first link; the second one is where I'm struggling. I'm using Python 3.5, and assuming my JSON file's name is data.json, I have added the following code:
os.environ["file"] = "data.json"
urllib.request.urlopen('https://storage.googleapis.com/[bucket_name]/[filename]')
First of all, I don't even know what I should call the value passed to environ, so I just called it file, and I'm not sure how I'm supposed to fill it. I also got access denied on the second line. Obviously this isn't how to download my file, as there is no destination local path or anything in that command. Any guidance will be appreciated.
Edit:
from google.cloud.storage import Blob
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "credentials/client_secret.json"
storage_client = storage.Client.from_service_account_json('service_account.json')
client = storage.Client(project='my-project')
bucket = client.get_bucket('my-bucket')
blob = Blob('path/to/my-object', bucket)
download_to_filename('local/path/to/my-file')
I'm getting an unresolved reference for storage and download_to_filename. Also, should I replace service_account.json with credentials/client_secret.json? Finally, I tried to print the content of os.environ["GOOGLE_APPLICATION_CREDENTIALS"]['installed'] like I would with any JSON, but it just said I should give numbers, meaning it read the input path as plain text only.
You should use the idiomatic Google Cloud library to run operations in GCS.
With the example there, and knowing that the client library will pick up the application default credentials, first we have to set the application default credentials with
gcloud auth application-default login
===EDIT===
That was the old way. Now you should use the instructions in this link.
This means downloading a service account key file from the console, and setting the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the downloaded JSON.
Also, make sure that this service account has the proper permissions on the project of the bucket.
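If you prefer to set that variable from Python itself (as attempted in the question), a minimal sketch, assuming the downloaded key file is named service_account.json and sits next to the script:
import os
from google.cloud import storage

# Hypothetical path; point this at the JSON key file you downloaded from the console.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "service_account.json"

client = storage.Client(project='project-id')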
Or you can create the client with explicit credentials. You'll need to download the key file all the same, but when creating the client, use:
storage_client = storage.Client.from_service_account_json('service_account.json')
==========
And then, following the example code:
from google.cloud import storage
client = storage.Client(project='project-id')
bucket = client.get_bucket('bucket-id')
blob = storage.Blob('bucket/file/path', bucket)
blob.download_to_filename('/path/to/local/save')
Or, if this is a one-off download, just install the SDK and use gsutil to download:
gsutil cp gs://bucket/file .
I am trying to sync the static files of my django application to Azure storage. I am getting an error when I try to write static files to the storage container when running the manage.py collectstatic command.
I am getting the error: The MAC signature found in the HTTP request is not the same as any computed signature.
The common cause for this error is a time/clock mismatch between the two servers, but that is not the problem in my case.
I am using the django packages django-azure-storage and azure-sdk-for-python to format the request.
Here is a gist of the http request and responses generated when trying to connect to the azure storage container.
Is there anything that seems wrong from these outputs?
I have downloaded the Django packages and the Azure SDK following your description. I have written a sample to reproduce this issue, but it works fine on my side. Below are the steps I followed:
Set up the environment: Python 2.7 and Azure SDK (0.10.0).
1. Trying to use django-azure-storage
It is very frustrating that I didn't manage to import it into my project successfully, since this is the first time I have used it. Usually I use the Azure Python SDK directly. This time I copied storage.py into my project as the AzureStorage class.
# need to import the Django ContentFile type
from django.core.files.base import ContentFile
# import the AzureStorage class from my project
from DjangoWP.AzureStorage import AzureStorage

# my local image path
file_path = "local.png"

# upload the file to my Azure storage blob
def djangorplugin():
    azurestorage = AzureStorage(myaccount, mykey, "mycontainer")
    stream = open(file_path, 'rb')
    data = stream.read()
    # need to convert the file to a ContentFile
    azurestorage.save("Testfile1.png", ContentFile(data))
2. You may want to know how to use the Azure SDK for Python directly; the code snippet below is for your reference:
from azure.storage.blobservice import BlobService

# my local image path
file_path = "local.png"

def upload():
    blob_service = BlobService(account_name=myaccount, account_key=mykey)
    stream = open(file_path, 'rb')
    data = stream.read()
    blob_service.put_blob("mycontainer", "local.png", data, "BlockBlob")
If you have any further concerns, please feel free to let us know.
I was incorrectly using the setting DEFAULT_FILE_STORAGE instead of STATICFILES_STORAGE to override the storage backend used while syncing static files. Changing this setting solved this problem.
I was also encountering problems when trying to use django-storages, which specifies to use the DEFAULT_FILE_STORAGE setting in its documentation. However, using STATICFILES_STORAGE with this package also fixed the issue I was having.
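For reference, a minimal settings.py sketch; the dotted backend path and the AZURE_* setting names below are the ones documented for django-storages, so adjust them if your package (e.g. django-azure-storage) uses different ones:
# settings.py
# Use the Azure backend only for static files; DEFAULT_FILE_STORAGE
# (used for media uploads) is left at its default.
STATICFILES_STORAGE = 'storages.backends.azure_storage.AzureStorage'
AZURE_ACCOUNT_NAME = '<storage account name>'
AZURE_ACCOUNT_KEY = '<storage account key>'
AZURE_CONTAINER = '<container name>'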