The Google Cloud Storage client library returns a 500 error when I attempt to upload via the development server:
ServerError: Expect status [200] from Google Storage. But got status 500.
I haven't changed anything with the project and the code still works correctly in production.
I've run gcloud components update to get the latest development server, and I've updated to the latest Google Cloud Storage client library.
I've run gcloud init again to make sure credentials are loaded, and I've made sure I'm using the correct bucket.
The project is running on Windows 10 with Python 2.7.
Any idea why this is happening?
Thanks
Turns out this has been a problem for a while.
It has to do with how blobstore filenames are generated.
https://issuetracker.google.com/issues/35900575
The fix is to patch _FileForBlob in this file:
google-cloud-sdk\platform\google_appengine\google\appengine\api\blobstore\file_blob_storage.py
def _FileForBlob(self, blob_key):
  """Calculate full filename to store blob contents in.
  This method does not check to see if the file actually exists.
  Args:
    blob_key: Blob key of blob to calculate file for.
  Returns:
    Complete path for file used for storing blob.
  """
  blob_key = self._BlobKey(blob_key)
  # Remove bad characters.
  import re
  blob_fname = re.sub(r"[^\w\./\\]", "_", str(blob_key))
  # Make sure it's a relative directory.
  if blob_fname and blob_fname[0] in "/\\":
    blob_fname = blob_fname[1:]
  return os.path.join(self._DirectoryForBlob(blob_key), blob_fname)
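To see what the patched substitution does to a key, the same two steps can be run on their own. This is a standalone sketch, not the SDK code: safe_blob_filename is a hypothetical name, the sample key is made up, and I'm assuming the Windows failure comes from characters such as ':' that Windows does not allow in file names.

```python
import re


def safe_blob_filename(blob_key):
    """Mirror of the patched _FileForBlob name logic, without the SDK plumbing."""
    # Replace anything that is not a word character, '.', '/' or '\' with '_'.
    blob_fname = re.sub(r"[^\w\./\\]", "_", str(blob_key))
    # Strip one leading separator so os.path.join treats the name as relative.
    if blob_fname and blob_fname[0] in "/\\":
        blob_fname = blob_fname[1:]
    return blob_fname


# Hypothetical key containing characters that are illegal in Windows file names.
print(safe_blob_filename("/encoded:abc="))  # prints encoded_abc_
```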
I have my data on Google Cloud Platform and I want to be able to download it locally. This is my first time trying this, and eventually I'll use the downloaded data in my Python code.
I have checked the docs, such as https://cloud.google.com/genomics/downloading-credentials-for-api-access and https://cloud.google.com/storage/docs/cloud-console. I successfully got the JSON file from the first link; the second one is where I'm struggling. I'm using Python 3.5, and assuming my JSON file is named data.json, I have added the following code:
os.environ["file"] = "data.json"
urllib.request.urlopen('https://storage.googleapis.com/[bucket_name]/[filename]')
First of all, I don't even know what key I should use with os.environ, so I just called it file, and I'm not sure how I'm supposed to fill it in. I also got "access denied" on the second line. Obviously this isn't how to download my file, since there is no local destination in that command. Any guidance will be appreciated.
Edit:
from google.cloud.storage import Blob
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "credentials/client_secret.json"
storage_client = storage.Client.from_service_account_json('service_account.json')
client = storage.Client(project='my-project')
bucket = client.get_bucket('my-bucket')
blob = Blob('path/to/my-object', bucket)
download_to_filename('local/path/to/my-file')
I'm getting unresolved references for storage and download_to_filename. Should I replace service_account.json with credentials/client_secret.json? Also, I tried to print os.environ["GOOGLE_APPLICATION_CREDENTIALS"]['installed'] as I would with any JSON, but it said I should give numbers, meaning it treated the value as a plain string.
You should use the idiomatic Google Cloud library to run operations in GCS.
With the example there, and knowing that the client library picks up the application default credentials, we first have to set the application default credentials with
gcloud auth application-default login
===EDIT===
That was the old way. Now you should use the instructions in this link.
This means downloading a service account key file from the console, and setting the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the downloaded JSON.
Also, make sure that this service account has the proper permissions on the project of the bucket.
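Regarding os.environ["GOOGLE_APPLICATION_CREDENTIALS"]['installed'] from the question: the environment variable only holds a path string, so indexing it with a string fails. To look inside the key file you have to open and parse it. The sketch below does that and checks the "type" field: a service account key downloaded from the console has "type": "service_account", whereas an OAuth client secret file (the kind with a top-level "installed" entry) is a different credential and is not what GOOGLE_APPLICATION_CREDENTIALS expects. describe_key_file is a hypothetical name used only for illustration.

```python
import json
import os


def describe_key_file(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Load the JSON file whose path is stored in env_var and summarize it."""
    path = os.environ[env_var]  # just a string such as "credentials/key.json"
    with open(path) as f:
        info = json.load(f)  # now a dict, so info["..."] lookups work
    if info.get("type") == "service_account":
        return "service account: {}".format(info.get("client_email"))
    if "installed" in info or "web" in info:
        # OAuth client IDs use a top-level "installed" or "web" entry.
        return "OAuth client secret, not a service account key"
    return "unknown credential type: {}".format(info.get("type"))
```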
Alternatively, you can create the client with explicit credentials. You'll still need to download the key file, but when creating the client, use:
storage_client = storage.Client.from_service_account_json('service_account.json')
==========
And then, following the example code:
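from google.cloud import storage
# the client picks up the credentials configured above
client = storage.Client(project='project-id')
bucket = client.get_bucket('bucket-id')
# the blob name is the object's path inside the bucket, without the bucket name
blob = storage.Blob('path/to/object', bucket)
blob.download_to_filename('/path/to/local/save')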
Or, if this is a one-off download, just install the SDK and use gsutil to download:
gsutil cp gs://bucket/file .
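In the answer's code the bucket name goes to get_bucket() and the blob name is the object's path inside that bucket. If you start from a gs:// URI like the one in the gsutil command, a small helper can split it into those two parts. split_gcs_uri is a hypothetical name for this sketch, not part of google-cloud-storage.

```python
def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/object' into ('bucket', 'path/to/object')."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError("not a gs:// URI: {}".format(uri))
    # Everything before the first '/' is the bucket; the rest is the object name.
    bucket, _, object_name = uri[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError("expected gs://bucket/object, got {}".format(uri))
    return bucket, object_name


bucket_name, object_name = split_gcs_uri("gs://my-bucket/data/2017/file.csv")
print(bucket_name)  # my-bucket
print(object_name)  # data/2017/file.csv
```

The two values can then be passed to client.get_bucket(bucket_name) and storage.Blob(object_name, bucket).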
For weather processing purposes, I am looking to automatically retrieve daily weather forecast data into Google Cloud Storage.
The files are available at a public HTTP URL (http://dcpc-nwp.meteo.fr/openwis-user-portal/srv/en/main.home), but they are very large (between 30 and 300 megabytes). File size is the main issue.
After looking at previous Stack Overflow topics, I have tried two methods, both unsuccessful:
1/ First attempt via urlfetch in Google App Engine
from google.appengine.api import urlfetch
url = "http://dcpc-nwp.meteo.fr/servic..."
result = urlfetch.fetch(url)
[...] # Code to save in a Google Cloud Storage bucket
But I get the following error message on the urlfetch line:
DeadlineExceededError: Deadline exceeded while waiting for HTTP response from URL
2/ Second attempt via the Cloud Storage Transfer Service
According to the documentation, it is possible to retrieve HTTP data into Cloud Storage directly via the Cloud Storage Transfer Service:
https://cloud.google.com/storage/transfer/reference/rest/v1/TransferSpec#httpdata
But it requires the size and MD5 of the files before the download. This option cannot work in my case because the website does not provide that information.
3/ Any ideas?
Do you see any solution for automatically retrieving large files over HTTP into my Cloud Storage bucket?
3/ Workaround with a Compute Engine instance
Since it was not possible to retrieve large files from external HTTP with App Engine or directly with Cloud Storage, I have used a workaround with an always-running Compute Engine instance.
This instance regularly checks if new weather files are available, downloads them and uploads them to a Cloud Storage bucket.
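The download half of that workaround can be done without holding a 30 to 300 MB file in memory by streaming it to disk in chunks. This is a minimal sketch using only the Python 3 standard library; download_to_file and the 1 MiB chunk size are my own choices, not the author's code. Uploading the resulting file could then use google-cloud-storage's Blob.upload_from_filename(), which is not shown here.

```python
import shutil
import urllib.request


def download_to_file(url, dest_path, chunk_size=1024 * 1024):
    """Stream url into dest_path in chunks; return the number of bytes written."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        # copyfileobj reads chunk_size bytes at a time, so the whole file
        # never has to fit in memory at once.
        shutil.copyfileobj(response, out, chunk_size)
        return out.tell()
```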
For scalability, maintenance and cost reasons, I would have preferred to use only serverless services, but fortunately:
It works well on a fresh f1-micro Compute Engine instance (no extra package required, and only $4/month if running 24/7).
Network traffic from Compute Engine to Google Cloud Storage is free if the instance and the bucket are in the same region ($0/month).
The MD5 and size of a file can be retrieved quickly with the curl -I command, which fetches only the response headers, provided the server reports them, as mentioned in this link https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests.
The Storage Transfer Service can then be configured to use that information.
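In Python, the equivalent of curl -I is a HEAD request. A sketch with the standard library follows; remote_size_and_md5 is a hypothetical name. Many servers send Content-Length but not Content-MD5 (the question reports that this site does not publish checksums), so expect None for the MD5 in that case. When the header is present, it already holds the base64-encoded digest.

```python
import urllib.request


def remote_size_and_md5(url):
    """Send a HEAD request (like curl -I) and return (Content-Length, Content-MD5).

    Either value is None when the server does not send that header.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        length = response.headers.get("Content-Length")
        md5 = response.headers.get("Content-MD5")
    size = int(length) if length is not None else None
    return size, md5
```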
Another option would be to use a serverless Cloud Function. In Python, it could look something like this:
import os
import requests
def download_url_file(url):
    """Download url into /tmp; return the file name, or None on failure."""
    # compute the names up front so every return path has them defined
    output_filename = url.split('/')[-1]
    output_filepath = os.path.join('/tmp', output_filename)
    try:
        print('[ INFO ] Downloading {}'.format(url))
        # stream=True writes the body in chunks instead of loading req.content at once
        with requests.get(url, stream=True) as req:
            if req.status_code != 200:
                print('[ ERROR ] Status Code: {}'.format(req.status_code))
                return None
            with open(output_filepath, 'wb') as f:
                for chunk in req.iter_content(chunk_size=1024 * 1024):
                    f.write(chunk)
        print('[ INFO ] Successfully downloaded to output_filepath: {} & output_filename: {}'.format(output_filepath, output_filename))
        return output_filename
    except Exception as e:
        print('[ ERROR ] {}'.format(e))
        return None
Currently, the MD5 and size are required for Google's Transfer Service; we understand that in cases like yours, this can be difficult to work with, but unfortunately we don't have a great solution today.
Unless you're able to get the size and MD5 by downloading the files yourself (temporarily), I think that's the best you can do.
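If you do download each file once to learn its size and MD5, the Transfer Service URL list can be generated from the local copy. This sketch assumes the tab-separated format whose first line is TsvHttpData-1.0 and whose entries are URL, size in bytes, and base64-encoded MD5; check the current Storage Transfer Service documentation before relying on it. tsv_entry and url_list are hypothetical names.

```python
import base64
import hashlib
import os


def tsv_entry(url, local_path, chunk_size=1024 * 1024):
    """Return one 'URL<TAB>size<TAB>base64 MD5' line for a Transfer Service URL list."""
    md5 = hashlib.md5()
    # Hash the file in chunks so large downloads are not read into memory at once.
    with open(local_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    size = os.path.getsize(local_path)
    digest_b64 = base64.b64encode(md5.digest()).decode("ascii")
    return "{}\t{}\t{}".format(url, size, digest_b64)


def url_list(entries):
    """Assemble the URL list body; the first line is the format header."""
    return "\n".join(["TsvHttpData-1.0"] + list(entries)) + "\n"
```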
I have problems with authentication in the Python library for the Google Cloud APIs.
At first it worked for some days without problems, but suddenly the API calls stopped showing up in the API overview of the Google Cloud Platform console.
I created a service account and stored its JSON key file locally. Then I set the environment variable GCLOUD_PROJECT to the project ID and GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON file.
from google.cloud import speech
client = speech.Client()
print(client._credentials.service_account_email)
prints the correct service account email.
The following code transcribes audio_file successfully, but the dashboard for my Google Cloud project doesn't show any activity in the graph for the enabled Speech API.
import io
with io.open(audio_file, 'rb') as f:
    audio = client.sample(f.read(), source_uri=None, sample_rate=48000, encoding=speech.encoding.Encoding.FLAC)
alternatives = audio.sync_recognize(language_code='de-DE')
At some point the code also ran into errors about the usage limit. I guess that, because authentication is failing, some free/limited option is being used instead.
I also tried the alternative authentication method of installing the Google Cloud SDK and running gcloud auth application-default login, but without success.
I have no idea where to start troubleshooting the problem.
Any help is appreciated!
(My system is running Windows 7 with Anaconda)
EDIT:
The error count (labeled "Fehler", German for "errors", in the console) increases with each call to the API. How can I get detailed information about the error?
Make sure you are using an absolute path when setting the GOOGLE_APPLICATION_CREDENTIALS environment variable. Also, you might want to try inspecting the access token using OAuth2 tokeninfo and make sure it has "scope": "https://www.googleapis.com/auth/cloud-platform" in its response.
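A quick way to check the first point: a relative path in GOOGLE_APPLICATION_CREDENTIALS is resolved against the current working directory, so the same variable can point at different files (or none) depending on where the script is started. The sketch below reports that; check_credentials_path is a hypothetical helper, and it does not validate the key itself.

```python
import os


def check_credentials_path(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return a list of problems with the credentials path (empty if none found)."""
    path = os.environ.get(env_var)
    if not path:
        return ["{} is not set".format(env_var)]
    problems = []
    if not os.path.isabs(path):
        # Relative paths depend on the directory the process was started from.
        problems.append("path is relative: {} (resolves to {})".format(path, os.path.abspath(path)))
    if not os.path.isfile(path):
        problems.append("file not found: {}".format(path))
    return problems
```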
Sometimes you will get different error information if you initialize the client with gRPC enabled:
0.24.0:
speech_client = speech.Client(_use_grpc=True)
0.23.0:
speech_client = speech.Client(use_gax=True)
Usually it's an encoding issue. Can you try with the sample audio, or generate LINEAR16 samples using something like the Unix rec tool:
rec --channels=1 --bits=16 --rate=44100 audio.wav trim 0 5
...
with io.open(speech_file, 'rb') as audio_file:
    content = audio_file.read()
    audio_sample = speech_client.sample(
        content,
        source_uri=None,
        encoding='LINEAR16',
        sample_rate=44100)
Other notes:
sync_recognize is limited to 60 seconds of audio; you must use async recognition for longer audio.
If you haven't already, set up billing for your account.
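Since a mismatch between the file and the encoding/sample_rate arguments is a common culprit, and sync_recognize has the 60-second limit noted above, it can help to inspect a WAV file with the standard wave module before sending it. A sketch, assuming the LINEAR16, 44100 Hz, mono settings from the example above; check_wav is a hypothetical name.

```python
import wave


def check_wav(path, expected_rate=44100, max_sync_seconds=60):
    """Return problems that would clash with encoding='LINEAR16', sample_rate=expected_rate."""
    problems = []
    with wave.open(path, "rb") as w:
        channels = w.getnchannels()
        sample_width = w.getsampwidth()  # bytes per sample; LINEAR16 means 2
        rate = w.getframerate()
        duration = w.getnframes() / float(rate)
    if channels != 1:
        problems.append("expected mono audio, got {} channels".format(channels))
    if sample_width != 2:
        problems.append("expected 16-bit samples, got {}-bit".format(8 * sample_width))
    if rate != expected_rate:
        problems.append("sample_rate mismatch: file has {} Hz".format(rate))
    if duration > max_sync_seconds:
        problems.append("{:.1f} s is too long for sync recognize".format(duration))
    return problems
```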
With regard to the usage problem, the issue is in fact that when you use the new google-cloud library to access the ML APIs, it seems everyone authenticates to a project shared by everyone (hence it says you've used up your limit even though you've not used anything). To check this, call an ML API that you have not enabled using the Python client library: it will return a result even though it shouldn't. The problem also shows up in the client libraries for other languages and on other operating systems, so I suspect it's an issue with their gRPC layer.
Because of this, to ensure consistency I always use the older googleapiclient library, which uses my API key. Here is an example with the Translate API:
from googleapiclient import discovery
service = discovery.build('translate', 'v2', developerKey='')
service_request = service.translations().list(q='hello world', target='zh')
result = service_request.execute()
print(result)
For the speech API, it's something along the lines of:
from googleapiclient import discovery
service = discovery.build('speech', 'v1beta1', developerKey='')
service_request = service.speech().syncrecognize()
result = service_request.execute()
print(result)
You can get the list of the discovery APIs at https://developers.google.com/api-client-library/python/apis/ with the speech one located in https://developers.google.com/resources/api-libraries/documentation/speech/v1beta1/python/latest/.
Another benefit of the discovery library is that it exposes many more options than the current library, although it's often more of a pain to implement.
I have an S3 bucket URL for an image. I am trying to download this image using urllib or wget. In both cases the code executes successfully, but a corrupt image is downloaded: for a 2 MB image, only 200 KB gets downloaded.
urllib.urlretrieve(str(sys.argv[1]), "data/img"+str(randomword(10))+".jpg")
In the latter part of the line, I am just adding a random string as the name of the downloaded image.
Please help.
You can grab the file by authenticating first and then downloading it. I'd recommend using the Python boto library to deal with Amazon Web Services. With boto, the code would look something like this:
import os
import boto
# set your AWS creds as environment variables or hardcode them
AWS_ACCESS_KEY_ID = os.getenv("AWS_KEY_ID")
AWS_ACCESS_SECRET_KEY = os.getenv("AWS_ACCESS_KEY")
# open an authenticated S3 connection and look up the object
conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_ACCESS_SECRET_KEY)
bucket = conn.get_bucket("my_bucket_name")
key = bucket.get_key('file_on_s3.txt')
# write the object's bytes to a local file
key.get_contents_to_filename('where_file_goes_locally.txt')
If you really don't want to use boto, you can piece it together manually and do essentially what boto does: build the right request headers from your AWS credentials. I do this with a bash script on one of my servers. This should point you in the right direction (https://gist.github.com/davidejones/d05f51df75e659111227); rewriting it with Python requests or urllib should work too.
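To confirm what the small "corrupt" file actually contains, look at its first bytes: JPEG files start with FF D8 FF, while S3 error responses (for example AccessDenied for an unauthenticated request to a private object) are XML documents. sniff_download is a hypothetical helper for this check, not part of boto.

```python
def sniff_download(path):
    """Guess what a downloaded file really contains from its first bytes."""
    with open(path, "rb") as f:
        head = f.read(512)
    if head.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    stripped = head.lstrip()
    if stripped.startswith((b"<?xml", b"<Error>")):
        return "xml (likely an S3 error response)"
    if stripped.lower().startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"
```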
I am trying to sync the static files of my Django application to Azure Storage. I get an error when the manage.py collectstatic command tries to write static files to the storage container.
The error is: The MAC signature found in the HTTP request is not the same as any computed signature.
The common cause of this error is mismatched clocks on the two servers, but that is not the problem in my case.
I am using the Django package django-azure-storage and azure-sdk-for-python to format the request.
Here is a gist of the HTTP requests and responses generated when trying to connect to the Azure storage container.
Is there anything that seems wrong from these outputs?
I downloaded the Django package and the Azure SDK as you described and wrote a sample to reproduce this issue, but it works fine on my side. Below are the steps I followed:
Set up the environment: Python 2.7 and Azure SDK (0.10.0).
1. Trying to use django-azure-storage
Unfortunately, I couldn't import it into my project directly, since this is the first time I've used it; usually I use the Azure Python SDK directly. This time I copied storage.py into my project as the AzureStorage class.
# ContentFile wraps raw bytes so Django's storage API can save them
from django.core.files.base import ContentFile
# import the AzureStorage class from my project
from DjangoWP.AzureStorage import AzureStorage
# my local image path
file_path = "local.png"
def djangorplugin():
    # myaccount and mykey hold your storage account name and access key
    azurestorage = AzureStorage(myaccount, mykey, "mycontainer")
    with open(file_path, 'rb') as stream:
        data = stream.read()
    # save the bytes to the container as a blob named Testfile1.png
    azurestorage.save("Testfile1.png", ContentFile(data))
2. You may want to know how to use the Azure SDK for Python directly; here is a code snippet for reference:
from azure.storage.blobservice import BlobService
# my local image path
file_path = "local.png"
def upload():
    # myaccount and mykey hold your storage account name and access key
    blob_service = BlobService(account_name=myaccount, account_key=mykey)
    with open(file_path, 'rb') as stream:
        data = stream.read()
    blob_service.put_blob("mycontainer", "local.png", data, "BlockBlob")
If you have any further concerns, please feel free to let us know.
I was incorrectly using the setting DEFAULT_FILE_STORAGE instead of STATICFILES_STORAGE to override the storage backend used while syncing static files. Changing this setting solved this problem.
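For reference, a settings.py sketch of the distinction, assuming a recent django-storages release with its Azure backend: STATICFILES_STORAGE is what collectstatic uses, while DEFAULT_FILE_STORAGE applies to uploaded media. The environment variable names and the container name "static" are placeholders.

```python
# settings.py (sketch, assuming django-storages' Azure backend)
import os

# Used by `manage.py collectstatic`: this is the setting that fixed the error.
STATICFILES_STORAGE = "storages.backends.azure_storage.AzureStorage"

# Used for user-uploaded media (FileField/ImageField), not for collectstatic.
DEFAULT_FILE_STORAGE = "storages.backends.azure_storage.AzureStorage"

# Credentials read from placeholder environment variables instead of hard-coded.
AZURE_ACCOUNT_NAME = os.environ.get("AZURE_ACCOUNT_NAME", "")
AZURE_ACCOUNT_KEY = os.environ.get("AZURE_ACCOUNT_KEY", "")
AZURE_CONTAINER = "static"
```

On Django 4.2 and later, both settings are deprecated in favor of the STORAGES dictionary (its "staticfiles" and "default" entries), and Django 5.1 removed them.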
I was also encountering problems with django-storages, whose documentation says to use the DEFAULT_FILE_STORAGE setting. However, using STATICFILES_STORAGE with that package also fixed the issue.