Azure storage container API request authentication failing with Django app - python

I am trying to sync the static files of my Django application to Azure Storage. I am getting an error when the manage.py collectstatic command tries to write the static files to the storage container:
The MAC signature found in the HTTP request is not the same as any computed signature.
The most common cause of this error is a clock skew between the client and the storage server, but that is not the problem in my case.
I am using the django packages django-azure-storage and azure-sdk-for-python to format the request.
Here is a gist of the http request and responses generated when trying to connect to the azure storage container.
Is there anything that seems wrong from these outputs?

I have downloaded the Django packages and the Azure SDK following your description, and I coded a sample to reproduce this issue, but it works fine on my side. Below are the steps I followed:
Set up the environment: Python 2.7 and Azure SDK (0.10.0).
1. Using django-azure-storage
Frustratingly, I wasn't able to import it into my project directly, since this is the first time I have used it; usually I use the Azure Python SDK directly. This time I copied storage.py into my project as the AzureStorage class.
# need to import the Django ContentFile type
from django.core.files.base import ContentFile
# import the AzureStorage class from my project
from DjangoWP.AzureStorage import AzureStorage

# my local image path
file_path = "local.png"

# upload the local file to my Azure storage blob container
def djangorplugin():
    azurestorage = AzureStorage(myaccount, mykey, "mycontainer")
    stream = open(file_path, 'rb')
    data = stream.read()
    # the file must be wrapped in a ContentFile
    azurestorage.save("Testfile1.png", ContentFile(data))
2. You may want to know how to use the Azure SDK for Python directly; below is a code snippet for your reference:
from azure.storage.blobservice import BlobService

# my local image path
file_path = "local.png"

def upload():
    blob_service = BlobService(account_name=myaccount, account_key=mykey)
    stream = open(file_path, 'rb')
    data = stream.read()
    blob_service.put_blob("mycontainer", "local.png", data, "BlockBlob")
If you have any further concerns, please feel free to let us know.

I was incorrectly using the setting DEFAULT_FILE_STORAGE instead of STATICFILES_STORAGE to override the storage backend used while syncing static files. Changing this setting solved this problem.
I was also encountering problems when trying to use django-storages, whose documentation says to use the DEFAULT_FILE_STORAGE setting. However, using STATICFILES_STORAGE with that package also fixed the issue I was having.
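For anyone landing here, this is roughly what the working configuration looks like. It is only a sketch: the setting names and the dotted backend path are assumptions based on django-azure-storage and may differ depending on the package and version you use, so check your package's documentation.
# settings.py -- hedged sketch; setting names and the backend path below are
# assumptions and may differ per package/version
AZURE_ACCOUNT_NAME = 'myaccount'
AZURE_ACCOUNT_KEY = 'mykey'
AZURE_CONTAINER = 'mycontainer'

# collectstatic reads STATICFILES_STORAGE, not DEFAULT_FILE_STORAGE
STATICFILES_STORAGE = 'azure_storage.storage.AzureStorage'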

Related

VisualStudioCodeCredential.get_token failed

I'm using a Jupyter Notebook within VS Code and the Azure Python SDK to develop locally.
Relevant VS Code Extensions installed:
Python
Azure Account
Azure Storage (maybe relevant?)
Goal:
To retrieve a secret from Azure Key Vault, using DefaultAzureCredential to authenticate.
Since there are no environment variables or managed identity credentials available, DefaultAzureCredential should fall back to pulling my credentials from VS Code.
Issue:
import logging
from azure.keyvault.secrets import SecretClient
from azure.identity import DefaultAzureCredential

keyvault_name = "kv-test"
keyvault_url = "https://" + keyvault_name + ".vault.azure.net"
keyvault_credential = DefaultAzureCredential()

kv_secret1_name = "secret-test"
keyvault_client = SecretClient(vault_url=keyvault_url, credential=keyvault_credential)
retrieved_key = keyvault_client.get_secret(kv_secret1_name)
logging.info("Account key retrieved from Keyvault")
Error:
EnvironmentCredential.get_token failed: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
ManagedIdentityCredential.get_token failed: ManagedIdentityCredential authentication unavailable, no managed identity endpoint found.
SharedTokenCacheCredential.get_token failed: SharedTokenCacheCredential authentication unavailable. No accounts were found in the cache.
VisualStudioCodeCredential.get_token failed: **Failed to get Azure user details from Visual Studio Code**.
Tried so far:
F1, Azure: Sign in
Authenticate via browser
No change
It looks like the DefaultAzureCredential() credential chain is running, but it's unable to "...get Azure user details from Visual Studio Code".
Is this because I'm developing inside a Jupyter Notebook in VS Code or is there another issue going on? It looks like something similar happened to the Python .NET SDK.
Not sure why it does not work; your code looks correct. If you just want to log in from Visual Studio Code, you can also use AzureCliCredential. It works on my side.
Run az login to sign in to your account; then you can retrieve the secret with the code below.
from azure.keyvault.secrets import SecretClient
from azure.identity import AzureCliCredential

keyvault_credential = AzureCliCredential()
secret_client = SecretClient("https://{vault-name}.vault.azure.net", keyvault_credential)
secret = secret_client.get_secret("secret-name")
print(secret.name)
print(secret.value)
For more details, see the Azure Identity client library for Python.
There is another easy way to fix this issue. As quoted in this article, we can make use of the configuration options as follows.
if self.local_dev:
    print(f"Local Dev is {self.local_dev}")
    self.az_cred = DefaultAzureCredential(
        exclude_environment_credential=True,
        exclude_managed_identity_credential=True,
        exclude_shared_token_cache_credential=True,
        exclude_interactive_browser_credential=True,
        exclude_powershell_credential=True,
        exclude_visual_studio_code_credential=False,
        exclude_cli_credential=False,
        logging_enable=True,
    )
else:
    self.az_cred = DefaultAzureCredential(
        exclude_environment_credential=True, logging_enable=True
    )
Please note that for local development, exclude_visual_studio_code_credential and exclude_cli_credential are set to False while the other credentials are excluded (set to True); for other environments, such as production, only exclude_environment_credential is set to True.
You can see these configuration options in the default.py file of the azure-identity package.
As stated in the docs:
It's a known issue that VisualStudioCodeCredential doesn't work with Azure Account extension versions newer than 0.9.11. A long-term fix to this problem is in progress. In the meantime, consider authenticating via the Azure CLI.
For reference, this is the issue. With that, it is probably smart to disable the vscode credentials: DefaultAzureCredential(exclude_visual_studio_code_credential=True)
Anyway, depending on the version of the VS Code extension, we might need to use another means of authentication, such as SharedTokenCacheCredential, AzureCliCredential or even InteractiveBrowserCredential.
In my case, authentication was failing at the SharedTokenCacheCredential step, which from what I read is a shared cache used across Microsoft products. So I assume the same is likely happening to others who have Microsoft products installed.
It was failing because my target tenant was not included in this cache. To fix this I had two options: either disable the shared token cache credential or include the target tenant in the shared cache.
For the first option, we can do something similar as for disabling vscode: DefaultAzureCredential(exclude_shared_token_cache_credential=True)
For the second option, I did it as suggested in this blog post from Microsoft: DefaultAzureCredential(additionally_allowed_tenants=[TENANT_ID]). But by looking at the source code, it seems we can achieve the same by:
setting the target tenant ID in an environment variable named AZURE_TENANT_ID, or
directly passing the shared cache tenant ID: DefaultAzureCredential(shared_cache_tenant_id="TENANT_ID")
Note that the environment variable has the benefit that it is also used by other credential types, namely InteractiveBrowserCredential and VisualStudioCodeCredential. Both options are sketched below.
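For illustration, here is a minimal sketch of both options; it assumes your azure-identity version is recent enough to support additionally_allowed_tenants, and the tenant ID shown is just a placeholder:
import os
from azure.identity import DefaultAzureCredential

TENANT_ID = "00000000-0000-0000-0000-000000000000"  # placeholder tenant ID

# Option 1: set the environment variable, which other credential types also honor
os.environ["AZURE_TENANT_ID"] = TENANT_ID
credential = DefaultAzureCredential()

# Option 2: explicitly allow the credential chain to authenticate to this tenant
credential = DefaultAzureCredential(additionally_allowed_tenants=[TENANT_ID])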

gcs client library stopped working with dev_appserver

The Google Cloud Storage client library is returning a 500 error when I attempt to upload via the development server.
ServerError: Expect status [200] from Google Storage. But got status 500.
I haven't changed anything with the project and the code still works correctly in production.
I've attempted gcloud components update to get the latest dev_server and I've updated to the latest google cloud storage client library.
I've run gcloud init again to make sure credentials are loaded and I've made sure I'm using the correct bucket.
The project is running on Windows 10 with Python 2.7.
Any idea why this is happening?
Thanks
Turns out this has been a problem for a while.
It has to do with how blobstore filenames are generated.
https://issuetracker.google.com/issues/35900575
The fix is to monkeypatch this file:
google-cloud-sdk\platform\google_appengine\google\appengine\api\blobstore\file_blob_storage.py
def _FileForBlob(self, blob_key):
    """Calculate full filename to store blob contents in.

    This method does not check to see if the file actually exists.

    Args:
      blob_key: Blob key of blob to calculate file for.

    Returns:
      Complete path for file used for storing blob.
    """
    blob_key = self._BlobKey(blob_key)
    # Remove bad characters.
    import re
    blob_fname = re.sub(r"[^\w\./\\]", "_", str(blob_key))
    # Make sure it's a relative directory.
    if blob_fname and blob_fname[0] in "/\\":
        blob_fname = blob_fname[1:]
    return os.path.join(self._DirectoryForBlob(blob_key), blob_fname)
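If you would rather not edit the SDK file in place, a rough alternative is to monkeypatch the method at runtime from your own code before the blobstore is used. This is only a sketch: it assumes the class in file_blob_storage.py is named FileBlobStorage, so verify the name against your SDK version first.
import os
import re

from google.appengine.api.blobstore import file_blob_storage

def _patched_FileForBlob(self, blob_key):
    # Same logic as the patched method above: replace characters that are
    # not valid in Windows paths before building the blob's file name.
    blob_key = self._BlobKey(blob_key)
    blob_fname = re.sub(r"[^\w\./\\]", "_", str(blob_key))
    if blob_fname and blob_fname[0] in "/\\":
        blob_fname = blob_fname[1:]
    return os.path.join(self._DirectoryForBlob(blob_key), blob_fname)

# Assumes the class is named FileBlobStorage in this SDK version
file_blob_storage.FileBlobStorage._FileForBlob = _patched_FileForBlob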

Upload Large files to S3 without authentication in Python

I'm trying to upload large files to Amazon S3 without using credentials. I'm creating a plugin for Octoprint with this, and I can't put any sort of credentials into the code due to it being public. Currently my code for uploads looks like this:
import boto3
from botocore import UNSIGNED
from botocore.client import Config

# Create an S3 client that sends unsigned (anonymous) requests
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))

filename = 'file.txt'
bucket_name = 'BUCKET_HERE'
s3.upload_file(filename, bucket_name, filename)
However, it gives me the following error:
S3UploadFailedError: Failed to upload largefiletest.mp4 to BUCKETNAMEHERE/largefiletest.mp4: An error occurred (AccessDenied) when calling the CreateMultipartUpload operation: Anonymous users cannot initiate multipart uploads. Please authenticate.
Is there any way to work around this, or are there any suggestions for alternative libraries? Anything is appreciated.
Do you mean that the repository is public but the runtime environment is private? If so, the standard practice is to set environment variables, for example (this snippet assumes the django-environ package, which is imported as environ):
# pip install django-environ
import environ

env = environ.Env()
SOME_KEY = env('SOME_KEY', default='')
This way, you can easily update your credentials without changing your code or compromising security.
Edit:
Then, on the machine where this code will run, you can set the environment variables as described in these guides (a boto3-specific sketch follows below):
macOS: https://natelandau.com/my-mac-osx-bash_profile/
Linux: https://www.cyberciti.biz/faq/set-environment-variable-linux/
Windows: http://www.dowdandassociates.com/blog/content/howto-set-an-environment-variable-in-windows-command-line-and-registry/
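As a minimal sketch of how this applies to boto3 in particular: boto3 reads the standard AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION environment variables on its own, so the source code never has to contain a key (bucket and file names below are placeholders):
# Credentials come from the environment, e.g. set before running:
#   export AWS_ACCESS_KEY_ID=...
#   export AWS_SECRET_ACCESS_KEY=...
#   export AWS_DEFAULT_REGION=us-east-1
import boto3

s3 = boto3.client('s3')  # no keys anywhere in the source
s3.upload_file('file.txt', 'BUCKET_HERE', 'file.txt')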

Authentication to Google Cloud Python API Library stopped working

I am having problems with authentication in the Google Cloud API Python library.
At first it worked for some days without problems, but suddenly the API calls stopped showing up in the API overview of the Google Cloud Platform console.
I created a service account and stored the json file locally. Then I set the environment variable GCLOUD_PROJECT to the project ID and GOOGLE_APPLICATION_CREDENTIALS to the path of the json file.
from google.cloud import speech
client = speech.Client()
print(client._credentials.service_account_email)
prints the correct service account email.
The following code transcribes the audio_file successfully, but the Dashboard for my Google Cloud project doesn't show anything for the activated Speech API Graph.
import io

with io.open(audio_file, 'rb') as f:
    audio = client.sample(f.read(), source_uri=None, sample_rate=48000, encoding=speech.encoding.Encoding.FLAC)
alternatives = audio.sync_recognize(language_code='de-DE')
At some point the code also ran into some errors regarding the usage limit. I guess that, due to the unsuccessful authentication, some free/limited quota is being used instead.
I also tried the alternative option for authentication by installing the Google Cloud SDK and gcloud auth application-default login, but without success.
I have no idea where to start troubleshooting the problem.
Any help is appreciated!
(My system is running Windows 7 with Anaconda)
EDIT:
The error count ("Fehler" in the German dashboard) increases with calls to the API. How can I get detailed information about these errors?
Make sure you are using an absolute path when setting the GOOGLE_APPLICATION_CREDENTIALS environment variable. Also, you might want to try inspecting the access token using OAuth2 tokeninfo and make sure it has "scope": "https://www.googleapis.com/auth/cloud-platform" in its response.
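As a rough sketch of that check (it assumes application default credentials are already configured via GOOGLE_APPLICATION_CREDENTIALS and uses Google's public OAuth2 tokeninfo endpoint):
import requests
import google.auth
from google.auth.transport.requests import Request

# Load the application default credentials with an explicit scope
credentials, project = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(Request())

# Ask the tokeninfo endpoint which scopes the access token actually carries
resp = requests.get(
    "https://www.googleapis.com/oauth2/v3/tokeninfo",
    params={"access_token": credentials.token})
print(resp.json().get("scope"))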
Sometimes you will get different error information if you initialize the client with GRPC enabled:
0.24.0:
speech_client = speech.Client(_use_grpc=True)
0.23.0:
speech_client = speech.Client(use_gax=True)
Usually it's an encoding issue; can you try with the sample audio, or try generating LINEAR16 samples using something like the Unix rec tool:
rec --channels=1 --bits=16 --rate=44100 audio.wav trim 0 5
...
with io.open(speech_file, 'rb') as audio_file:
    content = audio_file.read()
    audio_sample = speech_client.sample(
        content,
        source_uri=None,
        encoding='LINEAR16',
        sample_rate=44100)
Other notes:
Sync Recognize is limited to 60 seconds of audio; you must use async recognition for longer audio
If you haven't already, set up billing for your account
Regarding the usage problem, the issue is in fact that when you use the new google-cloud library to access the ML APIs, it seems everyone authenticates against a project shared by everyone (hence it says you've used up your limit even though you haven't used anything). To check and confirm this, you can call an ML API that you have not enabled using the Python client library, and it will give you a result even though it shouldn't. This problem also occurs with the other language client libraries and operating systems, so I suspect it's an issue with their gRPC layer.
Because of this, to ensure consistency I always use the older googleapiclient library, which uses my API key. Here is an example using the Translate API:
from googleapiclient import discovery
service = discovery.build('translate', 'v2', developerKey='')
service_request = service.translations().list(q='hello world', target='zh')
result = service_request.execute()
print(result)
For the speech API, it's something along the lines of:
from googleapiclient import discovery
service = discovery.build('speech', 'v1beta1', developerKey='')
service_request = service.speech().syncrecognize()
result = service_request.execute()
print(result)
You can get the list of discovery APIs at https://developers.google.com/api-client-library/python/apis/, with the Speech API documented at https://developers.google.com/resources/api-libraries/documentation/speech/v1beta1/python/latest/.
One of the other benefits of using the discovery library is that you get many more options compared to the current library, although it is often a bit more of a pain to implement.

Prefered way of using Flask and S3 for large files

I know this is a little open ended, but I am confused as to what strategy/method to apply for a large file upload service developed using Flask and boto3. For smaller files it is fine, but it would be really nice to hear what you think when the size exceeds 100 MB.
What I have in mind are following -
a) Stream the file to the Flask app using some kind of AJAX uploader. (What I am trying to build is just a REST interface using Flask-Restful; any example of using these components, e.g. Flask-Restful, boto3 and streaming of large files, is welcome.) The upload app is going to be (I believe) part of a microservices platform that we are building. I do not know whether there will be an Nginx proxy in front of the Flask app or whether it will be served directly from a Kubernetes pod/service. In case it is served directly, is there something I have to change for large file uploads in the Kubernetes and/or Flask layer?
b) Use a direct JS uploader (like http://www.plupload.com/), stream the file into the S3 bucket directly, and when finished get the URL, pass it to the Flask API app and store it in the DB. The problem with this is that the credentials need to be somewhere in the JS, which is a security threat. (Not sure if there are other concerns.)
Which of these (or something different I did not think about at all) do you think is the best way, and where can I find a code example for it?
Thanks in advance.
[EDIT]
I have found this - http://blog.pelicandd.com/article/80/streaming-input-and-output-in-flask - where the author is dealing with a situation similar to mine and proposes a solution, but he is opening a file already present on disk. What if I want to upload the file directly, as it comes in, as one single object in an S3 bucket? I feel this can be the basis of a solution, but not the solution itself.
Alternatively you can use the Minio-py client library; it's open source and compatible with the S3 API, and it handles multipart uploads for you natively.
A simple put_object.py example:
import os

from minio import Minio
from minio.error import ResponseError

client = Minio('s3.amazonaws.com',
               access_key='YOUR-ACCESSKEYID',
               secret_key='YOUR-SECRETACCESSKEY')

# Put a file with the default content-type.
try:
    file_stat = os.stat('my-testfile')
    file_data = open('my-testfile', 'rb')
    client.put_object('my-bucketname', 'my-objectname', file_data, file_stat.st_size)
except ResponseError as err:
    print(err)

# Put a file with the 'application/csv' content-type.
try:
    file_stat = os.stat('my-testfile.csv')
    file_data = open('my-testfile.csv', 'rb')
    client.put_object('my-bucketname', 'my-objectname', file_data,
                      file_stat.st_size, content_type='application/csv')
except ResponseError as err:
    print(err)
You can find the complete list of API operations with examples here
Installing Minio-Py library
$ pip install minio
Hope it helps.
Disclaimer: I work for Minio
As far as I know, Flask can only hold the whole HTTP request body in memory; there is no built-in disk buffering.
The Nginx upload module is a really good way to do large file uploads; the documentation is here.
You can also use HTML5 or Flash to send chunked file data and process the data in Flask, but it's complicated.
Also look into whether S3 offers a one-time token; a pre-signed URL is the closest thing (see the sketch below).
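As a hedged sketch of that idea: boto3 can generate a pre-signed URL on the server side, which the browser-side uploader can then PUT to directly, so no long-lived credentials ever reach the JS. The bucket and key names below are placeholders:
import boto3

s3 = boto3.client('s3')  # server-side client, uses the server's own credentials

# Generate a short-lived URL that allows a single PUT of this object
presigned_url = s3.generate_presigned_url(
    'put_object',
    Params={'Bucket': 'my-bucket', 'Key': 'uploads/file.bin'},
    ExpiresIn=3600,  # seconds
)

# Hand presigned_url to the JS uploader; it can PUT the file body to it
print(presigned_url)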
Using the link I posted above, I finally ended up doing the following. Please tell me if you think it is a good solution.
import boto3
from flask import Flask, request
.
.
.
@app.route('/upload', methods=['POST'])
def upload():
    s3 = boto3.resource('s3', aws_access_key_id="key", aws_secret_access_key='secret', region_name='us-east-1')
    s3.Object('bucket-name', 'filename').put(Body=request.stream.read(CHUNK_SIZE))
.
.
.
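A possible refinement of the snippet above, sketched here under the assumption that streaming the whole body is the goal (bucket and key names are placeholders): boto3's upload_fileobj reads from a file-like object such as request.stream in chunks and performs a multipart upload itself, so the whole file never has to fit in a single read() call.
import boto3
from flask import Flask, request

app = Flask(__name__)
s3 = boto3.client('s3')  # credentials come from the environment or an instance role

@app.route('/upload', methods=['POST'])
def upload():
    # upload_fileobj streams from the file-like request body and handles
    # multipart uploads for large bodies automatically
    s3.upload_fileobj(request.stream, 'bucket-name', 'filename')
    return 'uploaded', 201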
