Migrating from Amazon S3 to Azure Storage (Django web app) - python

I maintain this Django web app where users congregate and chat with one another. They can post pictures too if they want. I process these photos (i.e. optimize their size) and store them on an Amazon S3 bucket (like a 'container' in Azure Storage). To do that, I set up the bucket on Amazon, and included the following configuration code in my settings.py:
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
AWS_S3_FORCE_HTTP_URL = True
AWS_QUERYSTRING_AUTH = False
AWS_SECRET_ACCESS_KEY = os.environ.get('awssecretkey')
AWS_ACCESS_KEY_ID = os.environ.get('awsaccesskeyid')
AWS_S3_CALLING_FORMAT = 'boto.s3.connection.OrdinaryCallingFormat'
AWS_STORAGE_BUCKET_NAME = 'dakadak.in'
Additionally, Boto 2.38.0 and django-storages 1.1.8 are installed in my virtual environment. Boto is a Python package that provides interfaces to Amazon Web Services, whereas django-storages is a collection of custom storage backends for Django.
I now want to stop using Amazon S3 and instead migrate to Azure Storage. That is, I need to migrate all existing files from S3 to Azure Storage, and then I need to configure my Django app to save all new static assets on Azure Storage.
I can't find precise documentation on what I'm trying to achieve, though I do know django-storages supports Azure. Has anyone done this kind of migration before and can point out where I need to begin and what steps to follow to get everything up and running?
Note: Ask me for more information if you need it.

In my experience, migrating a Django web app from Amazon S3 to Azure Storage takes two steps.
The first step is moving all files from S3 to Azure Blob Storage. There are two ways you can try.
The first way is to use the S3 and Azure Storage tools to move files from S3 to a local directory and then to Azure Blob Storage.
These S3 tools can help move files to a local directory:
AWS Command Line Interface (https://aws.amazon.com/cli/): aws s3 cp s3://BUCKET/FOLDER localfolder --recursive
S3cmd Tools (http://s3tools.org/): s3cmd -r sync s3://BUCKET/FOLDER localfolder
S3 Browser (http://s3browser.com/): a GUI client.
For moving local files to Azure Blob Storage, you can use the AzCopy command-line utility, which offers high-performance uploading, downloading, and copying of data to and from Microsoft Azure Blob, File, and Table storage. Please refer to https://azure.microsoft.com/en-us/documentation/articles/storage-use-azcopy/.
Example: AzCopy /Source:C:\myfolder /Dest:https://myaccount.blob.core.windows.net/mycontainer
The second way is to migrate programmatically with the Amazon S3 and Azure Blob Storage APIs in a language you are familiar with, such as Python or Java. Please refer to their API usage docs: https://azure.microsoft.com/en-us/documentation/articles/storage-python-how-to-use-blob-storage/ and http://docs.aws.amazon.com/AmazonS3/latest/API/APIRest.html.
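If you go the programmatic route, here is a minimal sketch of that idea using the newer boto3 and azure-storage-blob packages rather than the boto 2 / legacy SDKs mentioned above; the bucket, container, account and environment-variable names are placeholders you would replace with your own:
import os

import boto3
from azure.storage.blob import BlobServiceClient

# Source: S3 (credentials come from the usual AWS env vars / config).
s3 = boto3.client('s3')

# Destination: Azure Blob Storage (account URL and key are placeholders).
blob_service = BlobServiceClient(
    account_url='https://myaccount.blob.core.windows.net',
    credential=os.environ['AZURE_ACCOUNT_KEY'],
)
container = blob_service.get_container_client('mycontainer')

# Copy every object, keeping the same key so existing paths stay valid.
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='dakadak.in'):
    for obj in page.get('Contents', []):
        data = s3.get_object(Bucket='dakadak.in', Key=obj['Key'])['Body'].read()
        container.upload_blob(name=obj['Key'], data=data, overwrite=True)
Reading each object fully into memory keeps the sketch simple; for very large files you would stream instead.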
The second step is to follow http://django-storages.readthedocs.org/en/latest/backends/azure.html to reconfigure your Django settings.py file. django-storages will then store any uploads automatically in your storage container:
DEFAULT_FILE_STORAGE='storages.backends.azure_storage.AzureStorage'
AZURE_ACCOUNT_NAME='myaccount'
AZURE_ACCOUNT_KEY='account_key'
AZURE_CONTAINER='mycontainer'
You can find these settings in the Azure Portal, as #neolursa said.
Edit: (Screenshots of where to find these settings on the old and new Azure portals were attached here.)
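Once the settings above are in place, a quick way to confirm Django is actually writing to the Azure container is a small sanity check with default_storage (the file path here is just an example):
from django.core.files.base import ContentFile
from django.core.files.storage import default_storage

# Saves through whatever DEFAULT_FILE_STORAGE points at (here, AzureStorage).
path = default_storage.save('test/hello.txt', ContentFile(b'hello from django-storages'))
print(default_storage.url(path))     # should be a *.blob.core.windows.net URL
print(default_storage.exists(path))  # True if the blob landed in the container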

The link you've shared has the configuration for Django to Azure Blobs. When I did this, all I had to do was go to the Azure portal and, under the storage account, get the access keys. Then create a container and give the container name to the Django configuration. That should be enough; however, I did this a while ago.
For the second part, migrating the current files from the S3 bucket to Blob storage, there are a couple of tools you can use:
If you are using Visual Studio, you can find the blobs under Server Explorer after entering your Azure account credentials in Visual Studio.
Alternatively, you can use third-party tools like Azure Storage Explorer or CloudBerry Explorer.
Hope this helps!

Related

Zip deploy azure function from storage account

I'm trying to zip deploy an Azure Function from blob storage.
I have set SCM_DO_BUILD_DURING_DEPLOYMENT to true.
I have also set WEBSITE_RUN_FROM_PACKAGE to the remote url.
I am able to deploy easily if the function is in a remote url. However, I can't seem to do it if I have it as a blob on azure.
The preferred runtime for this is Python.
To zip deploy from a storage account, you need to navigate to your .zip blob in the storage account and get a generated SAS token for that blob.
Then add that URL to WEBSITE_RUN_FROM_PACKAGE in your Function App's application settings.
NOTE: This option is the only one supported for running from a package on Linux hosted in a Consumption plan.
For more information, you can refer to Run your functions from a package file in Azure.
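If you prefer scripting it instead of generating the SAS token in the portal, here is a rough sketch with the azure-storage-blob (v12) Python package; the account, container, blob name and key are placeholders:
from datetime import datetime, timedelta

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

account = 'myaccount'
container = 'deployments'
blob_name = 'functionapp.zip'

# Read-only SAS token for the package blob, valid for one year.
sas = generate_blob_sas(
    account_name=account,
    container_name=container,
    blob_name=blob_name,
    account_key='<storage-account-key>',
    permission=BlobSasPermissions(read=True),
    expiry=datetime.utcnow() + timedelta(days=365),
)

# This is the value to put in WEBSITE_RUN_FROM_PACKAGE.
package_url = f'https://{account}.blob.core.windows.net/{container}/{blob_name}?{sas}'
print(package_url)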

How to create permanent files on Heroku?

I have a telegram bot with a Postgres DB hosted on a Heroku free dyno. In one stage of my code, I want to save pickled files permanently so that I can access them later. Storing them in a table doesn't feel like a good idea, as the data is a nested class with a variable number of inputs.
The problem is that Heroku deletes these files frequently, or at least on each restart or push. Is there any way to tackle this problem?
You have to use an external service such as AWS S3, GCP Cloud Storage (buckets), or Azure Blob Storage for that. Or you may consider using an add-on such as Felix Cloud Storage, Cloud Cube, Bucketeer, or HDrive for easy integration.
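For the pickled-files case specifically, a minimal sketch with boto3 and S3 could look like the following; the bucket name is a placeholder and the AWS credentials are assumed to be set as Heroku config vars:
import pickle

import boto3

s3 = boto3.client('s3')  # reads AWS credentials from the environment
BUCKET = 'my-telegram-bot-state'  # placeholder bucket name

def save_state(obj, key):
    # Pickle in memory and store the bytes as an S3 object.
    s3.put_object(Bucket=BUCKET, Key=key, Body=pickle.dumps(obj))

def load_state(key):
    # Fetch the object and unpickle it.
    body = s3.get_object(Bucket=BUCKET, Key=key)['Body'].read()
    return pickle.loads(body)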
Here is what the documentation states:
The Heroku filesystem is ephemeral - that means that any changes to
the filesystem whilst the dyno is running only last until that dyno is
shut down or restarted. Each dyno boots with a clean copy of the
filesystem from the most recent deploy. This is similar to how many
container based systems, such as Docker, operate.
In addition, under normal operations dynos will restart every day in a
process known as "Cycling".
These two facts mean that the filesystem on Heroku is not suitable for
persistent storage of data. In cases where you need to store data we
recommend using a database addon such as Postgres (for data) or a
dedicated file storage service such as AWS S3 (for static files). If
you don't want to set up an account with AWS to create an S3 bucket we
also have addons here that handle storage and processing of static
assets https://elements.heroku.com/addons

How to download file from website to S3 bucket without having to download to local machine

I'm trying to download a dataset from a website. However, all the files I want to download add up to about 100 GB, which I don't want to download to my local machine and then upload to S3. Is there a way to download directly to an S3 bucket? Or do you have to use EC2, and if so, could somebody give brief instructions on how to do this? Thanks
S3's put_object() method supports a Body parameter for bytes (or a file):
Python example:
response = client.put_object(
    Body=b'bytes'|file,
    Bucket='string',
    Key='string',
)
So if you download a webpage, in Python you'd use the requests.get() method (in .NET you'd use either HttpWebRequest or WebClient) and then upload the file as a byte array, so you never need to save it locally. It can all be done in memory.
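For example, a rough Python sketch of that in-memory approach with requests and boto3 (the URL, bucket and key are placeholders) might look like this:
import boto3
import requests

s3 = boto3.client('s3')

url = 'https://example.com/dataset/part-001.csv'
with requests.get(url, stream=True) as resp:
    resp.raise_for_status()
    resp.raw.decode_content = True  # transparently handle gzip/deflate
    # upload_fileobj streams the response body in chunks,
    # so the full 100 GB never has to fit in memory or on disk.
    s3.upload_fileobj(resp.raw, 'my-bucket', 'dataset/part-001.csv')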
Or do you have to use ec2
An EC2 instance is just a VM in the cloud; you can do this task (download 100 GB to S3) programmatically from your desktop PC/laptop. Simply open a command window or a terminal and type:
aws configure
Put in an IAM user's credentials and use the AWS CLI, or use an AWS SDK like the Python example above. You can give the S3 bucket a policy document that allows access to the IAM user. This will download everything through your local machine.
If you want to run this on an EC2 instance and avoid downloading everything to your local PC, modify the role assigned to the EC2 instance and give it Put privileges on S3. This will be the easiest and most secure option. If you use the in-memory bytes approach, it will download all the data but won't save it to disk.

interface between google colaboratory and google cloud

From google colaboratory, if I want to read/write to a folder in a given bucket created in google cloud, how do I achieve this?
I have created a bucket, a folder within the bucket and uploaded bunch of images into it. Now from colaboratory, using jupyter notebook, want to create multiple sub-directories to organise these images into train, validation and test folders.
Subsequently access respective folders for training, validating and testing the model.
With Google Drive, we just update the path to point to a specific directory with the following commands, after authentication.
import sys
sys.path.append('drive/xyz')
We do something similar on the desktop version as well:
import os
os.chdir(local_path)
Does something similar exist for Google Cloud Storage?
The Colaboratory FAQs have a procedure for reading and writing a single file, where we need to set the entire path. That would be tedious for reorganising a main directory into sub-directories and accessing them separately.
In general it's not a good idea to try to mount a GCS bucket on the local machine (which would allow you to use it as you mentioned). From Connecting to Cloud Storage buckets:
Note: Cloud Storage is an object storage system that does not have the
same write constraints as a POSIX file system. If you write data
to a file in Cloud Storage simultaneously from multiple sources, you
might unintentionally overwrite critical data.
Assuming you'd like to continue regardless of the warning, if you use a Linux OS you may be able to mount it using the Cloud Storage FUSE adapter. See related How to mount Google Bucket as local disk on Linux instance with full access rights.
The recommended way to access GCS from Python apps is the Cloud Storage client library, but accessing files will be different from your snippets. You can find some examples in Python Client for Google Cloud Storage:
from google.cloud import storage
client = storage.Client()
# https://console.cloud.google.com/storage/browser/[bucket-id]/
bucket = client.get_bucket('bucket-id-here')
# Then do other things...
blob = bucket.get_blob('remote/path/to/file.txt')
print(blob.download_as_string())
blob.upload_from_string('New contents!')
blob2 = bucket.blob('remote/path/storage.txt')
blob2.upload_from_filename(filename='/local/path.txt')
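For the original goal of organising the images into train/validation/test "folders", remember GCS has no real directories, only key prefixes, so a rough sketch (bucket and prefix names are assumptions) would copy each blob under a new prefix:
import random

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('my-image-bucket')  # placeholder bucket name

# Copy each image under a train/, validation/ or test/ prefix (80/10/10 split).
for blob in bucket.list_blobs(prefix='images/'):
    split = random.choices(['train', 'validation', 'test'], weights=[0.8, 0.1, 0.1])[0]
    new_name = blob.name.replace('images/', split + '/', 1)
    bucket.copy_blob(blob, bucket, new_name)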
Update:
The Colaboratory docs recommend another method that I forgot about, based on the Google API Client Library for Python. Note that it also doesn't operate like a regular filesystem; it uses an intermediate file on the local filesystem:
uploading files to GCS
downloading files from GCS
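From memory, the upload side of that intermediate-file pattern looks roughly like the sketch below (write the file on the Colab VM first, then push it to the bucket); the bucket and file names are placeholders, so double-check against the current Colab docs:
from google.colab import auth
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

auth.authenticate_user()               # interactive auth inside Colab
gcs_service = build('storage', 'v1')   # Cloud Storage JSON API client

# Upload a file that was first written to the Colab VM's local disk.
media = MediaFileUpload('/tmp/model.h5', resumable=True)
request = gcs_service.objects().insert(bucket='my-image-bucket',
                                       name='models/model.h5',
                                       media_body=media)
response = None
while response is None:
    _, response = request.next_chunk()  # upload in chunks until done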

Python azure module: how to create a new deployment

azure.servicemanagementservice.py contains:
def create_deployment(self, service_name, deployment_slot, name,
                      package_url, label, configuration,
                      start_deployment=False,
                      treat_warnings_as_error=False,
                      extended_properties=None):
What are package_url and configuration? The method comment indicates:
package_url:
A URL that refers to the location of the service package in the
Blob service. The service package can be located either in a
storage account beneath the same subscription or a Shared Access
Signature (SAS) URI from any storage account.
....
configuration:
The base-64 encoded service configuration file for the deployment.
All over the internet there are references to Visual Studio and PowerShell for creating those files. What do they look like? Can I create them manually? Can the azure module create them? Why is the Microsoft service so confusing and lacking in documentation?
I am using the Python Azure SDK (https://pypi.python.org/pypi/azure). I am running Mac OS X on my dev box, so I don't have Visual Studio or cspack.exe.
Any help appreciated. Thank you.
According to your description, it looks like you are trying to use the Python Azure SDK to create a cloud service deployment. Here is the documentation on how to use the create_deployment function.
Can I manually create them? Can azure module create them?
If you mean you want to know how to create an Azure deployment package for your Python app, based on my experience there are several options you can leverage.
If you have Visual Studio, you can create a cloud project from the project templates and package the project in one click. In VS: create a new project -> Cloud. (Screenshots of the project template and packaging steps were attached here.)
Without VS, you can use the Microsoft Azure PowerShell cmdlets or the cspack command-line tool to create a deployment package. A similar question can be found at: Django project to Azure Cloud Services without Visual Studio.
After packaging the project, you will have a .cspkg file.
For your reference, I have uploaded the test project at:
https://onedrive.live.com/redir?resid=7B27A151CFCEAF4F%21143283
As to the 'configuration' parameter, it is the base-64 encoded service configuration file (.cscfg) for the deployment.
In Python, we can set up the 'configuration' via the code below:
configuration = base64.b64encode(open('E:\\TestProjects\\Python\\YourProjectFolder\\ServiceConfiguration.Cloud.cscfg', 'rb').read())
Hopefully the above info provides a quick clarification. Now, let's go back to the Python SDK itself and see how we can use the create_deployment function to create a cloud service deployment.
Firstly, I'd suggest referring to https://azure.microsoft.com/en-us/documentation/articles/cloud-services-python-how-to-use-service-management/ to get a basic idea of what Azure Service Management is and how it works.
In general, we can make the create_deployment function work in five steps:
Create your project's deployment package and set up a configuration file (.cscfg). For a quick test, you can use the one I have uploaded.
Store your project's deployment package in a Microsoft Azure Blob Storage account under the same subscription as the hosted service to which the package is being uploaded. Get the blob file's URL (or use a Shared Access Signature (SAS) URI from any storage account). You can use Azure Storage Explorer to upload the package file, and then it will be shown in the Azure portal.
Use OpenSSL to create your management certificate. You need to create two certificates, one for the server (a .cer file) and one for the client (a .pem file); the article mentioned above provides detailed info: https://azure.microsoft.com/en-us/documentation/articles/cloud-services-python-how-to-use-service-management/
(A screenshot of the created certificates was attached here.)
Then upload the .cer certificate to the Azure portal: SETTINGS -> management certificates tab -> click the upload button (at the bottom of the page).
Create a cloud service in Azure and keep the name in mind.
Create another project to test the Azure SDK's create_deployment; here is a code snippet for your reference:
import base64
from azure.servicemanagement import ServiceManagementService

subscription_id = 'Your subscription ID, found in the Azure portal'
certificate_path = 'E:\\YourFolder\\mycert.pem'
sms = ServiceManagementService(subscription_id, certificate_path)

def TestForCreateADeployment():
    service_name = "Your Cloud Service Name"
    deployment_name = "name"
    slot = 'Production'
    package_url = ".cspkg file's URL - from your blob"
    configuration = base64.b64encode(
        open('E:\\TestProjects\\Python\\YourProjectFolder\\ServiceConfiguration.Cloud.cscfg', 'rb').read())
    label = service_name
    result = sms.create_deployment(service_name,
                                   slot,
                                   deployment_name,
                                   package_url,
                                   label,
                                   configuration)
    operation = sms.get_operation_status(result.request_id)
    print('Operation status: ' + operation.status)
(A screenshot of the running result was attached here.)
