How to find VHD files in Azure using Python

I want to create a VM from a disk in Azure using Python, so I need the VHD file of the disk.
How can I download or obtain the VHD file in Azure using Python? And where are the VHD files listed on the Azure portal?
I watched a video that said to look under Storage account -> Containers -> vhds, but I did not find anything like that. Then I found blob_service.get_blob_to_path, but that did not work for me either.
Can anyone help me with this problem?
I want to obtain the VHD file of the disk in Azure using Python.
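As a starting point, here is a minimal sketch of listing and downloading .vhd blobs with the legacy azure-storage SDK (the library that provides get_blob_to_path). The account, key and container names are placeholders, and note that a vhds container only exists when the VM uses unmanaged disks; managed disks are not exposed in your storage account.
from azure.storage.blob import BlockBlobService

blob_service = BlockBlobService(account_name='mystorageaccount',
                                account_key='<account-key>')

# List every blob ending in .vhd in the container that holds the disks
for blob in blob_service.list_blobs('vhds'):
    if blob.name.endswith('.vhd'):
        print(blob.name)

# Download one of them to a local file
blob_service.get_blob_to_path('vhds', 'my-disk.vhd', '/tmp/my-disk.vhd')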

Related

Azure - How to download a file from Azure Databricks Filestore?

I trained a model using Keras on Azure Databricks (in a notebook). I would like to save this model as an .h5 or .pkl file and download it to my local machine.
When I train the model locally I use the following to save the file inside a directory called models, but obviously this path does not exist on Azure.
model.save('models/cnn_w2v.h5')
I am new to Azure, so any help will be greatly appreciated.
Correct me if I'm wrong: you are executing this line in your Databricks notebook:
model.save('models/cnn_w2v.h5')
Right?
So if that's the case, your model is saved, but it is stored on the Azure cluster running behind the notebook, not on your local machine.
You need to upload this file to Azure Storage (just add code to the notebook that does that).
Later, you will be able to download it to your local machine.
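A minimal sketch of that idea (save locally on the driver, then push the file to Azure Blob Storage with the legacy azure-storage SDK). The account, key, container and file names are placeholders, and it assumes the azure-storage package is installed on the cluster.
from azure.storage.blob import BlockBlobService

model.save('/tmp/cnn_w2v.h5')   # save to local disk on the driver first

blob_service = BlockBlobService(account_name='mystorageaccount',
                                account_key='<account-key>')
blob_service.create_blob_from_path(container_name='models',
                                   blob_name='cnn_w2v.h5',
                                   file_path='/tmp/cnn_w2v.h5')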
I have found the answer to my question above here: how to download files from azure databricks file store
Files stored in /FileStore are accessible in your web browser at https://<databricks-instance>.cloud.databricks.com/files/. For example, the file you stored in /FileStore/my-stuff/my-file.txt is accessible at:
https://<databricks-instance>.cloud.databricks.com/files/my-stuff/my-file.txt
Note: if you are on Community Edition, you may need to use https://community.cloud.databricks.com/files/my-stuff/my-file.txt?o=######, where the number after o= is the same as in your Community Edition URL.
Refer: https://docs.databricks.com/user-guide/advanced/filestore.html
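Putting the two together for the Databricks case, a rough sketch (paths are placeholders, and dbutils is only available inside a Databricks notebook): save to the driver's local disk, then copy into DBFS so the file can be fetched from the /files/ URL described above.
model.save('/tmp/cnn_w2v.h5')
dbutils.fs.cp('file:/tmp/cnn_w2v.h5', 'dbfs:/FileStore/models/cnn_w2v.h5')

# The file should then be downloadable at:
#   https://<databricks-instance>.cloud.databricks.com/files/models/cnn_w2v.h5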

How to download file from website to S3 bucket without having to download to local machine

I'm trying to download a dataset from a website. However, all the files I want to download add up to about 100 GB, which I don't want to download to my local machine and then upload to S3. Is there a way to download directly to an S3 bucket? Or do you have to use EC2, and if so, could somebody give brief instructions on how to do this? Thanks
S3's put_object() method supports a Body parameter for bytes (or a file-like object):
Python example:
response = client.put_object(
Body=b'bytes'|file,
Bucket='string',
Key='string',
)
So if you download the file from the website, in Python you'd use requests.get() (in .NET you'd use either HttpWebRequest or WebClient) and then upload the content as a byte array, so you never need to save it locally. It can all be done in memory.
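For example, a rough sketch of that idea with requests and boto3 (the URL, bucket and key are placeholders); here upload_fileobj is used instead of put_object so the response body can be streamed without holding it all in memory:
import boto3
import requests

s3 = boto3.client('s3')

# Stream the remote file and hand the file-like response body to S3,
# so nothing is written to the local filesystem.
resp = requests.get('https://example.com/dataset/part-001.csv', stream=True)
resp.raise_for_status()
s3.upload_fileobj(resp.raw, Bucket='my-bucket', Key='dataset/part-001.csv')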
Or do you have to use EC2
An EC2 instance is just a VM in the cloud; you can do this task (download 100 GB to S3) programmatically from your desktop PC/laptop. Simply open a command window or a terminal and type:
aws configure
Put in an IAM user's credentials and use the AWS CLI, or use an AWS SDK like the Python example above. You can give the S3 bucket a policy document that allows access to that IAM user. With this approach, though, everything still passes through your local machine.
If you want to run this on an EC2 instance and avoid downloading everything to your local PC, modify the role assigned to the EC2 instance and give it Put privileges on S3. This will be the easiest and most secure. If you use the in-memory bytes approach it will still download all the data, but it won't save it to disk.

Interface between Google Colaboratory and Google Cloud

From Google Colaboratory, if I want to read from/write to a folder in a given bucket created in Google Cloud, how do I achieve this?
I have created a bucket and a folder within the bucket, and uploaded a bunch of images into it. Now, from Colaboratory, using a Jupyter notebook, I want to create multiple sub-directories to organise these images into train, validation and test folders.
Subsequently, I want to access the respective folders for training, validating and testing the model.
With Google Drive, we just update the path to point to a specific directory with the following commands, after authentication.
import sys
sys.path.append('drive/xyz')
We do something similar in the desktop version too:
import os
os.chdir(local_path)
Does something similar exist for Google Cloud Storage?
The Colaboratory FAQs have a procedure for reading and writing a single file, where we need to set the entire path. That would be tedious for re-organising a main directory into sub-directories and accessing them separately.
In general it's not a good idea to try to mount a GCS bucket on the local machine (which would allow you to use it as you mentioned). From Connecting to Cloud Storage buckets:
Note: Cloud Storage is an object storage system that does not have the same write constraints as a POSIX file system. If you write data to a file in Cloud Storage simultaneously from multiple sources, you might unintentionally overwrite critical data.
Assuming you'd like to continue regardless of the warning, if you use a Linux OS you may be able to mount it using the Cloud Storage FUSE adapter. See related How to mount Google Bucket as local disk on Linux instance with full access rights.
The recommended way to access GCS from Python apps is to use the Cloud Storage Client Libraries, but accessing files will be different than in your snippets. You can find some examples in Python Client for Google Cloud Storage:
from google.cloud import storage
client = storage.Client()
# https://console.cloud.google.com/storage/browser/[bucket-id]/
bucket = client.get_bucket('bucket-id-here')
# Then do other things...
blob = bucket.get_blob('remote/path/to/file.txt')
print(blob.download_as_string())
blob.upload_from_string('New contents!')
blob2 = bucket.blob('remote/path/storage.txt')
blob2.upload_from_filename(filename='/local/path.txt')
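On the sub-directory part of the question: GCS object names are flat, so organising images into train/validation/test "folders" just means copying (or rewriting) objects under new name prefixes. A rough sketch along those lines, with bucket and prefix names as placeholders:
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('bucket-id-here')

# "Move" each image into a train/ sub-folder by copying it under a new prefix
for blob in bucket.list_blobs(prefix='images/'):
    new_name = blob.name.replace('images/', 'images/train/', 1)
    bucket.copy_blob(blob, bucket, new_name)
    # bucket.delete_blob(blob.name)   # optionally remove the original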
Update:
The Colaboratory doc recommends another method that I forgot about, based on the Google API Client Library for Python. Note that it also doesn't operate like a regular filesystem; it uses an intermediate file on the local filesystem:
uploading files to GCS
downloading files from GCS:
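For completeness, a hedged sketch of the download side with the API client library, along the lines of the Colab docs (bucket and object names are placeholders); the object is first copied to a local file inside the Colab VM and then used like any other file:
from google.colab import auth
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload

auth.authenticate_user()               # interactive auth inside Colab
gcs_service = build('storage', 'v1')

# Copy the GCS object to a local file in the Colab VM
with open('/tmp/my-image.jpg', 'wb') as f:
    request = gcs_service.objects().get_media(bucket='bucket-id-here',
                                              object='images/my-image.jpg')
    downloader = MediaIoBaseDownload(f, request)
    done = False
    while not done:
        _, done = downloader.next_chunk()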

Azure python create empty vhd blob

I am using the Azure Python API to create a page blob with create_blob, writing the VHD header as described in the linked article (http://blog.stevenedouard.com/create-a-blank-azure-vm-disk-vhd-without-attaching-it/), and then writing my actual image data using update_page. But when I try to boot the VHD I get a provisioning error in Azure: "Could not provision the virtual machine". Can anyone please suggest what might be wrong?
I think there may be something wrong with your vhd image. I would suggest you have a look at this article.
Here is a snippet of that article:
Please make sure of the following when uploading a VHD for use with Azure VMs:
A VM must be generalized to use as an image from which you will create other VMs. For Windows, you generalize with the sysprep tool. For Linux you generalize with the Windows Azure Linux Agent (waagent). Provisioning will fail if you upload a VHD as an image that has not been generalized.
A VM must not be generalized if it will be used as a disk for only a single VM (and not as a base for other VMs). Provisioning will fail if you upload a VHD as a disk that has been generalized.
When using third-party storage tools for the upload, make sure to upload the VHD as a page blob (provisioning will fail if the VHD was uploaded as a block blob). Add-AzureVHD and Csupload will handle this for you. It is only with third-party tools that you could inadvertently upload as a block blob instead of a page blob.
Upload only fixed VHDs (not dynamic, and not VHDX). Windows Azure Virtual Machines do not support dynamic disks or the VHDX format.
Note: Using CSUPLOAD or Add-AzureVHD to upload VHDs automatically converts the dynamic VHDs to fixed VHDs.
Maximum size of VHD can be up to 127 GB. While data disks can be up to 1 TB, OS disks must be 127 GB or less.
The VM must be configured for DHCP and not assigned a static IP address. Windows Azure Virtual Machines do not support static IP addresses.
I think that there are two points that you could focus on.
1. The VHD blob should be a .vhd file, so your code should use blob_name='a-new-vhd.vhd'.
2. The storage account and the VM you create should be in the same location.
Hope it helps. Any concerns, please feel free to let me know.
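For reference, a rough sketch of the blank-VHD page blob creation the question describes, using the legacy azure-storage SDK along the lines of the linked article. The account, container and size are placeholders, and the 512-byte fixed-VHD footer itself has to be built separately, as the article shows.
from azure.storage.blob import PageBlobService

disk_size = 20 * 1024 * 1024 * 1024      # 20 GB; must be a multiple of 512
blob_service = PageBlobService(account_name='mystorageaccount',
                               account_key='<account-key>')

# The page blob is the disk plus a trailing 512-byte VHD footer,
# and the blob name must end in .vhd
blob_service.create_blob('vhds', 'a-new-vhd.vhd', disk_size + 512)

# footer = ...  # 512-byte fixed-VHD footer built as in the linked article
# blob_service.update_page('vhds', 'a-new-vhd.vhd', footer,
#                          start_range=disk_size, end_range=disk_size + 511)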

Copy files from S3 to Google Cloud storage using boto

I'm trying to automate copying of files from S3 to Google Cloud Storage inside a python script.
Everywhere I look people recommend using gsutil as a command line utility.
Does anybody know if this copies files directly? Or does it first download the files to the computer and then uploads them to GS?
Can this be done using the boto library and google's OAuth2 plugin?
This is what I've got from Google's documentation and a little bit of trial and error:
import boto
import StringIO  # Python 2; on Python 3 use io.BytesIO instead

src_uri = boto.storage_uri('bucket/source_file')
dst_uri = boto.storage_uri('bucket/destination_file', 'gs')

object_contents = StringIO.StringIO()
src_uri.get_key().get_file(object_contents)      # read the S3 object into memory
object_contents.seek(0)
dst_uri.set_contents_from_file(object_contents)  # write it to the GCS object
object_contents.close()
From what I understand, I'm reading from a file into an in-memory object on the host where the script is running, and then uploading that content to a file in GCS.
Is this right?
Since this question was asked, the GCS Transfer Service has become available. If you want to copy files from S3 to GCS without an intermediate machine, this is a great option.
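If you do want to script the copy yourself, as in the snippet above, a minimal sketch of the same in-memory approach with current libraries (boto3 and google-cloud-storage) might look like this; bucket and object names are placeholders, and the whole object still passes through the machine running the script:
import boto3
from google.cloud import storage

s3 = boto3.client('s3')
gcs = storage.Client()

# Read the S3 object into memory, then write it to GCS. Fine for small
# objects; large files would need a chunked/streaming approach instead.
obj = s3.get_object(Bucket='source-bucket', Key='source_file')
data = obj['Body'].read()

gcs.bucket('destination-bucket').blob('destination_file').upload_from_string(data)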
