I have never used Amazon Web Services, so I apologize for the naive question. I am looking to run my code on a cluster, since the quad-core architecture on my local machine doesn't seem to be doing the job. The documentation seems overwhelming and I don't even know which AWS services I would need in order to run my script on EC2. Would I have to use their storage facility (S3)? I assume that to run my script I would have to store it somewhere the cluster instance can access the files, or do I upload my files somewhere else while working with EC2? If so, is it possible to upload my entire directory, which holds all of the files required by my application, onto S3? So I guess my question is: do I have to use S3 to store my code in a place accessible by the cluster, and if so, is there an easy way to do it? I have only seen examples of creating buckets in which one file is transferred per bucket. Can you transfer an entire folder into a bucket? Any guidance would be much appreciated.
If S3 isn't required, which other service should I use to give the cluster access to the scripts it needs to execute?
Thanks in advance!
You do not need to use S3. You would likely want to use EBS to store your code if you need it to be preserved between instance launches. When you launch an instance you have the option to attach an EBS storage volume to it. Once the volume is attached (and, for a fresh volume, formatted and mounted), you can access it just like a disk on any physical machine. Copy your code up to the Amazon machine over SSH and fire away.
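For example, with the boto3 SDK, attaching an EBS volume at launch could look roughly like the sketch below. The AMI ID, key pair name, instance type and volume size are placeholders, not specific recommendations.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch one instance with an extra 50 GB EBS volume attached at /dev/sdf.
# DeleteOnTermination=False keeps the volume (and your code) around after
# the instance is terminated.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="c5.xlarge",
    KeyName="my-key-pair",             # placeholder key pair used for SSH access
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[{
        "DeviceName": "/dev/sdf",
        "Ebs": {"VolumeSize": 50, "VolumeType": "gp3", "DeleteOnTermination": False},
    }],
)
print(response["Instances"][0]["InstanceId"])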
I've almost run out of space on my C: drive and I'm currently working for myself remotely. I want to purchase cloud storage that will act as a mounted drive, so that I can do the following:
Store all of my Python projects along with any other files
Run my Python scripts in VS Code (or any IDE) straight from the drive
Create virtual environments for my Python projects that will be stored on the drive
Set up APIs, from Python scripts stored on this drive, to other programs (eg GA or Heroku) so I can push and pull data as required
I just purchased OneDrive thinking I'd be able to do this, but according to the answer in this SO post it's not a good idea. This article describes the exact behaviour that I'm after, and pCloud looks like a good option given its security, but I can't find many resources on its compatibility with Python.
Google Cloud, AWS and Azure are all out of my price range and look too complex for what I'm after. My cloud computing knowledge is fairly limited, but I was wondering if anyone has experience of running Python scripts in the cloud (from pulling data from a warehouse to hosting an application in the public domain) without using one of the big cloud computing companies?
I have a Telegram bot with a Postgres DB hosted on a Heroku free dyno. In one stage of my code, I want to save pickled files permanently so that I can access them later. Storing them in a table doesn't feel like a good idea, as the data is a nested class with a variable number of inputs.
The problem is that Heroku deletes these files frequently, or at least on each restart or push. Is there any way to tackle this problem?
You have to use an external service such as AWS S3, GCP Cloud Storage (buckets), or Azure Blob Storage for that. Or you may consider using an add-on such as Felix Cloud Storage, Cloud Cube, Bucketeer, or HDrive for easy integration.
Here is what the documentation states:
The Heroku filesystem is ephemeral - that means that any changes to the filesystem whilst the dyno is running only last until that dyno is shut down or restarted. Each dyno boots with a clean copy of the filesystem from the most recent deploy. This is similar to how many container based systems, such as Docker, operate.
In addition, under normal operations dynos will restart every day in a process known as "Cycling".
These two facts mean that the filesystem on Heroku is not suitable for persistent storage of data. In cases where you need to store data we recommend using a database addon such as Postgres (for data) or a dedicated file storage service such as AWS S3 (for static files). If you don't want to set up an account with AWS to create an S3 bucket we also have addons here that handle storage and processing of static assets https://elements.heroku.com/addons
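As an example of the S3 route, persisting a pickled object with boto3 might look roughly like this. The bucket name is a placeholder, and AWS credentials are assumed to be configured (e.g. in Heroku config vars).

import pickle
import boto3

s3 = boto3.client("s3")
BUCKET = "my-bot-storage"  # placeholder bucket name

def save_state(key, obj):
    # Serialize the object and store it as an S3 object.
    s3.put_object(Bucket=BUCKET, Key=key, Body=pickle.dumps(obj))

def load_state(key):
    # Fetch the object back and unpickle it.
    body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    return pickle.loads(body)

This survives dyno restarts and pushes because the data never lives on the dyno's filesystem.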
So I am supposed to create an application that uploads files (images) to an AWS EC2 instance running Linux. I used Flask to create the upload logic and a simple HTML webpage to upload files.
After this I am supposed to generate thumbnails and web-optimized images from these uploads, store them in a directory, and then provide the "web-optimized" image directory for download. How do I achieve this?
I have asked this previously here. I have pasted the code that I am using on that thread as well.
So my questions are:
Is it a good idea to use pscp to transfer files to the EC2 instance?
Is it a good idea to use paramiko to open an SSH session on the remote instance in order to invoke a shell script that does the image processing (thumbnail and web-optimized image generation using Python)?
How do I get the compressed images via a simple download button on the client/host computer?
Thanks.
I am not entirely sure whether you are required to use only an EC2 instance. If not, I would recommend something like the following for a robust solution.
Upload the files to an AWS S3 bucket; it is much cheaper for holding all the original images. Let us call this bucket images_original.
Now you can write a Lambda function that monitors the bucket: every time a new upload happens, it processes the image to create thumbnails etc. for different resolutions and uploads them to their own buckets (images_thumbnails, images_640_480, etc.).
To transform the images, I don't think you should SSH into a remote instance to run the transform. It is better to use a native Python library such as Pillow inside the Lambda itself; a rough sketch follows.
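A minimal sketch of such a Lambda handler, assuming an S3 "ObjectCreated" trigger on the originals bucket. The destination bucket name and thumbnail size are placeholders, and Pillow has to be bundled with the function as noted further down.

import io
import boto3
from PIL import Image

s3 = boto3.client("s3")
THUMBNAIL_BUCKET = "images-thumbnails"  # placeholder destination bucket

def handler(event, context):
    # Each record describes one newly uploaded object in images_original.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

        image = Image.open(io.BytesIO(body)).convert("RGB")
        image.thumbnail((128, 128))  # shrink in place, preserving aspect ratio

        out = io.BytesIO()
        image.save(out, format="JPEG")

        s3.put_object(Bucket=THUMBNAIL_BUCKET, Key=key, Body=out.getvalue())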
On the client, you can simply access the respective file in the bucket. E.g. if you want a thumbnail, you can access thumbnail_bucket_url/file_name.jpeg, etc.
Additionally, you can have a service that gets signed URLs from S3.
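For the signed URLs, a hedged sketch with boto3 (bucket and key names are placeholders):

import boto3

s3 = boto3.client("s3")

# Time-limited download link for a processed image.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "images-thumbnails", "Key": "photo.jpeg"},
    ExpiresIn=3600,  # link is valid for one hour
)
print(url)  # hand this URL to the download button on the client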
To learn how to monitor an S3 bucket using a Lambda, you can refer to THIS.
One important thing to remember while creating the Lambda: all the libraries it depends on have to be bundled and uploaded as a zip file,
e.g. pip install Pillow -t (installing into the directory that will be zipped).
Currently, I know of two ways to download a file from a bucket to your computer
1) Manually go to the bucket and click Download
2) Use gsutil
Is there a way to do this programmatically from a Google Cloud Function? You can't execute gsutil in a Python script. I also tried this:
with open("C:\", "wb") as file_obj:
blob.download_to_file(file_obj)
but I believe that looks for a directory on Google Cloud. Please help!
The code you are using is for downloading files to the machine that runs it; this is covered in this document. However, as you mention, inside a Cloud Function that path would refer to the machine executing the Cloud Function, not your computer.
I would suggest creating a cron job on your local machine to download the files through gsutil; doing this from a Cloud Function is not going to be possible.
Hope you find this useful.
You can't directly achieve what you want: the Cloud Function would have to be able to reach your local computer in order to copy the file onto it.
The common way to send a file to a computer is the FTP protocol. So you could install an FTP server on your computer and set up your function to read from your bucket and then send the file to your FTP server (you have to get your public IP and make sure the firewall rules/routers are configured for this, ...). It's not the easiest way.
gsutil with the rsync command works perfectly (see the sketch below). Use a scheduled task on Windows if you want to run this on a schedule.
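If you prefer to stay in Python on the local machine, a rough equivalent of gsutil rsync could look like this, run on a schedule (Task Scheduler on Windows, cron elsewhere). The bucket name and destination folder are placeholders.

import os
from google.cloud import storage

def download_bucket(bucket_name, destination_dir):
    client = storage.Client()
    for blob in client.list_blobs(bucket_name):
        if blob.name.endswith("/"):      # skip "folder" placeholder objects
            continue
        local_path = os.path.join(destination_dir, blob.name)
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        blob.download_to_filename(local_path)  # writes the object to disk

if __name__ == "__main__":
    download_bucket("my-bucket", r"C:\downloads")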
I am using the Azure Python API to create a page blob with create_blob and writing the VHD header using the approach described at http://blog.stevenedouard.com/create-a-blank-azure-vm-disk-vhd-without-attaching-it/, then writing my actual image data with update_page. But when I try to boot the VHD I get a provisioning error in Azure: "Could not provision the virtual machine". Can anyone please suggest what might be wrong?
I think there may be something wrong with your VHD image. I would suggest you have a look at this article.
Here is a snippet of that article:
Please make sure of the following when uploading a VHD for use with Azure VMs:
A VM must be generalized to use as an image from which you will create other VMs. For Windows, you generalize with the sysprep tool. For Linux you generalize with the Windows Azure Linux Agent (waagent). Provisioning will fail if you upload a VHD as an image that has not been generalized.
A VM must not be generalized to use as a disk, i.e. to use only as a single VM (and not base other VMs on it). Provisioning will fail if you upload a VHD as a disk that has been generalized.
When using third-party storage tools for the upload, make sure to upload the VHD as a page blob (provisioning will fail if the VHD was uploaded as a block blob). Add-AzureVHD and Csupload will handle this for you. It is only with third-party tools that you could inadvertently upload as a block blob instead of a page blob.
Upload only fixed VHDs (not dynamic, and not VHDX). Windows Azure Virtual Machines do not support dynamic disks or the VHDX format.
Note: Using CSUPLOAD or Add-AzureVHD to upload VHDs automatically converts the dynamic VHDs to fixed VHDs.
Maximum size of VHD can be up to 127 GB. While data disks can be up to 1 TB, OS disks must be 127 GB or less.
The VM must be configured for DHCP and not assigned a static IP address. Windows Azure Virtual Machines do not support static IP addresses.
I think there are two points you could focus on.
1. The VHD file should be a .vhd file, so your code should use blob_name='a-new-vhd.vhd'.
2. The storage account and the VM you create should be in the same location.
Hope it helps. Any concerns, please feel free to let me know.
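As a rough illustration of point 1, using the legacy azure-storage PageBlobService API (the one that exposes create_blob and update_page, as in your code). The account details, container name and sizes are placeholders, and this is only a sketch of the blob setup, not a full VHD writer.

from azure.storage.blob import PageBlobService

# Placeholders - substitute your own account, container and sizes.
blob_service = PageBlobService(account_name="myaccount", account_key="mykey")

container_name = "vhds"
blob_name = "a-new-vhd.vhd"            # the blob name should end in .vhd
size = 20 * 1024 * 1024 * 1024 + 512   # disk size plus the 512-byte VHD footer

# Create an empty page blob of the full (fixed) VHD size.
blob_service.create_blob(container_name, blob_name, size)

# Pages must be written in 512-byte-aligned ranges, e.g. the first 4 KB:
data = b"\x00" * 4096
blob_service.update_page(container_name, blob_name, data,
                         start_range=0, end_range=len(data) - 1)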