Local access to Amazon S3 Bucket from EC2 instance - python

I have an EC2 instance and an S3 bucket in the same region. The bucket contains reasonably large (5-20 MB) files that are used regularly by my EC2 instance.
I want to programmatically open the file on my EC2 instance (using Python). Like so:
file_from_s3 = open('http://s3.amazonaws.com/my-bucket-name/my-file-name')
But using an "http" URL to access the file remotely seems grossly inefficient; surely this means downloading the file to the server every time I want to use it.
What I want to know is, is there a way I can access S3 files locally from my EC2 instance, for example:
file_from_s3 = open('s3://my-bucket-name/my-file-name')
I can't find a solution myself; any help would be appreciated, thank you.

Whatever you do, the object will be downloaded behind the scenes from S3 onto your EC2 instance. That cannot be avoided.
If you want to treat files in the bucket as local files, you need to install one of the several S3 filesystem plugins for FUSE (for example, s3fs-fuse). Alternatively, you can use boto for easy access to S3 objects from Python code.
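If you go the boto route, a minimal boto3 sketch could look like this (it still downloads the object's bytes to the instance, just through the SDK rather than a raw HTTP URL; the bucket and key names are the ones from the question):
import boto3

s3 = boto3.client('s3')

# Read the whole object into memory (still a download under the hood).
body = s3.get_object(Bucket='my-bucket-name', Key='my-file-name')['Body'].read()

# Or save it to a local file first and open() it like any other file.
s3.download_file('my-bucket-name', 'my-file-name', '/tmp/my-file-name')
with open('/tmp/my-file-name', 'rb') as f:
    data = f.read()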

Related

Python: how to download a file from S3 and then reuse it

How can I download a file from S3 and then reuse it, instead of downloading it again every time the endpoint is called?
@app.route("/get", methods=['GET'])
def get():
    s3 = boto3.resource('s3')
    ids = pickle.loads(s3.Bucket('bucket').Object('file').get()['Body'].read())
    ...
Based on the comments.
Since the files are large (16 GB) and need to be read and updated often, an EFS filesystem could be used for their storage instead of S3:
Amazon Elastic File System (Amazon EFS) provides a simple, serverless, set-and-forget elastic file system for use with AWS Cloud services and on-premises resources.
EFS provides NFS filesystems that you can mount to your instance, or even to multiple instances at the same time. You can also mount the same filesystem to ECS containers and Lambda functions.
Since EFS provides a regular filesystem, you can read and write the files directly on it. There is no need to copy them first, as with S3, which is object storage (not a filesystem).
It's worth pointing out that the convenience of EFS costs more than using S3. However, if cost is a concern, you can reduce it by using the recently released Amazon EFS One Zone storage class.
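Once the EFS filesystem is mounted on the instance, the files can be read and updated with ordinary Python file I/O. A minimal sketch (the mount point /mnt/efs and the file name are hypothetical):
# Assumes the EFS filesystem is already mounted at /mnt/efs; there is no
# separate download or upload step as there would be with S3.
path = "/mnt/efs/my-large-file"  # hypothetical file on the EFS mount

with open(path, "a") as f:   # update the file in place
    f.write("appended line\n")

with open(path, "rb") as f:  # read it back
    data = f.read()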

How can I automatically delete AWS S3 files using Python?

I want to delete some files from S3 after a certain time. I need to set a time limit for each object, not for the bucket. Is that possible?
I am using boto3 to upload the file into S3.
import os

import boto3
from boto3.s3.transfer import S3Transfer

region = "us-east-2"
bucket = os.environ["S3_BUCKET_NAME"]
credentials = {
    'aws_access_key_id': os.environ["AWS_ACCESS_KEY"],
    'aws_secret_access_key': os.environ["AWS_ACCESS_SECRET_KEY"]
}
client = boto3.client('s3', **credentials)
transfer = S3Transfer(client)
transfer.upload_file(file_name, bucket, folder + file_name,
                     extra_args={'ACL': 'public-read'})
Above is the code I used to upload the object.
You have many options here. Some ideas:
You can automatically delete files after a given time period by using Amazon S3 Object Lifecycle Management. See: How Do I Create a Lifecycle Policy for an S3 Bucket?
If your requirements are more detailed (e.g. different files after different time periods), you could add a tag to each object specifying when you'd like the object deleted, or after how many days it should be deleted. Then, you could define an Amazon CloudWatch Events rule to trigger an AWS Lambda function at regular intervals (e.g. once a day or once an hour). You could then code the Lambda function to look at the tags on the objects, determine whether they should be deleted, and delete the desired objects. You will find examples of this on the Internet, often called a Stopinator; a minimal sketch of this approach appears below.
If you have an Amazon EC2 instance that is running all the time for other work, then you could simply create a cron job or Scheduled Task to run a similar program (without using AWS Lambda).
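A minimal sketch of the tag-scanning approach from the second option (the delete-after tag name, the bucket name, and the ISO-8601 timestamp format are all assumptions made for illustration):
import boto3
from datetime import datetime, timezone

s3 = boto3.client('s3')
BUCKET = 'my-bucket'  # hypothetical bucket name


def lambda_handler(event, context):
    """Delete every object whose 'delete-after' tag lies in the past."""
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get('Contents', []):
            tags = s3.get_object_tagging(Bucket=BUCKET, Key=obj['Key'])['TagSet']
            tag_map = {t['Key']: t['Value'] for t in tags}
            deadline = tag_map.get('delete-after')  # e.g. '2024-01-01T00:00:00+00:00'
            if deadline and datetime.fromisoformat(deadline) <= datetime.now(timezone.utc):
                s3.delete_object(Bucket=BUCKET, Key=obj['Key'])
The same function body could run from a cron job on an EC2 instance instead of Lambda, as in the third option.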

How to mount an S3 bucket as a local filesystem?

I have a Python app running in a Jupyter notebook on AWS. I loaded a C library into my Python code which expects a path to a file.
I would like to access this file from the S3 bucket.
I tried to use s3fs:
import s3fs

s3 = s3fs.S3FileSystem(anon=False)
Using s3.ls('..') lists all my bucket files... this is OK so far. But the library I am using would need to use the s3 variable internally, where I have no access. I can only pass a path to the C library.
Is there a way to mount the S3 bucket so that I don't have to call s3.open(), and can just call open(/path/to/s3), where somewhere behind the scenes the S3 bucket is really mounted as a local filesystem?
I think it should work like this without using s3, because I can't change the library I am using internally to use the s3 variable:
with s3.open("path/to/s3/file", 'w') as f:
    df.to_csv(f)

with open("path/to/s3/file", 'w') as f:
    df.to_csv(f)
Or am I doing it completely wrong?
The C library I am using is loaded as a DLL in Python, and I call a function:
lib.OpenFile(path/to/s3/file)
I have to pass the S3 path into the library's OpenFile function.
If you're looking to mount the S3 bucket as part of the file system, then use s3fs-fuse
https://github.com/s3fs-fuse/s3fs-fuse
That will make it part of the file system, and the regular file system functions will work as you would expect.
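For example, assuming the bucket has been mounted at a hypothetical mount point such as /mnt/s3, an ordinary path can be handed to the C library:
# Hypothetical mount point chosen when running s3fs-fuse.
path = "/mnt/s3/path/to/s3/file"

# Ordinary filesystem calls now work on bucket contents...
with open(path) as f:
    contents = f.read()

# ...and the same path string can be passed to the C library from the
# question, e.g. lib.OpenFile(path).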
If you are targeting Windows, it is possible to use rclone along with WinFsp to mount an S3 bucket as a local filesystem.
The simplified steps are:
rclone config to create a remote
rclone mount remote:bucket * to mount
https://github.com/rclone/rclone
https://rclone.org/
https://github.com/billziss-gh/winfsp
http://www.secfs.net/winfsp/
This might not be completely relevant to the question, but I am certain it will be to a lot of users coming here.

Upload a file to an S3-compatible service using Python

I'm using an S3-compatible service. That means my dynamic storage is not hosted on AWS. I found a couple of Python scripts that upload files to AWS S3. I would like to do the same, but I need to be able to set my own host URL. How can that be done?
You can use the Boto3 library (https://boto3.readthedocs.io/en/latest/) for all your S3 needs in Python. To use a custom S3-compatible host instead of AWS, set the endpoint_url argument when constructing an S3 resource object, e.g.:
import boto3
session = boto3.session.Session(...)
s3 = session.resource("s3", endpoint_url="http://...", ...)
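Continuing from that snippet, an upload through the resource then goes to the custom endpoint (the bucket, file, and key names below are hypothetical):
# Upload a local file to the S3-compatible service behind endpoint_url.
s3.Bucket("my-bucket").upload_file("local-file.txt", "remote-key.txt")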
You can use Amazon Route 53.
Please refer to:
http://docs.aws.amazon.com/AmazonS3/latest/dev/website-hosting-custom-domain-walkthrough.html

Will I get charged for transferring files between S3 accounts using boto's bucket.copy_key() function?

I wrote a little script that copies files from a bucket in one S3 account to a bucket in another S3 account.
In this script I use the bucket.copy_key() function to copy a key from one bucket into another bucket.
I tested it and it works fine, but the question is: do I get charged for copying files from S3 to S3 in the same region?
What I'm worried about is that maybe I missed something in the boto source code; I hope it doesn't store the file on my machine and then send it to the other S3 bucket.
Also (sorry if it's too many questions in one topic), if I upload and run this script from an EC2 instance, will I get charged for bandwidth?
If you are using the copy_key method in boto, then you are doing server-side copying. There is a very small per-request charge for COPY operations, just as there is for all S3 operations, but if you are copying between two buckets in the same region, there are no network transfer charges. This is true whether you run the copy operations on your local machine or on an EC2 instance.
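For reference, a minimal sketch of the same kind of server-side copy using boto3 (the bucket and key names are hypothetical); boto's copy_key behaves the same way in that the object's bytes never leave S3:
import boto3

s3 = boto3.client('s3')

# copy_object runs entirely inside S3; only the COPY request itself is
# billed, and no object bytes flow through the machine running this code.
s3.copy_object(
    Bucket='destination-bucket',
    Key='destination-key',
    CopySource={'Bucket': 'source-bucket', 'Key': 'source-key'},
)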
