S3 buckets to Glacier on demand: is it possible from the boto3 API? - python

I'm testing a script to recover data stored in an S3 bucket that has a lifecycle rule which moves the data to Glacier every day. So, in theory, when I upload a file to the S3 bucket, the Amazon infrastructure should move it to Glacier a day later.
I'm developing a Python script to test the restore process, but as far as I understand the boto3 API there is no method to force a file stored in the S3 bucket to move immediately to Glacier storage. Is it possible to do that, or is it necessary to wait until the Amazon infrastructure fires the lifecycle rule?
I would like to use some code like this:
bucket = s3.Bucket(TARGET_BUCKET)
for obj in bucket.objects.filter(Bucket=TARGET_BUCKET, Prefix=TARGET_KEYS + KEY_SEPARATOR):
    obj.move_to_glacier()
But I can't find any API call that makes this move to Glacier on demand. I also don't know whether I can force it on demand using a bucket lifecycle rule.

Update:
S3 has changed the PUT Object API, effective 2018-11-26. This was not previously possible, but you can now write objects directly to the S3 Glacier storage class.
One of the things we hear from customers about using S3 Glacier is that they prefer to use the most common S3 APIs to operate directly on S3 Glacier objects. Today we’re announcing the availability of S3 PUT to Glacier, which enables you to use the standard S3 “PUT” API and select any storage class, including S3 Glacier, to store the data. Data can be stored directly in S3 Glacier, eliminating the need to upload to S3 Standard and immediately transition to S3 Glacier with a zero-day lifecycle policy.
https://aws.amazon.com/blogs/architecture/amazon-s3-amazon-s3-glacier-launch-announcements-for-archival-workloads/
The service now accepts the following values for x-amz-storage-class:
STANDARD
STANDARD_IA
ONEZONE_IA
INTELLIGENT_TIERING
GLACIER
REDUCED_REDUNDANCY
PUT+Copy (which is always used, typically followed by DELETE, for operations that change metadata or rename objects) also supports the new functionality.
Note that to whatever extent your SDK "screens" these values locally, taking advantage of this functionality may require you to upgrade to a more current version of the SDK.
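For illustration, a direct write to the GLACIER storage class from boto3 might look roughly like the sketch below. This is a minimal sketch, not code from the answer above; the bucket name, key, and local file name are placeholders:
import boto3

s3_client = boto3.client('s3')
TARGET_BUCKET = 'my-target-bucket'  # placeholder, as in the question

# Write an object straight into the GLACIER storage class.
with open('local-file', 'rb') as f:
    s3_client.put_object(
        Bucket=TARGET_BUCKET,
        Key='path/to/object',
        Body=f,
        StorageClass='GLACIER',
    )

# A copy-in-place can also change the storage class of an existing object.
s3_client.copy_object(
    Bucket=TARGET_BUCKET,
    Key='path/to/object',
    CopySource={'Bucket': TARGET_BUCKET, 'Key': 'path/to/object'},
    StorageClass='GLACIER',
)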
This isn't possible. The only way to migrate an S3 object to the GLACIER storage class is through lifecycle policies.
x-amz-storage-class
Constraints: You cannot specify GLACIER as the storage class. To transition objects to the GLACIER storage class, you can use lifecycle configuration.
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html
The REST API is the interface used by all of the SDKs, the console, and aws-cli.
Note... test with small objects, but don't archive small objects to Glacier in production. S3 will bill you for 90 days minimum of Glacier storage even if you delete the object before 90 days. (This charge is documented.)
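If you do stay on the lifecycle route, the rule itself can at least be created from boto3. A minimal sketch, with placeholder bucket and prefix names; note that S3 still evaluates lifecycle rules asynchronously (typically once a day), so even a zero-day rule does not make the transition immediate:
import boto3

s3_client = boto3.client('s3')

# Zero-day transition rule for everything under a prefix (placeholders below).
s3_client.put_bucket_lifecycle_configuration(
    Bucket='my-target-bucket',
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'transition-to-glacier',
            'Filter': {'Prefix': 'some/prefix/'},
            'Status': 'Enabled',
            'Transitions': [{'Days': 0, 'StorageClass': 'GLACIER'}],
        }]
    },
)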

It is possible to upload files from S3 to a Glacier vault using the upload_archive() method of the Glacier client.
Update: This is not the same as S3 object lifecycle management; it is a direct upload to a Glacier vault.
import boto3

glacier_client = boto3.client('glacier')
s3 = boto3.resource('s3')
bucket = s3.Bucket(TARGET_BUCKET)
for obj in bucket.objects.filter(Prefix=TARGET_KEYS + KEY_SEPARATOR):
    # upload_archive returns a dict; the archive ID is in 'archiveId'
    response = glacier_client.upload_archive(vaultName='TARGET_VAULT', body=obj.get()['Body'].read())
    print(obj.key, response['archiveId'])
Note that .filter() does not accept the Bucket keyword argument used in the question's code.

Related

How can I transfer objects in buckets between two aws accounts with python?

I need to transfer all objects from all buckets, keeping the same bucket and folder structure, from one AWS account to another AWS account.
I've been doing it through the AWS CLI, one bucket at a time, with this command:
aws s3 sync s3://SOURCE-BUCKET-NAME s3://DESTINATION-BUCKET-NAME --no-verify-ssl
Can I do it with Python for all objects of all buckets?
The AWS CLI is actually a Python program. It includes multi-threading to copy multiple objects simultaneously, so it will likely be much more efficient than an equivalent Python program you write yourself.
You can tweak some settings that might help: AWS CLI S3 Configuration — AWS CLI Command Reference
There is no option to copy "all buckets" -- you would still need to sync/copy one bucket at a time.
Another approach would be to use S3 Bucket Replication, where AWS will replicate the buckets for you. This now works on existing objects. See: Replicating existing objects between S3 buckets | AWS Storage Blog
Or, you could use S3 Batch Operations, which can take a manifest (a listing of objects) as input and then copy those objects to a desired destination. See: Performing large-scale batch operations on Amazon S3 objects - Amazon Simple Storage Service
aws s3 sync is high-level functionality that is not available in AWS SDKs such as boto3. You have to implement it yourself on top of boto3, or search through the many available code snippets that already implement it, such as python - Sync two buckets through boto3 - Stack Overflow.
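For reference, a bare-bones copy loop on top of boto3 might look like the sketch below. It is not a substitute for aws s3 sync (no multi-threading, no skip-if-unchanged logic), and the bucket names are placeholders:
import boto3

s3 = boto3.resource('s3')
source = s3.Bucket('SOURCE-BUCKET-NAME')            # placeholder
destination = s3.Bucket('DESTINATION-BUCKET-NAME')  # placeholder

# Server-side copy of every object, preserving keys (and therefore the
# "folder" structure). The credentials used must be able to read the
# source bucket and write to the destination bucket.
for obj in source.objects.all():
    destination.copy({'Bucket': source.name, 'Key': obj.key}, obj.key)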

How to delete thousands of objects from s3 bucket with in specific object folder?

I have thousands of objects in all of the folders gocc1, gocc2, etc.
s3://awss3runner/gocc1/gocc2/goccf/
I just want to delete the objects (50,000+) under goccf, along with their versions.
import boto3
session = boto3.Session()
s3 = session.resource(service_name='s3')
# bucket = s3.Bucket('awss3runner', 'goccf')  # if we use this we get an error
bucket = s3.Bucket('awss3runner')  # this works, but everything in the bucket gets deleted
bucket.object_versions.delete()
Is there any way to delete only the goccf objects and their versions?
You can use the DeleteObjects API in S3 (https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html)
I would first perform a list operation to enumerate all the objects you wish to delete, then pass that into DeleteObjects. Be very careful as you could accidentally delete other objects in your bucket.
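In boto3 that can be done with a prefix-filtered variant of the call already shown in the question. A sketch, assuming the bucket and prefix from the question; double-check the prefix first, because this permanently deletes every matching version:
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('awss3runner')

# Delete only the objects (and all of their versions) under the goccf/ prefix.
# boto3 batches this into DeleteObjects requests of up to 1,000 keys each.
bucket.object_versions.filter(Prefix='gocc1/gocc2/goccf/').delete()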
Another option, if this is going to be a one-off operation, is to use an S3 lifecycle policy. With a lifecycle policy you can specify a path in your S3 bucket and set the objects to expire. They will be asynchronously removed from your S3 bucket: https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-expire-general-considerations.html
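A lifecycle rule scoped to the prefix could look roughly like this sketch. The rule ID and day counts are assumptions, and expiration happens asynchronously rather than immediately:
import boto3

s3_client = boto3.client('s3')

s3_client.put_bucket_lifecycle_configuration(
    Bucket='awss3runner',
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'expire-goccf',
            'Filter': {'Prefix': 'gocc1/gocc2/goccf/'},
            'Status': 'Enabled',
            'Expiration': {'Days': 1},                             # expire current versions
            'NoncurrentVersionExpiration': {'NoncurrentDays': 1},  # then remove old versions
        }]
    },
)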

Python how to download a file from s3 and then reuse

How can I download a file from S3 and then reuse it, instead of downloading it again every time the endpoint is called?
@app.route("/get", methods=['GET'])
def get():
    s3 = boto3.resource('s3')
    ids = pickle.loads(s3.Bucket('bucket').Object('file').get()['Body'].read())
    ...
Based on the comments.
Since the files are large (16GB) and need to be read and updated often, instead of S3, an EFS filesystem could be used for their storage:
Amazon Elastic File System (Amazon EFS) provides a simple, serverless, set-and-forget elastic file system for use with AWS Cloud services and on-premises resources.
EFS provides NFS filesystems that you can mount to your instance, or even multiple instances at the same time. You can also mount the same filesystem to ECS containers and lambda functions.
Since EFS provides a regular filesystem, you can read and write the files directly on it. There is no need to copy them first, as with S3, which is object storage (not a filesystem).
It's worth pointing out that the convenience of EFS costs more than using S3. However, if cost is a concern, you can now reduce it by using the recently released Amazon EFS One Zone storage class.
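With EFS mounted on the instance or container, the endpoint could then load the pickle once at startup and reuse it for every request. A sketch, assuming a hypothetical mount point of /mnt/efs and the file name from the question:
import pickle
from flask import Flask

app = Flask(__name__)

EFS_FILE = '/mnt/efs/file'  # hypothetical mount point and file name

# Load once at startup; every request then reuses the same in-memory object
# instead of re-reading a 16 GB file from S3.
with open(EFS_FILE, 'rb') as f:
    ids = pickle.load(f)

@app.route("/get", methods=['GET'])
def get():
    return str(len(ids))  # placeholder use of the cached data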

How can I automatically delete AWS S3 files using Python?

I want to delete some files from S3 after a certain time. I need to set a time limit for each object, not for the whole bucket. Is that possible?
I am using boto3 to upload the files to S3.
region = "us-east-2"
bucket = os.environ["S3_BUCKET_NAME"]
credentials = {
'aws_access_key_id': os.environ["AWS_ACCESS_KEY"],
'aws_secret_access_key': os.environ["AWS_ACCESS_SECRET_KEY"]
}
client = boto3.client('s3', **credentials)
transfer = S3Transfer(client)
transfer.upload_file(file_name, bucket, folder+file_name,
extra_args={'ACL': 'public-read'})
Above is the code I used to upload the object.
You have many options here. Some ideas:
You can automatically delete files after a given time period by using Amazon S3 Object Lifecycle Management. See: How Do I Create a Lifecycle Policy for an S3 Bucket?
If your requirements are more detailed (eg different files after different time periods), you could add a Tag to each object specifying when you'd like the object deleted, or after how many days it should be deleted. Then, you could define an Amazon CloudWatch Events rule to trigger an AWS Lambda function at regular intervals (eg once a day or once an hour). You could then code the Lambda function to look at the tags on objects, determine whether they should be deleted, and delete the desired objects. You will find examples of this on the Internet, often called a Stopinator.
If you have an Amazon EC2 instance that is running all the time for other work, then you could simply create a cron job or Scheduled Task to run a similar program (without using AWS Lambda).
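A stripped-down version of the second idea might look like the sketch below. The tag name, date format, and scan logic are all assumptions, not an established pattern from any library:
import os
from datetime import datetime, timezone

import boto3

s3_client = boto3.client('s3')
BUCKET = os.environ["S3_BUCKET_NAME"]
TAG_KEY = 'delete-after'  # hypothetical tag name

def tag_for_deletion(key, delete_after):
    # Tag an object with an ISO-8601 deadline after which it may be deleted.
    s3_client.put_object_tagging(
        Bucket=BUCKET,
        Key=key,
        Tagging={'TagSet': [{'Key': TAG_KEY, 'Value': delete_after.isoformat()}]},
    )

def delete_expired_objects():
    # Run this from a scheduled Lambda function or cron job.
    now = datetime.now(timezone.utc)
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get('Contents', []):
            tags = s3_client.get_object_tagging(Bucket=BUCKET, Key=obj['Key'])['TagSet']
            deadline = next((t['Value'] for t in tags if t['Key'] == TAG_KEY), None)
            if deadline and datetime.fromisoformat(deadline) <= now:
                s3_client.delete_object(Bucket=BUCKET, Key=obj['Key'])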

Will I get charged for transferring files between S3 accounts using boto's bucket.copy_key() function?

I wrote a little script that copies files from a bucket in one S3 account to a bucket in another S3 account.
In this script I use the bucket.copy_key() function to copy a key from one bucket to another.
I tested it and it works fine, but the question is: do I get charged for copying files from S3 to S3 in the same region?
What I'm worried about is that I may have missed something in the boto source code, and I hope it doesn't store the file on my machine and then send it to the other S3 bucket.
Also (sorry if it's too many questions in one topic), if I upload and run this script from an EC2 instance, will I get charged for bandwidth?
If you are using the copy_key method in boto then you are doing server-side copying. There is a very small per-request charge for COPY operations, just as there is for all S3 operations, but if you are copying between two buckets in the same region, there are no network transfer charges. This is true whether you run the copy operations on your local machine or on an EC2 instance.
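For anyone using boto3 rather than boto, copy_object is the equivalent server-side COPY. A minimal sketch with placeholder bucket and key names:
import boto3

s3_client = boto3.client('s3')

# Server-side copy: the object bytes move within S3, not through this machine.
s3_client.copy_object(
    Bucket='destination-bucket',
    Key='path/to/key',
    CopySource={'Bucket': 'source-bucket', 'Key': 'path/to/key'},
)

# For objects larger than 5 GB, use the managed copy, which performs a
# multipart copy (still server-side).
s3_client.copy(
    {'Bucket': 'source-bucket', 'Key': 'big/key'},
    'destination-bucket',
    'big/key',
)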
