I know how to upload a file to s3 buckets in Python. I am looking for a way to upload data to a file in s3 bucket directly. In this way, I do not need to save my data to a local file, and then upload the file. Any suggestions? Thanks!
AFAIK standard Object.put() supports this.
resp = s3.Object('bucket_name', 'key/key.txt').put(Body=b'data')
Edit: it was pointed out that you might want the client method, which is just put_object with the kwargs differently organized
client.put_object(Body=b'data', Bucket='bucket_name', Key='key/key.txt')
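As a slightly fuller sketch of the same idea, the snippet below builds CSV text in memory and sends it with put_object, so nothing is written to the local disk. The function name `upload_rows_as_csv` is a placeholder, not part of the answer above, and the client is passed in as an argument (assumed to behave like `boto3.client('s3')`) so the function can be tried without AWS credentials.

```python
import csv
import io


def upload_rows_as_csv(s3_client, bucket, key, rows):
    """Serialize rows to CSV in memory and upload them as one S3 object.

    s3_client is assumed to behave like boto3.client("s3").
    """
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    # put_object accepts bytes, so encode the in-memory text before sending.
    body = buffer.getvalue().encode("utf-8")
    return s3_client.put_object(Bucket=bucket, Key=key, Body=body)
```

With real credentials this would be called along the lines of `upload_rows_as_csv(boto3.client('s3'), 'bucket_name', 'key/data.csv', [['a', 'b'], [1, 2]])`.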
I am making a website to get to know aws and Django better. The idea is to let a user upload an excel file, convert it to csv and then let the user download the converted csv file.
I am using amazon s3 for file storage. My question is, what is the best way to make the conversion? Is there any way to access the excel file once it is stored in the s3 bucket and convert it to csv via Django? Sorry if my question is silly but I haven’t been able to find much information on that online. Thanks in advance
AWS Lambda is the best way to do this. It supports many types of event triggers, and you can specify the bucket and the event (put, delete, copy, etc.).
So what you have to do is create a Lambda function that is triggered only when an object is added to the S3 bucket. In that Lambda function you can write your code to get the file from the S3 bucket and do the conversion.
Since you are already familiar with Python, I suggest using Boto 3 to get files from the S3 bucket.
Check out my blog about AWS Lambda with S3 if you want a clearer idea, and more about permissions when working with S3 buckets.
My Blog
On every put event on the bucket you can trigger an AWS Lambda function that converts the file format and saves the result to the desired bucket location.
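A minimal sketch of such a Lambda function might look like the following. It reads the bucket and key from the S3 event, downloads the object, passes it through a conversion step, and writes the result to a separate bucket. The names `convert_uploaded_objects`, `convert`, and `my-converted-bucket` are hypothetical, and the Excel-to-CSV logic itself is left as a placeholder because it depends on the library you choose. Writing to a different bucket (or to a prefix excluded from the trigger) matters, since otherwise the output would fire the same trigger again.

```python
import posixpath
from urllib.parse import unquote_plus


def objects_from_s3_event(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded in the event (a space becomes "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def convert_uploaded_objects(event, s3_client, convert, dest_bucket):
    """Download each uploaded object, convert it, and store it as .csv.

    convert is a hypothetical callable that takes bytes and returns bytes.
    """
    written = []
    for bucket, key in objects_from_s3_event(event):
        source = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        dest_key = posixpath.splitext(key)[0] + ".csv"
        s3_client.put_object(Bucket=dest_bucket, Key=dest_key, Body=convert(source))
        written.append(dest_key)
    return written


def lambda_handler(event, context):
    import boto3  # bundled with the AWS Lambda Python runtimes

    # Placeholder conversion: replace with real Excel-to-CSV logic.
    return convert_uploaded_objects(
        event,
        boto3.client("s3"),
        convert=lambda data: data,
        dest_bucket="my-converted-bucket",  # hypothetical bucket name
    )
```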
I want to convert video files in an S3 bucket to other resolutions and store them back in S3.
I know we can use ffmpeg locally to change the resolution.
Can we use ffmpeg to convert S3 files and store them back using Django?
I am puzzled about where to start and what the process should be.
Should I load the video file from the S3 bucket into a buffer, convert it using ffmpeg, and then upload it back to S3?
Or is there any way to do it directly without having to keep it in a buffer?
You can break the problem into 3 parts:
1. Get the file from S3: use boto3 and the AWS APIs to download the file.
2. Convert the file locally: ffmpeg has got you covered here. It'll generate a new file.
3. Upload the file back to S3: use boto3 again.
This is fairly simple and robust for a script.
Now, if you want to optimize, you can try using something like s3fs to mount your S3 bucket locally and do the conversion there.
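The three parts above can be sketched roughly as follows, assuming ffmpeg is installed on the machine running the script. The function names, the `output.mp4` file name, and the default height of 720 are illustrative choices, not something from the answer; the S3 client and the command runner are passed in so the flow can be tried without AWS or ffmpeg.

```python
import os
import subprocess
import tempfile


def build_ffmpeg_command(src_path, dst_path, height):
    """Build an ffmpeg command that rescales a video to the given height."""
    # scale=-2:H lets ffmpeg choose an even width that keeps the aspect ratio.
    return ["ffmpeg", "-y", "-i", src_path, "-vf", f"scale=-2:{height}", dst_path]


def transcode_s3_video(s3_client, bucket, src_key, dst_key, height=720,
                       run=subprocess.run):
    """Download a video from S3, rescale it with ffmpeg, upload the result.

    s3_client is assumed to behave like boto3.client("s3").
    """
    with tempfile.TemporaryDirectory() as workdir:
        src_path = os.path.join(workdir, "input" + os.path.splitext(src_key)[1])
        dst_path = os.path.join(workdir, "output.mp4")
        # Part 1: get the file from S3.
        s3_client.download_file(bucket, src_key, src_path)
        # Part 2: convert locally; check=True raises if ffmpeg fails.
        run(build_ffmpeg_command(src_path, dst_path, height), check=True)
        # Part 3: upload the converted file back to S3.
        s3_client.upload_file(dst_path, bucket, dst_key)
```

The temporary directory is removed when the block exits, so the intermediate files do not pile up on disk.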
I am looking for suggestions for my program:
One part of my program generates a .csv file that I need to upload to the cloud. Essentially, the program should upload the .csv file to the cloud and return the URL for that location (csv_url).
Another part of my program has to use that csv_url with wget to download this file.
How can I tackle this problem? Will uploading the file to an S3 bucket work for me? How can I return a consolidated URL in that case? Apart from an S3 bucket, is there any other medium where I can upload my file? Any suggestions would be very helpful.
Try the boto3 library from Amazon; it has all the functions you would need for S3: GET/POST/PUT/DELETE/LIST.
PUT example:
import boto3
s3 = boto3.resource('s3')
# Upload a new file; the with block closes it once the upload is done
with open('test.jpg', 'rb') as data:
    s3.Bucket('my-bucket').put_object(Key='test.jpg', Body=data)
Yes, uploading the file to AWS S3 will definitely work for you, and you need nothing else. If you want to do that with Python, it's quite easy:
import boto3
s3 = boto3.client('s3')
s3.upload_file('images/4.jpeg', 'mausamrest', 'test/jkl.jpeg', ExtraArgs={'ACL': 'public-read'})
Here mausamrest is the bucket and test/jkl.jpeg is the key name (in other words, the file name in S3).
This is what your URL will look like:
https://s3.amazonaws.com/mausamrest/test/jkl.jpeg
The general format of the object URL is s3.amazonaws.com/bucketname/keyname.
In my case the image opens in the browser because of how I set it up; in your case the CSV will be downloaded.
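To get the csv_url back from the upload step, the two pieces above can be wrapped in one function. This is only a sketch: `upload_public_file` is a made-up name, the URL follows the path-style format given in this answer (which may not apply to every bucket or region), and the bucket has to allow public ACLs for the public-read setting to work.

```python
from urllib.parse import quote


def upload_public_file(s3_client, local_path, bucket, key):
    """Upload a file with a public-read ACL and return its path-style URL.

    s3_client is assumed to behave like boto3.client("s3").
    """
    s3_client.upload_file(local_path, bucket, key,
                          ExtraArgs={"ACL": "public-read"})
    # Percent-encode the key (quote leaves "/" intact by default).
    return f"https://s3.amazonaws.com/{bucket}/{quote(key)}"
```

The first part of the program would then do something like `csv_url = upload_public_file(boto3.client('s3'), 'out.csv', 'mausamrest', 'test/out.csv')` and hand csv_url to the part that runs wget. If the file should not be public, boto3's generate_presigned_url is an alternative worth looking at.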
I wrote a python script to process very large files (few TB in total), which I'll run on an EC2 instance. Afterwards, I want to store the processed files in an S3 bucket. Currently, my script first saves the data to disk and then uploads it to S3. Unfortunately, this will be quite costly given the extra time spent waiting for the instance to first write to disk and then upload.
Is there any way to use boto3 to write files directly to an S3 bucket?
Edit: to clarify my question, I'm asking whether, if I have an object in memory, I can write that object directly to S3 without first saving it to disk.
You can use put_object for this. Just pass in your file object as body.
For example:
import boto3
client = boto3.client('s3')
response = client.put_object(
    Bucket='your-s3-bucket-name',
    Key='your/object/key',  # key of the object to create or overwrite
    Body=b'data',  # bytes or a seekable file-like object
)
It's working with the S3 put_object method:
key = 'filename'
response = s3.put_object(Bucket='Bucket_Name',
                         Body=json_data,
                         Key=key)
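For an object that lives only in memory, a small sketch could look like this. It serializes the object to JSON, wraps the bytes in a BytesIO, and hands that to upload_fileobj, which accepts any binary file-like object and can split large payloads into multipart uploads (a single put_object call is limited to 5 GB). The name `upload_json` is hypothetical, and the client is passed in on the assumption that it behaves like `boto3.client('s3')`.

```python
import io
import json


def upload_json(s3_client, bucket, key, obj):
    """Serialize obj to JSON and upload it without touching the disk.

    Returns the number of bytes handed to S3.
    """
    payload = json.dumps(obj).encode("utf-8")
    # BytesIO gives upload_fileobj the file-like object it expects.
    s3_client.upload_fileobj(io.BytesIO(payload), bucket, key)
    return len(payload)
```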
I'm making a small app to export data from BigQuery to google-cloud-storage and then copy it into AWS S3, but I am having trouble finding out how to do it in Python.
I have already written the code in Kotlin (because it was easiest for me; for reasons outside the scope of my question, we want it to run in Python), and in Kotlin the Google SDK allows me to get an InputStream from the Blob object, which I can then inject into the Amazon S3 SDK's AmazonS3.putObject(String bucketName, String key, InputStream input, ObjectMetadata metadata).
With the Python SDK it seems I only have the options to download the content to a file or as a string.
I would like (as I do in Kotlin) to pass some object returned from the Blob object into the AmazonS3.putObject() method, without having to save the content as a file first.
I am in no way a Python pro, so I might have missed an obvious way of doing this.
I ended up with the following solution, as apparently download_to_file writes data into a file-like object that the boto3 S3 client can handle.
This works just fine for smaller files, but as it buffers everything in memory, it could be problematic for larger files.
from io import BytesIO
import boto3
from google.cloud import storage

def copy_data_from_gcs_to_s3(gcs_bucket, gcs_filename, s3_bucket, s3_filename):
    gcs_client = storage.Client(project="my-project")
    bucket = gcs_client.get_bucket(gcs_bucket)
    blob = bucket.blob(gcs_filename)
    # Buffer the whole blob in memory, then rewind before uploading
    data = BytesIO()
    blob.download_to_file(data)
    data.seek(0)
    s3 = boto3.client("s3")
    s3.upload_fileobj(data, s3_bucket, s3_filename)
If anyone has information or knowledge about something other than BytesIO to handle the data (e.g. so I can stream the data directly into S3 without having to buffer it in memory on the host machine), it would be very much appreciated.
google-resumable-media can be used to download the file from GCS in chunks, and smart_open to upload them to S3. This way you don't need to load the whole file into memory. There is also a similar question that addresses this issue: "Can you upload to S3 using a stream rather than a local file?"
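A rough sketch of the streaming approach is shown below. Note that it swaps in smart_open for both sides instead of google-resumable-media, on the assumption that smart_open is installed with its GCS and S3 extras and that credentials for both clouds are configured. The function name `stream_copy` and the 8 MiB chunk size are arbitrary choices, and the `opener` parameter exists only so the copy loop can be tried with in-memory stand-ins.

```python
import shutil


def stream_copy(source_uri, dest_uri, opener=None, chunk_size=8 * 1024 * 1024):
    """Copy an object between stores chunk by chunk.

    Only one chunk is held in memory at a time by the copy loop.
    """
    if opener is None:
        # Third-party dependency, imported lazily: pip install smart_open
        from smart_open import open as opener
    with opener(source_uri, "rb") as src, opener(dest_uri, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
```

With smart_open available, a call might look like `stream_copy('gs://my-gcs-bucket/export.csv', 's3://my-s3-bucket/export.csv')`, where both bucket names are placeholders.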