Writing, not uploading, to S3 with boto - python

Is there a way to directly write to a file in S3 rather than uploading a completed file? I'm aware of the various set_contents_from_ methods, which work well when I want to upload a completed file. I'm looking for a way to write directly to an S3 key as data comes in.
I see in the documentation that there is mention of an open_write method for Key objects, but it is specifically called out as not implemented. I'd rather not go with something cheesy like the following:
def WriteToS3(file_name, data):
    # Read-modify-write: fetch the existing contents, append, re-upload.
    # Assumes `bucket` is an existing boto Bucket object in scope.
    key = bucket.get_key(file_name)
    current_data = key.get_contents_as_string()
    key.set_contents_from_string(current_data + data)
Any help is greatly appreciated.

Try using the S3 multipart upload feature to achieve this. You can upload each part sequentially and then complete the upload. See more details here: http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
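Here is a minimal sketch of a sequential multipart upload with boto 2; the bucket name, key name, and the produce_chunks() data source are placeholders, and every part except the last must be at least 5 MB:

from io import BytesIO

import boto

conn = boto.connect_s3()
bucket = conn.get_bucket('my-bucket')            # assumed bucket name
mp = bucket.initiate_multipart_upload('my-key')  # assumed key name

# produce_chunks() stands in for however your data arrives.
for part_num, chunk in enumerate(produce_chunks(), start=1):
    mp.upload_part_from_file(BytesIO(chunk), part_num)

mp.complete_upload()

Nothing is readable under the key until complete_upload() stitches the parts into a single S3 object, so readers never see a half-written file.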

Related

Writing to a CSV file in an S3 bucket using boto 3

I'm working on a project that needs to update a CSV file with user info periodically. The CSV is stored in an S3 bucket, so I'm assuming I would use boto3 to do this. However, I'm not exactly sure how to go about this: would I need to download the CSV from S3 and then append to it, or is there a way to do it directly? Any code samples would be appreciated.
Ideally this would be something where DynamoDB would work pretty well (as long as you can create a hash key). Your solution would require the following.
Download the CSV.
Append the new values to the CSV file.
Upload the CSV.
A big issue here is the possibility (I'm not sure how the updates are scheduled) that the CSV file gets updated multiple times before being uploaded, which would lead to data loss.
Using something like DynamoDB, you could have a table and just use the put_item API call to add new values as you see fit. Then, whenever you wish, you could write a Python script to scan all the values and write out a CSV file however you like!
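A rough sketch of that approach with boto3; the table name, key schema, item fields, and bucket name are assumptions for illustration:

import csv

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('user_info')   # hypothetical table with user_id as hash key

# Add a record whenever new user info arrives.
table.put_item(Item={'user_id': '123', 'name': 'Alice', 'email': 'alice@example.com'})

# Later, dump everything to a CSV and push it to S3.
# (For large tables you would paginate the scan.)
items = table.scan()['Items']
with open('/tmp/users.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['user_id', 'name', 'email'])
    writer.writeheader()
    writer.writerows(items)

boto3.client('s3').upload_file('/tmp/users.csv', 'my-bucket', 'users.csv')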

Increase read-from-S3 performance of Lambda code

I am reading a large JSON file from an S3 bucket. The Lambda gets called a few hundred times a second, and when concurrency is high the Lambdas start timing out.
Is there a more efficient way of writing the code below, so that I don't have to download the file from S3 on every invocation, or can reuse the content in memory across different instances of the Lambda? :-)
The contents of the file change only once in a week!
I cannot split the file (due to the json structure) and it has to be read at once.
import json

import boto3

s3 = boto3.resource('s3')
s3_bucket_name = get_parameter('/mys3bucketkey/')
bucket = s3.Bucket(s3_bucket_name)

try:
    bucket.download_file('myfile.json', '/tmp/myfile.json')
except Exception:
    print("File to be read is missing.")

with open('/tmp/myfile.json') as file:
    data = json.load(file)
You probably aren't hitting the S3 request rate limit (https://docs.aws.amazon.com/AmazonS3/latest/dev/optimizing-performance.html), but it may be worth trying to copy the same S3 file under another prefix to spread the requests.
One possible solution is to avoid querying S3 at all by bundling the JSON file with the function code. Alternatively, you could ship it as a Lambda layer and load it from /opt inside your Lambda: https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html In either case you can automate the function update whenever the S3 file changes by adding another Lambda, triggered by the S3 update, that calls https://docs.aws.amazon.com/lambda/latest/dg/API_UpdateFunctionCode.html
As a long-term solution, check out Fargate (https://aws.amazon.com/fargate/getting-started/), with which you can build low-latency, container-based services and bake the file into the container.
When the Lambda function executes, it could check for the existence of the file in /tmp/ since the container might be re-used.
If it is not there, the function can download it.
If the file is already there, then there is no need to download it. Just use it!
However, you'll have to figure out how to handle the weekly update. Perhaps a change of filename based on date? Or check the timestamp on the file to see whether a new one is needed?
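A minimal sketch of that caching pattern; the cache path, the load_config helper, and the seven-day maximum age are assumptions based on the question:

import json
import os
import time

import boto3

CACHE_PATH = '/tmp/myfile.json'
MAX_AGE_SECONDS = 7 * 24 * 3600   # the file only changes once a week

s3 = boto3.resource('s3')

def load_config(bucket_name):
    # Re-download only if the cached copy is missing or older than the max age;
    # warm Lambda containers keep /tmp between invocations.
    if (not os.path.exists(CACHE_PATH)
            or time.time() - os.path.getmtime(CACHE_PATH) > MAX_AGE_SECONDS):
        s3.Bucket(bucket_name).download_file('myfile.json', CACHE_PATH)
    with open(CACHE_PATH) as f:
        return json.load(f)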

Use AWS Lambda to create CSV and save to S3 (Python)

I've got some Python code that goes to a URL, parses an HTML table, and saves the result to a CSV. Changes to the table happen frequently, and I'd like a trending view of these changes. To accomplish this, I'd like my code to run as a function in Lambda, and save snapshots of the table to S3 every 12 hours.
I've created the Lambda, used CloudWatch to trigger the function based on time, and given it permissions to access the relevant S3 bucket, BUT I can't find any resources on how to save the output of the function to said bucket. Any pointers or alternate suggestions would be greatly appreciated. Thanks!
(Note: I have found a resource on here that describes this process using Node, which isn't out of the question, but I'd prefer to remain in Python if possible.)
import boto3

s3_obj = boto3.resource('s3').Object(bucket, key)
file_contents = s3_obj.get()['Body'].read()
# ... build the new contents here ...
s3_obj.put(Body=new_file_contents)

Sorry for any typos, it's hard to answer when typing on a phone.
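For the CSV snapshot case specifically, a sketch of building the CSV in memory and putting it to S3 under a timestamped key (the bucket name, key prefix, and save_snapshot helper are assumptions):

import csv
import io
from datetime import datetime

import boto3

def save_snapshot(rows, bucket='my-bucket'):
    # rows is a list of lists parsed from the HTML table.
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    key = 'snapshots/table-%s.csv' % datetime.utcnow().strftime('%Y%m%dT%H%M%S')
    boto3.client('s3').put_object(Bucket=bucket, Key=key,
                                  Body=buf.getvalue().encode('utf-8'))

Keeping a timestamp in the key means each 12-hour run leaves the previous snapshots intact, which gives you the trending history.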

Upload image with an in-memory stream to input using Pillow + WebDriver?

I'm getting an image from a URL with Pillow and creating a stream (BytesIO/StringIO).
r = requests.get("http://i.imgur.com/SH9lKxu.jpg")
stream = Image.open(BytesIO(r.content))
I want to upload this image using an <input type="file" /> with Selenium WebDriver. I can do something like this to upload a file:
self.driver.find_element_by_xpath("//input[@type='file']").send_keys("PATH_TO_IMAGE")
I would like to know if it's possible to upload that image from a stream without having to mess with files / file paths. I'm trying to avoid filesystem reads/writes and do it in memory, or at most with temporary files. I'm also wondering if that stream could be encoded to Base64 and then uploaded by passing the string to the send_keys function you can see above :$
PS: Hope you like the image :P
You seem to be asking multiple questions here.
First, how do you convert a JPEG without downloading it to a file? You're already doing that, so I don't know what you're asking here.
Next, "And do it in-memory or as much with temporary files." I don't know what this means, but you can do it with temporary files with the tempfile library in the stdlib, and you can do it in-memory too; both are easy.
Next, you want to know how to do a streaming upload with requests. The easy way to do that, as explained in Streaming Uploads, is to "simply provide a file-like object for your body". This can be a tempfile, but it can just as easily be a BytesIO. Since you're already using one in your question, I assume you know how to do this.
(As a side note, I'm not sure why you're using BytesIO(r.content) when requests already gives you a way to use a response object as a file-like object, and even to do it by streaming on demand instead of by waiting until the full content is available, but that isn't relevant here.)
If you want to upload it with selenium instead of requests… well then you do need a temporary file. The whole point of selenium is that it's scripting a web browser. You can't just type a bunch of bytes at your web browser in an upload form, you have to select a file on your filesystem. So selenium needs to fake you selecting a file on your filesystem. This is a perfect job for tempfile.NamedTemporaryFile.
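A hedged sketch of that approach; the element locator and the `stream` variable come from the question, while the .jpg suffix and delete=False (so the file still exists when the browser reads it) are assumptions:

import tempfile

with tempfile.NamedTemporaryFile(suffix='.jpg', delete=False) as tmp:
    stream.save(tmp, format='JPEG')   # `stream` is the PIL Image from the question
    tmp_path = tmp.name

self.driver.find_element_by_xpath("//input[@type='file']").send_keys(tmp_path)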
Finally, "I'm also Wondering If that stream could be encoded to Base64".
Sure it can. Since you're just converting the image in-memory, you can just encode it with, e.g., base64.b64encode. Or, if you prefer, you can wrap your BytesIO in a codecs wrapper to base-64 it on the fly. But I'm not sure why you want to do that here.
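If you do want the bytes as Base64, a minimal sketch (again assuming `stream` is the PIL Image from the question); note that send_keys still expects a filesystem path, so this alone won't satisfy Selenium:

import base64
from io import BytesIO

buf = BytesIO()
stream.save(buf, format='JPEG')
encoded = base64.b64encode(buf.getvalue())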

how to add http header to a pdf file using python?

Hi, is it possible to add an HTTP header to a PDF file, like Content-Disposition:attachment;filename=test.pdf, in Python? For now what I do is send my PDF files to Amazon S3 by adding
--add-header=Content-Disposition:attachment;filename=test.pdf
but I would like to set this in Python.
Do you mean you want to upload files to S3 using Python and set some extra headers along the way?
If so, you could use the simples3 library, or some other similar one. I'll cover using simples3:
Follow the docs on how to instantiate an S3Bucket object. Once you have that, you can do:
bucket.put('path/to/your/pdf/object/on/s3',
           pdf_object_bytes,
           headers={'Content-Disposition': 'attachment;filename=test.pdf'})
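If you are uploading with boto instead (as in the questions above), the same header can be passed at upload time; the bucket name and local filename here are placeholders:

import boto

bucket = boto.connect_s3().get_bucket('my-bucket')   # assumed bucket name
key = bucket.new_key('test.pdf')
key.set_contents_from_filename(
    'test.pdf',
    headers={'Content-Disposition': 'attachment;filename=test.pdf'})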
Have you tried writing to a file or parsing everything as a string?
