I have an API that saves an image to an S3 bucket and returns the S3 URL, but saving the PIL image is slow. Here is a snippet of the code:
from PIL import Image
import io
import json
import boto3

BUCKET = ''
s3 = boto3.resource('s3')

def convert_fn(args):
    pil_image = Image.open(args['path']).convert('RGBA')
    .
    .
    .
    in_mem_file = io.BytesIO()
    pil_image.save(in_mem_file, format='PNG')  # <--- This takes too long
    in_mem_file.seek(0)
    s3.meta.client.upload_fileobj(
        in_mem_file,
        BUCKET,
        'outputs/{}.png'.format(args['save_name']),
        ExtraArgs={
            'ACL': 'public-read',
            'ContentType': 'image/png'
        }
    )
    return json.dumps({"Image saved in": "https://{}.s3.amazonaws.com/outputs/{}.png".format(BUCKET, args['save_name'])})
How can I speed up the upload? Would it be easier to return the bytes?
The Image.save method is the most time consuming part of my script. I want to increase the performance of my app and I'm thinking that returning as a stream of bytes may be the fastest way to return the image.
Compressing image data to PNG takes time, specifically CPU time. There might be a better-performing library for that than PIL, but you'd have to interface it with Python, and it would still take some time.
"Returning bytes" makes no sense: you either want to have image files saved on S3 or you don't. And the "bytes" will only represent an image as long as they are properly encoded into an image file, unless you have code to compose an image back from raw bytes.
To speed this up, you have two options. You could create an AWS Lambda project that takes the unencoded array, generates the PNG file and saves it to S3 asynchronously. Or, more simply, you could save the image in an uncompressed format, which spares you the CPU time spent on PNG compression: try saving it as a .tga or .bmp file instead of a .png, but expect the final files to be 10 to 30 times larger than the equivalent PNGs.
Also, it is not clear from the code whether this runs in a web API view. If it does, and the goal is to speed up the API response, it may be acceptable for the image to be generated and uploaded in the background after the API returns.
In that case, there are ways to improve the responsiveness of your app, but we would need to see the web code: which framework you are using, the view function itself, and the call to the function presented here.
When saving a PNG with PIL's Image.save, there is an argument called compress_level. With compress_level=0 the save is faster, at the cost of no compression. See the Pillow docs on PNG saving.
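As a rough way to check this trade-off on your own images, here is a minimal sketch that times an in-memory PNG save at a given level. Pillow's PNG writer names the option compress_level (0 to 9). The helper name time_png_save is made up for illustration, and actual timings and sizes depend on the image.

```python
import io
import time

from PIL import Image


def time_png_save(image, compress_level):
    """Encode `image` as PNG in memory; return (seconds, encoded size in bytes)."""
    buffer = io.BytesIO()
    start = time.perf_counter()
    # compress_level is Pillow's PNG option: 0 = no compression, 9 = maximum.
    image.save(buffer, format="PNG", compress_level=compress_level)
    elapsed = time.perf_counter() - start
    return elapsed, len(buffer.getvalue())


# Example usage (the synthetic image is a stand-in for the real one):
# demo = Image.new("RGBA", (1024, 1024), (200, 30, 30, 255))
# for level in (0, 1, 6):
#     print(level, time_png_save(demo, level))
```

A low non-zero level such as compress_level=1 may be a reasonable middle ground between speed and file size, but that is worth measuring rather than assuming.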
Related
It seems that every question related to this one is asking about restricting uploads by file size, etc.
I am creating an online demo of an application using Bottle (similar to Flask). Part of the program requires a user to download a very specific image and then upload it to see the result of a crop operation done on that image. The image is very particular, so I am trying to save the user time in using the demo, but also prevent shifty image files from being uploaded to the server.
My question is: is it possible to restrict the upload to only that particular image? Is there a way to know that an uploaded image is not that exact image?
For stub testing, I just did the following:
myimg = request.files.get("myimg")
fname = myimg.filename
if fname == "bob.jpeg" or fname == "sally.jpeg":
    ...  # do what the app needs to do
else:
    return "Only the demo images are allowed"
Obviously, restricting it by filename is silly because anyone can just rename a new file.
I was thinking perhaps that an image could be turned into some kind of special array, and then I can compare the two arrays, or is there a more elegant solution?
Any help/ideas are appreciated.
To confirm whether the uploaded file is indeed the one you expect, simply examine its bytes and compare them to the known-good file. You can either checksum the incoming bytes yourself (using, say, hashlib), or use filecmp.cmp (which I think is simpler, assuming you have a copy of the acceptable image on disk where the server can read it).
For example:
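import filecmp
uploaded_img = request.files.get("myimg")
# filecmp compares files on the server's disk, so save the upload first ("/tmp/uploaded_img" is a placeholder path)
uploaded_img.save("/tmp/uploaded_img", overwrite=True)
if not filecmp.cmp("/tmp/uploaded_img", "/path/to/known/good/image", shallow=False):
    raise ValueError("An unexpected file was uploaded")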
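The checksum route mentioned above could look something like the following sketch. It hashes a binary file-like object in chunks and checks the digest against a set of allowed digests. The function names and the idea of precomputing the allowed set at startup are assumptions for illustration, not part of Bottle.

```python
import hashlib


def sha256_of_stream(stream, chunk_size=65536):
    """Return the hex SHA-256 digest of a binary file-like object."""
    digest = hashlib.sha256()
    # Read in chunks so a large upload is never held in memory all at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_expected_upload(stream, allowed_digests):
    """True if the stream's content hashes to one of the allowed digests."""
    return sha256_of_stream(stream) in allowed_digests
```

In a Bottle handler, the stream would be uploaded_img.file, and allowed_digests would be built once by hashing the known demo images on the server. This avoids writing the upload to disk at all.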
I want to compress an image I receive from a user before sending it to an S3 bucket for storage. Unfortunately, I haven't dealt with images much. The image arrives as a FileStorage object, and this is where my issue starts. I'm trying to use the PIL library to compress the image using the optimize option when saving the file. The only issue is that I don't have a proper path to save this file to. I tried to fake one, to no avail. Is there a proper way to generate a temporary directory/path to save the image, just so I can compress it before sending it to the S3 bucket?
Here is the current simple code:
file = request.files['filepond']
# file = Image.open(file)
# file = file.save('TestDirectory', file.format, optimize=True, quality=30)
print(file.filename)
print(file.content_type)
Thanks all!
I have to process very large images (size > 2 GB) stored in aws s3.
Before processing I actually want to display some of them.
Downloading them takes too long. Is it possible to display them without downloading, using only Python?
You could give a URL to the user to open in a web browser. This does involve downloading the image, but it would be done outside of Python.
If you want to present them with a "thumbnail", then you would need a method of converting the image. This could be done with an AWS Lambda function that:
Loads the image into memory (it's too big for the default disk space)
Resizes the image to a smaller size
Stores it in Amazon S3
Provides a URL to the smaller image
This is similar to Tutorial: Using AWS Lambda with Amazon S3 but it would need a tweak to manipulate the image in memory instead of downloading the image to the Lambda function's disk storage (that is limited to 512MB).
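The in-memory resize step could be sketched with Pillow roughly as follows. This only covers the resize itself: the function name make_thumbnail and its defaults are assumptions, and fetching the bytes from S3 and storing the result are left out.

```python
import io

from PIL import Image


def make_thumbnail(image_bytes, max_size=(512, 512), output_format="PNG"):
    """Resize encoded image bytes in memory and return the encoded thumbnail."""
    with Image.open(io.BytesIO(image_bytes)) as image:
        # thumbnail() shrinks the image in place and preserves its aspect ratio.
        image.thumbnail(max_size)
        out = io.BytesIO()
        image.save(out, format=output_format)
    return out.getvalue()
```

For images as large as the ones described, Pillow's decompression-bomb limit (Image.MAX_IMAGE_PIXELS) may need adjusting, and the function's memory setting has to accommodate the decoded image.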
I'm getting an image from a URL with Pillow, and creating a stream (BytesIO/StringIO).
r = requests.get("http://i.imgur.com/SH9lKxu.jpg")
stream = Image.open(BytesIO(r.content))
I want to upload this image through an <input type="file" /> using Selenium WebDriver. I can do something like this to upload a file:
self.driver.find_element_by_xpath("//input[@type='file']").send_keys("PATH_TO_IMAGE")
I would like to know if it's possible to upload that image from a stream without having to mess with files / file paths... I'm trying to avoid filesystem Read/Write. And do it in-memory or as much with temporary files. I'm also Wondering If that stream could be encoded to Base64, and then uploaded passing the string to the send_keys function you can see above :$
PS: Hope you like the image :P
You seem to be asking multiple questions here.
First, how do you convert a JPEG without downloading it to a file? You're already doing that, so I don't know what you're asking here.
Next, "And do it in-memory or as much with temporary files." I don't know what this means, but you can do it with temporary files with the tempfile library in the stdlib, and you can do it in-memory too; both are easy.
Next, you want to know how to do a streaming upload with requests. The easy way to do that, as explained in Streaming Uploads, is to "simply provide a file-like object for your body". This can be a tempfile, but it can just as easily be a BytesIO. Since you're already using one in your question, I assume you know how to do this.
(As a side note, I'm not sure why you're using BytesIO(r.content) when requests already gives you a way to use a response object as a file-like object, and even to do it by streaming on demand instead of by waiting until the full content is available, but that isn't relevant here.)
If you want to upload it with selenium instead of requests… well then you do need a temporary file. The whole point of selenium is that it's scripting a web browser. You can't just type a bunch of bytes at your web browser in an upload form, you have to select a file on your filesystem. So selenium needs to fake you selecting a file on your filesystem. This is a perfect job for tempfile.NamedTemporaryFile.
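A minimal sketch of the temporary-file approach: write the downloaded bytes to a named temporary file and hand its path to the browser. The helper name write_temp_image is made up for illustration, and delete=False is a choice made here so the file still exists when the browser opens it.

```python
import tempfile


def write_temp_image(image_bytes, suffix=".jpg"):
    """Write bytes to a named temporary file and return its path.

    delete=False keeps the file after it is closed so another process
    (the browser) can open it; the caller must remove it afterwards.
    """
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(image_bytes)
    return tmp.name


# Example usage with the objects from the question:
# path = write_temp_image(r.content)
# try:
#     file_input.send_keys(path)  # file_input: the <input type="file"> element
# finally:
#     os.remove(path)
```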
Finally, "I'm also Wondering If that stream could be encoded to Base64".
Sure it can. Since you're just converting the image in-memory, you can just encode it with, e.g., base64.b64encode. Or, if you prefer, you can wrap your BytesIO in a codecs wrapper to base-64 it on the fly. But I'm not sure why you want to do that here.
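For example, a small helper along these lines (the name stream_to_base64 is hypothetical) turns the contents of a BytesIO into a base64 string:

```python
import base64


def stream_to_base64(stream):
    """Return the full contents of a binary stream as an ASCII base64 string."""
    stream.seek(0)  # rewind in case the stream was already read or written
    return base64.b64encode(stream.read()).decode("ascii")
```

Note that a file input still expects a path, so sending this string with send_keys would not upload the image.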
Each time a user accesses http://www.example.com/some-random-symbols-1x1.png, I should return a transparent 1x1px image. I've created the corresponding file, but how should I read it in my code to return it to the user? I know how to display an image from the datastore or blobstore, but I have no idea how to read and return a binary file.
It cannot be a static file for the following reasons:
the URL will contain some-random-symbols;
once the URL is accessed, before returning the image, I would like to log that somebody accessed the file.
A 1x1 transparent PNG (or GIF) is small enough that you can hard-code the base64 representation directly and emit it directly via self.response.write() (after decoding).
Reading from disk every time is relatively expensive. If you want to go that route, lazily initialize a global variable.
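A sketch of the hard-coded approach, using a commonly used 42-byte transparent 1x1 GIF. The names PIXEL_GIF and pixel_response are made up for illustration, and the framework-specific response call is only shown as a comment.

```python
import base64

# Base64 of a 42-byte, 1x1 transparent GIF89a image.
PIXEL_GIF_BASE64 = "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"

# Decoded once at import time, so nothing is read from disk per request.
PIXEL_GIF = base64.b64decode(PIXEL_GIF_BASE64)


def pixel_response():
    """Return (content_type, body) for the tracking pixel."""
    return "image/gif", PIXEL_GIF


# In a handler, after logging the access:
# content_type, body = pixel_response()
# self.response.content_type = content_type
# self.response.write(body)
```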
In a more general case, I'd use the blobstore and the BlobstoreDownloadHandler, but for a tiny gif that will definitely fit into memory, something like this to read the file's content:
with open('path/to/file.gif', 'rb') as f:  # binary mode, so the bytes are read unmodified
    img_content = f.read()
I'd put this outside of my handler, so it is done once per instance. If you're using Python 2.5, you'll need from __future__ import with_statement, or open and close the file yourself.
Then in your handler, assuming webapp2:
self.response.content_type = 'image/gif'
self.response.write(img_content)