Using Pillow and img2pdf to convert images to pdf - python

I have a task that requires me to get data from an image upload (jpg or png), resize it based on the requirement, and then transform it into pdf and then store in s3.
The file comes in as ByteIO
I have Pillow available so I can resize the image with it
Now the file type is class 'PIL.Image.Image' and I don't know how to proceed.
I found img2pdf library on https://gitlab.mister-muffin.de/josch/img2pdf, but I don't know how to use it when I have PIL format (use tobytes()?)
The s3 upload also looks like a file-like object, so I don't want to save it into a temp file before loading it again. Do I even need img2pdf in this case then?
How do I achieve this goal?
EDIT: I tried using tobytes() and upload to s3 directly. Upload was successful. However, when downloading to see the content, it shows an empty page. It seems like the file data is not written into the pdf file
EDIT 2: Actually went on the s3 and check the file stored. When I download it and open it, it shows cannot be opened
EDIT 3: I don't really have working code as I'm still experimenting what could work, but here's a gist
data = request.FILES['file'].file # where the data is
im = Image.open(data)
(width, height) = (im.width // 2, im.height // 2) # example action I wanna take with Pillow
data = im_resized.tobytes()
# potential step for using img2pdf here but I don't know how
# img2pdf.convert(data) # this fails because "ImageOpenError: cannot read input image (not jpeg2000). PIL: error reading image: cannot identify image file <_io.BytesIO..."
# img2pdf.convert(im_resized) # this also fails because "TypeError: Neither implements read() nor is str or bytes"
upload_to_s3(data) # some function that utilizes boto3 to upload to s3

The problem is that u use Image.Image object instead of JPEG or something like it
Try this:
bytes_io = io.BytesIO()
image.save(bytes_io, 'PNG')
with open(output_pdf, "wb") as f:
f.write(img2pdf.convert(bytes_io.getvalue()))

Related

unable to read a forensic .E01 image with pytsk3.FS_Info(image)

I am Python beginner, and I want to read an .E01 image file with pytsk3.
pytsk3.FS_Info(image) creates this error:
OSError: FS_Info_Con: (tsk3.cpp:214) Unable to open the image as a filesystem at offset: 0x00000000 with error: Possible encryption detected (High entropy (8.00))
I used FTK imager to create the image, therefore it cannot be encrypted. I was able to open it with autopsy, so the image itself is fine (not corrupt).
import pytsk3
image_path = "C:/Users/Karin/Desktop/images/USB2.E01"
image = pytsk3.Img_Info(image_path) --> works correctly
fs = pytsk3.FS_Info(image) --> **does not work**
root = fs.open_dir(path="/")
for entry in root:
print(entry.info.name.name)
USB2.E01
E01 file is "forensically" created file. It's not a bit-to-bit copy of the source drive. It has additional information such as metadata and hashes of data chunks inside of it. Also this data could be compressed or not.
So, usually the basic programming method of reading simple files would not work.
I would recommend you to read more about the .E01 structure and check this library https://github.com/libyal/libewf

Convert mp3 song image from png to jpg

I have a set of many songs, some of which have png images in metadata, and I need to convert these to jpg.
I know how to convert png images to jpg in general, but I am currently accessing metadata using eyed3, which returns ImageFrame objects, and I don't know how to manipulate these. I can, for instance, access the image type with
print(img.mime_type)
which returns
image/png
but I don't know how to progress from here. Very naively I tried loading the image with OpenCV, but it is either not a compatible format or I didn't do it properly. And anyway I wouldn't know how to update the old image with the new one either!
Note: While I am currently working with eyed3, it is perfectly fine if I can solve this any other way.
I was finally able to solve this, although in a not very elegant way.
The first step is to load the image. For some reason I could not make this work with eyed3, but TinyTag does the job:
from PIL import Image
from tinytag import TinyTag
tag = TinyTag.get(mp3_path, image=True)
image_data = tag.get_image()
img_bites = io.BytesIO(image_data)
photo = Image.open(im)
Then I manipulate it. For example we may resize it and save it as jpg. Because we are using Pillow (PIL) for these operations, we actually need to save the image and finally load it back to get the binary data (this detail is probably what should be improved in the process).
photo = photo.resize((500, 500)) # suppose we want 500 x 500 pixels
rgb_photo = photo.convert("RGB")
rgb_photo.save(temp_file_path, format="JPEG")
The last step is thus to load the image and set it as metadata. You have more details about this step in this answer.:
audio_file = eyed3.load(mp3_path) # this has been loaded before
audio_file.tag.images.set(
3, open(temp_file_path, "rb").read(), "image/jpeg"
)
audio_file.tag.save()

Error when trying to convert base64 string to image in Python

I am trying create a Python dashboard in which a user uploads an image and then the image is analyzed. The uploaded image is received as a base64 string and it needs to be converted to an image. I have tried
decoded = BytesIO(base64.b64decode(base64_string))
image = Image.open(decoded)
but I received this error:
cannot identify image file <_io.BytesIO object at 0x00000268954E9888>
Image needs a file-like object, the sort of thing returned by an open. The easiest way to do this is using a with statement:
decoded = base64.b64decode(base64)
with BytesIO(decoded) as fh:
image = Image.open(fh)
# do stuff with image here: when the with block ends
# it's very likely the image will no longer be usable
See if that works better for you. :)

PDF to IMG to PDF all done in memory

In order to remove sensitive content from a PDF, I am converting it to image and back to PDF again.
I am able to do this while saving the jpeg image, however I would eventually like to adapt my code so that the file is in memory the whole time. PDF in memory -> JPEG in memory -> PDF in memory. I'm having trouble with the intermediary step.
from pdf2image import convert_from_path, convert_from_bytes
import img2pdf
images = convert_from_path('testing.pdf', fmt='jpeg')
image = images[0]
# opening from filename
with open("output/output.pdf","wb") as f:
f.write(img2pdf.convert(image.tobytes()))
On the last line, I am getting the error:
ImageOpenError: cannot read input image (not jpeg2000). PIL: error reading image: cannot identify image file <_io.BytesIO object at 0x1040cc8f0>
I'm not sure how to be converting this image to the string that img2pdf is looking for.
The pdf2image module will extract the images as Pillow images. And according the Pillow tobytes() documention: "This method returns the raw image data from the internal storage." Which is some bitmap representation.
To get your code working use BytesIO module like so:
# opening from filename
import io
with open("output/output.pdf","wb") as f, io.BytesIO() as output:
image.save(output, format='jpg')
f.write(img2pdf.convert(output.getvalue()))

How to send embedded image created using PIL/pillow as email (Python 3)

I am creating image that I would like to embed in the e-mail. I cannot figure out how to create image as binary and pass into MIMEImage. Below is the code I have and I have error when I try to read image object - the error is "AttributeError: 'NoneType' object has no attribute 'read'".
image=Image.new("RGBA",(300,400),(255,255,255))
image_base=ImageDraw.Draw(image)
emailed_password_pic=image_base.text((150,200),emailed_password,(0,0,0))
imgObj=emailed_password_pic.read()
msg=MIMEMultipart()
html="""<p>Please finish registration <br/><img src="cid:image.jpg"></p>"""
img_file='image.jpg'
msgText = MIMEText(html,'html')
msgImg=MIMEImage(imgObj)
msgImg.add_header('Content-ID',img_file)
msg.attach(msgImg)
msg.attach(msgText)
If you look at line 4 - I am trying to read image so that I can pass it into MIMEImage. Apparently, image needs to be read as binary. However, I don't know how to convert it to binary so that .read() can process it.
FOLLOW-UP
I edited code per suggestions from jsbueno - thank you very much!!!:
emailed_password=os.urandom(16)
image=Image.new("RGBA",(300,400),(255,255,255))
image_base=ImageDraw.Draw(image)
emailed_password_pic=image_base.text((150,200),emailed_password,(0,0,0))
stream_bytes=BytesIO()
image.save(stream_bytes,format='png')
stream_bytes.seek(0)
#in_memory_file=stream_bytes.getvalue()
#imgObj=in_memory_file.read()
imgObj=stream_bytes.read()
msg=MIMEMultipart()
sender='xxx#abc.com'
receiver='jjjj#gmail.com'
subject_header='Please use code provided in this e-mail to confirm your subscription.'
msg["To"]=receiver
msg["From"]=sender
msg["Subject"]=subject_header
html="""<p>Please finish registration by loging into your account and typing in code from this e-mail.<br/><img src="cid:image.png"></p>"""
img_file='image.png'
msgText=MIMEText(html,'html')
msgImg=MIMEImage(imgObj) #Is mistake here?
msgImg.add_header('Content-ID',img_file)
msg.attach(msgImg)
msg.attach(msgText)
smtpObj=smtplib.SMTP('smtp.mandrillapp.com', 587)
smtpObj.login(userName,userPassword)
smtpObj.sendmail(sender,receiver,msg.as_string())
I am not getting errors now but e-mail does not have image in it. I am confused about the way image gets attached and related to in html/email part. Any help is appreciated!
UPDATE:
This code actually works - I just had minor typo in the code on my PC.
There are a couple of conceptual errors there, both in using PIL and on what format an image should be in order to be incorporated into an e-mail.
In PIL: the ImageDraw class operates inplace, not like the Image class calls, which usually return a new image after each operation. In your code, it means that the call to image_base.text is actually changing the pixel data of the object that lies in your image variable. This call actually returns None and the code above should raise an error like "AttributeError: None object does not have attribute 'read'" on the following line.
Past that (that is, you should fetch the data from your image variable to attach it to the e-mail) comes the second issue: PIL, for obvious reasons, have images in an uncompressed, raw pixel data format in memory. When attaching images in e-mails we usually want images neatly packaged inside a file - PNG or JPG formats are usually better depending on the intent - let's just stay with .PNG. So, you have to create the file data using PIL, and them attach the file data (i.e. the data comprising a PNG file, including headers, metadata, and the actual pixel data in a compressed form). Otherwise you'd be putting in your e-mail a bunch of (uncompressed) pixel data that the receiving party would have no way to assemble back into an image (even if he would treat the data as pixels, raw pixel data does not contain the image shape so-)
You have two options: either generate the file-bytes in memory, or write them to an actual file in disk, and re-read that file for attaching. The second form is easier to follow. The first is both more efficient and "the right thing to do" - so let's keep it:
from io import BytesIO
# In Python 2.x:
# from StringIO import StringIO.StringIO as BytesIO
image=Image.new("RGBA",(300,400),(255,255,255))
image_base=ImageDraw.Draw(image)
# this actually modifies "image"
emailed_password_pic=image_base.text((150,200),emailed_password,(0,0,0))
stream = BytesIO()
image.save(stream, format="png")
stream.seek(0)
imgObj=stream.read()
...
(NB: I have not checked the part dealing with mail and mime proper in your code - if you are using it correctly, it should work now)

Categories

Resources