For a personal project, I would like to convert files of any type (pdf, png, mp3, ...) to bytes and then convert those bytes back to the original type.
I have done the first part, but I need help with the second.
In the following example, I read a .jpg file as bytes and save its content in the "content" object. Now I would like to convert "content" (bytes type) back to the original .jpg file.
test_file = open("cadenas.jpg", "rb")
content = test_file.read()
content
b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x0 ...
Could you help me?
Regards
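Image files are plain binary data; Base64 is only needed when the bytes have to be stored or sent as text.
This round trip through Base64 should do the job.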
import base64
with open('cadenas.jpg', 'rb') as test_file:
    content = test_file.read()
# encodestring() was removed in Python 3.9; encodebytes() is its replacement
content_encode = base64.encodebytes(content)
content_decode = base64.decodebytes(content_encode)
with open('cadenas2.jpg', 'wb') as result_file:
    result_file.write(content_decode)
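If the bytes never have to pass through a text-only channel, the encode/decode step can probably be skipped altogether, since the bytes returned by read() are already the complete file. Here is a minimal sketch of that idea. The copy_bytes helper is hypothetical, and the demo writes a few stand-in bytes into a temporary directory instead of using a real cadenas.jpg.

```python
import tempfile
from pathlib import Path


def copy_bytes(src: Path, dst: Path) -> bytes:
    """Read src as bytes and write exactly the same bytes to dst."""
    content = src.read_bytes()
    dst.write_bytes(content)
    return content


# Demo with stand-in data (the first bytes of a JPEG header, not a real image)
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "cadenas.jpg"
    dst = Path(tmp) / "cadenas2.jpg"
    src.write_bytes(b"\xff\xd8\xff\xe0\x00\x10JFIF\x00")
    copy_bytes(src, dst)
    round_trip_ok = dst.read_bytes() == src.read_bytes()

print(round_trip_ok)
```

The same function should work for pdf, png or mp3 files, because nothing in it depends on the file type.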
with open("image.jpg", 'rb') as original_file:
    original = original_file.read()
# write part
with open("dubs.jpg", 'wb') as duplicate_file:
    duplicate_file.write(original)
# Passing original straight to write() recreates the image. But I want to save that byte information and be able to recreate the same image from the same byte data, even on a different PC.
How about this? Does this work for your case?
import base64
file_name = '<image_file_path>'
# open the file as bytes
with open(file_name, 'rb') as f:
    in_jpg_encoding = f.read()
# Encode the bytes object to base64
base64_encoding = base64.b64encode(in_jpg_encoding)
# Store the base64 encoding in text file
with open("base64.txt", "wb") as fw:
    fw.write(base64_encoding)
# -------------------------------------------------
# Open the base64 encoding text file
with open("base64.txt", 'rb') as f:
    result = f.read()
# Apply the base64 decoding
out_b64_decoding = base64.b64decode(result)
# Store the file as .jpg
with open("recreate.jpg", "wb") as fw:
    fw.write(out_b64_decoding)
You can use the following to convert any type of image file to Base64-encoded bytes:
import base64
with open("example.png", "rb") as image_file:
    encoded = base64.b64encode(image_file.read())  # named "encoded" so the built-in bytes type is not shadowed
print(encoded)
Then you can convert the "encoded" variable from the code above back into an image with the following:
from PIL import Image
from io import BytesIO
image = Image.open(BytesIO(base64.b64decode(encoded)))
image.save('output\\path\\example.png', 'PNG')
I am retrieving data from an API that returns a JSON object with the following structure:
{
    "status": "OK",
    "text": {
        "doc_id": 647508,
        "bill_id": 502329,
        "date": "2012-05-23",
        "type": "Enrolled",
        "mime": "application/rtf",
        "doc": "MIME 64 Encoded Document"
    }
}
where the encoded document is a PDF file. Here is an example of the PDFs I am working with: https://legiscan.com/WA/text/HB1531/id/1473804/Washington-2017-HB1531-Introduced.pdf. I am trying to read the encoded file into a string object. So far I have been able to do so by converting the response into bytes, writing them to a file, and then reading the PDF:
import PyPDF2
import base64
with open("sample.pdf", "wb") as f:
    inp_str = response.json()['text']['doc'].encode('utf-8')
    f.write(base64.b64decode(inp_str))
with open('sample.pdf', "rb") as f:
    pdf = PyPDF2.PdfFileReader(f)
It feels that this is not a very efficient way to process multiple documents. I have tried following a related question (Is it possible to input pdf bytes straight into PyPDF2 instead of making a PDF file first):
read_pdf = PyPDF2.PdfFileReader(io.BytesIO(response.json()['text']['doc'].encode()))
but I always get the error PdfReadError: Could not read malformed PDF file
Is there any way to do this?
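One likely cause, assuming the doc field really is Base64 text as the structure above suggests: the one-line attempt wraps the Base64 text itself in BytesIO, not the decoded PDF bytes. Here is a minimal sketch of decoding in memory. The doc_to_stream helper and the payload dict are hypothetical stand-ins for response.json(), and the fake PDF bytes are made up for the demo.

```python
import base64
import io


def doc_to_stream(payload: dict) -> io.BytesIO:
    """Decode the Base64 'doc' field into an in-memory binary stream."""
    raw = base64.b64decode(payload["text"]["doc"])
    return io.BytesIO(raw)


# Stand-in for response.json(); only the fields used here are filled in
fake_pdf = b"%PDF-1.4\n% stand-in content, not a real document\n"
payload = {
    "status": "OK",
    "text": {"doc": base64.b64encode(fake_pdf).decode("ascii")},
}

stream = doc_to_stream(payload)
header = stream.read(5)  # a decoded PDF starts with b"%PDF-"
stream.seek(0)           # rewind so a reader sees the whole document
print(header)
```

The rewound stream could then be passed to PyPDF2.PdfFileReader(stream) in place of the file object, which avoids writing sample.pdf to disk for every document.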
I have a jpg image that is in memory and therefore is in bytes. I then compress it with the following:
zlib.compress(pickle.dumps(self.fileInstance))
I compress it to store it in a MongoDB database. This image is one of the elements in the record that I save. I save it with the following.
list_search_result = list(search_result)
json_search_result = json.dumps(list_search_result, cls=Encoder, indent=2)
I got an error with bytes and Json, so I simply convert the bytes image to a string with the following.
from bson.objectid import ObjectId
import json
import datetime
class Encoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, ObjectId):
            return str(obj)
        elif isinstance(obj, datetime.datetime):
            return str(obj)
        elif isinstance(obj, bytes):
            return str(obj)
        return super(Encoder, self).default(obj)
Now, I want to recover the image from the json file. I thought that doing the same steps in the opposite order would work but no.
So here is what I do to store:
image -> pickle -> compress -> str -> json
I thought this would work:
json -> bytes -> decompress -> depickle -> image
I receive the zlib.error : Error -3 during the following:
image = pickle.load(zlib.decompress(attachment[1].encode()))
image = io.BytesIO(image)
dt = Image.open(image)
EDIT:
Okay so I was toying around, and I think the issue might be the .encode(). I start with b" ". After str(b" "), I get "b' '". If I do the .encode(), I get b"b' '". How do I deal with this?
str() is useful for displaying something: it creates text that is readable for a human.
It may show that you had bytes ("b''"), or it may show escapes like \xe0 for values that can't be converted to characters. But it doesn't have to produce text that is suitable for storing in a database.
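For records that were already stored as str(bytes), here is a small sketch of what happens and one possible way back. It assumes the stored text is exactly the repr that str() produced and that it comes from a trusted source; the sample bytes are made up.

```python
import ast

original = b"\xff\xd8\xff\xe0 JFIF"

as_text = str(original)   # the repr as text, starting with b' and containing \x escapes
wrong = as_text.encode()  # bytes of that repr text, not the original bytes

# literal_eval parses the bytes literal back into a bytes object
recovered = ast.literal_eval(as_text)

print(wrong == original, recovered == original)
```

This is only a way to rescue existing data; for new data a reversible text encoding such as Base64 is the safer choice.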
Many databases have a field type for bytes, so you could store the image as bytes (without pickling, which may only add more bytes, and without compressing, because image formats already use some compression).
If you have to store the file as a string (or send it over the internet), it is better to convert it to Base64. This method is used by some APIs to send images in JSON.
Convert image to base64
import base64
fh = open('lenna.png', 'rb')
data = fh.read()
fh.close()
text = base64.b64encode(data).decode()
print(text[:100]) # text
# ... save in database ...
Convert base64 to image
# ... read from database ...
data = base64.b64decode(text.encode())
print(data[:100]) # bytes
fh = open('lenna.png', 'wb')
fh.write(data)
fh.close()
Result:
# text
iVBORw0KGgoAAAANSUhEUgAAAgAAAAIACAIAAAB7GkOtAAAAA3NCSVQICAjb4U/gAAAgAElEQVR4nOzbXa5tS5Il5DHMzH3OtfY+
# bytes
b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x02\x00\x00\x00\x02\x00\x08\x02\x00\x00\x00{\x1aC\xad\x00\x00\x00\x03sBIT\x08\x08\x08\xdb\xe1O\xe0\x00\x00 \x00IDATx\x9c\xec\xdb]\xaemK\x92%\xe41\xcc\xcc}\xce\xb5\xf6>\xe7\xde\x88\xc8\xa2\x1e\xea\x85\x16 !$\x10\x88?\x05/t\x06\x95\xc8\xe2\x95\x06\xd0'
Tested on image lenna.png (Wikipedia: Lenna)
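Applied to the JSON encoder from the question, the same idea could look like the sketch below. Base64Encoder is a hypothetical name, the ObjectId and datetime branches are left out to keep it self-contained, and the sample bytes are just the start of a PNG header.

```python
import base64
import json


class Base64Encoder(json.JSONEncoder):
    """Hypothetical encoder that stores bytes as Base64 text instead of str(bytes)."""

    def default(self, obj):
        if isinstance(obj, bytes):
            return base64.b64encode(obj).decode("ascii")
        return super().default(obj)


image_bytes = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"

# Store: bytes -> Base64 text -> JSON
as_json = json.dumps({"attachment": image_bytes}, cls=Base64Encoder)

# Recover: JSON -> Base64 text -> bytes
restored = base64.b64decode(json.loads(as_json)["attachment"])

print(restored == image_bytes)
```

If the pickle and zlib steps are kept, they would be undone after b64decode, in the reverse of the order in which they were applied.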
I have a Flask GET request like this:
content = {"vname":"","myPhoto":{"fieldname":"myPhoto","originalname":"flower-purple-lical-blosso.jpg","encoding":"7bit","mimetype":"image/jpeg","buffer":{"type":"Buffer","data":[255,216,255,224,0,...]}
My image file is under the key data.
data = content['myPhoto']['buffer']['data']
I am unable to save it as a jpeg file.
I am not sure how to decode this object as an image, as it's currently a list.
If I correctly understood your question, it could be done like this:
#!/usr/bin/env python3
content = {"vname":"","myPhoto":{"fieldname":"myPhoto","originalname":"flower-purple-lical-blosso.jpg","encoding":"7bit","mimetype":"image/jpeg","buffer":{"type":"Buffer","data":[255,216,255,224,0,20]}}}
data = content['myPhoto']['buffer']['data']
# each list item (0-255) becomes one byte; chr() plus .encode() would UTF-8 encode values above 127 as two bytes
bytes_data = bytes(data)
with open('output.jpg', 'wb') as f: # open file for writing bytes
    f.write(bytes_data) # write the bytes to the file
The code is of course not perfect, but it can serve as a starting point.
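A list of integers in the range 0 to 255 maps directly onto bytes(). Joining chr() values and calling .encode() would instead UTF-8 encode every value above 127 as two bytes, which corrupts binary data such as a JPEG. Here is a small sketch comparing the two, using the six sample values from the snippet above:

```python
data = [255, 216, 255, 224, 0, 20]

# One list item becomes exactly one byte
direct = bytes(data)

# Going through text: values above 127 turn into two UTF-8 bytes each
via_text = "".join(chr(d) for d in data).encode()

print(len(direct), len(via_text))
```

The lengths differ, so the file written from via_text would not be the same JPEG that was uploaded.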
I am using PyPDF2 to generate a PDF, and I would like to upload this PDF to Cloudinary, which accepts images as IO objects.
The example from their docs: cloudinary.uploader.upload(open('/tmp/image1.jpg', 'rb'))
In my application, I instantiate a PdfFileWriter and add pages:
output = PyPDF2.PdfFileWriter()
output.addPage(page)
Then I can save the generated PDF locally:
outputStream = open(destination_file_name, "wb")
output.write(outputStream)
outputStream.close()
But obviously I'm trying to avoid this. Instead I'm trying to send an IO object to cloudinary:
image_StringIO_object = StringIO.StringIO()
output.write(image_StringIO_object)
cloudinary.uploader.upload(image_StringIO_object,
                           api_key=CLOUDINARY_API_KEY,
                           api_secret=CLOUDINARY_API_SECRET,
                           cloud_name=CLOUDINARY_CLOUD_NAME,
                           format="PDF")
This returns the error:
Empty file
If instead I try to pass the value of the StringIO object:
cloudinary.uploader.upload(image_StringIO_object.getvalue(),
                           ...)
I get the error:
file() argument 1 must be encoded string without null bytes, not str
Got the answer from Cloudinary support:
The result from getvalue() on the StringIO object needs to be base64 encoded and prepended with a tag:
out = StringIO.StringIO()
output.write(out)
cloudinary.uploader.upload("data:image/pdf;base64," +
                           base64.b64encode(out.getvalue()))
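The snippet above is Python 2. Under Python 3 there is no StringIO module, and base64.b64encode() returns bytes, which cannot be concatenated to a str directly. Here is a rough Python 3 sketch of building the same string. The to_data_uri helper is hypothetical, the "image/pdf" type is simply copied from the answer above, and the stand-in bytes take the place of what output.write(out) would produce. The upload call itself is left out.

```python
import base64
import io


def to_data_uri(stream: io.BytesIO, mime: str = "image/pdf") -> str:
    """Build a Base64 data URI from everything written to the stream."""
    encoded = base64.b64encode(stream.getvalue()).decode("ascii")
    return "data:" + mime + ";base64," + encoded


out = io.BytesIO()
# Stand-in for output.write(out) from the PdfFileWriter in the question
out.write(b"%PDF-1.4\n% stand-in bytes, not a real PDF\n")

left_to_read = out.read()  # empty: the position is at the end after writing
uri = to_data_uri(out)     # getvalue() returns everything regardless of position

print(left_to_read, uri[:22])
```

The empty read() result may also explain the "Empty file" error in the question: a consumer that calls read() on a stream that was just written to gets nothing unless seek(0) is called first. That is an assumption about how the upload reads its input, not something the thread confirms.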