Python - Decoding an image from json - python

Working with an api and the requests library.
Background info: creating a program to interop with an sms messaging platform. All responses from the api are provided in json. When working with the endpoint to obtain sms attachments, the response is in json and is basically a large block of encoded text.
I'm able to send the proper request to obtain the response I want, but I don't know what to do with the response.
How can I decode the json and work with the image file?
The complete response is too long to post here, but I imagine anything important is included in the beginning... I'm just unsure what to do with it.
b'{"id":1067442,"friendlyName":"attachment0","md5":"4dCsb2PEljAqYY1JBl3FXA==","contentType":"image/jpeg","size":1017241,"createdDate":1582649116340,"updatedDate":1582649116340,"thumbnailBase64":"/9j/4AAQSkZJRgABAgAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCACWAJYDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwDu1AwB29B/9al4BPr7f5/Gk7Dn86Xt/niuY1HZ5HApcZ6ikXpxxTvccUhABwD1/rUkZ2nHUUwc07GRkUrgTBfTp7U9QB0pEyyg96cxSNGkdwqqCxZjwB6+3elcLmfq19aaRCt/dSMqAGMqD97jI49eP1rz2bUdc8YGR7eaLS9EVtpup22ofYZxvP6VieMfFP8AbGrMu9RaRnbBEcksvqR2z/8AWrkpdRS7vIobi7mlSFdsUKnIX2FdEY6ak8x6zp/hDwhHEJbrUotQY9ZJbsBc+wUgfzrQi0XwUp+SHSH+ro38zXlkdzEBuYiOIDqzYAqvcahaO4W3thcTngNInC/nT5Q5j2iLQfD8ozb6fprD1jhQ4/IVTvfBGkz5aGFYj6xqB+nSvJrU3EUgeGcxSA53wHaQfwrttL+J/wBhtI7fWrZ5pFYoZ4mA3Y9Qcc/jScRqRHqHgu7sQ09oInRPmYL8mQPUdDVeLXWh037HJtLbt0cj9AOhUiux/wCEx0XUtFu5rS4/eLEcROMNk8DHY8ntXO+B9PXUb2+nlQNBHGIfmGQWY5P8h+dS1pqProUrfVBuCGOEjHy+W+P0NU9KeeDVL2a6sLiO2kHlrKUyD................

First parse the json string:
import json
response = b'{"id":1067442,"friendlyName":"attachment0","md5":"4dCsb2PEljAqYY1JBl3FXA==","contentType":"image/jpeg","size":1017241,"createdDate":1582649116340,"updatedDate":1582649116340,"thumbnailBase64":"/9j/4AAQSkZJRgABAgAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCACWAJYDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwDu1AwB29B/9al4BPr7f5/Gk7Dn86Xt/niuY1HZ5HApcZ6ikXpxxTvccUhABwD1/rUkZ2nHUUwc07GRkUrgTBfTp7U9QB0pEyyg96cxSNGkdwqqCxZjwB6+3elcLmfq19aaRCt/dSMqAGMqD97jI49eP1rz2bUdc8YGR7eaLS9EVtpup22ofYZxvP6VieMfFP8AbGrMu9RaRnbBEcksvqR2z/8AWrkpdRS7vIobi7mlSFdsUKnIX2FdEY6ak8x6zp/hDwhHEJbrUotQY9ZJbsBc+wUgfzrQi0XwUp+SHSH+ro38zXlkdzEBuYiOIDqzYAqvcahaO4W3thcTngNInC/nT5Q5j2iLQfD8ozb6fprD1jhQ4/IVTvfBGkz5aGFYj6xqB+nSvJrU3EUgeGcxSA53wHaQfwrttL+J/wBhtI7fWrZ5pFYoZ4mA3Y9Qcc/jScRqRHqHgu7sQ09oInRPmYL8mQPUdDVeLXWh037HJtLbt0cj9AOhUiux/wCEx0XUtFu5rS4/eLEcROMNk8DHY8ntXO+B9PXUb2+nlQNBHGIfmGQWY5P8h+dS1pqProUrfVBuCGOEjHy+W+P0NU9KeeDVL2a6sLiO2kHlrKUyD................"}'
response_parsed = json.loads(response)
response_parsed['contentType'] being image/jpeg hints to us that the format of the image is jpeg.
You didn't provide us with the actual image data, just the beginning of the thumbnail of the image which is stored in thumbnailBase64.
The thumbnail is encoded in base64. We can decode the base64 string into bytes using base64.b64decode:
from base64 import b64decode
thumbnail_bytes = b64decode(response_parsed['thumbnailBase64'])
Now that we have the thumbnail bytes we can save it to a file and view it like any regular image file:
with open(r'thumbnail.jpg', 'wb') as x:
x.write(thumbnail_bytes)
The actual image is probably in a JSON field called "image" or "imageData" or something similar. You did not include it in your question so there's no way to be sure.
This results in:
Obviously it's corrupt because you only included the beginning of the thumbnail.

Something like that could do the trick
import json, base64
resp = b'{"id":1067442,"friendlyName":"attachment0","md5":"4dCsb2PEljAqYY1JBl3FXA==","contentType":"image/jpeg","size":1017241,"createdDate":1582649116340,"updatedDate":1582649116340,"thumbnailBase64":"/9j/4AAQSkZJRgABAgAAAQABAAD/2wBDAAgG...'
resp_dict = json.loads(resp)
with open("img.jpeg", "wb") as fp:
content = base64.b64decode(resp_dict['thumbnailBase64'])
fp.write(content)
In addition, keep in mind that base64 value can be used directly to print an image into a HTML page
<div>
<p>Taken from wikpedia</p>
<img src="data:image/jpeg;base64, iVBORw0KGgoAAAANSUhEUgAAAAUA
AAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO
9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot" />
</div>
In all case, you will need to check the content-type to correctly interpret the data you received. This is not always jpeg, but may be png, gif, tiff, etc.

Related

Save the API response (Json format ) as image file locally to system

Below is the api response im getting :::
{"contentType":"image/jpeg","createdTime":"2021-10-10T11:00:47.000Z","fileName":"Passport_Chris J Passport Color - pp.jpg","id":10144,"size":105499,"updatedTime":"2021-10-10T11:00:47.000Z","links":[{"rel":"self","href":"https://dafzprod.custhelp.com/services/rest/connect/v1.4/CompanyRegd.ManagerDetails/43/FileAttachments/10144?download="},{"rel":"canonical","href":"https://dafzprod.custhelp.com/services/rest/connect/v1.4/CompanyRegd.ManagerDetails/43/FileAttachments/10144"},{"rel":"describedby","href":"https://dafzprod.custhelp.com/services/rest/connect/v1.4/metadata-catalog/CompanyRegd.ManagerDetails/FileAttachments","mediaType":"application/schema+json"}]}
i need to save this file as jpg format locally to my system? could you please provide me a solution through python
You might have to decode the JSON-string (if not already done):
import json
json_decoded = json.loads(json_string)
Afterwards you can get the URL to retrieve and the filename from this JSON-structure
url = json_decoded['links'][0]['href']
local_filename = json_decoded['fileName']
Now you can download the file and save it (as seen here How to save an image locally using Python whose URL address I already know?):
import urllib.request
urllib.request.urlretrieve(url, local_filename)

Pyrebase sending profile pic in json

I am creating user accounts using prebase auth.create_user_with_email_and_password. Then I am storing the users' data in firebase realtime database. db.child("users").push("data") where data= {"name": name, "email" : email, "password": password, "picture": picture}.
here picture is; picture = request.files['file']
But I am not able to send picture alongwith other data of user, image can not be sent in json object.
How can we upload picture, there are some solutions to upload image data but I want to send it alngwith other data so it may get stored with the other attributes of user data.
I would suggest two approches you could use:
Solution 1:
You could store your images in file system in any directory (i.e img_dir) and renaming it to ensure the name is unique. (i usually use a timestamp prefix) (myimage_20210101.jpg).Now you can store this name in a DataBase. Then, while generating the JSON, you pull this filename ,generating a complete URL (http://myurl.com/img_dir/myimage_20210101.jpg) and then you can insert it into the JSON.
Solution 2:
Encode the image with Base 64 Encoding and decode it while fetching.
Remember to convert to string first by calling .decode(), since you can't JSON-serialize a bytes without knowing its encoding.
That's because base64.b64encode returns bytes, not strings.
import base64
encoded_= base64.b64encode(img_file.read()).decode('utf-8')
How to save an encoded64 image?
my_picture= '............'
You'll need to decode the picture from base64 first, and then save it to a file:
import base64
# Separate the metadata from the image data
head, data = my_picture.split(',', 1)
# Decode the image data
plain_image = base64.b64decode(data)
# Write the image to a file
with open('image.jpg', 'wb') as f:
f.write(plain_image)

decoding base64 encoded image in Flask

My Flask app is receives base64 encoded images like so
`b'"{\\"image\\": \\"/9j/4AAQSkZJRgABAQEASABIAAD/4gxYSUNDX1BST0ZJTEUAAQEAAAxITGlubwIQAABtbnRyU...opHNf//Z\\", \\"id\\": \\"e3ad9809-b84c-57f1-bd03-a54e25c59bcc\\"}"'
I have tried a number of ways but I cannot seem to get the image bytes for decoding. If I json.loads this it becomes a string. If I treat it as a bytes dictionary it doesn't take the key and asks it to be an integer which will not work of course. I have tried many things, this is just one other which doesn't work. Any help is appreciated, Thanks.
def main():
try:
p = randint(100, 200)
image = request.data
jsonResponse = json.loads(image.decode('utf-8'))
im = base64.decode(jsonResponse)
print(im)
with open(f'Image{p}.jpg', 'wb') as f:
f.write(im)
If you use request context (from flask import request), then you can access the sended data as a python dictionary using the request.json attribute instead of the request.data.
To get the dictionary as you wish from the example you provided, a double call to json.loads solved the problem:
import json
data = b'"{\\"image\\": \\"/9j/4AAQSkZJRgABAQEASABIAAD/4gxYSUNDX1BST0ZJTEUAAQEAAAxITGlubwIQAABtbnRyU...opHNf//Z\\", \\"id\\": \\"e3ad9809-b84c-57f1-bd03-a54e25c59bcc\\"}"'
data_as_dictionary = json.loads(json.loads(data))
It seems like because your data has the leading " and the trailing ", json.loads is parsing it as a string (which is a valid json) and that's why you don't get the dictionary you want in the first call to json.loads.
import json
import base64
from codecs import encode
request = b'"{\\"image\\": \\"/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAoHCBYVFRgVEhUZGBgYGBoYGBkYGBgYGhgYGBgZGhgYGBgcIS4lHB4rIRgYJjgmKy8xNTU1GiQ7QDs0Py40NTEBDAwMEA8QGhISGDQhISM0MTQ0NDQ0NDQ0NDE2MTE0NDQ0NDQ0NDQ0NDE0NDQ0NDE0NDE0NDQ0NjQ0NDQxNDQxNP/AABEIAKIBNwMBIgACEQEDEQH/xAAcAAAABwEBAAAAAAAAAAAAAAAAAQIDBAUGBwj/xABCEAABAwEFBAcEBwcDBQAAAAABAAIRAwQSITFBBVFhcQYiMoGRsfATQqHBFFJygpLR4QcjM2KisvE0c7MVJENjwv/EABkBAAMBAQEAAAAAAAAAAAAAAAABAgMEBf/EACMRAQEBAQABBQACAwEAAAAAAAABAhExAxIhQVEEYTJxsRP/2gAMAwEAAhEDEQA/ANABinWpEYp1oQDbk28YhOFE4YhAOBLaEQCUEyFUEtMqsBk3TlorKsYaVWtgqaqDuFuWSepVAUGbim30YMhHkLGmAQVb0hDRCo7JVkxrK0VNuXJTo4epPkJ0BRXsumQpNJ0hOCjeMFEo1LpjRTHnBMMpghFEPvdIUfau0WWai+tUMNY2c4Ljo0cScEGOLTdK5n+0ba7q9oFmYepSPWA96oRiTyBgcylbw5O1WG1PtdZ1pr74Yz3WDQAcN+pUp7VJsVnApNwyTdoC59XtdeZJOIUqxsYOG5V78FPsNTLFK+GmVvTi9CvLJTCoab+tjuV1YKu9TPKtW8TmsT7Qm2CfXJTWWfCVrIw1rnllelmyL7PbUxFSn1hGbmjNvGMx371R2K0iowPGuY3EZhdCqM0K51a6H0a2PpgQyr1mbg7EwB3HwC6/4nqXOvbfF/65f5OJvPunmf8AExEjQhep154IIQhCZAEoBFCUEjKCy9v/AIhWpAWYtjf3hU6qbO/CBUspc5PixkxwTntw1Iq2wiI1WdvTkzInBsCESIGQCgk0bNgTgCSwJxcLqMQg4YhLhE4ZIIoJRRBGUAmsJYVUNaWlW9TsqK1gKVpwhr5S7+hTb6RGISWGcClw02x0RfaVpqYxVFs1nWEq+pJU4dc2QmILTwUkFJc2RikYOdIRURgozyW4aKRQyQCLc8NY559xpd4CVxClUL6jqju09xcebiSuv9LLRcslYzEsLRG9/V+a4/ZmwcN+Cja8NJY3QyCmq6fszCGhMWkwsnTEOq2FPsLRLRvx9fFV73ypuzyS5g3HPnglfCpWks9Ge8KVZwJiMs8EmzVWhjATi4uA+6JKVYCHmRjilFdXdkoyJ3qws5IwPrkVHotujEwpF6Fvn4cm703am6rG9ObJeptqN7bDIPeCPiFr67lnekLuoR63o7y9iszs5VBQqB7WuHvAHxTiYsTYYBukd0mE+vaze5l/XlanLZ+AgjQVJBKCSlNQCll7a/rkLUrKWv8AiOU1Ou8+ER4Epu1e7zCmsoAmSnXUm6qNUs4v2XT7IQS9MEalq2jAlkYJLAluyXC6yAieMkoBB4xQQgiclAJLwgG7S+Gpmi6UNpyGYKDY6shKnFqQmKlHUJTKkp4JGkbIJLsdFdUnwcVW7KZiSrG5Km+VTwlSjhRqNTQqUiAzWZITVJ8GE/VyQFIEICg6du/7J+ObmD+sLm+y6N54nTFX/SbbDrSatnpu6jHwRdGNx0Xg7PMEd6qNiUiHun3RE8ystalb4xc86uK9QMbOpyWft9dwxMAc1Z7UqGRdEnQeSzlt2O9wL6rr5cOq2YDDP9SjMaU03bADsSCN8hWNHbI9zx/wqClsxjGm9MndECOOqFhsjr40E4j58FesxObXQNjbQFSrSaAQymxwx1JxcfgPBMdHtqXC4vOB9QrnobshrgXvAIiI4AYrGdJ6brNWexohsy29lByxWXtv0190nxXVbFtilUaOu2dxwxhWFmqB2IOC89u2jVY7DygfBbropt9xeKVUuY8xdnIzuWnbPLP2zXZHSqpWQ6XVy0d3+AtJZqxcCHZhY7pw+XsaNBijyXOEWQyxp3tB8U8k0mQ0DcAEte5iczJ+PI3e6tEgjQVJAJTUSUEAZWWtY67jxWqOSyVq/iO5pX5LV58o7qzsgjcSYkoiMcE57M4KLJETWtJk4BBCMAgpbN0wJTkYag5cDrJAROzSgiOaYAIOCNmSWQglTt2sWU5CqrJXDus3PUK/2pQD2XSFkbRQfRdI7KOBo6cOEjNPUX6FU9jtd4S3A6hWlnqB/NI2h2UOqSrCkFB2W2GKwpZKFkVac5I6L9DmnE3Vp6jNAKqnAJ+looDXyQNynsdggONWYhnWces98kZkgyB5yp2yxIe7UvI7hgPJV1nd13Fwg4xw3x3SpuyXgX2/zuP9S5uO3VLtAglQq75zxUy0vkmFDfT3qsnxBdZ7xygb1Ns1iazEDv3lMmoG92m8q0cYBvbkWjjYdEiW04OsxyKHSbo8LSwFkCoyYnJw+q75HTvTPQyuHNiRIn4LSPqBsTqYniRgnnx8s92zXw5dYNnOY8e0oNJa6JutMEFa62dHWWpoc9rmVGjq1MnNOYN0RPeprqbXVL2Tp60e9G8eCubKZRM98q3rklk5VZYaL2tb7TtgAOIEAkarI7Wsrq1pcWtJAfBw0YBgMN4HxW+tLsVSWmu6lZ6tSnAN95BiesX3Rn3KpZiy+eI5d/HjvwoXNIMEQRmCiT1oeXBjnYucxpcd5xEnjACaXsenr3Zl/Y8n1MezWs/l4JGggtEDCUEkJQQBnJZO0j947mtYcispa6gD3Ab1NK8+ymFozRvtLQqiu4zmlvyap4Pf+Rcl0hEksyHJBS0dGqUSE09qv61mBCrbTZSF57r4gNCSnrqbcM0yEwYJ5jJTbBgptCnkihDtjIAVXabM1wIzBWstFlDmqitNkLOSJRYxdtsbqLr7Oz5Kbs+1h+LcHeatqzAeqQqC17NfTN+nN0GSBonSjf7OrdQA5qypFZDY9v8AagQYcB4rQWO1Y3XYFZtFnCNJa6UpBI1RkOwTtCrvSanaRvpahBuc7bsLqdZzBhdMgkZtcSQfAx3FU1IkEnST56ldI6Tln0dz3sBe2AwnMScYPKVzIZYbzhzOJy46rG55XRNdkSG1JMTPgjtDlGs9UT6z4JNoq6DEpLmjRmZGhnvCh1hVvlwc597C5p4HVPvr6epTuzWOc+YJABOXwT4Pd3wsdh1rTTe40mXnjAsc4NExgCZx5Baelb7W6zl1qYGvDwWkC7gJmBJMZCVTdHGOBe8tJ608IyWvpW1rhdcLwiOHJZql55iHsq1l5F7Pfj61Wne8Abis9ZLHceS3LTlx4q1rukGdPWSrNshepzVnAq2jAu3ZcToPW9VttpF1OpZ703XU4PN7HGTzJSKlSX02DIODnQM7pmY3YBPbRa2kx5aSXVdTu1Prerxm71Mz7T756c91+vmKW0OBd1eyIa3k0QD3xPem0aNe3nMzJJ9PH1q61dXzSUYQQVJGEoJISggDOR5LH2r+I7mtgcjyWSrs67jxU28KzvwiGylxlSBZMuCM2prUZtJwjVRbafMyJNyAiQB3oJLdrhNVKYKfCJwXnuxV17JhIVTUZErU3cFWWqy4EoJUNarGg3AJh9nIhTKDckBLTb6YcMQnXIkGzm0dmkOvNCaszARdcM1pXAFRHWIXw4I6XGT23sF9IitZsNXNGvEJ7Zu0hWABMPC2FSmDgsnt7YBDva0MHDEtGv6qeKXdjtfuuzVkx0rG7L2iKguP6r278MVe2O1kG67PzR0LI9pSGqNSMmVKATgZbp4+KLW/WLj+Fv6rnjQIPqPXyWy6abTove2k17S5l9rhOTiQC2ci4XcVjHYEDLLPfr+Xesdf5Vtn/GI7MDjEkmI9YJu3GIxj1jKU94DhMZ4BRtoPkxh8wlDNCo8dll7jon7ALQ937tpkbnNGu4nHGFKsr2gYJ+laC1wLcI4Ita55F1StFsZ1hRc7CHAXHB2/AGUunt9gj2jXU3/VIInkCpuwtquc64+64bwBB9Y+C0lpstOqwse1hBG4YEZHmFnzq9an4rrBa78EEHQEHeDmpNeocAXS6BllpoPWaz2xKJp1KjccHYboxy4a9yuXA48sOJjhronKyJNrNOoCADDCM8MYxHGJUS1Whz3XncgNANyK0Ol53CAOIAz+J8E2vW/jelnOZrnzXnev6t1q5+oJGggupzCRwggmBhGESUEgDsjyWQtLjfdzWvdkeSyFbtu5lJOu/SrtHaKl0zg1PssgJkp76OPBRaJi8+TgQRkQgpau2BAoBGVwOwQCaqNwUhNPCAafRBhMtpQVMhEQgI7wialVM0iEABmjATT7Qxn8R7G/ae1vmUdC0Mf2Htf9hzXeRQC3DEJTmSiGacUmym3tgXj7Sl1XjHnzULZ+0L/Uq9V7cMVotsbcstnJ+kV6bCB2S6X9zGy4+Cw21+lmz6suZUe17RLT7J4v8Bh5wjgbiwVogOz81jOl/wC0B9Or7OxOZDDD3loffeM2tnC6MpzJ1EY5LbHTmq+n7KlLGkQ5/wD5HDdI7I5Y8Vj/AGxTkC3/AOqF5iqZJ984yT9ffzzGm5WbbfeAa7tt1ntAjtTrh458ss4yEulaDF12QyOreI3jeEtZlVNcaWqRMg/FQa78cT5HLio9kt0dWprk4fDmOKl1YPeAd8mf8qOcV3qTZnTqryy7NvtwdA/xOHes1SeW+GnNXuy9q3BA7lOp+NM39aPZ2zbgvhxMYEREQrU2ohoLBI4g7sfJUWytqA4fWOpxJOXdxU2rWu4MwbHViYwiG9+IlZ8XdI1kquvk3ocSRvnLCdMoVjarSWw3MkwBOc5/DFVHtS0l93DSMSToN84+fFTqNNxN+p2jkPqjdOp4rf8Aj+jd6/r7Yer6sxn+y2NgAZxrv4o4S0lexI80SJKRJgSCNBCQSwkhKCQB+R5LJ1Wddx4rWvyPJY6pN93Mpa8AzUtd3AJL65w4qPaGkuKce3sqUdt6tZwCCLQIJNnbQjKAQK892FJt6cSHZoA0lLKxPTzpWbM32FBw9s5sudn7JhyP2jpuGO6QFdLOmNOykspgVK2rZ6jPtuGv8ox3wuZbV6U2quT7Su8NPuMNxkbrrYvD7RJVTVqEySZJJJJzJOZJ1UZ70AouEzAnHGBOPFKp1rpvNwcAAHDAjkRiFGe/5ph1X5INvNl9NLXTge2L2zdAqBr4AA989c56u0SrV0ltdWb1ofdPusu0xy6jQSOZKwlG2XT96fFXFhtQIbjo75lLgiTdPjiY3nem6+y2PGV07x8xkVYU2hwB5evNOsYkbD22yvpuuP5tOjm7wot5bnatgFRhES5vWZ9oaTuOXes3tLY5Y1tWn1qbgCDqzDFr90HCU5S4rGPRv3jNJLSjVAtlSRw1H1eI3t4f5TzLS4YHEaHnqN4UUNIMj5J0ZZYZkfV4t4cP0KVglTadryj13KSyuTv+Sqm0C4wBJOUa8loNn7AqlsvMDEgTLiBmYGcYSM9cQp1JF5tOWG1FpwdG+RMb4GpV3YbXUqdSm0vOGIBDQOJzG5L2L0dYXBz3B4OW7DeFt7NZmMDWU2hrcJjALHWp9NJKrbJs0UmBz4dUfiTo0bm/nzTxUm2ul0bhCjFer/Gz7cT+/l5/r692r/XwSUECgt2RKCCCACCCUgCCUEQSggCfkeSxz+27mVs6nZPJY2q4XnRvKnXhP2UymMyl3W8FW1KzpIlB7j1cVMyPfPqLYlBNzgEaS3bwgUAguB2FJDs0tN6oCB0g2uyy0HVn4xg1urnnst8czoAVwC22p9R7n1HXnvcXPdvcc+Q3DQADRbX9pu3xVrCgx0sok3o96qRB/COrzLlz6o/13KRwHOUd/wAkbqnyTT6vkfNUZDh5hINInQ6p42k/EfokNtBHh80AybM+MtAdO5BrXsMwRiRwUynaJ8W/NS6b5j7RPwCAlbH2jMA5wZ/EStCGSJCzVmswMEYdV0x9orQbPgNu8B5KTPsCiV7O9pL6Dg1xxex2LKmnWGjv5h3ypTuHr180CUBnPodnruugGzVtabgCxx/k/TwSH9FqwyLCBreI+BCta1lp13mlUF0t6zTk4t+sxw0BwO7UK1bSDWhoc5wAjrEuOG86p22D/bGv2I9hAqOaAcJEux3Ywptbo+AwupvLnjFuAAMZgDfH5arQ2yz327yPiPzUKzVC0wfH18Uu0cjLWOpdMjLMgZt/mbOm8foVr9m2uQATuOBzjJ7CciPzBVBtyxXHiozBjzOHuv1HI5+KOwWiIABgnFozafrsnTePHQos78nLxvdnV2B4vwJxvAYGPe3iNRm3ktW1jWNL3EQAXFxiIGMyFxy2bTe54pgG7O8sLnDJ4JyA3EY6zktDX6V1a1nfZ6gAq3gC+LjnNZiQ5mQcbumBxwEgKP8Az+etJv6a15kknXFIKgbCtoq0xvaACOGkcNPBWBC9bFlzLHmblmrKQiSikqyBJSkUIA0EUI0AAlhJCWEAVTsnksaacvdzPmtnU7J5LGF3WdzPmo14T9oFambxgJdSmergpzSEq+0Jdo9kHoEaUXIklu2o0AhquB2FFUvSfan0azVaw7TW3WfbeQ1ncCZ5Aq6K5n+1Xag/d2dp/wDa/wCLGD+8/hStEcxr1JJvEknMnEknMnjMqK+PXNSnmUy9vruShozmDFIcweakPb67k2Tw3KgaucNybcPJSPaYeJ8cE4WtdhGjR3nPyQCaNln8TfmpNOllwcfCEdEQcPr/AAbdP5p+lkPvHzCDHZ3kRP1XeF4qxs1WMQcmtPeIUBow+5PiRHmn6YInk0d8KQvncOPw9FER69esE1RcS08CSO7MeCWXiPXMfBBmLZZg8ZlrmmWPHaY7ePyyKcsFrLiWVAG1G4kDJ40ezgd2h7ktjvXrh5Jm12W/Dmm69hljs4OoO9pyI/JBLCFEtdmnrDv/ADSrDa77TeF17TD2zN07wdWnMFS2oNUhgex1OpkRG+DmCPNUj9n1GSxrXXsLzxIADphjXaggEk65b1oLXRumR64KfYKgeLp3gyeF4R/UU+iT5YuzWN94sIlxDozJ6oPZnvHPdmnzk0OmQcO1MxjIIgESDhrgtXadngVg9pghgBHA1HmfxBviqbbbSHsN0CWvvEakVCPGCPgiVXE/oxVcx5nI5+Ja6fC991bEhclt20HMe32T4LLxJaZBc+C5p0c0QMDqSuj9G9pfSbOx5EOEseP5m6jmIPeuz0Oycrl9blvYsCEUJwhIIXS5yUEaCAJCEEaAASwkhLAUgmqOqeSxdR/WcBvK2tXsnksUHC87mfNLXgqiiSTinQMklrCSU5cMhLsZc18pc4IIpwQSbu3hAIILgdhS4n+0f/XVeTP+JqCCnRxkBmif68EEEQzVT14KM/NBBBGxl935p9mf3x5lEgqCTY8x9/yKmUtPsu/uKCCkz7cj9hvk1StXfaCCCDTbB2u8/wBoTVDId3mgggHaeY9ap/Tu+QQQQSEP9TT40nzxg6q1YjQTohq19nuUWx5+tyCCRrK29tv+0P8AnYqbpDmzv/8AhBBE8rrEN0XSv2f/AOmd/uv/ALWIILux5cO/DTFJKCC2ZCRIIKgCCCCAUEYQQUiBV7J5LCntO5lBBTrwL5SaacGaCCmKByCCCYf/2Q==\\", \\"id\\": \\"e3ad9809-b84c-57f1-bd03-a54e25c59bcc\\"}"'
data_as_dictionary = json.loads(json.loads(request.decode('utf-8')))
base64_img = data_as_dictionary['image']
bytes_img = encode(base64_img, 'utf-8')
binary_img = base64.decodebytes(bytes_img)
with open("imageToSave.jpg", "wb") as fh:
fh.write(binary_img)
you need several steps. first of all, you should transform request data into a python dictionary then take the image as base64 then encode it in bytes array then as binary, and last save it.
The above code works well.

Post pdf to api (python) returns response with wrong encoding

I am trying to use an ocr API with python to convert pdf to text. The API i'm using is : https://www.convertapi.com/pdf-to-txt . When i upload the file through the website it works perfectly but the API call has the following issue:
Python code:
import requests
url ='https://v2.convertapi.com/convert/pdf/to/txt?Secret=mykey'
files = {'file': open('C:\<some_url>\filename.pdf', 'rb')}
r = requests.post(url, files=files)
The API call works fine, but it when i try to access the response through
r.text
it returns giberish: (Notice the FileData section)
'{"ConversionCost":4,"Files":[{"FileName":"stateoftheartKWextraction.txt","FileExt":"txt","FileSize":60179,"FileData":"QXV0b21hdGljIEtleXBocmFzZSBFeHRyYWN0aW9uOiBBIFN1cnZleSBvZiB0aGUgU3RhdGUgb2YgdGhlIEFydA0KDQpLYXppIFNhaWR1bCBIYXNhbiAgYW5kICBWaW5jZW50IE5nDQpIdW1hbiBMYW5ndWFnZSBUZWNobm9sb2d5IFJlc2VhcmNoIEluc3RpdHV0ZSBVbml2ZXJzaXR5IG9mIFRleGFzIGF0IERhbGxhcyBSaWNoYXJkc29uLCBUWCA3NTA4My0wNjg4DQp7c2FpZHVsLHZpbmNlfUBobHQudXRkYWxsYXMuZW...
Even if i use json load to convert it into a dict, it still prints the text in giberish.
I've tried to upload the file as not binary but that doesn't work(it throws an exception).
I've tried many pdf files and they all were in english.
Thank you.
The text is decoded, so you need to decode it. Let's take the first file as an example.
import base64
r = r.json()
text = r['Files'][0]['FileData']
print(base64.b64decode(text))
By the way, they seem to have a Python library as well, you might want to check that out: https://github.com/ConvertAPI/convertapi-python

Image scraped as HTML page with urlretrieve

I'm trying to scrape this image using urllib.urlretrieve.
>>> import urllib
>>> urllib.urlretrieve('http://i9.mangareader.net/one-piece/3/one-piece-1668214.jpg',
path) # path was previously defined
This code successfully saves the file in the given path. However, when I try to open the file, I get:
Could not load image 'imagename.jpg':
Error interpreting JPEG image file (Not a JPEG file: starts with 0x3c 0x21)
When I do file imagename.jpg in my bash terminal, I get imagefile.jpg: HTML document, ASCII text.
So how do I scrape this image as a JPEG file?
It's because the owner of the server hosting that image is deliberately blocking access from Python's urllib. That's why it's working with requests. You can also do it with pure Python, but you'll have to give it an HTTP User-Agent header that makes it look like something other than urllib. For example:
import urllib2
req = urllib2.Request('http://i9.mangareader.net/one-piece/3/one-piece-1668214.jpg')
req.add_header('User-Agent', 'Feneric Was Here')
resp = urllib2.urlopen(req)
imgdata = resp.read()
with open(path, 'wb') as outfile:
outfile.write(imgdata)
So it's a little more involved to get around, but still not too bad.
Note that the site owner probably did this because some people had gotten abusive. Please don't be one of them! With great power comes great responsibility, and all that.

Categories

Resources