Adding EXIF data to Images - python

I want to add modify exif data in an image by modifying the ImageDescription tag. I use the library from the following:
https://github.com/bennoleslie/pexif
I am able to modify the ImageDescription tag and read the modified tag after writing it to another image file. But now when i upload the image to imgur and instagram and download the image again, and read the exif data, the modified ImageDescription tag is not there anymore in the exif data. To read the exif data, i use tools like exiftool and identify -v, but none them display the modified ImageDescription. I also used the above pxeif library to read the tag name ImageDescription and its not there. Any suggestions for why this is happening?
Below is the code i used using the pexif library and the image is a .jpg:
img = pexif.JpegFile.fromFile(path_to_images + image)
image_id = image.split('.')[0]
img.exif.primary.ImageDescription = image_id
img.writeFile(path_to_encoded_images + image_id + "_encoded.jpg")

This likely isn't the fault of your code or the library. Many image hosting services deliberately strip exif data. So, even if the data is well formed, don't count on it surviving imgur or instagram (or others for that matter).
See http://imgur.userecho.com/topic/42809-where-is-the-exif-data/ and http://www.embeddedmetadata.org/social-media-test-results.php
As to why they do this, there are probably various answers. Probably one big one is that less savvy users won't unwittingly post their gps coordinates all over the place via geo tagging.

Related

Python - Convert JPG to text file

Good morning all,
I've made a Python script that adds text on top of images, based on a preset template. I'm now developing a template editor that will let the user edit the template in GUI, then save the template as a config file. The idea is that one user can create a template, export it, send it to a new user on a separate computer, who can import it into their config file. The second user will retain full edit abilities on the template (if any changes needs to be made).
Now, in addition to the text, I also want the ability to add up to two images (company logos, ect.) to the template/stills. Now, my question: Is there a way to convert a JPG to pure text data, that can be saved to a config file, and that can be reinterpreted to a JPG at the receiving system. And if not, what would be the best way to achieve this? What I'm hoping to avoid is the user having to send the image files separately.
Sounds questionable that you want to ship an image as text file (it's easy, base64 is supplied with python, but it drastically increases the amount of bytes. I'd strongly recommend not doing that).
I'd rather take the text and embed it in the image metadata! That way, you would still have a valid image file, but if loaded with your application, that application could read the metadata, interpret it as text config.
There's EXIF and XMP metadata, for both there's python modules.
Alternatively, would make more sense to simply put images and config files into one archive file (you know .docx word documents? They do exactly that, just like .odt; java jar files? Same. Android APK files? All archive files with multiple files inside) python brings a zip module to enable you to do that easily.
Instead of an archive, you could also build a PDF file. That way, you could simply have the images embedded in the PDF, the text editable on top of it, any browser can display it, and the text stays editable. Operating on pdf files can be done in many ways, but I like Fitz from the PyMuPDF package. Just make a document the size of your image, add the image file, put the text on top. On the reader side, find the image and text elements. It's relatively ok to do!
PDF is a very flexible format, if you need more config that just text information, you can add arbitrary text streams to the document that are not displayed.
If I understand properly, you want to use the config file as a settings file that stores the preferences of a user, you could store such data as JSON/XML/YAML or similar, such files are used to store data in pure readable text than binary can be parsed into a Python dict object. As for storing the images, you can have the generated images uploaded to a server then use their URL when they are needed to re-download them, unless if I didn’t understand the question?

Python web crawling/scraping - Download diagram(PDF or TIFF) from Webpage and save to Local machine

I have one website which has search button and i need to give some numeric value and give enter button. It will go to another page and it display some content in which there are some URL, if i click that URL, it will ask to save diagram and the diagram is either tiff format or PDF.
To download Tiff format diagram, i am using swift plugin in internet explore and save to my machine
Here i am doing this work manually, just i want to do automate this whole process.
Steps:
Using python request module and pass the URL with numeric value to post method
save response content to variable
perform pattern matching and fetch url
click the url but i am stuck with this part to save the diagram local since it is tiff.
is there any module to download tiff based diagram and save to local machine?
Just I want to share How i resolved the issue for the above question and it might be useful for others.
Since tiff image needs to be downloaded from web, so I used python request module with pillow module as below,
from PIL import image
import requests
tiffURL='https://***.tif'
img=Image.open(requests.get(tiffURL,stream=True).raw)
img.save('imagename.jpg')
#img.save('imagename.jpg',quality=95)
Note:
tiff image can not be viewed by normal editor , so i converted to jpg
if you want high resoultion, you can pass quality=95 to save method

How to extract images from PDF or Word, together with the text around images?

I found there are some library for extracting images from PDF or word, like docx2txt and pdfimages. But how can I get the content around the images (like there may be a title below the image)? Or get a page number of each image?
Some other tools like PyPDF2 and minecart can extract image page by page. However, I cannot run those code successfully.
Is there a good way to get some information of the images? (from the image got from docx2txt or pdfimages, or another way to extract image with info)
I found the code of doc2txt and it's simply parse the xml of docx file. So it's actually an very easy task..
Ref: doc2txt
docx2python pulls the images into a folder and leaves -----image1.png---- markers in the extracted text. This might get you close to where you'd like to go.
Few month ago, I reprogramed docx2python to reproducing a structured(with level) xml format file from a docx file, which works out pretty good on many files.
As far as I know, a paragraph contains several Runs and each Run contain one only text, sometimes contains images. You can read this document for details. https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.wordprocessing.paragraph?view=openxml-2.8.1 .
docx2python support extracting image with text around it. You use docx2python reading paragraphes, while ----media/imagen---- shows in your text, which is a image placeholder. Then you can reach this image if you set extract_image=True. Well, you get what your image called in pagaraph text and list of image files. Match as you like.

Uncompress and save zlib data in PDF with python

We get PDF files delivered to us daily and we need to get the images out. For example, what I want to do is to get the image back out of this PDF file I have, with python. Most pdf files we get are multipage and we want to export each embedded image to separate files. Most have jpeg files in them, but his one does not.
Object 5 is embedded as a zlib compressed stream. I am pretty sure it is zlib compressed because it is marked as FlateDecode and the start of the stream is \x78\x9c which is typical for zlib. You can see (part of) the hex dump here
The question is, how do I 'deflate' it and save the resulting file.
Thank you for sharing your wisdom.
I searched everywhere and tried many things but couldn't get to work. I managed to decompress the data like this:
import zlib
with open("MDL1703140088.pdf", "rb") as f:
pdf = f.read()
image = zlib.decompress(pdf[640:69307])
640 is zlib header(b'x\x9c') position and 69307 is the position of something like footer of pdf spec. b'\nendstream\n' is there. Detail is in the spec and some helpful Q&A can be found here. But omitting the end position is allowed in this case because decompress() seems to ignore following non-compressed data. You can validate this by:
decomp = zlib.decompressobj()
image = decomp.decompress(pdf[640:])
print(decomp.unused_data) # starts from b'\nendstream\n
So far so good. But when I write image to a PNG file, it cannot be read by any image viewer. Actually decompressed data looks so quite empty here and there. I attached some PNG header, but no luck. Hey, it's too much...
As I said earlier (strangely my comment was removed by someone), you'd better use some other existing tools. If Acrobat is not your option, what about pdftopng (part of Xpdf)? pdftopng MDL1703140088.pdf . gave me a valid PNG file flawlessly. Obviously command-line tools can be executed in Python, as you may know.

Get EXIF data without downloading whole image - Python

Is is possible to get the EXIF information of an image remotely and with only downloading the EXIF data?
From what I can understand about EXIF bytes in image files, the EXIF data is in the first few bytes of an image.
So the question is how to download only the first few bytes of a remote file, with Python? (Edit: Relying on HTTP Range Header is not good enough, as not all remote hosts support it, in which case full download will occur.)
Can I cancel the download after x bytes of progress, for example?
You can tell the web server to only send you parts of a file by setting the HTTP range header. See This answer for an example using urllib to partially download a file. So you could download a chunk of e.g. 1000 bytes, check if the exif data is contained in the chunk, and download more if you can't find the exif app1 header or the exif data is incomplete.
This depends on the image format heavily. For example, if you have a TIFF file, there is no knowing a priori where the EXIF data, if any, is within the file. It could be right after the header and before the first IFD, but this is unlikely. It could be way after the image data. Chances are it's somewhere in the middle.
If you want the EXIF information, extract that on the server (cache, maybe) and ship that down packaged up nicely instead of demanding client code do that.

Categories

Resources