I'm trying to get the coordinates of a number of photos, i.e. I'm trying to read the EXIF data using a Python script. The goal is to georeference all the photos and display their locations on a map. I am encountering problems with the exif module, however. I'm on Windows (64-bit) and installed the corresponding (Strawberry) Perl software and then the ExifTool module (version 12.30) using Anaconda (Navigator), but to no avail. It gives me the following error: ModuleNotFoundError: No module named 'exif'. If I use the command pip install exif it tells me that the requirements are already satisfied. What am I missing here? I'll gladly provide more information if required.
... I also tried an alternative: the exifread module imports without problems but does not seem to have all the functionality I need. I can read the coordinates, but I can't extract them in a usable form: it gives me an IfdTag object when I would like an array of the degrees, minutes and seconds that I can then process further.
There is a utility function exifread.utils.get_gps_coords() that provides a convenient way to access the coordinates as a tuple in the format (latitude, longitude). Note that a negative latitude means South and a negative longitude means West.
Example:
import exifread

path = 'image.jpg'
with open(path, 'rb') as f:
    tags = exifread.process_file(f, details=False)
    coord = exifread.utils.get_gps_coords(tags)
    print(coord)
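If you want the raw degrees/minutes/seconds rather than a decimal tuple, you can also pull them out of the IfdTag values directly. A minimal sketch, assuming the image actually carries the standard GPS tags (the tag names and the Ratio num/den attributes are exifread's):

import exifread

with open('image.jpg', 'rb') as f:
    tags = exifread.process_file(f, details=False)

# Each GPS coordinate tag is an IfdTag whose .values is a list of three
# Ratio objects: degrees, minutes, seconds.
lat_tag = tags['GPS GPSLatitude']
lat_ref = tags['GPS GPSLatitudeRef'].values

dms = [float(r.num) / float(r.den) for r in lat_tag.values]
print(dms)  # [degrees, minutes, seconds]

# Decimal degrees, negative for the southern hemisphere
decimal = dms[0] + dms[1] / 60 + dms[2] / 3600
if lat_ref == 'S':
    decimal = -decimal
print(decimal)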
For the sake of completeness, there are also other modules for working with EXIF data:
Pillow - it includes functionality for reading EXIF data
piexif
Also, as mentioned in the comments, you can use ExifTool (the Perl software) via subprocess, for example as sketched below.
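A rough sketch of the subprocess route, assuming the exiftool executable is on your PATH (the -j flag asks for JSON output, -n for numeric decimal-degree values):

import json
import subprocess

# Ask exiftool for the GPS coordinates of one file, as JSON with numeric values.
out = subprocess.run(
    ['exiftool', '-j', '-n', '-GPSLatitude', '-GPSLongitude', 'image.jpg'],
    capture_output=True, text=True, check=True,
).stdout
metadata = json.loads(out)[0]
print(metadata.get('GPSLatitude'), metadata.get('GPSLongitude'))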
I have been working on this code for a project at work which will (hopefully) take in images from a scanning electron microscope and generate 3D STL files of the structures we're imaging. I'm at the stage where I'm trying to generate a 3D structure from a 'coloured in' binary image I made with some edge-detection code I wrote. I came across this post, How can i extrude a stl with python, that basically does exactly what I need (generating a meshed 3D structure from a binary image). I've tried using/adapting the code in the answer to that post (see below), but I keep running into the following error: polyline2 = mr.distanceMapTo2DIsoPolyline(dm.value(), isoValue=127) RuntimeError: Bad expected access. I can't find anything online about why this is happening and I'm no expert in Python, so I have no idea myself. If anyone has an idea, I'd really appreciate it!
Code from the answer to the above post:
import meshlib.mrmeshpy as mr
# load image as Distance Map object:
dm = mr.loadDistanceMapFromImage(mr.Path("your-image.png"), 0)
# find boundary contour of the letter:
polyline2 = mr.distanceMapTo2DIsoPolyline(dm.value(), isoValue=127)
# triangulate the contour
mesh = mr.triangulateContours(polyline2.contours2())
# extrude itself:
mr.addBaseToPlanarMesh(mesh, zOffset=30)
# export the result:
mr.saveMesh(mesh, mr.Path("output-mesh.stl"))
I have tried the following:
Reconfiguring the MeshLib package that this command uses. Package docs here: https://meshinspector.github.io/MeshLib/html/index.html#PythonIntegration
Updating Visual Studio / Python / MeshLib
In older versions of the meshlib Python module, RuntimeError: Bad expected access indicated that mr.loadDistanceMapFromImage had failed; you should have checked for that like this:
import meshlib.mrmeshpy as mr
# load image as Distance Map object:
dm = mr.loadDistanceMapFromImage(mr.Path("your-image.png"), 0)
# check dm
if not dm.has_value():
    raise Exception(dm.error())
# find boundary contour of the letter:
polyline2 = mr.distanceMapTo2DIsoPolyline(dm.value(), isoValue=127)
# triangulate the contour
mesh = mr.triangulateContours(polyline2.contours2())
# extrude itself:
mr.addBaseToPlanarMesh(mesh, zOffset=30)
# export the result:
mr.saveMesh(mesh, mr.Path("output-mesh.stl"))
But in the current release, your code will raise an exception carrying the real error instead.
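For example, on a recent MeshLib the failure surfaces directly when loading; a minimal sketch of catching it (the exact exception type is an assumption here):

import meshlib.mrmeshpy as mr

try:
    # In recent releases the loader raises on failure instead of
    # returning an expected-like object you have to check.
    dm = mr.loadDistanceMapFromImage(mr.Path("your-image.png"), 0)
except RuntimeError as e:
    print("Loading the distance map failed:", e)
    raise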
Please make sure the path is correct. If that doesn't help, please provide more information, such as the PNG file, your Python version, your MeshLib version, and anything else you think is related.
P.S. If there is a real problem with MeshLib, it is better to open an issue on GitHub.
I'm trying to perform skull stripping with SimpleITK in Python.
I'm using the StripTsImageFilter function as follows:
import SimpleITK as sitk

# Load the data
# Path of the .nii image
path = r'C:\Users\Kate\Jupyter\DataThesis\PROGRESSION\0003\fet.nii.gz'
# Read the .nii image with SimpleITK:
img = sitk.ReadImage(path)
#read atlas and atlasmap
#Obtained from 3DSlicer documentation: https://www.slicer.org/wiki/Documentation/Nightly/Modules/SwissSkullStripper
atlas = sitk.ReadImage(r'C:\Users\Kate\Jupyter\thesis\atlasImage.mha')
atlasMask = sitk.ReadImage(r'C:\Users\Kate\Jupyter\thesis\atlasMask.mha')
#Skull stripping
#https://www.istb.unibe.ch/e43946/e43949/e158631/e187931/pane187932/e187939/files187941/article_eng.pdf
brainMask = sitk.StripTsImageFilter(img, atlas, atlasMask)
I get the error AttributeError: module 'SimpleITK' has no attribute 'StripTsImageFilter'.
I have tried implementing img.StripTsImageFilter and tried using sitk.SkullStrip.StripTsFilter as well.
Does anyone know how to solve this?
StripTsImageFilter is a 'remote' module for ITK, so it is not wrapped by SimpleITK and is not even built by default in ITK.
To gain access to it in Python you're going to have to use ITK's Python wrapping, and you're going to have to build ITK-Python yourself, since it is not in the pre-built ITK-Python on PyPI.
As Dave pointed out, the StripTsImageFilter is not included in SimpleITK (or 'raw' ITK) by default. However, you can install the remote module from PyPI like so:
pip install itk-skullstripping
More information is available here:
https://github.com/InsightSoftwareConsortium/ITKSkullStrip
Converting itk to sitk and back is explained here: https://discourse.itk.org/t/in-python-how-to-convert-between-simpleitk-and-itk-images/1922
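Since the remote module is wrapped for ITK rather than SimpleITK, you will typically read your images with SimpleITK and convert them before calling the filter. A conversion sketch based on that discourse thread, assuming 3D scalar images (the skull-stripping filter calls themselves are omitted, as their exact Python signatures depend on the installed module version):

import itk
import numpy as np
import SimpleITK as sitk

def sitk_to_itk(sitk_image):
    # Copy the pixel buffer via numpy, then restore the metadata manually.
    itk_image = itk.GetImageFromArray(sitk.GetArrayFromImage(sitk_image))
    itk_image.SetOrigin(sitk_image.GetOrigin())
    itk_image.SetSpacing(sitk_image.GetSpacing())
    itk_image.SetDirection(itk.GetMatrixFromArray(
        np.array(sitk_image.GetDirection()).reshape(3, 3)))
    return itk_image

def itk_to_sitk(itk_image):
    sitk_image = sitk.GetImageFromArray(itk.GetArrayFromImage(itk_image))
    sitk_image.SetOrigin(tuple(itk_image.GetOrigin()))
    sitk_image.SetSpacing(tuple(itk_image.GetSpacing()))
    sitk_image.SetDirection(itk.GetArrayFromMatrix(itk_image.GetDirection()).flatten())
    return sitk_image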
I am using the Python API of the openslide package to read an NDPI file. When I use the read_region function, it sometimes returns an odd image. What could be the problem?
I have tried reading the full image, and that works well, so I think there is no problem with the original file.
from openslide import OpenSlide
import cv2
import numpy as np
slide = OpenSlide('/Users/xiaoying/django/ndpi-rest-api/slide/read/21814102D-PAS - 2018-05-28 17.18.24.ndpi')
image = slide.read_region((1, 0), 6, (780, 960))
image.save('image1.png')
The output image looks strange.
As the read_region documentation says, the x and y parameters are always in the coordinate space of level 0. For the behavior you want, you'll need to multiply those parameters by the downsample of the level you're reading.
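A short sketch of what that looks like in practice (the level and region size are taken from the question; the path is a placeholder):

from openslide import OpenSlide

slide = OpenSlide('slide.ndpi')  # placeholder path
level = 6

# read_region expects (x, y) in level-0 coordinates, so scale the
# level-6 coordinates you actually want by that level's downsample factor.
downsample = slide.level_downsamples[level]
x, y = int(1 * downsample), int(0 * downsample)

image = slide.read_region((x, y), level, (780, 960))
image.save('image1.png')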
This appears to be a version-related bug; see also
https://github.com/openslide/openslide/issues/291#issuecomment-722935212
The problem seems to be related to libpixman versions 0.38.x. There is a Workaround section written by GunnarFarneback suggesting loading a different version first, e.g.
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libpixman-1.so.0.34.0
Update: an easier solution is:
We are using Python 3.6.8+ and this did the trick for us: conda install pixman=0.36.0
I am trying to read raw image data from a CR2 (Canon raw image) file. I want to read the data only (no header, etc.), as unprocessed as possible (i.e. pre-Bayer / the most native unprocessed data), and store it in a numpy array. I have tried a bunch of libraries such as OpenCV, rawkit and rawpy, but nothing seems to work correctly.
Any suggestions on how I should do this? What should I use? I have tried a bunch of things.
Thank you
Since LibRaw/dcraw can read CR2, it should be easy to do. With rawpy:
#!/usr/bin/env python
import rawpy
raw = rawpy.imread("/some/path.cr2")
bayer = raw.raw_image # with border
bayer_visible = raw.raw_image_visible # just visible area
Both bayer and bayer_visible are then 2D numpy arrays.
You can use rawkit to get this data, however, you won't be able to use the actual rawkit module (which provides higher level APIs for dealing with Raw images). Instead, you'll want to use mostly the libraw module which allows you to access the underlying LibRaw APIs.
It's hard to tell exactly what you want from this question, but I'm going to assume the following: Raw bayer data, including the "masked" border pixels (which aren't displayed, but are used to calculate various things about the image). Something like the following (completely untested) script will allow you to get what you want:
#!/usr/bin/env python
import ctypes
from rawkit.raw import Raw

with Raw(filename="some_file.CR2") as raw:
    raw.unpack()

    # For more information, see the LibRaw docs:
    # http://www.libraw.org/docs/API-datastruct-eng.html#libraw_rawdata_t
    rawdata = raw.data.contents.rawdata

    data_size = rawdata.sizes.raw_height * rawdata.sizes.raw_width
    data_pointer = ctypes.cast(
        rawdata.raw_image,
        ctypes.POINTER(ctypes.c_ushort * data_size)
    )
    data = data_pointer.contents

    # Grab the first few pixels for demonstration purposes...
    for i in range(5):
        print('Pixel {}: {}'.format(i, data[i]))
There's a good chance that I'm misunderstanding something and the size is off, in which case this will segfault eventually, but this isn't something I've tried to make LibRaw do before.
More information can be found in this question on the LibRaw forums, or in the LibRaw struct docs.
Storing the data in a numpy array I leave as an exercise for the reader, or for a follow-up answer (I have no experience with numpy).
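For completeness, one possible way to wrap that ctypes buffer in a numpy array, building on the variable names from the snippet above (a sketch, untested, with the same caveats as the original answer):

import numpy as np

# `data` is the ctypes c_ushort array and `rawdata` the LibRaw raw data
# struct from the rawkit snippet above.
arr = np.ctypeslib.as_array(data)  # 1D view over the raw buffer
arr = arr.reshape(rawdata.sizes.raw_height, rawdata.sizes.raw_width)
print(arr.shape, arr.dtype)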
I need to find the difference between two PDF files. Does anybody know of any Python-related tool which has a feature that directly gives the diff of the two PDFs?
What do you mean by "difference"? A difference in the text of the PDF, or some layout change (e.g. an embedded graphic was resized)? The first is easy to detect; the second is almost impossible to get (PDF is a VERY complicated file format that offers endless formatting capabilities).
If you want to get the text diff, just run a pdf-to-text utility on the two PDFs and then use Python's built-in diff library (difflib) to get the difference of the converted texts.
This question deals with pdf to text conversion in python: Python module for converting PDF to text.
The reliability of this method depends on the PDF generators you are using. If you use, e.g., Adobe Acrobat and some Ghostscript-based PDF creator to make two PDFs from the SAME Word document, you might still get a diff even though the source document was identical.
This is because there are dozens of ways to encode the information of the source document to a PDF and each converter uses a different approach. Often the pdf to text converter can't figure out the correct text flow, especially with complex layouts or tables.
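A rough sketch of that text-diff approach, assuming the pdftotext command from poppler-utils is installed and on the PATH:

import difflib
import subprocess

def pdf_to_text(path):
    # pdftotext writes the extracted text to stdout when the output file is "-"
    result = subprocess.run(['pdftotext', path, '-'],
                            capture_output=True, text=True, check=True)
    return result.stdout

diff = difflib.unified_diff(pdf_to_text('a.pdf').splitlines(),
                            pdf_to_text('b.pdf').splitlines(),
                            lineterm='')
print('\n'.join(diff))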
I do not know your use case, but for regression tests of a script which generates PDFs using reportlab, I diff the PDFs by:
Converting each page to an image using ghostscript (a sample invocation is sketched below, after the PIL example)
Diffing each page against the corresponding page image of the reference pdf, using PIL
e.g.
from PIL import Image, ImageChops

im1 = Image.open(imagePath1)
im2 = Image.open(imagePath2)
imDiff = ImageChops.difference(im1, im2)
This works in my case for flagging any changes introduced due to code changes.
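For the page-to-image step, a plausible ghostscript invocation (the output device and resolution are assumptions; adjust them for your files):

import subprocess

# Render every page of input.pdf to page-001.png, page-002.png, ... at 150 dpi.
subprocess.run(['gs', '-dNOPAUSE', '-dBATCH', '-sDEVICE=png16m', '-r150',
                '-sOutputFile=page-%03d.png', 'input.pdf'], check=True)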
I met the same problem in my unit tests for encrypted PDFs; neither pdfminer nor pyPdf worked well for me.
Here are two commands (pdftocairo, pdftotext) that worked perfectly in my tests. (Ubuntu install: apt-get install poppler-utils)
You can get the PDF content with:
from subprocess import Popen, PIPE

def get_formatted_content(pdf_content):
    cmd = 'pdftocairo -pdf - -'  # you can replace "pdftocairo -pdf" with "pdftotext" if you want to get diff info
    ps = Popen(cmd, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE)
    stdout, stderr = ps.communicate(input=pdf_content)
    if ps.returncode != 0:
        raise OSError(ps.returncode, cmd, stderr)
    return stdout
It seems pdftocairo can redraw PDF files, while pdftotext can extract all the text.
And then you can compare two pdf files:
c1 = get_formatted_content(open('f1.pdf', 'rb').read())
c2 = get_formatted_content(open('f2.pdf', 'rb').read())
print(c1 == c2)  # for binary compare
# import difflib
# print(list(difflib.unified_diff(c1.decode().splitlines(), c2.decode().splitlines())))  # for text compare (use pdftotext above)
Even though this question is quite old, my guess is that I can contribute to the topic.
We have several applications generating tons of PDFs. One of these apps is written in Python and recently I wanted to write integration tests to check if the PDF generation was working correctly.
Testing PDF generation is HARD, because the PDF spec is very complicated and generation is non-deterministic: two PDFs generated from the exact same input data will still differ as files, so direct file comparison is out.
The solution: we have to test the way the PDFs look (because THAT should be deterministic!).
In our case, the PDFs are being generated with the reportlab package, but this doesn't matter from the test perspective, we just need a filename or the PDF blob (bytes) from the generator. We also need an expectation file containing a "good" PDF to compare with the one coming from the generator.
The PDFs are converted to images and then compared. This can be done in multiple ways, but we decided to use ImageMagick, because it is extremely versatile and very mature, with bindings for almost every programming language out there. For Python 3, the bindings are offered by the Wand package.
The test looks something like the following. Specific details of our implementation were removed and the example was simplified:
import os
from unittest import TestCase

from wand.image import Image

from app.generators.pdf import PdfGenerator

DIR = os.path.dirname(__file__)


class PdfGeneratorTest(TestCase):

    def test_generated_pdf_should_match_expectation(self):
        # `pdf` is the blob of the generated PDF.
        # If using reportlab, this is what you get calling `getpdfdata()`
        # on a Canvas instance, after all the drawing is complete.
        pdf = PdfGenerator().generate()

        # PDFs are vectorial, so we need to set a resolution when
        # converting to an image.
        actual_img = Image(blob=pdf, resolution=150)

        filename = os.path.join(DIR, 'expected.pdf')
        # Make sure to use the same resolution as above.
        with Image(filename=filename, resolution=150) as expected:
            diff = actual_img.compare(expected, metric='root_mean_square')
            self.assertLess(diff[1], 0.01)
The 0.01 is the threshold for how much difference we tolerate. Considering that diff[1] varies from 0 to 1 with the root_mean_square metric, we are accepting a difference of up to 1% across all channels compared with the expected sample file.
Check this out, it can be useful: pypdf