Reading pre-processed cr2 RAW image data in python

Reading pre-processed cr2 RAW image data in python - python

I am trying to read raw image data from a cr2 (canon raw image file). I want to read the data only (no header, etc.) pre-processed if possible (i.e pre-bayer/the most native unprocessed data) and store it in a numpy array. I have tried a bunch of libraries such as opencv, rawkit, rawpy but nothing seems to work correctly.
Any suggestion on how I should do this? What I should use? I have tried a bunch of things.
Thank you

Since libraw/dcraw can read cr2, it should be easy to do. With rawpy:
#!/usr/bin/env python
import rawpy
raw = rawpy.imread("/some/path.cr2")
bayer = raw.raw_image # with border
bayer_visible = raw.raw_image_visible # just visible area
Both bayer and bayer_visible are then a 2D numpy array.

You can use rawkit to get this data, however, you won't be able to use the actual rawkit module (which provides higher level APIs for dealing with Raw images). Instead, you'll want to use mostly the libraw module which allows you to access the underlying LibRaw APIs.
It's hard to tell exactly what you want from this question, but I'm going to assume the following: Raw bayer data, including the "masked" border pixels (which aren't displayed, but are used to calculate various things about the image). Something like the following (completely untested) script will allow you to get what you want:
#!/usr/bin/env python
import ctypes
from rawkit.raw import Raw
with Raw(filename="some_file.CR2") as raw:
raw.unpack()
# For more information, see the LibRaw docs:
# http://www.libraw.org/docs/API-datastruct-eng.html#libraw_rawdata_t
rawdata = raw.data.contents.rawdata
data_size = rawdata.sizes.raw_height * rawdata.sizes.raw_width
data_pointer = ctypes.cast(
rawdata.raw_image,
ctypes.POINTER(ctypes.c_ushort * data_size)
)
data = data_pointer.contents
# Grab the first few pixels for demonstration purposes...
for i in range(5):
print('Pixel {}: {}'.format(i, data[i]))
There's a good chance that I'm misunderstanding something and the size is off, in which case this will segfault eventually, but this isn't something I've tried to make LibRaw do before.
More information can be found in this question on the LibRaw forums, or in the LibRaw struct docs.
Storing in a numpy array I leave as an excersize for the user, or for a follow up answer (I have no experience with numpy).

Related

How do I create a Mat matrix using OpenCV in Python?

Obviously this line is wrong.
matOut = cv2::Mat::Mat(height, width, cv2.CV_8UC4)
this one too.
Mat matOut(height, width, cv2.CV_8UC4)
How do I create an modern openCV empty matrix with a given size, shape, format? The latest openCV docs on topic don't seem to be helping... I gleaned the formats used above directly from that document. Note: I'm assuming OpenCV (import cv2) Python3, import numpy, etc...
I was looking to create an empty matrix with the intent of copying content from a different buffer into it...
edit, more failed attempts...
matOut = cv2.numpy.Mat(height, width, cv2.CV_8UC4)
matOut = numpy.array(height, width, cv2.CV_8UC4)

So user696969 called out a largely successful solution to the question asked. You create a new shaped area via:
matOut = numpy.zeros([height, width, 4], dtype=numpy.uint8)
note I have replaced the desired content, cv2.CV_8UC4, with its expected response, the number 4. There are 4 eight bit bytes in a simple RGBA pixel descriptor. I would have preferred if OpenCV tooling had performed that response as a function call, but that didn't seem to work...
I do want to share my use case. I originally was going to create an empty shaped matrix so I could transfer data from a single dimensional array there. As I worked this problem I realized there was a better way. I start the routine where I have received a file containing 8 bit RGBA data without any prefix metadata. Think raw BMP without any header info.
matContent = numpy.frombuffer(fileContent, numpy.uint8)
matContentReshaped = matContent.reshape(height, width, 4)
cv2.imshow("Display Window", matContentReshaped)
k = cv2.waitKey(0)
Thats it. Easy, Peasy.... Thanks to user696969 and eldesgraciado for help here.

Is there any way to use arithmetic ops on FITS files in Python?

I'm fairly new to Python, and I have been trying to recreate a working IDL program to Python, but I'm stuck and keep getting errors. I haven't been able to find a solution yet.
The program requires 4 FITS files in total (img and correctional images dark, flat1, flat2). The operations are as follows:
flat12 = (flat1 + flat2)/2
img1 = (img - dark)/flat12
The said files have dimensions (1024,1024,1). I have resized them to (1024,1024) to be able to even use im_show() function.
I have also tried using cv2.add(), but I get this:
TypeError: Expected Ptr for argument 'src1'
Is there any workaround for this? Thanks in advance.

To read your FITS files use astropy.io.fits: http://docs.astropy.org/en/latest/io/fits/index.html
This will give you Numpy arrays (and FITS headers if needed, there are different ways to do this, as explained in the documentation), so you could do something like:
>>> from astropy.io import fits
>>> img = fits.getdata('image.fits', ext=0) # extension number depends on your FITS files
>>> dark = fits.getdata('dark.fits') # by default it reads the first "data" extension
>>> darksub = img - dark
>>> fits.writeto('out.fits', darksub) # save output
If your data has an extra dimension, as shown with the (1024,1024,1) shape, and if you want to remove that axis, you can use the normal Numpy array slicing syntax: darksub = img[0] - dark[0].
Otherwise in the example above it will produce and save a (1024,1024,1) image.

Is there any way to read one image row/column into an array in Python?

I've just translated some CT reconstruction software from IDL into Python, and this is my first experience ever with Python. The code works fine except that it's much, much slower. This is due, in part, to the fact that IDL allows me to save memory and time by reading in just one row of an image at a time, using the following:
image = read_tiff(filename, sub_rect = [0, slice, x, 1])
I need to read one row each from 1800 different projection images, but as far as I can tell I can only create an image array by reading in the entire image and then converting it to an array. Is there any trick to just read in one row from the start, since I don't need the other 2047 rows?

It looks like tifffile.py by Christoph Gohlke (http://www.lfd.uci.edu/~gohlke/code/tifffile.py.html) can do the job.
from tiffile import TiffFile
with TiffFile('test.tiff') as tif:
for page in tif:
image = page.asarray(memmap=True)
print image[0,:,:]
If I interpret the code correctly this will extract the first row of every page in the file without loading the whole file into memory (through numpy.memmap).

Best dtype for creating large arrays with numpy

I am looking to store pixel values from satellite imagery into an array. I've been using
np.empty((image_width, image_length)
and it worked for smaller subsets of an image, but when using it on the entire image (3858 x 3743) the code terminates very quickly and all I get is an array of zeros.
I load the image values into the array using a loop and opening the image with gdal
img = gdal.Open(os.path.join(fn + "\{0}".format(fname))).ReadAsArray()
but when I include print img_array I end up with just zeros.
I have tried almost every single dtype that I could find in the numpy documentation but keep getting the same result.
Is numpy unable to load this many values or is there a way to optimize the array?
I am working with 8-bit tiff images that contain NDVI (decimal) values.
Thanks

Not certain what type of images you are trying to read, but in the case of radarsat-2 images you can the following:
dataset = gdal.Open("RADARSAT_2_CALIB:SIGMA0:" + inpath + "product.xml")
S_HH = dataset.GetRasterBand(1).ReadAsArray()
S_VV = dataset.GetRasterBand(2).ReadAsArray()
# gets the intensity (Intensity = re**2+imag**2), and amplitude = sqrt(Intensity)
self.image_HH_I = numpy.real(S_HH)**2+numpy.imag(S_HH)**2
self.image_VV_I = numpy.real(S_VV)**2+numpy.imag(S_VV)**2
But that is specifically for that type of images (in this case each image contains several bands, so i need to read in each band separately with GetRasterBand(i), and than do ReadAsArray() If there is a specific GDAL driver for the type of images you want to read in, life gets very easy
If you give some more info on the type of images you want to read in, i can maybe help more specifically
Edit: did you try something like this ? (not sure if that will work on tiff, or how many bits the header is, hence the something:)
A=open(filename,"r")
B=numpy.fromfile(A,dtype='uint8')[something:].reshape(3858,3743)
C=B*1.0
A.close()
Edit: The problem is solved when using 64bit python instead of 32bit, due to memory errors at 2Gb when using the 32bit python version.

What is the best way to save image metadata alongside a tif?

In my work as a grad student, I capture microscope images and use python to save them as raw tif's. I would like to add metadata such as the name of the microscope I am using, the magnification level, and the imaging laser wavelength. These details are all important for how I post-process the images.
I should be able to do this with a tif, right? Since it has a header?
I was able to add to the info in a PIL image:
im.info['microscope'] = 'george'
but when I save and load that image, the info I added is gone.
I'm open to all suggestions. If I have too, I'll just save a separate .txt file with the metadata, but it would be really nice to have it embedded in the image.

Tifffile is one option for saving microscopy images with lots of metadata in python.
It doesn't have a lot of external documentation, but the docstings are great so you can get a lot of info just by typing help(tifffile) in python, or go look at the source code.
You can look at the TiffWriter.save function in the source code (line 750) for the different keyword arguments you can use to write metadata.
One is to use description, which accepts a string. It will show up as the tag "ImageDescription" when you read your image.
Another is to use the extratags argument, which accepts a list of tuples. That allows you to write any tag name that exist in TIFF.TAGS(). One of the easiest way is to write them as strings because then you don't have to specify length.
You can also write ImageJ metadata with ijmetadata, for which the acceptable types are listed in the source code here.
As an example, if you write the following:
import json
import numpy as np
import tifffile
im = np.random.randint(0, 255, size=(150, 100), dtype=np.uint8)
# Description
description = "This is my description"
# Extratags
metadata_tag = json.dumps({"ChannelIndex": 1, "Slice": 5})
extra_tags = [("MicroManagerMetadata", 's', 0, metadata_tag, True),
("ProcessingSoftware", 's', 0, "my_spaghetti_code", True)]
# ImageJ metadata. 'Info' tag needs to be a string
ijinfo = {"InitialPositionList": [{"Label": "Pos1"}, {"Label": "Pos3"}]}
ijmetadata = {"Info": json.dumps(ijinfo)}
# Write file
tifffile.imsave(
save_name,
im,
ijmetadata=ijmetadata,
description=description,
extratags=extra_tags,
)
You can see the following tags when you read the image:
frames = tifffile.TiffFile(save_name)
page = frames.pages[0]
print(page.tags["ImageDescription"].value)
Out: 'this is my description'
print(page.tags["MicroManagerMetadata"].value)
Out: {'ChannelIndex': 1, 'Slice': 5}
print(page.tags["ProcessingSoftware"].value)
Out: my_spaghetti_code

For internal use, try saving the metadata as JSON in the TIFF ImageDescription tag, e.g.
from __future__ import print_function, unicode_literals
import json
import numpy
import tifffile # http://www.lfd.uci.edu/~gohlke/code/tifffile.py.html
data = numpy.arange(256).reshape((16, 16)).astype('u1')
metadata = dict(microscope='george', shape=data.shape, dtype=data.dtype.str)
print(data.shape, data.dtype, metadata['microscope'])
metadata = json.dumps(metadata)
tifffile.imsave('microscope.tif', data, description=metadata)
with tifffile.TiffFile('microscope.tif') as tif:
data = tif.asarray()
metadata = tif[0].image_description
metadata = json.loads(metadata.decode('utf-8'))
print(data.shape, data.dtype, metadata['microscope'])
Note that JSON uses unicode strings.
To be compatible with other microscopy software, consider saving OME-TIFF files, which store defined metadata as XML in the ImageDescription tag.

I should be able to do this with a tif, right? Since it has a header?
No.
First, your premise is wrong, but that's a red herring. TIFF does have a header, but it doesn't allow you to store arbitrary metadata in it.
But TIFF is a tagged file format, a series of chunks of different types, so the header isn't important here. And you can always create your own private chunk (any ID > 32767) and store anything you want there.
The problem is, nothing but your own code will have any idea what you stored there. So, what you probably want is to store EXIF or XMP or some other standardized format for extending TIFF with metadata. But even there, EXIF or whatever you choose isn't going to have a tag for "microscope", so ultimately you're going to end up having to store something like "microscope=george\nspam=eggs\n" in some string field, and then parse it back yourself.
But the real problem is that PIL/Pillow doesn't give you an easy way to store EXIF or XMP or anything else like that.
First, Image.info isn't for arbitrary extra data. At save time, it's generally ignored.
If you look at the PIL docs for TIFF, you'll see that it reads additional data into a special attribute, Image.tag, and can save data by passing a tiffinfo keyword argument to the Image.save method. But that additional data is a mapping from TIFF tag IDs to binary hunks of data. You can get the Exif tag IDs from the undocumented PIL.ExifTags.TAGS dict (or by looking them up online yourself), but that's as much support as PIL is going to give you.
Also, note that accessing tag and using tiffinfo in the first place requires a reasonably up-to-date version of Pillow; older versions, and classic PIL, didn't support it. (Ironically, they did have partial EXIF support for JPG files, which was never finished and has been stripped out…) Also, although it doesn't seem to be documented, if you built Pillow without libtiff it seems to ignore tiffinfo.
So ultimately, what you're probably going to want to do is:
Pick a metadata format you want.
Use a different library than PIL/Pillow to read and write that metadata. (For example, you can use GExiv2 or pyexif for EXIF.)

You could try setting tags in the tag property of a TIFF image. This is an ImageFileDirectory object. See TiffImagePlugin.py.
Or, if you have libtiff installed, you can use the subprocess module to call the tiffset command to set a field in the header after you have saved the file. There are online references of available tags.
According to this page:
If one needs more than 10 private tags or so, the TIFF specification suggests that, rather then using a large amount of private tags, one should instead allocate a single private tag, define it as datatype IFD, and use it to point to a socalled 'private IFD'. In that private IFD, one can next use whatever tags one wants. These private IFD tags do not need to be properly registered with Adobe, they live in a namespace of their own, private to the particular type of IFD.
Not sure if PIL supports this, though.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.