I have 360 files with a .bin extension, which I know are raw image files (16-bit grayscale). I guess the image size is around 1518x999. I am puzzled about how to get the image data out of them. On examining them, I found that 149 bytes are repeated at the beginning of every file and 15 bytes at the end (they are marked with a white box in the pictures below).
Are this header and footer something common to NumPy arrays? (I see numpy multiarray ... among the header bytes; see the pictures below.)
Can I extract information about the image specs, such as width and height, from the header and footer?
Here are three examples of the files.
Yes. The header contains information about the type and size of the array.
Using numpy (and pillow), you can easily retrieve the image as follows.
# Using python 3.6 or higher.
# To install numpy and pillow, run: pip3 install numpy pillow
from pathlib import Path
import numpy as np
from PIL import Image
input_dir = Path("./binFiles") # Directory where *.bin files are stored.
output_dir = Path("./_out") # Directory where you want to output the image files.
output_dir.mkdir(parents=True, exist_ok=True)
for path in input_dir.rglob("*.bin"):
    buf = np.load(path, allow_pickle=True)
    image = Image.fromarray(buf)
    image.save(output_dir / (path.stem + ".png"))
Here is a sample.
(I couldn't upload it in the original PNG format, so this is a converted one.)
EDIT:
Questions
Is there any more information in the header than what was retrieved?
Is there any information in that footer?
Answer
Essentially, both answers are no.
Your files are not actually in the NumPy file format (.npy); they are NumPy arrays serialized in the pickle file format.
I was able to rebuild an exactly matching file using only the dtype, the shape, the memory order, and an array of 3,032,964 (= 999 x 1518 x 2) bytes. So while numpy or pickle may add additional metadata, those four items are the only essential information (at least for the three files you provided).
If you want to know more about that additional metadata, I don't have an answer for you; you might want to ask a refined new question, since that is about the pickle file format.
Here is the code I used for checking, in case you might want to check other files as well.
import pickle

for input_path in input_dir.rglob("*.bin"):
    # Load the original file.
    numpy_array = np.load(input_path, allow_pickle=True)

    # Convert to a byte array. 'A' means keep the order.
    bytes_array = numpy_array.tobytes('A')

    # Make sure there are no additional bytes other than the image pixels.
    assert len(bytes_array) == numpy_array.size * numpy_array.itemsize

    # Rebuild from the byte array.
    # Note that rebuilt_array is constructed using only dtype, shape, order,
    # and a byte array matching the image size.
    rebuilt_array = np.frombuffer(
        bytes_array, dtype=numpy_array.dtype
    ).reshape(
        numpy_array.shape, order='F' if np.isfortran(numpy_array) else 'C'
    )

    # Pickle the rebuilt array (mimicking the original file).
    rebuilt_path = output_dir / (input_path.stem + ".pickle")
    with rebuilt_path.open(mode='wb') as fo:
        pickle.dump(rebuilt_array, fo, protocol=4)

    # Make sure the rebuilt file matches the original file byte for byte.
    assert rebuilt_path.read_bytes() == input_path.read_bytes()
    print(f"{input_path.name} passed!")
Related
I have a problem and don't know how to solve it.
I'm learning how to analyze DICOM files with Python. I have a single exam from a single patient, which consists of 200 DICOM files, each 512x512 and each representing a different slice of the patient, and I want to turn them into a single .npy file so I can use it in another tutorial that I found online.
Many tutorials convert them to jpg or png using OpenCV first, but I don't want this, since I'm not interested in a viewable image right now; I need the array. Also, that step ruins the image quality.
I already know that using:
medical_image = pydicom.read_file(file_path)
image = medical_image.pixel_array
I can grab the path, turn one slice into a pixel array, and then use it, but the thing is, it doesn't work in a for loop.
The for loop I tried was basically this:
image = [] # to create an empty list
for f in glob.iglob('file_path'):
    img = pydicom.dcmread(f)
    image.append(img)
It results in a list of all the files. Up to this point it works, but it doesn't seem to be the right way: I can use the list, but I can't find the supposed next steps anywhere, not even answers to the errors that I get at this part (so I concluded it was wrong).
The following code snippet reads DICOM files from a folder dir_path and stores them in a list. The list does not actually contain the raw DICOM datasets; it is filled with NumPy arrays of Hounsfield units (obtained with the apply_modality_lut function).
import os
from pathlib import Path
import pydicom
from pydicom.pixel_data_handlers import apply_modality_lut
dir_path = r"path\to\dicom\files"
dicom_set = []
for root, _, filenames in os.walk(dir_path):
    for filename in filenames:
        dcm_path = Path(root, filename)
        if dcm_path.suffix == ".dcm":
            try:
                dicom = pydicom.dcmread(dcm_path, force=True)
            except IOError as e:
                print(f"Can't import {dcm_path.stem}")
            else:
                hu = apply_modality_lut(dicom.pixel_array, dicom)
                dicom_set.append(hu)
You were well on your way. You just have to build up a volume from the individual slices that you read in. This code snippet will create a pixelVolume of dimension 512x512x200 if your data is as advertised.
import glob
import pydicom
import numpy

images = []  # to create an empty list

# Read all of the DICOM images from file_path into the list "images"
for f in glob.iglob('file_path'):
    image = pydicom.dcmread(f)
    images.append(image)

# Use the first image to determine the number of rows and columns
repImage = images[0]
rows = int(repImage.Rows)
cols = int(repImage.Columns)
slices = len(images)

# This tuple represents the dimensions of the pixel volume
volumeDims = (rows, cols, slices)

# Allocate storage for the pixel volume
pixelVolume = numpy.zeros(volumeDims, dtype=repImage.pixel_array.dtype)

# Fill in the pixel volume one slice at a time
for i, image in enumerate(images):
    pixelVolume[:, :, i] = image.pixel_array

# Use pixelVolume to do something interesting
I don't know if you are a DICOM expert or a DICOM novice, but I am just accepting your claim that your 200 images make sense when interpreted as a volume. There are many ways that this may fail. The slices may not be in expected order. There may be multiple series in your study. But I am guessing you have a "nice" DICOM dataset, maybe used for tutorials, and that this code will help you take a step forward.
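If the slices do turn out to be out of order, a hedged sketch of one common fix (my own addition, not part of the answer above) is to sort the pydicom datasets by slice position, falling back to instance number, before filling the volume:

# Hypothetical sketch: sort the slices along z before stacking them.
# Assumes every dataset has ImagePositionPatient; falls back to InstanceNumber.
def sort_slices(datasets):
    try:
        return sorted(datasets, key=lambda ds: float(ds.ImagePositionPatient[2]))
    except AttributeError:
        return sorted(datasets, key=lambda ds: int(ds.InstanceNumber))

images = sort_slices(images)  # then build pixelVolume as above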
I'm fairly new to Python, and I have been trying to recreate a working IDL program in Python, but I'm stuck and keep getting errors. I haven't been able to find a solution yet.
The program requires 4 FITS files in total (img, and the correction images dark, flat1, and flat2). The operations are as follows:
flat12 = (flat1 + flat2)/2
img1 = (img - dark)/flat12
The files have dimensions (1024, 1024, 1). I have resized them to (1024, 1024) to even be able to use the im_show() function.
I have also tried using cv2.add(), but I get this:
TypeError: Expected Ptr for argument 'src1'
Is there any workaround for this? Thanks in advance.
To read your FITS files, use astropy.io.fits: http://docs.astropy.org/en/latest/io/fits/index.html
This will give you NumPy arrays (and FITS headers if needed; there are different ways to do this, as explained in the documentation), so you could do something like:
>>> from astropy.io import fits
>>> img = fits.getdata('image.fits', ext=0) # extension number depends on your FITS files
>>> dark = fits.getdata('dark.fits') # by default it reads the first "data" extension
>>> darksub = img - dark
>>> fits.writeto('out.fits', darksub) # save output
If your data has an extra dimension, as shown by the (1024, 1024, 1) shape, and you want to remove that axis, you can use normal NumPy array slicing syntax: darksub = img[0] - dark[0].
Otherwise, the example above will produce and save a (1024, 1024, 1) image.
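For completeness, here is a minimal sketch of the full calibration described in the question, assuming hypothetical file names img.fits, dark.fits, flat1.fits and flat2.fits (the extension index may differ for your files):

from astropy.io import fits

# Hypothetical file names; adjust paths and extension numbers to your data.
img = fits.getdata('img.fits', ext=0)
dark = fits.getdata('dark.fits', ext=0)
flat1 = fits.getdata('flat1.fits', ext=0)
flat2 = fits.getdata('flat2.fits', ext=0)

# The operations from the question, performed directly on the NumPy arrays.
flat12 = (flat1 + flat2) / 2
img1 = (img - dark) / flat12

fits.writeto('img1.fits', img1)  # save the calibrated image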
I want to create a PNG or TIFF image file from a very large h5py dataset that cannot be loaded into memory all at once. So I was wondering: is there a way in Python to write to a PNG or TIFF file in patches? (I can load the h5py dataset in slices into a numpy.ndarray.)
I've tried using the Pillow library and doing PIL.Image.paste with the box coordinates, but for large images it runs out of memory.
Basically, I'm wondering if there's a way to do something like:
for y in range(0, height, patch_size):
    for x in range(0, width, patch_size):
        y2 = min(y + patch_size, height)
        x2 = min(x + patch_size, width)
        # image_arr is an h5py dataset that cannot be loaded completely
        # in memory, so load it in slices
        image_file.write(image_arr[y:y2, x:x2], box=(y, x, y2, x2))
I'm looking for a way to do this without having the whole image loaded into memory. I've tried the Pillow library, but it loads/keeps all the data in memory.
Edit: This question is not about h5py, but rather about how extremely large images (that cannot be loaded into memory) can be written out to a file in patches, similar to how large text files can be built up by writing to them line by line.
Try tifffile.memmap:
from tifffile import memmap

image_file = memmap('temp.tif', shape=(height, width), dtype=image_arr.dtype,
                    bigtiff=True)

for y in range(0, height, patch_size):
    for x in range(0, width, patch_size):
        y2 = min(y + patch_size, height)
        x2 = min(x + patch_size, width)
        image_file[y:y2, x:x2] = image_arr[y:y2, x:x2]

image_file.flush()
This creates an uncompressed BigTIFF file with one strip. Memory-mapped tiles are not implemented yet. Not sure how many libraries can handle that kind of file, but you can always read directly from the strip using the metadata in the TIFF tags.
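As a quick check (my own addition, assuming the same temp.tif written above), the file can be read back with tifffile itself, either in full or again as a memory map:

from tifffile import imread, memmap

# Read the whole image into memory (only if it fits):
whole = imread('temp.tif')

# Or reopen the existing file as a memory map and read just one patch:
mm = memmap('temp.tif')
patch = mm[0:256, 0:256]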
Short answer to "is there a way in Python to write to a png or tiff file in patches?": well, yes - everything is possible in Python, given enough time and skill to implement it. On the other hand, NO, there is no ready-made solution for this - because it doesn't appear to be very useful.
I don't know much about TIFF, and a comment here says it is limited to 4 GB, so that format is likely not a good candidate. PNG has no practical limit and can be written in chunks, so it is doable in theory - on the condition that at least one scan line of your resulting image fits into memory.
If you really want to go ahead with this, here is the info that you need:
A PNG file consists of a few metadata chunks and a series of image data chunks. The latter are independent of each other, so you can construct a big image out of several smaller images (each of which contains a whole number of rows, at minimum one row) by simply concatenating their image data chunks (IDAT) together and adding the needed metadata chunks. You can pick those from the first small image, except for the IHDR chunk - that one will need to be constructed to contain the final image size.
So, here is how I'd do it, if I had to (NOTE you will need some understanding of Python's bytes type and the methods of converting byte sequences to and from Python data types to pull this off):
Find how many rows I can fit into memory and make that the height of my "small image chunk". The width is the width of the entire final image. Let's call those width and small_height.
Go through my giant data set in h5py one chunk at a time (width * small_height), convert it to PNG, and save it to disk in a temporary file, or, if your image conversion library allows it, directly to a bytes string in memory. Then process the byte data as follows and delete it at the end:
-- On the first iteration: walk through the PNG data one record at a time (see the PNG spec: http://www.libpng.org/pub/png/spec/1.2/png-1.2-pdg.html; it is in length-tag-value form, and it is very easy to write code that efficiently walks over the file record by record) and save ALL the records into my target file, except: modify IHDR to have the final image size and skip the IEND record.
-- On all subsequent iterations: scan through the PNG data, pick only the IDAT records, and write those out to the output file.
Append an IEND record to the target file.
All done - you should now have a valid humongous PNG. I wonder who or what could read that, though.
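As a rough illustration of the record-walking step only (my own sketch, not code from the original answer; it assumes the bytes of one small PNG are already in memory as png_bytes):

import struct

def iter_png_chunks(png_bytes):
    # Yield (chunk_type, record) pairs, where record is the complete
    # length + type + data + CRC byte sequence of one chunk.
    pos = 8  # skip the 8-byte PNG signature
    while pos < len(png_bytes):
        (length,) = struct.unpack(">I", png_bytes[pos:pos + 4])
        chunk_type = png_bytes[pos + 4:pos + 8]
        end = pos + 12 + length  # 4 (length) + 4 (type) + data + 4 (CRC)
        yield chunk_type, png_bytes[pos:end]
        pos = end

# Example: collect only the IDAT records of one small image.
idat_records = [rec for ctype, rec in iter_png_chunks(png_bytes) if ctype == b"IDAT"]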
I have created an array of bits by using this:
Data = []
Bytes = numpy.fromfile(filename, dtype = "uint8")
Bits = numpy.unpackbits(Bytes)
for b in Bits:
    Data.append(b)
"filename" ends with ".png".
Later on, I do some stuff with these bits. I want to save an image with another (or the same) set of bits. How do I do it? The best option would be something like: saveToPNG(Data)
You can save those bits as a PNG file by simply reversing the steps you've used.
BTW, there's no need to create the Data list: you can access the bits in the Bits array with normal Python functions & operators as well as with Numpy. But if you really do want those bits in a plain Python list then there's no need for that slow for ... append loop: just pass the array to the list constructor.
I've changed your variable names to make them conform to the PEP-8 style guide.
import numpy as np
# File names
in_name = 'square.png'
out_name = 'square_out.png'
# Read data and convert to a list of bits
in_bytes = np.fromfile(in_name, dtype = "uint8")
in_bits = np.unpackbits(in_bytes)
data = list(in_bits)
# Convert the list of bits back to bytes and save
out_bits = np.array(data)
print(np.all(out_bits == in_bits))
out_bytes = np.packbits(out_bits)
print(np.all(out_bytes == in_bytes))
out_bytes.tofile(out_name)
However, I don't know why you want to do this. If you want access to the image data in the PNG file then you need to decode it properly. A simple way to do that is to use PIL (Pillow) to load the image file into a PIL Image object; Numpy can make an array from a PIL Image. You can then use standard Numpy tools to analyze or manipulate the raw image data, and then pass it back to PIL to save it as a PNG (or various other image file formats). See the final code block in this answer for an example.
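For illustration, here is a minimal sketch of that decode/modify/save round trip (my own addition, reusing the hypothetical file names square.png and square_out.png from above):

import numpy as np
from PIL import Image

# Decode the PNG into a NumPy array of pixel values.
pixels = np.array(Image.open('square.png'))

# ... analyze or manipulate the pixel data here ...
pixels = 255 - pixels  # e.g. invert the image (illustrative only; assumes 8-bit pixels)

# Convert back to a PIL Image and save as a PNG.
Image.fromarray(pixels).save('square_out.png')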
I am looking to store pixel values from satellite imagery into an array. I've been using
np.empty((image_width, image_length))
and it worked for smaller subsets of an image, but when using it on the entire image (3858 x 3743) the code terminates very quickly and all I get is an array of zeros.
I load the image values into the array using a loop and opening the image with gdal
img = gdal.Open(os.path.join(fn + "\{0}".format(fname))).ReadAsArray()
but when I include print img_array I end up with just zeros.
I have tried almost every single dtype that I could find in the numpy documentation but keep getting the same result.
Is numpy unable to load this many values or is there a way to optimize the array?
I am working with 8-bit tiff images that contain NDVI (decimal) values.
Thanks
Not certain what type of images you are trying to read, but in the case of RADARSAT-2 images you can do the following:
dataset = gdal.Open("RADARSAT_2_CALIB:SIGMA0:" + inpath + "product.xml")
S_HH = dataset.GetRasterBand(1).ReadAsArray()
S_VV = dataset.GetRasterBand(2).ReadAsArray()
# gets the intensity (Intensity = re**2+imag**2), and amplitude = sqrt(Intensity)
self.image_HH_I = numpy.real(S_HH)**2+numpy.imag(S_HH)**2
self.image_VV_I = numpy.real(S_VV)**2+numpy.imag(S_VV)**2
But that is specific to that type of image (in this case each image contains several bands, so I need to read in each band separately with GetRasterBand(i) and then call ReadAsArray()). If there is a specific GDAL driver for the type of images you want to read in, life gets very easy.
If you give some more info on the type of images you want to read in, I can maybe help more specifically.
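In the meantime, here is a minimal generic sketch (my own assumption about your data: a single-band GeoTIFF, with the hypothetical name ndvi_image.tif) of reading a raster straight into a NumPy array with GDAL:

from osgeo import gdal

# Hypothetical file name; a guess at the scenario in the question
# (a single-band 8-bit GeoTIFF holding NDVI values).
dataset = gdal.Open("ndvi_image.tif")
band = dataset.GetRasterBand(1)
ndvi = band.ReadAsArray().astype("float64")  # cast so decimal NDVI math works

print(ndvi.shape, ndvi.dtype, ndvi.min(), ndvi.max())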
Edit: did you try something like this? (Not sure if that will work on TIFF, or how many bytes the header is, hence the something:)
A = open(filename, "rb")  # open in binary mode
B = numpy.fromfile(A, dtype='uint8')[something:].reshape(3858, 3743)
C = B * 1.0
A.close()
Edit: The problem was solved by using 64-bit Python instead of 32-bit, due to memory errors at 2 GB with the 32-bit Python version.