An error while writing a matrix into a raster file with rasterio - Python

The original datasets are two .tiff image files with the same coordinates, height, width, and transform information (call them data1 and data2). I need to perform simple math on them, but both contain null values, so I first read them with masked=True:
new1 = data1.read(1,masked=True)
new2 = data2.read(1,masked=True)
Then do the math:
target = new1 - new2
Once I have target, I try the following:
target.width
target.height
target.transform
target.crs
They all return the same error:
'MaskedArray' object has no attribute 'xxx' (where xxx stands for each of the attributes above: width, height, etc.)
It seems target loses all the georeferencing information after the math. I need to write this result into a new raster file; what should I do?

The read method of a dataset returns a numpy array (a masked array in your case).
The numpy array is not a rasterio dataset, so it doesn't have those attributes.
To write it to disk you need to create a new profile (or copy the source dataset's) and use rasterio.open to create a new raster file:
profile = data1.profile
band_number = 1
# update the dtype to match the result array
profile.update(dtype=target.dtype)
with rasterio.open('raster.tif', 'w', **profile) as dst:
    dst.write(target, band_number)
See the docs for a more detailed example.
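One optional refinement, offered as a sketch rather than part of the answer above: since target is a masked array, you may want to choose an explicit nodata value, record it in the profile, and fill the masked cells before writing. The nodata value below is an assumption; pick one outside your data range.
import rasterio

nodata = -9999.0  # assumed nodata value, not from the original post
profile = data1.profile
profile.update(dtype=rasterio.float32, nodata=nodata)
filled = target.filled(nodata).astype(rasterio.float32)  # replace masked cells
with rasterio.open('raster.tif', 'w', **profile) as dst:
    dst.write(filled, 1)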

Related

How to change the origin point of a GeoTIFF in Python

I have a raster image from a hydraulic model. I want to change some values and create a new raster to open in QGIS. I have the following script:
H_export is the name of my raster
# Open the raster read-only and save it to a variable.
Raster_H_Dataset = gdal.Open(paths["raster"] + H_export, gdal.GA_ReadOnly)
# Copy the geotransform to a variable.
GT_input = Raster_H_Dataset.GetGeoTransform()
# Read the band of the raster using GetRasterBand.
Raster_H_Band = Raster_H_Dataset.GetRasterBand(1)
# Turn the band into an array to work with it.
Raster_H_Array = Raster_H_Band.ReadAsArray()
# Read the size of the array.
Raster_H_size1, Raster_H_size2 = Raster_H_Array.shape
# Create the output to be saved with the same dimensions.
Raster_H_Output = np.zeros(shape=(Raster_H_size1, Raster_H_size2))
# Do the processing you need and save it to the output variable. As an
# example, I copy the raster values into the new array and set some
# columns to the value 10 to see the difference.
for i in range(0, Raster_H_size1):
    for j in range(0, Raster_H_size2):
        Raster_H_Output[i, j] = Raster_H_Array[i, j]
Raster_H_Output[:, 6:500] = 10
# Create a variable with the projection of the raster:
dst_crs = 'EPSG:2154'
# Cast the output to float32.
Raster_H_Output = np.float32(Raster_H_Output)
# Finally, write the output variable to the file.
with rasterio.open(
        paths["raster"] + 'output_map.tif',
        'w',
        driver='GTiff',
        height=Raster_H_Output.shape[0],
        width=Raster_H_Output.shape[1],
        count=1,
        dtype=np.float32,
        crs=dst_crs,
        Transform=GT_input) as dest_file:
    dest_file.write(Raster_H_Output, 1)
    dest_file.close()
I don't want to modify the source raster, just create a new one with some values changed.
Everything works, no bugs, except that the output raster isn't georeferenced...
Is there anything I should do differently?
Thanks in advance.
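A hedged note on the likely cause: rasterio.open expects a lowercase transform keyword holding an affine.Affine object, so the capitalized Transform=GT_input above is treated as an unrecognized extra option and the geotransform never reaches the output. A minimal sketch of the corrected call, assuming GT_input is the 6-element GDAL geotransform read earlier:
from affine import Affine

# Convert the GDAL geotransform tuple into the Affine object rasterio expects.
transform = Affine.from_gdal(*GT_input)
with rasterio.open(
        paths["raster"] + 'output_map.tif',
        'w',
        driver='GTiff',
        height=Raster_H_Output.shape[0],
        width=Raster_H_Output.shape[1],
        count=1,
        dtype=np.float32,
        crs=dst_crs,
        transform=transform) as dest_file:  # note the lowercase keyword
    dest_file.write(Raster_H_Output, 1)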

Comparing two color histograms, one of them loaded from a txt file

I'm trying to save color histograms of images in a txt file so I can load them in other scripts.
Here is how I save them using numpy:
imgx = cv2.imread('pruebas/pop.png')
imgx = cv2.cvtColor(imgx,cv2.COLOR_BGR2HSV)
histr_hx = cv2.calcHist([imgx],[0],None,[180],[0,179])
np.savetxt('h_hist.txt', histr_hx)
Here is how I load it with numpy too:
txtHist = np.loadtxt('h_hist.txt', ndmin=2)
I think it's all OK, because when I print histr_hx and txtHist, both have the same structure, appearance, and type.
The problem is when I try to compare this loaded histogram with a new one using cv2.compareHist(); then the following error appears:
> zh = cv2.compareHist(txtHist,new_hist,cv2.HISTCMP_CORREL)
cv2.error: OpenCV(3.4.2) C:\Miniconda3\conda-bld\opencv-suite_1534379934306\work\modules\imgproc\src\histogram.cpp:1935: error: (-215:Assertion failed) H1.type() == H2.type() && H1.depth() == 5 in function
'cv::compareHist'
Both histograms have the same structure, number of bins, etc. I don't understand what the problem is.
It seems that your loaded array is of type float64 whereas the calculated array is float32, which cv2.compareHist() does not like.
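You can confirm this quickly with a one-line check:
print(histr_hx.dtype, txtHist.dtype)  # expect float32 vs. float64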
Is it important that you save the histograms as text files? You can also use np.save and np.load to save the arrays as numpy binary files, which preserve the dtype of the data. With this, the code would look like:
imgx = cv2.imread('pruebas/pop.png')
imgx = cv2.cvtColor(imgx,cv2.COLOR_BGR2HSV)
histr_hx = cv2.calcHist([imgx],[0],None,[180],[0,179])
np.save('h_hist.npy', histr_hx)
txtHist = np.load('h_hist.npy')
cv2.compareHist(histr_hx, txtHist, cv2.HISTCMP_CORREL)
If the text representation is important, you can convert the loaded data to the datatype of histr_hx:
cv2.compareHist(histr_hx, txtHist.astype(histr_hx.dtype), cv2.HISTCMP_CORREL)
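Another small option, assuming you keep the text format: ask np.loadtxt for float32 directly, so no cast is needed afterwards:
txtHist = np.loadtxt('h_hist.txt', dtype=np.float32, ndmin=2)
cv2.compareHist(histr_hx, txtHist, cv2.HISTCMP_CORREL)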

Converting VTK UnstructuredGrid to StructuredGrid

The background to my problem is that I have a 3D structure saved in a .vtk file that I need to manipulate (dilate, erode, etc.). The following code snippets are designed to be run sequentially, i.e. if you run them one after the other, there should be no problems (apart from those I mention!).
I'm very new to VTK, so apologies for any very basic mistakes!
Problem
My problem stems from a problem with SimpleITK, wherein it is unable to read UnstructuredGrid or PolyData:
In [1]: import SimpleITK as sitk
In [2]: img_vtk = sitk.ReadImage(file_vtk)
Traceback (most recent call last):
File "<ipython-input-52-435ce999db50>", line 1, in <module>
img_vtk = sitk.ReadImage(file_vtk)
File "/usr/local/lib/python3.5/dist-packages/SimpleITK/SimpleITK.py", line 8614, in ReadImage
return _SimpleITK.ReadImage(*args)
RuntimeError: Exception thrown in SimpleITK ReadImage: /tmp/SimpleITK/Code/IO/src/sitkImageReaderBase.cxx:97:
sitk::ERROR: Unable to determine ImageIO reader for "/data/ROMPA_MRIandSeg/09S/Analysis/1_model/clip_dilate.vtk"
SimpleITK can, however, read StructuredGrid, so I tried to solve this by reading using VTK and converting.
import vtk
reader = vtk.vtkGenericDataObjectReader() # Using generic to allow it to match either Unstructured or PolyData
reader.SetFileName(file_vtk)
reader.Update()
output = reader.GetOutput()
However, from that point on, every method I've tried seems to have failed.
Proposed Solutions
Conversion to numpy, then conversion to sitk image
I attempted to convert the output to a numpy array, then interpolate a regular grid over the points, with a dummy variable of 1 marking values on the structure.
from vtk.util import numpy_support
import scipy.interpolate
import numpy as np
import math
nparray = numpy_support.vtk_to_numpy(output.GetPointData().GetArray(0))
output_bounds = output.GetBounds()
x_grid = range(math.floor(output_bounds[0]),math.ceil(output_bounds[1]),1)
y_grid = range(math.floor(output_bounds[2]),math.ceil(output_bounds[3]),1)
z_grid = range(math.floor(output_bounds[4]),math.ceil(output_bounds[5]),1)
grid = list()
for x in x_grid:
    for y in y_grid:
        for z in z_grid:
            grid.append((x, y, z))
dummy = np.array([1 for i in range(nparray.shape[0])])
npgrid = scipy.interpolate.griddata(nparray, dummy, grid, fill_value=0)
npgrid = npgrid.reshape(len(x_grid), len(y_grid), len(z_grid))
img = sitk.GetImageFromArray(npgrid)
sitk.WriteImage(img,file_out)
However, when I load this in ParaView, a bounding box is displayed for the output, but a contour of the output is empty.
Using ShepardMethod
I attempted to interpolate using the built-in ShepardMethod, after converting the UnstructuredGrid to PolyData (as I'd mostly seen ShepardMethod being applied to PolyData):
bounds = output.GetBounds()
spacings = [1.0,1.0,1.0] # arbitrary spacing
dimensions = [0,0,0]
for i, spacing in enumerate(spacings):
    dimensions[i] = int(math.ceil((bounds[i*2 + 1] - bounds[i*2]) / spacing))
vtkPoints = vtk.vtkPoints()
for i in range(0, nparray.shape[0]):
    x = nparray[i, 0]
    y = nparray[i, 1]
    z = nparray[i, 2]
    p = [x, y, z]
    vtkPoints.InsertNextPoint(p)
poly = vtk.vtkPolyData()
poly.SetPoints(vtkPoints)
shepard = vtk.vtkShepardMethod()
shepard.SetInputData(poly)
shepard.SetSampleDimensions(dimensions)
shepard.SetModelBounds(output.GetBounds())
shepard.Update()
shepard_data = shepard.GetOutput().GetPointData().GetArray(0)
shepard_numpy = numpy_support.vtk_to_numpy(shepard_data)
shepard_numpy = shepard_numpy.reshape(dimensions[0],dimensions[1],dimensions[2])
shepard_img = sitk.GetImageFromArray(shepard_numpy)
sitk.WriteImage(shepard_img,file_out)
As with the numpy effort above, this provided a bounding box in ParaView. Applying a contour provided a structure of two triangles, i.e. next to nothing seems to have been successfully written. Alternatively, I attempted to write the output directly using VTK.
shepard_data = shepard.GetOutput()
shepard_grid = vtk.vtkImageToStructuredGrid()
shepard_grid.SetInputData(shepard_data)
shepard_grid.Update()
writer = vtk.vtkStructuredGridWriter()
writer.SetFileName(file_out)
writer.SetInputData(shepard_grid.GetOutput())
writer.Write()
This produced the same output as before.
Using ProbeFilter
I tried the above using ProbeFilter instead (with both conversion to numpy and writing directly). Unfortunately, the output was the same as above.
mesh = vtk.vtkStructuredGrid()
mesh.SetDimensions(dimensions)
probe = vtk.vtkProbeFilter()
probe.SetInputData(mesh)
probe.SetSourceData(output)
probe.Update()
probe_out = probe.GetOutput()
writer = vtk.vtkStructuredGridWriter()
writer.SetFileName(file_out)
writer.SetInputData(probe.GetOutput())
writer.Write()
probe_data = probe.GetOutput().GetPointData().GetArray(0)
probe_numpy = numpy_support.vtk_to_numpy(probe_data)
probe_numpy = probe_numpy.reshape(dimensions[0],dimensions[1],dimensions[2])
probe_img = sitk.GetImageFromArray(probe_numpy)
sitk.WriteImage(probe_img,file_out)
However, this seemed to produce no viable output (vtkStructuredGridWriter produced an empty file, and probe_numpy was empty).
Changing ParaView output
My original data comes from a StructuredGrid .vtk file that I open in ParaView and then clip to remove structures that aren't required in the mesh. Saving the output produces an UnstructuredGrid, and I have been unable to figure out whether I can change that and avoid this mess in the first place!
Just use "Resample With Dataset" filter in ParaView.
Open ParaView
Open a StructuredGrid file file with the geometry you want it to have
Open your UnstructuredGrid file
Add a "Resample with dataset" filter
Select structured data as source input
Apply
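If you prefer to script it, VTK exposes the same filter as vtkResampleWithDataSet (available in VTK 7.1 and later). A minimal sketch, with placeholder file names standing in for your data:
import vtk

# Grid that defines the output geometry (placeholder file name).
structured_reader = vtk.vtkStructuredGridReader()
structured_reader.SetFileName('template_structured.vtk')
structured_reader.Update()

# The clipped unstructured data to sample (placeholder file name).
unstructured_reader = vtk.vtkUnstructuredGridReader()
unstructured_reader.SetFileName('clip_dilate.vtk')
unstructured_reader.Update()

resample = vtk.vtkResampleWithDataSet()
resample.SetInputData(structured_reader.GetOutput())     # output geometry
resample.SetSourceData(unstructured_reader.GetOutput())  # data being sampled
resample.Update()

writer = vtk.vtkStructuredGridWriter()
writer.SetFileName('resampled.vtk')
writer.SetInputData(resample.GetOutput())
writer.Write()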

Load sequence of PNGs into vtkImageData for 3D volume render using python

I have a sequence of about 100 PNG files containing 512x512 pre-segmented CAT scan data. I want to use VTK in Python to create a 3D model with the marching cubes algorithm. The part I don't know how to do is load the sequence of PNG files and convert them into a single VTK image data object suitable for passing to the vtkDiscreteMarchingCubes algorithm.
I also think I need to convert the pixel values of the PNG data, because right now the data is in the alpha channel, so it needs to be converted into scalar data with values of zero and one.
Use vtkPNGReader to load the individual slices, then populate a vtkImageData whose dimensions you define, filling it z-slice by z-slice from the reader's output.
Rough pseudocode - not checked for bugs :)
import glob

import numpy as np
import vtk
from vtk.util import numpy_support

pngfiles = glob.glob('*.png')
png_reader = vtk.vtkPNGReader()
png_reader.SetFileName(pngfiles[0])
png_reader.Update()
x, y, _ = png_reader.GetOutput().GetDimensions()
data_3D = np.zeros([x, y, len(pngfiles)])
for i, p in enumerate(pngfiles):
    png_reader.SetFileName(p)
    png_reader.Update()
    img_data = png_reader.GetOutput()
    # assumes single-channel PNGs; multi-channel data needs component selection
    data_3D[:, :, i] = numpy_support.vtk_to_numpy(
        img_data.GetPointData().GetScalars()).reshape(x, y)
# Save your 3D numpy array out, or hand it back to VTK.
data_3Dvtk = numpy_support.numpy_to_vtk(data_3D.ravel(), deep=True)
Just in case anyone stumbles upon this looking for a way to do it using only VTK: you can use the vtkImageAppend class.
def ReadImages(files):
    reader = vtk.vtkPNGReader()
    image3D = vtk.vtkImageAppend()
    image3D.SetAppendAxis(2)
    for f in files:
        reader.SetFileName(f)
        reader.Update()
        t_img = vtk.vtkImageData()
        t_img.DeepCopy(reader.GetOutput())
        image3D.AddInputData(t_img)
    image3D.Update()
    return image3D.GetOutput()
For converting the data, you can look at what t_img.GetPointData().GetArray('PNGImage') returns and check whether it holds the expected values.
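Building on that, a hedged sketch of converting the appended volume to 0/1 scalars before marching cubes; 'PNGImage' is the array name vtkPNGReader typically assigns, so verify it on your data first:
import glob

import numpy as np
from vtk.util import numpy_support

img3d = ReadImages(sorted(glob.glob('*.png')))
arr = numpy_support.vtk_to_numpy(img3d.GetPointData().GetArray('PNGImage'))
if arr.ndim > 1:       # RGBA slices: keep the alpha channel mentioned above
    arr = arr[:, -1]
binary = (arr > 0).astype(np.uint8)  # 0 outside, 1 inside the segmentation
img3d.GetPointData().SetScalars(numpy_support.numpy_to_vtk(binary, deep=True))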

Attach a queue to a numpy array in TensorFlow for data fetching instead of files?

I have read the CNN tutorial on TensorFlow and I am trying to use the same model for my project.
The problem is now with data reading. I have around 25,000 images for training and around 5,000 each for testing and validation. The files are in PNG format, and I can read them and convert them into numpy.ndarray.
The CNN example in the tutorials uses a queue to fetch the records from the provided file list. I tried to create my own binary file by reshaping my images into a 1-D array and attaching a label value at the front. So my data looks like this:
[[1,12,34,24,53,...,105,234,102],
[12,112,43,24,52,...,115,244,98],
....
]
A single row of the above array has length 22501, where the first element is the label.
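For concreteness, a sketch of how such a row could be assembled, assuming single-channel 150x150 images so the pixel part is 22500 bytes (that image size is an assumption made to match the 22501-element rows above):
import numpy as np

label = np.array([1], dtype=np.uint8)        # hypothetical label value
pixels = image.reshape(-1).astype(np.uint8)  # image: one 2-D ndarray
record = np.concatenate([label, pixels])     # length 22501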
I dumped the array using pickle and then tried to read from the file with tf.FixedLengthRecordReader, as demonstrated in the example.
I am doing the same things as in cifar10_input.py to read the binary file and put the records into a record object.
Now when I read from the files, the labels and the image values are different. I can understand the reason: pickle also dumps the extra information of braces and brackets into the binary file, and they change the fixed-length record size.
The above example passes the file names to a queue to fetch the files, and then reads a single record from the queue.
I want to know whether I can pass the numpy array defined above, instead of the file names, to some reader that fetches records one by one from that array instead of from files.
Probably the easiest way to make your data work with the CNN example code is to make a modified version of read_cifar10() and use it instead:
Write out a binary file containing the contents of your numpy array.
import numpy as np

images_and_labels_array = np.array([[...], ...],  # [[1,12,34,24,53,...,102],
                                                  #  [12,112,43,24,52,...,98],
                                                  #  ...]
                                   dtype=np.uint8)
images_and_labels_array.tofile("/tmp/images.bin")
This file is similar to the format used in CIFAR10 datafiles. You might want to generate multiple files in order to get read parallelism. Note that ndarray.tofile() writes binary data in row-major order with no other metadata; pickling the array will add Python-specific metadata that TensorFlow's parsing routines do not understand.
Write a modified version of read_cifar10() that handles your record format.
def read_my_data(filename_queue):
    class ImageRecord(object):
        pass
    result = ImageRecord()

    # Dimensions of the images in the dataset.
    label_bytes = 1
    # Set the following constants as appropriate.
    result.height = IMAGE_HEIGHT
    result.width = IMAGE_WIDTH
    result.depth = IMAGE_DEPTH
    image_bytes = result.height * result.width * result.depth
    # Every record consists of a label followed by the image, with a
    # fixed number of bytes for each.
    record_bytes = label_bytes + image_bytes
    assert record_bytes == 22501  # Based on your question.

    # Read a record, getting filenames from the filename_queue. No
    # header or footer in the binary, so we leave header_bytes
    # and footer_bytes at their default of 0.
    reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
    result.key, value = reader.read(filename_queue)

    # Convert from a string to a vector of uint8 that is record_bytes long.
    record_bytes = tf.decode_raw(value, tf.uint8)

    # The first bytes represent the label, which we convert from uint8->int32.
    result.label = tf.cast(
        tf.slice(record_bytes, [0], [label_bytes]), tf.int32)

    # The remaining bytes after the label represent the image, which we
    # reshape from [depth * height * width] to [depth, height, width].
    depth_major = tf.reshape(tf.slice(record_bytes, [label_bytes], [image_bytes]),
                             [result.depth, result.height, result.width])
    # Convert from [depth, height, width] to [height, width, depth].
    result.uint8image = tf.transpose(depth_major, [1, 2, 0])
    return result
Modify distorted_inputs() to use your new dataset:
def distorted_inputs(data_dir, batch_size):
    """[...]"""
    filenames = ["/tmp/images.bin"]  # Or a list of filenames if you
                                     # generated multiple files in step 1.
    for f in filenames:
        if not gfile.Exists(f):
            raise ValueError('Failed to find file: ' + f)

    # Create a queue that produces the filenames to read.
    filename_queue = tf.train.string_input_producer(filenames)

    # Read examples from files in the filename queue.
    read_input = read_my_data(filename_queue)
    reshaped_image = tf.cast(read_input.uint8image, tf.float32)

    # [...] (Maybe modify other parameters in here depending on your problem.)
This is intended to be a minimal set of steps, given your starting point. It may be more efficient to do the PNG decoding using TensorFlow ops, but that would be a larger change.
In your question, you specifically asked:
> I want to know if I can pass the numpy array as defined above instead of the filenames to some reader and it can fetch records one by one from that array instead of the files.
You can feed the numpy array to a queue directly, but it will be a more invasive change to the cifar10_input.py code than my other answer suggests.
As before, let's assume you have the following array from your question:
import numpy as np

images_and_labels_array = np.array([[...], ...],  # [[1,12,34,24,53,...,102],
                                                  #  [12,112,43,24,52,...,98],
                                                  #  ...]
                                   dtype=np.uint8)
You can then define a queue that contains the entire data as follows (note that tf.FIFOQueue takes a capacity as its first argument; the value below is an assumption, chosen to be large enough to hold every example):
q = tf.FIFOQueue(capacity=30000, dtypes=[tf.uint8, tf.uint8],
                 shapes=[[], [22500]])
enqueue_op = q.enqueue_many([images_and_labels_array[:, 0],
                             images_and_labels_array[:, 1:]])
...then call sess.run(enqueue_op) to populate the queue.
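To consume the queue, a dequeue along these lines should slot into the pipeline (a sketch under the same assumptions):
# Dequeue one (label, image) pair; these tensors can stand in for
# read_input.label and read_input.uint8image in the example code.
label, image = q.dequeue()
reshaped_image = tf.cast(image, tf.float32)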
Another, more efficient, approach would be to feed records to the queue, which you could do from a parallel thread (see this answer for more details on how this would work):
# [With q as defined above.]
label_input = tf.placeholder(tf.uint8, shape=[])
image_input = tf.placeholder(tf.uint8, shape=[22500])
enqueue_single_from_feed_op = q.enqueue([label_input, image_input])

# Then, to enqueue a single example `i` from the array:
sess.run(enqueue_single_from_feed_op,
         feed_dict={label_input: images_and_labels_array[i, 0],
                    image_input: images_and_labels_array[i, 1:]})
Alternatively, to enqueue a batch at a time, which will be more efficient:
label_batch_input = tf.placeholder(tf.uint8, shape=[None])
image_batch_input = tf.placeholder(tf.uint8, shape=[None, 22500])
enqueue_batch_from_feed_op = q.enqueue_many([label_batch_input, image_batch_input])

# Then, to enqueue a batch of examples `i` through `j-1` from the array:
sess.run(enqueue_batch_from_feed_op,
         feed_dict={label_batch_input: images_and_labels_array[i:j, 0],
                    image_batch_input: images_and_labels_array[i:j, 1:]})
> I want to know if I can pass the numpy array as defined above instead of the filenames to some reader and it can fetch records one by one from that array instead of the files.
tf.py_func, which wraps a Python function and uses it as a TensorFlow operator, might help. Here's an example.
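A rough sketch of the idea (an illustration of tf.py_func under the question's assumptions, not the linked example):
import numpy as np
import tensorflow as tf

# images_and_labels_array is the uint8 array from the question.
index_input = tf.placeholder(tf.int32, shape=[])

def fetch_record(i):
    row = images_and_labels_array[i]
    return row[0].astype(np.int32), row[1:]  # (label, flat image bytes)

label, image = tf.py_func(fetch_record, [index_input], [tf.int32, tf.uint8])
# Each sess.run([label, image], feed_dict={index_input: i}) fetches record i.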
However, since you've mentioned that your images are stored in PNG files, I think the simplest solution would be to replace this:
reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
result.key, value = reader.read(filename_queue)
with this:
result.key, value = tf.WholeFileReader().read(filename_queue)
value = tf.image.decode_png(value)
