filling gaps on an image using numpy and scipy - python

The image (test.tif) is attached.
The np.nan values form the whitest regions.
How can I fill those whitest regions using a gap-filling algorithm that takes values from the neighbours?
import scipy.ndimage
data = scipy.ndimage.imread('test.tif')

As others have suggested, scipy.interpolate can be used. However, it requires fairly extensive index manipulation to get this to work.
Complete example:
from pylab import *
import numpy
import scipy.ndimage
import scipy.interpolate
data = scipy.ndimage.imread('data.png')
# a boolean array of shape (height, width): False where values are missing, True where values are valid (non-missing)
mask = ~( (data[:,:,0] == 255) & (data[:,:,1] == 255) & (data[:,:,2] == 255) )
# array of (number of points, 2) containing the x,y coordinates of the valid values only
xx, yy = numpy.meshgrid(numpy.arange(data.shape[1]), numpy.arange(data.shape[0]))
xym = numpy.vstack( (numpy.ravel(xx[mask]), numpy.ravel(yy[mask])) ).T
# the valid values in the first, second, third color channel, as 1D arrays (in the same order as their coordinates in xym)
data0 = numpy.ravel( data[:,:,0][mask] )
data1 = numpy.ravel( data[:,:,1][mask] )
data2 = numpy.ravel( data[:,:,2][mask] )
# three separate interpolators for the separate color channels
interp0 = scipy.interpolate.NearestNDInterpolator( xym, data0 )
interp1 = scipy.interpolate.NearestNDInterpolator( xym, data1 )
interp2 = scipy.interpolate.NearestNDInterpolator( xym, data2 )
# interpolate the whole image, one color channel at a time
result0 = interp0(numpy.ravel(xx), numpy.ravel(yy)).reshape( xx.shape )
result1 = interp1(numpy.ravel(xx), numpy.ravel(yy)).reshape( xx.shape )
result2 = interp2(numpy.ravel(xx), numpy.ravel(yy)).reshape( xx.shape )
# combine them into an output image
result = numpy.dstack( (result0, result1, result2) )
imshow(result)
show()
Output: (resulting image omitted)
This passes to the interpolator all values we have, not just the ones next to the missing values (which may be somewhat inefficient). It also interpolates every point in the output, not just the missing values (which is extremely inefficient). A better way is to interpolate just the missing values, and then patch them into the original image. This is just a quick working example to get started :)

I think viena's question is more related to an inpainting problem.
Here are some ideas:
In order to fill the gaps in B/W images you can use a hole-filling algorithm like scipy.ndimage.morphology.binary_fill_holes. But you have a gray-level image, so you can't use it.
I suppose that you don't want to use a complex inpainting algorithm. My first suggestion is: don't use the nearest gray value (you don't know the real value of the NaN pixels), as it will produce a crude result. Instead, I would suggest filling the gaps with some other value (e.g. the mean of the column). You can do this without coding by using scikit-learn:
Source:
>>> import numpy as np
>>> from sklearn.preprocessing import Imputer
>>> imp = Imputer(strategy="mean")
>>> a = np.random.random((5,5))
>>> a[(1,4,0,3),(2,4,2,0)] = np.nan
>>> a
array([[ 0.77473361,  0.62987193,         nan,  0.11367791,  0.17633671],
       [ 0.68555944,  0.54680378,         nan,  0.64186838,  0.15563309],
       [ 0.37784422,  0.59678177,  0.08103329,  0.60760487,  0.65288022],
       [        nan,  0.54097945,  0.30680838,  0.82303869,  0.22784574],
       [ 0.21223024,  0.06426663,  0.34254093,  0.22115931,         nan]])
>>> a = imp.fit_transform(a)
>>> a
array([[ 0.77473361,  0.62987193,  0.24346087,  0.11367791,  0.17633671],
       [ 0.68555944,  0.54680378,  0.24346087,  0.64186838,  0.15563309],
       [ 0.37784422,  0.59678177,  0.08103329,  0.60760487,  0.65288022],
       [ 0.51259188,  0.54097945,  0.30680838,  0.82303869,  0.22784574],
       [ 0.21223024,  0.06426663,  0.34254093,  0.22115931,  0.30317394]])
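Note that in current scikit-learn releases Imputer has been removed; the equivalent, as far as I know, is sklearn.impute.SimpleImputer, which imputes column means in the same way:
>>> from sklearn.impute import SimpleImputer
>>> imp = SimpleImputer(strategy="mean")
>>> a = imp.fit_transform(a)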
The dirty solution that uses the nearest values could be this:
1) Find the perimeter points of the NaN regions
2) Compute all the distances between the NaN points and the perimeter
3) Replace each NaN with the gray value of its nearest perimeter point (see the sketch below)
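A minimal sketch of that idea, assuming data is a 2D float array with NaN gaps. It uses SciPy's Euclidean distance transform, which can directly return, for every pixel, the indices of the nearest valid pixel, so the distances never need to be computed explicitly:
import numpy as np
from scipy import ndimage

def fill_nearest(data):
    # for each pixel, indices of the nearest non-NaN pixel...
    invalid = np.isnan(data)
    ind = ndimage.distance_transform_edt(invalid,
                                         return_distances=False,
                                         return_indices=True)
    # ...then copy that pixel's gray value
    return data[tuple(ind)]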

If you want values from the nearest neighbors, you could use the NearestNDInterpolator from scipy.interpolate. There are other interpolators you could consider as well.
You can locate the X,Y index values for the NaN values with:
import numpy as np
nan_locs = np.where(np.isnan(data))
There are some other options for the interpolation as well. One option is to replace the NaN values with the result of a median filter (but your areas are somewhat large for this). Another option might be grayscale dilation. The right interpolation depends on your end domain.
If you haven't used a SciPy ND interpolator before, you'll need to provide X, Y, and value data to fit the interpolator to, and then X and Y data for the points to interpolate at. You can do this using the where example above as a template.
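A minimal sketch of that workflow, assuming data is a 2D float array with NaNs marking the gaps (the variable names are illustrative):
import numpy as np
from scipy.interpolate import NearestNDInterpolator

valid = ~np.isnan(data)
coords = np.argwhere(valid)                    # (N, 2) row/col of valid pixels
interp = NearestNDInterpolator(coords, data[valid])
# interpolate only at the gap locations, then patch them in place
nan_rows, nan_cols = np.where(np.isnan(data))
data[nan_rows, nan_cols] = interp(np.c_[nan_rows, nan_cols])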

OpenCV has some image inpainting algorithms that you could use. You just need to provide a binary mask indicating which pixels should be inpainted.
import cv2
import numpy as np
from scipy import ndimage

data = ndimage.imread("test.tif")
# cv2.inpaint expects an 8-bit image and an 8-bit single-channel mask,
# so convert both (this assumes the data fit in a 0-255 range)
mask = np.isnan(data).astype(np.uint8)
img = np.nan_to_num(data).astype(np.uint8)
inpainted_img = cv2.inpaint(img, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)

Related

Saving masked image as FITS

I've constructed an image from some FITS files, and I want to save the resultant masked image as another FITS file. Here's my code:
import numpy as np
from astropy.io import fits
import matplotlib.pyplot as plt
#from astropy.nddata import CCDData
from ccdproc import CCDData
hdulist1 = fits.open('wise_neowise_w1-MJpersr.fits')
hdulist2 = fits.open('wise_neowise_w2-MJpersr.fits')
data1_raw = hdulist1[0].data
data2_raw = hdulist2[0].data
# Hide negative values in order to take logs
# Where {condition}==True, return data_raw, else return np.nan
data1 = np.where(data1_raw >= 0, data1_raw, np.nan)
data2 = np.where(data2_raw >= 0, data2_raw, np.nan)
# Calculation and image subtraction
w1mag = -2.5 * (np.log10(data1) - 9.0)
w2mag = -2.5 * (np.log10(data2) - 9.0)
color = w1mag - w2mag
## Find upper and lower 5th %ile of pixels
mask_percent = 5
masked_value_lower = np.nanpercentile(color, mask_percent)
masked_value_upper = np.nanpercentile(color, (100 - mask_percent))
## Mask out the upper and lower 5% of pixels
## Need to hide values outside the range [lower, upper]
color_masked = np.ma.masked_outside(color, masked_value_lower, masked_value_upper)
color_masked = np.ma.masked_invalid(color_masked)
plt.imshow(color)
plt.title('color')
plt.savefig('color.png', overwrite = True)
plt.imshow(color_masked)
plt.title('color_masked')
plt.savefig('color_masked.png', overwrite = True)
fits.writeto('color.fits',
             color,
             overwrite=True)
ccd = CCDData(color_masked, unit='adu')
ccd.write('color_masked.fits', overwrite=True)
hdulist1.close()
hdulist2.close()
When I use matplotlib.pyplot to imshow the images color and color_masked, they look as I expect (images omitted here).
However, my two output files are identical: color_masked.fits == color.fits. I think I'm somehow not quite understanding the masking process properly. Can anyone see where I've gone wrong?
astropy.io.fits only handles normal arrays and that means it just ignores/discards the mask of your MaskedArray.
Depending on your use-case you have different options:
Saving the file so other FITS programs recognize the mask
I actually don't think that's possible. But some programs like DS9 can handle NaNs, so you could just set the masked values to NaN for the purpose of displaying them:
data_naned = np.where(color_masked.mask, np.nan, color_masked)
fits.writeto(filename, data_naned, overwrite=True)
They do still show up as "bright white spots" but they don't affect the color-scale.
If you want to take this a step further you could replace the masked pixels using a convolution filter before writing them to a file. Not sure if there's one in astropy that only replaces masked pixels though.
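For what it's worth, recent astropy versions do provide something along these lines: astropy.convolution.interpolate_replace_nans leaves valid pixels untouched and fills only the NaNs from a kernel-weighted interpolation of their neighbours. A minimal sketch (the kernel width here is an arbitrary choice, not something from the question):
import numpy as np
from astropy.convolution import Gaussian2DKernel, interpolate_replace_nans

data_naned = np.where(color_masked.mask, np.nan, color_masked.data)
kernel = Gaussian2DKernel(x_stddev=1)
data_filled = interpolate_replace_nans(data_naned, kernel)
fits.writeto(filename, data_filled, overwrite=True)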
Saving the mask as extension so you can read them back
You could use astropy.nddata.CCDData (available since astropy 2.0) to save it as a FITS file with the mask:
from astropy.nddata import CCDData
ccd = CCDData(color_masked, unit='adu')
ccd.write('color_masked.fits', overwrite=True)
Then the mask will be saved in an extension called 'MASK' and it can be read using CCDData as well:
ccd2 = CCDData.read('color_masked.fits')
The CCDData behaves like a masked array in normal NumPy operations, but you can also convert it to a masked array by hand:
import numpy as np
marr = np.asanyarray(ccd2)

pymc impute passing back 1e20

I can't tell if I am doing something wrong with pymc's impute functionality or if this is a bug. Imputing via a masked array passes 1e20 values to the missing elements, while the "inefficient" Impute function seems to pass back correct samples. Below is a small example.
import numpy as np
import pymc as py
disasters_array = np.random.random((3,3))
disasters_array[1,1]=None
# The inefficient way, using the Impute function:
D = py.Impute('D', py.Normal, disasters_array, mu=.5, tau=1E5)
# The efficient way, using masked arrays:
# Generate masked array. Where the mask is true,
# the value is taken as missing.
print(disasters_array)
masked_values = np.ma.masked_invalid(disasters_array)
# Pass masked array to data stochastic, and it does the right thing
disasters = py.Normal('disasters', mu=.5, tau=1E5, value=masked_values, observed=True)
@py.deterministic
def test(disasters=disasters, D=D):
    print(D)
    print(disasters)
mcmc = py.MCMC(py.Model(set([test,disasters])))
Output:
Original Matrix:
[[ 0.23507836  0.2024624   0.90518228]
 [ 0.95816     **nan**      0.43145808]
 [ 0.99566308  0.25431568  0.25464137]]
D with imputations:
[[array(0.23507836309832741) array(0.20246240248367342)
  array(0.9051822818081371)]
 [array(0.9581599997650212) **array(0.5005324083232756)**
  array(0.43145807852698237)]
 [array(0.9956630757864052) array(0.2543156788973996)
  array(0.25464136701826867)]]
Masked Array approach:
[[ 2.35078363e-01   2.02462402e-01   9.05182282e-01]
 [ 9.58160000e-01   **1.00000000e+20**   4.31458079e-01]
 [ 9.95663076e-01   2.54315679e-01   2.54641367e-01]]

Variance image in python using gdal and a running window approach

Variance image in gdal
I want a local variance image, using a 3x3 window, of a geospatial raster image in Python. My approach so far has been to read in the raster band as an array, then use matrix notation to run a moving window and write the array into a new raster image. This approach worked well for a high-pass filter, as described in this tutorial: http://www.gis.usu.edu/~chrisg/python/2009/lectures/ospy_slides6.pdf
Then I tried to calculate the variance with several approaches, the last one using numpy (as np), but I just get a gray image with the same value everywhere.
I am open to any kind of solution. If it gives me the average local variance in the end, that would be even better.
rows = srcDS.RasterYSize
cols = srcDS.RasterXSize
# read in as array
data = srcBand.ReadAsArray(0, 0, cols, rows).astype(int)
# calculate the variance for a 3x3 window
# (note: np.var with no axis argument reduces over *all* axes and returns a
# single scalar, which is then broadcast to every pixel -- hence the flat gray image)
outVariance = np.zeros((rows, cols), np.float64)
outVariance[1:rows-1, 1:cols-1] = np.var([data[0:rows-2, 0:cols-2],
                                          data[0:rows-2, 1:cols-1],
                                          data[0:rows-2, 2:cols],
                                          data[1:rows-1, 0:cols-2],
                                          data[1:rows-1, 1:cols-1],
                                          data[1:rows-1, 2:cols],
                                          data[2:rows,   0:cols-2],
                                          data[2:rows,   1:cols-1],
                                          data[2:rows,   2:cols]])
# output
outDS = driver.Create(outFN, cols, rows, 1, GDT_Float32)
outDS.SetGeoTransform(srcDS.GetGeoTransform())
outDS.SetProjection(srcDS.GetProjection())
outBand = outDS.GetRasterBand(1)
outBand.WriteArray(outVariance, 0, 0)
...
You could try SciPy; it has a function for running local filters on an array.
from scipy import ndimage
outVariance = ndimage.generic_filter(data, np.var, size=3)
It has a 'mode=' keyword for how the edges should be handled.
edit:
You can test it yourself; declare a 3x3 array:
a = np.random.rand(3, 3)
print(a)
[[ 0.01869967  0.14037373  0.32960675]
 [ 0.17213158  0.35287243  0.13498175]
 [ 0.29511881  0.46387688  0.89359801]]
For a 3x3 window, the variance at the center cell of the array is simply:
print(np.var(a))
0.058884734425985602
That value should equal the center cell of the array returned by SciPy:
print(ndimage.generic_filter(a, np.var, size=3))
print(ndimage.generic_filter(a, np.var, size=(3, 3)))
print(ndimage.generic_filter(a, np.var, footprint=np.ones((3, 3))))
[[ 0.01127325  0.01465338  0.00959321]
 [ 0.02001052  0.05888473  0.07897385]
 [ 0.00978547  0.06966683  0.09633447]]
Note that all the other values in the array are 'edge values', so the result depends on how SciPy handles the edges. It defaults to mode='reflect'.
See the documentation for more detailed information:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.filters.generic_filter.html
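As a quick illustration of the mode keyword, this pads with zeros instead of reflecting, which changes only the edge values:
print(ndimage.generic_filter(a, np.var, size=3, mode='constant', cval=0.0))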
A simpler and faster solution: use a uniform filter and the "variance trick" explained here: http://imagej.net/Integral_Image_Filters (the variance is the mean of the squared values minus the square of the mean):
import numpy as np
from scipy import ndimage
rows, cols = 500, 500
win_rows, win_cols = 5, 5
img = np.random.rand(rows, cols)
win_mean = ndimage.uniform_filter(img,(win_rows,win_cols))
win_sqr_mean = ndimage.uniform_filter(img**2,(win_rows,win_cols))
win_var = win_sqr_mean - win_mean**2
the generic_filter is 40 times slower than the strides...
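One caveat worth adding: because of floating-point cancellation, win_sqr_mean - win_mean**2 can come out very slightly negative in near-constant regions, so it can be worth clamping the result:
win_var = np.clip(win_sqr_mean - win_mean**2, 0, None)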

Calculating variance image python

Is there an easy way to calculate a running variance filter on an image using Python/NumPy/Scipy? By running variance image I mean the result of calculating sum((I - mean(I))^2)/nPixels for each sub-window I in the image.
Since the images are quite large (12000x12000 pixels), I want to avoid the overhead of converting the arrays between formats just to be able to use a different library and then convert back.
I guess I could do this manually by finding the mean using something like
kernel = np.ones((winSize, winSize))/winSize**2
image_mean = scipy.ndimage.convolve(image, kernel)
diff = (image - image_mean)**2
# Calculate sum over winSize*winSize sub-images
# Subsample result
but it would be much nicer to have something like Matlab's stdfilt function.
Can anyone point me to a library that has this functionality and supports numpy arrays, or hint at/provide a way to do this in NumPy/SciPy?
Simpler solution and also faster: use SciPy's ndimage.uniform_filter
import numpy as np
from scipy import ndimage
rows, cols = 500, 500
win_rows, win_cols = 5, 5
img = np.random.rand(rows, cols)
win_mean = ndimage.uniform_filter(img, (win_rows, win_cols))
win_sqr_mean = ndimage.uniform_filter(img**2, (win_rows, win_cols))
win_var = win_sqr_mean - win_mean**2
The "stride trick" is beautiful trick, but 4 slower and not that readable.
the generic_filter is 20 times slower than the strides...
You can use numpy.lib.stride_tricks.as_strided to get a windowed view of your image:
import numpy as np
from numpy.lib.stride_tricks import as_strided
rows, cols = 500, 500
win_rows, win_cols = 5, 5
img = np.random.rand(rows, cols)
win_img = as_strided(img, shape=(rows - win_rows + 1, cols - win_cols + 1,
                                 win_rows, win_cols),
                     strides=img.strides * 2)
And now win_img[i, j] is the (win_rows, win_cols) array with the top left corner at position [i, j]:
>>> img[100:105, 100:105]
array([[ 0.34150754,  0.17888323,  0.67222354,  0.9020784 ,  0.48826682],
       [ 0.68451774,  0.14887515,  0.44892615,  0.33352743,  0.22090103],
       [ 0.41114758,  0.82608407,  0.77190533,  0.42830363,  0.57300759],
       [ 0.68435626,  0.94874394,  0.55238567,  0.40367885,  0.42955156],
       [ 0.59359203,  0.62237553,  0.58428725,  0.58608119,  0.29157555]])
>>> win_img[100, 100]
array([[ 0.34150754,  0.17888323,  0.67222354,  0.9020784 ,  0.48826682],
       [ 0.68451774,  0.14887515,  0.44892615,  0.33352743,  0.22090103],
       [ 0.41114758,  0.82608407,  0.77190533,  0.42830363,  0.57300759],
       [ 0.68435626,  0.94874394,  0.55238567,  0.40367885,  0.42955156],
       [ 0.59359203,  0.62237553,  0.58428725,  0.58608119,  0.29157555]])
You have to be careful, though, not to convert your windowed view of the image into a windowed copy of it: in my example that would require 25 times more storage. I believe numpy 1.7 lets you select more than one axis, so you could then simply do:
>>> np.var(win_img, axis=(-1, -2))
I am stuck with numpy 1.6.2, so I cannot test that. The other option, which may fail with not-so-large windows, would be to do, if I remember my math correctly:
>>> win_mean = np.sum(np.sum(win_img, axis=-1), axis=-1)/win_rows/win_cols
>>> win_sqr_mean = np.sum(np.sum(win_img**2, axis=-1), axis=-1)/win_rows/win_cols
>>> win_var = win_sqr_mean - win_mean**2
And now win_var is an array of shape
>>> win_var.shape
(496, 496)
and win_var[i, j] holds the variance of the (5, 5) window with top left corner at [i, j].
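On any reasonably recent NumPy the tuple-axis call does work; a quick sanity check against the direct computation:
win_var = np.var(win_img, axis=(-1, -2))
assert np.allclose(win_var[100, 100], np.var(img[100:105, 100:105]))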
After a bit of optimization we came up with this function for a generic 3D image:
def variance_filter(img, VAR_FILTER_SIZE):
    from numpy.lib.stride_tricks import as_strided
    WIN_SIZE = (2 * VAR_FILTER_SIZE) + 1
    if VAR_FILTER_SIZE % 2 != 1:
        print('Warning, VAR_FILTER_SIZE must be an ODD integer number')
    # hack -- this could probably be an input to the function but Alessandro is lazy
    WIN_DIMS = [WIN_SIZE, WIN_SIZE, WIN_SIZE]
    # Check that there is a 3D image input.
    if len(img.shape) != 3:
        print("\t variance_filter: Are you sure that you passed me a 3D image?")
        return -1
    else:
        DIMS = img.shape
    # Set up a windowed view on the data... this will have a border removed compared to the img_in
    img_strided = as_strided(img,
                             shape=(DIMS[0] - WIN_DIMS[0] + 1,
                                    DIMS[1] - WIN_DIMS[1] + 1,
                                    DIMS[2] - WIN_DIMS[2] + 1,
                                    WIN_DIMS[0], WIN_DIMS[1], WIN_DIMS[2]),
                             strides=img.strides * 2)
    # Calculate variance, vectorially
    win_mean = (numpy.sum(numpy.sum(numpy.sum(img_strided, axis=-1), axis=-1), axis=-1)
                / (WIN_DIMS[0] * WIN_DIMS[1] * WIN_DIMS[2]))
    # As per http://en.wikipedia.org/wiki/Variance, we are removing the mean from every window,
    # then squaring the result.
    # Casting to 64 bit float inside, because the numbers (at least for our images) get pretty big
    win_var = (numpy.sum(numpy.sum(numpy.sum(((img_strided.T.astype('<f8')
                                               - win_mean.T.astype('<f8'))**2).T,
                                             axis=-1), axis=-1), axis=-1)
               / (WIN_DIMS[0] * WIN_DIMS[1] * WIN_DIMS[2]))
    # Prepare an output image of the right size, in order to replace the border
    # removed with the windowed view call
    out_img = numpy.zeros(DIMS, dtype='<f8')
    # copy the windowed result into the interior (the borders stay at zero)
    out_img[WIN_DIMS[0]//2 : DIMS[0]-WIN_DIMS[0]+1+WIN_DIMS[0]//2,
            WIN_DIMS[1]//2 : DIMS[1]-WIN_DIMS[1]+1+WIN_DIMS[1]//2,
            WIN_DIMS[2]//2 : DIMS[2]-WIN_DIMS[2]+1+WIN_DIMS[2]//2] = win_var
    # output
    return out_img.astype('>f4')
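A hypothetical usage example (the array and sizes are made up for illustration):
import numpy
vol = numpy.random.rand(20, 30, 40)
var_vol = variance_filter(vol, 1)   # VAR_FILTER_SIZE=1 gives a 3x3x3 window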
You can use scipy.ndimage.generic_filter. I can't test it against Matlab, but perhaps this gives you what you're looking for:
import numpy as np
import scipy.ndimage as ndimage
subs = 10 # this is the size of the (square) sub-windows
img = np.random.rand(500, 500)
img_std = ndimage.filters.generic_filter(img, np.std, size=subs)
You can make the sub-windows of arbitrary sizes using the footprint keyword. See this question for an example.
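For instance, a small sketch of the footprint keyword using a cross-shaped neighbourhood instead of a square one (the shape is an arbitrary choice):
fp = np.array([[0, 1, 0],
               [1, 1, 1],
               [0, 1, 0]], dtype=bool)
img_std_cross = ndimage.filters.generic_filter(img, np.std, footprint=fp)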

How to create a grid from LiDAR points (X,Y,Z) with GDAL python?

I'm really new to Python programming, and I was just wondering if you can create a regular grid of 0.5 by 0.5 m resolution using LiDAR points.
My data are in LAS format (read with from liblas import file as lasfile) and have the following format: X, Y, Z, where X and Y are coordinates.
The points are randomly positioned; some pixels are empty (NaN value) and some pixels contain more than one point. Where there is more than one point, I wish to obtain the mean value. In the end I need to save the data in TIF or ASCII format.
I am studying the osgeo module and GDAL, but I honestly don't know if the osgeo module is the best solution.
I would be really glad for help with some code that I can study and implement.
Thanks in advance for the help, I really need it.
I don't know the best way to get a grid with these parameters.
It's a bit late but maybe this answer will be useful for others, if not for you...
I have done this with Numpy and Pandas, and it's pretty fast. I was using TLS data and could do this with several million data points without any trouble on a decent 2009-vintage laptop. The key is 'binning' by rounding the data, and then using Pandas' GroupBy methods to do the aggregating and calculate the means.
If you need to round to a power of 10 you can use np.round, otherwise you can round to an arbitrary value by making a function to do so, which I have done by modifying this SO answer.
import numpy as np
import pandas as pd
# make rounding function:
def round_to_val(a, round_val):
    return np.round(np.array(a, dtype=float) / round_val) * round_val
# load data (expects an array of shape (n, 3): x, y, z; the filename is a placeholder)
data = np.load('your_data.npy')
n_d = data.shape[0]
# round the data
d_round = np.empty( [n_d, 5] )
d_round[:,0] = data[:,0]
d_round[:,1] = data[:,1]
d_round[:,2] = data[:,2]
del data # free up some RAM
d_round[:,3] = round_to_val( d_round[:,0], 0.5)
d_round[:,4] = round_to_val( d_round[:,1], 0.5)
# sorting data
ind = np.lexsort( (d_round[:,4], d_round[:,3]) )
d_sort = d_round[ind]
# making dataframes and grouping stuff
df_cols = ['x', 'y', 'z', 'x_round', 'y_round']
df = pd.DataFrame( d_sort)
df.columns = df_cols
df_round = df[['x_round', 'y_round', 'z']]
group_xy = df_round.groupby(['x_round', 'y_round'])
# calculating the mean, write to csv, which saves the file with:
# [x_round, y_round, z_mean] columns. You can exit Python and then start up
# later to clear memory if that's an issue.
group_mean = group_xy.mean()
group_mean.to_csv('your_binned_data.csv')
# Restarting...
import numpy as np
from scipy.interpolate import griddata
binned_data = np.loadtxt('your_binned_data.csv', skiprows=1, delimiter=',')
x_bins = binned_data[:,0]
y_bins = binned_data[:,1]
z_vals = binned_data[:,2]
pts = np.array( [x_bins, y_bins])
pts = pts.T
# make grid (with borders rounded to 0.5...)
xmin, xmax = 637000, 640000.5
ymin, ymax = 6067000, 6070000.5
grid_x, grid_y = np.mgrid[xmin:xmax:0.5, ymin:ymax:0.5]
# interpolate onto grid
data_grid = griddata(pts, z_vals, (grid_x, grid_y), method='cubic')
# save to ascii
np.savetxt('data_grid.txt', data_grid)
When I've done this, I have saved the output as a .npy and converted to a tiff with the Image library, and then georeferenced in ArcMap. There is probably a way to do that with osgeo but I haven't used it.
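For completeness, a hedged osgeo/GDAL sketch of that last step (untested here; it assumes a north-up grid with 0.5 m pixels and that data_grid's first axis runs from ymax downward):
from osgeo import gdal

driver = gdal.GetDriverByName('GTiff')
out = driver.Create('data_grid.tif',
                    data_grid.shape[1], data_grid.shape[0], 1, gdal.GDT_Float32)
# geotransform: (top-left x, pixel width, 0, top-left y, 0, negative pixel height)
out.SetGeoTransform((xmin, 0.5, 0, ymax, 0, -0.5))
out.GetRasterBand(1).WriteArray(data_grid)
out.FlushCache()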
Hope this helps someone at least...
You can use the histogram function in NumPy to do the binning; for instance:
import numpy as np
points = np.random.random(1000)   # e.g. the x coordinates
z = np.random.random(1000)        # the values to average in each bin
# linspace(0, 1, 10) gives 10 bin edges, i.e. 9 bins from 0 to 1
bins = np.linspace(0, 1, 10)
means = (np.histogram(points, bins, weights=z)[0] /
         np.histogram(points, bins)[0])
Try LAStools, particularly lasgrid or las2dem.
