numpy moving window percent cover - python

I have a classified raster (n classes) that I am reading into a numpy array.
I want to use a 2D moving window (e.g. 3 by 3) to create an n-dimensional vector that stores the % cover of each class within the window. Because the raster is large, it would be useful to store this information so it doesn't have to be recomputed each time, so I think the best solution is to create a 3D array to act as the vector. A new raster will then be created based on these %/count values.
My idea is to:
1) create a 3D array with n+1 'bands'
2) band 1 = the original classified raster; each other band value = the count of cells of one class within the window (i.e. one band per class, with cells outside the raster counting as empty). For example, with classes 0, 1 and 2 and a 3 by 3 window:
band 1 (classified raster):
[[2 0 1 2 1]
 [2 0 2 0 0]
 [0 1 1 2 1]
 [0 2 2 1 1]
 [0 1 2 1 1]]
band 2 (count of class 0):
[[2 2 3 2 2]
 [3 3 3 2 2]
 [3 3 2 2 2]
 [3 3 0 0 0]
 [2 2 0 0 0]]
band 3 (count of class 1):
[[0 1 1 2 1]
 [1 3 3 4 2]
 [1 2 3 4 3]
 [2 3 5 6 5]
 [1 1 3 4 4]]
band 4 (count of class 2):
[[2 3 2 2 1]
 [2 3 3 3 2]
 [2 4 4 3 1]
 [1 3 4 3 1]
 [1 3 3 2 0]]
3) read these bands into a VRT so it only needs to be created once and can be read in by further modules.
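A minimal NumPy-only sketch of steps 1 and 2, assuming NumPy 1.20 or later (for sliding_window_view) and an odd window size. class_count_stack is a hypothetical helper name; cells outside the raster are treated as belonging to no class, and writing the VRT is left out.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20


def class_count_stack(classified, classes, dim=3):
    """Stack the classified raster with one window-count band per class.

    Band 0 is the raster itself; band k + 1 holds, for every cell, how many
    cells of classes[k] fall inside the dim x dim window centred on it.
    Cells outside the raster are treated as belonging to no class.
    """
    pad = dim // 2
    bands = [classified]
    for c in classes:
        # 1 where the cell belongs to class c, 0 elsewhere
        mask = (classified == c).astype(np.int64)
        # zero-pad so windows at the edges keep the same size
        padded = np.pad(mask, pad, mode="constant", constant_values=0)
        # read-only view of shape (rows, cols, dim, dim); summing counts class c per window
        windows = sliding_window_view(padded, (dim, dim))
        bands.append(windows.sum(axis=(-2, -1)))
    return np.stack(bands)


raster = np.array([[2, 0, 1, 2, 1],
                   [2, 0, 2, 0, 0],
                   [0, 1, 1, 2, 1],
                   [0, 2, 2, 1, 1],
                   [0, 1, 2, 1, 1]])

stack = class_count_stack(raster, classes=[0, 1, 2], dim=3)
# percent cover per class; dividing by dim**2 counts off-raster cells as "no class"
percent_cover = 100.0 * stack[1:] / 3**2
```

Storing the counts as integers and deriving percentages on demand keeps the stack small enough to write as Byte bands.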
Question: what is the most efficient 'moving window' method to 'count' within the window?
Currently I am trying, and failing, with the following code:
from osgeo import gdal, gdal_array
from scipy import ndimage
import numpy as np

def lcc_binary_vrt(raster, dim, bands):
    footprint = np.ones((dim, dim), dtype=int)
    g = gdal.Open(raster)
    data = gdal_array.DatasetReadAsArray(g)
    # loop through the band (class) values
    for i in bands:
        print(i)
        # create a duplicate '0' array of the raster
        a_band = data*0
        # we create the binary dataset for the band
        a_band = np.where(data == i, 1, a_band)
        count_a_band_fname = raster[:-4] + '_' + str(i) + '.tif'
        # run the moving window (footprint) across the band to create a 'count';
        # generic_filter expects the function itself, not the result of calling it
        count_a_band = ndimage.generic_filter(a_band, np.count_nonzero, footprint=footprint, mode='constant')
        # geoTiff.create is a helper not shown here
        geoTiff.create(count_a_band_fname, g, data, count_a_band, gdal.GDT_Byte, np.nan)
Any suggestions very much appreciated.
Becky

I don't know anything about the spatial sciences stuff, so I'll just focus on the main question :)
what is the most efficient 'moving window' method to 'count' within the window?
A common way to do moving window statistics with Numpy is to use numpy.lib.stride_tricks.as_strided, see for example this answer. Basically, the idea is to make an array containing all the windows, without any increase in memory usage:
from numpy.lib.stride_tricks import as_strided
...
m, n = a_band.shape
newshape = (m-dim+1, n-dim+1, dim, dim)
newstrides = a_band.strides * 2 # strides is a tuple
count_a_band = as_strided(a_band, newshape, newstrides).sum(axis=(2,3))
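A self-contained version of the as_strided idea, checked against a brute-force loop. The toy 0/1 band is made up for illustration; writeable=False is an extra safety choice here, since the overlapping windows share memory and writing through them would corrupt the data.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a_band = (np.arange(30).reshape(5, 6) % 3 == 0).astype(np.int64)  # toy 0/1 band
dim = 3

m, n = a_band.shape
newshape = (m - dim + 1, n - dim + 1, dim, dim)
# stepping to the next window reuses the array's own row/column strides
newstrides = a_band.strides * 2
windows = as_strided(a_band, shape=newshape, strides=newstrides, writeable=False)
count_a_band = windows.sum(axis=(2, 3))

# brute-force check of the same "valid windows only" result
expected = np.array([[a_band[i:i + dim, j:j + dim].sum()
                      for j in range(n - dim + 1)]
                     for i in range(m - dim + 1)])
```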
However, for your use case this method is inefficient, because you're summing the same numbers over and over again, and more so as the window size grows. A better way is to use a cumsum trick, like in this answer:
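def windowed_sum_1d(ar, ws, axis=None):
    # sum of every run of ws consecutive elements along one axis
    if axis is None:
        ar = ar.ravel()
    else:
        ar = np.swapaxes(ar, axis, 0)
    ans = np.cumsum(ar, axis=0)
    # window sum = cumsum at the window's end minus cumsum just before its start
    ans[ws:] = ans[ws:] - ans[:-ws]
    ans = ans[ws-1:]
    if axis:
        ans = np.swapaxes(ans, 0, axis)
    return ans

def windowed_sum(ar, ws):
    # apply the 1D windowed sum along each axis in turn
    for axis in range(ar.ndim):
        ar = windowed_sum_1d(ar, ws, axis)
    return ar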
...
count_a_band = windowed_sum(a_band, dim)
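The cumsum idea can also be applied in both directions at once as a summed-area table (integral image). This is a swapped-in variant, not the code from the linked answer; box_count is a hypothetical name, and like the snippets above it only returns windows that fit entirely inside the array.

```python
import numpy as np


def box_count(a_band, dim):
    """Sum every dim x dim window (valid windows only) with a summed-area table."""
    # integral image with a leading row and column of zeros:
    # s[i, j] = sum of a_band[:i, :j]
    s = np.zeros((a_band.shape[0] + 1, a_band.shape[1] + 1), dtype=np.int64)
    s[1:, 1:] = a_band.cumsum(axis=0).cumsum(axis=1)
    # window sum = bottom-right - top-right - bottom-left + top-left corner
    return s[dim:, dim:] - s[:-dim, dim:] - s[dim:, :-dim] + s[:-dim, :-dim]


band = np.array([[1, 0, 1, 1],
                 [0, 1, 0, 0],
                 [1, 1, 1, 0]])
print(box_count(band, 2))
# [[2 2 2]
#  [3 3 1]]
```

Each window costs four lookups whatever its size, which is the same property that makes the cumsum version beat the strided sum.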
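Note that both snippets above only compute the windows that fit entirely inside the array, so the output is dim-1 smaller than the input along each axis; handling the edges would be tedious. Luckily, there is an easy way to include them and get the same efficiency as the second snippet:
# uniform_filter returns the input's dtype by default, so cast to float first
count_a_band = ndimage.uniform_filter(a_band.astype(float), size=dim, mode='constant') * dim**2
Though very similar to what you already had, this is much faster! The downside is that you may need to round the result to integers (e.g. with np.rint) to get rid of floating-point rounding errors.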
As a final note, your code
# create a duplicate '0' array of the raster
a_band = data*0
# we create the binary dataset for the band
a_band = np.where(data == i, 1, a_band)
is a bit redundant: you can just use a_band = (data == i).
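The same comparison can build the binary layer for every class in one step via broadcasting. This is a sketch on made-up data; it uses k times the raster's memory for k classes, so for a large raster looping over classes as in the question may still be preferable.

```python
import numpy as np

data = np.array([[2, 0, 1],
                 [1, 1, 0]])
classes = np.array([0, 1, 2])

# broadcast (k, 1, 1) against (rows, cols): one boolean layer per class
one_hot = data[np.newaxis, :, :] == classes[:, np.newaxis, np.newaxis]
# booleans count as 0/1; cast to float before routines that keep the
# input dtype (e.g. ndimage.uniform_filter)
layers = one_hot.astype(np.float64)
```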

Related

Python Numpy. Delete an element (or elements) in a 2D array if said element is located between a pair of specified elements

I have a 2D NumPy array exclusively filled with 1s and 0s.
a = [[0 0 0 0 1 0 0 0 1]
[1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1]
[1 1 1 1 0 0 0 0 1]
[1 1 1 1 1 1 1 1 1]
[1 1 1 0 1 1 1 1 1]
[1 1 1 1 1 1 0 0 1]
[1 1 1 1 1 1 1 1 1]]
To get the location of the 0s I used the following code:
new_array = np.transpose(np.nonzero(a==0))
As expected, I get the following result showing the location of the 0s within the array
new_array = [[0 0]
[0 1]
[0 2]
[0 3]
[0 5]
[0 6]
[0 7]
[3 4]
[3 5]
[3 6]
[3 7]
[5 3]
[6 6]
[6 7]]
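For reference, np.argwhere gives the same (row, column) pairs in one call. This assumes a is a NumPy array; comparing a plain nested list with 0 returns a single False rather than an elementwise mask.

```python
import numpy as np

a = np.array([[0, 0, 0, 0, 1, 0, 0, 0, 1],
              [1, 1, 1, 1, 1, 1, 1, 1, 1],
              [1, 1, 1, 1, 1, 1, 1, 1, 1],
              [1, 1, 1, 1, 0, 0, 0, 0, 1],
              [1, 1, 1, 1, 1, 1, 1, 1, 1],
              [1, 1, 1, 0, 1, 1, 1, 1, 1],
              [1, 1, 1, 1, 1, 1, 0, 0, 1],
              [1, 1, 1, 1, 1, 1, 1, 1, 1]])

# np.argwhere(cond) is equivalent to np.transpose(np.nonzero(cond))
zeros_rc = np.argwhere(a == 0)
```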
Now comes my question:
Is there a way to get the locations of the 0s at the start and end of a horizontal group, if the group is larger than 2?
EDIT: If a group finishes at the end of a row and continues on the row below, it counts as 2 separate groups.
My first thought was to implement a process that would delete 0s if they are located in-between 0s but I was not able to figure out how to do that.
I would like "new_array" output to be:
new_array = [[0 0]
[0 3]
[0 5]
[0 7]
[3 4]
[3 7]
[5 3]
[6 6]
[6 7]]
Thanks in advance!!
EDIT 2:
Thank you all for your very helpful insights, I was able to solve the problem that I had.
To satisfy your curiosity, this data represents musical information. The purpose of the program I'm working on is to create a musical score based on an image (that consists exclusively of horizontal lines).
Once the image conversion to 1s and 0s is done, I needed to extract the following information from it: Onset, Pitch, and Duration. These translate into the position on the "x" axis, the position on the "y" axis, and the total length of the group.
Since X and Y locations are fairly easy to get, I decided to process them separately from the "Duration" calculation (which was the main problem to solve in this post).
Thanks to your help I was able to solve the Duration problem and create a new array with all necessary information:
[[0 0 4]
[5 0 3]
[4 3 4]
[6 6 2]]
Note that the 1st column represents Onset, the 2nd column represents Pitch, and the 3rd column represents Duration.
I also noticed the comment suggesting adding an identifier to each event. Eventually I will need to implement that to differentiate between instruments (and later send them to individual MIDI channels). However, this first iteration of the program only aims to create a score for a single instrument, so it is not necessary yet, since all events belong to that instrument.
I have very little programming experience and I don't know if this was the most efficient way of achieving my goal. Any suggestions are welcome.
Thanks!
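One possible way to produce the [Onset, Pitch, Duration] table directly (a sketch, not necessarily how it was solved above): mark the 0s, pad each row with a non-zero column on both sides, and use np.diff to find where each run starts and where it ends. zero_runs and min_len are hypothetical names; min_len=2 reproduces the table above, min_len=3 applies the "larger than 2" rule.

```python
import numpy as np


def zero_runs(a, min_len=2):
    """Return one [onset, pitch, duration] row per horizontal run of 0s.

    onset = first column of the run, pitch = row index, duration = run length.
    Runs never continue from the end of one row onto the next.
    """
    a = np.asarray(a)
    # 1 where the cell is 0; pad a 0 column on each side so every run
    # starts and ends inside its own row
    is_zero = np.pad((a == 0).astype(np.int8), ((0, 0), (1, 1)))
    d = np.diff(is_zero, axis=1)
    # d == 1 at a run's first column, d == -1 one column past its last cell
    start_rows, start_cols = np.nonzero(d == 1)
    _, end_cols = np.nonzero(d == -1)
    durations = end_cols - start_cols
    keep = durations >= min_len
    return np.column_stack([start_cols[keep], start_rows[keep], durations[keep]])


a = np.array([[0, 0, 0, 0, 1, 0, 0, 0, 1],
              [1, 1, 1, 1, 1, 1, 1, 1, 1],
              [1, 1, 1, 1, 1, 1, 1, 1, 1],
              [1, 1, 1, 1, 0, 0, 0, 0, 1],
              [1, 1, 1, 1, 1, 1, 1, 1, 1],
              [1, 1, 1, 0, 1, 1, 1, 1, 1],
              [1, 1, 1, 1, 1, 1, 0, 0, 1],
              [1, 1, 1, 1, 1, 1, 1, 1, 1]])
events = zero_runs(a)
```

Because np.nonzero scans in row-major order, the i-th start and the i-th end always belong to the same run.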
One possible solution that is easier to follow is:
b = np.diff(a, prepend=1) # prepend a column of 1s and detect
# jumps between adjacent columns (left to right)
y, x = np.where(b > 0) # find positions of the jumps 0->1 (left to right)
# shift positive jumps to the left by 1 position while filling gaps with 0:
b[y, x - 1] = 1
b[y, x] = 0
new_array = list(zip(*np.where(b)))
Another one is:
new_array = list(zip(*np.where(np.diff(a, n=2, prepend=1, append=1) > 0)))
Both solutions rely on np.diff, which (with the default axis=-1) computes differences between consecutive columns of a 2D array.
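A worked example of why the one-liner works, on row 3 of the question kept as a 1-row 2D array. Padding with 1 on both sides makes the first difference -1 at a run's first column and +1 one column past its last, so the second difference is positive exactly at the first and last column of each run (a single 0 gives +2 at its own column).

```python
import numpy as np

row = np.array([[1, 1, 1, 1, 0, 0, 0, 0, 1]])  # row 3 of the question

# first[0, k] = row[0, k] - row[0, k - 1], with a 1 assumed beyond both ends
first = np.diff(row, prepend=1, append=1)
# second[0, j] = first[0, j + 1] - first[0, j]
second = np.diff(row, n=2, prepend=1, append=1)
ends = np.nonzero(second[0] > 0)[0]   # columns of the run's first and last 0
```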
A flaw in the other solution is that it reports all sequences
of zeroes, regardless of their length.
Your expected output also contains such groups, composed of 1 or 2
zeroes, but in my opinion it shouldn't.
My solution is free of the above flaw.
An elegant tool to process groups of adjacent equal elements is
itertools.groupby, so start from:
import itertools
Then generate your intended result as:
res = []
for rowIdx, row in enumerate(a):
    colIdx = 0  # Start column index
    for k, grp in itertools.groupby(row):
        vals = list(grp)  # Values in the group
        lgth = len(vals)  # Length of the group
        colIdx2 = colIdx + lgth - 1  # End column index
        if k == 0 and lgth > 2:  # Record this group
            res.append([rowIdx, colIdx])
            res.append([rowIdx, colIdx2])
        colIdx = colIdx2 + 1  # Advance column index
result = np.array(res)
The result, for your source data, is:
array([[0, 0],
[0, 3],
[0, 5],
[0, 7],
[3, 4],
[3, 7]])
As you can see, it doesn't include the shorter sequences of zeroes
in rows 5 and 6.

How do I subtract two columns from the same array and put the value in their own single column array with numpy?

Let's say I have a single 3x4 array (3 rows, 4 columns), for example:
import numpy as np
data = [[0,5,0,1], [0,5,0,1], [0,5,0,1]]
data = np.array(data)
print(data)
[[0 5 0 1]
[0 5 0 1]
[0 5 0 1]]
and I want to subtract column 4 from column 2 and store the values in their own named 3x1 array, like this:
print(subtraction)
[[4]
[4]
[4]]
How would I go about this in NumPy?
result = (data[:, 1] - data[:, 3]).reshape((3, 1))
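Two variants that avoid hard-coding the row count: indexing with a one-element list keeps the column axis, and reshape(-1, 1) lets NumPy infer the number of rows. Either works for any number of rows.

```python
import numpy as np

data = np.array([[0, 5, 0, 1],
                 [0, 5, 0, 1],
                 [0, 5, 0, 1]])

# indexing with a list ([1], [3]) keeps the column axis, so the result is (rows, 1)
subtraction = data[:, [1]] - data[:, [3]]

# equivalent: subtract the 1-D columns, then let reshape infer the row count
subtraction_alt = (data[:, 1] - data[:, 3]).reshape(-1, 1)
```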

smart assignment in 2D numpy array based on numpy 1D array

I have a numpy 2D array and I want to turn it into -1/1 values based on the following logic:
a. find the argmax() of each row
b. at the positions given by that 1D array (from a), assign the value 1
c. at all other positions, assign the value -1
Example:
arr2D = np.random.randint(10,size=(3,3))
idx = np.argmax(arr2D, axis=1)
arr2D = [[5 4 1]
[0 9 4]
[4 2 6]]
idx = [0 1 2]
arr2D[idx] = 1
arr2D[~idx] = -1
what I get is this:
arr2D = [[-1 -1 -1]
[-1 -1 -1]
[-1 -1 -1]]
while I wanted:
arr2D = [[1 -1 -1]
[-1 1 -1]
[-1 -1 1]]
appreciate some help,
Thanks
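A short illustration of why the attempt above turns everything into -1: integer indexing with idx selects whole rows, and ~ on integers is bitwise NOT (~k == -k - 1), not "every other position".

```python
import numpy as np

idx = np.array([0, 1, 2])
print(~idx)   # [-1 -2 -3]: negative row indices

arr2D = np.array([[5, 4, 1],
                  [0, 9, 4],
                  [4, 2, 6]])
# arr2D[idx] selects whole rows 0, 1 and 2, so every row becomes 1 ...
arr2D[idx] = 1
# ... and rows -1, -2, -3 are again all rows, so everything becomes -1
arr2D[~idx] = -1
```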
Approach #1
Create a mask with those argmax -
mask = idx[:,None] == np.arange(arr2D.shape[1])
Then, use that mask to create the array of 1s and -1s -
out = 2*mask-1
Alternatively, we could use np.where -
out = np.where(mask,1,-1)
Approach #2
Another way to create the mask would be -
mask = np.zeros(arr2D.shape, dtype=bool)
mask[np.arange(len(idx)),idx] = 1
Then, get out using one of the methods as listed in approach #1.
Approach #3
One more way would be like so -
out = np.full(arr2D.shape, -1)
out[np.arange(len(idx)),idx] = 1
Alternatively, we could use np.put_along_axis for the assignment -
np.put_along_axis(out,idx[:,None],1,axis=1)
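The approaches above, run together on the question's example array as a quick check that they agree. Note that argmax picks the first column when a row has tied maxima.

```python
import numpy as np

arr2D = np.array([[5, 4, 1],
                  [0, 9, 4],
                  [4, 2, 6]])
idx = np.argmax(arr2D, axis=1)          # column of each row's maximum

# Approach 1: compare each row's argmax against all column numbers
mask = idx[:, None] == np.arange(arr2D.shape[1])
out1 = np.where(mask, 1, -1)

# Approach 3: start from -1 everywhere and set one cell per row
out3 = np.full(arr2D.shape, -1)
out3[np.arange(len(idx)), idx] = 1

# np.put_along_axis variant (modifies its first argument in place)
out4 = np.full(arr2D.shape, -1)
np.put_along_axis(out4, idx[:, None], 1, axis=1)
```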

Masking nested array with value at index with a second array

I have a nested array of values and a second array of the same length. I'd like to get a nested array of 1s and 0s that is 1 wherever the value in the second array equals the value in the corresponding row of the nested array.
I've looked at existing Stack Overflow questions but have been unable to construct an answer.
masks_list = []
for i in range(len(y_pred)):
    mask = (y_pred[i] == y_test.values[i]) * 1
    masks_list.append(mask)
masks = np.array(masks_list)
Essentially, that's the code I currently have and it works, but I think it's probably not the most efficient way of doing it.
YPRED:
[[4 0 1 2 3 5 6]
[0 1 2 3 5 6 4]]
YTEST:
8 1
5 4
Masks:
[[0 0 1 0 0 0 0]
[0 0 0 0 0 0 1]]
Another good solution with fewer lines of code:
a = set(y_pred).intersection(y_test)
f = [1 if i in a else 0 for i, j in enumerate(y_pred)]
After that, you can compare performance as in this answer:
from time import perf_counter as pc
t0 = pc()
a = set(y_pred).intersection(y_test)
f = [1 if i in a else 0 for i, j in enumerate(y_pred)]
t1 = pc() - t0
t0 = pc()
masks_list = []
for i in range(len(y_pred)):
    mask = (y_pred[i] == y_test[i]) * 1
    masks_list.append(mask)
t2 = pc() - t0
val = t1 - t2
If the value is positive, the first solution is slower.
If you have a NumPy array instead of a list, you can convert it as described in this answer:
type(y_pred)
>> numpy.ndarray
y_pred = y_pred.tolist()
type(y_pred)
>> list
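Idea (fewest loops): compare the array and the nested array directly, letting NumPy broadcast each target value across its row:
masks = np.equal(y_pred, y_test.values[:, None]).astype(int)
You can also look at these: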
np.array_equal(A,B) # test if same shape, same elements values
np.array_equiv(A,B) # test if broadcastable shape, same elements values
np.allclose(A,B,...) # test if same shape, elements have close enough values
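A runnable version of the broadcasting idea on the question's data. y_test_values is a stand-in for y_test.values (assumed here to be a plain 1D array); the [:, None] turns it into a column of shape (2, 1) so each value is compared with its own row of y_pred.

```python
import numpy as np

y_pred = np.array([[4, 0, 1, 2, 3, 5, 6],
                   [0, 1, 2, 3, 5, 6, 4]])
# stand-in for y_test.values from the question
y_test_values = np.array([1, 4])

# (2, 7) == (2, 1) broadcasts to (2, 7): row i is compared with y_test_values[i]
masks = (y_pred == y_test_values[:, None]).astype(int)
```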

How to create selective patches from an image based on the objects identified

I have created a numpy matrix with all elements initialized to zeros as shown:
[[[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
...
This represents a screenshot of a webpage of size 1200 x 1000.
I have identified a few rectangular regions of interest for different HTML objects, such as radio buttons, textboxes and dropdowns, within the screenshot image and assigned them fixed values like 1, 2 and 3 for the respective object regions in the numpy matrix.
So the resulting matrix looks roughly like this:
[[[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
...,
[[1 1 1 1]
[1 1 1 1]
[1 1 1 1]
...,
[0 0 0 0]
[2 2 2 2]
[0 0 0 0]
...,
I now wish to prepare a data set for a convolutional neural network from patches of the screenshot image. To improve the quality of the data supplied to the CNN, I wish to filter the patches and provide only those that contain the objects detected earlier, i.e. textboxes, radio buttons etc. (radio buttons and dropdowns should be fully contained, and for buttons at least 50% of the region should be included in the patch). Any ideas how this can be realized in Python?
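A minimal sketch of one way to filter patches, assuming the regions are kept as bounding boxes rather than only as label values in the matrix. The box list, the thresholds in min_fraction (applying the 50% rule to label 2 is purely illustrative) and the helper names are all hypothetical; a patch is kept if it contains at least the required share of at least one object's area.

```python
import numpy as np

# Hypothetical boxes: (label, top, left, bottom, right), bottom/right exclusive.
# Labels follow the question: 1 = radio button, 2 = textbox, 3 = dropdown.
boxes = [(1, 2, 2, 4, 4), (2, 5, 0, 7, 6)]
# Assumed thresholds: share of an object's area a patch must contain.
min_fraction = {1: 1.0, 2: 0.5, 3: 1.0}


def overlap_area(box, top, left, size):
    """Area of the intersection between a box and a size x size patch."""
    _, b_top, b_left, b_bottom, b_right = box
    h = min(b_bottom, top + size) - max(b_top, top)
    w = min(b_right, left + size) - max(b_left, left)
    return max(h, 0) * max(w, 0)


def select_patches(image, boxes, size, stride, min_fraction):
    """Yield (top, left, patch) for each patch holding enough of at least one object."""
    rows, cols = image.shape[:2]
    for top in range(0, rows - size + 1, stride):
        for left in range(0, cols - size + 1, stride):
            for box in boxes:
                label, b_top, b_left, b_bottom, b_right = box
                area = (b_bottom - b_top) * (b_right - b_left)
                if overlap_area(box, top, left, size) >= min_fraction[label] * area:
                    yield top, left, image[top:top + size, left:left + size]
                    break


image = np.zeros((8, 8), dtype=int)
for label, t, l, b, r in boxes:
    image[t:b, l:r] = label          # paint the label values as in the question
kept = [(top, left) for top, left, _ in select_patches(image, boxes, 4, 2, min_fraction)]
```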
A very naive approach:
maxY, maxX = np.shape(theMatrix)
for curY in range(0, maxY):
    for curX in range(0, maxX):
        print(theMatrix[curY, curX], end=' ')
    print(" ")
You can just use the plot function to plot a 2D-array. Take for example:
import numpy as np
import matplotlib.pyplot as pyplot
x = np.random.rand(3, 2)
which might yield (the values are random):
array([[ 0.53255518, 0.04687357],
[ 0.4600085 , 0.73059902],
[ 0.7153942 , 0.68812506]])
If you use pyplot.plot(x, 'ro'), it will plot the figure given below.
The row numbers are put on the x-axis and the values are plotted on the y-axis. But from the nature of your problem, I suspect you need the column numbers on the x-axis and the values on the y-axis. To do so, you can simply transpose your matrix.
pyplot.plot(x.T,'ro')
pyplot.show()
which now yields (for the same array) the figure given below.
