calculate pixel by pixel mean of the rasters using numpy - python

Since the two rasters (raster1 and raster2) overlap each other, I want to make a new raster by calculating the mean of each pair of overlapping pixels; i.e., the resulting new raster is calculated as:
new = [[mean(1,3), mean(1,3), mean(1,3), mean(1,3), mean(1,3)],[mean(2,4),mean(2,4),mean(2,4),mean(2,4),mean(2,4)]]
import numpy as np
raster1 = np.array([[1,1,1,1,1],[2,2,2,2,2]])
raster2 = np.array([[3,3,3,3,3],[4,4,4,4,4]])
new = np.mean(raster1,raster2,axis=1)
print (new.tolist())
What is wrong?

Maybe I misunderstood you, but is this what you want?
raster = (raster1 + raster2) / 2
Actually, in this case you don't even need np.mean; just use array operations.
np.mean is meant for calculating the mean of a single array along a specific axis, so that is a different situation.

It should be
new = np.mean([raster1,raster2],axis=1)
with brackets. Actually, I am guessing it should be
new = np.mean([raster1,raster2],axis=0)
The first argument to np.mean should be the whole array; see e.g. http://wiki.scipy.org/Numpy_Example_List_With_Doc#mean
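For reference, here is a minimal sketch of the corrected call on the same toy rasters; stacking the two rasters and averaging along axis=0 gives the pixel-by-pixel mean:
import numpy as np
raster1 = np.array([[1,1,1,1,1],[2,2,2,2,2]])
raster2 = np.array([[3,3,3,3,3],[4,4,4,4,4]])
# stack the rasters along a new first axis and average over it
new = np.mean([raster1, raster2], axis=0)
print(new.tolist())  # [[2.0, 2.0, 2.0, 2.0, 2.0], [3.0, 3.0, 3.0, 3.0, 3.0]]
# plain array arithmetic gives the same result
same = (raster1 + raster2) / 2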

Related

Applying numpy masks with multiple matching criteria

I am a python newbie, trying to understand how to work with numpy masks better.
I have two 2D data arrays plus axis values, so something like
import numpy as np
data1=np.arange(50).reshape(10,5)
data2=np.random.rand(10,5)
x=5*np.arange(5)+15
y=2*np.arange(10)
Here x contains the coordinates along the second (column) axis of data1 and data2, and y gives the coordinates along the first (row) axis of data1 and data2.
I want to identify and count all the points in data1 for which
data1>D1min,
the corresponding x values are inside a given range, XRange, and
the corresponding y values are inside a given range, YRange
Then, when I am all done, I also need to do a check to make sure none of the corresponding data2 values are less than another limit, D2Max
so if
XRange = [27,38]
YRange = [2,12]
D1min = 23
D2Max = 0.8
I would want to include indices 3 to 4 in the x direction and 1 to 6 in the y direction (assuming I want to include the limiting values).
That means I would only consider data1[1:7, 3:5] (y runs along the rows, x along the columns).
Then the limits of the values in the 2D arrays come into it, so I want to identify and count points for which data1[1:7, 3:5] > 23.
Once I have done that I want to take those data locations and check to see if any of those locations have values <0.8 in data2.
In reality I don't have formulas for x and y, and the arrays are much larger. Also, x and y might not even be monotonic.
I figure I should use numpy masks for this, and I have managed to do it, but the result seems really tortured - I think the code would be clearer if I just looped through the values in the 2D arrays.
I think the main problem is that I have trouble combining masks with boolean operations. The ideas I get from searching online often don't seem to work on arrays.
I assume there is an elegant and (hopefully) understandable way to do this in just a few lines with masks. Would anyone care to explain it to me?
Well I eventually came up with something, so I thought I'd post it. I welcome suggested improvements.
#expand x and y into 2D arrays so that they can more
#easily be used for masking using tile
x2D = np.tile(x,(len(y),1))
y2D = np.tile(y,(len(x),1)).T
#mask these based on the ranges in X and Y
Xmask = np.ma.masked_outside(x2D,XRange[0],XRange[1]).mask
Ymask = np.ma.masked_outside(y2D,YRange[0],YRange[1]).mask
#then combine them
#Not sure I need the shrink=False, but it seems safer
XYmask = np.ma.mask_or(Xmask, Ymask,shrink=False)
#now mask the data1 array based on D1min.
highdat = np.ma.masked_less(data1,D1min)
#combine with XYmask
data1mask = np.ma.mask_or(highdat.mask, XYmask,shrink=False)
#apply to data1
data1masked = np.ma.masked_where(data1mask,data1)
#number of points fulfilling my criteria
print('Number of points: ',np.ma.count(data1masked))
#transfer mask from data1 to data2
data2masked = np.ma.masked_where(data1mask, data2)
#do my check based on data2
if data2masked.min() < D2Max: print('data2 values are low!')
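A shorter alternative that avoids MaskedArray altogether is to combine plain boolean arrays with &; a sketch using the same x2D, y2D, data1, data2, and limits as above (and the strict data1 > D1min criterion as stated in the question):
inX = (x2D >= XRange[0]) & (x2D <= XRange[1])   # True where x is inside XRange
inY = (y2D >= YRange[0]) & (y2D <= YRange[1])   # True where y is inside YRange
good = inX & inY & (data1 > D1min)              # all criteria combined
print('Number of points: ', np.count_nonzero(good))
# check the corresponding data2 values
if data2[good].min() < D2Max:
    print('data2 values are low!')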

Calculate minimum distance of point in image mask to a set of points

I have an image mask (differenceM).
For every single white pixel (pixel value != 0), I want to calculate the minimum distance from that pixel to a set of points. The set of points are points on the external contour, stored as a numpy array of [x_val, y_val] pairs. I was thinking of doing this:
...
def calcMinDist(dilPoints):
    ...
#returns 2d array (same shape as image)
def allMinDistDil(dilMask):
    dilPoints = getPoints(dilMask)
    ...
    return arrayOfMinValues
#more code here
blkImg = np.zeros(maskImage.shape,dtype=np.uint8)
blkImg.fill(0)
img_out = np.where(differenceM,allMinDistDil(dilatedMask),blkImg)
....
But the problem with this is that, in order to calculate the minimum distance from a pixel to the set of points (obtained from the getPoints function), I'll need to pass in the pixel's position (index?) as well. And (if my understanding is correct) np.where only checks for true and false values in its first parameter... so the way I wrote the np.where() call won't work.
I've considered using nested for loops for this problem but I'm trying to avoid using for loops because I have a lot of images to process.
May I ask for suggestions to solve this? Any help would be greatly appreciated!
(Not enough rep to comment.) As for the distance, you probably want scipy.spatial.distance.cdist(X, Y). You can calculate a minimum distance as simply as:
from scipy.spatial import distance
def min_distance(point, set_of_points):
    return distance.cdist(np.atleast_2d(point), set_of_points).min()
As for np.where, can you provide a bit more on your data structure? Most of the time a simple boolean mask will do the job...
Instead of using the np.where() function to find the specific pixels that are not zero, I applied:
diffMaskNewArray = np.transpose(np.nonzero(binaryThreshMask))
to get the points where the values are not zero. With this array of points, I iterated through each point, compared it with the array of boundary points of the mask, and used:
shortestDistDil = np.amin(distance.cdist(a, b, 'euclidean'))
to find the min distance between the point and the set of boundary points.
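If the per-point loop becomes too slow, the whole distance image can also be computed in one vectorized step; a sketch, assuming binaryThreshMask is the nonzero-pixel mask and boundaryPoints is an (N, 2) array of contour points (the latter is a stand-in name):
import numpy as np
from scipy.spatial import distance
# coordinates of all nonzero pixels, shape (M, 2)
pixelPoints = np.transpose(np.nonzero(binaryThreshMask))
# pairwise distances (M, N), then minimum over the boundary points
# note: the (M, N) matrix can be large; process pixelPoints in chunks if memory is tight
minDists = distance.cdist(pixelPoints, boundaryPoints, 'euclidean').min(axis=1)
# scatter the per-pixel minima back into an image-shaped array
distImage = np.zeros(binaryThreshMask.shape)
distImage[pixelPoints[:, 0], pixelPoints[:, 1]] = minDists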

Counting the number of times a threshold is met or exceeded in a multidimensional array in Python

I have a numpy array that I brought in from a netCDF file with the shape (930, 360, 720), where it is organized as (time, latitudes, longitudes).
At each lat/lon pair, I need to count the number of times across the 930 time stamps that the value meets or exceeds a threshold x (such as 0.2 or 0.5), ultimately calculate the percentage of time steps at each point for which the threshold was exceeded, and then output the results so they can be plotted later on.
I have attempted numerous methods but here is my most recent:
lat_length = len(lats)
#where lats has been defined earlier when unpacked from the netCDF dataset
lon_length = len(lons)
#just as lats; also these were defined before using np.meshgrid(lons, lats)
for i in range(0, lat_length):
    for j in range(0, lon_length):
        if ice[:,i,j] >= x:
            #code to count number of occurrences here
            #code to calculate percentage here
            percent_ice[i,j] += count / len(time) #calculation
#then go on to plot percent_ice
I hope this makes sense! I would greatly appreciate any help. I'm self taught in Python so I may be missing something simple.
Would this be a time to use the any() function? What would be the most efficient way to count the number of times the threshold was exceeded and then calculate the percentage?
You can compare the input 3D array with the threshold x and then sum along the first axis with ndarray.sum(axis=0) to get the count and thereby the percentages, like so -
# Count values meeting or exceeding the threshold x by summing along the first axis
count = (ice >= x).sum(axis=0)
# Get percentages (ratios) by dividing with first axis length
percent_ice = np.true_divide(count,ice.shape[0])
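A minimal, self-contained sketch of this approach on a toy array (the shape is shrunk for illustration; ice and x stand in for the real data):
import numpy as np
ice = np.random.rand(10, 4, 5)       # stand-in for the (time, lat, lon) netCDF variable
x = 0.5
count = (ice >= x).sum(axis=0)       # shape (4, 5): time steps meeting/exceeding x per lat/lon
percent_ice = count / ice.shape[0]   # fraction of time steps per lat/lon point
print(percent_ice)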
Ah, look, another meteorologist!
There are probably multiple ways to do this and my solution is unlikely to be the fastest since it uses numpy's MaskedArray, which is known to be slow, but this should work:
Numpy has a data type called a MaskedArray which actually contains two normal numpy arrays. It contains a data array as well as a boolean mask. I would first mask all data that are greater than or equal to my threshold (use np.ma.masked_greater() for just greater than):
ice = np.ma.masked_greater_equal(ice, x)
You can then use ice.count() to determine how many values are below your threshold for each lat/lon point by specifying that you want to count along a specific axis:
n_good = ice.count(axis=0)
This should return a 2-dimensional array containing the number of good points. You can then calculate the number of bad by subtracting n_good from ice.shape[0]:
n_bad = ice.shape[0] - n_good
and calculate the percentage that are bad using:
perc_bad = n_bad/float(ice.shape[0])
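Putting the steps above together into one runnable sketch (with a small random array standing in for the real ice data):
import numpy as np
ice = np.random.rand(10, 4, 5)                # stand-in (time, lat, lon) array
x = 0.5
masked = np.ma.masked_greater_equal(ice, x)   # mask values >= threshold
n_good = masked.count(axis=0)                 # unmasked (below-threshold) values per lat/lon
n_bad = ice.shape[0] - n_good                 # values meeting or exceeding the threshold
perc_bad = n_bad / float(ice.shape[0])        # fraction of time steps exceeding the threshold
print(perc_bad)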
There are plenty of ways to do this without using MaskedArray. This is just the easy way that comes to mind for me.

Polar transformation of pandas DataFrame

I have a pandas.DataFrame 2048 by 2048 with index and columns representing y and x coordinates respectively.
I want to make an axis transformation and get to polar coordinates, making a new pandas.DataFrame with index and columns representing radius and polar angle.
The only way I can think of is to access the dataframe values one by one, calculate the radius and angle, and then set the corresponding value in the new dataframe, but that is extremely slow, since element-wise operations are not that fast in pandas. It's still slow even if I perform the operation row by row.
Is there a better way to do that without writing my own CPython extension in C?
This should not be problematic if your dataframe is built as I understand it (though I think I am wrong). To build an example:
from __future__ import division
import numpy as np, pandas as pd
index = np.arange(1,2049,dtype=float)
cols = np.arange(2050,4098,dtype=float)
df = pd.DataFrame(index=index, columns=cols)
# now calculate angle and radius, then set in new dataframe
phi = np.arctan(df.columns/df.index)
r = np.power( np.power(df.index,2) + np.power(df.columns,2), 0.5 )
df_polar = pd.DataFrame(index=r, columns=phi)
While this agrees with what you stated, I think I have missed something here. If this is not right, can you clarify?
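If the goal is a per-cell transformation (each (y, x) cell mapped to its own (r, phi)), one way to get the radius and angle for all cells at once is np.meshgrid on the underlying arrays; a sketch, assuming df is the original dataframe with y in the index and x in the columns:
import numpy as np, pandas as pd
# 2-D grids of the x and y coordinates, one value per cell
X, Y = np.meshgrid(df.columns.values, df.index.values)
R = np.hypot(X, Y)        # radius for every cell
PHI = np.arctan2(Y, X)    # polar angle for every cell
# long-form table keeping each original value alongside its polar coordinates
df_polar = pd.DataFrame({'r': R.ravel(), 'phi': PHI.ravel(), 'value': df.values.ravel()})
Regridding these scattered (r, phi) points onto a regular polar grid would still need binning or interpolation (e.g. scipy.interpolate.griddata).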

Avoid for-loops in assignment of data values

So this is a little follow-up question to my earlier question: Generate coordinates inside Polygon and my answer https://stackoverflow.com/a/15243767/1740928
In fact, I want to bin polygon data to a regular grid. Therefore, I calculate a couple of coordinates within the polygon and translate their lat/lon combination to their respective column/row combo of the grid.
Currently, the row/column information is stored in a numpy array with its number of rows corresponding to the number of data polygons and its number of columns corresponding to the coordinates in the polygon.
The rest of the code takes less than a second, but this part is the bottleneck at the moment (at ~7 seconds):
for ii in np.arange(len(data)):
    for cc in np.arange(data_lats.shape[1]):
        final_grid[ row[ii,cc], col[ii,cc] ] += data[ii]
        final_grid_counts[ row[ii,cc], col[ii,cc] ] += 1
The array "data" simply contains the data values for each polygon (80000,). The arrays "row" and "col" contain the row and column number of a coordinate in the polygon (shape: (80000,16)).
As you can see, I am summing up all data values within each grid cell and count the number of matches. Thus, I know the average for each grid cell in case different polygons intersect it.
Still, how can these two for loops take around 7 seconds? Can you think of a faster way?
I think numpy should add an nd-bincount function; I had one lying around from a project I was working on some time ago.
import numpy as np

def two_d_bincount(row, col, weights=None, shape=None):
    if shape is None:
        shape = (row.max() + 1, col.max() + 1)
    row = np.asarray(row, 'int')
    col = np.asarray(col, 'int')
    x = np.ravel_multi_index([row, col], shape)
    out = np.bincount(x, weights, minlength=np.prod(shape))
    return out.reshape(shape)

weights = np.column_stack([data] * row.shape[1])
final_grid = two_d_bincount(row.ravel(), col.ravel(), weights.ravel())
final_grid_counts = two_d_bincount(row.ravel(), col.ravel())
I hope this helps.
I might not fully understand the shapes of your different grids, but you can maybe eliminate the cc loop using something like this:
final_grid = np.empty((nrows,ncols))
for ii in xrange(len(data)):
    final_grid[row[ii,:],col[ii,:]] = data[ii]
This of course assumes that final_grid starts with no other info (that the count you're incrementing starts at zero). And I'm not sure how to test whether it works without understanding how your row and col arrays work.
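Another option, since plain fancy-indexed assignment does not accumulate repeated (row, col) pairs, is np.add.at, which performs an unbuffered in-place accumulation; a sketch using the same data, row, and col arrays as above (nrows and ncols are the assumed grid dimensions):
import numpy as np
final_grid = np.zeros((nrows, ncols))
final_grid_counts = np.zeros((nrows, ncols))
# data has shape (80000,); broadcast it against the (80000, 16) index arrays
np.add.at(final_grid, (row, col), data[:, None])
np.add.at(final_grid_counts, (row, col), 1)
np.add.at is typically slower than the bincount approach above, but it handles repeated indices correctly without any reshaping.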
