If I have an array and I apply summation
arr = np.array([[1.,1.,2.],[2.,3.,4.],[4.,5.,6]])
np.sum(arr,axis=1)
I get the total along the three rows ([4.,9.,15.])
My complication is that arr contains data that may be bad after a certain column index. I have an integer array that tells me how many "good" values I have in each row and I want to sum/average over the good values. Say:
ngoodcols=np.array([0,1,2])
np.sum(arr[:,0:ngoodcols],axis=1) # not legit but this is the idea
It is clear how to do this in a loop, but is there a way to sum only that many values per row, producing [0., 2., 9.], without resorting to looping? Equivalently, I could use nansum if I knew how to set the elements at column indexes at or beyond each row's ngoodcols value to np.nan, but that is a nearly equivalent problem as far as the slicing is concerned.
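For concreteness, here is the loop version I would like to avoid (my own sketch):
row_sums = np.array([arr[i, :n].sum() for i, n in enumerate(ngoodcols)])
print(row_sums)
# [0. 2. 9.]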
One possibility is to use masked arrays:
import numpy as np
arr = np.array([[1., 1., 2.], [2., 3., 4.], [4., 5., 6]])
ngoodcols = np.array([0, 1, 2])
mask = ngoodcols[:, np.newaxis] <= np.arange(arr.shape[1])
arr_masked = np.ma.masked_array(arr, mask)
print(arr_masked)
# [[-- -- --]
# [2.0 -- --]
# [4.0 5.0 --]]
print(arr_masked.sum(1))
# [-- 2.0 9.0]
Note that when a row has no good values you get a "missing" value as the result, which may or may not be useful for you. A masked array also lets you easily do other operations that should only consider the valid values (mean, etc.).
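For instance, a mean over only the valid values (my own addition, reusing arr_masked from above):
print(arr_masked.mean(1))
# [-- 2.0 4.5]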
Another simple option is to just multiply by the mask:
import numpy as np
arr = np.array([[1., 1., 2.], [2., 3., 4.], [4., 5., 6]])
ngoodcols = np.array([0, 1, 2])
mask = ngoodcols[:, np.newaxis] <= np.arange(arr.shape[1])
print((arr * ~mask).sum(1))
# [0. 2. 9.]
Here, when a row has no good values, you simply get zero.
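If you also want the average mentioned in the question, one hedged sketch is to divide by the per-row count of good values, guarding against rows that have none:
good = ~mask
counts = good.sum(1)  # number of good values in each row
means = (arr * good).sum(1) / np.maximum(counts, 1)  # rows with no good values give 0.0
print(means)
# [0.  2.  4.5]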
Here is one way using Boolean indexing. It sets the elements at column indexes at or beyond each row's ngoodcols value to np.nan and then uses np.nansum. Note that this modifies arr in place, so work on a copy if you need the original:
import numpy as np
arr = np.array([[1.,1.,2.],[2.,3.,4.],[4.,5.,6]])
ngoodcols = np.array([0,1,2])
arr[ngoodcols[:, None] <= np.arange(arr.shape[1])] = np.nan  # mask out the bad columns
print(np.nansum(arr, axis=1))
# [ 0. 2. 9.]
This is a follow-up to my previous question.
Given an NxM matrix A, I want to efficiently obtain the NxN matrix whose ith row is the sum along the 2nd axis of the result of applying np.minimum between A and the ith row of A.
Using a for loop,
>>> A = np.array([[1, 2], [3, 4], [5, 6]])
>>> output = np.zeros((A.shape[0], A.shape[0]))
>>> for i in range(A.shape[0]):
...     output[i] = np.sum(np.minimum(A, A[i]), axis=1)
...
>>> output
array([[ 3.,  3.,  3.],
       [ 3.,  7.,  7.],
       [ 3.,  7., 11.]])
Is it possible to optimize this further, without the for loop?
Edit: I would also like to do it without allocating an NxNxM tensor, because of memory constraints.
You can use broadcasting instead of a for loop. Using the NumPy minimum and sum functions, you can compute the desired matrix output as follows:
output = np.sum(np.minimum(A[:, None], A), axis=2)
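Note that this allocates the NxNxM intermediate that the question's edit rules out. A hedged sketch that processes the rows in chunks (the chunk size below is an arbitrary assumption, tune it to your memory budget) bounds the intermediate at chunk x N x M:
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
N = A.shape[0]
output = np.zeros((N, N))
chunk = 2  # illustrative value only
for start in range(0, N, chunk):
    block = A[start:start + chunk]  # shape (c, M)
    # (c, 1, M) vs (N, M) broadcasts to (c, N, M), summed down to (c, N)
    output[start:start + chunk] = np.minimum(block[:, None], A).sum(axis=2)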
TL;DR:
Knowing the value of NUM_ROWS, rmin and rmax, how do I construct a bool array my_idx such that np.arange(NUM_ROWS)[my_idx] == np.arange(NUM_ROWS)[rmin:rmax]? Can the construction be broadcast, if rmin and rmax are arrays and I'm interested in all the slices [slice(start, stop) for start, stop in zip(rmin, rmax)]?
Long version with details
I have an array of polygons in a 2D image and I want to find rows and columns of the image that don't contain a polygon. In order to do this fast, I'm trying to vectorize the code as much as possible.
I calculate the extreme points of each polygon in both dimensions and obtain for each polygon the min_row, min_col, max_row and max_col values. Let's consider just the rows (for the columns it's the same algorithm) and assume that, for example, these are the values I obtain for two polygons:
NUM_ROWS = 10
# Two intervals: slice(1,5) and slice(7,8)
row_mins = np.array([1, 7], dtype=np.int32)
row_maxs = np.array([5, 8], dtype=np.int32)
I want now to merge the intervals in a way equivalent to:
row_mask = np.zeros(NUM_ROWS)
for rmin, rmax in zip(row_mins, row_maxs):
    row_mask[rmin:rmax] = 1
however, it should avoid the for loop and the repeated setting of values in row_mask.
I thought of doing this by turning each range into a bool array and using np.logical_or.reduce(), but I can't find a way to generate the bool array equivalent to the [rmin:rmax] index.
Is there a way to convert a slice object to a bool index?
EDIT: Found the right way to do it.
I stand corrected. There IS a way to unpack a list of slices inside np.r_, and it's as simple as using tuple(). That means that once you have your slices built from the rmin and rmax arrays, you can convert them into an index array with np.r_ and use it to set the corresponding values of the mask to 1.
import numpy as np
NUM_ROWS = 15
## 3 slices (1:5), (7:10), (12:14)
row_mins = np.array([1, 7, 12])
row_maxs = np.array([5, 10, 14])
mask = np.zeros(NUM_ROWS) #Zeros
slices = list(map(slice,row_mins,row_maxs)) #List of slices
mask[np.r_[tuple(slices)]]=1 #get ranges from list of slices and then update mask
mask
array([0., 1., 1., 1., 1., 0., 0., 1., 1., 1., 0., 0., 1., 1., 0.])
Old method I recommended -
If you want to make a mask with multiple slices, then you can do this without a for loop (vectorized), by using np.hstack with np.arange to get all the indexes and then set them to 1.
import numpy as np
NUM_ROWS = 15
## 3 slices (1:5), (7:10), (12:14)
row_mins = np.array([1, 7, 12])
row_maxs = np.array([5, 10, 14])
mask = np.zeros(NUM_ROWS) #Zeros
idx = np.hstack(list(map(np.arange,row_mins,row_maxs))) #Indexes to choose
mask[idx]=1 #Set to 1
mask
array([0., 1., 1., 1., 1., 0., 0., 1., 1., 1., 0., 0., 1., 1., 0.])
EDIT: Another way -
You could use np.eye() -
s = slice(1,4)
mask = np.eye(10)[s].sum(0)
print(mask)
[0. 1. 1. 1. 0. 0. 0. 0. 0. 0.]
Over a list of slices -
masks = [np.eye(NUM_ROWS)[slice(i,j)].sum(0) for i,j in zip(row_mins, row_maxs)]
final = np.logical_or.reduce(masks)
final
array([False, True, True, True, True, False, False, True, True,
True, False, False, True, True, False])
Hope this helps:
arr = np.arange(NUM_ROWS)
bool_indices = (arr >= rmin) & (arr < rmax)
As you are looking for the intersection of the two conditions, a logical AND between them creates that array.
Using the rest of your solution:
arrs = [(arr >= rmin) & (arr < rmax) for rmin, rmax in zip(row_mins, row_maxs)]
mask = np.logical_or.reduce(arrs)
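The per-slice list comprehension can also be broadcast away entirely; here is a sketch using the same row_mins / row_maxs as above:
import numpy as np

NUM_ROWS = 15
row_mins = np.array([1, 7, 12])
row_maxs = np.array([5, 10, 14])
idx = np.arange(NUM_ROWS)
# one Boolean row per slice, shape (n_slices, NUM_ROWS), then OR down axis 0
mask = ((idx >= row_mins[:, None]) & (idx < row_maxs[:, None])).any(axis=0)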
I have a huge 2d numpy array of lists (dtype object) that I want to convert into a 2d numpy array of dtype float, stacking the dimension represented by lists onto the 0th axis (rows). The lists within each row always have the exact same length, and have at least one element.
Here is a minimal reproduction of the situation:
import numpy as np
current_array = np.array(
    [[[0.0], [1.0]],
     [[2.0, 3.0], [4.0, 5.0]]],
    dtype=object  # required in recent NumPy, since the nesting is ragged
)
desired_array = np.array(
[[0.0, 1.0],
[2.0, 4.0],
[3.0, 5.0]]
)
I looked around for solutions, and the stack and dstack functions work only if the first level is a tuple. reshape would require the third level to be part of the array. I wonder, is there any relatively efficient way to do it?
Currently, I am just counting the dimensions, creating an empty array, and filling in the values one by one, which honestly does not seem like a good solution.
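For reference, a minimal sketch of that fill-one-by-one baseline (my reconstruction, reusing current_array from above; lists share a length within a row but not across rows):
n_rows, n_cols = current_array.shape
lengths = [len(current_array[i, 0]) for i in range(n_rows)]
desired = np.empty((sum(lengths), n_cols))
r = 0
for i in range(n_rows):
    for k in range(lengths[i]):
        for j in range(n_cols):
            desired[r, j] = current_array[i, j][k]
        r += 1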
In [321]: current_array = np.array(
     ...: [[[0.0], [1.0]],
     ...: [[2.0, 3.0], [4.0, 5.0]]], dtype=object
     ...: )
In [322]: current_array
Out[322]:
array([[list([0.0]), list([1.0])],
[list([2.0, 3.0]), list([4.0, 5.0])]], dtype=object)
In [323]: _.shape
Out[323]: (2, 2)
Rework the two rows:
In [328]: current_array[1,:]
Out[328]: array([list([2.0, 3.0]), list([4.0, 5.0])], dtype=object)
In [329]: np.stack(current_array[1,:],1)
Out[329]:
array([[2., 4.],
[3., 5.]])
In [330]: np.stack(current_array[0,:],1)
Out[330]: array([[0., 1.]])
combine them:
In [331]: np.vstack((_330, _329))
Out[331]:
array([[0., 1.],
[2., 4.],
[3., 5.]])
in one line:
In [333]: np.vstack([np.stack(row, 1) for row in current_array])
Out[333]:
array([[0., 1.],
[2., 4.],
[3., 5.]])
Author of the question here.
I found a slightly more elegant (and faster) way than filling the array one by one, which is:
desired = np.array([np.concatenate([np.array(d) for d in lis]) for lis in current_array.T]).T
print(desired)
'''
[[0. 1.]
[2. 4.]
[3. 5.]]
'''
But it still does quite a number of operations. It transposes the array so that it can stack the neighboring 'dimensions' (one of them being the lists) with np.concatenate, then converts the result to np.array and transposes it back.
I have a large 2d numpy array and two 1d arrays that represent x/y indexes within the 2d array. I want to use these 1d arrays to perform an operation on the 2d array.
I can do this with a for loop, but it's very slow when working on a large array. Is there a faster way? I tried using the 1d arrays simply as indexes but that didn't work. See this example:
import numpy as np
# Two example 2d arrays
cnt_a = np.zeros((4,4))
cnt_b = np.zeros((4,4))
# 1d arrays holding x and y indices
xpos = [0,0,1,2,1,2,1,0,0,0,0,1,1,1,2,2,3]
ypos = [3,2,1,1,3,0,1,0,0,1,2,1,2,3,3,2,0]
# This method works, but is very slow for a large array
for i in range(len(xpos)):
    cnt_a[xpos[i], ypos[i]] = cnt_a[xpos[i], ypos[i]] + 1
# This method is fast, but gives an incorrect answer
cnt_b[xpos, ypos] = cnt_b[xpos, ypos] + 1
# Print the results
print('Good:')
print(cnt_a)
print('')
print('Bad:')
print(cnt_b)
The output from this is:
Good:
[[ 2. 1. 2. 1.]
[ 0. 3. 1. 2.]
[ 1. 1. 1. 1.]
[ 1. 0. 0. 0.]]
Bad:
[[ 1. 1. 1. 1.]
[ 0. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 0. 0. 0.]]
For the cnt_b array NumPy is clearly not accumulating the repeated indices (each duplicated index gets incremented only once), but I'm unsure how to fix this without resorting to the (very inefficient) for loop used to calculate cnt_a.
Another approach, using 1D indexing (suggested by @Shai), extended to answer the actual question:
>>> out = np.zeros((4, 4))
>>> idx = np.ravel_multi_index((xpos, ypos), out.shape) # extract 1D indexes
>>> x = np.bincount(idx, minlength=out.size)
>>> out.flat += x
np.bincount counts how many times each linear index appears in idx and stores the counts in x.
Or, as suggested by @Divakar:
>>> out.flat += np.bincount(idx, minlength=out.size)
We could compute the linear indices, then accumulate into a zeros-initialized output array with np.add.at. Thus, with xpos and ypos as arrays, here's one implementation -
m,n = xpos.max()+1, ypos.max()+1
out = np.zeros((m,n),dtype=int)
np.add.at(out.ravel(), xpos*n+ypos, 1)
Sample run -
In [95]: # 1d arrays holding x and y indices
...: xpos = np.array([0,0,1,2,1,2,1,0,0,0,0,1,1,1,2,2,3])
...: ypos = np.array([3,2,1,1,3,0,1,0,0,1,2,1,2,3,3,2,0])
...:
In [96]: cnt_a = np.zeros((4,4))
In [97]: # This method works, but is very slow for a large array
...: for i in range(0,len(xpos)):
...: cnt_a[xpos[i],ypos[i]] = cnt_a[xpos[i],ypos[i]] + 1
...:
In [98]: m,n = xpos.max()+1, ypos.max()+1
...: out = np.zeros((m,n),dtype=int)
...: np.add.at(out.ravel(), xpos*n+ypos, 1)
...:
In [99]: cnt_a
Out[99]:
array([[ 2., 1., 2., 1.],
[ 0., 3., 1., 2.],
[ 1., 1., 1., 1.],
[ 1., 0., 0., 0.]])
In [100]: out
Out[100]:
array([[2, 1, 2, 1],
[0, 3, 1, 2],
[1, 1, 1, 1],
[1, 0, 0, 0]])
You can iterate over both lists at once and increment for each pair (if you are not used to it, zip combines lists):
for x, y in zip(xpos, ypos):
    cnt_b[x, y] += 1
But this will be about the same speed as your original loop.
If your lists xpos/ypos are of length n, I don't see how you can update your matrix in less than O(n), since you'll have to look at each pair one way or another.
Other solution: you could count the repeated index pairs (e.g. (0, 3), possibly with collections.Counter) and update the matrix with the count values, as in the sketch below. But I doubt it would be much faster, since the time gained on updating the matrix would be lost on counting the multiple occurrences.
Maybe I am totally wrong though, in which case I'd be curious to see a sub-O(n) answer.
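A hedged sketch of that Counter idea, reusing xpos and ypos from the question (not a benchmarked claim):
from collections import Counter
import numpy as np

cnt_c = np.zeros((4, 4))
counts = Counter(zip(xpos, ypos))  # how often each (x, y) pair occurs
for (x, y), c in counts.items():
    cnt_c[x, y] += c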
I think you are looking for the np.ravel_multi_index function:
lidx = np.ravel_multi_index((xpos, ypos), cnt_a.shape)
This converts the (x, y) pairs into "flattened" 1D indices into cnt_a and cnt_b:
np.add.at(cnt_b.ravel(), lidx, 1)  # ravel() is a view here, so cnt_b is updated in place
I'm trying to do some basic classification of numpy arrays...
I want to compare a 2d array against a 3d array, along the 3rd dimension, and make a classification based on the corresponding z-axis values.
so given 3 arrays that are stacked into a 3d array:
import numpy as np
a1 = np.array([[1,1,1],[1,1,1],[1,1,1]])
a2 = np.array([[3,3,3],[3,3,3],[3,3,3]])
a3 = np.array([[5,5,5],[5,5,5],[5,5,5]])
a3d = np.dstack((a1,a2,a3))
and another 2d array
a2d = np.array([[1,2,4],[5,5,2],[2,3,3]])
I want to be able to compare a2d against a3d, and return a 2d array of which level of a3d is closest. (or I suppose any custom function that can compare each value along the z-axis, and return a value base on that comparison.)
EDIT
I modified my arrays to more closely match my data. a1 holds the minimum values, a2 the average values, and a3 the maximum values. So I want to output whether each a2d value is closest to a1 (classed "1"), a2 (classed "2"), or a3 (classed "3"). I'm doing it as a 3d array because in the real data it won't be a simple 3-array choice, but for SO purposes it helps to keep it simple. We can assume that in the case of a tie we'll take the lower, so 2 would be classed as level "1" and 4 as level "2".
You can use the following list comprehension:
>>> l = [sum(sum(abs(i-j)) for i,j in z) for z in [zip(i,a2d) for i in a3d]]
>>> l
[30.0, 22.5, 30.0]
In the preceding code I create the following list with zip, which pairs each sub-array of your 3d array with a2d; all you then need is to sum the elements of each pairwise absolute difference, and then sum those sums again:
>>> [zip(i,a2d) for i in a3d]
[[(array([ 1., 3., 1.]), array([1, 2, 1])), (array([ 2., 2., 1.]), array([5, 5, 4])), (array([ 3., 1., 1.]), array([9, 8, 8]))], [(array([ 4., 6., 4.]), array([1, 2, 1])), (array([ 5. , 6.5, 4. ]), array([5, 5, 4])), (array([ 6., 4., 4.]), array([9, 8, 8]))], [(array([ 7., 9., 7.]), array([1, 2, 1])), (array([ 8., 8., 7.]), array([5, 5, 4])), (array([ 9., 7., 7.]), array([9, 8, 8]))]]
then for all of your sub-arrays you'll have the following list:
[30.0, 22.5, 30.0]
which for each sub-array gives its level of difference from the 2d array. With l bound to that list as above, you can then get the closest sub-array from a3d:
>>> a3d[l.index(min(l))]
array([[ 4. , 6. , 4. ],
[ 5. , 6.5, 4. ],
[ 6. , 4. , 4. ]])
Also you can put it in a function:
>>> def find_nearest(sub,main):
... l=[sum(sum(abs(i-j)) for i,j in z) for z in [zip(i,sub) for i in main]]
... return main[l.index(min(l))]
...
>>> find_nearest(a2d,a3d)
array([[ 4. , 6. , 4. ],
[ 5. , 6.5, 4. ],
[ 6. , 4. , 4. ]])
You might consider a different approach using numpy.vectorize, which lets you apply a Python function to each element of your array (note that it is a convenience rather than a performance optimization; under the hood it is essentially a loop).
In this case, your python function could just classify each pixel with whatever breaks you define:
import numpy as np
a2d = np.array([[1,2,4],[5,5,2],[2,3,3]])
def classify(x):
    if x >= 4:
        return 3
    elif x >= 2:
        return 2
    elif x > 0:
        return 1
    else:
        return 0
vclassify = np.vectorize(classify)
result = vclassify(a2d)
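For the example a2d above, this should yield (assuming I have applied the breaks correctly):
print(result)
# [[1 2 3]
#  [3 3 2]
#  [2 2 2]]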
Thanks to #perrygeo and #Kasra - they got me thinking in a good direction.
Since I want a classification based on the closest z value of the 3d array, I couldn't do simple math; I needed the (z) index of the closest value.
I did it by enumerating both axes of the 2d array and doing a proximity comparison against the corresponding z stack of the 3d array.
There might be a way to do this without iterating over the 2d array (a vectorized sketch follows at the end), but at least I'm avoiding iterating over the 3d one.
import numpy as np
a1 = np.array([[1,1,1],[1,1,1],[1,1,1]])
a2 = np.array([[3,3,3],[3,3,3],[3,3,3]])
a3 = np.array([[5,5,5],[5,5,5],[5,5,5]])
a3d = np.dstack((a1,a2,a3))
a2d = np.array([[1,2,4],[5,5,2],[2,3,3]])
classOut = np.empty_like(a2d)

def find_nearest_idx(array, value):
    idx = (np.abs(array - value)).argmin()
    return idx

# enumerate to get indices
for i, a in enumerate(a2d):
    for ii, v in enumerate(a):
        valStack = a3d[i, ii]  # the z values at this 2d position
        nearest = find_nearest_idx(valStack, v)
        classOut[i, ii] = nearest

print(classOut)
which gets me
[[0 0 1]
[2 2 0]
[0 1 1]]
This tells me that (for example) a2d[0,0] is closest to the 0-index of a3d[0,0], which in my case means it is closest to the min value for that 2d position. a2d[1,1] is closest to the 2-index, which in my case means closer to the max value for that 2d position.
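A hedged vectorized sketch of the same idea, for completeness: broadcast a2d against the z axis of a3d and take the argmin of the absolute differences. On exact ties np.argmin picks the lowest index, which matches both find_nearest_idx above and the "take the lower" rule from the question:
classOut_vec = np.abs(a3d - a2d[:, :, None]).argmin(axis=2)
print(classOut_vec)
# [[0 0 1]
#  [2 2 0]
#  [0 1 1]]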