TL;DR:
Knowing the values of NUM_ROWS, rmin and rmax, how do I construct a bool array my_idx such that np.arange(NUM_ROWS)[my_idx] == np.arange(NUM_ROWS)[rmin:rmax]? Can the construction be broadcast if rmin and rmax are arrays and I'm interested in all the slices [slice(lo, hi) for lo, hi in zip(rmin, rmax)]?
Long version with details
I have an array of polygons in a 2D image and I want to find rows and columns of the image that don't contain a polygon. In order to do this fast, I'm trying to vectorize the code as much as possible.
I calculate the extreme points of each polygon on both dimensions and obtain, for each polygon, the min_row, min_col, max_row and max_col values. Let's consider just the rows (for the columns it's the same algorithm) and assume that, for example, these are the values I obtain for two polygons:
NUM_ROWS = 10
# Two intervals: slice(1,5) and slice(7,8)
row_mins = np.array([1, 7], dtype=np.int32)
row_maxs = np.array([5, 8], dtype=np.int32)
I want now to merge the intervals in a way equivalent to:
row_mask = np.zeros(NUM_ROWS)
for rmin, rmax in zip(row_mins, row_maxs):
    row_mask[rmin:rmax] = 1
However, I want to avoid the for loop and the repeated setting of values in row_mask.
I thought of doing this by turning each range into a bool array and using np.logical_or.reduce(), but I can't find a way to generate the bool array equivalent to the [rmin:rmax] index.
Is there a way to convert a slice object to a bool index?
EDIT: Found the right way to do it.
I stand corrected. There IS a way to unpack a list of slices inside np.r_, and it's as simple as wrapping them in a tuple(). That means that once you have mapped your rmin and rmax arrays to slices, you can simply convert them into an index array with np.r_ and use that to set the masked values to 1.
import numpy as np
NUM_ROWS = 15
## 3 slices (1:5), (7:10), (12:14)
row_mins = np.array([1, 7, 12])
row_maxs = np.array([5, 10, 14])
mask = np.zeros(NUM_ROWS)                      # zeros
slices = list(map(slice, row_mins, row_maxs))  # list of slices
mask[np.r_[tuple(slices)]] = 1                 # expand the slices, then update the mask
mask
array([0., 1., 1., 1., 1., 0., 0., 1., 1., 1., 0., 0., 1., 1., 0.])
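If the number of intervals is very large, np.r_ still materializes every index. As an alternative sketch (my addition, assuming 0 &lt;= rmin &lt;= rmax &lt;= NUM_ROWS for every interval), a difference array plus cumsum marks only the interval starts and ends and never touches an element twice:

```python
import numpy as np

NUM_ROWS = 15
row_mins = np.array([1, 7, 12])
row_maxs = np.array([5, 10, 14])

# +1 at every interval start, -1 just past every interval end
delta = np.zeros(NUM_ROWS + 1, dtype=np.int64)
np.add.at(delta, row_mins, 1)    # np.add.at accumulates repeated indices correctly
np.add.at(delta, row_maxs, -1)

# running sum > 0 means "inside at least one interval"
mask = np.cumsum(delta[:NUM_ROWS]) > 0
```

This also handles overlapping intervals, since the counts simply stack.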
Old method I recommended -
If you want to build a mask from multiple slices, you can do this without a for loop (vectorized) by using np.hstack with np.arange to collect all the indexes, and then set them to 1.
import numpy as np
NUM_ROWS = 15
## 3 slices (1:5), (7:10), (12:14)
row_mins = np.array([1, 7, 12])
row_maxs = np.array([5, 10, 14])
mask = np.zeros(NUM_ROWS)                                  # zeros
idx = np.hstack(list(map(np.arange, row_mins, row_maxs)))  # indexes to choose
mask[idx] = 1                                              # set to 1
mask
array([0., 1., 1., 1., 1., 0., 0., 1., 1., 1., 0., 0., 1., 1., 0.])
EDIT: Another way -
You could use np.eye() -
s = slice(1,4)
mask = np.eye(10)[s].sum(0)
print(mask)
[0. 1. 1. 1. 0. 0. 0. 0. 0. 0.]
Over a list of slices -
masks = [np.eye(NUM_ROWS)[slice(i,j)].sum(0) for i,j in zip(row_mins, row_maxs)]
final = np.logical_or.reduce(masks)
final
array([False, True, True, True, True, False, False, True, True,
True, False, False, True, True, False])
Hope this helps:
arr = np.arange(NUM_ROWS)
bool_indices = (arr >= rmin) & (arr < rmax)
As you are looking for the intersection of the two conditions, a logical AND between them creates that array.
Using the rest of your solution:
arrs = [(arr >= rmin) & (arr < rmax) for rmin, rmax in zip(row_mins, row_maxs)]
mask = np.logical_or.reduce(arrs)
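To answer the broadcasting part of the TL;DR directly: the comprehension above can be collapsed into a single broadcast comparison. This is a sketch assuming row_mins and row_maxs are 1-D integer arrays of equal length:

```python
import numpy as np

NUM_ROWS = 15
row_mins = np.array([1, 7, 12])
row_maxs = np.array([5, 10, 14])

arr = np.arange(NUM_ROWS)
# shape (n_slices, NUM_ROWS): row i is the bool index equivalent
# to slice(row_mins[i], row_maxs[i])
per_slice = (arr >= row_mins[:, None]) & (arr < row_maxs[:, None])
mask = per_slice.any(axis=0)  # same result as np.logical_or.reduce(per_slice)
```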
Related
If I have an array and I apply summation
arr = np.array([[1.,1.,2.],[2.,3.,4.],[4.,5.,6]])
np.sum(arr,axis=1)
I get the total along the three rows ([4.,9.,15.])
My complication is that arr contains data that may be bad after a certain column index. I have an integer array that tells me how many "good" values I have in each row and I want to sum/average over the good values. Say:
ngoodcols=np.array([0,1,2])
np.sum(arr[:,0:ngoodcols],axis=1) # not legit but this is the idea
It is clear how to do this in a loop, but is there a way to sum only that many values, producing [0., 2., 9.], without resorting to looping? Equivalently, I could use nansum if I knew how to set the elements at column indexes greater than or equal to the corresponding entry of ngoodcols to np.nan, but that is a nearly equivalent problem as far as slicing is concerned.
One possibility is to use masked arrays:
import numpy as np
arr = np.array([[1., 1., 2.], [2., 3., 4.], [4., 5., 6]])
ngoodcols = np.array([0, 1, 2])
mask = ngoodcols[:, np.newaxis] <= np.arange(arr.shape[1])
arr_masked = np.ma.masked_array(arr, mask)
print(arr_masked)
# [[-- -- --]
# [2.0 -- --]
# [4.0 5.0 --]]
print(arr_masked.sum(1))
# [-- 2.0 9.0]
Note that here when there are not good values you get a "missing" value as a result, which may or may not be useful for you. Also, a masked array also allows you to easily do other operations that only apply for valid values (mean, etc.).
Another simple option is to just multiply by the mask:
import numpy as np
arr = np.array([[1., 1., 2.], [2., 3., 4.], [4., 5., 6]])
ngoodcols = np.array([0, 1, 2])
mask = ngoodcols[:, np.newaxis] <= np.arange(arr.shape[1])
print((arr * ~mask).sum(1))
# [0. 2. 9.]
Here when there are no good values you just get zero.
Here is one way using Boolean indexing. This sets elements in column indexes higher than ones in ngoodcols equal to np.nan and use np.nansum:
import numpy as np
arr = np.array([[1.,1.,2.],[2.,3.,4.],[4.,5.,6]])
ngoodcols = np.array([0,1,2])
arr[np.asarray(ngoodcols)[:,None] <= np.arange(arr.shape[1])] = np.nan
print(np.nansum(arr, axis=1))
# [ 0. 2. 9.]
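Since the question also asks about averaging, the same Boolean mask gives per-row means; the where= guard in the sketch below is my own addition to avoid dividing by zero on all-bad rows:

```python
import numpy as np

arr = np.array([[1., 1., 2.], [2., 3., 4.], [4., 5., 6.]])
ngoodcols = np.array([0, 1, 2])

good = ngoodcols[:, None] > np.arange(arr.shape[1])  # True where a value is "good"
sums = (arr * good).sum(axis=1)

# divide only where a row has at least one good value; all-bad rows stay 0
means = np.divide(sums, ngoodcols, out=np.zeros_like(sums), where=ngoodcols > 0)
```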
I have the following question: is there some kind of method in numpy or scipy that I can use to map a given unsorted array like this
a = np.array([0,0,1,1,4,4,4,4,5,1891,7]) # could be any numbers here
to something where the numbers are mapped, there is no gap between the values, and they keep the same order as before?:
[0,0,1,1,2,2,2,2,3,5,4]
EDIT
Is it furthermore possible to swap/shuffle the numbers after the mapping, so that
[0,0,1,1,2,2,2,2,3,5,4]
become something like:
[0,0,3,3,5,5,5,5,4,1,2]
Edit: I'm not sure what the etiquette is here (should this be a separate answer?), but this is actually directly obtainable from np.unique.
>>> u, indices = np.unique(a, return_inverse=True)
>>> indices
array([0, 0, 1, 1, 2, 2, 2, 2, 3, 5, 4])
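For the follow-up edit (shuffling the numbers after the mapping), you can relabel the result by indexing with any permutation of range(len(u)); the permutation below is chosen to reproduce the example from the question:

```python
import numpy as np

a = np.array([0, 0, 1, 1, 4, 4, 4, 4, 5, 1891, 7])
u, indices = np.unique(a, return_inverse=True)

# any permutation of range(len(u)) works; this one matches the question's example
perm = np.array([0, 3, 5, 4, 2, 1])
shuffled = perm[indices]
```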
Original answer: This isn't too hard to do in plain python by building a dictionary of what index each value of the array would map to:
x = np.sort(np.unique(a))
index_dict = {j: i for i, j in enumerate(x)}
[index_dict[i] for i in a]
Seems you need to dense-rank your array; in that case use scipy.stats.rankdata:
from scipy.stats import rankdata
rankdata(a, 'dense')-1
# array([ 0., 0., 1., 1., 2., 2., 2., 2., 3., 5., 4.])
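If you'd rather stay in plain NumPy (and keep integer output, since rankdata returns floats), searchsorted against the sorted unique values gives the same dense ranks:

```python
import numpy as np

a = np.array([0, 0, 1, 1, 4, 4, 4, 4, 5, 1891, 7])
# np.unique returns sorted values, so searchsorted yields dense integer ranks
dense = np.searchsorted(np.unique(a), a)
```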
I have a 2D array, and it has some duplicate columns. I would like to be able to see which unique columns there are, and where the duplicates are.
My own array is too large to put here, but here is an example:
a = np.array([[ 1., 0., 0., 0., 0.],[ 2., 0., 4., 3., 0.],])
This has the unique column vectors [1.,2.], [0.,0.], [0.,4.] and [0.,3.]. There is one duplicate: [0.,0.] appears twice.
Now I found a way to get the unique vectors and their indices here, but it is not clear to me how I would get the occurrences of duplicates as well. I have tried several naive ways (with np.where and list comps), but those are all very slow. Surely there has to be a numpythonic way?
In Matlab it's just the unique function, but np.unique flattens arrays.
Here's a vectorized approach to give us a list of arrays as output -
ids = np.ravel_multi_index(a.astype(int),a.max(1).astype(int)+1)
sidx = ids.argsort()
sorted_ids = ids[sidx]
out = np.split(sidx,np.nonzero(sorted_ids[1:] > sorted_ids[:-1])[0]+1)
Sample run -
In [62]: a
Out[62]:
array([[ 1., 0., 0., 0., 0.],
[ 2., 0., 4., 3., 0.]])
In [63]: out
Out[63]: [array([1, 4]), array([3]), array([2]), array([0])]
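On NumPy 1.13 or newer, np.unique itself accepts an axis argument, which sidesteps the "np.unique flattens arrays" complaint directly; a short sketch:

```python
import numpy as np

a = np.array([[1., 0., 0., 0., 0.],
              [2., 0., 4., 3., 0.]])

# treat each column as one element: unique columns plus their multiplicities
uniq, counts = np.unique(a, axis=1, return_counts=True)
duplicate_columns = uniq[:, counts > 1]  # columns that occur more than once
```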
The numpy_indexed package (disclaimer: I am its author) contains efficient functionality for computing these kind of things:
import numpy_indexed as npi
unique_columns = npi.unique(a, axis=1)
non_unique_column_idx = npi.multiplicity(a, axis=1) > 1
Or alternatively:
unique_columns, column_count = npi.count(a, axis=1)
duplicate_columns = unique_columns[:, column_count > 1]
For small arrays:
from collections import defaultdict
indices = defaultdict(list)
for index, column in enumerate(a.transpose()):
    indices[tuple(column)].append(index)
unique = [kk for kk, vv in indices.items() if len(vv) == 1]
non_unique = {kk:vv for kk, vv in indices.items() if len(vv) != 1}
What is the best way of doing the following: given a 1-D array of discrete variables of size N (here N=4), where X is the number of unique elements, I am trying to create a multidimensional array of size (X, N) whose elements are 1 or 0 depending on the occurrence of elements in the 1-D array. E.g. the following array_1D (N=4 and X=3) will result in an array_ND of size 3*4:
array_1D = np.array([x, y, z, x])
array_ND = [[1 0 0 1]
[0 1 0 0]
[0 0 1 0]]
Thanks,
Aso
Try this:
(np.unique(a)[..., None] == a).astype(int)
You can leave out the .astype(int) part if you want a boolean array. Here we have used broadcasting (the [..., None] part) to avoid explicit looping.
Broken down, as suggested in the comments:
>>> import numpy as np
>>> a = np.array([1, 2, 3, 1])
>>> unique_elements = np.unique(a)
>>> result = unique_elements[..., None] == a
>>> unique_elements
array([1, 2, 3])
>>> result
array([[ True, False, False, True],
[False, True, False, False],
[False, False, True, False]], dtype=bool)
If the initial array contains valid indexes from 0 to n - 1 then you can write
eye = np.eye(3)
array_1D = np.array([0, 1, 2, 0])
array_ND = eye[array_1D]
The resulting matrix will be
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.],
[ 1., 0., 0.]])
which is the transpose of the one you expect.
What's happening here is that numpy uses the elements of array_1D as row indices of eye. So the resulting matrix contains as many rows as the elements of array_1D and each one of them relates to the respective element. (0 relates to 1 0 0, etc.)
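To get exactly the orientation from the question (one row per unique value), just transpose the result:

```python
import numpy as np

eye = np.eye(3)
array_1D = np.array([0, 1, 2, 0])
array_ND = eye[array_1D].T  # transpose: one row per unique value, one column per element
```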
I am trying to fill an empty (not np.empty!) array with values using append, but I am getting an error:
My code is as follows:
import numpy as np
result=np.asarray([np.asarray([]),np.asarray([])])
result[0]=np.append([result[0]],[1,2])
And I am getting:
ValueError: could not broadcast input array from shape (2) into shape (0)
I might be misunderstanding the question, but if you want to declare an array of a certain shape with nothing inside, the following might be helpful:
Initialise empty array:
>>> a = np.zeros((0,3)) #or np.empty((0,3)) or np.array([]).reshape(0,3)
>>> a
array([], shape=(0, 3), dtype=float64)
Now you can use this array to append rows of similar shape to it. Remember that a numpy array cannot be resized in place, so a new array is created on each iteration:
>>> for i in range(3):
... a = np.vstack([a, [i,i,i]])
...
>>> a
array([[ 0., 0., 0.],
[ 1., 1., 1.],
[ 2., 2., 2.]])
np.vstack and np.hstack are the most common methods for combining numpy arrays, but coming from Matlab I prefer np.r_ and np.c_:
Concatenate 1d:
>>> a = np.zeros(0)
>>> for i in range(3):
... a = np.r_[a, [i, i, i]]
...
>>> a
array([ 0., 0., 0., 1., 1., 1., 2., 2., 2.])
Concatenate rows:
>>> a = np.zeros((0,3))
>>> for i in range(3):
... a = np.r_[a, [[i,i,i]]]
...
>>> a
array([[ 0., 0., 0.],
[ 1., 1., 1.],
[ 2., 2., 2.]])
Concatenate columns:
>>> a = np.zeros((3,0))
>>> for i in range(3):
... a = np.c_[a, [[i],[i],[i]]]
...
>>> a
array([[ 0., 1., 2.],
[ 0., 1., 2.],
[ 0., 1., 2.]])
numpy.append is pretty different from list.append in Python. I know that's thrown off a few programmers new to numpy. numpy.append is more like concatenate: it makes a new array and fills it with the values from the old array plus the new value(s) to be appended. For example:
import numpy
old = numpy.array([1, 2, 3, 4])
new = numpy.append(old, 5)
print(old)
# [1 2 3 4]
print(new)
# [1 2 3 4 5]
new = numpy.append(new, [6, 7])
print(new)
# [1 2 3 4 5 6 7]
I think you might be able to achieve your goal by doing something like:
result = numpy.zeros((10,))
result[0:2] = [1, 2]
# Or
result = numpy.zeros((10, 2))
result[0, :] = [1, 2]
Update:
If you need to create a numpy array in a loop, and you don't know ahead of time what the final size of the array will be, you can do something like:
import random
import numpy as np
a = np.array([0., 1.])
b = np.array([2., 3.])
temp = []
while True:
    rnd = random.randint(0, 100)
    if rnd > 50:
        temp.append(a)
    else:
        temp.append(b)
    if rnd == 0:
        break
result = np.array(temp)
In my example result will be an (N, 2) array, where N is the number of times the loop ran, but obviously you can adjust it to your needs.
new update
The error you're seeing has nothing to do with types; it has to do with the shapes of the numpy arrays involved. np.append(a, b) without an axis argument flattens both inputs and always succeeds (with axis=0, appending a (2, n) and a (1, n) gives a (3, n) array, and then the trailing dimensions must match). The failing step in your code is the assignment: it tries to put a (2,) result back into result[0], which has shape (0,). Those shapes don't match, so you get an error.
This error arises from the fact that you are trying to assign an object of shape (2,) into a slot of shape (0,). If you append what you want without forcing it back into result[0], there is no issue:
b = np.append([result[0]], [1,2])
But when you assign result[0] = b you are equating objects of different shapes, which you cannot do. What are you trying to do?
Here's the result of running your code in IPython. Note that result is a (2,0) array: 2 rows, 0 columns, 0 elements. The append produces a (2,) array; result[0] is a (0,) array. Your error message comes from trying to assign that 2-item array into a size-0 slot. Since result has dtype=float64, only scalars can be assigned to its elements.
In [65]: result=np.asarray([np.asarray([]),np.asarray([])])
In [66]: result
Out[66]: array([], shape=(2, 0), dtype=float64)
In [67]: result[0]
Out[67]: array([], dtype=float64)
In [68]: np.append(result[0],[1,2])
Out[68]: array([ 1., 2.])
np.array is not a Python list. All elements of an array are the same type (as specified by the dtype). Notice also that result is not an array of arrays.
Result could also have been built as
ll = [[],[]]
result = np.array(ll)
while
ll[0] = [1,2]
# ll = [[1,2],[]]
the same is not true for result.
np.zeros((2,0)) also produces your result.
Actually there's another quirk to result.
result[0] = 1
does not change the values of result. It accepts the assignment, but since result has 0 columns, there is no place to put the 1. The assignment would work if result had been created as np.zeros((2,1)), but that still couldn't accept a two-element list.
But if result has 2 columns, then you can assign a 2 element list to one of its rows.
result = np.zeros((2,2))
result[0] # == [0,0]
result[0] = [1,2]
What exactly do you want result to look like after the append operation?
numpy.append always copies the array before appending the new values. Your code is equivalent to the following:
import numpy as np
result = np.zeros((2,0))
new_result = np.append([result[0]],[1,2])
result[0] = new_result # ERROR: result[0] has shape (0,), new_result has shape (2,)
Perhaps you mean to do this?
import numpy as np
result = np.zeros((2,0))
result = np.append([result[0]],[1,2])
SO thread 'Multiply two arrays element wise, where one of the arrays has arrays as elements' has an example of constructing an array from arrays. If the subarrays are the same size, numpy makes a 2d array. But if they differ in length, it makes an array with dtype=object, and the subarrays retain their identity.
Following that, you could do something like this:
In [5]: result=np.array([np.zeros((1)),np.zeros((2))])
In [6]: result
Out[6]: array([array([ 0.]), array([ 0., 0.])], dtype=object)
In [7]: np.append([result[0]],[1,2])
Out[7]: array([ 0., 1., 2.])
In [8]: result[0]
Out[8]: array([ 0.])
In [9]: result[0]=np.append([result[0]],[1,2])
In [10]: result
Out[10]: array([array([ 0., 1., 2.]), array([ 0., 0.])], dtype=object)
However, I don't offhand see what advantages this has over a pure Python list of lists. It does not work like a 2d array; for example, I have to use result[0][1], not result[0,1]. If the subarrays are all the same length, I have to use np.array(result.tolist()) to produce a 2d array.
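One caveat for readers on current NumPy: as far as I know, building a ragged array with np.array([np.zeros(1), np.zeros(2)]) and no explicit dtype raises a ValueError since NumPy 1.24 (it was deprecated in 1.20), so the object array has to be constructed explicitly:

```python
import numpy as np

# allocate an object array, then fill each slot with a differently-sized array
result = np.empty(2, dtype=object)
result[0] = np.zeros(1)
result[1] = np.zeros(2)

# the per-element append from the answer above then works as before
result[0] = np.append(result[0], [1, 2])
```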