Numpy: Indexing of arrays

Numpy: Indexing of arrays - python

Given the following example
d = array([[1, 2, 3],
[1, 2, 3],
[1, 3, 3],
[4, 4, 4],
[5, 5, 5]
])
To get the sub-array containing 1 in the first column:
d[ d[:,0] == 1 ]
array([[1, 2, 3],
[1, 2, 3],
[1, 3, 3]])
How to get (without loops) the sub-array containing 1 and 5? Shouldn't be something like
d[ d[:,0] == [1,5] ] # ---> array([1, 2, 3])
which does not work?

Method #1: use bitwise or | to combine the conditions:
>>> d
array([[1, 2, 3],
[1, 2, 3],
[1, 3, 3],
[4, 4, 4],
[5, 5, 5]])
>>> (d[:,0] == 1) | (d[:,0] == 5)
array([ True, True, True, False, True], dtype=bool)
>>> d[(d[:,0] == 1) | (d[:,0] == 5)]
array([[1, 2, 3],
[1, 2, 3],
[1, 3, 3],
[5, 5, 5]])
Method #2: use np.in1d, which is probably easier if there are a lot of values:
>>> np.in1d(d[:,0], [1, 5])
array([ True, True, True, False, True], dtype=bool)
>>> d[np.in1d(d[:,0], [1, 5])]
array([[1, 2, 3],
[1, 2, 3],
[1, 3, 3],
[5, 5, 5]])

Related

Why does this numpy construct work and is there a more correct way of doing it

I'm using numpy to extract faces from tetrahdra defined by vertex indices.
I have an initial array defining the tehrahedra toplogy
tetrahedra = np.array([[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]])
For each tetrahedra I identify the faces using mask array
face1 = [True, True, True, False]
face2 = [True, True, False, True]
face3 = [False, True, True, True]
face4 = [True, False, True, True]
And I find the following numpy expression yields face defnitions for each tetrahedra
faces = tetrahedra[:,np.reshape(np.r_[tetrahedra[0][face1 ],tetrahedra[0][face2 ],tetrahedra[0][face3 ],tetrahedra[0][face4 ]], (-1,3))]
EDIT : Thanks to #npaulj I now see that this only appears to work because tetrahedra[0] in the indexing notation is actually [0, 1, 2, 3]. This is better expressed by changing the boolean masks to direct index masks as follows,
mask1_ = np.array([0, 1, 2])
mask2_ = np.array([0, 1, 3])
mask3_ = np.array([1, 2, 3])
mask4_ = np.array([0, 2, 3])
and then updating the expression to
faces = tetrahedra[:,np.reshape(np.r_[face1_, face2_, face3_, face4_], (-1,3))]
Now my question is, how is this actually working and is there a preferred/faster way of doing this operation? The output is shown below
THanks in advance for any help with this. I'm tempted to just live with it since it appears to work, but I can't fugre out why it works which worries me.....
[[[0 1 2]
[0 1 3]
[1 2 3]
[0 2 3]]
[[1 2 3]
[1 2 4]
[2 3 4]
[1 3 4]]
[[2 3 4]
[2 3 5]
[3 4 5]
[2 4 5]]
]
edit
Cleaned up version is,
face_masks = np.array([[0, 1, 2], [0, 1, 3], [1, 2, 3],[0, 2, 3]])
faces = tetrahedra[:,face_masks]

All the intermediate steps:
In [77]: tetrahedra = np.array([[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]])
...: face1 = [True, True, True, False]
...: face2 = [True, True, False, True]
...: face3 = [False, True, True, True]
...: face4 = [True, False, True, True]
In [78]: tetrahedra[0]
Out[78]: array([0, 1, 2, 3])
In [79]: tetrahedra[0][face1]
Out[79]: array([0, 1, 2])
r_ concatenates these 4 selections into one array:
In [80]: np.r_[tetrahedra[0][face1 ],tetrahedra[0][face2 ],tetrahedra[0][face3 ],tetrahedra[0][face4 ]]
Out[80]: array([0, 1, 2, 0, 1, 3, 1, 2, 3, 0, 2, 3])
In [81]: np.reshape(np.r_[tetrahedra[0][face1 ],tetrahedra[0][face2 ],tetrahedra[0][face3
...: ],tetrahedra[0][face4 ]], (-1,3))
Out[81]:
array([[0, 1, 2],
[0, 1, 3],
[1, 2, 3],
[0, 2, 3]])
And finally just use that to index the columns dimension of tetrahedra. Index a (3,4) with this (4,3) produces a (3,4,3).
vstack could also be used to concatenate the selections, but that's a minor change:
In [82]: np.vstack([tetrahedra[0][face1 ],tetrahedra[0][face2 ],tetrahedra[0][face3 ],tet
...: rahedra[0][face4 ]])
Out[82]:
array([[0, 1, 2],
[0, 1, 3],
[1, 2, 3],
[0, 2, 3]])
edit
Or if you don't want to count on tetrahedra[0] being [0,1,2,3], you could just seek the indices of the True elements:
In [106]: np.nonzero([face1,face2,face3,face4])
Out[106]:
(array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]),
array([0, 1, 2, 0, 1, 3, 1, 2, 3, 0, 2, 3]))
The 2nd element of that tuple [1] is the same indices you with r_.
In [122]: idx = np.nonzero([face1,face2,face3,face4])
In [123]: idx[1]
Out[123]: array([0, 1, 2, 0, 1, 3, 1, 2, 3, 0, 2, 3])
In [124]: tetrahedra[np.arange(3)[:,None],idx[1]]
Out[124]:
array([[0, 1, 2, 0, 1, 3, 1, 2, 3, 0, 2, 3],
[1, 2, 3, 1, 2, 4, 2, 3, 4, 1, 3, 4],
[2, 3, 4, 2, 3, 5, 3, 4, 5, 2, 4, 5]])
In [125]: tetrahedra[np.arange(3)[:,None],idx[1]].reshape(3,4,3)
Out[125]:
array([[[0, 1, 2],
[0, 1, 3],
[1, 2, 3],
[0, 2, 3]],
[[1, 2, 3],
[1, 2, 4],
[2, 3, 4],
[1, 3, 4]],
[[2, 3, 4],
[2, 3, 5],
[3, 4, 5],
[2, 4, 5]]])

numpy.concatenate float64(101,1) and float64(101,)

I'm a MatLab user who recently converted to python. I am running a for loop that cuts a longer signal into individual trials, normalizes them to 100% trial and then would like to have the trials listed horizontally in a single variable. My code is
RHipFE=np.empty([101, 1])
newlength = 101
for i in range(0,len(R0X)-1,2):
iHipFE=redataf.RHipFE[R0X[i]:R0X[i+1]]
x=np.arange(0,len(iHipFE),1)
new_x = np.linspace(x.min(), x.max(), newlength)
iHipFEn = interpolate.interp1d(x, iHipFE)(new_x)
RHipFE=np.concatenate((RHipFE,iHipFEn),axis=1)
When I run this, I get the error "ValueError: all the input arrays must have same number of dimensions". Which I assume is because RHipFE is (101,1) while iHipFEn is (101,). Is the best solution to make iHipFEn (101,1)? If so, how does one do this in the above for loop?

Generally it's faster to collect arrays in a list, and use some form of concatenate once. List append is faster than concatenate:
In [51]: alist = []
In [52]: for i in range(3):
...: alist.append(np.arange(i,i+5))
...:
In [53]: alist
Out[53]: [array([0, 1, 2, 3, 4]), array([1, 2, 3, 4, 5]), array([2, 3, 4, 5, 6])]
Various ways of joining
In [54]: np.vstack(alist)
Out[54]:
array([[0, 1, 2, 3, 4],
[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6]])
In [55]: np.column_stack(alist)
Out[55]:
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6]])
In [56]: np.stack(alist, axis=1)
Out[56]:
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6]])
In [57]: np.array(alist)
Out[57]:
array([[0, 1, 2, 3, 4],
[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6]])
Internally, vstack, column_stack, stack expand the dimension of the components, and concatenate on the appropriate axis:
In [58]: np.concatenate([l[:,None] for l in alist],axis=1)
Out[58]:
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6]])

Numpy: Imposing row dependent maximum on array

Suppose I have the following array:
a = [[1, 4, 2, 3]
[3, 1, 5, 4]
[4, 3, 1, 2]]
What I'd like to do is impose a maximum value on the array, but have that maximum vary by row. For instance if I wanted to limit the 1st and 3rd row to a maximum value of 3, and the 2nd row to a value of 4, I could create something like:
[[1, 3, 2, 3]
[3, 1, 4, 4]
[3, 3, 1, 2]
Is there any better way than just looping over each row individually and setting it with 'nonzero'?

With numpy.clip (using the method version here):
a.clip(max=np.array([3, 4, 3])[:, None]) # np.clip(a, ...)
# array([[1, 3, 2, 3],
# [3, 1, 4, 4],
# [3, 3, 1, 2]])
Generalized:
def clip_2d_rows(a, maxs):
maxs = np.asanyarray(maxs)
if maxs.ndim == 1:
maxs = maxs[:, np.newaxis]
return np.clip(a, a_min=None, a_max=maxs)
You might be safer using the module-level function (np.clip) rather than the class method (np.ndarray.clip). The former uses a_max as a parameter, while the latter uses the builtin max as a parameter which is never a great idea.

With masking -
In [50]: row_lims = np.array([3,4,3])
In [51]: np.where(a > row_lims[:,None], row_lims[:,None], a)
Out[51]:
array([[1, 3, 2, 3],
[3, 1, 4, 4],
[3, 3, 1, 2]])

With
>>> a
array([[1, 4, 2, 3],
[3, 1, 5, 4],
[4, 3, 1, 2]])
Say you have
>>> maxs = np.array([[3],[4],[3]])
>>> maxs
array([[3],
[4],
[3]])
What about doing
>>> a.clip(max=maxs)
array([[1, 3, 2, 3],
[3, 1, 4, 4],
[3, 3, 1, 2]])

numpy delete list element from list of lists

I have an array of numpy arrays:
a = [[1, 2, 3, 4], [1, 2, 3, 5], [2, 5, 4, 3], [5, 2, 3, 1]]
I need to find and remove a particular list from a:
rem = [1,2,3,5]
numpy.delete(a,rem) does not return the correct results. I need to be able to return:
[[1, 2, 3, 4], [2, 5, 4, 3], [5, 2, 3, 1]]
is this possible with numpy?

A list comprehension can achieve this.
rem = [1,2,3,5]
a = [[1, 2, 3, 4], [1, 2, 3, 5], [2, 5, 4, 3], [5, 2, 3, 1]]
a = [x for x in a if x != rem]
outputs
[[1, 2, 3, 4], [2, 5, 4, 3], [5, 2, 3, 1]]

Numpy arrays do not support random deletion by element. Similar to strings in Python, you need to generate a new array to delete a single or multiple sub elements.
Given:
>>> a
array([[1, 2, 3, 4],
[1, 2, 3, 5],
[2, 5, 4, 3],
[5, 2, 3, 1]])
>>> rem
array([1, 2, 3, 5])
You can get each matching sub array and create a new array from that:
>>> a=np.array([sa for sa in a if not np.all(sa==rem)])
>>> a
array([[1, 2, 3, 4],
[2, 5, 4, 3],
[5, 2, 3, 1]])
To use np.delete, you would use an index and not a match, so:
>>> a
array([[1, 2, 3, 4],
[1, 2, 3, 5],
[2, 5, 4, 3],
[5, 2, 3, 1]])
>>> np.delete(a, 1, 0) # delete element 1, axis 0
array([[1, 2, 3, 4],
[2, 5, 4, 3],
[5, 2, 3, 1]])
But you can't loop over the array and delete elements...
You can pass multiple elements to np.delete however and you just need to match sub elements:
>>> a
array([[1, 2, 3, 5],
[1, 2, 3, 5],
[2, 5, 4, 3],
[5, 2, 3, 1]])
>>> np.delete(a, [i for i, sa in enumerate(a) if np.all(sa==rem)], 0)
array([[2, 5, 4, 3],
[5, 2, 3, 1]])
And given that same a, you can have an all numpy solution by using np.where:
>>> np.delete(a, np.where((a == rem).all(axis=1)), 0)
array([[2, 5, 4, 3],
[5, 2, 3, 1]])

Did you try list remove?
In [84]: a = [[1, 2, 3, 4], [1, 2, 3, 5], [2, 5, 4, 3], [5, 2, 3, 1]]
In [85]: a
Out[85]: [[1, 2, 3, 4], [1, 2, 3, 5], [2, 5, 4, 3], [5, 2, 3, 1]]
In [86]: rem = [1,2,3,5]
In [87]: a.remove(rem)
In [88]: a
Out[88]: [[1, 2, 3, 4], [2, 5, 4, 3], [5, 2, 3, 1]]
remove matches on value.
np.delete works with an index, not value. Also it returns a copy; it does not act in place. And the result is an array, not a nested list (np.delete converts the input to an array before operating on it).
In [92]: a = [[1, 2, 3, 4], [1, 2, 3, 5], [2, 5, 4, 3], [5, 2, 3, 1]]
In [93]: a1=np.delete(a,1, axis=0)
In [94]: a1
Out[94]:
array([[1, 2, 3, 4],
[2, 5, 4, 3],
[5, 2, 3, 1]])
This is more like list pop:
In [96]: a = [[1, 2, 3, 4], [1, 2, 3, 5], [2, 5, 4, 3], [5, 2, 3, 1]]
In [97]: a.pop(1)
Out[97]: [1, 2, 3, 5]
In [98]: a
Out[98]: [[1, 2, 3, 4], [2, 5, 4, 3], [5, 2, 3, 1]]
To delete by value you need first find the index of the desired row. With integer arrays that's not too hard. With floats it is trickier.
=========
But you don't need to use delete to do this in numpy; boolean indexing works:
In [119]: a = [[1, 2, 3, 4], [1, 2, 3, 5], [2, 5, 4, 3], [5, 2, 3, 1]]
In [120]: A = np.array(a) # got to work with array, not list
In [121]: rem=np.array([1,2,3,5])
Simple comparison; rem is broadcasted to match rows
In [122]: A==rem
Out[122]:
array([[ True, True, True, False],
[ True, True, True, True],
[False, False, False, False],
[False, True, True, False]], dtype=bool)
find the row where all elements match - this is the one we want to remove
In [123]: (A==rem).all(axis=1)
Out[123]: array([False, True, False, False], dtype=bool)
Just not it, and use it to index A:
In [124]: A[~(A==rem).all(axis=1),:]
Out[124]:
array([[1, 2, 3, 4],
[2, 5, 4, 3],
[5, 2, 3, 1]])
(the original A is not changed).
np.where can be used to convert the boolean (or its inverse) to indicies. Sometimes that's handy, but usually it isn't required.

Identify vectors with same value in one column with numpy in python

I have a large 2d array of vectors. I want to split this array into several arrays according to one of the vectors' elements or dimensions. I would like to receive one such small array if the values along this column are consecutively identical. For example considering the third dimension or column:
orig = np.array([[1, 2, 3],
[3, 4, 3],
[5, 6, 4],
[7, 8, 4],
[9, 0, 4],
[8, 7, 3],
[6, 5, 3]])
I want to turn into three arrays consisting of rows 1,2 and 3,4,5 and 6,7:
>>> a
array([[1, 2, 3],
[3, 4, 3]])
>>> b
array([[5, 6, 4],
[7, 8, 4],
[9, 0, 4]])
>>> c
array([[8, 7, 3],
[6, 5, 3]])
I'm new to python and numpy. Any help would be greatly appreciated.
Regards
Mat
Edit: I reformatted the arrays to clarify the problem

Using np.split:
>>> a, b, c = np.split(orig, np.where(orig[:-1, 2] != orig[1:, 2])[0]+1)
>>> a
array([[1, 2, 3],
[1, 2, 3]])
>>> b
array([[1, 2, 4],
[1, 2, 4],
[1, 2, 4]])
>>> c
array([[1, 2, 3],
[1, 2, 3]])

Nothing fancy here, but this good old-fashioned loop should do the trick
import numpy as np
a = np.array([[1, 2, 3],
[1, 2, 3],
[1, 2, 4],
[1, 2, 4],
[1, 2, 4],
[1, 2, 3],
[1, 2, 3]])
groups = []
rows = a[0]
prev = a[0][-1] # here i assume that the grouping is based on the last column, change the index accordingly if that is not the case.
for row in a[1:]:
if row[-1] == prev:
rows = np.vstack((rows, row))
else:
groups.append(rows)
rows = [row]
prev = row[-1]
groups.append(rows)
print groups
## [array([[1, 2, 3],
## [1, 2, 3]]),
## array([[1, 2, 4],
## [1, 2, 4],
## [1, 2, 4]]),
## array([[1, 2, 3],
## [1, 2, 3]])]

if a looks like this:
array([[1, 1, 2, 3],
[2, 1, 2, 3],
[3, 1, 2, 4],
[4, 1, 2, 4],
[5, 1, 2, 4],
[6, 1, 2, 3],
[7, 1, 2, 3]])
than this
col = a[:, -1]
indices = np.where(col[:-1] != col[1:])[0] + 1
indices = np.concatenate(([0], indices, [len(a)]))
res = [a[start:end] for start, end in zip(indices[:-1], indices[1:])]
print(res)
results in:
[array([[1, 2, 3],
[1, 2, 3]]), array([[1, 2, 4],
[1, 2, 4],
[1, 2, 4]]), array([[1, 2, 3],
[1, 2, 3]])]
Update: np.split() is much nicer. No need to add first and last index:
col = a[:, -1]
indices = np.where(col[:-1] != col[1:])[0] + 1
res = np.split(a, indices)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Numpy: Indexing of arrays - python

Related

Why does this numpy construct work and is there a more correct way of doing it

numpy.concatenate float64(101,1) and float64(101,)

Numpy: Imposing row dependent maximum on array

numpy delete list element from list of lists

Identify vectors with same value in one column with numpy in python

Categories

Resources