Python/NumPy index of array in array

I am just playing around with a particle simulator. I want to use matplotlib with Python and NumPy to make the simulator as realistic and as efficient as possible (this is purely an exercise in fun with Python), and I have a problem trying to calculate the inverse of the distances.
I have an array containing positions of particles (x,y) like so:
x = random.randint(0,3,10).reshape(5,2)
>>> x
array([[1, 1],
[2, 1],
[2, 2],
[1, 2],
[0, 1]])
These are 5 particles with (x, y) positions drawn from [0, 3). Now if I want to calculate the distance between one particle (say the particle at position (0,1)) and the rest, I would do something like
>>>x - [0,1]
array([[1, 0],
[2, 0],
[2, 1],
[1, 1],
[0, 0]])
The problem is I do NOT want to take the distance of the particle to itself: (0,0). This has length 0, its inverse is infinite, and it is not defined for, say, gravity or the Coulomb force.
So I tried:
where(x==[0,1])
>>>where(x==[0,1])
(array([0, 1, 4, 4]), array([1, 1, 0, 1]))
That is not the position of the (0,1) particle in the x array. So how do I pick out the position of [0,1] from an array like x? The where() above compares element-wise (column 0 against 0 and column 1 against 1), not whole rows against [0,1]. How do I do this "NumPy-like", without looping?
Ps: How the frack do you copy-paste code into stackoverflow? I mean bad forums have a [code]..[/code] option while here I spend 15 minutes properly indenting code (since tab in chromium on ubuntu simply hops out of the window instead of indenting with 4 whitespaces....) This is VERY annoying.
Edit: Seeing the first answer I tried:
x
array([[0, 2],
[2, 2],
[1, 0],
[2, 2],
[1, 1]])
>>> all(x==[1,1],axis=1)
array([False, False, False, False, True], dtype=bool)
>>> all(x!=[1,1], axis=1)
array([ True, True, False, True, False], dtype=bool)
This is not what I was hoping for; the != version should return the array WITHOUT [1,1]. But alas, it also drops (1,0):
>>>x[all(x!=[1,1], axis=1)]
array([[0, 2],
[2, 2],
[2, 2]])
Edit 2: any did the trick; it makes more logical sense than all, I suppose. Thank you!

>>> import numpy as np
>>> x=np.array([[1, 1],
... [2, 1],
... [2, 2],
... [1, 2],
... [0, 1]])
>>> np.all(x==[0,1], axis=1)
array([False, False, False, False, True], dtype=bool)
>>> np.where(np.all(x==[0,1], axis=1))
(array([4]),)
>>> np.where(np.any(x!=[0,1], axis=1))
(array([0, 1, 2, 3]),)

Related

Delete numpy axis 1 based on condition

I need to remove values along a NumPy axis based on a condition.
For example, I want to remove [:,2] (the values at index 2 on axis 1) if the first value == 0, and otherwise remove [:,3].
Input:
[[0,1,2,3],[0,2,3,4],[1,3,4,5]]
Output:
[[0,1,3],[0,2,4],[1,3,4]]
So my output has one fewer value along axis 1, with the removed value depending on whether the condition was met.
I know I can isolate and manipulate this based on
array[np.where(array[:,0] == 0)] but then I would have to deal with each condition separately, and it's very important for me to preserve the order of this array.
I am dealing with 3D arrays & am hoping to be able to calculate all this simultaneously while preserving the order.
Any help is much appreciated!
A possible solution:
a = np.array([[0,1,2,3],[0,2,3,4],[1,3,4,5]])
b = np.arange(a.shape[1])
np.apply_along_axis(
    lambda x: x[np.where(x[0] == 0, np.delete(b, 2), np.delete(b, 3))], 1, a)
Output:
array([[0, 1, 3],
[0, 2, 4],
[1, 3, 4]])
Since you are starting and ending with a list, a straightforward iteration is a good solution:
In [261]: alist =[[0,1,2,3],[0,2,3,4],[1,3,4,5]]
In [262]: for row in alist:
     ...:     if row[0]==0: row.pop(2)
     ...:     else: row.pop(3)
     ...:
In [263]: alist
Out[263]: [[0, 1, 3], [0, 2, 4], [1, 3, 4]]
A possible array approach:
In [273]: arr = np.array([[0,1,2,3],[0,2,3,4],[1,3,4,5]])
In [274]: mask = np.ones(arr.shape, bool)
In [275]: mask[np.arange(3),np.where(arr[:,0]==0,2,3)]=False
In [276]: mask
Out[276]:
array([[ True, True, False, True],
[ True, True, False, True],
[ True, True, True, False]])
arr[mask] will be 1d, but since we are deleting the same number of elements from each row, we can reshape it:
In [277]: arr[mask].reshape(arr.shape[0],-1)
Out[277]:
array([[0, 1, 3],
[0, 2, 4],
[1, 3, 4]])
I expect the list approach will be faster for small cases, but the array should scale better. I don't know where the trade-off is.
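If you want to see where that trade-off lies, a rough timeit comparison along these lines should do (a sketch only; the 10000×4 array size is an arbitrary choice):
import timeit

setup = "import numpy as np; arr = np.random.randint(0, 5, (10000, 4)); alist = arr.tolist()"

list_way = """
rows = [r.copy() for r in alist]          # copy so each timed run starts from full rows
for row in rows:
    if row[0] == 0: row.pop(2)
    else: row.pop(3)
"""

array_way = """
mask = np.ones(arr.shape, bool)
mask[np.arange(arr.shape[0]), np.where(arr[:, 0] == 0, 2, 3)] = False
out = arr[mask].reshape(arr.shape[0], -1)
"""

print(timeit.timeit(list_way, setup, number=100))
print(timeit.timeit(array_way, setup, number=100))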

using integer as index for multidimensional numpy array

I have boolean array of shape (n_samples, n_items) which represents a set: my_set[i, j] tells if sample i contains item j.
To populate it, the array is initialized with zeros, and I receive another array of integers, with shape (n_samples, 3), telling, for each sample, three items that belong to it, for instance:
my_set = np.zeros((2, 5), dtype=bool)
init_values = np.array([[1,3,4], [0,1,2]], dtype=np.int64)
So, I need to fill my_set with ones in row 0, columns 1, 3, 4 and in row 1, columns 0, 1, 2.
init_values contains valid indices in the appropriate range (that is, in [0, n_items)), and each column doesn't contain duplicated items.
Some failed approaches:
I know that a list (or array) of integers can be used as an index, so I tried to use init_values as an index directly, but it failed:
my_set[init_values] = 1
File "<ipython-input-9-9b2c4d19f4f6>", line 1, in <cell line: 1>
my_set[init_values] = 1
IndexError: index 3 is out of bounds for axis 0 with size 2
I don't know why the 3 is indexing over the first axis, so I tried a second approach: "pick all rows and index only the desired columns", using a mix of slicing and integer indexing. It didn't throw an error, but it didn't work as expected: check out the shape, which I expected to be (2, 3), however...
my_set[:, init_values].shape
Out[11]: (2, 2, 3)
Not sure why it didn't work, but at least the first axis looks correct, so I tried to pick only the first column, which is a list of integers and therefore "more natural"... once again, it didn't work:
my_set[:, init_values[:,0]].shape
Out[12]: (2, 2)
I expected this shape to be (2, 1) since I wanted all rows with a single column on each, corresponding to the indexes given in init_values.
I decided to go back to integer index approach for the first axis.... and it worked:
my_set[np.arange(len(my_set)), init_values[:,0]].shape
Out[13]: (2,)
However, it only works for one column, so I need to iterate over the columns to make it really work, but it looks like a good initial workaround.
Current solution
So, to solve my original problem, I wrote this:
for c in range(init_values.shape[1]):
    my_set[np.arange(len(my_set)), init_values[:, c]] = 1
# now lets check my_set is properly filled
print(my_set)
Out[14]: [[False True False True True]
[ True True True False False]]
which is exactly what I need.
Question(s):
That said, here goes my main question:
Is there a more efficient way to do this? It seems quite inefficient as the number of elements grows (for this example I used 3, but I actually need larger values).
In addition, I'd like to understand why using np.arange for the first index behaves differently from slicing it with ':'; I didn't expect this behavior.
Any other comments that help me understand why the previous approaches failed are also welcome.
You only have column indices, so you also need to create their corresponding row indices:
>>> my_set[np.arange(len(my_set))[:, None], init_values] = 1
>>> my_set
array([[False, True, False, True, True],
[ True, True, True, False, False]])
[:, None] is used to convert the row-index row vector into a column vector, so that the row and column indices have compatible shapes for broadcasting:
>>> np.arange(len(my_set))[:, None]
array([[0],
[1]])
>>> np.broadcast_arrays(np.arange(len(my_set))[:, None], init_values)
[array([[0, 0, 0],
[1, 1, 1]]),
array([[1, 3, 4],
[0, 1, 2]], dtype=int64)]
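Written out explicitly, the broadcast pair above is equivalent to indexing with the fully expanded row and column index arrays (a small sketch repeating the setup):
>>> my_set = np.zeros((2, 5), dtype=bool)
>>> my_set[[[0, 0, 0], [1, 1, 1]], [[1, 3, 4], [0, 1, 2]]] = 1   # expanded row and column indices
>>> my_set
array([[False, True, False, True, True],
[ True, True, True, False, False]])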
The essence of slicing is that the index for the other dimensions is applied to every index in the sliced range of this dimension. Here is a simple test. The matrix to be indexed is as follows:
>>> ar = np.arange(4).reshape(2, 2)
>>> ar
array([[0, 1],
[2, 3]])
If you want to get the elements with indices 0 and 1 in row 0, and the elements with indices 1 and 0 in row 1, but you use the combination of the column indices [[0, 1], [1, 0]] and a slice, you will get:
>>> ar[:, [[0, 1], [1, 0]]]
array([[[0, 1],
[1, 0]],
[[2, 3],
[3, 2]]])
This is equivalent to combining each row index (0 and 1) with the column indices in turn:
>>> ar[0, [[0, 1], [1, 0]]]
array([[0, 1],
[1, 0]])
>>> ar[1, [[0, 1], [1, 0]]]
array([[2, 3],
[3, 2]])
In fact, broadcasting is used implicitly here. The actual indices are:
>>> np.broadcast_arrays(0, [[0, 1], [1, 0]])
[array([[0, 0],
[0, 0]]),
array([[0, 1],
[1, 0]])]
>>> np.broadcast_arrays(1, [[0, 1], [1, 0]])
[array([[1, 1],
[1, 1]]),
array([[0, 1],
[1, 0]])]
This is not the same as the indices you actually need. Therefore, you need to manually generate the correct row indices for broadcasting:
>>> ar[[[0], [1]], [[0, 1], [1, 0]]]
array([[0, 1],
[3, 2]])
>>> np.broadcast_arrays([[0], [1]], [[0, 1], [1, 0]])
[array([[0, 0],
[1, 1]]),
array([[0, 1],
[1, 0]])]

How can I calculate distance between points in each row of an array

I have an array like this, and I have to find the distance between each pair of points. How can I do that in Python with NumPy?
array([[ 8139, 112607],
[ 8139, 115665],
[ 8132, 126563],
[ 8193, 113938],
[ 8193, 123714],
[ 8156, 120291],
[ 8373, 125253],
[ 8400, 131442],
[ 8400, 136354],
[ 8401, 129352],
[ 8439, 129909],
[ 8430, 135706],
[ 8430, 146359],
[ 8429, 139089],
[ 8429, 133243]])
Let's reduce this problem to 4 points:
points = np.array([[8139, 115665], [8132, 126563], [8193, 113938], [8193, 123714]])
In general, you need to do 2 steps:
Make the indices of the pairs of points you want to take.
Apply np.hypot to these pairs.
Making the indices of points
There are many ways to create the pairs of indices for the pairs of points you'd like to take. But where do they come from? In every case it's a good idea to start building them from an adjacency matrix.
Case 1
In the most common way you can start from building it like so:
adjacency = np.ones(shape=(len(points), len(points)), dtype=bool)
>>> adjacency
[[ True True True True]
[ True True True True]
[ True True True True]
[ True True True True]]
It corresponds to indices you need to take like so:
adjacency_idx_view = np.transpose(np.nonzero(adjacency))
for n in adjacency_idx_view.reshape(len(points), len(points), 2):
    print(n.tolist())
[[0, 0], [0, 1], [0, 2], [0, 3]]
[[1, 0], [1, 1], [1, 2], [1, 3]]
[[2, 0], [2, 1], [2, 2], [2, 3]]
[[3, 0], [3, 1], [3, 2], [3, 3]]
And this is how you collect them:
x, y = np.nonzero(adjacency)
>>> np.transpose([x, y])
array([[0, 0],
[0, 1],
[0, 2],
[0, 3],
[1, 0],
[1, 1],
[1, 2],
[1, 3],
[2, 0],
[2, 1],
[2, 2],
[2, 3],
[3, 0],
[3, 1],
[3, 2],
[3, 3]], dtype=int64)
It could also be done manually, as in Corralien's answer:
x = np.repeat(np.arange(len(points)), len(points))
y = np.tile(np.arange(len(points)), len(points))
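As a quick sanity check (a sketch, not part of either answer), this manual construction produces the same index pairs as the nonzero-based one above:
>>> np.array_equal(np.transpose([x, y]), np.transpose(np.nonzero(adjacency)))
True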
Case 2
In the previous case every pair of points appears twice, and there are also pairs in which a point is paired with itself. A better option is to omit this redundant data and take only the pairs whose first index is less than the second:
adjacency = np.less.outer(np.arange(len(points)), np.arange(len(points)))
>>> print(adjacency)
[[False True True True]
[False False True True]
[False False False True]
[False False False False]]
x, y = np.nonzero(adjacency)
This construction is not used widely, although it is essentially what lies under the hood of np.triu_indices. Hence, as an alternative, we could use:
x, y = np.triu_indices(len(points), 1)
And this results in:
>>> np.transpose([x, y])
array([[0, 1],
[0, 2],
[0, 3],
[1, 2],
[1, 3],
[2, 3]])
Case 3
You could also omit only the pairs of duplicated points and keep the pairs with the points swapped. As in Case 1, this costs twice the memory and computation time, so I'll leave it for demonstration purposes only:
adjacency = ~np.identity(len(points), dtype=bool)
>>> adjacency
array([[False, True, True, True],
[ True, False, True, True],
[ True, True, False, True],
[ True, True, True, False]])
x, y = np.nonzero(adjacency)
>>> np.transpose([x, y])
array([[0, 1],
[0, 2],
[0, 3],
[1, 0],
[1, 2],
[1, 3],
[2, 0],
[2, 1],
[2, 3],
[3, 0],
[3, 1],
[3, 2]], dtype=int64)
I'll leave making x and y manually (without masking) as an exercise for the reader.
Apply np.hypot
Instead of np.sqrt(np.sum((a - b) ** 2, axis=1)) you could do np.hypot(*np.transpose(a - b)). I'll take my Case 2 as the index generator:
def distance(points):
    x, y = np.triu_indices(len(points), 1)
    x_coord, y_coord = np.transpose(points[x] - points[y])
    return np.hypot(x_coord, y_coord)
>>> distance(points)
array([10898.00224812, 1727.84403231, 8049.18113848, 12625.14736548,
2849.65296133, 9776. ])
You can use np.repeat and np.tile to create all combinations, then compute the Euclidean distance:
xy = np.array([[8139, 115665], [8132, 126563], [8193, 113938], [8193, 123714],
[8156, 120291], [8373, 125253], [8400, 131442], [8400, 136354],
[8401, 129352], [8439, 129909], [8430, 135706], [8430, 146359],
[8429, 139089], [8429, 133243]])
a = np.repeat(xy, len(xy), axis=0)
b = np.tile(xy, [len(xy), 1])
d = np.sqrt(np.sum((a - b) ** 2, axis=1))
The shape of d is (196,), which is 14 × 14.
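If a square distance matrix is more convenient, the flat result can be reshaped (a small addition, not part of the original answer):
D = d.reshape(len(xy), len(xy))   # D[i, j] is the distance between point i and point j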
Update
but I have to do it in a function.
def distance(xy):
    a = np.repeat(xy, len(xy), axis=0)
    b = np.tile(xy, [len(xy), 1])
    return np.sqrt(np.sum((a - b) ** 2, axis=1))
d = distance(xy)

How to delete an element from a 2D Numpy array without knowing its position

I have a 2D array:
[[0,0], [0,1], [1,0], [1,1]]
I want to delete the [0,1] element without knowing its position within the array (as the elements may be shuffled).
Result should be:
[[0,0], [1,0], [1,1]]
I've tried using numpy.delete but keep getting back a flattened array:
>>> arr = np.array([[0,0], [0,1], [1,0], [1,1]])
>>> arr
array([[0, 0],
[0, 1],
[1, 0],
[1, 1]])
>>> np.delete(arr, [0,1])
array([0, 1, 1, 0, 1, 1])
Specifying the axis removes rows 0 and 1 rather than searching for the element (which makes sense):
>>> np.delete(arr, [0,1], axis=0)
array([[1, 0],
[1, 1]])
And trying to find the location (as has been suggested) seems equally problematic:
>>> np.where(arr==[0,1])
(array([0, 1, 1, 3]), array([0, 0, 1, 1]))
(Where did that 3 come from?!?)
Here we find all of the rows that match the candidate [0, 1]
>>> (arr == [0, 1]).all(axis=1)
array([False, True, False, False])
Or alternatively, the rows that do not match the candidate
>>> ~(arr == [0, 1]).all(axis=1)
array([ True, False, True, True])
So, to select all those rows that do not match [0, 1]
>>> arr[~(arr == [0, 1]).all(axis=1)]
array([[0, 0],
[1, 0],
[1, 1]])
Note that this will create a new array.
mask = (arr == np.array([0, 1])).all(axis=1)
arr1 = arr[~mask, :]
Look at mask; it should be [False, True, ...].
From the documentation:
numpy.delete(arr, obj, axis=None)
axis : int, optional
The axis along which to delete the subarray defined by obj. If axis
is None, obj is applied to the flattened array
If you don't specify the axis (i.e. it is None), it will automatically flatten your array; you just need to specify the axis parameter, in your case np.delete(arr, [0,1], axis=0).
However, just like in the example above, [0,1] is interpreted as a list of indices; you must provide the indices/locations of the rows to delete (you can find them with np.where(condition), for example).
Here you have a working example:
my_array = np.array([[0, 1],
[1, 0],
[1, 1],
[0, 0]])
row_index, = np.where(np.all(my_array == [0, 1], axis=1))
my_array = np.delete(my_array, row_index,axis=0)
print(my_array)
#Output is below
[[1 0]
[1 1]
[0 0]]

How to remove/select rows in matrix given external condition from another array with numpy?

I have a matrix and a truth table array for this matrix, like so:
matrix = np.array([[1, 2, 2], [2, 3, 4], [4, 3, 5]])
truth_table = np.array([0, 1, 0])
The goal is to keep only the rows in the matrix where the truth table is equal to one, in this case only [[2, 3, 4]].
The matrix has as many row as the truth table has elements.
In any other language I would do this:
results = []
for i in range(truth_table.size):
    if truth_table[i] == 1:
        results.append(matrix[i])
results = np.array(results)
The problem is that the matrix can be enormous, and for loops are not optimized in Python for this sort of problem, so this can take a really long time to execute.
I am sure there is a better way to do this using numpy but I can't seem to find the solution.
Make sure your truth table has dtype=bool; then you can just do matrix[truth_table]:
import numpy as np
matrix = np.array([[1, 2, 2], [2, 3, 4], [4, 3, 5]])
truth_table = np.array([0, 1, 0], dtype=bool)
# or truth_table = np.array([False, True, False])
print(matrix[truth_table])
# prints [[2, 3, 4]]
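If the truth table arrives as 0/1 integers rather than booleans, converting it first is a safe sketch (with an integer array, matrix[truth_table] would instead be interpreted as row indices):
truth_table = np.array([0, 1, 0])        # integer 0/1 flags
print(matrix[truth_table.astype(bool)])  # prints [[2 3 4]]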
