Removing "rows" containing negative values in 3D numpy array

What I have:
>>> forces
array([[[ 63.82078252, 0.63841691],
[ -62.45826693, 7.11946976],
[ -87.85946925, 15.1988562 ],
[-120.49417797, -16.31785819],
[ -81.36080338, 6.45074645]],
[[ 364.99959095, 4.92473888],
[ 236.5762723 , -7.22959548],
[ 69.55657789, 1.20164815],
[ -22.1684177 , 13.42611095],
[ -91.19739147, -16.15076634]]])
forces[0] and forces[1] each contain a list of paired values, e.g. 63.82078252 & 0.63841691 are one data point.
I want to remove all pairs where the first value is negative:
>>> forces
array([[[ 63.82078252, 0.63841691]],
[[ 364.99959095, 4.92473888],
[ 236.5762723 , -7.22959548],
[ 69.55657789, 1.20164815]]])
But this type of structure is not possible since the two slices of forces have different sizes: (1, 2) and (3, 2) respectively.
My sloppy attempt:
>>> forces[:,:,0][forces[:,:,0] < 0] = np.nan
>>> forces
array([[[ 63.82078252, 0.63841691],
[ nan, 7.11946976],
[ nan, 15.1988562 ],
[ nan, -16.31785819],
[ nan, 6.45074645]],
[[ 364.99959095, 4.92473888],
[ 236.5762723 , -7.22959548],
[ 69.55657789, 1.20164815],
[ nan, 13.42611095],
[ nan, -16.15076634]]])
and then using isnan to remove the relevant entries:
>>> forces = forces[~np.isnan(forces).any(axis=2)]
>>> forces
array([[ 63.82078252, 0.63841691],
[ 364.99959095, 4.92473888],
[ 236.5762723 , -7.22959548],
[ 69.55657789, 1.20164815]])
So these are the correct values but they are now lumped together into a 2D array.
How can I create a heterogeneously sized "array" that will contain two slices of size (1, 2) and (3, 2) respectively, for this simplified example?
Also any pointers on accomplishing the task more elegantly would be much appreciated!

It is simply
forces[forces[..., 0] >= 0]
Read more here: http://scipy-lectures.github.io/intro/numpy/array_object.html#fancy-indexing
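Note that the one-liner above returns the same flattened 2D array as the isnan approach. If the grouping per slice has to be kept, one option is a plain Python list holding one filtered array per slice, since a regular NumPy array cannot hold slices of different lengths. A minimal sketch, assuming a list is an acceptable container (the name `filtered` is only illustrative):

```python
import numpy as np

forces = np.array([[[  63.82078252,   0.63841691],
                    [ -62.45826693,   7.11946976],
                    [ -87.85946925,  15.1988562 ],
                    [-120.49417797, -16.31785819],
                    [ -81.36080338,   6.45074645]],
                   [[ 364.99959095,   4.92473888],
                    [ 236.5762723 ,  -7.22959548],
                    [  69.55657789,   1.20164815],
                    [ -22.1684177 ,  13.42611095],
                    [ -91.19739147, -16.15076634]]])

# Filter each 2D slice separately. The results have different lengths,
# so they are collected in a list rather than in a single ndarray.
filtered = [block[block[:, 0] >= 0] for block in forces]

print([f.shape for f in filtered])
```

Each element of `filtered` is still an ordinary 2D array, so it can be processed with the usual NumPy functions one slice at a time.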

Related

Replace elements in an Nx3x2 numpy array with elements located in an Mx2 numpy array

I have the following xy numpy array which represents the locations of the vertices of some triangles:
array([[[ 0.30539728, 49.82845203],
[ 0.67235022, 49.95042185],
[ 0.268982 , 49.95195348]],
[[ 0.268982 , 49.95195348],
[ 0.67235022, 49.95042185],
[ 0.27000135, 50.16334035]],
...
[[ 1.00647459, 50.25958169],
[ 0.79479121, 50.3010079 ],
[ 0.67235022, 49.95042185]],
[[ 0.79479121, 50.3010079 ],
[ 0.6886783 , 50.25867683],
[ 0.67235022, 49.95042185]]])
Here, it's an array of shape (10, 3, 2) but it could as well be (5, 3, 2) or (18, 3, 2), you name it. In any case it's of shape (N, 3, 2).
I have another numpy array to_replace of shape (4, 2) but it could as well be (6, 2) or (7, 2), but always of shape (M, 2):
array([[ 1.08267406, 49.88690993],
[ 1.1028248 , 50.01440407],
[ 0.74114309, 49.73183549],
[ 1.08267406, 49.88690993]])
It represents the locations of pairs of coordinates that can be found in my first array. Note that each of these pairs is present at least once in xy but could be present more than once.
Finally, I have a third array replace_by of shape (8,) (or, more generally, of shape (M*2,) following the above) whose values are meant to replace exactly those contained in to_replace in my first xy array. It looks like this:
array([ 0.87751214, 49.91866589, 0.88758751, 49.98241296, 0.70674665, 49.84112867, 0.87751214, 49.91866589])
So basically all pairs [1.08267406, 49.88690993] in xy should be replaced by [0.87751214, 49.91866589] for example.
My current code looks like this but it works only if to_replace and replace_by are strictly of shape (2, 2).
indices = (xy == to_replace[:, None][:, None])[0]
xy[indices] = replace_by
I already looked at a number of answers and actually got inspired by some of them but I still can't get it to work.

You can use numpy.isclose to compare each row of to_replace against xy and then use .all(axis=2) to find the positions where both coordinates of a pair match. NumPy will broadcast each row to fit the shape of xy.
import numpy as np
xy = np.array([[[ 0.30539728, 49.82845203],
[ 0.67235022, 49.95042185],
[ 0.268982 , 49.95195348]],
[[ 0.268982 , 49.95195348],
[ 0.67235022, 49.95042185],
[ 0.27000135, 50.16334035]],
[[ 1.00647459, 50.25958169],
[ 0.79479121, 50.3010079 ],
[ 0.67235022, 49.95042185]],
[[ 0.79479121, 50.3010079 ],
[ 0.6886783 , 50.25867683],
[ 0.67235022, 49.95042185]]])
xy_start = xy.copy()
to_replace = np.array([[ 1.08267406, 49.88690993],
[ 1.1028248 , 50.01440407],
# [ 0.74114309, 49.73183549],
[ 0.6886783 , 50.25867683],
[ 1.08267406, 49.88690993]])
replace_by = np.array([ 0.87751214, 49.91866589, 0.88758751, 49.98241296, 0.70674665, 49.84112867, 0.87751214, 49.91866589])
replace_by_reshaped = replace_by.reshape(-1, 2)
for i, row in enumerate(to_replace):
    xy[np.isclose(xy, row).all(axis=2)] = replace_by_reshaped[i]
print(xy_start)
# [[[ 0.30539728 49.82845203]
# [ 0.67235022 49.95042185]
# [ 0.268982 49.95195348]]
# [[ 0.268982 49.95195348]
# [ 0.67235022 49.95042185]
# [ 0.27000135 50.16334035]]
# [[ 1.00647459 50.25958169]
# [ 0.79479121 50.3010079 ]
# [ 0.67235022 49.95042185]]
# [[ 0.79479121 50.3010079 ]
# [ 0.6886783 50.25867683]
# [ 0.67235022 49.95042185]]]
print(xy)
# [[[ 0.30539728 49.82845203]
# [ 0.67235022 49.95042185]
# [ 0.268982 49.95195348]]
# [[ 0.268982 49.95195348]
# [ 0.67235022 49.95042185]
# [ 0.27000135 50.16334035]]
# [[ 1.00647459 50.25958169]
# [ 0.79479121 50.3010079 ]
# [ 0.67235022 49.95042185]]
# [[ 0.79479121 50.3010079 ]
# [ 0.70674665 49.84112867]
# [ 0.67235022 49.95042185]]]
EDIT
.all(axis=2) collapses axis 2 to True where all values along that axis are True, and to False otherwise. A small 2D example should make clear what is happening here.
>>> import numpy as np
>>> a = np.array([[0, 1], [0, 2], [3, 4]])
>>> a
array([[0, 1],
[0, 2],
[3, 4]])
>>> np.isclose(a, [0, 1])
array([[ True, True],
[ True, False],
[False, False]])
>>> np.isclose(a, [0, 1]).all(axis=1)
array([ True, False, False])
>>> a[np.isclose(a, [0, 1]).all(axis=1)]
array([[0, 1]])
>>> a[np.isclose(a, [0, 1]).all(axis=1)] = [12, 14]
>>> a
array([[12, 14],
[ 0, 2],
[ 3, 4]])
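One thing to watch with a loop that assigns into xy while also matching against it: a pair written in an early iteration could match a later row of to_replace and be replaced a second time. A minimal sketch of one way to avoid that, assuming it is acceptable to keep a copy of the original array; the small arrays here are made-up illustration data, not the ones from the question:

```python
import numpy as np

# Made-up data: the replacement for [0, 1] is [2, 3], which is itself
# a row of to_replace.
xy = np.array([[[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]],
               [[2.0, 3.0], [6.0, 7.0], [0.0, 1.0]]])
to_replace = np.array([[0.0, 1.0], [2.0, 3.0]])
replace_by = np.array([2.0, 3.0, 9.0, 9.0])

# Build every mask from an untouched copy so that values written in one
# iteration cannot be matched again in a later iteration.
original = xy.copy()
for old, new in zip(to_replace, replace_by.reshape(-1, 2)):
    xy[np.isclose(original, old).all(axis=2)] = new

print(xy)
```

With the masks taken from `original`, the pairs that started as [0, 1] end up as [2, 3] and are not turned into [9, 9] by the second iteration.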

The numpy-indexed package (disclaimer: I am its author) contains functionality to solve this problem in a vectorized and elegant manner.
Given the arrays as you have defined them, this one-liner should do the trick:
import numpy_indexed as npi
npi.remap(xy.reshape(-1, 2), to_replace, replace_by.reshape(-1, 2)).reshape(-1, 3, 2)

how to split an array by value

My code so far is:
import numpy as np
data=np.genfromtxt('filename')
print(data)
which prints:
[[ 0.723 1. ]
[ 0.433 2. ]
[ 0.258 1. ]
[ 1.52 2. ]
[ 0.083 2. ]
[ 2.025 1. ]
[ 3.928 1. ]]
How do i split the data into two groups, based on if the line has a 1 or 2?

A simple solution is to use np.where, which returns the indices where a condition holds as a tuple of arrays. That tuple can be used directly with NumPy's advanced indexing to select the matching rows into a new variable.
import numpy as np
data = np.array(
[[ 0.723, 1. ],
[ 0.433, 2. ],
[ 0.258, 1. ],
[ 1.52, 2. ],
[ 0.083, 2. ],
[ 2.025, 1. ],
[ 3.928, 1. ]])
data1 = data[np.where(data[:,1] == 1)]
data2 = data[np.where(data[:,1] == 2)]
print(data1)
print(data2)

How about something like this:
import numpy as np
data = np.asarray([[0.723, 1.],
[0.433, 2.],
[0.258, 1.],
[1.520, 2.],
[0.083, 2.],
[2.025, 1.],
[3.928, 1.]])
split_data = [data[data[:,1] == 1.], data[data[:,1] == 2.]]
print(f'data:\n{data}')
print(f'split_data:\n{split_data}')
Explanation:
data[:,1] selects the second column, so data[:,1] == 1. is a boolean mask that picks the rows whose second value is 1.
Output:
data:
[[0.723 1. ]
[0.433 2. ]
[0.258 1. ]
[1.52 2. ]
[0.083 2. ]
[2.025 1. ]
[3.928 1. ]]
split_data:
[array([[0.723, 1. ],
[0.258, 1. ],
[2.025, 1. ],
[3.928, 1. ]]),
array([[0.433, 2. ],
[1.52 , 2. ],
[0.083, 2. ]])]

Your question was rather brief, so I didn't quite catch the data format, but I tried replicating it with:
foo = [[ 0.723, 1 ], [ 0.433, 2 ], [ 0.258, 1 ], [ 1.52, 2 ],
[ 0.083, 2 ], [ 2.025, 1 ], [ 3.928, 1 ]]
If you want to filter this list foo so that it only contains entries whose second element matches a certain number, you could use the following list comprehensions:
foo_is_1 = [e for e in foo if e[1] == 1]
foo_is_2 = [e for e in foo if e[1] == 2]
print(foo_is_1)
print(foo_is_2)
In case you know nothing about the second argument and just want to split your list up in a list of lists with unique second arguments you could use:
list_of_lists = [[e for e in foo if e[1] == a] for a in list(set([a[1] for a in foo]))]
for entry in list_of_lists:
    print(entry)
Which is basically two list comprehensions, one for each unique second argument a, and one for each entry e in foo.
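The same "split by every distinct label" idea can also be written with NumPy arrays directly. A minimal sketch, assuming the labels sit in the second column as in the question; the dictionary name `groups` is only illustrative:

```python
import numpy as np

data = np.array([[0.723, 1.],
                 [0.433, 2.],
                 [0.258, 1.],
                 [1.520, 2.],
                 [0.083, 2.],
                 [2.025, 1.],
                 [3.928, 1.]])

# np.unique returns the sorted distinct labels found in the second column.
labels = np.unique(data[:, 1])

# One boolean mask per label selects the rows belonging to that group.
groups = {float(label): data[data[:, 1] == label] for label in labels}

for label, rows in groups.items():
    print(label, rows.shape)
```

This avoids hard-coding the values 1 and 2, at the cost of one pass over the data per distinct label.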

Set numpy array elements to zero for each row's smallest 2 elements [duplicate]

This question already has an answer here:
Fill a matrix from a matrix of indices
(1 answer)
Closed 5 years ago.
For example
E =
array([[ 10. , 2.38761596, 7.00090613, 4.51495754],
[ 2.38761596, 10. , 2.80035826, 1. ],
[ 7.00090613, 2.80035826, 10. , 5.95109207],
[ 4.51495754, 1. , 5.95109207, 10. ]])
The indices of the smallest 2 values in each row can be obtained from argsort:
IndexSortE = np.argsort(E)
smallest2 = IndexSortE[:,0:2]
smallest2
array([[1, 3],
[3, 0],
[1, 3],
[1, 0]])
Now how do I get E0 like this?
E0 =
array([[ 10. , 0.00000000, 7.00090613, 0.00000000],
[ 0.00000000, 10. , 2.80035826, 0.00000000],
[ 7.00090613, 0.00000000, 10. , 0.00000000],
[ 0.00000000, 0.00000000, 5.95109207, 10. ]])
Thanks

You can create another array of row indices; then take advantage of advanced indexing to modify the corresponding values:
E[np.arange(E.shape[0])[:,None], smallest2] = 0
E
#array([[ 10. , 0. , 7.00090613, 0. ],
# [ 0. , 10. , 2.80035826, 0. ],
# [ 7.00090613, 0. , 10. , 0. ],
# [ 0. , 0. , 5.95109207, 10. ]])
To add some explanations, use np.broadcast_arrays to see how these indices are broadcasted:
np.broadcast_arrays(np.arange(E.shape[0])[:,None], smallest2)
# [array([[0, 0],
# [1, 1],
# [2, 2],
# [3, 3]]), array([[1, 3],
# [3, 0],
# [1, 3],
# [1, 0]])]
This gives a list of length two: the first array holds the row indices and the second holds the column indices. According to the advanced indexing rules, this pair addresses the elements at
(0, 1), (0, 3),
(1, 3), (1, 0),
...
etc.
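As an alternative to building the row indices by hand, np.put_along_axis (available in NumPy 1.15 and later) writes values at per-row column indices directly. A minimal sketch under that version assumption, reusing the E from the question:

```python
import numpy as np

E = np.array([[10.        ,  2.38761596,  7.00090613,  4.51495754],
              [ 2.38761596, 10.        ,  2.80035826,  1.        ],
              [ 7.00090613,  2.80035826, 10.        ,  5.95109207],
              [ 4.51495754,  1.        ,  5.95109207, 10.        ]])

# Column indices of the two smallest entries in every row.
smallest2 = np.argsort(E, axis=1)[:, :2]

# Set those entries to zero in place, row by row along axis 1.
np.put_along_axis(E, smallest2, 0, axis=1)

print(E)
```

Like the advanced-indexing assignment, this modifies E in place, so work on E.copy() if the original values are still needed.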

Reshape/resize(pivot?) N-dimensional numpy array column-wise

I need to reshape/resize (pivot?) a numpy array based on column (sorry, I am fairly new to numpy, having worked with it for about 6 weeks).
The source numpy array is this:
[[[-0.98261404]
[-0.98261404]
[-0.95991508]
...,
[-0.92496699]
[-0.92731224]
[-0.926328 ]]
[[-0.91894622]
[-0.91894622]
[-0.92171439]
...,
[-1.02966519]
[-1.03908464]
[-1.03527072]]
[[-0.92201427]
[-0.92201427]
[-0.93004196]
...,
[-1.06750448]
[-1.07838491]
[-1.07398661]]
[[-0.9233676 ]
[-0.9233676 ]
[-0.93250255]
...,
[-1.07617807]
[-1.08736608]
[-1.08284474]]
[[-0.91913077]
[-0.91913077]
[-0.92023803]
...,
[-1.01886934]
[-1.02782743]
[-1.02419806]]]
I would like to reshape/resize (pivot?) the above as follows:
[[[-0.98261404]
[-0.91894622]
[-0.92201427]
[-0.9233676 ]
[-0.91913077]]
...,
[[-0.926328 ]
[-1.03527072]
[-1.07398661]
[-1.08284474]
[-1.02419806]]]
What is the best way to do this?
Thanks!

I believe what you want is (assuming I understood it properly):
>>> B = np.transpose(A, (0, 2, 1))
where A is your data and B the resulting array. That will transpose/pivot the last 2 axes. Alternatively, you can write
>>> B = np.swapaxes(A, 1, 2)
which is equivalent (and probably easier to read). Extended to N-dimensional arrays:
>>> B = np.swapaxes(A, a, b) # being `a` and `b` the axes
An example:
>>> import numpy as np
>>> A = np.random.rand(1, 2, 3)
>>> A
array([[[ 0.54766263, 0.95017886, 0.32949198],
[ 0.76255173, 0.88943131, 0.78594731]]])
>>> np.swapaxes(A, 1, 2)
array([[[ 0.54766263, 0.76255173],
[ 0.95017886, 0.88943131],
[ 0.32949198, 0.78594731]]])
Alternatively, you can just transpose the array:
>>> A.T # equivalent to np.transpose(A, (2, 1, 0))
array([[[ 0.54766263],
[ 0.76255173]],
[[ 0.95017886],
[ 0.88943131]],
[[ 0.32949198],
[ 0.78594731]]])
This reverses the order of the dimensions to (2, 1, 0).

Let's say your array is a.
Then this should do the trick (it relies on the last axis having length 1, as in your data):
x,y,z = a.shape
b = a.T  # Transpose - the shape becomes (z, y, x)
b = b.reshape(y, x, z)  # Interchange the axes; valid here because z == 1
For example :
In [58]: a = np.random.random(20)
In [59]: a = a.reshape(4,5,1)
In [60]: a
Out[60]:
array([[[ 0.40906066],
[ 0.57160002],
[ 0.22642471],
[ 0.35845352],
[ 0.26999423]],
[[ 0.91962882],
[ 0.62664991],
[ 0.21286972],
[ 0.39995373],
[ 0.1141539 ]],
[[ 0.03040894],
[ 0.79666903],
[ 0.72822631],
[ 0.84388555],
[ 0.23265895]],
[[ 0.63548896],
[ 0.50314843],
[ 0.88547892],
[ 0.49824574],
[ 0.55835843]]])
In [61]: b = b.reshape(y, x, z)
In [62]: x,y,z = a.shape
In [63]: b = a.T
In [64]: b = b.reshape(y,x,z)
In [65]: b
Out[65]:
array([[[ 0.40906066],
[ 0.91962882],
[ 0.03040894],
[ 0.63548896]],
[[ 0.57160002],
[ 0.62664991],
[ 0.79666903],
[ 0.50314843]],
[[ 0.22642471],
[ 0.21286972],
[ 0.72822631],
[ 0.88547892]],
[[ 0.35845352],
[ 0.39995373],
[ 0.84388555],
[ 0.49824574]],
[[ 0.26999423],
[ 0.1141539 ],
[ 0.23265895],
[ 0.55835843]]])
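The transpose-then-reshape result above can be cross-checked against np.swapaxes(a, 0, 1), which swaps the first two axes directly and does not depend on the last axis having length 1. A minimal sketch, using a small made-up array of the same shape as in the example:

```python
import numpy as np

# Made-up data with the same (4, 5, 1) shape as the example above.
a = np.arange(20, dtype=float).reshape(4, 5, 1)

# Swap the first two axes directly.
b = np.swapaxes(a, 0, 1)

# The transpose-then-reshape route from above, for comparison.
x, y, z = a.shape
c = a.T.reshape(y, x, z)

print(b.shape)
print(np.array_equal(b, c))
```

For arrays whose last axis is longer than 1, the two routes no longer agree, so swapaxes is the safer choice in general.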

Adding two 2D NumPy arrays ignoring NaNs in them

What is the right way to add 2 numpy arrays a and b (both 2D) with numpy.nan as missing value?
a + b
or
numpy.ma.sum(a,b)

Since the inputs are 2D arrays, you can stack them along the third axis with np.dstack and then use np.nansum, which ignores NaNs. The sample run below comes from an older NumPy release, where a position that is NaN in both input arrays also gives NaN in the output; in NumPy releases after 1.9.0, np.nansum returns 0 for such all-NaN positions instead. The implementation would look something like this -
np.nansum(np.dstack((A,B)),2)
Sample run -
In [157]: A
Out[157]:
array([[ 0.77552455, 0.89241629, nan, 0.61187474],
[ 0.62777982, 0.80245533, nan, 0.66320306],
[ 0.41578442, 0.26144272, 0.90260667, nan],
[ 0.65122428, 0.3211213 , 0.81634856, nan],
[ 0.52957704, 0.73460363, 0.16484994, 0.20701344]])
In [158]: B
Out[158]:
array([[ 0.55809925, 0.1339353 , nan, 0.35154039],
[ 0.94484722, 0.23814073, 0.36048809, 0.20412318],
[ 0.25191484, nan, 0.43721322, 0.95810905],
[ 0.69115038, 0.51490958, nan, 0.44613473],
[ 0.01709308, 0.81771896, 0.3229837 , 0.64013882]])
In [159]: np.nansum(np.dstack((A,B)),2)
Out[159]:
array([[ 1.3336238 , 1.02635159, nan, 0.96341512],
[ 1.57262704, 1.04059606, 0.36048809, 0.86732624],
[ 0.66769925, 0.26144272, 1.33981989, 0.95810905],
[ 1.34237466, 0.83603089, 0.81634856, 0.44613473],
[ 0.54667013, 1.55232259, 0.48783363, 0.84715226]])

Just replace the NaNs with zeros in both arrays:
a[np.isnan(a)] = 0 # replace all nan in a with 0
b[np.isnan(b)] = 0 # replace all nan in b with 0
And then perform the addition:
a + b
This relies on the fact that 0 is the "identity element" for addition.
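The assignments above overwrite the NaNs in a and b. If the inputs should stay untouched, the same idea can be written with np.where, which builds new arrays. A minimal sketch with made-up values; keeping NaN where both inputs are missing is an assumption about the desired behaviour, and that line can simply be dropped otherwise:

```python
import numpy as np

# Made-up inputs with NaN marking missing values.
a = np.array([[1.0, np.nan], [np.nan, 4.0]])
b = np.array([[10.0, np.nan], [30.0, 40.0]])

# Treat NaN as 0 in each operand without modifying a or b.
total = np.where(np.isnan(a), 0.0, a) + np.where(np.isnan(b), 0.0, b)

# Assumption: a position that is missing in both inputs stays missing.
total[np.isnan(a) & np.isnan(b)] = np.nan

print(total)
```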
