Substitute row in Numpy if a condition is met - Variation


I am still figuring out NumPy syntax! I have something that works, but there must be a more concise way to perform this task. In the example below, I replace selected rows of an array with new entries, where the condition depends on just one element of each row.
import numpy as np
big_array = np.random.randint(10, size=(5, 2)) # multi-dimension array
print(big_array)
bad_values = np.less_equal(big_array[:,0], 4) # condition value in one dimension
bad_rows = np.nonzero(bad_values)[0] # indexes to change, e.g. rows
print(f'these are the rows to replace {bad_rows}')
new_rows = np.random.randint(10, size=((bad_rows.size),2))+10 # smaller multi-dim array
np.put(big_array[:,0], bad_rows, new_rows[:,0]) # should be a single line to combine this
np.put(big_array[:,1], bad_rows, new_rows[:,1]) # with this?
print(big_array)
Sample output that I want might look like:
[[2 4]
 [5 9]
 [6 6]
 [6 7]
 [0 6]]
these are the rows to replace [0 4]
[[16 17]
 [ 5  9]
 [ 6  6]
 [ 6  7]
 [18 17]]
I don't know how to pass arguments with different dimensions to put. This seems like it should be a one-liner. (If I try where, I get broadcasting errors because the lengths differ.) What am I missing?
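One possible one-liner, sketched here under the assumption that new_rows has exactly one row per selected row: assign through the boolean mask directly. The seeded generator and the copy named original are only there to make the sketch reproducible and checkable; they are not part of the question's code.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for reproducibility
big_array = rng.integers(10, size=(5, 2))
original = big_array.copy()  # kept only to check the result afterwards

# Boolean mask: True for rows whose first column is <= 4.
bad_values = big_array[:, 0] <= 4

# One replacement row per True entry in the mask, shifted into 10..19.
new_rows = rng.integers(10, size=(bad_values.sum(), 2)) + 10

# Boolean-mask assignment replaces the selected rows in one statement.
big_array[bad_values] = new_rows
print(big_array)
```

The same assignment also works with the integer row indices from np.nonzero, i.e. big_array[bad_rows] = new_rows.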

Related

Numpy python - calculating sum of columns from irregular dimension

I have a multi-dimensional array of scores, and I need to get the sum of each column at the third level in Python. I am using NumPy to achieve this.
import numpy as np
Data is something like:
score_list = [
[[1,1,3], [1,2,5]],
[[2,7,5], [4,1,3]]
]
This should return:
[[3 8 8] [5 3 8]]
This works correctly using:
np_array = np.array(score_list)
sum_array = np_array.sum(axis=0)
print(sum_array)
However, if I have irregular shape like this:
score_list = [
[[1,1], [1,2,5]],
[[2,7], [4,1,3]]
]
I expect it to return:
[[3 8] [5 3 8]]
However, it raises a warning and the return value is:
[list([1, 1, 2, 7]) list([1, 2, 5, 4, 1, 3])]
How can I get expected result?

NumPy will try to cast the data into an n-dimensional array, which fails for ragged input. Instead, consider passing each group of sublists individually using zip.
score_list = [
[[1,1], [1,2,5]],
[[2,7], [4,1,3]]
]
import numpy as np
res = [np.sum(x,axis=0) for x in zip(*score_list)]
print(res)
[array([3, 8]), array([5, 3, 8])]
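The zip approach relies on the sublists at the same position having equal lengths across the outer rows. A minimal sketch that makes this assumption explicit is below; the function name sum_by_position is made up for illustration.

```python
import numpy as np

def sum_by_position(score_list):
    """Sum the sublists that share a position across the outer rows."""
    totals = []
    for group in zip(*score_list):
        # Every sublist in one group must have the same length to be summed.
        lengths = {len(item) for item in group}
        if len(lengths) != 1:
            raise ValueError(f"sublists at one position differ in length: {sorted(lengths)}")
        totals.append(np.sum(group, axis=0))
    return totals

score_list = [
    [[1, 1], [1, 2, 5]],
    [[2, 7], [4, 1, 3]],
]
result = sum_by_position(score_list)
print(result)
```

The result is a plain Python list of arrays, since arrays of different lengths cannot be packed into one regular NumPy array.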

Here is one solution. Keep in mind that it doesn't use numpy and will be very inefficient for larger matrices (but for smaller matrices it runs just fine).
# Create matrix
score_list = [
[[1,1,3], [1,2,5]],
[[2,7,5], [4,1,3]]
]
# Get each row
for i in range(1, len(score_list)):
    # Get each list within the row
    for j in range(len(score_list[i])):
        # Get each value in each list
        for k in range(len(score_list[i][j])):
            # Add current value to the same index
            # on the first row
            score_list[0][j][k] += score_list[i][j][k]
print(score_list[0])
There is bound to be a better solution, but this is a temporary fix for you :)
Edit: made more efficient.

A possible solution:
a = np.vstack([np.array(score_list[x], dtype='object')
               for x in range(len(score_list))])
[np.add(*[x for x in a[:, i]]) for i in range(a.shape[1])]
Another possible solution:
a = sum(score_list, [])
b = [a[x] for x in range(0,len(a),2)]
c = [a[x] for x in range(1,len(a),2)]
[np.add(x[0], x[1]) for x in [b, c]]
Output:
[array([3, 8]), array([5, 3, 8])]

numpy broadcast multiply on condition?

I have two arrays, one of shape arr1.shape = (1000,2) and the other of shape arr2.shape = (100,).
I'd like to somehow multiply arr1[:,1]*arr2 where arr1[:,0] == arr2.index so that I get a final shape of arr_out.shape = (1000,). The first column of arr1 is essentially an id where the following condition holds true: set(arr1[:,0]) == set(i for i in range(0,100)), i.e. there is always at least one value index of arr2 found in arr1[:,0].
I can't quite see how to do this in the numpy library but feel there should be a way using numpy multiply, if there was a way to configure the where condition correctly?
I considered perhaps a dummy index dimension for arr2 might help?
A toy example can be produced with the following code snippet
arr2_length = 100
arr1_length = 1000
arr1 = np.column_stack(
(np.random.randint(0,arr2_length,(arr1_length)),
np.random.rand(arr1_length))
)
arr2 = np.random.rand(arr2_length)
# Doesn't work
arr2_b = np.column_stack((
np.arange(arr2_length),
np.random.rand(arr2_length)
))
# np.multiply(arr1[:,1],arr2_b[:,1], where=(arr1[:,0]==arr2_b[:,0]))
One solution I found was to use a left join in pandas to broadcast the smaller array to an array of the same length and then multiply, as follows:
df = pd.DataFrame(arr1).set_index(0).join(pd.DataFrame(arr2))
arr_out = (df[0]*df[1]).values
But I'd really like to understand if there's a native numpy way of doing this since I feel using dataframe joins for a multiplication isn't a very readable solution and possibly suffers from excess memory overhead.
Thanks for your help.

I believe this does exactly what you want:
indices, values = arr1[:,0].astype(int), arr1[:,1]
arr_out = values * arr2[indices]
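To see why integer-array indexing is enough here, a small self-contained check can compare it against an explicit row-by-row loop. This is only a sketch: it rebuilds toy data in the shape described in the question and uses a seeded generator for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)
arr2 = rng.random(100)
# arr1: column 0 holds ids into arr2 (stored as floats), column 1 holds values.
arr1 = np.column_stack((rng.integers(0, 100, 1000), rng.random(1000)))

# arr2[indices] looks up one arr2 entry per row of arr1, giving shape (1000,).
indices = arr1[:, 0].astype(int)
arr_out = arr1[:, 1] * arr2[indices]

# Reference result computed one row at a time.
expected = np.array([value * arr2[int(idx)] for idx, value in arr1])
print(arr_out.shape, np.allclose(arr_out, expected))
```

The cast with astype(int) is needed because column_stack stores the ids as floats alongside the float values.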

Is this what you're looking for?
import numpy as np
arr1 = np.random.randint(1, 5, (10, 2))
arr2 = np.random.randint(1, 5, (5,))
arr2_sampled = arr2[arr1[:, 0]]
result = arr1[:, 1]*arr2_sampled
Output:
arr1 =
[[4 2]
 [3 3]
 [2 3]
 [3 1]
 [2 1]
 [2 4]
 [1 1]
 [4 2]
 [4 1]
 [3 4]]
arr2 =
[4 1 2 1 2]
arr2_sampled =
[2 1 2 1 2 2 1 2 2 1]
result =
[4 3 6 1 2 8 1 4 2 4]

Can't append numpy arrays after for loop?

After a for loop, I cannot combine the result of each iteration into a single array:
in:
for a in l:
    arr = np.asarray(a_lis)
    print(arr)
How can I append the three arrays printed above and return them as a single array, like this?
[[ 0.55133 0.58122 0.66129032 0.67562724 0.69354839 0.70609319
0.6702509 0.63799283 0.61827957 0.6155914 0.60842294 0.60215054
0.59946237 0.625448 0.60215054 0.60304659 0.59856631 0.59677419
0.59408602 0.61021505]
[ 0.58691756 0.6784946 0.64964158 0.66397849 0.67114695 0.66935484
0.67293907 0.66845878 0.65143369 0.640681 0.63530466 0.6344086
0.6281362 0.6281362 0.62634409 0.6281362 0.62903226 0.63799283
0.63709677 0.6978495]
[ 0.505018 0.53405018 0.59408602 0.65143369 0.66577061 0.66487455
0.65412186 0.64964158 0.64157706 0.63082437 0.62634409 0.6218638
0.62007168 0.6648746 0.62096774 0.62007168 0.62096774 0.62007168
0.62275986 0.81362 ]]
I tried to append as a list, using numpy's append, merge, and hstack. None of them worked. Any idea of how to get the previous output?

Use numpy.concatenate to join the arrays:
import numpy as np
a = np.array([[1, 2, 3, 4]])
b = np.array([[5, 6, 7, 8]])
arr = np.concatenate((a, b), axis=0)
print(arr)
# [[1 2 3 4]
# [5 6 7 8]]
Edit1: To do it inside the loop (as mentioned in the comment) you can use numpy.vstack:
import numpy as np
for i in range(0, 3):
    a = np.random.randint(0, 10, size=4)
    if i == 0:
        arr = a
    else:
        arr = np.vstack((arr, a))
print(arr)
# [[1 1 8 7]
# [2 4 9 1]
# [8 4 7 5]]
Edit2: Citing Iguananaut from the comments:
That said, using concatenate repeatedly can be costly. If you know the
size of the output in advance it's better to pre-allocate an array and
fill it as you go.
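A rough sketch of what that advice could look like, assuming the number of rows and columns is known before the loop starts. The second option, collecting rows in a list and stacking once, is a common alternative when the row count is not known up front. The sizes and fill values here are made up.

```python
import numpy as np

n_rows, n_cols = 3, 4
rng = np.random.default_rng(0)

# Option 1: pre-allocate the output, then fill one row per iteration.
arr = np.empty((n_rows, n_cols), dtype=np.int64)
for i in range(n_rows):
    arr[i] = rng.integers(0, 10, size=n_cols)

# Option 2: collect rows in a Python list and stack once after the loop.
rows = [np.full(n_cols, i) for i in range(n_rows)]
stacked = np.vstack(rows)
print(arr.shape, stacked.shape)
```

Both options copy the data into the final array only once, instead of reallocating on every iteration as repeated vstack or concatenate calls do.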

Extract n columns with highest sum in NumPy array

Imagine I have a NumPy matrix with 100 rows and 1000 columns.
How do I get a new matrix composed by the n columns that have the highest sums in the original matrix?

You can use np.argsort, as done by @NPE. Here's an example on two smaller arrays:
def nlargest_cols(a, n):
    return a[:, sorted(a.sum(axis=0).argsort()[-n:][::-1])]
# `a` is a 3x4 array with column sums getting
# larger from left to right.
a = np.arange(12).reshape(3,4)
# `b` is `a` rotated 2 turns.
b = np.rot90(a, 2)
print(nlargest_cols(a, 2))
# [[ 2 3]
# [ 6 7]
# [10 11]]
print(nlargest_cols(b, 3))
# [[11 10 9]
# [ 7 6 5]
# [ 3 2 1]]
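For wide matrices, a full sort of the column sums is not strictly needed. The sketch below swaps in np.argpartition, which only guarantees that the n largest sums end up in the last n positions. The function name nlargest_cols_partition is made up, and the sketch assumes 1 <= n <= number of columns; which column wins among tied sums is not defined.

```python
import numpy as np

def nlargest_cols_partition(a, n):
    """Return the n columns with the largest sums, in original column order."""
    sums = a.sum(axis=0)
    # Indices of the n largest sums, in no particular order.
    top = np.argpartition(sums, -n)[-n:]
    # Sort the indices so the columns keep their original left-to-right order.
    return a[:, np.sort(top)]

a = np.arange(12).reshape(3, 4)
result = nlargest_cols_partition(a, 2)
print(result)
```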

Python access all entries from one dimension in a multidimensional numpy array

I would like to manipulate the data of all entries of a complicated 3D numpy array. I want the entry at a given X-Y position from every subarray. I know Matlab can do something like that (with the : indicator meaning "everything"). I indicated that below with a[:][1][1], which basically means I want the entry in the second row and second column of every subarray. Is there a way to do this in Python?
import numpy
# Creating a dummy variable of the type I deal with (If this looks crappy sorry, the variable actually comes from the output of d = pyfits.getdata()):
a = []
for i in range(3):
    d = numpy.array([[i, 2*i], [3*i, 4*i]])
    a.append(d)
print(a)
# Pseudo code:
print('Second row, second column: ', a[:][1][1])
I expect a result like this:
[array([[ 0, 0],[ 0, 0]]),
array([[ 1, 2],[ 3, 4]]),
array([[ 2, 4],[ 6, 8]])]
Second row, second column: [0, 4, 8]

You can do this using slightly different syntax.
import numpy as np
a = np.arange(27).reshape(3,3,3) # Create a 3x3x3 3d array
print("3d Array:")
print(a)
print("Second Row, Second Column: ", a[:,1,1])
Output:
>>> 3d Array:
[[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]]
[[ 9 10 11]
[12 13 14]
[15 16 17]]
[[18 19 20]
[21 22 23]
[24 25 26]]]
>>> Second Row, Second Column: [ 4 13 22]
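The reason the chained form from the question does not behave like Matlab's colon can be shown on the question's own data. This is a sketch assuming all subarrays share one shape, so they can be stacked: on a Python list, a[:] is just a shallow copy, so a[:][1][1] is the same as a[1][1].

```python
import numpy as np

# A list of 2x2 arrays, built as in the question.
a = [np.array([[i, 2 * i], [3 * i, 4 * i]]) for i in range(3)]

# Chained indexing: a[:] copies the list, [1] picks the second array,
# and the final [1] picks that array's second row.
chained = a[:][1][1]

# Stack into one 3D array of shape (3, 2, 2) and index all axes at once.
stacked = np.stack(a)
picked = stacked[:, 1, 1]
print(chained, picked)
```

Here chained is the second row of the second array, while picked holds the (1, 1) entry of every subarray.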

Found the solution, thanks Divakar and eeScott:
import numpy as np
# Creating a dummy variable of the type I deal with (If this looks crappy sorry, the variable actually comes from the output of d = pyfits.getdata()):
a = []
for i in range(3):
    d = np.array([[i, 2*i], [3*i, 4*i]])
    a.append(d)
# print variable
print(np.array(a))
print('Second row, second column: ', np.array(a)[:, 1, 1])
# Alternative solution:
a = np.asarray(a)
print(a)
print('Second row, second column: ', a[:,1,1])
Result:
[[[0 0][0 0]]
[[1 2][3 4]]
[[2 4][6 8]]]
Second row, second column: [0 4 8]
