I have two 2D arrays like:
A = array([[4, 5, 6],
           [0, 7, 8],
           [0, 9, 0]])
B = array([[11, 12, 13],
           [14, 15, 16],
           [17, 18, 19]])
Wherever an element of A is 0, I want to set the element at the same position in B to 0, store the result in a new variable, and keep the original B unchanged.
Thanks in advance.
import numpy as np
A=np.array([[4,5,6],
[0,7,8],
[0,9,0]])
B =np.array([[11,12,13],
[14,15,16],
[17,18,19]])
C = B.copy()     # work on a copy so the original B is retained
C[A == 0] = 0    # zero out C wherever A is 0
The expression A == 0 returns a boolean array that is True at every position where A is 0. Indexing C with that boolean array selects exactly those positions, and the assignment sets them to 0. Because C is a copy, B is left unchanged.
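As a hedged alternative sketch, np.where can build the new array in one step without modifying B at all. The variable name C is only illustrative; the arrays are the ones from the question.

```python
import numpy as np

A = np.array([[4, 5, 6],
              [0, 7, 8],
              [0, 9, 0]])
B = np.array([[11, 12, 13],
              [14, 15, 16],
              [17, 18, 19]])

# Take 0 where A is 0, otherwise take the value from B.
# np.where returns a new array, so B itself is not changed.
C = np.where(A == 0, 0, B)

print(C)
# [[11 12 13]
#  [ 0 15 16]
#  [ 0 18  0]]
```

This avoids the explicit copy, at the cost of spelling out both branches of the selection.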
Related
I am new to numpy so any help is appreciated. Say I have two 1-0 masks A and B stored as 2D numpy arrays with the same shape.
Now I would like to do a logical operation that subtracts B from A:
A B Expected Result
1 1 0
1 0 1
0 1 0
0 0 0
But I am not sure that A = A - B works when a = 0 and b = 1, where a and b are elements of A and B respectively.
So I do something like
A = np.where(B == 0, A, 0)
But this is not very readable. Is there a better way to do that?
Because for logical or, I can do something like
A = A | B
Is there a similar operator that I can use for the subtraction?
The operation that you have described can be expressed as the following boolean operation:
A = A & (~B)
where & is the element-wise AND operation and ~ is the element-wise NOT operation.
For each pair of elements a and b in A and B respectively, we have
a = 1 and b = 1 => a & (~b) = 0
a = 1 and b = 0 => a & (~b) = 1
a = 0 and b = 1 => a & (~b) = 0
a = 0 and b = 0 => a & (~b) = 0
Intuitively, this can be understood as follows. We interpret each of the arrays A and B as a set containing only the indices at which the value is 1 (in your case A = {0, 1} and B = {0, 2}). The result we want is then the set of elements that are in A AND NOT in B.
Note that boolean algebra proves that any binary boolean operation can be achieved using AND, NOT, and OR gates (strictly, you need only NOT and either AND or OR), so naturally the operation you have specified is no exception.
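The truth table above can be checked directly. This is a minimal sketch that assumes the masks are stored with a boolean dtype; the name result is only illustrative.

```python
import numpy as np

# The four (a, b) combinations from the truth table, as boolean masks.
A = np.array([1, 1, 0, 0], dtype=bool)
B = np.array([1, 0, 1, 0], dtype=bool)

# Keep the positions that are set in A and not set in B.
result = A & ~B

print(result.astype(int))  # [0 1 0 0]
```

On boolean arrays ~ is a logical NOT, while on integer arrays it is a bitwise inversion, so converting integer masks with astype(bool) first keeps the intent explicit.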
Since subtraction is not supported for booleans, you need to cast at least one of the arrays to an integer dtype before subtracting. If you want to make sure that the result can't be negative, you can use numpy.maximum.
np.maximum(A.astype(int) - B, 0)
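A short sketch of the cast-and-clamp approach, assuming boolean masks that cover the four truth-table cases. The try/except is only there to show why the cast is needed; diff is an illustrative name.

```python
import numpy as np

A = np.array([True, True, False, False])
B = np.array([True, False, True, False])

# Subtracting two boolean arrays is rejected by numpy.
try:
    A - B
except TypeError as exc:
    print("bool - bool is rejected:", exc)

# Cast one operand to int, subtract, then clamp negative values to 0.
diff = np.maximum(A.astype(int) - B, 0)

print(diff)  # [0 1 0 0]
```

Without np.maximum the third element would be -1, which is the a = 0, b = 1 case the question worries about.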
I have a set like this:
N1 N2
0 a b
1 b f
2 c d
3 d a
4 e b
I want to get the indexes with the repeated values between the two columns, and the value itself.
From the example, I should get something like these shortlists:
(value, idx(N1), idx(N2))
(a, 0, 3)
(b, 1, 0)
(b, 1, 4)
(d, 3, 2)
I have been able to do it with two for-loops, but for a dataframe with half a million rows it took hours...
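Use a broadcasted comparison in numpy, then use argwhere to find the indices where the values are equal:
import numpy as np
import pandas as pd
# the example frame from the question
df = pd.DataFrame({'N1': ['a', 'b', 'c', 'd', 'e'], 'N2': ['b', 'f', 'd', 'a', 'b']})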
# make a broadcasted comparison
mat = df['N2'].values == df['N1'].values[:, None]
# find the indices where the values are True
where = np.argwhere(mat)
# select the values
values = df['N1'][where[:, 0]]
# create the DataFrame
res = pd.DataFrame(data=[[val, *row] for val, row in zip(values, where)], columns=['values', 'idx_N1', 'idx_N2'])
print(res)
Output
values idx_N1 idx_N2
0 a 0 3
1 b 1 0
2 b 1 4
3 d 3 2
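One caveat with the broadcasted comparison is that it materialises an n-by-n boolean matrix, which for half a million rows is on the order of 250 GB. A possible alternative, sketched here under the assumption that the frame has the default integer index, is to join the two columns on their values with pandas merge. The names left, right, and value are illustrative.

```python
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({'N1': ['a', 'b', 'c', 'd', 'e'],
                   'N2': ['b', 'f', 'd', 'a', 'b']})

# Turn each column into a (row index, value) table.
left = df['N1'].rename('value').rename_axis('idx_N1').reset_index()
right = df['N2'].rename('value').rename_axis('idx_N2').reset_index()

# An inner join on the value pairs every N1 row with every N2 row that matches it.
res = left.merge(right, on='value')[['value', 'idx_N1', 'idx_N2']]

print(res)
```

The join only produces rows for actual matches, so its memory use scales with the number of matching pairs rather than with the square of the number of rows.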
I have a nested array with some values. I have another array, and both arrays have the same length. I'd like to get a nested array of 1's and 0's as output, with a 1 wherever the value in the nested array equals the corresponding value in the second array.
I've taken a look at existing Stack Overflow questions but have been unable to construct an answer.
masks_list = []
for i in range(len(y_pred)):
    mask = (y_pred[i] == y_test.values[i]) * 1
    masks_list.append(mask)
masks = np.array(masks_list)
Essentially, that's the code I currently have and it works, but I think that it's probably not the most efficient way of doing it.
YPRED:
[[4 0 1 2 3 5 6]
[0 1 2 3 5 6 4]]
YTEST:
8 1
5 4
Masks:
[[0 0 1 0 0 0 0]
[0 0 0 0 0 0 1]]
Another solution with fewer lines of code. Note that set() requires hashable elements, so this assumes y_pred and y_test are flat lists, and it tests membership rather than position-by-position equality:
a = set(y_pred).intersection(y_test)
f = [1 if j in a else 0 for j in y_pred]
You can then compare the performance of the two approaches as follows:
from time import perf_counter as pc
t0 = pc()
a = set(y_pred).intersection(y_test)
f = [1 if j in a else 0 for j in y_pred]
t1 = pc() - t0
masks_list = []
t0 = pc()
for i in range(len(y_pred)):
    mask = (y_pred[i] == y_test[i]) * 1
    masks_list.append(mask)
t2 = pc() - t0
val = t1 - t2
If val is positive, the first solution is slower.
If you have a numpy array instead of a list, you can convert it with tolist():
type(y_pred)
>> numpy.ndarray
y_pred = y_pred.tolist()
type(y_pred)
>> list
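Idea (no explicit loop): compare the nested array against the other array with broadcasting. If y_test is a Series, y_test.values has shape (n,), so add a trailing axis to compare it against each row of y_pred:
masks = np.equal(y_pred, y_test.values[:, None]).astype(int)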
You can look at these too:
np.array_equal(A,B) # test if same shape, same elements values
np.array_equiv(A,B) # test if broadcastable shape, same elements values
np.allclose(A,B,...) # test if same shape, elements have close enough values
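To make the row-wise comparison concrete, here is a small sketch using the data from the question. It assumes y_test is a pandas Series, so y_test_values below is a stand-in for y_test.values and is not a name from the original post.

```python
import numpy as np

y_pred = np.array([[4, 0, 1, 2, 3, 5, 6],
                   [0, 1, 2, 3, 5, 6, 4]])

# Stand-in for y_test.values (assumed to be a 1D array with one value per row).
y_test_values = np.array([1, 4])

# Reshape to (2, 1) so each value is compared against its own row of y_pred.
masks = (y_pred == y_test_values[:, None]).astype(int)

print(masks)
# [[0 0 1 0 0 0 0]
#  [0 0 0 0 0 0 1]]
```

The comparison runs in a single vectorised call, so no Python-level loop or list of intermediate masks is needed.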
What is the most efficient numpy way to replace masked values in an array with the average of the closest unmasked values next to them?
eg:
a = np.array([2,6,4,8])
b = np.ma.masked_where(a>5,a)
print b
masked_array(data = [2 -- 4 --],
mask = [False True False True],
fill_value = 999999)
I want the masked values in b to be replaced with the average of values just next to them. Boundaries can repeat the closest unmasked value. So in this example, b will be the following:
b = [2,3,4,4]
The main reason for this question is to see whether this can be done efficiently without the use of an iterator.
You can use np.interp and np.where:
import numpy as np
a = np.array([2,6,4,8])
mymask = a>5
b = np.ma.masked_where(mymask,a)
print(b)
# [2 -- 4 --]
# interpolate at the masked positions from the unmasked positions and values
c = np.interp(np.where(mymask)[0],np.where(~mymask)[0],b[np.where(~mymask)[0]])
b[np.where(mymask)[0]] = c
print(b)
# [2 3 4 4]
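The same idea can be sketched on a plain array with boolean indexing instead of np.where. The names idx and filled are illustrative, and the result is kept as float so that averages that are not whole numbers are not truncated.

```python
import numpy as np

a = np.array([2, 6, 4, 8])
mask = a > 5                      # positions to replace
idx = np.arange(a.size)           # position of every element

filled = a.astype(float)
# Interpolate at the masked positions from the unmasked positions and values.
# Outside the unmasked range np.interp repeats the nearest end value.
filled[mask] = np.interp(idx[mask], idx[~mask], a[~mask])

print(filled)  # [2. 3. 4. 4.]
```

For an isolated masked value, linear interpolation equals the average of its two neighbours. For a run of several consecutive masked values it produces a linear ramp between the surrounding unmasked values rather than a single average.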
In NumPy I have a boolean array whose length equals the number of rows of a matrix. I want to run a calculation on the matrix rows that correspond to the True entries of the boolean array. How do I do this?
a: [true, false, true]
b: [[1,1,1],[2,2,2],[3,3,3]]
Say the function was to sum the elements of the sub arrays
index 0 is True: thus I add 3 to the summation (Starts at zero)
index 1 is False: thus summation remains at 3
index 2 is True: thus I add 9 to the summation for a total of 12
How do I do this (the boolean and summation part; I don't need how to add up each individual sub array)?
You can simply use your boolean array a to index into the rows of b, then take the sum of the resulting (2, 3) array:
import numpy as np
a = np.array([True, False, True])
b = np.array([[1,1,1],[2,2,2],[3,3,3]])
# index rows of b where a is True (i.e. the first and third row of b)
print(b[a])
# [[1 1 1]
# [3 3 3]]
# take the sum over all elements in these rows
print(b[a].sum())
# 12
It sounds like you would benefit from reading the numpy documentation on array indexing.
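If the per-row results are wanted as an intermediate step, as in the walk-through in the question (add 3, skip, add 9), one possible sketch is to compute the row sums first and then select them with the boolean array. The names row_sums, selected, and total are illustrative.

```python
import numpy as np

a = np.array([True, False, True])
b = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])

# Sum each sub-array (row) of b.
row_sums = b.sum(axis=1)

# Keep only the row sums where a is True, then add them up.
selected = row_sums[a]
total = selected.sum()

print(row_sums)  # [3 6 9]
print(selected)  # [3 9]
print(total)     # 12
```

The same pattern works for other per-row calculations: compute the value for every row, then index the result with the boolean array.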