import numpy as np
a = np.array([[0, 2, 0, 0], [0, 1, 3, 0], [0, 0, 10, 11], [0, 0, 1, 7]])
array([[ 0, 2, 0, 0],
[ 0, 1, 3, 0],
[ 0, 0, 10, 11],
[ 0, 0, 1, 7]])
Each row contains some zero entries. I need to assign a value to each of these zero entries, where the value is calculated as follows:
V = 0.1 * Si / Ni
where Si is the sum of row i
Ni is the number of zero entries in row i
I can calculate Si and Ni fairly easily:
S = np.sum(a, axis=1)
array([ 2, 4, 21, 8])
N = np.count_nonzero(a == 0, axis=1)
array([3, 2, 2, 2])
Now, V is calculated as:
V = 0.1 * S/N
array([0.06666667, 0.2 , 1.05 , 0.4 ])
But how do I assign V[i] to the zero entries in the i-th row? I'm expecting to get the following array a:
array([[ 0.06666667, 2, 0.06666667, 0.06666667],
[ 0.2, 1, 3, 0.2],
[ 1.05, 1.05, 10, 11],
[ 0.4, 0.4, 1, 7]])
Do I need some kind of selective broadcasting operation or assignment?
Use np.where:
np.where(a == 0, V.reshape(-1, 1), a)
array([[ 0.06666667, 2. , 0.06666667, 0.06666667],
[ 0.2 , 1. , 3. , 0.2 ],
[ 1.05 , 1.05 , 10. , 11. ],
[ 0.4 , 0.4 , 1. , 7. ]])
Here's a way using np.where:
z = a == 0
np.where(z, (0.1*a.sum(1)/z.sum(1))[:,None], a)
array([[ 0.06666667, 2. , 0.06666667, 0.06666667],
[ 0.2 , 1. , 3. , 0.2 ],
[ 1.05 , 1.05 , 10. , 11. ],
[ 0.4 , 0.4 , 1. , 7. ]])
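Alternatively, use a mask row by row (this prints each filled-in row rather than building a new array):
for i in range(V.size):
    # zeros in row i become V[i]; nonzero entries are kept
    print((a[i,:] == 0) * V[i] + a[i,:])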
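The same mask idea also works on the whole array at once, with no Python loop. A minimal sketch, assuming the a from the question (result is a name introduced here): boolean indexing visits the True entries in row-major order, so repeating each V[i] exactly N[i] times lines the values up with the zeros of row i. The copy is cast to float first, because writing 0.0667 into an integer array would truncate it to 0.

```python
import numpy as np

a = np.array([[0, 2, 0, 0], [0, 1, 3, 0], [0, 0, 10, 11], [0, 0, 1, 7]])

zero_mask = a == 0
S = a.sum(axis=1)           # row sums
N = zero_mask.sum(axis=1)   # number of zeros per row
V = 0.1 * S / N

# Work on a float copy: assigning V into the int array would truncate it.
result = a.astype(float)
# result[zero_mask] walks the zeros row by row, so row i owns N[i] slots,
# which np.repeat fills with N[i] copies of V[i].
result[zero_mask] = np.repeat(V, N)
print(result)
```

A row with no zeros would give N[i] = 0, so V[i] triggers a divide-by-zero warning, but np.repeat contributes no values for that row and the assignment still lines up.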
I have a numpy array. I want to normalize each row based on this formula:
x_norm = (x-x_min)/(x_max-x_min)
where x_min is the minimum of each row and x_max is the maximum of each row. Here is a simple example:
a = np.array(
[[0, 1 ,2],
[2, 4 ,7],
[6, 10,5]
])
and the desired output:
a = np.array([
[0, 0.5 ,1],
[0, 0.4 ,1],
[0.2, 1 ,0]
])
IIUC, you can use raw numpy operations:
x = np.array(
[[0, 1 ,2],
[2, 4 ,7],
[6, 10,5]
])
x_norm = ((x.T-x.min(1))/(x.max(1)-x.min(1))).T
# OR
x_norm = (x-x.min(1)[:,None])/(x.max(1)-x.min(1))[:,None]
output:
array([[0. , 0.5, 1. ],
[0. , 0.4, 1. ],
[0.2, 1. , 0. ]])
NB: if efficiency matters, save the result of x.min(1) in a variable, as it is used twice.
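Following that NB, a sketch that computes the row minimum only once; np.ptp ("peak to peak") returns max - min along an axis in a single call:

```python
import numpy as np

x = np.array([[0, 1, 2],
              [2, 4, 7],
              [6, 10, 5]])

x_min = x.min(1)[:, None]              # computed once, reused below
x_range = np.ptp(x, axis=1)[:, None]   # per-row max - min
x_norm = (x - x_min) / x_range
print(x_norm)
```

A constant row has x_max == x_min, so its range is 0 and the division produces nan (with a RuntimeWarning); guard for that if such rows can occur.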
You could use np.apply_along_axis:
a = np.array(
[[0, 1 ,2],
[2, 4 ,7],
[6, 10,5]
])
def scaler(x):
    return (x-x.min())/(x.max()-x.min())
np.apply_along_axis(scaler, axis=1, arr=a)
Output:
array([[0. , 0.5, 1. ],
[0. , 0.4, 1. ],
[0.2, 1. , 0. ]])
I have a list like this:
mylist = [
np.array([48.5, 38.0, 40.0]),
np.array([61.5, 52.5, 55.5, 46.5]),
np.array([35.5, 36.5]),
]
For each value in mylist, I want the index of the array it belongs to and its position within that array, alongside the value itself.
I am able to achieve the last column with np.concatenate(mylist) but don't know how to handle the rest efficiently.
expected = np.vstack(
(
np.array([0, 0, 0, 1, 1, 1, 1, 2, 2]),
np.array([0, 1, 2, 0, 1, 2, 3, 0, 1]),
np.array([48.5, 38.0, 40.0, 61.5, 52.5, 55.5, 46.5, 35.5, 36.5]),
)
).T
It can be read as follows: e.g. 38 is in the first array (index 0) and is the second element of that array (index 1).
This does what you ask, if this is really what you want.
import numpy as np
mylist = [
np.array([48.5, 38. , 40. ]),
np.array([61.5, 52.5, 55.5, 46.5 ]),
np.array([35.5, 36.5])]
a1 = []
a2 = []
for i,l in enumerate(mylist):
    a1.extend( [i] * len(l) )
    a2.extend( list(range(len(l))) )
final = np.array( [a1, a2, np.concatenate(mylist)] ).T
print(final)
Output:
[[ 0. 0. 48.5]
[ 0. 1. 38. ]
[ 0. 2. 40. ]
[ 1. 0. 61.5]
[ 1. 1. 52.5]
[ 1. 2. 55.5]
[ 1. 3. 46.5]
[ 2. 0. 35.5]
[ 2. 1. 36.5]]
You can use map with len to get the length of each array in mylist, then pass those lengths to np.repeat to get the "X" coordinate. Next, apply np.arange to each of the lengths to get the "Y" coordinate and concatenate the results with np.hstack. Finally, combine everything with np.column_stack.
lens = list(map(len, mylist))
idx0 = np.repeat(np.arange(len(mylist)), lens) # [0, 0, 0, 1, 1, 1, 1, 2, 2]
idx1 = np.hstack([np.arange(v) for v in lens]) # [0, 1, 2, 0, 1, 2, 3, 0, 1]
vals = np.hstack(mylist) # [48.5, 38. , 40. , 61.5, 52.5, 55.5, 46.5, 35.5, 36.5]
out = np.column_stack([idx0, idx1, vals])
print(out)
[[ 0. 0. 48.5]
[ 0. 1. 38. ]
[ 0. 2. 40. ]
[ 1. 0. 61.5]
[ 1. 1. 52.5]
[ 1. 2. 55.5]
[ 1. 3. 46.5]
[ 2. 0. 35.5]
[ 2. 1. 36.5]]
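The list comprehension for idx1 can also be replaced by pure array arithmetic. A sketch, assuming the same mylist: each element's position within its array equals its global position minus the offset at which that array starts in the concatenation (starts is a helper name introduced here):

```python
import numpy as np

mylist = [
    np.array([48.5, 38.0, 40.0]),
    np.array([61.5, 52.5, 55.5, 46.5]),
    np.array([35.5, 36.5]),
]

lens = np.array([len(arr) for arr in mylist])
total = lens.sum()
# Outer index: array k repeated len(mylist[k]) times.
idx0 = np.repeat(np.arange(len(mylist)), lens)
# Inner index: global position minus the start offset of the owning array.
starts = np.cumsum(lens) - lens          # [0, 3, 7]
idx1 = np.arange(total) - np.repeat(starts, lens)
out = np.column_stack([idx0, idx1, np.concatenate(mylist)])
print(out)
```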
I have a feature matrix that I want to row-normalize.
This is what I have done based on min-max scaling, but I am getting an error. Can anyone help me with this error?
a = np.random.randint(10, size=(4,5))
s=a.max(axis=1) - a.min(axis=1)
np.amax(a,axis=1)
print(s)
(a - a.min(axis=1))/(a.max(axis=1) - a.min(axis=1))
Output:
[7 6 4 5]
ValueError: operands could not be broadcast together with shapes (4,5) (4,)
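Try working with the transposed matrix. Broadcasting aligns shapes from the last axis, so the (4,) row minima are matched against the 5 columns of the (4, 5) array and fail; after transposing, b has shape (5, 4) and its per-column statistics of shape (4,) line up with the last axis: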
b = a.T
m = (b - b.min(axis=0)) / (b.max(axis=0) - b.min(axis=0))
m = m.T
>>> a
array([[2, 3, 2, 8, 3], # min=2 -> 0, max=8 -> 1
[3, 3, 9, 2, 1], # min=1 -> 0, max=9 -> 1
[1, 9, 8, 4, 7], # min=1 -> 0, max=9 -> 1
[6, 8, 7, 9, 4]]) # min=4 -> 0, max=9 -> 1
>>> m
array([[0. , 0.16666667, 0. , 1. , 0.16666667],
[0.25 , 0.25 , 1. , 0.125 , 0. ],
[0. , 1. , 0.875 , 0.375 , 0.75 ],
[0.4 , 0.8 , 0.6 , 1. , 0. ]])
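Another option that avoids the transposes entirely is keepdims=True, which keeps the reduced axis so the row statistics have shape (4, 1) and broadcast along each row of the (4, 5) array. A minimal sketch on the array shown above (row_min and row_max are names introduced here):

```python
import numpy as np

a = np.array([[2, 3, 2, 8, 3],
              [3, 3, 9, 2, 1],
              [1, 9, 8, 4, 7],
              [6, 8, 7, 9, 4]])

row_min = a.min(axis=1, keepdims=True)   # shape (4, 1)
row_max = a.max(axis=1, keepdims=True)   # shape (4, 1)
m = (a - row_min) / (row_max - row_min)
print(m)
```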
I have an alternative solution, but I am not sure if it is correct. It would be great if someone could comment on it.
def row_normalize(mf):
    # divides each row by its sum, so every row sums to 1
    # (sum normalization, not min-max scaling)
    row_sums = np.array(mf.sum(1))
    new_matrix = mf / row_sums[:, np.newaxis]
    return new_matrix
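One way to check the alternative is to look at what it produces on the same (4, 5) array. Dividing by the row sum makes every row sum to 1, which is a different normalization from min-max scaling: on this array the row minima do not become 0 and the maxima do not become 1. A quick sketch:

```python
import numpy as np

def row_normalize(mf):
    # Sum normalization: divide each row by its sum.
    row_sums = np.array(mf.sum(1))
    return mf / row_sums[:, np.newaxis]

a = np.array([[2, 3, 2, 8, 3],
              [3, 3, 9, 2, 1],
              [1, 9, 8, 4, 7],
              [6, 8, 7, 9, 4]])

by_sum = row_normalize(a)
print(by_sum.sum(axis=1))   # every row sums to 1 (up to rounding)
print(by_sum.min(axis=1))   # minima are not 0, as min-max scaling would give
print(by_sum.max(axis=1))   # maxima are not 1
```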
As an example, I have an array of branches and probabilities that looks like this:
paths = np.array([
[1, 0, 1.0],
[2, 0, 0.4],
[2, 1, 0.6],
[3, 1, 1.0],
[5, 1, 0.25],
[5, 2, 0.5],
[5, 4, 0.25],
[6, 0, 0.7],
[6, 5, 0.2],
[6, 2, 0.1]])
The columns are upper node, lower node, probability.
Here's a visual of the nodes:
              6
          /   |   \
         5    0    2
      /  |  \     / \
     1   2   4   0   1
     |  / \          |
     0 0   1         0
           |
           0
I want to be able to pick a starting node and output an array of the branches and cumulative probabilities, including all the duplicate branches. For example:
start_node = 5 should return:
array([
[5, 1, 0.25],
[5, 2, 0.5],
[5, 4, 0.25],
[1, 0, 0.25],
[2, 0, 0.2],
[2, 1, 0.3],
[1, 0, 0.3]])
Notice the [1, 0, x] branch is included twice, as it's fed by both the [5, 1, 0.25] branch and the [2, 1, 0.3] branch.
Here's some code I got working but it's far too slow for my application (millions of branches):
def branch(start_node, paths):
    output = paths[paths[:,0]==start_node]
    next_nodes = output
    while True:
        can_go_lower = np.isin(next_nodes[:,1], paths[:,0])
        if not np.any(can_go_lower): break
        next_nodes_checked = next_nodes[can_go_lower]
        next_nodes = np.empty([0,3])
        for nodes in next_nodes_checked:
            to_append = paths[paths[:,0]==nodes[1]]
            to_append[:,2] *= nodes[2]
            next_nodes = np.append(next_nodes, to_append, axis=0)
        output = np.append(output, next_nodes, axis=0)
    return output
The branches always go from higher to lower nodes, so getting caught in cycles isn't a concern. Vectorizing the for loop and avoiding the appends would be the best optimization, I think.
Instead of storing the graph in a numpy array, let's store it in a dict:
tree = {k: paths[paths[:, 0] == k] for k in np.unique(paths[:, 0])}
Make a set of the non-leaf nodes:
non_leaf_nodes = set(np.unique(paths[:, 0]))
Now to find the branch and cumulative probability:
def branch(start_node, tree, non_leaf_nodes):
    # entries are (upper_node, lower_node, cumulative_probability); seeded with the start node
    curr_nodes = [[start_node, start_node, 1.0]]
    output = []
    while True:
        next_nodes = []
        for _, node, prob in curr_nodes:
            if node not in non_leaf_nodes: continue
            subtree = tree[node]
            to_append = subtree.copy()
            to_append[:, 2] *= prob
            to_append = to_append.tolist()
            output += to_append
            next_nodes += to_append
        curr_nodes = next_nodes
        if len(curr_nodes) == 0:
            break
    return np.array(output)
Output:
>>> branch(5, tree, non_leaf_nodes)
array([
[5. , 1. , 0.25],
[5. , 2. , 0.5 ],
[5. , 4. , 0.25],
[1. , 0. , 0.25],
[2. , 0. , 0.2 ],
[2. , 1. , 0.3 ],
[1. , 0. , 0.3 ]])
I expect this to be faster. Let me know how it goes.
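For the original goal of vectorizing the per-node loop, here is a sketch of a different technique: a level-by-level (breadth-first) expansion in which each level is computed with array operations only. It assumes, as in the question, that paths has columns (upper node, lower node, probability) and contains no cycles; branch_vectorized is a hypothetical name, and the sketch has not been benchmarked.

```python
import numpy as np

def branch_vectorized(start_node, paths):
    # Sort edges by upper node (stable, so ties keep their input order);
    # the children of any node then form one contiguous block of rows.
    order = np.argsort(paths[:, 0], kind="stable")
    edges = paths[order]
    uppers = edges[:, 0]

    frontier = edges[uppers == start_node]
    levels = [frontier]
    while len(frontier):
        # For each frontier edge, find the block [lo, hi) of edges whose
        # upper node equals that edge's lower node.
        lo = np.searchsorted(uppers, frontier[:, 1], side="left")
        hi = np.searchsorted(uppers, frontier[:, 1], side="right")
        counts = hi - lo
        total = counts.sum()
        if total == 0:
            break  # every frontier edge ends in a leaf
        # Expand all blocks into explicit row indices without a Python loop.
        parent = np.repeat(np.arange(len(frontier)), counts)
        starts = np.cumsum(counts) - counts
        offsets = np.arange(total) - np.repeat(starts, counts)
        children = edges[np.repeat(lo, counts) + offsets]  # fancy indexing copies
        # Cumulative probability: child edge probability times the parent's.
        children[:, 2] *= frontier[parent, 2]
        frontier = children
        levels.append(frontier)
    return np.vstack(levels)

paths = np.array([
    [1, 0, 1.0], [2, 0, 0.4], [2, 1, 0.6], [3, 1, 1.0], [5, 1, 0.25],
    [5, 2, 0.5], [5, 4, 0.25], [6, 0, 0.7], [6, 5, 0.2], [6, 2, 0.1]])

print(branch_vectorized(5, paths))
```

The Python while loop runs once per tree level rather than once per branch. On the example data it returns the same rows, in the same order, as the expected output in the question.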
I have a list of numpy array indices which I created with argsort():
i =
[array([0, 1, 3, 2, 4], dtype=int64),
array([1, 3, 0, 2, 4], dtype=int64),
array([2, 4, 0, 1, 3], dtype=int64),
array([3, 1, 0, 2, 4], dtype=int64),
array([4, 2, 0, 3, 1], dtype=int64)]
This is the corresponding list of arrays with values:
v =
[array([0. , 0.19648367, 0.24237755, 0.200832 , 0.28600039]),
array([0.19648367, 0. , 0.25492185, 0.15594099, 0.31378135]),
array([0.24237755, 0.25492185, 0. , 0.25685254, 0.2042604 ]),
array([0.200832 , 0.15594099, 0.25685254, 0. , 0.29995309]),
array([0.28600039, 0.31378135, 0.2042604 , 0.29995309, 0. ])]
When I try to loop over the lists like this:
for line in i:
    v[line]
I get the error:
TypeError: only integer scalar arrays can be converted to a scalar index
But when I try to access them individually like this:
v[0][i[0]]
It works and outputs the values in v[0] in correct order like this:
array([0. , 0.19648367, 0.200832 , 0.24237755, 0.28600039])
I want the arrays in v ordered from the smallest value to biggest.
What am I doing wrong?
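The error comes from indexing the Python list v with an index array: a list only accepts integer (or slice) indices, which is why v[0][i[0]] works but v[line] does not. This is all easier (and faster) if you use a multi-dimensional numpy array instead of a Python list of numpy arrays. Then you have all of numpy's tools at your disposal and can avoid slow loops. For example, you can use np.take_along_axis: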
import numpy as np
from numpy import array
i = np.array([
[0, 1, 3, 2, 4],
[1, 3, 0, 2, 4],
[2, 4, 0, 1, 3],
[3, 1, 0, 2, 4],
[4, 2, 0, 3, 1]])
v = array([
[0., 0.19648367, 0.24237755, 0.200832 , 0.28600039],
[0.19648367, 0. , 0.25492185, 0.15594099, 0.31378135],
[0.24237755, 0.25492185, 0. , 0.25685254, 0.2042604 ],
[0.200832 , 0.15594099, 0.25685254, 0. , 0.29995309],
[0.28600039, 0.31378135, 0.2042604 , 0.29995309, 0. ]]
)
np.take_along_axis(v,i, 1)
result:
array([[0. , 0.19648367, 0.200832 , 0.24237755, 0.28600039],
[0. , 0.15594099, 0.19648367, 0.25492185, 0.31378135],
[0. , 0.2042604 , 0.24237755, 0.25492185, 0.25685254],
[0. , 0.15594099, 0.200832 , 0.25685254, 0.29995309],
[0. , 0.2042604 , 0.28600039, 0.29995309, 0.31378135]])
Loop through the rows of i with enumerate, and use each row's index to select the matching row of v:
import numpy as np
i = np.array([[0, 1, 3, 2, 4], [1, 3, 0, 2, 4], [2, 4, 0, 1, 3], [3, 1, 0, 2, 4], [4, 2, 0, 3, 1]])
v = np.array([[0. , 0.19648367, 0.24237755, 0.200832 , 0.28600039],
[0.19648367, 0. , 0.25492185, 0.15594099, 0.31378135],
[0.24237755, 0.25492185, 0. , 0.25685254, 0.2042604 ],
[0.200832 , 0.15594099, 0.25685254, 0. , 0.29995309],
[0.28600039, 0.31378135, 0.2042604 , 0.29995309, 0. ]] )
# you can rearrange each line of v by using indices in each row of i
for index, line in enumerate(i):
    print(v[index][line])
Output:
[0. 0.19648367 0.200832 0.24237755 0.28600039]
[0. 0.15594099 0.19648367 0.25492185 0.31378135]
[0. 0.2042604 0.24237755 0.25492185 0.25685254]
[0. 0.15594099 0.200832 0.25685254 0.29995309]
[0. 0.2042604 0.28600039 0.29995309 0.31378135]
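If v and i have to stay Python lists (for instance, if the rows had different lengths), the loop from the question works once each row of v is paired with its own index array, e.g. with zip. And since the indices come from argsort, np.sort returns the same ordering directly. A sketch using the first two rows of the question's data:

```python
import numpy as np

v = [np.array([0.0, 0.19648367, 0.24237755, 0.200832, 0.28600039]),
     np.array([0.19648367, 0.0, 0.25492185, 0.15594099, 0.31378135])]
i = [np.argsort(row) for row in v]

# v is a Python list: take one row with an integer first, then apply
# that row's index array to it.
sorted_rows = [row[idx] for row, idx in zip(v, i)]

# When the indices come from argsort anyway, np.sort gives the same result.
same = [np.sort(row) for row in v]
print(sorted_rows)
```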