How to row-normalize a feature matrix? Broadcasting error - python

I have a feature matrix that I want to row-normalize.
This is what I have done, based on min-max scaling, but I am getting an error. Can anyone help me with this error?
import numpy as np
a = np.random.randint(10, size=(4,5))
s=a.max(axis=1) - a.min(axis=1)
np.amax(a,axis=1)
print(s)
(a - a.min(axis=1))/(a.max(axis=1) - a.min(axis=1))
Output:
[7 6 4 5]
      4 print(s)
      5
----> 6 (a - a.min(axis=1))/(a.max(axis=1) - a.min(axis=1))
ValueError: operands could not be broadcast together with shapes (4,5) (4,)

Try working with the transposed matrix:
b = a.T
m = (b - b.min(axis=0)) / (b.max(axis=0) - b.min(axis=0))
m = m.T
>>> a
array([[2, 3, 2, 8, 3], # min=2 -> 0, max=8 -> 1
[3, 3, 9, 2, 1], # min=1 -> 0, max=9 -> 1
[1, 9, 8, 4, 7], # min=1 -> 0, max=9 -> 1
[6, 8, 7, 9, 4]]) # min=4 -> 0, max=9 -> 1
>>> m
array([[0. , 0.16666667, 0. , 1. , 0.16666667],
[0.25 , 0.25 , 1. , 0.125 , 0. ],
[0. , 1. , 0.875 , 0.375 , 0.75 ],
[0.4 , 0.8 , 0.6 , 1. , 0. ]])
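Why the transpose trick works: NumPy broadcasting aligns shapes starting from the trailing dimension, so a (4,) vector is matched against the 5 columns of a (4, 5) array and fails, whereas after transposing to (5, 4) it is matched against 4 columns and succeeds. Below is a minimal sketch of an equivalent approach that avoids the double transpose by passing keepdims=True, so the per-row min and max keep shape (4, 1) and broadcast across the columns. The helper name minmax_rows is hypothetical.

```python
import numpy as np

def minmax_rows(a):
    """Min-max scale each row of a 2-D array to the range [0, 1]."""
    a = np.asarray(a, dtype=float)
    row_min = a.min(axis=1, keepdims=True)  # shape (n_rows, 1)
    row_max = a.max(axis=1, keepdims=True)  # shape (n_rows, 1)
    # (n_rows, n_cols) op (n_rows, 1) broadcasts along the columns
    return (a - row_min) / (row_max - row_min)

a = np.array([[2, 3, 2, 8, 3],
              [3, 3, 9, 2, 1],
              [1, 9, 8, 4, 7],
              [6, 8, 7, 9, 4]])
print(minmax_rows(a))
```

A row whose values are all equal has max == min, so this sketch divides 0 by 0 for it (NumPy emits a RuntimeWarning and returns nan); add a guard if your data can contain constant rows.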

I have an alternative solution, but I am not sure if it is correct. It would be great if someone could comment on it.
def row_normalize(mf):
    # Divides each row by its sum (note: this is not min-max scaling)
    row_sums = np.array(mf.sum(1))
    new_matrix = mf / row_sums[:, np.newaxis]
    return new_matrix
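For comparison: this alternative computes something different from min-max scaling. It divides each row by its own sum, so for non-negative data every row ends up summing to 1, but the row minimum is not mapped to 0 nor the maximum to 1. A quick check, reusing row_normalize from above on a small example array chosen for illustration:

```python
import numpy as np

def row_normalize(mf):
    # Divide every row by its sum; [:, np.newaxis] turns the (n_rows,)
    # vector of sums into (n_rows, 1) so it broadcasts across columns
    row_sums = np.array(mf.sum(1))
    return mf / row_sums[:, np.newaxis]

a = np.array([[2, 3, 2, 8, 3],
              [6, 8, 7, 9, 4]])
r = row_normalize(a)
print(r.sum(axis=1))  # each row sums to 1 (up to floating-point rounding)
print(r.min(axis=1))  # but the row minimum is not 0, unlike min-max scaling
```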

Related

normalize the rows of numpy array based on a custom function

I have a numpy array. I want to normalize each row based on this formula:
x_norm = (x-x_min)/(x_max-x_min)
where x_min is the minimum of each row and x_max is the maximum of each row. Here is a simple example:
a = np.array(
[[0, 1 ,2],
[2, 4 ,7],
[6, 10,5]
])
and desired output:
a = np.array([
[0, 0.5 ,1],
[0, 0.4 ,1],
[0.2, 1 ,0]
])
Thank you
IIUC, you can use raw numpy operations:
x = np.array(
[[0, 1 ,2],
[2, 4 ,7],
[6, 10,5]
])
x_norm = ((x.T-x.min(1))/(x.max(1)-x.min(1))).T
# OR
x_norm = (x-x.min(1)[:,None])/(x.max(1)-x.min(1))[:,None]
output:
array([[0. , 0.5, 1. ],
[0. , 0.4, 1. ],
[0.2, 1. , 0. ]])
NB: if efficiency matters, save the result of x.min(1) in a variable, as it is used twice.
You could use np.apply_along_axis:
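Following the NB, a minimal sketch that computes the per-row minimum once, reshapes it to a (3, 1) column with [:, None], and reuses it in both the numerator and the denominator:

```python
import numpy as np

x = np.array([[0, 1, 2],
              [2, 4, 7],
              [6, 10, 5]])

x_min = x.min(1)[:, None]  # computed once, shape (3, 1)
x_max = x.max(1)[:, None]  # shape (3, 1)
x_norm = (x - x_min) / (x_max - x_min)
print(x_norm)
```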
a = np.array(
[[0, 1 ,2],
[2, 4 ,7],
[6, 10,5]
])
def scaler(x):
    return (x-x.min())/(x.max()-x.min())
np.apply_along_axis(scaler, axis=1, arr=a)
Output:
array([[0. , 0.5, 1. ],
[0. , 0.4, 1. ],
[0.2, 1. , 0. ]])
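np.apply_along_axis calls scaler once per row in a Python-level loop, so it is convenient but usually slower than the pure-NumPy expressions from the previous answer on large arrays. A short sketch confirming that both approaches give the same result on this example:

```python
import numpy as np

a = np.array([[0, 1, 2],
              [2, 4, 7],
              [6, 10, 5]])

def scaler(x):
    # x is a single 1-D row; scale it to [0, 1]
    return (x - x.min()) / (x.max() - x.min())

looped = np.apply_along_axis(scaler, axis=1, arr=a)
vectorized = (a - a.min(1)[:, None]) / (a.max(1) - a.min(1))[:, None]
print(np.allclose(looped, vectorized))  # True
```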

apply a function to divide an array by a vector

I have to apply a function that divides one value by another to every row of a numpy array.
Here is the function:
def myfunc(a, b):
    return (a/b)
My numpy ndarray looks like this, and it represents the "a" value:
[[ 1 2 3 4 ]
[ 5 6 7 8 ]]
and my list, which is my b value, looks like this:
[1, 2, 3, 4]
The result I want is:
[[1 1 1 1]
[5 3 2.33 2]]
I can't use a loop for this, so I tried np.vectorize. Here is my code:
test = np.vectorize(myfunc)
test(a, b)
This returns:
array([[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[
[1]
<NDArray 1 #cpu(0)>,
[1]
<NDArray 1 #cpu(0)>,
[1]
<NDArray 1 #cpu(0)>,
[1]
<NDArray 1 #cpu(0)>]]]]]]]]]]]]]]]]]]]]]]]]]]]]]],
...
So every cell is divided 4 times by the first value of b.
For some reason, my code does not work with the ndarray. When I try it with a normal array, it works. Example:
a = np.array([[1, 2, 3, 4], [5,6,7,8]])
b = [1,2,3,4]
def my(coord, shape):
    return (coord/shape)
myfunc = np.vectorize(my)
myfunc(a, b)
result:
array([[1. , 1. , 1. , 1. ],
[5. , 3. , 2.33333333, 2. ]])
Do you know what I can do? I really don't know how I ended up with the ndarray, or why I can't get the right result.
Why do you need np.vectorize?
In [511]: a = np.array([[1, 2, 3, 4], [5,6,7,8]])
...: b = [1,2,3,4]
In [512]: b = np.array(b)
In [513]: a.shape
Out[513]: (2, 4)
In [514]: b.shape
Out[514]: (4,)
In [515]: a / b
Out[515]:
array([[1. , 1. , 1. , 1. ],
[5. , 3. , 2.33333333, 2. ]])
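Plain division works here because broadcasting aligns b's shape (4,) with the last axis of a (shape (2, 4)), so column j is divided by b[j]. If you instead want to divide each row i by its own value, the divisor needs shape (2, 1). A sketch of both cases, where the per-row divisor c is a hypothetical example:

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])
b = np.array([1, 2, 3, 4])  # one divisor per column, shape (4,)
c = np.array([1, 5])        # hypothetical: one divisor per row, shape (2,)

per_column = a / b          # (2, 4) / (4,)   -> column j divided by b[j]
per_row = a / c[:, None]    # (2, 4) / (2, 1) -> row i divided by c[i]
print(per_column)
print(per_row)
```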

Set numpy array elements to zero for each row's smallest 2 elements [duplicate]

This question already has an answer here:
Fill a matrix from a matrix of indices
(1 answer)
Closed 5 years ago.
For example
E =
array([[ 10. , 2.38761596, 7.00090613, 4.51495754],
[ 2.38761596, 10. , 2.80035826, 1. ],
[ 7.00090613, 2.80035826, 10. , 5.95109207],
[ 4.51495754, 1. , 5.95109207, 10. ]])
The indices of the two smallest elements in each row can be obtained from argsort:
IndexSortE = np.argsort(E)
smallest2 = IndexSortE[:,0:2]
smallest2
array([[1, 3],
[3, 0],
[1, 3],
[1, 0]])
Now how do I get E0 like this?
E0 =
array([[ 10. , 0.00000000, 7.00090613, 0.00000000],
[ 0.00000000, 10. , 2.80035826, 0.00000000],
[ 7.00090613, 0.00000000, 10. , 0.00000000],
[ 0.00000000, 0.00000000, 5.95109207, 10. ]])
Thanks
You can create another array of row indices; then take advantage of advanced indexing to modify the corresponding values:
E[np.arange(E.shape[0])[:,None], smallest2] = 0
E
#array([[ 10. , 0. , 7.00090613, 0. ],
# [ 0. , 10. , 2.80035826, 0. ],
# [ 7.00090613, 0. , 10. , 0. ],
# [ 0. , 0. , 5.95109207, 10. ]])
To explain further, use np.broadcast_arrays to see how these indices are broadcast:
np.broadcast_arrays(np.arange(E.shape[0])[:,None], smallest2)
# [array([[0, 0],
# [1, 1],
# [2, 2],
# [3, 3]]), array([[1, 3],
# [3, 0],
# [1, 3],
# [1, 0]])]
This gives a list of two arrays: the first holds the row indices, while the second holds the column indices. According to the advanced indexing rules, this pair selects the elements at
(0, 1), (0, 3),
(1, 3), (1, 0),
...
etc.
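An alternative sketch, swapping in np.argpartition and np.put_along_axis (the latter needs a reasonably recent NumPy): argpartition finds the two smallest entries per row without fully sorting each row, and put_along_axis pairs each row with its own column indices, so the row-index array does not have to be built by hand. This version writes into a copy so E itself is left unchanged:

```python
import numpy as np

E = np.array([[10.0, 2.38761596, 7.00090613, 4.51495754],
              [2.38761596, 10.0, 2.80035826, 1.0],
              [7.00090613, 2.80035826, 10.0, 5.95109207],
              [4.51495754, 1.0, 5.95109207, 10.0]])

# Column indices of the 2 smallest values in each row
# (the order within each pair is not guaranteed, which does not matter here)
smallest2 = np.argpartition(E, 1, axis=1)[:, :2]

E0 = E.copy()
np.put_along_axis(E0, smallest2, 0.0, axis=1)  # zero those entries row by row
print(E0)
```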

Addition of every two columns

I would like to calculate the sum of every two columns in a matrix (the sum of columns 0 and 1, of columns 2 and 3, ...).
I tried nested "for" loops, but I never get the right result.
For example:
c = np.array([[0,0,0.25,0.5],[0,0.5,0.25,0],[0.5,0,0,0]],float)
freq=np.zeros(6,float).reshape((3, 2))
#I calculate the sum between the first and second column, and between the third and the fourth column
for i in range(0,4,2):
    for j in range(1,4,2):
        for p in range(0,2):
            freq[:,p]=(c[:,i]+c[:,j])
But the result is:
print freq
array([[ 0.75, 0.75],
[ 0.25, 0.25],
[ 0. , 0. ]])
The correct result should be (0., 0.5, 0.5) and (0.75, 0.25, 0.), so I think the problem is in the nested "for" loops.
Does anyone know how I can calculate the sum of every two columns? My real matrix has 400 columns.
You can simply reshape to split the last dimension into two dimensions, with the last dimension of length 2 and then sum along it, like so -
freq = c.reshape(c.shape[0],-1,2).sum(2).T
Reshaping only creates a view into the array (no data is copied), so effectively the only real work here is the summation, which makes this approach efficient.
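The view claim can be checked directly: for a C-contiguous array such as c, the reshaped array shares memory with the original, and summing over the new last axis gives one column per pair of original columns. A minimal sketch:

```python
import numpy as np

c = np.array([[0, 0, 0.25, 0.5],
              [0, 0.5, 0.25, 0],
              [0.5, 0, 0, 0]], float)

pairs = c.reshape(c.shape[0], -1, 2)  # shape (3, 2, 2): rows, column pairs, pair members
print(np.shares_memory(c, pairs))     # True: reshape returned a view, not a copy
print(pairs.sum(2))                   # shape (3, 2): one column per pair
```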
Sample run -
In [17]: c
Out[17]:
array([[ 0. , 0. , 0.25, 0.5 ],
[ 0. , 0.5 , 0.25, 0. ],
[ 0.5 , 0. , 0. , 0. ]])
In [18]: c.reshape(c.shape[0],-1,2).sum(2).T
Out[18]:
array([[ 0. , 0.5 , 0.5 ],
[ 0.75, 0.25, 0. ]])
Add the slices c[:, ::2] and c[:, 1::2]:
In [62]: c
Out[62]:
array([[ 0. , 0. , 0.25, 0.5 ],
[ 0. , 0.5 , 0.25, 0. ],
[ 0.5 , 0. , 0. , 0. ]])
In [63]: c[:, ::2] + c[:, 1::2]
Out[63]:
array([[ 0. , 0.75],
[ 0.5 , 0.25],
[ 0.5 , 0. ]])
Here is one way using np.split():
In [36]: np.array(np.split(c, np.arange(2, c.shape[1], 2), axis=1)).sum(axis=-1)
Out[36]:
array([[ 0. , 0.5 , 0.5 ],
[ 0.75, 0.25, 0. ]])
Or, more generally, a function that also works for an odd number of columns:
In [87]: def vertical_adder(array):
    ...:     return np.column_stack([np.sum(arr, axis=1) for arr in np.array_split(array, np.arange(2, array.shape[1], 2), axis=1)])
    ...:
In [88]: vertical_adder(c)
Out[88]:
array([[ 0. , 0.75],
[ 0.5 , 0.25],
[ 0.5 , 0. ]])
In [94]: a
Out[94]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
In [95]: vertical_adder(a)
Out[95]:
array([[ 1, 5, 4],
[11, 15, 9],
[21, 25, 14]])
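Another option that handles an odd number of columns is np.add.reduceat, which sums consecutive slices starting at the given column indices; the last slice simply runs to the end of each row. A sketch on the same 3x5 array:

```python
import numpy as np

a = np.arange(15).reshape(3, 5)
starts = np.arange(0, a.shape[1], 2)       # [0, 2, 4]: first column of each group
sums = np.add.reduceat(a, starts, axis=1)  # sums columns 0-1, 2-3, and column 4 alone
print(sums)
```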

Normalizing rows of a matrix python

Given a 2-dimensional array in python, I would like to normalize each row with the following norms:
Norm 1: L_1
Norm 2: L_2
Norm Inf: L_Inf
I have started this code:
from numpy import linalg as LA
X = np.array([[1, 2, 3, 6],
[4, 5, 6, 5],
[1, 2, 5, 5],
[4, 5,10,25],
[5, 2,10,25]])
print X.shape
x = np.array([LA.norm(v,ord=1) for v in X])
print x
Output:
(5, 4) # array dimension
[12 20 13 44 42] # L1 on each Row
How can I modify the code so that, WITHOUT using a loop, I directly get the rows of the matrix normalized (given the norm values above)?
I tried:
l1 = X.sum(axis=1)
print l1
print X/l1.reshape(5,1)
[12 20 13 44 42]
[[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]]
but the output is zero.
This is the L₁ norm:
>>> np.abs(X).sum(axis=1)
array([12, 20, 13, 44, 42])
This is the L₂ norm:
>>> np.sqrt((X * X).sum(axis=1))
array([ 7.07106781, 10.09950494, 7.41619849, 27.67670501, 27.45906044])
This is the L∞ norm:
>>> np.abs(X).max(axis=1)
array([ 6, 6, 5, 25, 25])
To normalise rows, just divide by the norm. For example, using L₂ normalisation:
>>> l2norm = np.sqrt((X * X).sum(axis=1))
>>> X / l2norm.reshape(5,1)
array([[ 0.14142136, 0.28284271, 0.42426407, 0.84852814],
[ 0.39605902, 0.49507377, 0.59408853, 0.49507377],
[ 0.13483997, 0.26967994, 0.67419986, 0.67419986],
[ 0.14452587, 0.18065734, 0.36131469, 0.90328672],
[ 0.18208926, 0.0728357 , 0.36417852, 0.9104463 ]])
>>> np.sqrt((_ * _).sum(axis=1))
array([ 1., 1., 1., 1., 1.])
More direct is the norm function in numpy.linalg, if your NumPy version supports its axis argument:
>>> from numpy.linalg import norm
>>> norm(X, axis=1, ord=1) # L-1 norm
array([12, 20, 13, 44, 42])
>>> norm(X, axis=1, ord=2) # L-2 norm
array([ 7.07106781, 10.09950494, 7.41619849, 27.67670501, 27.45906044])
>>> norm(X, axis=1, ord=np.inf) # L-∞ norm
array([ 6, 6, 5, 25, 25])
(after OP edit): You saw zero values because / is an integer division in Python 2.x. Either upgrade to Python 3, or change dtype to float to avoid that integer division:
>>> linfnorm = norm(X, axis=1, ord=np.inf)
>>> X.astype(float) / linfnorm[:,None]
array([[ 0.16666667, 0.33333333, 0.5 , 1. ],
[ 0.66666667, 0.83333333, 1. , 0.83333333],
[ 0.2 , 0.4 , 1. , 1. ],
[ 0.16 , 0.2 , 0.4 , 1. ],
[ 0.2 , 0.08 , 0.4 , 1. ]])
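Putting the pieces together, a sketch of a helper (the name normalize_rows is hypothetical) that normalizes every row by the chosen norm in one call. Passing keepdims=True to np.linalg.norm returns an (n_rows, 1) column that broadcasts directly, and converting to float up front avoids the Python 2 integer-division problem:

```python
import numpy as np

def normalize_rows(X, order=2):
    """Divide each row of X by its norm (order = 1, 2, or np.inf)."""
    X = np.asarray(X, dtype=float)  # float dtype: no integer division on Python 2
    norms = np.linalg.norm(X, ord=order, axis=1, keepdims=True)  # shape (n_rows, 1)
    return X / norms

X = np.array([[1, 2, 3, 6],
              [4, 5, 6, 5],
              [1, 2, 5, 5],
              [4, 5, 10, 25],
              [5, 2, 10, 25]])

for order in (1, 2, np.inf):
    Xn = normalize_rows(X, order)
    # each normalized row now has norm 1 under the same order
    print(order, np.linalg.norm(Xn, ord=order, axis=1))
```

An all-zero row has norm 0, so this sketch would return nan for it; guard against that if such rows can occur.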
You can pass the axis=1 parameter:
In [58]: LA.norm(X, axis=1, ord=1)
Out[58]: array([12, 20, 13, 44, 42])
In [59]: LA.norm(X, axis=1, ord=2)
Out[59]: array([ 7.07106781, 10.09950494, 7.41619849, 27.67670501, 27.45906044])
