Normalizing rows of a matrix python

Given a 2-dimensional array in Python, I would like to normalize each row with the following norms:
Norm 1: L_1
Norm 2: L_2
Norm Inf: L_Inf
I have started this code:
import numpy as np
from numpy import linalg as LA
X = np.array([[1, 2, 3, 6],
              [4, 5, 6, 5],
              [1, 2, 5, 5],
              [4, 5, 10, 25],
              [5, 2, 10, 25]])
print(X.shape)
x = np.array([LA.norm(v, ord=1) for v in X])
print(x)
Output:
(5, 4) # array dimension
[12 20 13 44 42] # L1 on each Row
How can I modify the code so that, WITHOUT using a loop, the rows of the matrix are normalized directly (using the norm values above)?
I tried:
l1 = X.sum(axis=1)
print(l1)
print(X/l1.reshape(5,1))
[12 20 13 44 42]
[[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]]
but the output is all zeros.

This is the L₁ norm:
>>> np.abs(X).sum(axis=1)
array([12, 20, 13, 44, 42])
This is the L₂ norm:
>>> np.sqrt((X * X).sum(axis=1))
array([ 7.07106781, 10.09950494, 7.41619849, 27.67670501, 27.45906044])
This is the L∞ norm:
>>> np.abs(X).max(axis=1)
array([ 6, 6, 5, 25, 25])
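All three are special cases of the vector p-norm, (sum of |x_i|^p)^(1/p), with L∞ as the limiting case max |x_i|. As a minimal sketch, assuming the same X as above, the row-wise p-norm can be computed straight from that definition and checked against numpy.linalg.norm (row_pnorm is a hypothetical helper name):

```python
import numpy as np

X = np.array([[1, 2, 3, 6],
              [4, 5, 6, 5],
              [1, 2, 5, 5],
              [4, 5, 10, 25],
              [5, 2, 10, 25]])

def row_pnorm(X, p):
    """Hypothetical helper: L_p norm of each row, from the definition."""
    A = np.abs(np.asarray(X, dtype=float))
    if np.isinf(p):
        return A.max(axis=1)                    # L-inf: largest absolute entry
    return (A ** p).sum(axis=1) ** (1.0 / p)    # (sum |x_i|^p)^(1/p)

for p in (1, 2, 3, np.inf):
    # linalg.norm with axis=1 applies the same vector norm to each row
    assert np.allclose(row_pnorm(X, p), np.linalg.norm(X, ord=p, axis=1))
```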
To normalise rows, just divide by the norm. For example, using L₂ normalisation:
>>> l2norm = np.sqrt((X * X).sum(axis=1))
>>> X / l2norm.reshape(5,1)
array([[ 0.14142136, 0.28284271, 0.42426407, 0.84852814],
[ 0.39605902, 0.49507377, 0.59408853, 0.49507377],
[ 0.13483997, 0.26967994, 0.67419986, 0.67419986],
[ 0.14452587, 0.18065734, 0.36131469, 0.90328672],
[ 0.18208926, 0.0728357 , 0.36417852, 0.9104463 ]])
>>> np.sqrt((_ * _).sum(axis=1))
array([ 1., 1., 1., 1., 1.])
More direct is the norm function in numpy.linalg, if your NumPy version supports its axis argument (added in NumPy 1.8):
>>> from numpy.linalg import norm
>>> norm(X, axis=1, ord=1) # L-1 norm
array([12, 20, 13, 44, 42])
>>> norm(X, axis=1, ord=2) # L-2 norm
array([ 7.07106781, 10.09950494, 7.41619849, 27.67670501, 27.45906044])
>>> norm(X, axis=1, ord=np.inf) # L-∞ norm
array([ 6, 6, 5, 25, 25])
(After the OP's edit:) You saw zeros because / performs integer division on integer arrays in Python 2.x. Either upgrade to Python 3 (or put from __future__ import division at the top of the module), or convert to a float dtype to avoid the integer division:
>>> linfnorm = norm(X, axis=1, ord=np.inf)
>>> X.astype(float) / linfnorm[:,None]
array([[ 0.16666667, 0.33333333, 0.5 , 1. ],
[ 0.66666667, 0.83333333, 1. , 0.83333333],
[ 0.2 , 0.4 , 1. , 1. ],
[ 0.16 , 0.2 , 0.4 , 1. ],
[ 0.2 , 0.08 , 0.4 , 1. ]])
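The steps above can be wrapped into one function. This is a minimal sketch, assuming NumPy 1.10 or later (for the keepdims argument of linalg.norm); normalize_rows is a hypothetical name, and leaving all-zero rows as zeros instead of dividing by zero is a choice made for this sketch, not something the answer specifies:

```python
import numpy as np

def normalize_rows(X, ord=2):
    """Hypothetical helper: divide each row of X by its vector norm of order `ord`.

    Converting to float avoids Python 2 style integer division, and
    keepdims=True returns the norms with shape (n_rows, 1), so they
    broadcast across the columns without a reshape.
    """
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, ord=ord, axis=1, keepdims=True)
    # Rows with zero norm are divided by 1 so they stay all zeros instead of NaN.
    norms[norms == 0] = 1.0
    return X / norms


X = np.array([[1, 2, 3, 6],
              [4, 5, 6, 5],
              [1, 2, 5, 5],
              [4, 5, 10, 25],
              [5, 2, 10, 25]])

for p in (1, 2, np.inf):
    Xn = normalize_rows(X, ord=p)
    # every row of the result has unit norm of the requested order
    print(p, np.linalg.norm(Xn, ord=p, axis=1))
```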

You can pass the axis=1 parameter:
In [58]: LA.norm(X, axis=1, ord=1)
Out[58]: array([12, 20, 13, 44, 42])
In [59]: LA.norm(X, axis=1, ord=2)
Out[59]: array([ 7.07106781, 10.09950494, 7.41619849, 27.67670501, 27.45906044])

Related

normalize the rows of numpy array based on a custom function

I have a NumPy array, and I want to normalize each row using this formula:
x_norm = (x - x_min) / (x_max - x_min)
where x_min is the minimum of each row and x_max is the maximum of each row. Here is a simple example:
a = np.array(
[[0, 1 ,2],
[2, 4 ,7],
[6, 10,5]
])
and the desired output:
a = np.array([
[0, 0.5 ,1],
[0, 0.4 ,1],
[0.2, 1 ,0]
])
Thank you

If I understand correctly, you can use plain NumPy operations:
x = np.array(
[[0, 1 ,2],
[2, 4 ,7],
[6, 10,5]
])
x_norm = ((x.T-x.min(1))/(x.max(1)-x.min(1))).T
# OR
x_norm = (x-x.min(1)[:,None])/(x.max(1)-x.min(1))[:,None]
output:
array([[0. , 0.5, 1. ],
[0. , 0.4, 1. ],
[0.2, 1. , 0. ]])
NB: if efficiency matters, save the result of x.min(1) in a variable, since it is used twice.
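A variant of the same idea, sketched here as one possible option: reducing with keepdims=True keeps each row's min and max as a (3, 1) column, so they broadcast directly against x without transposing, and each reduction is computed only once. Note that a constant row would make row_max - row_min zero and produce NaN.

```python
import numpy as np

x = np.array([[0, 1, 2],
              [2, 4, 7],
              [6, 10, 5]])

# shape (3, 1): one min and one max per row, kept as a column for broadcasting
row_min = x.min(axis=1, keepdims=True)
row_max = x.max(axis=1, keepdims=True)

x_norm = (x - row_min) / (row_max - row_min)
print(x_norm)
```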

You could use np.apply_along_axis:
a = np.array(
[[0, 1 ,2],
[2, 4 ,7],
[6, 10,5]
])
def scaler(x):
    return (x-x.min())/(x.max()-x.min())
np.apply_along_axis(scaler, axis=1, arr=a)
Output:
array([[0. , 0.5, 1. ],
[0. , 0.4, 1. ],
[0.2, 1. , 0. ]])

How to row-normalize a feature matrix? Broadcasting error

I have a feature matrix that I want to row normalize.
This is what I have done, based on min-max scaling, but I am getting an error. Can anyone help me with it?
a = np.random.randint(10, size=(4,5))
s = a.max(axis=1) - a.min(axis=1)
np.amax(a,axis=1)
print(s)
(a - a.min(axis=1))/(a.max(axis=1) - a.min(axis=1))
Output:
[7 6 4 5]
----> 6 (a - a.min(axis=1))/(a.max(axis=1) - a.min(axis=1))
ValueError: operands could not be broadcast together with shapes (4,5) (4,)

Try working with the transposed matrix:
b = a.T
m = (b - b.min(axis=0)) / (b.max(axis=0) - b.min(axis=0))
m = m.T
>>> a
array([[2, 3, 2, 8, 3], # min=2 -> 0, max=8 -> 1
[3, 3, 9, 2, 1], # min=1 -> 0, max=9 -> 1
[1, 9, 8, 4, 7], # min=1 -> 0, max=9 -> 1
[6, 8, 7, 9, 4]]) # min=4 -> 0, max=9 -> 1
>>> m
array([[0. , 0.16666667, 0. , 1. , 0.16666667],
[0.25 , 0.25 , 1. , 0.125 , 0. ],
[0. , 1. , 0.875 , 0.375 , 0.75 ],
[0.4 , 0.8 , 0.6 , 1. , 0. ]])
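The error comes from NumPy's broadcasting rules: shapes are compared starting from the trailing dimension, so (4, 5) and (4,) compare 5 with 4 and fail. Giving the per-row statistics a length-1 axis, shape (4, 1), makes the original expression work without transposing. The sketch below also guards against constant rows, where max - min is 0; mapping those rows to all zeros is an assumption of this sketch, and the seeded generator is only there to make the example repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)           # seeded only to make the example repeatable
a = rng.integers(10, size=(4, 5))

row_min = a.min(axis=1)[:, None]         # shape (4,) -> (4, 1)
row_range = a.max(axis=1)[:, None] - row_min

# Constant rows would give 0/0; divide them by 1 instead so they become all zeros.
m = (a - row_min) / np.where(row_range == 0, 1, row_range)
print(m)
```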

I have an alternative solution, but I am not sure whether it is correct. It would be great if someone could comment on it.
def row_normalize(mf):
    row_sums = np.array(mf.sum(1))
    new_matrix = mf / row_sums[:, np.newaxis]
    return new_matrix
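For comparison with the question: this function performs a different normalization from min-max scaling. It divides each row by its sum, so the rows of the result sum to 1 (for non-negative data this is L1 normalization), and a row that sums to 0 leads to a division by zero. A quick check, assuming mf is a plain dense NumPy array rather than a sparse matrix:

```python
import numpy as np

def row_normalize(mf):
    row_sums = np.array(mf.sum(1))
    new_matrix = mf / row_sums[:, np.newaxis]
    return new_matrix

a = np.array([[2, 3, 2, 8, 3],
              [3, 3, 9, 2, 1]])

r = row_normalize(a)
print(r.sum(axis=1))   # each row sums to 1 (up to rounding)
print(r.min(axis=1))   # row minima are not 0, unlike min-max scaling
```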

How to simply pass weights to np.average()

I am confused about passing weights to the np.average() function. Example below:
import numpy as np
weights = [0.35, 0.05, 0.6]
abc = list()
a = [[ 0.5, 1],
[ 5, 7],
[ 3, 8]]
b = [[ 10, 1],
[ 0.5, 1],
[ 0.7, 0.2]]
c = [[ 10, 12],
[ 0.5, 13],
[ 5, 0.7]]
abc.append(a)
abc.append(b)
abc.append(c)
print(np.average(np.array(abc), weights=[weights], axis=0))
OUT:
TypeError: 1D weights expected when shapes of a and weights differ.
I know that the shapes differ, but how can I simply pass a list of weights without writing
np.average(np.array(abc), weights=[weights[0], weights[1], weights[2]], ..., axis=0)
every time? I am doing this in a loop where the number of weights varies, up to 30.
Expected output, a weighted array like this:
OUT:
[[6.675, 7.6],
[ 2.075, 10.3],
[ 4.085, 3.23]]
That is, a * weights[0] + b * weights[1] + c * weights[2], divided by the sum of the weights (which is 1 here).
Any other solution is welcome.

Not sure how the first element can be 4.675? Passing the weights as a flat 1-D list, the code below gives 6.675:
import numpy as np
weights = [0.35, 0.05, 0.6]
a = [[ 0.5, 1],
[ 5, 7],
[ 3, 8]]
b = [[ 10, 1],
[ 0.5, 1],
[ 0.7, 0.2]]
c = [[ 10, 12],
[ 0.5, 13],
[ 5, 0.7]]
abc=[a, b, c]
print(np.average(np.array(abc), weights=weights,axis=0))

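If your abc array has shape (1, 3, 3, 2) (for example, if it was built as [[a, b, c]]), either use axis=1 or build it as abc = [a, b, c], as @BingWang suggested. With the code as posted, np.array(abc) already has shape (3, 3, 2), and the error comes from weights=[weights], which is 2-D with shape (1, 3); pass weights=weights instead.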
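To handle a varying number of arrays (up to 30 in the question), one option is to stack them along a new first axis and pass the weights as a flat 1-D sequence whose length matches that axis. A minimal sketch; weighted_mean is a hypothetical helper name, and it assumes all input arrays have the same shape:

```python
import numpy as np

def weighted_mean(arrays, weights):
    """Hypothetical helper: weighted average of k same-shaped arrays.

    np.stack gives shape (k, ...); np.average then needs a 1-D weights
    array of length k when averaging over axis 0.
    """
    stacked = np.stack([np.asarray(x, dtype=float) for x in arrays])
    weights = np.asarray(weights, dtype=float)
    if weights.shape != (stacked.shape[0],):
        raise ValueError("need exactly one weight per input array")
    return np.average(stacked, axis=0, weights=weights)


a = [[0.5, 1], [5, 7], [3, 8]]
b = [[10, 1], [0.5, 1], [0.7, 0.2]]
c = [[10, 12], [0.5, 13], [5, 0.7]]

result = weighted_mean([a, b, c], [0.35, 0.05, 0.6])
print(result)
```

Equivalently, np.tensordot(weights, stacked, axes=1) / weights.sum() contracts the weights against the first axis and gives the same result.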

How do I correctly handle a multidimensional NumPy array?

I'm a Python newbie and I'm struggling a bit with multidimensional arrays in a for loop. What I have is:
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
"bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
"dog", "horse", "motorbike", "person", "pottedplant", "sheep",
"sofa", "train", "tvmonitor"]
...
...
idxs = np.argsort(preds[0])[::-1][:5]
print(idxs)
# loop over top 5 predictions & display them
for (i, idx) in enumerate(idxs):
    # draw the top prediction on the input image
    print(idx)
    if i == 0:
        print(preds)
        text = "Label: {}, {:.2f}%".format(CLASSES[idx], preds[0][idx] * 100)
        cv2.putText(frame, text, (5, 25), cv2.FONT_HERSHEY_SIMPLEX,
                    0.7, (0, 0, 255), 2)
    # display the predicted label + associated probability to the
    # console
    print("[INFO] {}. label: {}, probability: {:.5}".format(i + 1, CLASSES[idx], preds[0][idx]))
and I get something like:
[[[ 0. 7. 0.3361728 0.2269333 0.6589312
0.70067763 0.8960621 ]
[ 0. 15. 0.44955394 0.5509065 0.4315516
0.6530549 0.7223625 ]]]
[[[0 3 2 4 5 6 1]
[0 4 2 3 5 6 1]]]
[[0 3 2 4 5 6 1]
[0 4 2 3 5 6 1]]
[[[[ 0. 7. 0.3361728 0.2269333 0.6589312
0.70067763 0.8960621 ]
[ 0. 15. 0.44955394 0.5509065 0.4315516
0.6530549 0.7223625 ]]]]
Traceback (most recent call last):
File "real_time_object_detection.py", line 80, in <module>
text = "Label: {}, {:.2f}%".format(CLASSES[idx], preds[0][idx] * 100)
TypeError: only integer scalar arrays can be converted to a scalar index
I've copied this code from https://www.pyimagesearch.com/2017/08/21/deep-learning-with-opencv/, but it looks like I'm doing something wrong: idx should be an int, but instead it is an array.
UPDATE:
I tried to figure out what's going on, but I got stuck on the following: why do all the argsort calls give the same result?
>>> preds[0] = [[[ 0., 7., 0.3361728, 0.2269333, 0.6589312,0.70067763, 0.8960621 ],[ 0., 15., 0.44955394, 0.5509065, 0.4315516,0.6530549, 0.7223625 ]]]
>>> print(preds[0])
[[[0.0, 7.0, 0.3361728, 0.2269333, 0.6589312, 0.70067763, 0.8960621], [0.0, 15.0, 0.44955394, 0.5509065, 0.4315516, 0.6530549, 0.7223625]]]
>>> import numpy as np
>>> np.argsort(preds[0])
array([[[0, 3, 2, 4, 5, 6, 1],
[0, 4, 2, 3, 5, 6, 1]]])
>>> np.argsort(preds[0])[::-1]
array([[[0, 3, 2, 4, 5, 6, 1],
[0, 4, 2, 3, 5, 6, 1]]])
>>> np.argsort(preds[0])[::-1][:5]
array([[[0, 3, 2, 4, 5, 6, 1],
[0, 4, 2, 3, 5, 6, 1]]])
Also, why does it seem to alter the data? Shouldn't it just sort it?

Your preds[0], assigned to a variable name, is a 3-D array:
In [449]: preds0 = np.array([[[ 0., 7., 0.3361728, 0.2269333, 0.6589312, 0.70067763, 0.8960621 ], [ 0., 15., 0.44955394, 0.5509065, 0.4315516, 0.6530549, 0.7223625 ]]])
In [450]: preds0.shape
Out[450]: (1, 2, 7)
argsort applied to that returns an array of the same shape:
In [451]: np.argsort(preds0)
Out[451]:
array([[[0, 3, 2, 4, 5, 6, 1],
[0, 4, 2, 3, 5, 6, 1]]])
In [452]: _.shape
Out[452]: (1, 2, 7)
With that size-1 initial dimension, no amount of reversing or slicing on that dimension makes a difference. I suspect you wanted to reverse and slice the last dimension, the size-7 one. BUT, be careful about that. The argsort of a multidimensional array, even when applied along one dimension (by default the last), is hard to understand and to use.
The shape matches the array, but the values are indices in the range 0-6, along the last dimension. NumPy 1.15 added a couple of functions, take_along_axis and put_along_axis, to make it easier to use the result of argsort (and some other functions):
In [455]: np.take_along_axis(preds0, Out[451], axis=-1)
Out[455]:
array([[[ 0. , 0.2269333 , 0.3361728 , 0.6589312 ,
0.70067763, 0.8960621 , 7. ],
[ 0. , 0.4315516 , 0.44955394, 0.5509065 ,
0.6530549 , 0.7223625 , 15. ]]])
Notice that rows are now sorted, same as produced by np.sort(preds0, axis=-1).
I could pick one 'row' of the index array:
In [459]: idxs = Out[451]
In [461]: idx = idxs[0,0]
In [462]: idx
Out[462]: array([0, 3, 2, 4, 5, 6, 1])
In [463]: idx[::-1] # reverse
Out[463]: array([1, 6, 5, 4, 2, 3, 0])
In [464]: idx[::-1][:5] # select
Out[464]: array([1, 6, 5, 4, 2])
In [465]: preds0[0,0,Out[464]]
Out[465]: array([7. , 0.8960621 , 0.70067763, 0.6589312 , 0.3361728 ])
Now I have the five largest values of preds0[0,0,:], in descending order.
And to do it for the whole preds0 array:
np.take_along_axis(preds0, idxs[:,:,::-1][:,:,:5], axis=-1)
or, for NumPy versions before 1.15, with advanced indexing:
preds0[[0], [[0],[1]], idxs[:,:,::-1][:,:,:5]]
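The same steps can be wrapped in a small function that returns the k largest values along the last axis, largest first, together with their indices. This is a sketch that assumes NumPy 1.15 or later (for take_along_axis); top_k_last_axis is a hypothetical name:

```python
import numpy as np

def top_k_last_axis(arr, k):
    """Hypothetical helper: k largest entries along the last axis, largest first."""
    arr = np.asarray(arr)
    # argsort is ascending; reverse the last axis, then keep the first k indices
    order = np.argsort(arr, axis=-1)[..., ::-1][..., :k]
    return np.take_along_axis(arr, order, axis=-1), order


preds0 = np.array([[[0., 7., 0.3361728, 0.2269333, 0.6589312, 0.70067763, 0.8960621],
                    [0., 15., 0.44955394, 0.5509065, 0.4315516, 0.6530549, 0.7223625]]])

values, idxs = top_k_last_axis(preds0, 5)
print(idxs)     # shape (1, 2, 5): one index row per row of the last two dimensions
print(values)
```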

How to sample a numpy array and perform computation on each sample efficiently?

Suppose I have a 1-D array. I want to sample it with a moving window and, within each window, divide each element by the window's first element.
For example, if I have [2, 5, 8, 9, 6] and a window size of 3, the result will be
[[1, 2.5, 4],
[1, 1.6, 1.8],
[1, 1.125, 0.75]].
What I'm doing now is basically a for loop:
import numpy as np
arr = np.array([2., 5., 8., 9., 6.])
window_size = 3
result = []
for i in range(len(arr) - window_size + 1):
    result.append(arr[i : i + window_size] / arr[i])
# etc.
When the array is large this is quite slow. Is there a better way? I guess there is no way around the O(n^2) complexity, but perhaps NumPy has some optimizations that I don't know of.

Here's a vectorized approach using broadcasting, where a is the input array (arr in the question):
N = 3 # Window size
nrows = a.size-N+1
a2D = a[np.arange(nrows)[:,None] + np.arange(N)]
out = a2D/a[:nrows,None].astype(float)
We can also use NumPy strides for a more efficient extraction of sliding windows, like so:
n = a.strides[0]
a2D = np.lib.stride_tricks.as_strided(a,shape=(nrows,N),strides=(n,n))
Sample run:
In [73]: a
Out[73]: array([4, 9, 3, 6, 5, 7, 2])
In [74]: N = 3
...: nrows = a.size-N+1
...: a2D = a[np.arange(nrows)[:,None] + np.arange(N)]
...: out = a2D/a[:nrows,None].astype(float)
...:
In [75]: out
Out[75]:
array([[ 1. , 2.25 , 0.75 ],
[ 1. , 0.33333333, 0.66666667],
[ 1. , 2. , 1.66666667],
[ 1. , 0.83333333, 1.16666667],
[ 1. , 1.4 , 0.4 ]])
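On NumPy 1.20 or later, numpy.lib.stride_tricks.sliding_window_view builds the same windows as a read-only view, without the manual stride arithmetic that as_strided needs (as_strided does no bounds checking, so a wrong shape or stride can point outside the array's memory). A minimal sketch using the array from the question:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.array([2., 5., 8., 9., 6.])
window_size = 3

# view of shape (len(arr) - window_size + 1, window_size); no data is copied here
windows = sliding_window_view(arr, window_size)
# windows[:, :1] has shape (n_windows, 1), so it broadcasts across each window
result = windows / windows[:, :1]
print(result)
```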
