How to simply pass weights while np.average()

I am confused about how to pass weights to the np.average() function. Example below:
import numpy as np
weights = [0.35, 0.05, 0.6]
abc = list()
a = [[ 0.5, 1],
[ 5, 7],
[ 3, 8]]
b = [[ 10, 1],
[ 0.5, 1],
[ 0.7, 0.2]]
c = [[ 10, 12],
[ 0.5, 13],
[ 5, 0.7]]
abc.append(a)
abc.append(b)
abc.append(c)
print(np.average(np.array(abc), weights=[weights], axis=0))
OUT:
TypeError: 1D weights expected when shapes of a and weights differ.
I know that the shapes differ, but how can I simply pass the list of weights without writing out every element, as in
np.average(np.array(abc), weights=[weights[0], weights[1], weights[2]], ..., axis=0)
I am running this in a loop where the number of weights varies, up to 30.
Expected output is a weighted array like this:
OUT:
[[6.675, 7.6],
[ 2.075, 10.3],
[ 4.085, 3.23]]
that is, a * weights[0] + b * weights[1] + c * weights[2], computed elementwise.
Any other solution is welcome.

Not sure how the first element can be 4.675?
import numpy as np
weights = [0.35, 0.05, 0.6]
a = [[ 0.5, 1],
[ 5, 7],
[ 3, 8]]
b = [[ 10, 1],
[ 0.5, 1],
[ 0.7, 0.2]]
c = [[ 10, 12],
[ 0.5, 13],
[ 5, 0.7]]
# stack the three 3x2 lists directly, and pass the weights as a flat 1D list
abc = [a, b, c]
print(np.average(np.array(abc), weights=weights, axis=0))

Your abc array has shape (1, 3, 3, 2). So either change to axis=1 or use abc = [a, b, c] as @BingWang suggested.
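To make the shape requirement concrete, here is a minimal sketch. The helper name weighted_stack_average is made up for illustration and is not part of NumPy. It assumes every input array has the same shape and that there is one weight per array. Flattening the weights to 1D means a nested list such as [weights] is accepted as well.

```python
import numpy as np

def weighted_stack_average(arrays, weights):
    """Weighted average over a list of equally shaped arrays (hypothetical helper)."""
    stacked = np.asarray(arrays, dtype=float)        # shape (n, rows, cols)
    w = np.asarray(weights, dtype=float).ravel()     # force 1D, length n
    # 1D weights are matched against the axis being averaged (axis 0 here)
    return np.average(stacked, axis=0, weights=w)

a = [[0.5, 1], [5, 7], [3, 8]]
b = [[10, 1], [0.5, 1], [0.7, 0.2]]
c = [[10, 12], [0.5, 13], [5, 0.7]]

result = weighted_stack_average([a, b, c], [0.35, 0.05, 0.6])
print(result)
```

Because the list of arrays and the list of weights are both built at run time, this should work unchanged inside a loop where the number of weights varies.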

Related

Selectively set values in numpy array (or set on condition)

a = np.array([[0, 2, 0, 0], [0, 1, 3, 0], [0, 0, 10, 11], [0, 0, 1, 7]])
array([[ 0, 2, 0, 0],
[ 0, 1, 3, 0],
[ 0, 0, 10, 11],
[ 0, 0, 1, 7]])
There are 0 entries in each row. I need to assign a value to each of these zero entries, where the value is calculated as follows:
V = 0.1 * Si / Ni
where Si is the sum of row i
Ni is the number of zero entries in row i
I can calculate Si and Ni fairly easy:
S = np.sum(a, axis=1)
array([ 2, 4, 21, 8])
N = np.count_nonzero(a == 0, axis=1)
array([3, 2, 2, 2])
Now, V is calculated as:
V = 0.1 * S/N
array([0.06666667, 0.2 , 1.05 , 0.4 ])
But how do I assign V[i] to a zero entry in i-th row? So I'm expecting to get the following array a:
array([[ 0.06666667, 2, 0.06666667, 0.06666667],
[ 0.2, 1, 3, 0.2],
[ 1.05, 1.05, 10, 11],
[ 0.4, 0.4, 1, 7]])
I need some kind of selective broadcasting operation or assignment?

Use np.where
np.where(a == 0, V.reshape(-1, 1), a)
array([[ 0.06666667, 2. , 0.06666667, 0.06666667],
[ 0.2 , 1. , 3. , 0.2 ],
[ 1.05 , 1.05 , 10. , 11. ],
[ 0.4 , 0.4 , 1. , 7. ]])

Here's a way using np.where:
z = a == 0
np.where(z, (0.1*a.sum(1)/z.sum(1))[:,None], a)
array([[ 0.06666667, 2. , 0.06666667, 0.06666667],
[ 0.2 , 1. , 3. , 0.2 ],
[ 1.05 , 1.05 , 10. , 11. ],
[ 0.4 , 0.4 , 1. , 7. ]])

Maybe using a mask:
for i in range(V.size):
    print((a[i,:] == 0) * V[i] + a[i,:])
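The np.where answers above return a new array. If you would rather fill the zeros in place, a sketch along these lines may help. It assumes the array can be converted to float (an integer array would truncate the fill values) and that every row contains at least one zero, since otherwise the division produces inf or nan.

```python
import numpy as np

a = np.array([[0, 2, 0, 0],
              [0, 1, 3, 0],
              [0, 0, 10, 11],
              [0, 0, 1, 7]], dtype=float)   # float so fractional fills survive

zero_mask = (a == 0)
# one fill value per row: 0.1 * row sum / number of zeros in that row
V = 0.1 * a.sum(axis=1) / zero_mask.sum(axis=1)

# coordinates of every zero entry; each one receives the value of its own row
rows, cols = np.nonzero(zero_mask)
a[rows, cols] = V[rows]
print(a)
```

Indexing V with rows picks the right per-row value for each zero, so no explicit broadcasting or loop is needed.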

how do i correctly handle a multi dimensional numpy array

I'm a Python newbie and struggling a bit with multi dimensional arrays in a for loop. What I have is:
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
"bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
"dog", "horse", "motorbike", "person", "pottedplant", "sheep",
"sofa", "train", "tvmonitor"]
...
...
idxs = np.argsort(preds[0])[::-1][:5]
print(idxs)
# loop over top 5 predictions & display them
for (i, idx) in enumerate(idxs):
    # draw the top prediction on the input image
    print(idx)
    if i == 0:
        print(preds)
        text = "Label: {}, {:.2f}%".format(CLASSES[idx], preds[0][idx] * 100)
        cv2.putText(frame, text, (5, 25), cv2.FONT_HERSHEY_SIMPLEX,
                    0.7, (0, 0, 255), 2)
    # display the predicted label + associated probability to the
    # console
    print("[INFO] {}. label: {}, probability: {:.5}".format(i + 1, CLASSES[idx], preds[0][idx]))
and I get something like:
[[[ 0. 7. 0.3361728 0.2269333 0.6589312
0.70067763 0.8960621 ]
[ 0. 15. 0.44955394 0.5509065 0.4315516
0.6530549 0.7223625 ]]]
[[[0 3 2 4 5 6 1]
[0 4 2 3 5 6 1]]]
[[0 3 2 4 5 6 1]
[0 4 2 3 5 6 1]]
[[[[ 0. 7. 0.3361728 0.2269333 0.6589312
0.70067763 0.8960621 ]
[ 0. 15. 0.44955394 0.5509065 0.4315516
0.6530549 0.7223625 ]]]]
Traceback (most recent call last):
File "real_time_object_detection.py", line 80, in <module>
text = "Label: {}, {:.2f}%".format(CLASSES[idx], preds[0][idx] * 100)
TypeError: only integer scalar arrays can be converted to a scalar index
I've copied this code from https://www.pyimagesearch.com/2017/08/21/deep-learning-with-opencv/ but it looks like I'm doing something wrong: idx should be an int but is an array instead.
UPDATE:
I tried to figure out what's going on here but I got stuck with the following: why do all argsort calls give the same result? :o
>>> preds[0] = [[[ 0., 7., 0.3361728, 0.2269333, 0.6589312,0.70067763, 0.8960621 ],[ 0., 15., 0.44955394, 0.5509065, 0.4315516,0.6530549, 0.7223625 ]]]
>>> print(preds[0])
[[[0.0, 7.0, 0.3361728, 0.2269333, 0.6589312, 0.70067763, 0.8960621], [0.0, 15.0, 0.44955394, 0.5509065, 0.4315516, 0.6530549, 0.7223625]]]
>>> import numpy as np
>>> np.argsort(preds[0])
array([[[0, 3, 2, 4, 5, 6, 1],
[0, 4, 2, 3, 5, 6, 1]]])
>>> np.argsort(preds[0])[::-1]
array([[[0, 3, 2, 4, 5, 6, 1],
[0, 4, 2, 3, 5, 6, 1]]])
>>> np.argsort(preds[0])[::-1][:5]
array([[[0, 3, 2, 4, 5, 6, 1],
[0, 4, 2, 3, 5, 6, 1]]])
Plus why does it seem to alter the data, should it not just sort it?

Your preds[0], assigned to a variable name is a 3d array:
In [449]: preds0 = np.array([[[ 0., 7., 0.3361728, 0.2269333
...: , 0.6589312,0.70067763, 0.8960621 ],[ 0., 15., 0.4
...: 4955394, 0.5509065, 0.4315516,0.6530549, 0.7223625 ]]])
In [450]: preds0.shape
Out[450]: (1, 2, 7)
argsort applied to that is an array of the same shape:
In [451]: np.argsort(preds0)
Out[451]:
array([[[0, 3, 2, 4, 5, 6, 1],
[0, 4, 2, 3, 5, 6, 1]]])
In [452]: _.shape
Out[452]: (1, 2, 7)
With that size 1 initial dimension, no amount of reversing or slicing on that dimension makes a difference. I suspect you wanted to reverse and slice the last dimension, the size 7 one. But be careful about that: the argsort of a multidimensional array, even when applied to one dimension (the default is the last), is hard to understand and to use.
The shape matches the array, but the values are the range of 0-6, the last dimension. numpy 1.15 added a couple of functions to make it easier to use the result of argsort (and some other functions):
In [455]: np.take_along_axis(preds0, Out[451], axis=-1)
Out[455]:
array([[[ 0. , 0.2269333 , 0.3361728 , 0.6589312 ,
0.70067763, 0.8960621 , 7. ],
[ 0. , 0.4315516 , 0.44955394, 0.5509065 ,
0.6530549 , 0.7223625 , 15. ]]])
Notice that rows are now sorted, same as produced by np.sort(preds0, axis=-1).
I could pick one 'row' of the index array:
In [459]: idxs = Out[451]
In [461]: idx = idxs[0,0]
In [462]: idx
Out[462]: array([0, 3, 2, 4, 5, 6, 1])
In [463]: idx[::-1] # reverse
Out[463]: array([1, 6, 5, 4, 2, 3, 0])
In [464]: idx[::-1][:5] # select
Out[464]: array([1, 6, 5, 4, 2])
In [465]: preds0[0,0,Out[464]]
Out[465]: array([7. , 0.8960621 , 0.70067763, 0.6589312 , 0.3361728 ])
Now I have the five largest values of preds0[0,0,:] in reverse order.
And to do it to the whole preds0 array:
np.take_along_axis(preds0, idxs[:,:,::-1][:,:,:5], axis=-1)
or for earlier versions:
preds0[[0], [[0],[1]], idxs[:,:,::-1][:,:,:5]]
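Putting those steps together, here is a small sketch of a top-k helper. The function name top_k_along_last_axis is invented for this example. It assumes NumPy 1.15 or later (for np.take_along_axis) and that you want the k largest values along the last axis.

```python
import numpy as np

def top_k_along_last_axis(values, k):
    """Return indices and values of the k largest entries along the last axis."""
    order = np.argsort(values, axis=-1)       # ascending order, same shape as values
    top_idx = order[..., ::-1][..., :k]       # reverse the LAST axis, keep first k
    top_vals = np.take_along_axis(values, top_idx, axis=-1)
    return top_idx, top_vals

preds0 = np.array([[[0., 7., 0.3361728, 0.2269333, 0.6589312, 0.70067763, 0.8960621],
                    [0., 15., 0.44955394, 0.5509065, 0.4315516, 0.6530549, 0.7223625]]])

top_idx, top_vals = top_k_along_last_axis(preds0, 5)
print(top_idx)
print(top_vals)
```

The `...` in the slices stands for all leading dimensions, so the same helper should work for 1D, 2D or 3D input.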

Normalizing rows of a matrix python

Given a 2-dimensional array in python, I would like to normalize each row with the following norms:
Norm 1: L_1
Norm 2: L_2
Norm Inf: L_Inf
I have started this code:
from numpy import linalg as LA
X = np.array([[1, 2, 3, 6],
[4, 5, 6, 5],
[1, 2, 5, 5],
[4, 5,10,25],
[5, 2,10,25]])
print X.shape
x = np.array([LA.norm(v,ord=1) for v in X])
print x
Output:
(5, 4) # array dimension
[12 20 13 44 42] # L1 on each Row
How can I modify the code such that WITHOUT using LOOP, I can directly have the rows of the matrix normalized? (Given the norm values above)
I tried :
l1 = X.sum(axis=1)
print l1
print X/l1.reshape(5,1)
[12 20 13 44 42]
[[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]]
but the output is zero.

This is the L₁ norm:
>>> np.abs(X).sum(axis=1)
array([12, 20, 13, 44, 42])
This is the L₂ norm:
>>> np.sqrt((X * X).sum(axis=1))
array([ 7.07106781, 10.09950494, 7.41619849, 27.67670501, 27.45906044])
This is the L∞ norm:
>>> np.abs(X).max(axis=1)
array([ 6, 6, 5, 25, 25])
To normalise rows, just divide by the norm. For example, using L₂ normalisation:
>>> l2norm = np.sqrt((X * X).sum(axis=1))
>>> X / l2norm.reshape(5,1)
array([[ 0.14142136, 0.28284271, 0.42426407, 0.84852814],
[ 0.39605902, 0.49507377, 0.59408853, 0.49507377],
[ 0.13483997, 0.26967994, 0.67419986, 0.67419986],
[ 0.14452587, 0.18065734, 0.36131469, 0.90328672],
[ 0.18208926, 0.0728357 , 0.36417852, 0.9104463 ]])
>>> np.sqrt((_ * _).sum(axis=1))
array([ 1., 1., 1., 1., 1.])
More direct is the norm method in numpy.linalg, if you have it available:
>>> from numpy.linalg import norm
>>> norm(X, axis=1, ord=1) # L-1 norm
array([12, 20, 13, 44, 42])
>>> norm(X, axis=1, ord=2) # L-2 norm
array([ 7.07106781, 10.09950494, 7.41619849, 27.67670501, 27.45906044])
>>> norm(X, axis=1, ord=np.inf) # L-∞ norm
array([ 6, 6, 5, 25, 25])
(after OP edit): You saw zero values because / performs integer division on integer arrays in Python 2.x. Either upgrade to Python 3, or convert the array to float to avoid that integer division:
>>> linfnorm = norm(X, axis=1, ord=np.inf)
>>> X.astype(float) / linfnorm[:,None]
array([[ 0.16666667, 0.33333333, 0.5 , 1. ],
[ 0.66666667, 0.83333333, 1. , 0.83333333],
[ 0.2 , 0.4 , 1. , 1. ],
[ 0.16 , 0.2 , 0.4 , 1. ],
[ 0.2 , 0.08 , 0.4 , 1. ]])
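The same idea can be wrapped in one function for all three norms. This is only a sketch: normalize_rows is a made-up name, it assumes a NumPy version whose norm accepts keepdims (1.10 or later), and it does not guard against all-zero rows, which would cause a division by zero.

```python
import numpy as np

def normalize_rows(X, ord=2):
    """Divide each row of X by its vector norm (ord = 1, 2 or np.inf)."""
    X = np.asarray(X, dtype=float)    # avoid integer division on int input
    # keepdims=True gives shape (n_rows, 1), which broadcasts across columns
    norms = np.linalg.norm(X, ord=ord, axis=1, keepdims=True)
    return X / norms

X = np.array([[1, 2, 3, 6],
              [4, 5, 6, 5],
              [1, 2, 5, 5],
              [4, 5, 10, 25],
              [5, 2, 10, 25]])

X_l1 = normalize_rows(X, ord=1)
X_l2 = normalize_rows(X, ord=2)
X_inf = normalize_rows(X, ord=np.inf)
print(X_inf)
```

Using keepdims avoids the hard-coded reshape(5,1), so the function does not depend on the number of rows.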

You can pass the axis=1 parameter:
In [58]: LA.norm(X, axis=1, ord=1)
Out[58]: array([12, 20, 13, 44, 42])
In [59]: LA.norm(X, axis=1, ord=2)
Out[59]: array([ 7.07106781, 10.09950494, 7.41619849, 27.67670501, 27.45906044])

How do I sort the rows of a 2d numpy array based on indices given by another 2d numpy array

Example:
arr = np.array([[.5, .25, .19, .05, .01],[.25, .5, .19, .05, .01],[.5, .25, .19, .05, .01]])
print(arr)
[[ 0.5 0.25 0.19 0.05 0.01]
[ 0.25 0.5 0.19 0.05 0.01]
[ 0.5 0.25 0.19 0.05 0.01]]
idxs = np.argsort(arr)
print(idxs)
[[4 3 2 1 0]
[4 3 2 0 1]
[4 3 2 1 0]]
How can I use idxs to index arr? I want to do something like arr[idxs], but this does not work.

It's not the prettiest, but I think something like
>>> arr[np.arange(len(arr))[:,None], idxs]
array([[ 0.01, 0.05, 0.19, 0.25, 0.5 ],
[ 0.01, 0.05, 0.19, 0.25, 0.5 ],
[ 0.01, 0.05, 0.19, 0.25, 0.5 ]])
should work. The first term gives the x coordinates we want (using broadcasting over the last singleton axis):
>>> np.arange(len(arr))[:,None]
array([[0],
[1],
[2]])
with idxs providing the y coordinates. Note that if we had used unravel_index, the x coordinates to use would always have been 0 instead:
>>> np.unravel_index(idxs, arr.shape)[0]
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]])

How about something like this:
I changed variables to make the example more clear, but you basically need to index by two 2D arrays.
In [102]: a = np.array([[1,2,3], [4,5,6]])
In [103]: b = np.array([[0,2,1], [2,1,0]])
In [104]: temp = np.repeat(np.arange(a.shape[0]), a.shape[1]).reshape(a.shape).T
# temp is just [[0,1], [0,1], [0,1]]
# probably can be done more elegantly
In [105]: a[temp, b.T].T
Out[105]:
array([[1, 3, 2],
[6, 5, 4]])
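On NumPy 1.15 or later, np.take_along_axis does this pairing of row positions and column indices for you. The sketch below compares it with the explicit row-index approach from the first answer. The variable names sorted_rows and manual are just for illustration.

```python
import numpy as np

arr = np.array([[.5, .25, .19, .05, .01],
                [.25, .5, .19, .05, .01],
                [.5, .25, .19, .05, .01]])
idxs = np.argsort(arr, axis=1)

# gather arr[i, idxs[i, j]] for every i, j
sorted_rows = np.take_along_axis(arr, idxs, axis=1)

# equivalent explicit form: a column of row numbers broadcast against idxs
manual = arr[np.arange(len(arr))[:, None], idxs]

print(sorted_rows)
```

Both forms should give the same result as np.sort(arr, axis=1); the index-based version is useful when idxs comes from somewhere else, for example from sorting a different array.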

counts with 2 variables

In a research study I have 2 variables:
x = number objects remembered
y = % tasks completed correctly
as follows:
x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])
y = np.array([1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0, 0.75, 0.75, 0.5, 1.0, 1.0, 0.6, 0.4, 0.5,0.75, 1.0,1.0,0.6,0.5,0.75])
I would like to count how often each (x, y) combination occurs, like this:
WMC Percent Count
2 100 3
3 33 2
3 66 2 etc.
I note that scipy.stats.itemfreq and np.bincount only work for one variable.

If you have access to a recent version of numpy (1.9.0 or higher) you can use unique with the return_counts flag enabled. That will give you 2 arrays, one with values and one with the counts.
Here's a slightly modified version of the numpy.unique method which works for your case:
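def unique(ar):
    # sort rows by the first column, then by the second
    ar = ar[np.lexsort((ar[:, 1], ar[:, 0]))]
    # True wherever a row differs from the one before it
    flag = np.concatenate(([True], (ar[1:] != ar[:-1]).any(axis=1)))
    # start index of every run, plus the total row count as the end marker
    idx = np.concatenate(np.nonzero(flag) + ([ar.shape[0]],))
    return np.array(list(zip(ar[flag][:, 0], ar[flag][:, 1], np.diff(idx))))
print(unique(np.array(list(zip(x, y)))))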
Result:
[[ 2. 1. 3. ]
[ 3. 0.33 2. ]
[ 3. 0.66 2. ]
[ 3. 1. 1. ]
[ 4. 0.5 1. ]
[ 4. 0.75 2. ]
[ 4. 1. 3. ]
[ 5. 0.4 1. ]
[ 5. 0.5 1. ]
[ 5. 0.6 1. ]
[ 5. 1. 2. ]
[ 6. 0.6 1. ]
[ 6. 0.75 1. ]
[ 6. 1. 2. ]
[ 7. 0.5 1. ]
[ 7. 0.75 1. ]]
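On newer NumPy versions (1.13 or later, where np.unique accepts an axis argument) the built-in function can count whole rows directly, so the custom helper may not be needed. A minimal sketch, with pairs and table as illustrative names:

```python
import numpy as np

x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])
y = np.array([1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0,
              0.75, 0.75, 0.5, 1.0, 1.0, 0.6, 0.4, 0.5, 0.75, 1.0, 1.0,
              0.6, 0.5, 0.75])

pairs = np.column_stack((x, y))     # one (x, y) row per observation, float dtype
# axis=0 treats each row as one item; rows come back sorted
unique_pairs, counts = np.unique(pairs, axis=0, return_counts=True)

table = np.column_stack((unique_pairs, counts))   # columns: x, y, count
print(table)
```

Note that this relies on exact floating point equality of the y values, which is fine here because equal percentages were entered as identical literals.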

Earlier on in your code why not construct a dictionary linking 'number objects remembered' to '% tasks completed correctly'?
i.e.
completed_tasks = {2 : 1.0, 3 : 33, 4 : 66}
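Then you can add the completed-tasks value to each row returned by scipy.stats.itemfreq:
a = scipy.stats.itemfreq(x)
# build new rows instead of calling append(): NumPy rows have no append(),
# and list.append() returns None; completed_tasks needs a key for every value in x
a = [list(row) + [completed_tasks[row[0]]] for row in a]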

I would use collections.Counter for that purpose:
>>> import numpy as np
>>> x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])
>>> y = np.array([1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0, 0.75, 0.75, 0.5, 1.0, 1.0, 0.6, 0.4, 0.5,0.75, 1.0,1.0,0.6,0.5,0.75])
>>> from collections import Counter
>>> c = Counter(zip(x,y))
>>> c
Counter({(2, 1.0): 3, (4, 1.0): 3, (3, 0.66000000000000003): 2, (5, 1.0): 2, (3, 0.33000000000000002): 2, (6, 1.0): 2, (4, 0.75): 2, (7, 0.5): 1, (6, 0.59999999999999998): 1, (5, 0.40000000000000002): 1, (5, 0.59999999999999998): 1, (3, 1.0): 1, (7, 0.75): 1, (6, 0.75): 1, (5, 0.5): 1, (4, 0.5): 1})

Not sure if it is suitable in your case, however, you can do this using itertools.groupby() on the zipped lists:
import numpy as np
from itertools import groupby
x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])
y = np.array([1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0, 0.75, 0.75, 0.5, 1.0, 1.0, 0.6, 0.4, 0.5,0.75, 1.0,1.0,0.6,0.5,0.75])
print("WMC\tPercent\tCount")
for key, group in groupby(sorted(zip(x, y))):
    print("{}\t{}\t{}".format(key[0], int(key[1]*100), len(list(group))))
Output
WMC Percent Count
2 100 3
3 33 2
3 66 2
3 100 1
4 50 1
4 75 2
4 100 3
5 40 1
5 50 1
5 60 1
5 100 2
6 60 1
6 75 1
6 100 2
7 50 1
7 75 1
Updated to produce numpy array
import numpy as np
from itertools import groupby
x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])
y = np.array([1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0, 0.75, 0.75, 0.5, 1.0, 1.0, 0.6, 0.4, 0.5,0.75, 1.0,1.0,0.6,0.5,0.75])
results = np.array([(key[0], int(key[1]*100), len(list(group)))
for key, group in groupby(sorted(zip(x, y)))])
Output
>>> results
array([[ 2, 100, 3],
[ 3, 33, 2],
[ 3, 66, 2],
[ 3, 100, 1],
[ 4, 50, 1],
[ 4, 75, 2],
[ 4, 100, 3],
[ 5, 40, 1],
[ 5, 50, 1],
[ 5, 60, 1],
[ 5, 100, 2],
[ 6, 60, 1],
[ 6, 75, 1],
[ 6, 100, 2],
[ 7, 50, 1],
[ 7, 75, 1]])
