Scenario:
I want to create a majority-vote system that takes into account the weight of each observer's vote across N observations.
So, M observers each give a guess for N observations, selecting from 3 classes (1, 2, 3). For each observation, each observer has an associated weight.
Defining:
G: Matrix of guesses per observation / observer (N observations × M observers);
W: Weights for each observation / observer (N observations × M observers)
Example:
# 2 observations, 3 observers
G = [[1, 2, 3],
     [2, 2, 1]]
# Weights (influence) each observer has about each observation
W = [[0.1, 0.2, 0.3],
     [0.3, 0.1, 0.2]]
I need to compute another matrix of shape (N observations × C classes) that stores the probability that an observation belongs to a specific class.
Example using values above:
G = [[1, 2, 3],
     [2, 2, 1]]
W = [[0.1, 0.2, 0.3],
     [0.3, 0.1, 0.2]]
P = [[0.1, 0.2, 0.3],
     [0.2, (0.3 + 0.1), 0]]
After computing the P matrix, I could apply np.argmax() row-wise to get the column (class) with the highest value:
P = [[0.1, 0.2, 0.3],  # class 3 has the highest value (0.3)
     [0.2, 0.4, 0]]    # class 2 has the highest value (0.4)
result = [3, 2]
I would like to know how I can combine G and W to generate the P matrix.
You can get the job done in a vectorized manner by using NumPy's indices and advanced indexing:
In [569]: import numpy as np
In [570]: G = np.array([[1, 2, 3], [2, 2, 1]] )
In [571]: W = np.array([[0.1, 0.2, 0.3], [0.3, 0.1, 0.2]])
In [572]: C = 3
In [573]: M, N = G.shape  # here M = observations (rows), N = observers (columns)
In [574]: row, col = np.indices((M, N))
In [575]: P3d = np.zeros(shape=(M, N, C))
In [576]: P3d[row, col, G-1] = W
In [577]: P = P3d.sum(axis=1)
In [578]: P
Out[578]:
array([[0.1, 0.2, 0.3],
       [0.2, 0.4, 0. ]])
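To finish as the question describes, a row-wise argmax (shifted back to 1-based class labels, since column k of P corresponds to class k+1) gives the final result:
result = P.argmax(axis=1) + 1   # shift the 0-based column index to class labels 1..C
print(result)                   # [3 2]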
An alternative is to build P explicitly: initialize P with zeros, then iterate over the observations (rows) and observers (columns) of G. The value G[observation][observer] is the guessed class, so add W[observation][observer] to P[observation][class - 1].
In the sample test case, for row 0: observer 0 guessed class 1 with weight 0.1, so 0.1 goes to P[0][0]; observers 1 and 2 guessed classes 2 and 3, so their weights 0.2 and 0.3 go to P[0][1] and P[0][2]. For row 1: observers 0 and 1 both guessed class 2, so P[1][1] accumulates 0.3 + 0.1 = 0.4, and observer 2 guessed class 1 with weight 0.2, which goes to P[1][0]. Once P is filled, apply np.argmax() row-wise for the required answer; a sketch of this loop follows.
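A minimal sketch of that loop, using the G, W, and C = 3 from the question (the variable names are mine):
import numpy as np

G = np.array([[1, 2, 3], [2, 2, 1]])
W = np.array([[0.1, 0.2, 0.3], [0.3, 0.1, 0.2]])
C = 3

n_obs, n_observers = G.shape
P = np.zeros((n_obs, C))
for i in range(n_obs):                  # each observation
    for j in range(n_observers):        # each observer
        P[i, G[i, j] - 1] += W[i, j]    # classes are 1-based, columns are 0-based

result = P.argmax(axis=1) + 1           # [3 2]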
I have a torch tensor a of shape (x, n) and another tensor b of shape (y, n), where y <= x. Every column of b contains a sequence of row indices into a, and what I would like to do is index a with b so that I obtain a tensor of shape (y, n) whose ith column contains a[:, i][b[:, i]] (not quite sure if that's the correct way to express it).
Here's an example (where x = 5, y = 3 and n = 4):
import torch
a = torch.Tensor(
    [[0.1, 0.2, 0.3, 0.4],
     [0.6, 0.7, 0.8, 0.9],
     [1.1, 1.2, 1.3, 1.4],
     [1.6, 1.7, 1.8, 1.9],
     [2.1, 2.2, 2.3, 2.4]]
)
b = torch.LongTensor(
    [[0, 3, 1, 2],
     [2, 2, 2, 0],
     [1, 1, 0, 4]]
)
# How do I get from a and b to c
# (so that I can also assign to those elements in a)?
c = torch.Tensor(
    [[0.1, 1.7, 0.8, 1.4],
     [1.1, 1.2, 1.3, 0.4],
     [0.6, 0.7, 0.3, 2.4]]
)
I can't get my head around this. What I'm looking for is a method that will not only yield the tensor c but also let me assign a tensor of the same shape as c to the elements of a that c is made up of.
I tried to use index_select, but it only supports a 1-D index tensor, so I worked column by column:
bt = b.transpose(0, 1)   # shape (n, y): one row of indices per column of a
at = a.transpose(0, 1)   # shape (n, x)
ct = [torch.index_select(at[i], dim=0, index=bt[i]) for i in range(len(at))]
c = torch.stack(ct).transpose(0, 1)
print(c)
"""
tensor([[0.1000, 1.7000, 0.8000, 1.4000],
        [1.1000, 1.2000, 1.3000, 0.4000],
        [0.6000, 0.7000, 0.3000, 2.4000]])
"""
It might not be the best solution, but I hope it helps at least.
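A shorter route (not part of the answer above, just the standard gather/scatter API) is torch.gather along dim 0, which produces exactly this c; its in-place counterpart scatter_ covers the assignment half of the question:
c = a.gather(0, b)            # c[i, j] = a[b[i, j], j]
print(c)
# tensor([[0.1000, 1.7000, 0.8000, 1.4000],
#         [1.1000, 1.2000, 1.3000, 0.4000],
#         [0.6000, 0.7000, 0.3000, 2.4000]])

# Assigning back to the same elements of a:
new_vals = torch.zeros_like(c)    # any tensor with the shape of c
a.scatter_(0, b, new_vals)        # a[b[i, j], j] = new_vals[i, j]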
I tried to define a function (tent map) as following:
def f(r, x):
    return np.piecewise([r, x], [x < 0.5, x >= 0.5], [lambda r, x: 2*r*x, lambda r, x: 2*r*(1-x)])
And r, x will be numpy arrays:
no_r = 10001
r = np.linspace(0, 4, no_r)
x = np.random.rand(no_r)
I would like the result to be a NumPy array matching the shapes of r and x, calculated element-wise from the pairs of values of r and x at the same indices. For example, if r = [0, 1, 2, 3] and x = [0.1, 0.7, 0.3, 1], the result should be [0, 0.6, 1.2, 0].
An error occurred: "boolean index did not match indexed array along dimension 0; dimension is 2 but corresponding boolean dimension is 10001"
So what should I do to achieve the intended purpose?
What you want to get as a result can be done with np.select:
def f(r, x):
    return np.select([x < 0.5, x >= 0.5], [2*r*x, 2*r*(1-x)])
Then with
r = np.array([0, 1, 2, 3])
x = np.array([0.1, 0.7, 0.3, 1])
print (f(r,x))
[0. 0.6 1.2 0. ]
EDIT: in this case, with only two mutually exclusive conditions, you can also use np.where:
def f(r, x):
    return np.where(x < 0.5, 2*r*x, 2*r*(1-x))
will give the same result.
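For example (a quick check using the np.where version above and the arrays from the question), the output matches the input shapes:
import numpy as np

no_r = 10001
r = np.linspace(0, 4, no_r)
x = np.random.rand(no_r)
print(f(r, x).shape)   # (10001,)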
I have a NumPy array of shape M×N in which each element is a float between 0 and 1.
Input: for simplicity, let's consider a 3×4 array:
a = np.array([
    [0.1, 0.2, 0.3, 0.6],
    [0.3, 0.4, 0.8, 0.7],
    [0.5, 0.6, 0.2, 0.1]
])
I want to consider 3 columns at a time (columns 0, 1, 2 on the first iteration and 1, 2, 3 on the second), get the maximum value of the product over all possible combinations of one value from each of the 3 columns, and also get the row indices of the values that produce it.
In this case, for the first window, I should get the max value 0.5*0.6*0.8 = 0.24 and the row indices of the values that gave it: (2, 2, 1).
Output: [[0.24, (2, 2, 1)], [0.336, (2, 1, 1)]]
I can do this using loops, but I want to avoid them as they would hurt running time. Is there any way I can do this in NumPy?
Here's an approach using NumPy strides that is supposedly very efficient for such sliding windowed operations as it creates a view into the array without actually making copies -
N = 3 # Window size
m,n = a.strides
p,q = a.shape
a3D = np.lib.stride_tricks.as_strided(a,shape=(p, q-N +1, N),strides=(m,n,n))
out1 = a3D.argmax(0)
out2 = a3D.max(0).prod(1)
Sample run -
In [69]: a
Out[69]:
array([[ 0.1,  0.2,  0.3,  0.6],
       [ 0.3,  0.4,  0.8,  0.7],
       [ 0.5,  0.6,  0.2,  0.1]])
In [70]: out1
Out[70]:
array([[2, 2, 1],
       [2, 1, 1]])
In [71]: out2
Out[71]: array([ 0.24 , 0.336])
We can zip those two outputs together if needed in that format (on Python 3, wrap the zip in list() to materialize it) -
In [75]: zip(out2,map(tuple,out1))
Out[75]: [(0.23999999999999999, (2, 2, 1)), (0.33599999999999997, (2, 1, 1))]
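On NumPy 1.20 and newer, the same windowed view can be built with sliding_window_view, which avoids computing the strides by hand (an alternative to the as_strided call above, not part of the original answer):
from numpy.lib.stride_tricks import sliding_window_view   # NumPy >= 1.20

a3D = sliding_window_view(a, N, axis=1)   # same (p, q-N+1, N) view as above
out1 = a3D.argmax(0)                      # row index of each per-column maximum
out2 = a3D.max(0).prod(1)                 # product of the column maxima per window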
I have 2 symmetric matrices, one of them being a correlation matrix and the other one similar to a correlation matrix. Examples of these matrices are shown below:
Correlation Matrix (c):
A B C D
A 1 0.5 0.1 0.4
B 0.5 1 0.9 0.3
C 0.1 0.9 1 0.3
D 0.4 0.3 0.3 1
Other Matrix (z):
A B C D
A 3 2 2 2
B 2 3 3 2
C 2 3 3 2
D 2 2 2 3
I'm ordering the correlation matrix in descending order so I can look at the top-most correlation values, using the following code:
c = corrMatrixMin10.abs()
s = c.unstack()
so = s.sort_values(kind="quicksort")
pd.DataFrame(so[so.values!=1].sort_values(ascending=False))
My question is as follows:
When I arrange the correlation matrix c in descending order, the correlation matrix itself loses its shape. How do I get the other matrix z into the exact same order?
For example: the intersection of columns A and B in matrix c is 0.5, and the intersection of columns A and B in matrix z is 2. How can I preserve this pairing so that the two values stay associated after arranging matrix c in descending order?
Any help would be greatly appreciated. TIA.
The code to generate the 2 matrices is as follows:
import pandas as pd

c = pd.DataFrame([[1, 0.5, 0.1, 0.4],
                  [0.5, 1, 0.9, 0.3],
                  [0.1, 0.9, 1, 0.3],
                  [0.4, 0.3, 0.3, 1]],
                 index=list('ABCD'), columns=list('ABCD'))
z = pd.DataFrame([[3, 2, 2, 2],
                  [2, 3, 3, 2],
                  [2, 3, 3, 2],
                  [2, 2, 2, 3]],
                 index=list('ABCD'), columns=list('ABCD'))
You can use Series.reindex
c_series = c.unstack().drop([(x, x) for x in c]).sort_values(ascending=False)
z_series = z.unstack().reindex(c_series.index)
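If it helps to inspect them together, one option (just a presentation choice, not required by the approach above) is to concatenate the two aligned series into a single frame:
import pandas as pd

paired = pd.concat([c_series, z_series], axis=1, keys=['c', 'z'])
print(paired.head())   # each pair of labels shows the c value next to its matching z value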
I'm trying to define my own discrete distribution. The code I have works for integer values but not for decimal values. For example, this works:
>>> from scipy.stats import rv_discrete
>>> probabilities = [0.2, 0.5, 0.3]
>>> values = [1, 2, 3]
>>> distrib = rv_discrete(values=(values, probabilities))
>>> print distrib.rvs(size=10)
[1 3 3 2 2 2 2 2 1 3]
But if I use decimal values, it doesn't work:
>>> from scipy.stats import rv_discrete
>>> probabilities = [0.2, 0.5, 0.3]
>>> values = [.1, .2, .3]
>>> distrib = rv_discrete(values=(values, probabilities))
>>> print distrib.rvs(size=10)
[0 0 0 0 0 0 0 0 0 0]
Thanks..
Per stats.rv_discrete's doc string:
    values : tuple of two array_like, optional
        (xk, pk) where xk are *integers* with non-zero
        probabilities pk with sum(pk) = 1.
(my emphasis). So the discrete distributions created by rv_discrete must use integer values. However, it is not hard to map those integer values to floats by using the rvs values as integer indices into values:
In [4]: values = np.array([0.1, 0.2, 0.3])
In [5]: idx = distrib.rvs(size=10); idx
Out[5]: array([1, 1, 0, 0, 1, 1, 0, 2, 1, 1])
In [6]: values[idx]
Out[6]: array([ 0.2, 0.2, 0.1, 0.1, 0.2, 0.2, 0.1, 0.3, 0.2, 0.2])
Thus you could use:
import numpy as np
import scipy.stats as stats
np.random.seed(2016)
probabilities = np.array([0.2, 0.5, 0.3])
values = np.array([0.1, 0.2, 0.3])
distrib = stats.rv_discrete(values=(range(len(probabilities)), probabilities))
idx = distrib.rvs(size=10)
result = values[idx]
print(result)
# [ 0.3 0.3 0.3 0.3 0.2 0.2 0.2 0.3 0.3 0.2]
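As a side note (an alternative, not a fix for rv_discrete itself): numpy.random.choice accepts the float values directly together with a probability vector, so the index-mapping step can be skipped entirely:
import numpy as np

np.random.seed(2016)
probabilities = np.array([0.2, 0.5, 0.3])
values = np.array([0.1, 0.2, 0.3])
sample = np.random.choice(values, size=10, p=probabilities)   # floats drawn with the given probabilities
print(sample)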