How do I create a majority vote based on two arrays?

Scenario:
I want to create a majority-vote system that takes into account the weight of each observer's vote on each of N observations.
M observers each give their guess about N observations, selecting from 3 classes (1, 2, 3). For each observation, each observer has an associated weight.
Defining:
G: Matrix of guesses per observation / observer (N observations × M observers);
W: Weights for each observation / observer (N observations × M observers)
Example:
# 2 observations, 3 observers
G = [[1, 2, 3],
     [2, 2, 1]]
# Weights (influence) each observer has about each observation
W = [[0.1, 0.2, 0.3],
     [0.3, 0.1, 0.2]]
I need to compute another matrix P with shape (N observations × C classes) that stores the probability that an observation comes from a specific class.
Example using values above:
G = [[1, 2, 3],
     [2, 2, 1]]
W = [[0.1, 0.2, 0.3],
     [0.3, 0.1, 0.2]]
P = [[0.1, 0.2, 0.3],
     [0.2, (0.3 + 0.1), 0]]
After computing the P matrix, I could apply np.argmax() row-wise to get the column (class) with highest value:
P = [[0.1, 0.2, 0.3],  # class 3 has highest value (0.3)
     [0.2, 0.4, 0]]    # class 2 has highest value (0.4)
result = [3, 2]
I would like to know how I can combine G and W to generate the P matrix.

You can get the job done in a vectorized manner by using NumPy's indices and advanced indexing:
In [569]: import numpy as np
In [570]: G = np.array([[1, 2, 3], [2, 2, 1]] )
In [571]: W = np.array([[0.1, 0.2, 0.3], [0.3, 0.1, 0.2]])
In [572]: C = 3
In [573]: N, M = G.shape  # N observations (rows), M observers (columns)
In [574]: row, col = np.indices((N, M))
In [575]: P3d = np.zeros(shape=(N, M, C))
In [576]: P3d[row, col, G-1] = W
In [577]: P = P3d.sum(axis=1)
In [578]: P
Out[578]:
array([[0.1, 0.2, 0.3],
       [0.2, 0.4, 0. ]])
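If the intermediate 3D array is a concern for large inputs, one possible alternative is to accumulate the weights directly into P with np.add.at, then convert the argmax back to 1-based class labels. This is only a sketch under the question's assumptions (classes numbered 1..C); the names n_obs, n_observers, and rows are mine, not from the answer above.

```python
import numpy as np

G = np.array([[1, 2, 3], [2, 2, 1]])              # guesses, classes are 1-based
W = np.array([[0.1, 0.2, 0.3], [0.3, 0.1, 0.2]])  # weight of each guess
C = 3                                             # number of classes

n_obs, n_observers = G.shape
P = np.zeros((n_obs, C))

# Column vector of row numbers; it broadcasts against G (shape n_obs x n_observers).
rows = np.arange(n_obs)[:, None]

# Unbuffered in-place add: repeated (row, class) pairs accumulate instead of overwriting.
np.add.at(P, (rows, G - 1), W)

# argmax returns a 0-based column, so add 1 to get the class label.
result = P.argmax(axis=1) + 1
print(P)
print(result)
```

A plain `P[rows, G - 1] += W` would not work here, because fancy-indexed `+=` keeps only one of the contributions when the same (row, class) pair occurs more than once.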

Initialize P with zeros, then iterate over the observations (the rows of G) and, within each row, over the observers. For each guess G[observation][observer], add the matching weight to that class's entry: P[observation][class] += W[observation][observer] (the column is class - 1, since the classes start at 1). In your sample, the first row of G is [1, 2, 3] with weights [0.1, 0.2, 0.3]. The first observer guessed class 1 with weight 0.1, so 0.1 is added to class 1 in the first row of P. The second and third observers guessed classes 2 and 3, so their weights 0.2 and 0.3 go to those classes, and the first row of P ends up equal to the first row of W.
For the second row, the first observer guessed class 2 with weight 0.3, so 0.3 is added to class 2. The second observer also guessed class 2, with weight 0.1, so class 2 becomes 0.4. The last observer guessed class 1 with weight 0.2, so 0.2 is added to class 1. The matrix is now complete, so apply np.argmax() row-wise to get the required answer.
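The loop described above could be sketched roughly as follows. The variable names are illustrative, and the `g - 1` offset assumes the classes are numbered from 1 as in the question.

```python
import numpy as np

G = [[1, 2, 3],
     [2, 2, 1]]
W = [[0.1, 0.2, 0.3],
     [0.3, 0.1, 0.2]]
C = 3  # number of classes

P = np.zeros((len(G), C))
for i, (guesses, weights) in enumerate(zip(G, W)):
    # Walk through one observation: each observer's guess and its weight.
    for g, w in zip(guesses, weights):
        P[i, g - 1] += w  # classes are 1-based, columns are 0-based

# Winning class per observation, converted back to a 1-based label.
result = P.argmax(axis=1) + 1
print(result)
```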

Related

PyTorch: index 2D tensor with 2D tensor of row indices

I have a torch tensor a of shape (x, n) and another tensor b of shape (y, n), where y <= x. Every column of b contains a sequence of row indices for a, and I would like to index a with b so that I obtain a tensor of shape (y, n) in which the ith column contains a[:, i][b[:, i]] (not quite sure if that's the correct way to express it).
Here's an example (where x = 5, y = 3 and n = 4):
import torch
a = torch.Tensor(
[[0.1, 0.2, 0.3, 0.4],
[0.6, 0.7, 0.8, 0.9],
[1.1, 1.2, 1.3, 1.4],
[1.6, 1.7, 1.8, 1.9],
[2.1, 2.2, 2.3, 2.4]]
)
b = torch.LongTensor(
[[0, 3, 1, 2],
[2, 2, 2, 0],
[1, 1, 0, 4]]
)
# How do I get from a and b to c
# (so that I can also assign to those elements in a)?
c = torch.Tensor(
[[0.1, 1.7, 0.8, 1.4],
[1.1, 1.2, 1.3, 0.4],
[0.6, 0.7, 0.3, 2.4]]
)
I can't get my head around this. What I'm looking for is a method that will not only yield the tensor c but also let me assign a tensor of the same shape as c to the elements of a that c is made up of.

I tried to use index_select, but it only supports a 1-D index tensor, so I apply it column by column:
bt = b.transpose(0, 1)
at = a.transpose(0, 1)
ct = [torch.index_select(at[i], dim=0, index=bt[i]) for i in range(len(at))]
c = torch.stack(ct).transpose(0, 1)
print(c)
"""
tensor([[0.1000, 1.7000, 0.8000, 1.4000],
[1.1000, 1.2000, 1.3000, 0.4000],
[0.6000, 0.7000, 0.3000, 2.4000]])
"""
It might not be the best solution, but I hope it helps you at least.
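Since the question also asks about assignment, here is a possible sketch using advanced indexing with a broadcast column index, with torch.gather shown as a cross-check for the read side. This is a different technique from the index_select loop above; the names cols, a_updated, and new_values are made up for illustration.

```python
import torch

a = torch.tensor([[0.1, 0.2, 0.3, 0.4],
                  [0.6, 0.7, 0.8, 0.9],
                  [1.1, 1.2, 1.3, 1.4],
                  [1.6, 1.7, 1.8, 1.9],
                  [2.1, 2.2, 2.3, 2.4]])
b = torch.tensor([[0, 3, 1, 2],
                  [2, 2, 2, 0],
                  [1, 1, 0, 4]])  # int64 row indices, one sequence per column

# Column numbers 0..n-1; shape (n,) broadcasts against b of shape (y, n).
cols = torch.arange(a.shape[1])

# Read: c[i, j] == a[b[i, j], j]
c = a[b, cols]

# torch.gather along dim 0 performs the same lookup.
c_gather = torch.gather(a, 0, b)

# Write: assign a tensor shaped like c back into the selected elements.
a_updated = a.clone()               # work on a copy so `a` stays intact
new_values = torch.zeros_like(c)    # placeholder values for the demo
a_updated[b, cols] = new_values
print(c)
print(a_updated)
```

The assignment is well defined here because no column of b repeats a row index; with repeated indices, which value ends up in the target element is not something to rely on.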

Piecewise function in numpy with multiple arguments

I tried to define a function (the tent map) as follows:
def f(r, x):
    return np.piecewise([r, x], [x < 0.5, x >= 0.5], [lambda r, x: 2*r*x, lambda r, x: 2*r*(1-x)])
And r, x will be numpy arrays:
no_r = 10001
r = np.linspace(0, 4, no_r)
x = np.random.rand(no_r)
I would like the result to be a numpy array matching the shapes of r and x, calculated from each pair of elements of r and x at the same index. For example, if r = [0, 1, 2, 3] and x = [0.1, 0.7, 0.3, 1], the result should be [0, 0.6, 1.2, 0].
An error occurred: "boolean index did not match indexed array along dimension 0; dimension is 2 but corresponding boolean dimension is 10001"
So what should I do to achieve the intended purpose?

What you want as a result can be obtained with np.select:
def f(r, x):
    return np.select([x < 0.5, x >= 0.5], [2*r*x, 2*r*(1-x)])
Then with
r = np.array([0, 1, 2, 3])
x = np.array([0.1, 0.7, 0.3, 1])
print (f(r,x))
[0. 0.6 1.2 0. ]
EDIT: in this case, with only two mutually exclusive conditions, you can also use np.where:
def f(r, x):
    return np.where(x < 0.5, 2*r*x, 2*r*(1-x))
which gives the same result.
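As for why the original call failed: np.piecewise treats its first argument as the single array to index, so [r, x] becomes an array of shape (2, 10001) while the conditions have shape (10001,). Here is a small sketch showing that mismatch and the np.where version run on inputs of the question's size; the name tent and the seeded generator are my own choices.

```python
import numpy as np

def tent(r, x):
    # Both branches are evaluated everywhere; np.where then picks per element.
    return np.where(x < 0.5, 2 * r * x, 2 * r * (1 - x))

no_r = 10001
r = np.linspace(0, 4, no_r)
rng = np.random.default_rng(0)   # seeded only to make the run repeatable
x = rng.random(no_r)

# What np.piecewise received as its first argument in the question:
stacked = np.asarray([r, x])
print(stacked.shape)             # (2, 10001), versus (10001,) for x < 0.5

y = tent(r, x)
print(y.shape)                   # (10001,)

# Small check against the example from the question.
small = tent(np.array([0, 1, 2, 3]), np.array([0.1, 0.7, 0.3, 1]))
print(small)
```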

get max value of multiplication of column combinations and their respective index in python

I have a numpy array of shape M×N in which each element is a float between 0 and 1.
Input: for simplicity, let's consider a 3×4 array:
a=np.array([
[0.1, 0.2, 0.3, 0.6],
[0.3, 0.4, 0.8, 0.7],
[0.5, 0.6, 0.2, 0.1]
])
I want to consider 3 columns at a time (say col 0,1,2 for first iteration and 1,2,3 for second) and get the maximum value of multiplication of all possible combinations of the 3 columns and also get the index of their respective values.
In this case I should get max value of 0.5*0.6*0.8=0.24 and the index of the rows of values that gave the max value: (2,2,1) in this case.
Output: [[0.24,(2,2,1)],[0.336,(2,1,1)]]
I can do this using loops, but I want to avoid them because they would hurt the running time. Is there any way to do this in numpy?

Here's an approach using NumPy strides that is supposedly very efficient for such sliding windowed operations as it creates a view into the array without actually making copies -
N = 3 # Window size
m,n = a.strides
p,q = a.shape
a3D = np.lib.stride_tricks.as_strided(a,shape=(p, q-N +1, N),strides=(m,n,n))
out1 = a3D.argmax(0)
out2 = a3D.max(0).prod(1)
Sample run -
In [69]: a
Out[69]:
array([[ 0.1, 0.2, 0.3, 0.6],
[ 0.3, 0.4, 0.8, 0.7],
[ 0.5, 0.6, 0.2, 0.1]])
In [70]: out1
Out[70]:
array([[2, 2, 1],
[2, 1, 1]])
In [71]: out2
Out[71]: array([ 0.24 , 0.336])
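We can zip those two outputs together if needed in that format (the session below is Python 2; in Python 3, zip returns an iterator, so wrap the call in list() to see the pairs) -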
In [75]: zip(out2,map(tuple,out1))
Out[75]: [(0.23999999999999999, (2, 2, 1)), (0.33599999999999997, (2, 1, 1))]
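On NumPy 1.20 or later, the same windowed view can probably be built without computing strides by hand, using sliding_window_view. This is a sketch of that alternative, not the method used in the answer above; out1 and out2 are meant to match the outputs shown there.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.array([
    [0.1, 0.2, 0.3, 0.6],
    [0.3, 0.4, 0.8, 0.7],
    [0.5, 0.6, 0.2, 0.1],
])
N = 3  # window size (number of adjacent columns per window)

# Read-only view of shape (rows, n_windows, N): a3D[i, j, k] == a[i, j + k]
a3D = sliding_window_view(a, N, axis=1)

out1 = a3D.argmax(0)        # row index of the max for each window/column
out2 = a3D.max(0).prod(1)   # product of the column maxima per window

# Pair each product with its row indices as plain Python values.
pairs = [(float(v), tuple(int(i) for i in idx)) for v, idx in zip(out2, out1)]
print(pairs)
```

This works because every value lies between 0 and 1 (non-negative), so the largest product over all row combinations is simply the product of the per-column maxima.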

Ordering 2 symmetric matrices in the same fashion

I have 2 symmetric matrices, one of them being a correlation matrix and the other one similar to a correlation matrix. Examples of these matrices are shown below:
Correlation Matrix (c):
     A    B    C    D
A    1  0.5  0.1  0.4
B  0.5    1  0.9  0.3
C  0.1  0.9    1  0.3
D  0.4  0.3  0.3    1
Other Matrix (z):
   A  B  C  D
A  3  2  2  2
B  2  3  3  2
C  2  3  3  2
D  2  2  2  3
I'm ordering the correlation matrix in descending order so I can look at the top-most correlation values, using the following code:
c = corrMatrixMin10.abs()
s = c.unstack()
so = s.sort_values(kind="quicksort")
pd.DataFrame(so[so.values!=1].sort_values(ascending=False))
My question is as follows:
When I arrange the correlation matrix c in a descending order, the correlation matrix itself loses its shape. How do I have the other matrix z in the exact same order?
For example: The intersection of columns A and B in the matrix c is 0.5. The intersection of columns A and B in the matrix z is 2. How can I still preserve this order to associate these 2 values after arranging the matrix c in a descending order?
Any help would be greatly appreciated. TIA.
The code to generate the 2 matrices is as follows:
c = pd.DataFrame([[1, 0.5, 0.1, 0.4],
                  [0.5, 1, 0.9, 0.3],
                  [0.1, 0.9, 1, 0.3],
                  [0.4, 0.3, 0.3, 1]],
                 columns=list('ABCD'))
z = pd.DataFrame([[3, 2, 2, 2],
                  [2, 3, 3, 2],
                  [2, 3, 3, 2],
                  [2, 2, 2, 3]],
                 columns=list('ABCD'))

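You can use Series.reindex. This assumes the rows of c and z carry the same labels as the columns (A to D), as in the tables in the question: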
c_series = c.unstack().drop([(x, x) for x in c]).sort_values(ascending=False)
z_series = z.unstack().reindex(c_series.index)
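A self-contained sketch of the reindex approach follows. It assumes the frames are built with index=list('ABCD') in addition to the column labels (the construction code in the question sets only the columns, in which case the diagonal keys (x, x) would not exist). The labels variable and the paired frame are illustrative additions.

```python
import pandas as pd

labels = list("ABCD")
c = pd.DataFrame([[1, 0.5, 0.1, 0.4],
                  [0.5, 1, 0.9, 0.3],
                  [0.1, 0.9, 1, 0.3],
                  [0.4, 0.3, 0.3, 1]],
                 index=labels, columns=labels)
z = pd.DataFrame([[3, 2, 2, 2],
                  [2, 3, 3, 2],
                  [2, 3, 3, 2],
                  [2, 2, 2, 3]],
                 index=labels, columns=labels)

# Flatten to a Series keyed by (column, row), drop the diagonal, sort descending.
c_series = (c.abs()
             .unstack()
             .drop([(x, x) for x in c.columns])
             .sort_values(ascending=False))

# Put z's values in exactly the same (column, row) order.
z_series = z.unstack().reindex(c_series.index)

# Side-by-side view so each correlation stays next to its z value.
paired = pd.concat([c_series, z_series], axis=1, keys=["corr", "z"])
print(paired)
```

Because both Series share one MultiIndex, the pair for columns A and B (0.5 and 2) stays on the same row wherever the sort places it.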

Getting scipy's rv_discrete to work with floating point values?

I'm trying to define my own discrete distribution. The code I have works for integer values but not for decimal values. For example, this works:
>>> from scipy.stats import rv_discrete
>>> probabilities = [0.2, 0.5, 0.3]
>>> values = [1, 2, 3]
>>> distrib = rv_discrete(values=(values, probabilities))
>>> print distrib.rvs(size=10)
[1 3 3 2 2 2 2 2 1 3]
But if I use decimal values, it doesn't work:
>>> from scipy.stats import rv_discrete
>>> probabilities = [0.2, 0.5, 0.3]
>>> values = [.1, .2, .3]
>>> distrib = rv_discrete(values=(values, probabilities))
>>> print distrib.rvs(size=10)
[0 0 0 0 0 0 0 0 0 0]
Thanks..

Per stats.rv_discrete's doc string:
values : tuple of two array_like, optional
(xk, pk) where xk are integers with non-zero
probabilities pk with sum(pk) = 1.
(my emphasis, on "integers"). So the discrete distributions created by rv_discrete must use integer values. However, it is not hard to map those integer values to floats by using the rvs values as integer indices into values:
In [4]: values = np.array([0.1, 0.2, 0.3])
In [5]: idx = distrib.rvs(size=10); idx
Out[5]: array([1, 1, 0, 0, 1, 1, 0, 2, 1, 1])
In [6]: values[idx]
Out[6]: array([ 0.2, 0.2, 0.1, 0.1, 0.2, 0.2, 0.1, 0.3, 0.2, 0.2])
Thus you could use:
import numpy as np
import scipy.stats as stats
np.random.seed(2016)
probabilities = np.array([0.2, 0.5, 0.3])
values = np.array([0.1, 0.2, 0.3])
distrib = stats.rv_discrete(values=(range(len(probabilities)), probabilities))
idx = distrib.rvs(size=10)
result = values[idx]
print(result)
# [ 0.3 0.3 0.3 0.3 0.2 0.2 0.2 0.3 0.3 0.2]
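If only random samples are needed (and not the other rv_discrete methods such as pmf or cdf), NumPy's Generator.choice can draw the float values directly. This swaps in a different tool rather than fixing rv_discrete, and the seed below is arbitrary.

```python
import numpy as np

probabilities = np.array([0.2, 0.5, 0.3])
values = np.array([0.1, 0.2, 0.3])

rng = np.random.default_rng(2016)  # arbitrary seed, only for repeatability

# Draw 10 samples from `values`, each chosen with its matching probability.
result = rng.choice(values, size=10, p=probabilities)
print(result)
```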
