Calculate marginal distribution from joint distribution in Python

I have these two arrays/matrices which represent the joint distribution of 2 discrete random variables X and Y. I represented them in this format because I wanted to use the numpy.cov function and that seems to be the format cov requires.
https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.cov.html
joint_distibution_X_Y = [
[0.01, 0.02, 0.03, 0.04,
0.01, 0.02, 0.03, 0.04,
0.01, 0.02, 0.03, 0.04,
0.01, 0.02, 0.03, 0.04],
[0.002, 0.002, 0.002, 0.002,
0.004, 0.004, 0.004, 0.004,
0.006, 0.006, 0.006, 0.006,
0.008, 0.008, 0.008, 0.008],
]
join_probability_X_Y = [
0.01, 0.02, 0.04, 0.04,
0.03, 0.24, 0.15, 0.06,
0.04, 0.10, 0.08, 0.08,
0.02, 0.04, 0.03, 0.02
]
How do I calculate the marginal distribution of X (and also of Y) from this joint distribution of X and Y? Is there a library method I can call?
I want to get as a result e.g. something like:
X_values = [0.002, 0.004, 0.006, 0.008]
X_weights = [0.110, 0.480, 0.300, 0.110]
I want to avoid coding the calculation of the marginal distribution myself.
I assume there's already some Python library method for that.
What is it and how can I call it given the data I have?

You could use margins:
import numpy as np
from scipy.stats.contingency import margins
join_probability_X_Y = np.array([
[0.01, 0.02, 0.04, 0.04],
[0.03, 0.24, 0.15, 0.06],
[0.04, 0.10, 0.08, 0.08],
[0.02, 0.04, 0.03, 0.02]
])
x, y = margins(join_probability_X_Y)
print(x.T)
Output
[[0.11 0.48 0.3 0.11]]
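If pulling in SciPy is not desirable, the same marginals can probably be obtained with plain NumPy by summing the joint table along one axis. This is a minimal sketch, assuming the 4x4 table from the question; the names joint, row_marginal and col_marginal are illustrative and not part of the original answer.

```python
import numpy as np

# Joint probability table from the question (rows and columns index the two variables).
joint = np.array([
    [0.01, 0.02, 0.04, 0.04],
    [0.03, 0.24, 0.15, 0.06],
    [0.04, 0.10, 0.08, 0.08],
    [0.02, 0.04, 0.03, 0.02],
])

# Marginal of the row variable: sum each row across its columns.
row_marginal = joint.sum(axis=1)
# Marginal of the column variable: sum each column across its rows.
col_marginal = joint.sum(axis=0)

print(row_marginal)
print(col_marginal)
```

The row sums correspond to the X_weights the question asks for; the column sums give the marginal of the other variable.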

Related

Hungarian algorithm in Python for non-square cost matrices

I want to use the Hungarian assignment algorithm in python on a non-square numpy array.
My input matrix X looks like this:
X = np.array([[0.26, 0.64, 0.16, 0.46, 0.5 , 0.63, 0.29],
[0.49, 0.12, 0.61, 0.28, 0.74, 0.54, 0.25],
[0.22, 0.44, 0.25, 0.76, 0.28, 0.49, 0.89],
[0.56, 0.13, 0.45, 0.6 , 0.53, 0.56, 0.05],
[0.66, 0.24, 0.61, 0.21, 0.47, 0.31, 0.35],
[0.4 , 0.85, 0.45, 0.14, 0.26, 0.29, 0.24]])
The desired result is X with its columns reordered so that it becomes X_desired_output:
X_desired_output = np.array([[0.63, 0.5 , 0.29, 0.46, 0.26, 0.64, 0.16],
[0.54, 0.74, 0.25, 0.28, 0.49, 0.12, 0.61],
[0.49, 0.28, 0.89, 0.76, 0.22, 0.44, 0.25],
[0.56, 0.53, 0.05, 0.6 , 0.56, 0.13, 0.45],
[0.31, 0.47, 0.35, 0.21, 0.66, 0.24, 0.61],
[0.29, 0.26, 0.24, 0.14, 0.4 , 0.85, 0.45]])
Here I want to maximize the cost rather than minimize it, so the input to the algorithm would in theory be either 1-X or simply -X.
I have found https://software.clapper.org/munkres/ that leads to:
from munkres import Munkres
m = Munkres()
indices = m.compute(-X)
indices
[(0, 5), (1, 4), (2, 6), (3, 3), (4, 0), (5, 1)]
# getting the indices in list format
ii = [i for (i,j) in indices]
jj = [j for (i,j) in indices]
How can I use these to sort X? jj only contains 6 elements, as opposed to the original 7 columns of X.
I want to end up with the reordered matrix itself.

After spending some hours working on it, I found a solution. The problem is that X.shape[1] > X.shape[0], so some columns are not assigned at all.
The documentation states that
"The Munkres algorithm assumes that the cost matrix is square.
However, it’s possible to use a rectangular matrix if you first pad it
with 0 values to make it square. This module automatically pads
rectangular cost matrices to make them square."
from munkres import Munkres
m = Munkres()
indices = m.compute(-X)
indices
[(0, 5), (1, 4), (2, 6), (3, 3), (4, 0), (5, 1)]
# getting the indices in list format
ii = [i for (i,j) in indices]
jj = [j for (i,j) in indices]
# re-order matrix
X_=X[:,jj] # re-order columns
X_=X_[ii,:] # re-order rows
# HERE IS THE TRICK: since X is not square, some columns are not assigned to any row!
not_assigned_columns = X[:, [not_assigned for not_assigned in np.arange(X.shape[1]).tolist() if not_assigned not in jj]].reshape(-1,1)
X_desired = np.concatenate((X_, not_assigned_columns), axis=1)
print(X_desired)
array([[0.63, 0.5 , 0.29, 0.46, 0.26, 0.64, 0.16],
[0.54, 0.74, 0.25, 0.28, 0.49, 0.12, 0.61],
[0.49, 0.28, 0.89, 0.76, 0.22, 0.44, 0.25],
[0.56, 0.53, 0.05, 0.6 , 0.56, 0.13, 0.45],
[0.31, 0.47, 0.35, 0.21, 0.66, 0.24, 0.61],
[0.29, 0.26, 0.24, 0.14, 0.4 , 0.85, 0.45]])
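As an alternative to munkres, SciPy's linear_sum_assignment accepts rectangular matrices and has a maximize flag, so negating X should not be needed. The following is only a sketch, assuming SciPy 1.4 or later; the names unassigned, col_order and X_reordered are illustrative, and the assignment it finds is not guaranteed to be identical to the one shown above if several optimal assignments exist.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

X = np.array([[0.26, 0.64, 0.16, 0.46, 0.5 , 0.63, 0.29],
              [0.49, 0.12, 0.61, 0.28, 0.74, 0.54, 0.25],
              [0.22, 0.44, 0.25, 0.76, 0.28, 0.49, 0.89],
              [0.56, 0.13, 0.45, 0.6 , 0.53, 0.56, 0.05],
              [0.66, 0.24, 0.61, 0.21, 0.47, 0.31, 0.35],
              [0.4 , 0.85, 0.45, 0.14, 0.26, 0.29, 0.24]])

# One column is chosen per row so that the total of the chosen entries is maximal.
row_ind, col_ind = linear_sum_assignment(X, maximize=True)

# Columns that were not matched to any row (X has more columns than rows).
unassigned = np.setdiff1d(np.arange(X.shape[1]), col_ind)

# Assigned columns first, in row order, then the leftover columns.
col_order = np.concatenate((col_ind, unassigned))
X_reordered = X[row_ind][:, col_order]

print(X_reordered)
```

Unlike the reshape(-1, 1) step above, appending the leftover column indices to col_order also works when more than one column is left unassigned.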

Creating a color map in python from a list of RGB colors

I want to create a color map in python similar to this image:
but my map only consists of 3 rows and 4 columns. And I want to assign a certain color value to each square using RGB values. I have tried this code
colors=np.array([[0.01, 0.08, 0.01], [0.01, 0.16, 0.01], [0.01, 0.165, 0.01], [0.01, 0.3, 0.01],
[0.01, 0.2, 0.01], [0.666, 0.333, 0.01], [0.01, 0.165, 0.01], [0.01, 0.3, 0.01],
[0.01, 0.2, 0.01], [0.666, 0.333, 0.01], [0.01, 0.165, 0.01], [0.01, 0.3, 0.01]])
fig, ax=plt.subplots()
ax.imshow(colors)
ax.set_aspect('equal')
plt.show()
but the output does not match my expectations. It seems that with this method I cannot use the RGB values to represent the color for a square. Can anyone help me, please? Thank you!

You have a (12,3) colors array, while you need a (4, 3, 3) image, one RGB color per pixel.
import numpy as np # type: ignore
import matplotlib.pyplot as plt # type: ignore
colors = np.array(
[
# this is the first row
[
# these are the 3 pixels in the first row
[0.01, 0.08, 0.01],
[0.01, 0.16, 0.01],
[0.01, 0.165, 0.01],
],
[
[0.01, 0.3, 0.01],
[0.01, 0.2, 0.01],
[0.666, 0.333, 0.01],
],
[
[0.01, 0.165, 0.01],
[0.01, 0.3, 0.01],
[0.01, 0.2, 0.01],
],
# this is the fourth row
[
[0.666, 0.333, 0.01],
[0.01, 0.165, 0.01],
[0.01, 0.3, 0.01],
],
]
)
print(colors.shape)
fig, ax = plt.subplots()
ax.imshow(colors)
ax.set_aspect("equal")
plt.show()
Rearrange the data as you need in rows/columns.
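Since the question asks for 3 rows and 4 columns, one way to rearrange the data without retyping it is to reshape the original (12, 3) array. This is a sketch assuming the twelve colors are listed row by row; the name grid is illustrative.

```python
import numpy as np

# The flat list of 12 RGB triples from the question.
colors = np.array([[0.01, 0.08, 0.01], [0.01, 0.16, 0.01], [0.01, 0.165, 0.01], [0.01, 0.3, 0.01],
                   [0.01, 0.2, 0.01], [0.666, 0.333, 0.01], [0.01, 0.165, 0.01], [0.01, 0.3, 0.01],
                   [0.01, 0.2, 0.01], [0.666, 0.333, 0.01], [0.01, 0.165, 0.01], [0.01, 0.3, 0.01]])

# Reinterpret as (rows, columns, RGB): 3 rows of 4 pixels, 3 channels each.
grid = colors.reshape(3, 4, 3)

print(grid.shape)
```

Passing grid instead of colors to ax.imshow should then draw one colored square per RGB triple.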

Remove elements with fixed percentage from a list of sub-lists in Python [closed]

I have the following list:
list = [[0.002, 0.001, 0.055, 0.44, 0.11, 0.002, 0.001, 0.055, 0.44, 0.11],
[0.001, 0.006, 0.009, 0.002, 0.33],
[0.02, 0.004,0.003, 0.001, 0.008]]
I want to delete the first 20% of the elements of each sub-list (keeping the remaining 80%), so the result will look like:
list = [[0.055, 0.44, 0.11, 0.002, 0.001, 0.055, 0.44, 0.11],
[0.006, 0.009, 0.002, 0.33],
[0.004,0.003, 0.001, 0.008]]
I wrote the following code:
def del_list_rate(list):
    list_del = []
    n = 0.2  # fraction of elements to remove from the front
    for list1 in list:
        le = len(list1)
        d = int(le * (1 - n))  # number of elements to keep
        del list1[0 : le-d]  # drop the leading elements in place
        list_del.append(list1)
    return list_del
Is there a faster way to code this?

In [8]: list
Out[8]:
[[0.002, 0.001, 0.055, 0.44, 0.11, 0.002, 0.001, 0.055, 0.44, 0.11],
[0.001, 0.006, 0.009, 0.002, 0.33],
[0.02, 0.004, 0.003, 0.001, 0.008]]
In [9]: [i[int(0.2 * len(i)):] for i in list]
Out[9]:
[[0.055, 0.44, 0.11, 0.002, 0.001, 0.055, 0.44, 0.11],
[0.006, 0.009, 0.002, 0.33],
[0.004, 0.003, 0.001, 0.008]]

To improve upon bigbounty's answer, I would suggest the method:
def del_list_rate(list):
    return [sublist[int(0.2 * len(sublist)):] for sublist in list]
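The slicing approach can be sketched with the fraction as a parameter and without shadowing the built-in list. The function name drop_leading_fraction and its arguments are hypothetical, chosen only for this illustration.

```python
def drop_leading_fraction(sublists, fraction=0.2):
    """Return new sub-lists with the first `fraction` of each one removed."""
    # int() truncates, so a sub-list of 5 elements loses int(0.2 * 5) == 1 element.
    return [sub[int(fraction * len(sub)):] for sub in sublists]


data = [[0.002, 0.001, 0.055, 0.44, 0.11, 0.002, 0.001, 0.055, 0.44, 0.11],
        [0.001, 0.006, 0.009, 0.002, 0.33],
        [0.02, 0.004, 0.003, 0.001, 0.008]]

trimmed = drop_leading_fraction(data)
print(trimmed)
```

Because slicing builds new lists, the input is left unchanged, unlike the del-based version in the question.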

Numpy stack in the first dimension?

I have two np.arrays with shape (3,8), how can I make it into (2,3,8)
I tried np.concatenate, but it only gives me this error:
Traceback (most recent call last):
File "", line 1, in
File "<array_function internals>", line 6, in concatenate
TypeError: only integer scalar arrays can be converted to a scalar index
My a1 array:
array([[0.08, 0.3 , 0.51, 0.37, 0.02, 0.52, 0.05, 0.08],
[0.77, 0.01, 0.08, 0.67, 0.01, 0.02, 0.17, 0.77],
[0.3 , 0. , 0.07, 0.17, 0.11, 0.04, 0.05, 0.34]], dtype=float32)
My a2 array:
array([[0.08, 0.3 , 0.51, 0.37, 0.02, 0.52, 0.05, 0.08],
[0.77, 0.01, 0.08, 0.67, 0.01, 0.02, 0.17, 0.77],
[0.3 , 0. , 0.07, 0.17, 0.11, 0.04, 0.05, 0.34]], dtype=float32)

Try doing:
a1 = a1.reshape((1,3,8))
a2 = a2.reshape((1,3,8))
np.concatenate((a1,a2))
or
array = np.concatenate((a1.reshape((1,3,8)),a2.reshape((1,3,8))))
Based on the error message it also looks like you may have forgotten to include parentheses around your arrays in the np.concatenate().

Try the following simple way to stack your array:
>>> import numpy as np
>>> a = np.array([[0.08, 0.3 , 0.51, 0.37, 0.02, 0.52, 0.05, 0.08], [0.77, 0.01, 0.08, 0.67, 0.01, 0.02, 0.17, 0.77], [0.3 , 0. , 0.07, 0.17, 0.11, 0.04, 0.05, 0.34]])
>>> a.shape
(3, 8)
>>> b = np.array([[0.08, 0.3 , 0.51, 0.37, 0.02, 0.52, 0.05, 0.08], [0.77, 0.01, 0.08, 0.67, 0.01, 0.02, 0.17, 0.77], [0.3 , 0. , 0.07, 0.17, 0.11, 0.04, 0.05, 0.34]])
>>> b.shape
(3, 8)
>>> c = np.array([a, b])
>>> c.shape
(2, 3, 8)
>>>

While np.concatenate requires the dimension to exist already (it is created by the reshape calls in @Tim Crammond's answer), np.stack will create the new axis for you:
np.stack((a, b), axis=0)
This is roughly equivalent to @Sophie Roseinsta's suggestion of using np.array directly.
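The three approaches from the answers can be compared side by side. This is a small sketch with made-up (3, 8) arrays standing in for a1 and a2; the names stacked, via_array and via_concat are illustrative.

```python
import numpy as np

# Stand-in data with the same shape and dtype as the arrays in the question.
a1 = np.arange(24, dtype=np.float32).reshape(3, 8)
a2 = a1 + 100

# np.stack adds a new leading axis and joins along it.
stacked = np.stack((a1, a2), axis=0)

# np.array on a list of equally shaped arrays builds the same result.
via_array = np.array([a1, a2])

# np.concatenate needs the extra axis to exist first; note the tuple argument.
via_concat = np.concatenate((a1[np.newaxis], a2[np.newaxis]), axis=0)

print(stacked.shape)
```

Note that the arrays are passed as one tuple; a call such as np.concatenate(a1, a2) would treat a2 as the axis argument, which is the likely source of the TypeError in the question.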

Creating list of five elements from a List of Number in Python

I'm using Python 3.7.
I have a tuple of numbers like this:
x = ((1,2,3,4,5,6,7,8,9....etc))
I would like to obtain a list of lists in which every value is divided by 100 and each sub-list holds five consecutive numbers from the tuple, sliding along one element at a time, something like this:
[[[0.0], [0.01], [0.02], [0.03], [0.04]],
[[0.01], [0.02], [0.03], [0.04], [0.05]],
[[0.02], [0.03], [0.04], [0.05], [0.06]],
[[0.03], [0.04], [0.05], [0.06], [0.07]],
[[0.04], [0.05], [0.06], [0.07], [0.08]],
[[0.05], [0.06], [0.07], [0.08], [0.09]],... etc
I tried this, but it doesn't work properly:
Data = [[[(interest_over_time_data+j)/100] for
interest_over_time_data in range(5)]for j in
interest_over_time_data]
The real numbers are not consecutive, so I cannot simply add 1 to each element.
Thank you in advance!

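You want a list of lists, which calls for a nested list comprehension.
You want sliding windows, which calls for slicing; here that is done with itertools.islice.
The code below builds the sliding sublists and divides each value by 100. Note that with range(6) and nine input values the last window is truncated to four elements; use range(len(x) - 4) to keep only full windows.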
import itertools
x = (1,2,3,4,5,6,7,8,9)
result = [[v/100.0 for v in itertools.islice(x,start,start+5)] for start in range(6)]
result:
[[0.01, 0.02, 0.03, 0.04, 0.05],
[0.02, 0.03, 0.04, 0.05, 0.06],
[0.03, 0.04, 0.05, 0.06, 0.07],
[0.04, 0.05, 0.06, 0.07, 0.08],
[0.05, 0.06, 0.07, 0.08, 0.09],
[0.06, 0.07, 0.08, 0.09]]

You can use 3rd party NumPy for an array-based solution:
import numpy as np
first_row = np.arange(5) / 100
first_col = np.arange(10) / 100
res = first_row + first_col[:, None]
array([[ 0. , 0.01, 0.02, 0.03, 0.04],
[ 0.01, 0.02, 0.03, 0.04, 0.05],
[ 0.02, 0.03, 0.04, 0.05, 0.06],
[ 0.03, 0.04, 0.05, 0.06, 0.07],
[ 0.04, 0.05, 0.06, 0.07, 0.08],
[ 0.05, 0.06, 0.07, 0.08, 0.09],
[ 0.06, 0.07, 0.08, 0.09, 0.1 ],
[ 0.07, 0.08, 0.09, 0.1 , 0.11],
[ 0.08, 0.09, 0.1 , 0.11, 0.12],
[ 0.09, 0.1 , 0.11, 0.12, 0.13]])
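The broadcasting trick above relies on the numbers being consecutive, which the question says is not the case for the real data. For arbitrary values, a NumPy sliding window can be sketched as follows, assuming NumPy 1.20 or later (where sliding_window_view is available); the sample values in x are made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical non-consecutive input values.
x = (1, 2, 3, 5, 8, 13, 21)

# Each row is a window of 5 neighbouring elements; dividing makes a new float array.
windows = sliding_window_view(np.asarray(x), 5) / 100

print(windows)
```

With 7 input values and a window of 5 this produces 7 - 5 + 1 = 3 rows; windows.tolist() converts the result back to nested Python lists if needed.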

Love one line solutions:
[[[x[p]/100] for p in range(k,k+5)] for k in range(len(x)-4)]
#>[[[0.01], [0.02], [0.03], [0.04], [0.05]],
#> [[0.02], [0.03], [0.04], [0.05], [0.06]],
#> [[0.03], [0.04], [0.05], [0.06], [0.07]],
#> [[0.04], [0.05], [0.06], [0.07], [0.08]],
#> [[0.05], [0.06], [0.07], [0.08], [0.09]]]

Okay, you want your output containing lists with a length of 5 shifting from the first element of x to the last. Therefore your output will contain n-4 lists, where n is len(x).
So first we need to iterate over range(len(x)-4)
Then we want five elements from x starting at a given offset i. We can use slicing for this, e.g. x[i:i+5].
And we want all elements of this sublist divided by 100.
All together packed in list comprehension it looks like this:
x = (1,2,3,4,5,6,7,8,9)
res = [
[j/100.0 for j in x[i:i+5]]
for i in range(len(x)-4)
]
print(res)
Which results in
[[0.01, 0.02, 0.03, 0.04, 0.05],
[0.02, 0.03, 0.04, 0.05, 0.06],
[0.03, 0.04, 0.05, 0.06, 0.07],
[0.04, 0.05, 0.06, 0.07, 0.08],
[0.05, 0.06, 0.07, 0.08, 0.09]]
Or if you want to have 0.0 as in your example output:
x = (1,2,3,4,5,6,7,8,9)
x = [0] + list(x)
res = [
[j/100.0 for j in x[i:i+5]]
for i in range(len(x)-4)
]
print(res)
