Hungarian algorithm in Python for non-square cost matrices - python

I want to use the Hungarian assignment algorithm in python on a non-square numpy array.
My input matrix X looks like this:
X = np.array([[0.26, 0.64, 0.16, 0.46, 0.5 , 0.63, 0.29],
              [0.49, 0.12, 0.61, 0.28, 0.74, 0.54, 0.25],
              [0.22, 0.44, 0.25, 0.76, 0.28, 0.49, 0.89],
              [0.56, 0.13, 0.45, 0.6 , 0.53, 0.56, 0.05],
              [0.66, 0.24, 0.61, 0.21, 0.47, 0.31, 0.35],
              [0.4 , 0.85, 0.45, 0.14, 0.26, 0.29, 0.24]])
The desired result is the matrix reordered so that X becomes X_desired_output:
X_desired_output = np.array([[0.63, 0.5 , 0.29, 0.46, 0.26, 0.64, 0.16],
                             [0.54, 0.74, 0.25, 0.28, 0.49, 0.12, 0.61],
                             [0.49, 0.28, 0.89, 0.76, 0.22, 0.44, 0.25],
                             [0.56, 0.53, 0.05, 0.6 , 0.56, 0.13, 0.45],
                             [0.31, 0.47, 0.35, 0.21, 0.66, 0.24, 0.61],
                             [0.29, 0.26, 0.24, 0.14, 0.4 , 0.85, 0.45]])
Here I would like to maximize the cost rather than minimize it, so in theory the input to the algorithm would be either 1 - X or simply -X.
I have found https://software.clapper.org/munkres/ that leads to:
from munkres import Munkres
m = Munkres()
indices = m.compute(-X)
indices
[(0, 5), (1, 4), (2, 6), (3, 3), (4, 0), (5, 1)]
# getting the indices in list format
ii = [i for (i,j) in indices]
jj = [j for (i,j) in indices]
How can I use these to sort X? jj only contains 6 elements, as opposed to the original 7 columns of X.
I am looking to actually get the matrix sorted.

After spending some hours working on it, I found a solution. The problem is that, since X.shape[1] > X.shape[0], some columns are never assigned at all, and this is what causes the issue.
The documentation states that
"The Munkres algorithm assumes that the cost matrix is square.
However, it’s possible to use a rectangular matrix if you first pad it
with 0 values to make it square. This module automatically pads
rectangular cost matrices to make them square."
from munkres import Munkres
m = Munkres()
indices = m.compute(-X)
indices
[(0, 5), (1, 4), (2, 6), (3, 3), (4, 0), (5, 1)]
# getting the indices in list format
ii = [i for (i,j) in indices]
jj = [j for (i,j) in indices]
# re-order matrix
X_=X[:,jj] # re-order columns
X_=X_[ii,:] # re-order rows
# HERE IS THE TRICK: since X is not square, some columns never get assigned to a row!
not_assigned = [col for col in range(X.shape[1]) if col not in jj]
not_assigned_columns = X[:, not_assigned].reshape(-1, 1)
X_desired = np.concatenate((X_, not_assigned_columns), axis=1)
print(X_desired)
array([[0.63, 0.5 , 0.29, 0.46, 0.26, 0.64, 0.16],
[0.54, 0.74, 0.25, 0.28, 0.49, 0.12, 0.61],
[0.49, 0.28, 0.89, 0.76, 0.22, 0.44, 0.25],
[0.56, 0.53, 0.05, 0.6 , 0.56, 0.13, 0.45],
[0.31, 0.47, 0.35, 0.21, 0.66, 0.24, 0.61],
[0.29, 0.26, 0.24, 0.14, 0.4 , 0.85, 0.45]])
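For completeness, a similar result can be sketched with SciPy (assuming SciPy ≥ 1.4, whose linear_sum_assignment accepts rectangular matrices and a maximize flag), which avoids both the padding and the manual negation:
from scipy.optimize import linear_sum_assignment
import numpy as np

row_ind, col_ind = linear_sum_assignment(X, maximize=True)   # rectangular input is fine

X_ = X[:, col_ind][row_ind, :]                                # re-order columns, then rows
unassigned = [c for c in range(X.shape[1]) if c not in col_ind]
X_desired = np.concatenate((X_, X[:, unassigned]), axis=1)    # append the leftover columns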


Numpy stack in the first dimension?

I have two np.arrays with shape (3,8); how can I combine them into shape (2,3,8)?
I tried np.concatenate but it only gives me this error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 6, in concatenate
TypeError: only integer scalar arrays can be converted to a scalar index
My a1 array:
array([[0.08, 0.3 , 0.51, 0.37, 0.02, 0.52, 0.05, 0.08],
[0.77, 0.01, 0.08, 0.67, 0.01, 0.02, 0.17, 0.77],
[0.3 , 0. , 0.07, 0.17, 0.11, 0.04, 0.05, 0.34]], dtype=float32)
My a2 array:
array([[0.08, 0.3 , 0.51, 0.37, 0.02, 0.52, 0.05, 0.08],
[0.77, 0.01, 0.08, 0.67, 0.01, 0.02, 0.17, 0.77],
[0.3 , 0. , 0.07, 0.17, 0.11, 0.04, 0.05, 0.34]], dtype=float32)
Try doing:
a1 = a1.reshape((1,3,8))
a2 = a2.reshape((1,3,8))
np.concatenate((a1,a2))
or
array = np.concatenate((a1.reshape((1,3,8)),a2.reshape((1,3,8))))
Based on the error message it also looks like you may have forgotten to include parentheses around your arrays in the np.concatenate().
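A quick sketch of that diagnosis with placeholder arrays: passing the two arrays as separate positional arguments makes NumPy read the second one as the axis, which raises exactly this TypeError, while the tuple form works (though it still needs a new axis to reach (2, 3, 8)):
import numpy as np

a1 = np.zeros((3, 8), dtype=np.float32)   # placeholder data
a2 = np.ones((3, 8), dtype=np.float32)

try:
    np.concatenate(a1, a2)                # a2 is interpreted as the axis argument
except TypeError as exc:
    print(exc)                            # only integer scalar arrays can be converted ...

print(np.concatenate((a1, a2)).shape)               # (6, 8): tuple form joins along axis 0
print(np.concatenate((a1[None], a2[None])).shape)   # (2, 3, 8): with an added leading axis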
Try the following simple way to stack your array:
>>> import numpy as np
>>> a = np.array([[0.08, 0.3 , 0.51, 0.37, 0.02, 0.52, 0.05, 0.08], [0.77, 0.01, 0.08, 0.67, 0.01, 0.02, 0.17, 0.77], [0.3 , 0. , 0.07, 0.17, 0.11, 0.04, 0.05, 0.34]])
>>> a.shape
(3, 8)
>>> b = np.array([[0.08, 0.3 , 0.51, 0.37, 0.02, 0.52, 0.05, 0.08], [0.77, 0.01, 0.08, 0.67, 0.01, 0.02, 0.17, 0.77], [0.3 , 0. , 0.07, 0.17, 0.11, 0.04, 0.05, 0.34]])
>>> b.shape
(3, 8)
>>> c = np.array([a, b])
>>> c.shape
(2, 3, 8)
>>>
While np.concatenate requires existing dimensions, which are created in @Tim Crammond's answer, np.stack will create the new axis for you:
np.stack((a, b), axis=0)
This is roughly equivalent to @Sophie Roseinsta's suggestion of using np.array directly.
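A small self-contained sketch (with placeholder data) confirming that the three approaches produce the same (2, 3, 8) result:
import numpy as np

a = np.random.rand(3, 8).astype(np.float32)
b = np.random.rand(3, 8).astype(np.float32)

via_stack  = np.stack((a, b), axis=0)                        # stack creates the new axis
via_array  = np.array([a, b])                                # np.array does the same here
via_concat = np.concatenate((a.reshape(1, 3, 8), b.reshape(1, 3, 8)))

assert via_stack.shape == via_array.shape == via_concat.shape == (2, 3, 8)
assert np.array_equal(via_stack, via_concat) and np.array_equal(via_stack, via_array)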

Updated title: Scipy.stats pdf bug?

I have a simple plot of a 2D Gaussian distribution.
import numpy as np
from scipy.stats import multivariate_normal
from matplotlib import pyplot as plt
means = [ 1.03872615e+00, -2.66927843e-05]
cov_matrix = [[3.88809050e-03, 3.90737359e-06], [3.90737359e-06, 4.28819569e-09]]
# This works
a_lims = [0.7, 1.3]
b_lims = [-5, 5]
# This does not work
a_lims = [0.700006488869478, 1.2849292618191401]
b_lims =[-5.000288311285968, 5.000099437047633]
dist = multivariate_normal(mean=means, cov=cov_matrix)
a_plot, b_plot = np.mgrid[a_lims[0]:a_lims[1]:1e-2, b_lims[0]:b_lims[1]:0.1]
pos = np.empty(a_plot.shape + (2,))
pos[:, :, 0] = a_plot
pos[:, :, 1] = b_plot
z = dist.pdf(pos)
plt.figure()
plt.contourf(a_plot, b_plot, z, cmap='coolwarm', levels=100)
If I use the limits marked as "this works", I get the following plot (correct).
However, if I use the slightly adjusted limits, the plot comes out completely wrong: the distribution appears localized at different values (below).
I guess it is a bug in mgrid. Does anyone have any ideas? More specifically, why does the maximum of the distribution move?
Focusing just on the x axis:
In [443]: a_lims = [0.7, 1.3]
In [444]: np.mgrid[a_lims[0]:a_lims[1]:1e-2]
Out[444]:
array([0.7 , 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8 ,
0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.9 , 0.91,
0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, 1. , 1.01, 1.02,
1.03, 1.04, 1.05, 1.06, 1.07, 1.08, 1.09, 1.1 , 1.11, 1.12, 1.13,
1.14, 1.15, 1.16, 1.17, 1.18, 1.19, 1.2 , 1.21, 1.22, 1.23, 1.24,
1.25, 1.26, 1.27, 1.28, 1.29, 1.3 ])
In [445]: a_lims = [0.700006488869478, 1.2849292618191401]
In [446]: np.mgrid[a_lims[0]:a_lims[1]:1e-2]
Out[446]:
array([0.70000649, 0.71000649, 0.72000649, 0.73000649, 0.74000649,
0.75000649, 0.76000649, 0.77000649, 0.78000649, 0.79000649,
0.80000649, 0.81000649, 0.82000649, 0.83000649, 0.84000649,
0.85000649, 0.86000649, 0.87000649, 0.88000649, 0.89000649,
0.90000649, 0.91000649, 0.92000649, 0.93000649, 0.94000649,
0.95000649, 0.96000649, 0.97000649, 0.98000649, 0.99000649,
1.00000649, 1.01000649, 1.02000649, 1.03000649, 1.04000649,
1.05000649, 1.06000649, 1.07000649, 1.08000649, 1.09000649,
1.10000649, 1.11000649, 1.12000649, 1.13000649, 1.14000649,
1.15000649, 1.16000649, 1.17000649, 1.18000649, 1.19000649,
1.20000649, 1.21000649, 1.22000649, 1.23000649, 1.24000649,
1.25000649, 1.26000649, 1.27000649, 1.28000649])
In [447]: _444.shape
Out[447]: (61,)
In [449]: _446.shape
Out[449]: (59,)
mgrid, when given ranges like a:b:c, uses np.arange(a, b, c), and arange with a float step is not reliable with regard to the end point.
mgrid also accepts a complex "step" such as 61j, in which case it behaves like np.linspace, which handles floating-point steps and endpoints reliably. For example, with the first set of limits:
In [453]: a_lims = [0.7, 1.3]
In [454]: np.mgrid[a_lims[0]:a_lims[1]:61j]
Out[454]:
array([0.7 , 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8 ,
0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.9 , 0.91,
0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, 1. , 1.01, 1.02,
1.03, 1.04, 1.05, 1.06, 1.07, 1.08, 1.09, 1.1 , 1.11, 1.12, 1.13,
1.14, 1.15, 1.16, 1.17, 1.18, 1.19, 1.2 , 1.21, 1.22, 1.23, 1.24,
1.25, 1.26, 1.27, 1.28, 1.29, 1.3 ])
===
By narrowing the b_lims considerably, and generating a finer mesh, I get a nice tilted ellipse.
means = [ 1, 0]
a_lims = [0.7, 1.3]
b_lims = [-.0002,.0002]
dist = multivariate_normal(mean=means, cov=cov_matrix)
a_plot, b_plot = np.mgrid[ a_lims[0]:a_lims[1]:1001j, b_lims[0]:b_lims[1]:1001j]
So I think the difference in your plots is an artifact of an excessively coarse mesh in the vertical direction. That potentially affects both the pdf generation and the contouring.
Plotting at high resolution with the original grid points overlaid shows that only one b level intersects the high-probability region. Since the ellipse is tilted, the two coarse grids sample different parts of it, hence the seemingly different pdfs.
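Putting those two points together, here is a minimal sketch (reusing the question's means and covariance; the narrowed b range is an assumption chosen to resolve the tiny b variance) that requests a fixed number of grid points via complex steps:
import numpy as np
from scipy.stats import multivariate_normal
from matplotlib import pyplot as plt

means = [1.03872615e+00, -2.66927843e-05]
cov_matrix = [[3.88809050e-03, 3.90737359e-06],
              [3.90737359e-06, 4.28819569e-09]]
dist = multivariate_normal(mean=means, cov=cov_matrix)

# 1001j asks mgrid for 1001 evenly spaced points, endpoints included,
# so the resolution no longer depends on the exact limit values
a_lims = [0.7, 1.3]
b_lims = [-2e-4, 2e-4]   # narrowed so the tiny b variance is actually resolved
a_plot, b_plot = np.mgrid[a_lims[0]:a_lims[1]:1001j, b_lims[0]:b_lims[1]:1001j]

z = dist.pdf(np.dstack((a_plot, b_plot)))
plt.contourf(a_plot, b_plot, z, cmap='coolwarm', levels=100)
plt.show()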

How do I grab random elements from paired lists in Python?

I tried to compare drop height versus rebound height and have some data here:
drop_heights = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.7, 2.0]
rebound_heights = [0.16, 0.30, 0.46, 0.6, 0.74, 0.88, 1.02, 1.15, 1.34, 1.51]
I want to select 5 random data points from these variables, so I tried:
from random import randint

smol_drop_heights = []
smol_rebound_heights = []
for each in range(0, 5):
    smol_drop_heights.append(drop_heights[randint(0, 9)])
    smol_rebound_heights.append(rebound_heights[randint(0, 9)])
print(smol_drop_heights)
print(smol_rebound_heights)
When they print, they show different sets of data and sometimes even repeat values. How do I fix this?
[0.8, 1.6, 0.6, 0.2, 0.12]
[1.02, 1.15, 0.88, 0.88, 0.6]
Here is a sample output, where you can see .88 is repeated.
A simple way to avoid repetitions, keep the data points paired, and randomly sort the pairs:
from random import random
drop_heights = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.7, 2.0]
rebound_heights = [0.16, 0.30, 0.46, 0.6, 0.74, 0.88, 1.02, 1.15, 1.34, 1.51]
pairs = list(sorted(zip(drop_heights, rebound_heights), key=lambda _: random()))[:5]
smol_drop_heights = [d for d, _ in pairs]
smol_rebound_heights = [r for _, r in pairs]
One way to do it would be:
drop_heights = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.7, 2.0]
rebound_heights = [0.16, 0.30, 0.46, 0.6, 0.74, 0.88, 1.02, 1.15, 1.34, 1.51]
indices = [*range(len(drop_heights))]
from random import shuffle
shuffle(indices)
smol_drop_heights = []
smol_rebound_heights = []
for each in indices:
    smol_drop_heights.append(drop_heights[each])
    smol_rebound_heights.append(rebound_heights[each])
print(smol_drop_heights)
print(smol_rebound_heights)
Output:
[1.7, 0.8, 1.6, 1.2, 0.2, 0.4, 1.4, 2.0, 1.0, 0.6]
[1.34, 0.6, 1.15, 0.88, 0.16, 0.3, 1.02, 1.51, 0.74, 0.46]
Or, much shorter:
from random import sample
drop_heights = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.7, 2.0]
rebound_heights = [0.16, 0.30, 0.46, 0.6, 0.74, 0.88, 1.02, 1.15, 1.34, 1.51]
paired = [*zip(drop_heights, rebound_heights)]
smol_drop_heights, smol_rebound_heights = zip(*sample(paired,5))
print(smol_drop_heights[:5])
print(smol_rebound_heights[:5])
Here"s what I would do.
import random
import numpy as np
k=5
drop_heights = np.array([0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.7, 2.0])
rebound_heights = np.array([0.16, 0.30, 0.46, 0.6, 0.74, 0.88, 1.02, 1.15, 1.34, 1.51])
idx = random.sample(range(len(drop_heights)), k)
print(drop_heights[idx])
print(rebound_heights[idx])
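Staying within NumPy, a variant sketch (assuming NumPy ≥ 1.17 for the Generator API) that draws the five indices without replacement, so repeats cannot occur:
import numpy as np

drop_heights = np.array([0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.7, 2.0])
rebound_heights = np.array([0.16, 0.30, 0.46, 0.6, 0.74, 0.88, 1.02, 1.15, 1.34, 1.51])

rng = np.random.default_rng()
idx = rng.choice(len(drop_heights), size=5, replace=False)   # 5 distinct indices
print(drop_heights[idx])
print(rebound_heights[idx])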
You could try shuffling and then use the index of the original items like,
>>> drop_heights = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.7, 2.0]
>>> rebound_heights = [0.16, 0.30, 0.46, 0.6, 0.74, 0.88, 1.02, 1.15, 1.34, 1.51]
>>>
>>> import random
>>> d = drop_heights[:] # keep a copy to get index for making pairs later
>>> random.shuffle(drop_heights)
>>> # iterate through the new list and get the index of the item
>>> # from the original lists
>>> nd, nr = zip(*[(x,rebound_heights[d.index(x)]) for x in drop_heights])
>>> nd[:5]
(1.4, 0.6, 1.7, 0.2, 1.0)
>>> nr[:5]
(1.02, 0.46, 1.34, 0.16, 0.74)
or just use operator.itemgetter and random.sample like,
>>> drop_heights = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.7, 2.0]
>>> rebound_heights = [0.16, 0.30, 0.46, 0.6, 0.74, 0.88, 1.02, 1.15, 1.34, 1.51]
>>>
>>> import random, operator
>>> indexes = random.sample(range(len(drop_heights)), 5)
>>> indexes
[5, 0, 4, 7, 3]
>>> f = operator.itemgetter(*indexes)
>>> f(drop_heights)
(1.2, 0.2, 1.0, 1.6, 0.8)
>>> f(rebound_heights)
(0.88, 0.16, 0.74, 1.15, 0.6)
Your problem is that each call to randint gives a different random number. To fix this, save a single random index on each pass through the loop and use it for both lists, so the same position is taken from each.
for each in range(0, 5):
    index = randint(0, 9)
    smol_drop_heights.append(drop_heights[index])
    smol_rebound_heights.append(rebound_heights[index])
print(smol_drop_heights)
print(smol_rebound_heights)
To solve the problem of repeats, check whether the list already contains the value you want to add. You can check either list, since neither contains repeats itself. Because picks may repeat, a fixed-count for loop is not sufficient, so loop until the lists are full.
So my final solution is:
while True:
    index = randint(0, 9)
    if drop_heights[index] not in smol_drop_heights:
        smol_drop_heights.append(drop_heights[index])
        smol_rebound_heights.append(rebound_heights[index])
    if len(smol_drop_heights) == 5:
        break
print(smol_drop_heights)
print(smol_rebound_heights)
And since you may want to arrange those values in order, you can do this:
smol_drop_heights = []
smol_rebound_heights = []
while True:
    index = randint(0, 9)
    if drop_heights[index] not in smol_drop_heights:
        smol_drop_heights.append(drop_heights[index])
        smol_rebound_heights.append(rebound_heights[index])
    if len(smol_drop_heights) == 5:
        smol_drop_heights.sort()
        smol_rebound_heights.sort()
        break
print(smol_drop_heights)
print(smol_rebound_heights)
OK, you want to do two things. First, pair your lists; the idiomatic way to do this is to use zip:
drop_heights = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.7, 2.0]
rebound_heights = [0.16, 0.30, 0.46, 0.6, 0.74, 0.88, 1.02, 1.15, 1.34, 1.51]
paired = list(zip(drop_heights, rebound_heights))
Then, you want to sample five pairs from this. So use random.sample:
import random
sampled = random.sample(paired, 5)
Finally, if you need them to be in separate lists (you probably don't, but if you must), you can unpack it like this:
smol_drop_heights, smol_rebound_heights = zip(*sampled)
You can actually do all of this at once, although it might become a bit unreadable:
smol_drop_heights, smol_rebound_heights = zip(*random.sample(list(zip(drop_heights, rebound_heights)), 5))

Creating lists of five elements from a list of numbers in Python

I'm using Python 3.7.
I have a tuple of numbers like this:
x = (1, 2, 3, 4, 5, 6, 7, 8, 9, ... etc)
I would like to obtain a list of lists, with each value divided by 100 and taken five at a time in a sliding, iterative way... something like this:
[[[0.0], [0.01], [0.02], [0.03], [0.04]],
[[0.01], [0.02], [0.03], [0.04], [0.05]],
[[0.02], [0.03], [0.04], [0.05], [0.06]],
[[0.03], [0.04], [0.05], [0.06], [0.07]],
[[0.04], [0.05], [0.06], [0.07], [0.08]],
[[0.05], [0.06], [0.07], [0.08], [0.09]],... etc
I tried this but it doesn't work properly:
Data = [[[(interest_over_time_data + j) / 100]
         for interest_over_time_data in range(5)]
        for j in interest_over_time_data]
The real numbers are not consecutive, so I cannot simply add 1 to each element...
Thank you in advance!
You want a list of lists; that calls for a double list comprehension.
You want sliding windows; that calls for slicing, which is best done with itertools.islice.
The code below creates the sliding sublists, with every element divided by 100.
import itertools
x = (1,2,3,4,5,6,7,8,9)
result = [[v/100.0 for v in itertools.islice(x,start,start+5)] for start in range(6)]
result:
[[0.01, 0.02, 0.03, 0.04, 0.05],
[0.02, 0.03, 0.04, 0.05, 0.06],
[0.03, 0.04, 0.05, 0.06, 0.07],
[0.04, 0.05, 0.06, 0.07, 0.08],
[0.05, 0.06, 0.07, 0.08, 0.09],
[0.06, 0.07, 0.08, 0.09]]
You can use 3rd party NumPy for an array-based solution:
import numpy as np
first_row = np.arange(5) / 100
first_col = np.arange(10) / 100
res = first_row + first_col[:, None]
array([[ 0. , 0.01, 0.02, 0.03, 0.04],
[ 0.01, 0.02, 0.03, 0.04, 0.05],
[ 0.02, 0.03, 0.04, 0.05, 0.06],
[ 0.03, 0.04, 0.05, 0.06, 0.07],
[ 0.04, 0.05, 0.06, 0.07, 0.08],
[ 0.05, 0.06, 0.07, 0.08, 0.09],
[ 0.06, 0.07, 0.08, 0.09, 0.1 ],
[ 0.07, 0.08, 0.09, 0.1 , 0.11],
[ 0.08, 0.09, 0.1 , 0.11, 0.12],
[ 0.09, 0.1 , 0.11, 0.12, 0.13]])
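If NumPy ≥ 1.20 is available, a related sketch builds the sliding windows directly with sliding_window_view and divides once:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = (1, 2, 3, 4, 5, 6, 7, 8, 9)
res = sliding_window_view(np.asarray(x), 5) / 100   # one row per window of five
# [[0.01 0.02 0.03 0.04 0.05]
#  [0.02 0.03 0.04 0.05 0.06]
#  [0.03 0.04 0.05 0.06 0.07]
#  [0.04 0.05 0.06 0.07 0.08]
#  [0.05 0.06 0.07 0.08 0.09]]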
Love one line solutions:
[[[x[p]/100] for p in range(k,k+5)] for k in range(len(x)-4)]
#>[[[0.01], [0.02], [0.03], [0.04], [0.05]],
#> [[0.02], [0.03], [0.04], [0.05], [0.06]],
#> [[0.03], [0.04], [0.05], [0.06], [0.07]],
#> [[0.04], [0.05], [0.06], [0.07], [0.08]],
#> [[0.05], [0.06], [0.07], [0.08], [0.09]]]
Okay, you want your output to contain lists of length 5, shifting from the first element of x to the last. Therefore your output will contain n-4 lists, where n is len(x).
So first we need to iterate over range(len(x)-4).
Then we want five elements from x starting at a given offset i. We can use slicing for this, e.g. x[i:i+5].
And we want all elements of this sublist divided by 100.
All together packed in list comprehension it looks like this:
x = (1,2,3,4,5,6,7,8,9)
res = [
[j/100.0 for j in x[i:i+5]]
for i in range(len(x)-4)
]
print(res)
Which results in
[[0.01, 0.02, 0.03, 0.04, 0.05],
[0.02, 0.03, 0.04, 0.05, 0.06],
[0.03, 0.04, 0.05, 0.06, 0.07],
[0.04, 0.05, 0.06, 0.07, 0.08],
[0.05, 0.06, 0.07, 0.08, 0.09]]
Or if you want to have 0.0 as in your example output:
x = (1,2,3,4,5,6,7,8,9)
x = [0] + list(x)
res = [
[j/100.0 for j in x[i:i+5]]
for i in range(len(x)-4)
]
print(res)

Is there a way to normalize vectors with different input sizes with numpy?

The following function tries to normalize 3D vectors
import numpy

def my_norm(v):
    """
    :type v: Nx3 numpy array
    """
    return v / numpy.linalg.norm(v, axis=1)[:, None]
It works when N > 1. For a single 1D vector, I get ValueError: 'axis' entry is out of bounds. I can do the following check to deal with both cases, but I wonder if there is a cleaner way?
def my_norm(v):
    """
    :type v: Nx3 numpy array, or a single 1D vector
    """
    if v.ndim == 1:
        return v / numpy.linalg.norm(v)
    return v / numpy.linalg.norm(v, axis=1)[:, None]
Use axis=-1 and keep the dimensions with keepdims=True:
v/np.linalg.norm(v, axis=-1,keepdims=True)
Sample runs
1D Case :
In [61]: v = np.random.rand(6)
In [62]: v/np.linalg.norm(v)
Out[62]: array([ 0.22, 0.1 , 0.28, 0.58, 0.64, 0.33])
In [63]: v/np.linalg.norm(v, axis=-1,keepdims=True)
Out[63]: array([ 0.22, 0.1 , 0.28, 0.58, 0.64, 0.33])
2D Case :
In [58]: v = np.random.rand(4,6)
In [59]: v / np.linalg.norm(v, axis=1)[:, None]
Out[59]:
array([[ 0.53, 0.04, 0.38, 0.21, 0.58, 0.43],
[ 0.49, 0.4 , 0.02, 0.56, 0.38, 0.38],
[ 0.05, 0.49, 0.45, 0.18, 0.54, 0.47],
[ 0.45, 0.61, 0.19, 0.1 , 0.14, 0.61]])
In [60]: v/np.linalg.norm(v, axis=-1,keepdims=True)
Out[60]:
array([[ 0.53, 0.04, 0.38, 0.21, 0.58, 0.43],
[ 0.49, 0.4 , 0.02, 0.56, 0.38, 0.38],
[ 0.05, 0.49, 0.45, 0.18, 0.54, 0.47],
[ 0.45, 0.61, 0.19, 0.1 , 0.14, 0.61]])
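Another option, sketched here as an alternative rather than a drop-in from the answer above, is to coerce the input to 2D with np.atleast_2d so a single (3,) vector and an Nx3 array share one code path:
import numpy as np

def my_norm(v):
    """Normalize a (3,) vector or an (N, 3) array row-wise."""
    v2d = np.atleast_2d(v)                                   # (3,) -> (1, 3)
    out = v2d / np.linalg.norm(v2d, axis=1, keepdims=True)
    return out.reshape(np.shape(v))                          # restore the input's shape

print(my_norm(np.array([3.0, 0.0, 4.0])))       # [0.6 0.  0.8]
print(my_norm(np.random.rand(4, 3)).shape)      # (4, 3), each row has unit norm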
