I cannot understand for the life of me why a row operation with NumPy just clearly leads to the wrong answer. The correct answer is in the SymPy matrix. Can anyone tell me why NumPy is unable to perform the correct calculation? I'm going crazy. Thank you!
# simplex tableau
import numpy as np
import sympy as sp
#NumPy
simplex = np.array([[2,4,3,1,0,0,0, 400],
[4,1,1,0,1,0,0, 200],
[7,4,4,0,0,1,0, 800],
[-3,-4,-2,0,0,0,1, 0]])
simplex[1,:] = simplex[1,:] - (1/4)*simplex[0,:]
print(simplex)
#SymPy
simplex = sp.Matrix([[2,4,3,1,0,0,0, 400],
[4,1,1,0,1,0,0, 200],
[7,4,4,0,0,1,0, 800],
[-3,-4,-2,0,0,0,1, 0]])
simplex[1,:] = simplex[1,:] - (1/4)*simplex[0,:]
simplex
Numpy:
[[ 2 4 3 1 0 0 0 400]
[ 3 0 0 0 1 0 0 100]
[ 7 4 4 0 0 1 0 800]
[ -3 -4 -2 0 0 0 1 0]]
Sympy:
Matrix([
[ 2, 4, 3, 1, 0, 0, 0, 400],
[3.5, 0, 0.25, -0.25, 1, 0, 0, 100.0],
[ 7, 4, 4, 0, 0, 1, 0, 800],
[ -3, -4, -2, 0, 0, 0, 1, 0]])
Your NumPy array has an integer dtype. It literally can't hold floating-point numbers. Give it a floating-point dtype:
simplex = np.array(..., dtype=float)
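To see the difference side by side, here is a minimal sketch that applies the same row operation to an integer array and to a float array. The names rows, simplex_int and simplex_float are only illustrative and do not appear in the question.

```python
import numpy as np

rows = [[2, 4, 3, 1, 0, 0, 0, 400],
        [4, 1, 1, 0, 1, 0, 0, 200],
        [7, 4, 4, 0, 0, 1, 0, 800],
        [-3, -4, -2, 0, 0, 0, 1, 0]]

# Integer literals only, so NumPy infers an integer dtype.
simplex_int = np.array(rows)
# Same data, but stored as floating point.
simplex_float = np.array(rows, dtype=float)

# The right-hand side is a float array in both cases. Assigning it into
# the integer array truncates every value toward zero.
simplex_int[1, :] = simplex_int[1, :] - (1/4) * simplex_int[0, :]
simplex_float[1, :] = simplex_float[1, :] - (1/4) * simplex_float[0, :]

print(simplex_int.dtype.kind, simplex_int[1])      # integer row, fractions lost
print(simplex_float.dtype.kind, simplex_float[1])  # float row, fractions kept
```

Writing at least one entry as a float literal (for example 2.0) would also make NumPy infer a float dtype, but passing dtype=float states the intent explicitly.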
I have to insert a small matrix into a big matrix (a zeros matrix). I was trying to do it in a loop, but every time I get this ValueError: could not broadcast the input array from the shape (6,6) into shape (4,4)
There are two issues:
How to insert it into the zeros matrix (specifying the location within the big zeros matrix).
How to place that matrix starting from the 23rd row of the 40*40 zeros matrix.
import numpy as np
ndofs = 39
k = np.array( [ [ 1, 0, 1, 0, 0, 0 ],
[ 0, 12, 6, 0, -12, 6 ],
[ 0, 6 , 4, 0, -6, 2 ],
[ 1, 0, 0, 1, 0, 0 ],
[ 0, -12, -6, 0, 12, 6 ],
[ 0, 6, 2, 0, -6, 4 ] ] )
K = np.zeros((ndofs+1,ndofs+1))
print(K.shape)
# for each element, changes to global coordinates
for i in range(ndofs):
    K_temp = np.zeros((ndofs+1,ndofs+1))
    K_temp[3*i:3*i+6, 3*i:3*i+6] = k
    K += K_temp
print(K)
You just overwrite the indexes in the bigger array:
import numpy
a = numpy.zeros((50,50))
b = numpy.ones((10,10))
a[2:12,2:12] = b # insert b at 2,2
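Applied to the sizes in the question, the same idea can be sketched as follows. The stand-in values in k and the zero-based offset 23 are assumptions made for illustration, not taken from the original post.

```python
import numpy as np

# Stand-in 6x6 block; any array of shape (6, 6) behaves the same way.
k = np.arange(1, 37, dtype=float).reshape(6, 6)

K = np.zeros((40, 40))

# Place k with its top-left corner at (row, col).
# Assumption: "23rd row" is read here as the zero-based offset 23.
row, col = 23, 23
K[row:row + k.shape[0], col:col + k.shape[1]] = k

# A slice that runs past the edge is silently clipped, so the target
# becomes smaller than k. That is where the broadcast error comes from.
clipped_shape = K[36:42, 36:42].shape
print(clipped_shape)  # (4, 4)
```

In the loop from the question, 3*i + 6 exceeds 40 once i reaches 12, so the loop has to stop before the block runs off the edge. Writing K[r:r+6, c:c+6] += k also accumulates overlapping blocks directly, without building a temporary 40x40 matrix on every iteration.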
Apologies because I asked a similar question yesterday but I feel my question lacked content, hopefully now it will be easier to understand.
I have a symmetric matrix with pairwise distances between individuals (see below), and I want to cluster groups of individuals in a way that all members of a cluster will have pairwise distances of zero. I have applied scipy.cluster.hierarchy using different linkage methods and clustering criteria for this but I don't get my expected results. In the example below I would argue that ind5 shouldn't be part of cluster #1 because its distance to ind9 is 1 and not 0.
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
import numpy as np
import pandas as pd
df = pd.read_csv(infile1, sep = '\t', index_col = 0)
print(df)
ind1 ind2 ind3 ind4 ind5 ind6 ind7 ind8 ind9
ind1 0 29 27 1 2 1 2 1 1
ind2 29 0 2 30 31 29 31 30 30
ind3 27 2 0 28 29 27 29 28 28
ind4 1 30 28 0 0 0 1 2 0
ind5 2 31 29 0 0 0 2 2 1
ind6 1 29 27 0 0 0 1 2 0
ind7 2 31 29 1 2 1 0 3 1
ind8 1 30 28 2 2 2 3 0 2
ind9 1 30 28 0 1 0 1 2 0
X = squareform(df.to_numpy())
print(X)
[29 27 1 2 1 2 1 1 2 30 31 29 31 30 30 28 29 27 29 28 28 0 0 1
2 0 0 2 2 1 1 2 0 3 1 2]
Z = linkage(X, 'single')
print(Z)
[[ 3. 4. 0. 2.]
[ 5. 9. 0. 3.]
[ 8. 10. 0. 4.]
[ 0. 11. 1. 5.]
[ 6. 12. 1. 6.]
[ 7. 13. 1. 7.]
[ 1. 2. 2. 2.]
[14. 15. 27. 9.]]
max_d = 0
clusters = fcluster(Z, max_d, criterion='distance')
sample_list = df.index.to_list()
clust_name_list = clusters.tolist()
result = pd.DataFrame({'Inds': sample_list, 'Clusters': clust_name_list})
print(result)
Inds Clusters
0 ind1 2
1 ind2 5
2 ind3 6
3 ind4 1
4 ind5 1
5 ind6 1
6 ind7 3
7 ind8 4
8 ind9 1
I was hoping that somebody more familiar with these methods could advise whether there is any linkage method that would exclude from the cluster any element (in this case ind5) with distance > 0 to at least one of the other elements in the cluster.
Thanks for your help!
Gonzalo
You can reinterpret your problem as the problem of finding cliques in a graph. The graph is obtained from your distance matrix by interpreting a distance of 0 as creating an edge between two nodes. Once you have the graph, you can use networkx (or some other graph theory library) to find the cliques in the graph. The cliques in the graph will be the sets of nodes in which all the pairwise distances in the clique are 0.
Here is your distance matrix (but note that your distances do not satisfy the triangle inequality):
In [136]: D
Out[136]:
array([[ 0, 29, 27, 1, 2, 1, 2, 1, 1],
[29, 0, 2, 30, 31, 29, 31, 30, 30],
[27, 2, 0, 28, 29, 27, 29, 28, 28],
[ 1, 30, 28, 0, 0, 0, 1, 2, 0],
[ 2, 31, 29, 0, 0, 0, 2, 2, 1],
[ 1, 29, 27, 0, 0, 0, 1, 2, 0],
[ 2, 31, 29, 1, 2, 1, 0, 3, 1],
[ 1, 30, 28, 2, 2, 2, 3, 0, 2],
[ 1, 30, 28, 0, 1, 0, 1, 2, 0]])
Convert the distance matrix to the adjacency matrix A:
In [137]: A = D == 0
In [138]: A.astype(int) # Display as integers for a more compact output.
Out[138]:
array([[1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 1, 0, 0, 1],
[0, 0, 0, 1, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 1, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 1, 0, 1, 0, 0, 1]])
Create a networkx graph G, and find the cliques with nx.find_cliques:
In [139]: import networkx as nx
In [140]: G = nx.Graph(A)
In [141]: cliques = nx.find_cliques(G)
In [142]: list(cliques)
Out[142]: [[0], [1], [2], [3, 5, 8], [3, 5, 4], [6], [7]]
(The values in the lists are the indices; e.g. the clique [2] corresponds to the set of labels ['ind3'].)
Note that there are two nontrivial cliques, [3, 5, 8] and [3, 5, 4], and 3 and 5 occur in both. This is a consequence of your distances having this anomalous data: distance(ind5, ind4) = 0, and distance(ind4, ind9) = 0, but distance(ind5, ind9) = 1 (i.e. the triangle inequality is not satisfied). So, by your definition of a "cluster", there are two possible nontrivial clusters: [ind4, ind6, ind9] or [ind4, ind5, ind6].
Finally, note the warning in the networkx documentation: "Finding the largest clique in a graph is NP-complete problem, so most of these algorithms have an exponential running time". If your distance matrix is large, this calculation could take a very long time!
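The interactive session above can be condensed into a standalone script. This is only a sketch: it builds the graph edge by edge from the upper triangle of D == 0 rather than passing the boolean matrix to nx.Graph, and it uses the labels as node names, which is a choice made here for readability and not part of the session above.

```python
import numpy as np
import networkx as nx

D = np.array([[ 0, 29, 27,  1,  2,  1,  2,  1,  1],
              [29,  0,  2, 30, 31, 29, 31, 30, 30],
              [27,  2,  0, 28, 29, 27, 29, 28, 28],
              [ 1, 30, 28,  0,  0,  0,  1,  2,  0],
              [ 2, 31, 29,  0,  0,  0,  2,  2,  1],
              [ 1, 29, 27,  0,  0,  0,  1,  2,  0],
              [ 2, 31, 29,  1,  2,  1,  0,  3,  1],
              [ 1, 30, 28,  2,  2,  2,  3,  0,  2],
              [ 1, 30, 28,  0,  1,  0,  1,  2,  0]])

labels = [f"ind{i + 1}" for i in range(len(D))]

G = nx.Graph()
G.add_nodes_from(labels)

# One edge per pair at distance 0; k=1 skips the diagonal (no self-loops).
rows, cols = np.nonzero(np.triu(D == 0, k=1))
G.add_edges_from((labels[i], labels[j]) for i, j in zip(rows, cols))

# Maximal cliques, sorted only to make the printed order stable.
cliques = sorted(sorted(c) for c in nx.find_cliques(G))
for clique in cliques:
    print(clique)
```

Because ind4 and ind6 belong to both nontrivial cliques, turning the cliques into a partition still requires a rule for which clique wins; the script does not make that choice.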
Your solution is correct!
You are getting the following clusters:
cluster 1 with elements ind4, ind5, ind6 and ind9 (at distance 0 from each other).
cluster 2 with element ind1
cluster 3 with element ind7
cluster 4 with element ind8
cluster 5 with element ind2
cluster 6 with element ind3
Only the elements at distance 0 are clustered together in cluster 1, as you require. Clusters 2 to 6 are degenerate clusters, with a single isolated element.
Let's modify the distances so that more proper clusters are created:
X = np.array([ 0, 27, 1, 2, 1, 2, 1, 1,
2, 30, 31, 29, 31, 30, 30,
28, 29, 27, 29, 28, 28,
0, 0, 1, 2, 0,
0, 2, 2, 1,
1, 2, 0,
0, 1,
2])
Z = linkage(X, 'single')
max_d = 0
clusters = fcluster(Z, max_d, criterion='distance')
print("Clusters:", clusters)
for cluster_id in np.unique(clusters):
    members = np.where(clusters == cluster_id)[0]
    print(f"Cluster {cluster_id} has members {members}")
Getting:
Clusters: [2 2 4 3 3 3 1 1 3]
Cluster 1 has members [6 7]
Cluster 2 has members [0 1]
Cluster 3 has members [3 4 5 8]
Cluster 4 has members [2]
I am attempting to convert a sum of absolute deviations to a linear programming problem so that I can utilize CPLEX (or other solver). I am stuck on how the matrices are to be set up. The problem is as follows:
minimize abs(x1 - 5) + abs(x2 - 3)
s.t. x1 + x2 = 10
I have the following constraints set up to transform the problem into a linear form:
x1 - 5 <= t1
-(x1 - 5) <= t1 and
x2 - 3 <= t2
-(x2 - 3) <= t2
I've set up the objective function as
c = [0,0,1,1]
but I am lost on how to set up
Ax <= b
in matrix form. What I have so far is:
A = [[ 1, -1, 0, 0],
[-1, -1, 0, 0],
[ 0, 0, 1,-1],
[ 0, 0,-1,-1]]
b = [ 5, -5, 3,-3]
I have set up the other constraint in matrix form as:
B = [1, 1, 0, 0]
b2 = [10]
When I run the following:
linprog(c,A_ub=A,b_ub=b,A_eq=B,b_eq=b2,bounds=[(0,None),(0,None)])
I get the following error message back:
ValueError: Invalid input for linprog: A_eq must have exactly two dimensions, and the number of columns in A_eq must be equal to the size of c
I know there is a solution because when I use scipy.optimize.minimize it solves to [6,4]. I'm sure the issue is I am not formulating the input matrices correctly but I am not sure how to set them up so that it runs.
Edit - here is the code that does not run:
import numpy as np
from scipy.optimize import linprog, minimize
c = np.block([np.zeros(2),np.ones(2)])
print("c =>",c)
A = [[ 1, -1, 0, 0],
[-1, -1, 0, 0],
[ 0, 0, 1,-1],
[ 0, 0,-1,-1]]
b = [[ 5, -5, 3,-3]]
print(A)
print(np.multiply(A,b))
B = [ 1, 1, 0, 0]
b2 = [10]
print(np.multiply(B,b2))
linprog(c,A_ub=A,b_ub=b,A_eq=B,b_eq=b2,bounds=[(0,None),(0,None)],
options={'disp':True})
I think the message is quite good. B should be a 2-dimensional matrix instead of a 1-dimensional vector. So:
B = [[1, 1, 0, 0]]
Secondly, the bounds array is too short: it needs one (min, max) pair per variable, so four entries here rather than two.
Thirdly, your ordering of variables is inconsistent. The columns in A are x1,t1,x2,t2 while the columns in B (and c) seem to be x1,x2,t1,t2. They need to follow the same scheme.
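Putting the three fixes together, one possible formulation looks like the sketch below. It assumes the variable order [x1, x2, t1, t2] throughout and keeps the question's lower bound of 0 for every variable; the names A_ub, b_ub, A_eq and b_eq simply mirror the linprog keyword arguments.

```python
import numpy as np
from scipy.optimize import linprog

# Variable order used everywhere: [x1, x2, t1, t2]
c = np.array([0, 0, 1, 1])            # minimize t1 + t2

A_ub = np.array([[ 1,  0, -1,  0],    #  x1 - t1 <=  5
                 [-1,  0, -1,  0],    # -x1 - t1 <= -5
                 [ 0,  1,  0, -1],    #  x2 - t2 <=  3
                 [ 0, -1,  0, -1]])   # -x2 - t2 <= -3
b_ub = np.array([5, -5, 3, -3])       # 1-D, one entry per inequality row

A_eq = np.array([[1, 1, 0, 0]])       # 2-D: x1 + x2 = 10
b_eq = np.array([10])

bounds = [(0, None)] * 4              # one (min, max) pair per variable

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.status, res.fun, res.x[:2])
```

The optimal objective value is 2, but the minimizer is not unique: any x1 between 5 and 7 with x2 = 10 - x1 attains it, so the solver may return a different point than the [6, 4] found with scipy.optimize.minimize.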
So I'm trying to generate a list of possible adjacent movements within a 3d array (preferably n-dimensional).
What I have works as it's supposed to, but I was wondering if there's a more numpythonic way to do so.
def adjacents(loc, bounds):
    adj = []
    bounds = np.array(bounds) - 1
    if loc[0] > 0:
        adj.append((-1, 0, 0))
    if loc[1] > 0:
        adj.append((0, -1, 0))
    if loc[2] > 0:
        adj.append((0, 0, -1))
    if loc[0] < bounds[0]:
        adj.append((1, 0, 0))
    if loc[1] < bounds[1]:
        adj.append((0, 1, 0))
    if loc[2] < bounds[2]:
        adj.append((0, 0, 1))
    return np.array(adj)
Here are some example outputs:
adjacents((0, 0, 0), (10, 10, 10))
= [[1 0 0]
[0 1 0]
[0 0 1]]
adjacents((9, 9, 9), (10, 10, 10))
= [[-1 0 0]
[ 0 -1 0]
[ 0 0 -1]]
adjacents((5, 5, 5), (10, 10, 10))
= [[-1 0 0]
[ 0 -1 0]
[ 0 0 -1]
[ 1 0 0]
[ 0 1 0]
[ 0 0 1]]
Here's an alternative which is vectorized and uses a constant, prepopulated array:
# all possible moves
_moves = np.array([
[-1, 0, 0],
[ 0,-1, 0],
[ 0, 0,-1],
[ 1, 0, 0],
[ 0, 1, 0],
[ 0, 0, 1]])
def adjacents(loc, bounds):
    loc = np.asarray(loc)
    bounds = np.asarray(bounds)
    mask = np.concatenate((loc > 0, loc < bounds - 1))
    return _moves[mask]
This uses asarray() instead of array() because it avoids copying if the input happens to be an array already. Then mask is constructed as an array of six bools corresponding to the original six if conditions. Finally, the appropriate rows of the constant data _moves are returned.
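Since the question also mentions the n-dimensional case, the same mask idea can be extended by building the move table from an identity matrix. This is a sketch under that assumption; the name adjacents_nd is hypothetical, and the table is rebuilt on every call instead of being prepopulated.

```python
import numpy as np

def adjacents_nd(loc, bounds):
    """Unit moves from loc that stay inside an array of shape bounds."""
    loc = np.asarray(loc)
    bounds = np.asarray(bounds)
    eye = np.eye(loc.size, dtype=int)
    # Negative moves first, then positive ones, matching the 3D table.
    moves = np.concatenate((-eye, eye))
    mask = np.concatenate((loc > 0, loc < bounds - 1))
    return moves[mask]

print(adjacents_nd((0, 0, 0), (10, 10, 10)))
print(adjacents_nd((5, 0), (10, 10)))
```

For three dimensions this returns the same rows, in the same order, as the 3D function.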
But what about performance?
The vectorized approach above, while it will appeal to some, actually runs only half as fast as the original. If it's performance you're after, the best simple change you can make is to remove the line bounds = np.array(bounds) - 1 and subtract 1 inside each of the last three if conditions. That gives you a 2x speedup (because it avoids creating an unnecessary array).
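For reference, the simple change described above would look roughly like this. The name adjacents_plain is hypothetical, and no timings are claimed here; measure on your own data before relying on the speedup.

```python
import numpy as np

def adjacents_plain(loc, bounds):
    adj = []
    if loc[0] > 0:
        adj.append((-1, 0, 0))
    if loc[1] > 0:
        adj.append((0, -1, 0))
    if loc[2] > 0:
        adj.append((0, 0, -1))
    # Subtract 1 inline instead of building np.array(bounds) - 1 first.
    if loc[0] < bounds[0] - 1:
        adj.append((1, 0, 0))
    if loc[1] < bounds[1] - 1:
        adj.append((0, 1, 0))
    if loc[2] < bounds[2] - 1:
        adj.append((0, 0, 1))
    return np.array(adj)

print(adjacents_plain((5, 5, 5), (10, 10, 10)))
```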
Suppose I have a 2D NumPy array a=[[1,-2,1,0], [1,0,0,-1]], and I want to turn it into a 3D NumPy array by element-wise multiplying it with t=[[x0,x0,x0,x0],[x1,x1,x1,x1]], where each xi is a 1D NumPy array of size 3072. The result would be a*t=[[x0,-2x0,x0,0],[x1,0,0,-x1]] with shape (2,4,3072). How should I do that in NumPy?
Code:
import numpy as np
# Example data taken from bendl's answer !!!
a = np.array([[1,-2,1,0], [1,0,0,-1]])
xi = np.array([1, 2, 3])
b = np.outer(a, xi).reshape(a.shape[0], -1, len(xi))
print('a:')
print(a)
print('b:')
print(b)
Output:
a:
[[ 1 -2 1 0]
[ 1 0 0 -1]]
b:
[[[ 1 2 3]
[-2 -4 -6]
[ 1 2 3]
[ 0 0 0]]
[[ 1 2 3]
[ 0 0 0]
[ 0 0 0]
[-1 -2 -3]]]
As I said: it looks like an outer product, and splitting/reshaping this one dimension is easy.
You can use numpy broadcasting for this:
import numpy
a = numpy.array([[1, -2, 1, 0], [1, 0, 0, -1]])
t = numpy.arange(3072 * 2).reshape(2, 3072)
# array([[ 0, 1, 2, ..., 3069, 3070, 3071], # = x0
# [3072, 3073, 3074, ..., 6141, 6142, 6143]]) # = x1
a.shape
# (2, 4)
t.shape
# (2, 3072)
c = (a.T[None, :, :] * t.T[:, None, :]).T
# array([[[ 0, 1, 2, ..., 3069, 3070, 3071], # = 1 * x0
# [ 0, -2, -4, ..., -6138, -6140, -6142], # = -2 * x0
# [ 0, 1, 2, ..., 3069, 3070, 3071], # = 1 * x0
# [ 0, 0, 0, ..., 0, 0, 0]], # = 0 * x0
#
# [[ 3072, 3073, 3074, ..., 6141, 6142, 6143], # = 1 * x1
# [ 0, 0, 0, ..., 0, 0, 0], # = 0 * x1
# [ 0, 0, 0, ..., 0, 0, 0], # = 0 * x1
# [-3072, -3073, -3074, ..., -6141, -6142, -6143]]]) # = -1 * x1
c.shape
# (2, 4, 3072)
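The double transpose is not strictly needed. As a sketch, the same result can be obtained by inserting the new axes directly, or with np.einsum; the names c_transposed, c_direct and c_einsum are only illustrative.

```python
import numpy as np

a = np.array([[1, -2, 1, 0], [1, 0, 0, -1]])
t = np.arange(3072 * 2).reshape(2, 3072)

# Form used above: transpose, broadcast, transpose back.
c_transposed = (a.T[None, :, :] * t.T[:, None, :]).T

# a becomes (2, 4, 1) and t becomes (2, 1, 3072); broadcasting fills the rest.
c_direct = a[:, :, None] * t[:, None, :]

# The same product written as an index expression: c[i, j, k] = a[i, j] * t[i, k].
c_einsum = np.einsum('ij,ik->ijk', a, t)

print(c_direct.shape)  # (2, 4, 3072)
```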
Does this do what you need?
import numpy as np
a = np.array([[1,-2,1,0], [1,0,0,-1]])
xi = np.array([1, 2, 3])
a = np.dstack([a * i for i in xi])
The docs for this are here:
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.dstack.html