List Comprehensions to create pairwise dissimilarity - python

I'm not familiar with list comprehensions, but I would like to compute the Bray-Curtis dissimilarity using them. The dissimilarity is given by:
def bray(x):
    bray_diss = np.zeros((x.shape[0], x.shape[0]))
    for i in range(0, bray_diss.shape[0]):
        bray_diss[i,i] = 0
        for j in range(i+1, bray_diss.shape[0]):
            l1_diff = abs(x[i,:] - x[j,:])
            l1_sum = x[i,:] + x[j,:] + 1
            bray_diss[i,j] = l1_diff.sum() / l1_sum.sum()
            bray_diss[j,i] = bray_diss[i,j]
    return bray_diss
I tried something like:
def bray(x):
    [[((abs(x[i,:] - x[j,:])).sum() / (x[i,:] + x[j,:] + 1).sum()) for j in range(0, x.shape[0])] for i in range(0, x.shape[0])]
without success, and I can't figure out what is wrong! Moreover, in the first implementation the inner loop only runs over the rows after i to save computation time. How can I do that with a list comprehension?
Thanks!

You won't gain anything with a list comprehension... except a better comprehension of list comprehensions!
What you have to understand is that a list comprehension is a functional concept. I will not go into the details of functional programming,
but you have to keep in mind that functional programming forbids side effects. An example:
my_matrix = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        my_matrix[i,j] = value_of_cell(i,j)
The last line is a side effect: you modify the state of my_matrix. In contrast, a side-effect-free version would be:
np.array([[value_of_cell(i,j) for j in range(n)] for i in range(n)])
You don't have the "create-then-assign" sequence: you create the matrix by declaring the values at each position. More precisely, to create a matrix:
you have to declare a value for every cell;
when you are given the pair (i,j), you can't use it to declare the value of another cell (e.g. (j,i))
(If you need to transform the matrix later, you have to recreate it. That's why this method may be expensive -- in time and space.)
Now, take a look at your code. When you write a list comprehension, a good rule of thumb is to use auxiliary functions, as they help to clean up the code (we are not trying to create a one-liner here):
def bray(x):
    n = x.shape[0] # cleaner than repeating x.shape[0] everywhere
    def diss(i,j): # I hope it's correct
        l1_diff = abs(x[i,:] - x[j,:])
        l1_sum = x[i,:] + x[j,:] + 1
        return l1_diff.sum() / l1_sum.sum()
    bray_diss = np.zeros((n, n))
    for i in range(n): # range(n) = range(0,n)
        # bray_diss[i,i] = 0 <-- you don't need to set it to zero here
        for j in range(i+1, n):
            bray_diss[i,j] = diss(i,j)
            bray_diss[j,i] = bray_diss[i,j]
    return bray_diss
That's cleaner. What is the next step? In the code above, you choose to iterate over j that are greater than i and to set two values at once. But in a list comprehension, you don't choose the cells: the list comprehension gives you, for each cell, the coordinates and you have to declare the values.
First, let's try to set only one value per iteration, that is, to use two separate pairs of loops:
def bray(x):
    ...
    bray_diss = np.zeros((n, n))
    for i in range(n):
        for j in range(i+1, n):
            bray_diss[i,j] = diss(i,j)
    for i in range(n):
        for j in range(i):
            bray_diss[i,j] = bray_diss[j,i]
    return bray_diss
That's better. Second, we need to assign a value to every cell of the matrix, not just prefill it with zeroes and choose the cells we want to update:
def bray(x):
    ...
    bray_diss = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if j>i: # j in range(i+1, n)
                bray_diss[i,j] = diss(i,j) # top right corner
            else: # j in range(i+1)
                bray_diss[i,j] = 0. # zeroes in the bottom left corner + diagonal
    for i in range(n):
        for j in range(n):
            if j<i: # j in range(i)
                bray_diss[i,j] = bray_diss[j,i] # fill the bottom left corner now
            else: # j in range(i, n)
                bray_diss[i,j] = bray_diss[i,j] # top right corner + diagonal is already ok
    return bray_diss
A shorter version, using Python's conditional expression (its "ternary operator"), would be:
def bray(x):
    ...
    bray_diss = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            bray_diss[i,j] = diss(i,j) if j>i else 0.
    for i in range(n):
        for j in range(n):
            bray_diss[i,j] = bray_diss[j,i] if j<i else bray_diss[i,j]
    return bray_diss
Now we can turn this into list comprehensions:
def bray(x):
    ...
    bray_diss_top_right = np.array([[diss(i,j) if j>i else 0. for j in range(n)] for i in range(n)])
    bray_diss = np.array([[bray_diss_top_right[j,i] if j<i else bray_diss_top_right[i,j] for j in range(n)] for i in range(n)])
    return bray_diss
And, if I'm not wrong, it is even simpler like this (final version):
def bray(x):
    n = x.shape[0]
    def diss(i,j):
        l1_diff = abs(x[i,:] - x[j,:])
        l1_sum = x[i,:] + x[j,:] + 1
        return l1_diff.sum() / l1_sum.sum()
    bray_diss_top_right = np.array([[diss(i,j) if j>i else 0. for j in range(n)] for i in range(n)])
    return bray_diss_top_right + bray_diss_top_right.transpose()
Note that this version is probably (I didn't measure) slower than yours, but the way the matrix is built is, in my opinion, easier to grasp.
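To check that the comprehension-based version really matches the original loops, a small comparison like the one below may help. It is only a sketch: the function names and the sample matrix are made up for illustration, and bray_broadcast is a vectorized variant that is not part of the answer above (it assumes the n x n x columns intermediate array fits in memory).

```python
import numpy as np

def bray_loops(x):
    """Loop version from the question: fill the upper triangle and mirror it."""
    n = x.shape[0]
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(x[i] - x[j]).sum()
            den = (x[i] + x[j] + 1).sum()
            out[i, j] = out[j, i] = num / den
    return out

def bray_comprehension(x):
    """Final version from the answer: declare every cell, then add the transpose."""
    n = x.shape[0]

    def diss(i, j):
        return np.abs(x[i] - x[j]).sum() / (x[i] + x[j] + 1).sum()

    top_right = np.array([[diss(i, j) if j > i else 0.0 for j in range(n)]
                          for i in range(n)])
    return top_right + top_right.T

def bray_broadcast(x):
    """Hypothetical vectorized variant (not from the answer): all pairs at once."""
    num = np.abs(x[:, None, :] - x[None, :, :]).sum(axis=2)
    den = (x[:, None, :] + x[None, :, :] + 1).sum(axis=2)
    return num / den  # the diagonal is 0 because num is 0 there

# Made-up sample data: 3 rows (samples), 3 columns (features).
x = np.array([[1.0, 2.0, 3.0],
              [4.0, 0.0, 1.0],
              [2.0, 2.0, 2.0]])

print(np.allclose(bray_loops(x), bray_comprehension(x)))
print(np.allclose(bray_loops(x), bray_broadcast(x)))
```

If speed matters more than readability, the broadcasting variant is probably the one worth timing first, since it avoids Python-level loops entirely.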

Related

Swapping columns of a matrix without numpy

I need to swap the columns of a given matrix by applying some function to (M,i,j) to swap the ith and jth columns of M.
def swap_columns(M,i,j):
    rows = M[i]
    col = M[i][j]
    for i in M:
        N = M[i][j]
        M[i][j] = M[j][i]
        M[j][i] = N
    return N
Unable to get this working at all.
In Python, two variables can be swapped with: x, y = y, x
Code (this function modifies the original matrix in place, so there is no need to return anything):
def col_swapper(matrix, col_1, col_2):
    for line in range(len(matrix)):
        matrix[line][col_1], matrix[line][col_2] = matrix[line][col_2], matrix[line][col_1]
You could do it (in place) using a for-loop that performs the swap by unpacking into the matrix's rows:
def swapCols(M,c1,c2):
    for row,row[c2],row[c1] in ((row,row[c1],row[c2]) for row in M): pass
You can use this function, which returns a swapped copy and leaves M unchanged:
import copy
def my_swap(M, i, j):
    temp = copy.deepcopy(M)
    for k in range(len(M)):
        temp[k][i] = M[k][j]
        temp[k][j] = M[k][i]
    return temp
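The answers above differ mainly in whether they mutate the matrix or return a new one. The sketch below puts both behaviours side by side on a list-of-lists matrix; the function names and the sample matrix are illustrative only.

```python
def swap_columns_in_place(matrix, c1, c2):
    """Swap columns c1 and c2 of every row, modifying matrix itself."""
    for row in matrix:
        row[c1], row[c2] = row[c2], row[c1]

def swap_columns_copy(matrix, c1, c2):
    """Return a new matrix with columns c1 and c2 swapped; the input is untouched."""
    result = [list(row) for row in matrix]  # copy each row
    swap_columns_in_place(result, c1, c2)
    return result

M = [[1, 2, 3],
     [4, 5, 6]]

N = swap_columns_copy(M, 0, 2)
print(N)  # new matrix with the first and last columns exchanged
print(M)  # original is unchanged

swap_columns_in_place(M, 0, 1)
print(M)  # original is now modified
```

Copying row by row is enough here because the rows hold plain numbers; a deep copy is only needed when the cells themselves are mutable objects.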

How to find the absolute difference of lists and use those values in a maximization model in gurobipy?

Following is my code, where I try to take the absolute difference of two lists based on some conditions and then maximize the sum of those values.
import gurobipy as grb
from gurobipy import abs_
from time import process_time

m=[5,3,2]
cm=[sum(m[0:x:1]) for x in range(1, len(m)+1)]
P=len(m)
p = range(len(m))
N=sum(m)
n = range(N)
sett=[0 for i in p]
for i in p:
    if(i==0):
        sett[i]=range(0,cm[i])
    else:
        sett[i]= range(cm[i-1],cm[i])
# model for optimization
with grb.Env() as env, grb.Model(env=env) as o:
    o = grb.Model()
    o.Params.LogToConsole =0
    o.Params.OutputFlag =0
    x = {}
    for i in n:
        for j in n:
            x[i,j] = o.addVar(vtype=grb.GRB.BINARY, name='x'+str(i)+'_'+str(j))
    c = {}
    for j in n:
        c[j] = o.addVar(vtype=grb.GRB.INTEGER, name='c'+str(j))
    difc = {}
    for j in n:
        difc[j] = o.addVar(vtype=grb.GRB.INTEGER, name='difc'+str(j))
    adifc = {}
    for j in n:
        adifc[j] = o.addVar(vtype=grb.GRB.INTEGER, name='adifc'+str(j))
    sc = {}
    for i in p:
        sc[i]=o.addVar(vtype=grb.GRB.CONTINUOUS, name='sc'+str(i))
    z=0
    for i in p:
        for j in range(0,m[i]):
            o.addConstr((grb.quicksum(x[z,k] for k in range(int(((j)*N/m[i])),int(((j+1)*N/m[i])))) == 1))
            z=z+1
    for j in n:
        o.addConstr((grb.quicksum(x[i,j] for i in n)) == 1)
    z=0
    for j in n:
        o.addConstr(c[j]== grb.quicksum((j+1)* x[z,j] for j in range(0,N)))
        z=z+1
    z=0
    for i in p:
        for j in range(0,m[i]):
            o.addConstr( difc[z]== c[z]-m[i] )
            z=z+1
    for j in n:
        o.addConstr(adifc[j]== abs_(difc[j]))
    for i in p:
        o.addConstr((sc[i] == (sum((adifc[z]) for z in sett[i]))))
    objective =(grb.quicksum(sc[i] for i in p))
    o.ModelSense = grb.GRB.MAXIMIZE
    o.setObjective(objective)
    o.update()
    o.write('mymodel.lp')
    o.write('mymodel.mps')
    t1=process_time()
    o.optimize()
    t2=process_time()
    o.computeIIS()
    o.write("infeasibility_.ilp")
    print(o.objVal)
All of the constraints above work fine except the difference constraint and the absolute-value constraint.
The model becomes infeasible only when I add these two constraints.
The infeasibility appears at a point where it shouldn't. What did I do wrong?
difc[j] = o.addVar(vtype=grb.GRB.INTEGER, name='difc'+str(j))
By default, this is a non-negative variable (see https://www.gurobi.com/documentation/9.1/refman/py_model_addvar.html).
This is likely not correct, as you use it to model a difference and take the absolute value.
o.addConstr( difc[z]== c[z]-m[i] )
o.addConstr(adifc[j]== abs_(difc[j]))
You probably should make difc a free variable, for example by passing lb=-grb.GRB.INFINITY to addVar. And there is no reason to make it discrete.
You need to be very careful when implementing models, and to fully understand every part of them. Remember that we are solving systems of equations: a mistake anywhere can create havoc, much more so than in normal sequential programming. So extra care is warranted. Sloppy programming will be punished.
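To see why the default lower bound of 0 matters here, the following plain-Python sketch (no Gurobi needed) lists the positions each item may take and the resulting values of c[z] - m[i]. It rests on my reading of the model, which is an assumption: c[z] is the 1-based slot assigned to item z, and the first constraint restricts item z to the window range(int(j*N/m[i]), int((j+1)*N/m[i])). The helper name allowed_positions is hypothetical.

```python
# Data from the question.
m = [5, 3, 2]
N = sum(m)

def allowed_positions(m, N):
    """Yield (z, m_i, positions): the 1-based positions item z may take.

    Assumption: c[z] == j + 1 for the single slot j with x[z, j] == 1,
    and the window constraint limits which slots item z can use.
    """
    z = 0
    for m_i in m:
        for j in range(m_i):
            lo, hi = int(j * N / m_i), int((j + 1) * N / m_i)
            yield z, m_i, [k + 1 for k in range(lo, hi)]
            z += 1

# Items whose difference c[z] - m[i] is negative for every allowed position.
forced_negative = []
for z, m_i, positions in allowed_positions(m, N):
    diffs = [c - m_i for c in positions]
    print(z, positions, diffs)
    if max(diffs) < 0:
        forced_negative.append(z)

print("items that need difc < 0:", forced_negative)
```

With the question's data that list is not empty, so under these assumptions the equality difc[z] == c[z] - m[i] cannot be satisfied by a variable bounded below by 0, which is consistent with the infeasibility appearing only once the difference constraints are added.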

How can I optimize the following Python code to avoid a time-limit-exceeded error?

Hello everybody. I wrote the following code. Please help me optimize it: when I submit it, the judge reports time-limit-exceeded (2.069s / 13.33Mb) on some test cases.
import math
N = int(input())
arr = [None]*N; new_list = []
stepen = 0; res = .0;
arr = input().split(" ")
arr = [float(h) for h in arr]
Q = int(input())
for j in range(Q):
    x, y = input().split()
    new_list.extend([int(x), int(y)])
for i, j in zip(new_list[0::2], new_list[1::2]):
    stepen = (j - i)+ 1
    res = math.prod(arr[i:j+1])
    print(pow(res, 1./stepen))
The slowest part of your algorithm is math.prod(arr[i:j+1]). If every (x, y) query covers the entire array, you will surely exceed the time limit, because each call to prod must loop over the whole range.
To avoid this, precompute prefix products of your array. The idea is this: keep a second array pref, with pref[0] = arr[0] and pref[i] = arr[i] * pref[i-1]. As a result, pref[i] is the product of everything in arr at position i and before.
Then, to find the product between positions i and j, you want pref[j] / pref[i-1] (or simply pref[j] when i is 0). See if you can figure out why this gives the correct answer.
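A minimal sketch of the prefix-product idea follows. It shifts the prefix array by one (pref[0] = 1.0) so that the i = 0 case needs no special handling; the function names and sample data are illustrative. It assumes arr contains no zeros and that the running product neither overflows nor underflows as a float. For long arrays of positive values, prefix sums of logarithms would be a more robust variant.

```python
def prefix_products(arr):
    """pref[t] is the product of arr[0:t], so pref[0] == 1.0 (empty product)."""
    pref = [1.0]
    for value in arr:
        pref.append(pref[-1] * value)
    return pref

def geometric_mean(pref, i, j):
    """Geometric mean of arr[i..j] (inclusive) in O(1) using the prefix array."""
    count = j - i + 1
    product = pref[j + 1] / pref[i]  # cancels everything before position i
    return product ** (1.0 / count)

# Made-up sample data and queries.
arr = [2.0, 8.0, 4.0, 1.0]
pref = prefix_products(arr)  # built once, O(N)

for i, j in [(0, 1), (1, 2), (0, 3)]:
    print(geometric_mean(pref, i, j))  # each query is O(1)
```

Building pref costs one pass over the array, after which each of the Q queries is answered in constant time instead of looping over the queried range.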

How can I create a tridiagonal matrix in Python, without numpy and with user input?

I got a homework assignment for numerical analysis. I need to create a tridiagonal matrix: the code should ask how many columns there will be and what the values will be. And I shouldn't use numpy etc.
I researched it a lot, but no answers. Still trying to write it. Looking for some help. Thanks!
I did it only for a square matrix (dimensions: (n, n)). In the following code, a is the value on the diagonal, b the value on the diagonal above it and c the value on the diagonal below it.
def tridiagonal(n,a,b,c):
    M = [ [0 for j in range(n)] for i in range(n) ]
    for k in range(n):
        M[k][k] = a
    for k in range(n-1):
        M[k][k+1] = b
        M[k+1][k] = c
    return M
EDIT: The 2 loops can be grouped into one:
def tridiagonal(n,a,b,c):
    M = [ [0 for j in range(n)] for i in range(n) ]
    for k in range(n-1):
        M[k][k] = a
        M[k][k+1] = b
        M[k+1][k] = c
    M[n-1][n-1] = a
    return M
Having read your comment and seen the example in your link, here is another version, where L1, L2 and L3 are the 3 lists holding the diagonal, the diagonal above and the diagonal below:
def tridiagonal(n,L1,L2,L3):
    M = [ [0 for j in range(n)] for i in range(n) ]
    for k in range(n-1):
        M[k][k] = L1[k]
        M[k][k+1] = L2[k]
        M[k+1][k] = L3[k]
    M[n-1][n-1] = L1[n-1]
    return M
You just have to define the 3 lists with your particular values.
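As a usage sketch for the list-based version, the block below repeats the function so it runs on its own and calls it with made-up values. L1 needs n entries, while L2 and L3 need n - 1 each. In the homework setting these lists would presumably be filled from input() instead of being hard-coded.

```python
def tridiagonal(n, L1, L2, L3):
    """Build an n x n tridiagonal matrix as a list of lists.

    L1: main diagonal (n values), L2: diagonal above (n - 1 values),
    L3: diagonal below (n - 1 values).
    """
    M = [[0 for _ in range(n)] for _ in range(n)]
    for k in range(n - 1):
        M[k][k] = L1[k]
        M[k][k + 1] = L2[k]
        M[k + 1][k] = L3[k]
    M[n - 1][n - 1] = L1[n - 1]
    return M

# Made-up example values.
n = 4
diagonal = [2, 2, 2, 2]
above = [-1, -1, -1]
below = [-1, -1, -1]

for row in tridiagonal(n, diagonal, above, below):
    print(row)
```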
Thanks @PharaohOfParis.
I've changed your code a little. Now it works perfectly!
def tridiagonal(n):
    values1 = []
    values2 = [0]
    print("Enter R values ")
    for i in range(0, n):
        values1.append(float(input()))
    for i in range(0, n):
        values2.append(values1[i])
    print(values1)
    print(values2)
    M = [ [0 for j in range(n)] for i in range(n) ]
    for k in range(n-1):
        M[k][k] = values1[k]+values2[k]
        M[k][k+1] = -values1[k+1]
        M[k+1][k] = -values1[k+1]
    M[n-1][n-1] = values1[n-1]+values2[n-1]
    print(M)
tridiagonal(5)

How to convert this one-liner into a generator which returns k elements?

How would you rewrite the following sliding-window code using a generator?
My issue is that on each iteration I want to consume k=100 elements, or however many elements are left.
n=1005
m=5
step=2
a=[(i,j) for i in range(0,n-m+1,step) for j in range(0,n-m+1,step) if i<j]
Maybe this was a dumb question. Anyway, here is a possible, non-optimal solution that I came up with:
def my_gen(n,m,k,step):
    res = []
    count = 0
    for i in range(0,n-m+1,step):
        for j in range(0,n-m+1,step):
            if i<j:
                res.append((i,j))
                count += 1
                if count%k == 0:
                    yield res
                    res = []
    if res: # avoid yielding an empty last chunk
        yield res
Validating it works:
n=104
m=5 # size of the window
step=2
k=8
a=[(i,j) for i in range(0,n-m+1,step) for j in range(0,n-m+1,step) if i<j]
gen=my_gen(n,m,k,step)
b=[]
for g in gen:
    b.extend(g)
for x,y in zip(a,b):
    assert(x==y)
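An alternative sketch separates the two concerns: one generator produces the (i, j) pairs lazily, and a generic helper groups any iterable into lists of at most k items using itertools.islice. The names pairs and chunked are illustrative, not from the post.

```python
from itertools import islice

def pairs(n, m, step):
    """Lazily yield the same (i, j) pairs as the one-liner, in the same order."""
    for i in range(0, n - m + 1, step):
        for j in range(0, n - m + 1, step):
            if i < j:
                yield (i, j)

def chunked(iterable, k):
    """Yield lists of up to k items; the last list holds whatever is left."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, k))
        if not chunk:  # source exhausted: stop without an empty chunk
            return
        yield chunk

n, m, step, k = 104, 5, 2, 8
for chunk in chunked(pairs(n, m, step), k):
    pass  # process up to k pairs at a time here

print(len(chunk))  # size of the final, possibly shorter, chunk
```

Because chunked works on any iterable, the full list of pairs is never materialized; only one chunk of k pairs lives in memory at a time.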
