I have a list of lists m that I need to modify.
I need the sum of each row to be greater than A and the sum of each column to be less than B.
I have something like this
x = 5  # or another number; the value is not relevant
rows = len(m)
cols = len(m[0])

for r in range(rows):
    while sum(m[r]) < A:
        c = randint(0, cols-1)
        m[r][c] += x

for c in range(cols):
    cant = sum([m[r][c] for r in range(rows)])
    while cant > B:
        r = randint(0, rows-1)
        if m[r][c] >= x:  # I don't want negatives
            m[r][c] -= x
My problem is that I need to satisfy both conditions, and this way, after the second for loop, I can't be sure the first condition still holds.
Any suggestions on how to satisfy both conditions, ideally with good performance? I could definitely consider using numpy.
Edit (an example)
# input
m = [[0, 0, 0],
     [0, 0, 0]]
A = 20
B = 25

# one desired output (since it chooses random positions)
m = [[10, 0, 15],
     [15, 0, 5]]
I should add: this is for generating the random initial population of a genetic algorithm. The restrictions make each individual a feasible solution, and I would need to run this around 80 times to get different possible solutions.
Something like this should do the trick:
import numpy
from scipy.optimize import linprog

A = 10
B = 20
m = 2  # the matrix is m x m
n = m * m

# the coefficients of a linear function to minimize.
# Setting this to all ones minimizes the sum of all variable
# values in the matrix, which solves the problem, but see below.
c = numpy.ones(n)

# the constraint matrix.
# This is matrix-multiplied with the current solution candidate
# to form the left-hand side of a set of normalized
# linear inequality constraint equations, i.e.
#
# x_0 * A_ub[0][0] + x_1 * A_ub[0][1] + ... <= b_0
# x_0 * A_ub[1][0] + x_1 * A_ub[1][1] + ... <= b_1
# ...
A_ub = numpy.zeros((2 * m, n))

# row sums. Since the <= inequality is a fixed component,
# we just multiply everything by (-1), i.e. we demand that
# the negative sums are smaller than the negative limit -A.
#
# Assign row ranges all at once, because numpy can do this.
for r in range(0, m):
    A_ub[r][r * m:(r + 1) * m] = -1

# We want the sum of the x in each (flattened)
# column to be smaller than B.
#
# The manual stepping for the column sums in row-major encoding
# is a little bit annoying here.
for r in range(0, m):
    for j in range(0, m):
        A_ub[r + m][r + m * j] = 1

# the actual upper limits for the normalized inequalities.
b_ub = [-A] * m + [B] * m

# hand the linear program to scipy
solution = linprog(c, A_ub=A_ub, b_ub=b_ub)

# bring the solution into the desired matrix form
print(numpy.reshape(solution.x, (m, m)))
Caveats
I use <=, not < as in your question, because that's what linprog supports.
This minimizes the total sum of all values in the target vector.
For your use case, you probably want to minimize the distance to the original sample, which a linear program cannot handle, since neither the squared error nor the absolute difference can be expressed as a linear combination (which is what c stands for). For that, you will probably need to go to a full minimize().
Still, this should give you a rough idea.
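Since the question mentions generating around 80 distinct individuals for a genetic algorithm, one idea (my own sketch, not part of the answer above) is to randomize the objective vector c on each call, so that linprog lands on different vertices of the same feasible region:

import numpy
from scipy.optimize import linprog

def random_feasible_matrix(A_ub, b_ub, m, n):
    # minimize a random linear objective over the same constraints;
    # the column-sum constraints keep the region bounded, so any
    # direction yields a finite optimum (bounds default to x >= 0)
    c = numpy.random.uniform(-1.0, 1.0, n)
    sol = linprog(c, A_ub=A_ub, b_ub=b_ub)
    return numpy.reshape(sol.x, (m, m))

population = [random_feasible_matrix(A_ub, b_ub, m, n) for _ in range(80)]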
A NumPy solution:
import numpy as np
val = B / len(m) # column sums <= B
assert val * len(m[0]) >= A # row sums >= A
# create array shaped like m, filled with val
arr = np.empty_like(m)
arr[:] = val
I chose to ignore the original content of m - it's all zero in your example anyway.
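A quick sanity check (my own addition, using the example values from the question) confirms that filling every cell with val keeps the column sums at or below B while pushing the row sums above A:

import numpy as np

m = [[0, 0, 0],
     [0, 0, 0]]
A, B = 20, 25

val = B / len(m)       # 12.5, truncated to 12 when stored in the int array
arr = np.empty_like(m)
arr[:] = val
assert (arr.sum(axis=1) >= A).all()  # row sums: 36 >= 20
assert (arr.sum(axis=0) <= B).all()  # column sums: 24 <= 25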
from random import randint

m = [[0, 0, 0],
     [0, 0, 0]]
A = 20
B = 25
x = 1  # or another number; not relevant
rows = len(m)
cols = len(m[0])

def runner(list1, a1, b1, x1):
    # copy each row, so a retry really restarts from the original state
    list1_backup = [row[:] for row in list1]
    rows = len(list1)
    cols = len(list1[0])
    for r in range(rows):
        while sum(list1[r]) <= a1:
            c = randint(0, cols-1)
            list1[r][c] += x1
    for c in range(cols):
        cant = sum([list1[r][c] for r in range(rows)])
        while cant >= b1:
            r = randint(0, rows-1)
            if list1[r][c] >= x1:  # I don't want negatives
                list1[r][c] -= x1
                cant -= x1  # keep the running column sum up to date
    good_a_int = 0
    for r in range(rows):
        test1 = sum(list1[r]) > a1
        good_a_int += 0 if test1 else 1
    if good_a_int == 0:
        return list1
    else:
        return runner(list1=list1_backup, a1=a1, b1=b1, x1=x1)

m2 = runner(m, A, B, x)
for row in m2:
    print(','.join("{:>3}".format(v) for v in row))
Related
I basically have to create a symmetric NxN matrix, randomly populated with 1s and 0s, with the constraint that there is exactly one '1' in each row and each column.
I wrote code to generate the matrix, but it ends up with more than one '1' in some rows and columns. How can I modify my code to respect the constraint mentioned above?
import numpy as np
N = int(input("Enter the number of row and col:"))
my_matrix = np.random.randint(2,size=(N,N))
print(my_matrix)
TL;DR
Each result is generated with equal probability, and the construction runs in O(n) time:
import random
import numpy as np

_prob_cache = [1, 1]

def prob(n):
    try:
        return _prob_cache[n]
    except IndexError:
        pass
    for i in range(len(_prob_cache) - 1, n):
        _prob_cache.append(1 / (i * _prob_cache[-1] + 1))
    return _prob_cache[-1]

def symmetric_permutation(n):
    res = np.zeros((n, n), int)
    remain = list(range(n))
    while remain:
        m = len(remain)
        diag_prob = prob(m)
        row = remain.pop()
        rnd = random.random()
        if rnd < diag_prob:
            col = row
        else:
            nondiag_prob = (1 - diag_prob) / (m - 1)
            idx = int((rnd - diag_prob) / nondiag_prob)
            remain[idx], remain[-1] = remain[-1], remain[idx]
            col = remain.pop()
        res[row, col] = res[col, row] = 1
    return res
Long Answer
Begin with some derivation:
Let f(n) be the number of valid placements for an n * n matrix. Obviously, we have:

f(1) = 1

and we adopt the convention:

f(0) = 1

For n > 1, take any row and choose the position of its 1. There are two cases:

If the 1 is on the diagonal, we can remove its row and column and continue placing on the remaining (n - 1) * (n - 1) matrix, so the number of remaining placements is f(n - 1).

If the 1 is not on the diagonal, the symmetric position also needs to be set to 1. We can then remove the rows and columns where the two 1's are located and continue placing on the remaining (n - 2) * (n - 2) matrix; since there are (n - 1) off-diagonal choices, each contributes f(n - 2) placements.

Therefore, we can deduce:

f(n) = f(n - 1) + (n - 1) * f(n - 2)

According to the above strategy, if we want every placement to appear with equal probability, we should give different weights to the diagonal index and the other indices when selecting the column index. The weight of the diagonal index should be:
p(n) = f(n - 1) / f(n)
Therefore:
f(n) = f(n - 1) + (n - 1) * f(n - 2)

=> f(n) / f(n - 1) = 1 + (n - 1) * f(n - 2) / f(n - 1)

=> 1 / p(n) = 1 + (n - 1) * p(n - 1)

=> p(n) = 1 / (1 + (n - 1) * p(n - 1))
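As a quick numeric check (my own addition), unrolling this recurrence gives

p(1) = 1
p(2) = 1 / (1 + 1 * 1)   = 1/2
p(3) = 1 / (1 + 2 * 1/2) = 1/2
p(4) = 1 / (1 + 3 * 1/2) = 2/5

which matches p(n) = f(n - 1) / f(n) for f = 1, 2, 4, 10, ...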
The probability function code is as follows:
_prob_cache = [1, 1]

def prob(n):
    """
    Iterative version to prevent stack overflow caused by recursion.
    Old version:

    @lru_cache
    def prob(n):
        if n == 1:
            return 1
        else:
            return 1 / ((n - 1) * prob(n - 1) + 1)
    """
    try:
        return _prob_cache[n]
    except IndexError:
        pass
    for i in range(len(_prob_cache) - 1, n):
        _prob_cache.append(1 / (i * _prob_cache[-1] + 1))
    return _prob_cache[-1]
The weight of each non-diagonal index is:

f(n - 2) / f(n) = (f(n - 2) / f(n - 1)) * (f(n - 1) / f(n)) = p(n - 1) * p(n)

or

f(n - 2) / f(n) = (1 - p(n)) / (n - 1)

Here I choose the latter, which calls the function one time fewer.
Specific implementation:
We use a list to store the indices that are still available. In each loop, we take the last element of the list as the row index (unlike the derivation above, which selects the first element; using the last speeds up removing elements from the list), compute the weights of the two cases, draw the column index at random, set the values at the corresponding positions, and remove the used indices from the list, until the list is empty:
import random
import numpy as np

def symmetric_permutation(n):
    res = np.zeros((n, n), int)
    remain = list(range(n))
    while remain:
        m = len(remain)
        diag_prob = prob(m)
        row = remain.pop()
        rnd = random.random()
        if rnd < diag_prob:
            col = row
        else:
            nondiag_prob = (1 - diag_prob) / (m - 1)
            col = remain.pop(int((rnd - diag_prob) / nondiag_prob))
        res[row, col] = res[col, row] = 1
    return res
Optimize to O(n) time complexity:
If we do not count the creation of the zero matrix, the time complexity of the above strategy is O(n^2), because each iteration is likely to remove an index from the middle of the list.
However, that removal is unnecessary. We have no requirements on the order of the remaining indices, because the choice of the row index does not affect the randomness of the column index. Therefore, a cheaper solution is to overwrite the selected column index with the last element and then pop the last element. This turns the O(n) removal of a middle element into an O(1) operation, so the overall time complexity becomes O(n):
def symmetric_permutation(n):
    res = np.zeros((n, n), int)
    remain = list(range(n))
    while remain:
        m = len(remain)
        diag_prob = prob(m)
        row = remain.pop()
        rnd = random.random()
        if rnd < diag_prob:
            col = row
        else:
            nondiag_prob = (1 - diag_prob) / (m - 1)
            idx = int((rnd - diag_prob) / nondiag_prob)
            remain[idx], remain[-1] = remain[-1], remain[idx]
            col = remain.pop()
        res[row, col] = res[col, row] = 1
    return res
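A quick structural check (my own addition, assuming the imports above): every matrix returned should be symmetric with exactly one 1 in each row and column.

M = symmetric_permutation(8)
assert (M == M.T).all()
assert (M.sum(axis=0) == 1).all() and (M.sum(axis=1) == 1).all()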
Probability test:
Here we prepare another function to calculate f(n) for the following test:
def f(n):
    before_prev, prev = 1, 1
    for i in range(1, n):
        before_prev, prev = prev, prev + before_prev * i
    return prev
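As a small sanity check (my own addition), the first few values follow the recurrence f(n) = f(n - 1) + (n - 1) * f(n - 2):

assert [f(n) for n in range(1, 6)] == [1, 2, 4, 10, 26]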
Next is a probability test to verify whether the results are uniform enough. Here I take n=8 and build the matrix 500_000 times, use the column indices of the 1s in each row as the identifier of each result, and draw a line graph and a histogram of the number of occurrences of each result:
from collections import Counter
import matplotlib.pyplot as plt
random.seed(0)
n = 8
times = 500_000
n_bin = 30
cntr = Counter()
cntr.update(tuple(symmetric_permutation(n).nonzero()[1]) for _ in range(times))
assert len(cntr) == f(n)
plt.subplot(2, 1, 1).plot(cntr.values())
plt.subplot(2, 1, 2).hist(cntr.values(), n_bin)
plt.show()
Subfigure 1 shows that the number of occurrences of each result lies roughly within 650 ± 70, and subfigure 2 shows that the distribution of those counts is close to Gaussian:
For @AndrzejO's answer, the same test code is used here. His solution is faster (after optimization, the two are now almost equally fast), but the probability of each result does not appear to be equal (note that all the various results still appear here):
Create a matrix of zeros. Then take N random row numbers, without repetition, and N random column numbers, without repetition. You can use random.sample for this. Then put a 1 at each row/column position.
import numpy as np
from random import sample

N = int(input("Enter the number of row and col:"))
my_matrix = np.zeros((N, N), dtype='int8')
rows = sample(range(N), N)
cols = sample(range(N), N)
points = zip(rows, cols)
for x, y in points:
    my_matrix[x, y] = 1

print(my_matrix)
If you want a symmetric matrix: in case N is even, I would take N random numbers out of N, half of them as x, half as y, and put a 1 at both positions (x, y) and (y, x). If N is odd, an additional 1 needs to be put at a random position on the diagonal.
import numpy as np
from random import sample, randint

N = int(input("Enter the number of row and col:"))
even = N % 2 == 0
my_matrix = np.zeros((N, N), dtype='int8')
N_range = list(range(N))
if not even:
    diagonal = randint(0, N-1)
    N_range.remove(diagonal)
    my_matrix[diagonal, diagonal] = 1
    N = N - 1
rowcol = sample(N_range, N)
rows = rowcol[:N//2]
cols = rowcol[N//2:]
for x, y in zip(rows, cols):
    my_matrix[x, y] = 1
    my_matrix[y, x] = 1
Here is a better version. Take the first free row, get a random free column, put a 1 at (row, col) and (col, row), and remove the used column/row. Repeat until all numbers 0 to N-1 are used.
import numpy as np
import random

N = int(input("Enter the number of row and col:"))
my_matrix = np.zeros((N, N))
not_used_number = list(range(N))
while len(not_used_number) != 0:
    current_row = not_used_number[0]
    random_col = random.choice(not_used_number)
    my_matrix[current_row, random_col] = 1
    my_matrix[random_col, current_row] = 1
    not_used_number.remove(current_row)
    if current_row != random_col:
        not_used_number.remove(random_col)
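A quick check (my own addition) that the constraint holds for the matrix built above:

assert (my_matrix == my_matrix.T).all()    # symmetric
assert (my_matrix.sum(axis=0) == 1).all()  # one 1 per column
assert (my_matrix.sum(axis=1) == 1).all()  # one 1 per row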
I am writing code for successive over-relaxation (SOR). When I run the code, I get the following error:

x[i] = (1-w)*xold[i] + w*(d[i] + sum(C[i,:]*x)) # estimate new values
OverflowError: Python int too large to convert to C long
# import libraries
import numpy as np

# define function
# M is the coeff matrix; b is the RHS matrix, x is the initial guesses
# tol is the acceptable tolerance and Nmax = max. iterations
def sor(M, b, x, w, tol, Nmax):
    N = len(M)            # length of the coefficient matrix
    C = np.zeros((N, N))  # initialize iteration coeff matrix
    d = np.zeros(N)       # initialize iteration RHS matrix
    # Create iteration matrix
    for i in np.arange(0, N, 1):
        pvt = M[i, i]           # identify the pivot element
        C[i, :] = -M[i, :]/pvt  # divide coefficients by pivot
        C[i, i] = 0             # eliminate the pivot element
        d[i] = b[i]/pvt         # divide RHS by pivot element
    # Perform iterations
    res = 100     # create a high res so there is at least 1 iteration
    iter = 0      # initialize iteration counter
    xold = 1.0*x  # initialize xold
    # res = np.linalg.norm(np.matmul(M, x) - b)
    # iterate while residual > tol and iter <= max iterations
    while (res > tol and iter <= Nmax):
        for i in np.arange(0, N, 1):  # loop through all unknowns
            x[i] = (1-w)*xold[i] + w*(d[i] + sum(C[i, :]*x))  # estimate new values
        res = np.sum(np.abs(np.matmul(M, x) - b))  # compute residual
        iter = iter + 1  # update iteration counter
        xold = x
    return x

# Solve Example
Nmax = 100   # max. number of iterations
tol = 1e-03  # absolute tolerance
M = [[1, 1, 1, 0, 0, 0],
     [1, 1, 1, 1, 0, 0],
     [1, 1, 1, 1, 1, 0],
     [0, 1, 1, 1, 1, 1],
     [0, 0, 1, 1, 1, 1],
     [0, 0, 0, 1, 1, 1]]
M = np.array(M)  # coefficient matrix
b = np.array([1, 1, 0.5, 1, 0.5, 1])
y = [0, 0, 0, 0, 0, 0]
y = np.array(y)  # initial guesses
w = 1
X = sor(M, b, y, w, tol, Nmax)  # apply the function
print(X)
I am supposed to get the same answer as the built-in function:
[ 1. 0.5 -0.5 0. -0.5 1.5]
What am I missing that is causing this issue?
Thanks in advance!
I am trying to make my own CFD solver and one of the most computationally expensive parts is solving for the pressure term. One way to solve Poisson differential equations faster is by using a multigrid method. The basic recursive algorithm for this is:
function phi = V_Cycle(phi,f,h)
    % Recursive V-Cycle Multigrid for solving the Poisson equation (\nabla^2 phi = f) on a uniform grid of spacing h

    % Pre-Smoothing
    phi = smoothing(phi,f,h);

    % Compute Residual Errors
    r = residual(phi,f,h);

    % Restriction
    rhs = restriction(r);
    eps = zeros(size(rhs));

    % stop recursion at smallest grid size, otherwise continue recursion
    if smallest_grid_size_is_achieved
        eps = smoothing(eps,rhs,2*h);
    else
        eps = V_Cycle(eps,rhs,2*h);
    end

    % Prolongation and Correction
    phi = phi + prolongation(eps);

    % Post-Smoothing
    phi = smoothing(phi,f,h);
end
I've attempted to implement this algorithm myself (also at the end of this question); however, it is very slow and doesn't give good results, so evidently it is doing something wrong. I've been trying to find out why for too long, and I think it's worthwhile seeing if anyone can help me.
If I use a grid size of 2^5 by 2^5 points, it can solve the problem and gives reasonable results. However, as soon as I go above this, it takes exponentially longer to solve and basically gets stuck at some level of inaccuracy, no matter how many V-cycles are performed. At 2^7 by 2^7 points, the code takes far too long to be useful.
I think my main issue is that my implementation of Jacobi iteration uses linear algebra to calculate the update at each step. This should, in general, be fast; however, the update matrix A is (n*m) by (n*m), and taking its dot product with a flattened 2^7 by 2^7 grid is expensive. As most of the entries are just zeros, should I calculate the result using a different method, such as the sparse sketch below?
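For illustration only (my own sketch, not part of the code below): the same five-point operator can be stored as a sparse matrix with scipy.sparse.diags, so that applying it to a flattened grid costs time proportional to the number of nonzeros rather than the square of the grid size.

import numpy as np
from scipy.sparse import diags

def create_differences_matrix_sparse(rows, cols):
    # sparse analogue of the dense create_differences_matrix below
    n = rows * cols
    main = np.full(n, 4.0)           # diagonal entries
    inner = np.full(n - 1, -1.0)     # left/right neighbours
    inner[cols - 1::cols] = 0.0      # no coupling across grid-row boundaries
    outer = np.full(n - cols, -1.0)  # up/down neighbours
    return diags([outer, inner, main, inner, outer],
                 offsets=[-cols, -1, 0, 1, cols], format="csr")

# A_sparse @ v.flatten() would then replace np.dot(A, v.flatten())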
If anyone has any experience with multigrid methods, I would appreciate any advice!
Thanks
my code:
# -*- coding: utf-8 -*-
"""
Created on Tue Dec 29 16:24:16 2020

@author: mclea
"""
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import convolve2d
from mpl_toolkits.mplot3d import Axes3D
from scipy.interpolate import griddata
from matplotlib import cm
def restrict(A):
    """
    Creates a new grid of points which is half the size of the original
    grid in each dimension.
    """
    n = A.shape[0]
    m = A.shape[1]
    new_n = int((n-2)/2 + 2)
    new_m = int((m-2)/2 + 2)
    new_array = np.zeros((new_n, new_m))
    for i in range(1, new_n-1):
        for j in range(1, new_m-1):
            ii = int((i-1)*2) + 1
            jj = int((j-1)*2) + 1
            # print(i, j, ii, jj)
            new_array[i, j] = np.average(A[ii:ii+2, jj:jj+2])
    new_array = set_BC(new_array)
    return new_array
def interpolate_array(A):
    """
    Creates a grid of points which is double the size of the original
    grid in each dimension. Uses linear interpolation between grid points.
    """
    n = A.shape[0]
    m = A.shape[1]
    new_n = int((n-2)*2 + 2)
    new_m = int((m-2)*2 + 2)
    new_array = np.zeros((new_n, new_m))
    i = (np.indices(A.shape)[0]/(A.shape[0]-1)).flatten()
    j = (np.indices(A.shape)[1]/(A.shape[1]-1)).flatten()
    A = A.flatten()
    new_i = np.linspace(0, 1, new_n)
    new_j = np.linspace(0, 1, new_m)
    new_ii, new_jj = np.meshgrid(new_i, new_j)
    new_array = griddata((i, j), A, (new_jj, new_ii), method="linear")
    return new_array
def adjacency_matrix(rows, cols):
    """
    Creates the adjacency matrix for an n by m shaped grid
    """
    n = rows*cols
    M = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r*cols + c
            # Two inner diagonals
            if c > 0: M[i-1, i] = M[i, i-1] = 1
            # Two outer diagonals
            if r > 0: M[i-cols, i] = M[i, i-cols] = 1
    return M

def create_differences_matrix(rows, cols):
    """
    Creates the central differences matrix A for an n by m shaped grid
    """
    n = rows*cols
    M = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r*cols + c
            # Two inner diagonals
            if c > 0: M[i-1, i] = M[i, i-1] = -1
            # Two outer diagonals
            if r > 0: M[i-cols, i] = M[i, i-cols] = -1
    np.fill_diagonal(M, 4)
    return M
def set_BC(A):
    """
    Sets the boundary conditions of the field
    """
    A[:, 0] = A[:, 1]
    A[:, -1] = A[:, -2]
    A[0, :] = A[1, :]
    A[-1, :] = A[-2, :]
    return A

def create_A(n, m):
    """
    Creates all the components required for the Jacobi update function
    for an n by m shaped grid
    """
    LaddU = adjacency_matrix(n, m)
    A = create_differences_matrix(n, m)
    invD = np.zeros((n*m, n*m))
    np.fill_diagonal(invD, 1/4)
    return A, LaddU, invD
def calc_RJ(rows, cols):
    """
    Calculates the Jacobi update matrix Rj for an n by m shaped grid
    """
    n = int(rows*cols)
    M = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r*cols + c
            # Two inner diagonals
            if c > 0: M[i-1, i] = M[i, i-1] = 0.25
            # Two outer diagonals
            if r > 0: M[i-cols, i] = M[i, i-cols] = 0.25
    return M
def jacobi_update(v, f, nsteps=1, max_err=1e-3):
    """
    Uses a Jacobi update matrix to solve nabla(v) = f
    """
    f_inner = f[1:-1, 1:-1].flatten()
    n = v.shape[0]
    m = v.shape[1]
    A, LaddU, invD = create_A(n-2, m-2)
    Rj = calc_RJ(n-2, m-2)
    update = True
    step = 0
    while update:
        v_old = v.copy()
        step += 1
        vt = v_old[1:-1, 1:-1].flatten()
        vt = np.dot(Rj, vt) + np.dot(invD, f_inner)
        v[1:-1, 1:-1] = vt.reshape((n-2), (m-2))
        err = v - v_old
        if step == nsteps or np.abs(err).max() < max_err:
            update = False
    return v, (step, np.abs(err).max())
def MGV(f, v):
    """
    Solves for nabla(v) = f using a multigrid method
    """
    # global A, r
    n = v.shape[0]
    m = v.shape[1]
    # If on the smallest grid size, compute the exact solution
    if n <= 6 or m <= 6:
        v, info = jacobi_update(v, f, nsteps=1000)
        return v
    else:
        # smoothing
        v, info = jacobi_update(v, f, nsteps=10, max_err=1e-1)
        A = create_A(n, m)[0]
        # calculate residual
        r = np.dot(A, v.flatten()) - f.flatten()
        r = r.reshape(n, m)
        # downsample residual error
        r = restrict(r)
        zero_array = np.zeros(r.shape)
        # interpolate the correction computed on a coarser grid
        d = interpolate_array(MGV(r, zero_array))
        # Add the prolongated coarser-grid solution onto the finer grid
        v = v - d
        v, info = jacobi_update(v, f, nsteps=10, max_err=1e-6)
        return v
sigma = 0

# Setting up the grid
k = 6
n = 2**k + 2
m = 2**k + 2
hx = 1/n
hy = 1/m
L = 1
H = 1
x = np.linspace(0, L, n)
y = np.linspace(0, H, m)
XX, YY = np.meshgrid(x, y)

# Setting up the initial conditions
f = np.ones((n, m))
v = np.zeros((n, m))

# How many V cycles to perform
err = 1
n_cycles = 10
loop = True
cycle = 0

# Perform V cycles until converged or the maximum
# number of cycles is reached
while loop:
    cycle += 1
    v_new = MGV(f, v)
    if np.abs(v - v_new).max() < err:
        loop = False
    if cycle == n_cycles:
        loop = False
    v = v_new

print("Number of cycles " + str(cycle))
plt.contourf(v)
I realize that I'm not answering your question directly, but I do note that you have quite a few loops that will contribute some overhead cost. When optimizing code, I have found the following thread useful, particularly the line-profiler answer. That way you can focus on the high-time-cost lines and then ask more specific questions about opportunities to optimize.
How do I get time of a Python program's execution?
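For instance (a minimal sketch, assuming the MGV, f, and v defined in the question are in scope), cProfile quickly shows which functions dominate a single V-cycle:

import cProfile

cProfile.run("MGV(f, v)", sort="cumtime")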
a,b=np.ogrid[0:n:1,0:n:1]
A=np.exp(1j*(np.pi/3)*np.abs(a-b))
a,b=np.diag_indices_from(A)
A[a,b]=1-1j/np.sqrt(3)
This is my basis; it produces a grid which acts as an n*n matrix.
My issue is that I need to replace a column in the grid, say for example the column where b=17.
I need this column to be:
A=np.exp(1j*(np.pi/3)*np.abs(a-17+geo_mean(x)))
except for where a == b, where it needs to stay as:
A[a,b]=1-1j/np.sqrt(3)
geo_mean(x) is just a geometric average of 50 values from a pseudo-random number generator, defined in my code as:

x = [random.uniform(0, 0.5) for p in range(0, 50)]

def geo_mean(iterable):
    a = np.array(iterable)
    return a.prod() ** (1.0 / len(a))

So how do I go about replacing a column to include geo_mean in the exponent formula, without changing the diagonal value?
Let's start by saying that diag_indices_from() is kind of useless here, since we already know that the diagonal elements are those with equal indices i and j, running up to n. Therefore, let's simplify the code a little bit at the beginning:
a, b = np.ogrid[0:n:1, 0:n:1]
A = np.exp(1j * (np.pi / 3) * np.abs(a - b))
diag = np.arange(n)
A[diag, diag] = 1 - 1j / np.sqrt(3)
Now, let's say you would like to set the column k values, except for the diagonal element, to
np.exp(1j * (np.pi/3) * np.abs(a - 17 + geo_mean(x)))
(I assume a in the above formula is the row index.)
This can be done using integer indices, which are moreover almost already computed: we have diag, and we just need to remove from it the index of the diagonal element that must stay unchanged:
r = np.delete(diag, k)
Then
x = np.random.uniform(0, 0.5, (r.size, 50))
A[r, k] = np.exp(1j * (np.pi/3) * np.abs(r - k + geo_mean(x)))
However, for the above to work, you need to rewrite your geo_mean() function in such a way that it works with 2D input arrays (I will also add some checks and conversions to make it backward compatible):
def geo_mean(x):
    x = np.asarray(x)
    dim = len(x.shape)
    x = np.atleast_2d(x)
    v = np.prod(x, axis=1) ** (1.0 / x.shape[1])
    return v[0] if dim == 1 else v
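A quick usage check (my own example): the rewritten function still handles 1-D input and returns one geometric mean per row for 2-D input.

print(geo_mean([1.0, 2.0, 4.0]))           # 2.0
print(geo_mean([[1.0, 4.0], [2.0, 8.0]]))  # [2. 4.]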
I'm computing thousands of gradients and would like to vectorize the computations in Python. The context is SVM and the loss function is hinge loss. Y is Mx1, X is MxN and w is Nx1.

L(w) = lam/2 * ||w||^2 + 1/m * sum_{i=1..m} max(0, 1 - y[i]*X[i]*w)

The gradient of this is

grad = lam*w + 1/m * sum_{i=1..m} { -y[i]*X[i].T if y[i]*X[i]*w < 1, else 0 }

Instead of looping through each element of the sum and evaluating the max function, is it possible to vectorize this? I want to use something like np.where, as in the following:

grad = np.where(y*X.dot(w) < 1, -X.T.dot(y), 0)

This does not work, because where the condition is true, -X.T*y has the wrong dimension.
Edit: a list-comprehension version; I would like to know if there's a cleaner or more optimal way:
def grad(X, y, w, lam):
    # cache y[i]*X[i].dot(w); each row of X.dot(w) is multiplied by a single element of y
    yXw = y * X.dot(w)
    # cache y[i]*X[i]; note each row of X is multiplied by a single element of y
    yX = X * y[:, np.newaxis]
    # average the per-sample gradients (np.zeros_like(w) keeps the shapes consistent)
    return lam*w + np.mean([-yX[i] if yXw[i] < 1 else np.zeros_like(w)
                            for i in range(len(y))], axis=0)
You have two vectors A and B, and you want to return an array C such that C[i] = A[i] if B[i] < 1 and 0 otherwise. Consequently, all you need to do is

C := A * sign(max(0, 1-B)) # surprisingly similar to the original hinge loss, right? :)

since

if B < 1 then 1-B > 0, thus max(0, 1-B) > 0 and sign(max(0, 1-B)) == 1
if B >= 1 then 1-B <= 0, thus max(0, 1-B) == 0 and sign(max(0, 1-B)) == 0
so in your code it will be something like (note that B holds the margins and A the per-sample gradient rows, so the mask needs a trailing axis to broadcast):

B = (y * X.dot(w)).ravel()   # margins y[i]*X[i].dot(w), shape (m,)
A = -(X * y[:, np.newaxis])  # rows -y[i]*X[i], shape (m, n)
C = A * np.sign(np.maximum(0, 1 - B))[:, np.newaxis]
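Putting it together (my own sketch, assuming X is (m, n), y is (m,) and w is (n,) as in the question), the full vectorized gradient becomes:

import numpy as np

def grad_vectorized(X, y, w, lam):
    mask = (y * X.dot(w) < 1).astype(float)  # 1 where the hinge is active
    return lam * w - X.T.dot(y * mask) / len(y)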