Shape mismatch error with scipy.optimize.minimize for logistic regression - python

I am going through Andrew Ng's ML course, and I am trying to implement the programs in Python. For the second exercise, on logistic regression, I am trying to use scipy.optimize.minimize to optimize the cost function. My code is as follows:
import os
import numpy as np
from scipy.special import expit
from scipy import optimize
datafile1 = os.path.join('data','ex2data1.txt')
data1 = np.loadtxt(datafile1, delimiter=',')
exam_scores, results = data1[:, :2], data1[:, 2]
m, n = exam_scores.shape
exam_scores = np.concatenate([np.ones([m, 1]), exam_scores], axis=1)
def cost_function(x, y, theta):
    m = len(y)
    hypothesis = expit(np.dot(x, theta))
    term1 = np.dot(-y.T, np.log(hypothesis)) / m
    term2 = np.dot(-(1 - y).T, np.log(1 - hypothesis)) / m
    cost = term1 + term2
    return cost
def gradient(x, y, theta):
    m = len(y)
    hypothesis = expit(np.dot(x, theta))
    return np.dot(hypothesis - y, x) / m
def minimize_cost(x, y, theta):
    output = optimize.minimize(cost_function, theta, args=(x, y),
                               jac=gradient, options={'maxiter':400})
    return output.fun, output.x
theta = np.zeros(n + 1)
cost, theta = minimize_cost(exam_scores, results, theta)
This gives me:
<ipython-input-42-e2ba65cce1d8> in gradient(x, y, theta)
      9 def gradient(x, y, theta):
     10     m = len(y)
---> 11     hypothesis = expit(np.dot(x, theta))
     12     return np.dot(hypothesis - y, x) / m

ValueError: shapes (3,) and (100,) not aligned: 3 (dim 0) != 100 (dim 0)
However, theta and the output of the gradient function have the same shape, i.e. theta.shape == gradient(exam_scores, results, theta).shape gives me True.
I do not understand why the gradient function raises a ValueError when called from minimize, since on its own it gives the expected output.
Any pointers would be appreciated.
P.S. Here is a part of the data.
exam_scores[:5, :]
array([[34.62365962, 78.02469282],
       [30.28671077, 43.89499752],
       [35.84740877, 72.90219803],
       [60.18259939, 86.3085521 ],
       [79.03273605, 75.34437644]])
results.reshape(m, 1)[:5, :]
Edit: Added part of the data.
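The traceback is consistent with how scipy.optimize.minimize invokes its callbacks: both fun and jac are called as f(params, *args), so with args=(x, y) the parameter vector arrives as the first argument. Inside gradient, x is therefore theta (shape (3,)) and theta is results (shape (100,)), which matches the error message. Calling gradient(exam_scores, results, theta) by hand works only because you pass the arguments in the order the signature expects. Below is a minimal sketch of the fix that reorders the signatures to (theta, x, y). The synthetic data is a hypothetical stand-in for ex2data1.txt, and method='BFGS' is written out explicitly (it is already the default when no bounds or constraints are given).

```python
import numpy as np
from scipy import optimize
from scipy.special import expit

# minimize calls fun(params, *args) and jac(params, *args),
# so the parameter vector must be the FIRST argument of both.
def cost_function(theta, x, y):
    m = len(y)
    h = expit(x @ theta)
    return (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m

def gradient(theta, x, y):
    m = len(y)
    h = expit(x @ theta)
    return (h - y) @ x / m  # same shape as theta

def minimize_cost(x, y, theta):
    output = optimize.minimize(cost_function, theta, args=(x, y),
                               jac=gradient, method='BFGS',
                               options={'maxiter': 400})
    return output.fun, output.x

# Hypothetical synthetic data (NOT the course file): intercept column + 2 features
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 2))
design = np.column_stack([np.ones(100), features])
true_theta = np.array([0.5, 2.0, -1.0])
labels = (rng.random(100) < expit(design @ true_theta)).astype(float)

cost, theta = minimize_cost(design, labels, np.zeros(3))
print(cost, theta)
```

With your own loading code, the call stays the same: cost, theta = minimize_cost(exam_scores, results, np.zeros(n + 1)).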


Python problems on Machine-Learning

import numpy as np
import pandas as pd
from matplotlib import pyplot as pt
def computeCost(X,y,theta):
    m = len(y)
    predictions= X*theta-y
    sqrerror = np.power(predictions, 2)
    return 1/(2*m)*np.sum(sqrerror)
def gradientDescent(X, y, theta, alpha, num_iters):
    m = len(y)
    jhistory = np.zeros((num_iters,1))
    for i in range(num_iters):
        h = X * theta
        s = h - y
        theta = theta - (alpha / m) * (s.T*X).T
        jhistory[i] = computeCost(X, y, theta)
    return theta, jhistory
data = open(r'C:\Users\Coding\Desktop\machine-learning-ex1\ex1\ex1data1.txt')
data1 = np.loadtxt(data, delimiter=',')
y = np.array(data1[:,1])
m = len(y)
X = np.array([data1[:,0]]).reshape(m,1)
X = np.asmatrix(np.insert(X,0,1,axis=1))
iterations = 1500
alpha = 0.01
print('Testing the cost function ...')
J = computeCost(X, y, theta)
print('With theta = [0 , 0]\nCost computed = ', J)
print('Expected cost value (approx) 32.07')
J = computeCost(X, y, theta)
print('With theta = [-1 , 2]\nCost computed =', J)
print('Expected cost value (approx) 54.24')
theta,JJ = gradientDescent(X, y, theta, alpha, iterations)
print('Theta found by gradient descent:')
print(theta)
print('Expected theta values (approx)')
print(' -3.6303\n 1.1664\n')
predict1 = [1, 3.5] *theta
print(predict1 * 10000)
Testing the cost function ...
With theta = [0 , 0]
Cost computed =  32.072733877455676
Expected cost value (approx) 32.07
With theta = [-1 , 2]
Cost computed = 69.84811062494227
Expected cost value (approx) 54.24
Theta found by gradient descent:
[[-3.70304726 -3.64357517]
 [ 1.17367146  1.16769684]]
Expected theta values (approx)
 -3.6303
 1.1664

[[4048.02858742 4433.63790186]]
There are two problems: the first computed cost is right, but the second one is wrong. Also, my gradient descent returns 4 elements in theta (there are supposed to be two).
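Regarding "With theta = [-1 , 2]": I think the theta you pass to computeCost for that call is incorrect. Assuming that you have a single feature, that you added a column of ones, and that you are doing simple linear regression, theta must hold exactly two parameters (intercept and slope) in a shape compatible with X.
Also, where you have
predictions= X*theta-y
it would be better to write
predictions = np.dot(X, theta) - y
For plain NumPy arrays, * multiplies element-wise instead of computing the matrix product, so the two are not doing the same thing.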
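To make the shape discussion concrete, here is a minimal sketch that keeps every vector as an explicit column: y is reshaped to (m, 1) and theta is (2, 1), so X @ theta - y stays (m, 1). Mixing a 1-D y of shape (m,) with an (m, 1) prediction would instead broadcast to an (m, m) array. The names compute_cost and gradient_descent are hypothetical snake_case versions of the question's functions, and the four data points are made up (y = 1 + 2x exactly), not ex1data1.txt.

```python
import numpy as np

def compute_cost(X, y, theta):
    # X: (m, 2) with a leading column of ones; y: (m, 1); theta: (2, 1)
    m = len(y)
    errors = X @ theta - y  # (m, 1); no accidental broadcast to (m, m)
    return float(np.sum(errors ** 2) / (2 * m))

def gradient_descent(X, y, theta, alpha, num_iters):
    m = len(y)
    j_history = np.zeros(num_iters)
    for i in range(num_iters):
        errors = X @ theta - y
        theta = theta - (alpha / m) * (X.T @ errors)  # stays (2, 1)
        j_history[i] = compute_cost(X, y, theta)
    return theta, j_history

# Made-up data lying exactly on y = 1 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
y = (1 + 2 * x).reshape(-1, 1)

print(compute_cost(X, y, np.zeros((2, 1))))
theta, history = gradient_descent(X, y, np.zeros((2, 1)), alpha=0.05, num_iters=5000)
print(theta.ravel())
```

Because every operand has an explicit (rows, 1) shape, a wrong shape shows up as an immediate error instead of a silently broadcast result.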

Implement gradient descent in python

I am trying to implement gradient descent in Python. Although my code returns a result, I think the results I am getting are completely wrong.
Here is the code I have written:
import numpy as np
import pandas
dataset = pandas.read_csv(r'D:\ML Data\house-prices-advanced-regression-techniques\train.csv')
X = np.empty((0, 1), int)
Y = np.empty((0, 1), int)
for i in range(dataset.shape[0]):
    X = np.append(X, dataset.loc[i, 'LotArea'])
    Y = np.append(Y, dataset.loc[i, 'SalePrice'])
X = np.c_[np.ones(len(X)), X]
Y = Y.reshape(len(Y), 1)
def gradient_descent(X, Y, theta, iterations=100, learningRate=0.000001):
    m = len(X)
    for i in range(iterations):
        prediction = np.dot(X, theta)
        theta = theta - (1/m) * learningRate * (X.T.dot(prediction - Y))
    return theta
theta = np.random.randn(2,1)
theta = gradient_descent(X, Y, theta)
The result I get after running this program is:
theta [[-5.23237458e+228]
These are extremely large values. Can someone point out the mistake I have made in the implementation?
Also, a second problem is that I have to set the learning rate very low (in this case I have set it to 0.000001) for it to work; otherwise the program throws an error.
Please help me diagnose the problem.
Try reducing the learning rate as the iterations progress; otherwise it won't be able to reach the minimum. Try this:
import numpy as np
import pandas
dataset = pandas.read_csv('start.csv')
X = np.empty((0, 1), int)
Y = np.empty((0, 1), int)
for i in range(dataset.shape[0]):
    X = np.append(X, dataset.loc[i, 'R&D Spend'])
    Y = np.append(Y, dataset.loc[i, 'Profit'])
X = np.c_[np.ones(len(X)), X]
Y = Y.reshape(len(Y), 1)
def gradient_descent(X, Y, theta, iterations=50, learningRate=0.01):
    m = len(X)
    for i in range(iterations):
        prediction = np.dot(X, theta)
        theta = theta - (1/m) * learningRate * (X.T.dot(prediction - Y))
    return theta
theta = np.random.randn(2,1)
theta = gradient_descent(X, Y, theta)
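The answer suggests decaying the learning rate. Another common remedy, swapped in here rather than taken from the answer, is to standardize the feature first: with LotArea values in the thousands, the gradient component for the slope dwarfs the one for the intercept, so only a tiny learning rate avoids divergence. A minimal sketch, assuming made-up lot areas of similar magnitude and a made-up linear price (not the Kaggle file); the fitted parameters are mapped back to the original scale at the end.

```python
import numpy as np

def gradient_descent(X, Y, theta, iterations=1000, learning_rate=0.1):
    # Same batch update as in the question, on (m, 2) X and (m, 1) Y
    m = len(X)
    for _ in range(iterations):
        prediction = X @ theta
        theta = theta - (learning_rate / m) * (X.T @ (prediction - Y))
    return theta

# Hypothetical values on a LotArea-like scale; the prices are made up
lot_area = np.array([8000.0, 9500.0, 11000.0, 12500.0, 14000.0])
sale_price = 20.0 * lot_area + 5000.0

# Standardize the feature (zero mean, unit variance)
mu, sigma = lot_area.mean(), lot_area.std()
X = np.c_[np.ones(len(lot_area)), (lot_area - mu) / sigma]
Y = sale_price.reshape(-1, 1)

theta = gradient_descent(X, Y, np.zeros((2, 1)))

# Convert the fitted parameters back to the unscaled feature
slope = theta[1, 0] / sigma
intercept = theta[0, 0] - slope * mu
print(intercept, slope)
```

Without scaling, gradient descent on this least-squares cost is only stable for a learning rate below roughly 2 / λmax, where λmax is the largest eigenvalue of X.T @ X / m; with feature values around 10^4 that eigenvalue is on the order of 10^8, which is why such a small learning rate was needed.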

My vectorization implementation of gradient descent does not get me the right answer

I'm currently working on Andrew Ng's gradient descent exercise using Python, but I keep getting the wrong optimal theta. I followed a vectorization cheatsheet for gradient descent.
Here is my code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
def cost_func(X, Y, theta):
    m = len(X)
    H = X.dot(theta)
    J = 1/(2*m) * (H - Y).T.dot(H - Y)
    return J
def gradient_descent(X, Y, alpha=0.01, iterations=1500):
    # initializing theta as a zero vector
    theta = np.zeros(X.shape[1])
    # initializing a list of cost function values
    J_list = [cost_func(X, Y, theta)]
    m = len(X)
    while iterations > 0:
        H = X.dot(theta)
        delta = (1/m) * X.T.dot(H - Y)
        theta = theta - alpha * delta
        iterations -= 1
        J_list.append(cost_func(X, Y, theta))
    return theta, J_list
def check_convergence(J_list):
    plt.plot(range(len(J_list)), J_list)
    plt.ylabel('Cost J')
    plt.show()
file_name_1 = ''
df1 = pd.read_csv(file_name_1, header=None)
X = df1.values[:, 0]
Y = df1.values[:, 1]
m = len(X)
X = np.column_stack((np.ones(m), X))
theta_optimal, J_list = gradient_descent(X, Y, 0.01, 1500)
My theta output is [-3.63029144 1.16636235], which is incorrect.
Here is my cost function graph (image not included). As you can see, it converges way too quickly.
The correct graph should look like this (image not included).
Thank you.
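One way to rule out a vectorization mistake is to compare the vectorized cost and gradient with plain loops over the training examples on random data. A minimal sketch, assuming the same 1-D Y and (m, 2) X layout as the question; grad_vec, cost_loop, and grad_loop are hypothetical helper names, and the data is random. Note that for a 1-D array .T is a no-op, so (H - Y).T.dot(H - Y) is simply the sum of squared errors.

```python
import numpy as np

def cost_func(X, Y, theta):
    # Vectorized cost, as in the question: (H - Y).T.dot(H - Y) / (2m)
    m = len(X)
    H = X.dot(theta)
    return (H - Y).T.dot(H - Y) / (2 * m)

def grad_vec(X, Y, theta):
    # Vectorized gradient: X^T (X theta - Y) / m
    m = len(X)
    return X.T.dot(X.dot(theta) - Y) / m

def cost_loop(X, Y, theta):
    # Reference: accumulate the squared error one example at a time
    m, n = X.shape
    total = 0.0
    for i in range(m):
        h_i = sum(X[i, j] * theta[j] for j in range(n))
        total += (h_i - Y[i]) ** 2
    return total / (2 * m)

def grad_loop(X, Y, theta):
    # Reference: accumulate each partial derivative explicitly
    m, n = X.shape
    g = np.zeros(n)
    for i in range(m):
        err = sum(X[i, j] * theta[j] for j in range(n)) - Y[i]
        for j in range(n):
            g[j] += err * X[i, j]
    return g / m

# Random stand-in data (hypothetical), shaped like the question's X and Y
rng = np.random.default_rng(1)
X = np.column_stack((np.ones(20), rng.uniform(5, 25, size=20)))
Y = rng.normal(size=20)
theta = np.array([-1.0, 2.0])

print(cost_func(X, Y, theta), cost_loop(X, Y, theta))
print(grad_vec(X, Y, theta), grad_loop(X, Y, theta))
```

If both pairs agree, the vectorized update is computing the intended quantities, and any remaining difference lies elsewhere (data, learning rate, or how the plot is read).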

How to vectorize Logistic Regression?

I'm trying to implement regularized logistic regression in Python for the Coursera ML class, but I'm having a lot of trouble vectorizing it. I'm using this repository as a reference.
I've tried many different ways, but I never get the correct gradient or cost. Here's my current implementation:
h = utils.sigmoid( np.dot(X, theta) )
J = (-1/m) * ( y.T.dot( np.log(h) ) + (1 - y.T).dot( np.log( 1 - h ) ) ) + ( lambda_/(2*m) ) * np.sum( np.square(theta[1:]) )
grad = ((1/m) * (h - y).T.dot( X )).T + grad_theta_reg
Here are the results:
Cost: 0.693147
Expected cost: 2.534819
Gradients: [-0.100000, -0.030000, -0.080000, -0.130000]
Expected gradients: [0.146561, -0.548558, 0.724722, 1.398003]
Any help from someone who knows whats going on would be much appreciated.
Below is a working snippet of a vectorized version of logistic regression. You can see more here.
import numpy as np
theta_t = np.array([[-2], [-1], [1], [2]])
data = np.arange(1, 16).reshape(3, 5).T
X_t = np.c_[np.ones((5,1)), data/10]
y_t = (np.array([[1], [0], [1], [0], [1]]) >= 0.5) * 1
lambda_t = 3
J, grad = lrCostFunction(theta_t, X_t, y_t, lambda_t), lrGradient(theta_t, X_t, y_t, lambda_t, flattenResult=False)
print('\nCost: %f\n' % J)
print('Expected cost: 2.534819\n')
print('Gradients:\n', grad)
print('Expected gradients:\n')
print(' 0.146561\n -0.548558\n 0.724722\n 1.398003\n')
from sigmoid import sigmoid
import numpy as np
def lrCostFunction(theta, X, y, reg_lambda):
    """LRCOSTFUNCTION Compute cost and gradient for logistic regression with
    regularization.
    J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
    theta as the parameter for regularized logistic regression and the
    gradient of the cost w.r.t. the parameters.
    """
    m, n = X.shape  # m: number of training examples, n: number of features
    theta = theta.reshape((n,1))
    prediction = sigmoid(np.dot(X, theta))
    cost_y_1 = (1 - y) * np.log(1 - prediction)
    cost_y_0 = -1 * y * np.log(prediction)
    J = (1.0/m) * np.sum(cost_y_0 - cost_y_1) + (reg_lambda/(2.0 * m)) * np.sum(np.power(theta[1:], 2))
    return J
from sigmoid import sigmoid
import numpy as np
def lrGradient(theta, X, y, reg_lambda, flattenResult=True):
    m, n = X.shape
    theta = theta.reshape((n,1))
    prediction = sigmoid(np.dot(X, theta))
    errors = np.subtract(prediction, y)
    grad = (1.0/m) * np.dot(X.T, errors)
    # regularize every parameter except the bias term theta[0]
    grad_with_regul = grad[1:] + (reg_lambda/m) * theta[1:]
    firstRow = grad[0, :].reshape((1,1))
    grad = np.r_[firstRow, grad_with_regul]
    if flattenResult:
        return grad.flatten()
    return grad
Hope that helped!
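For comparison, here is a self-contained sketch that returns the cost and gradient together and uses 1-D vectors throughout (theta of shape (n,), y of shape (m,)). This avoids mixing (m,) and (m, 1) arrays, a frequent source of wrong results when vectorizing. The sigmoid is defined inline instead of imported from the sigmoid module, and lr_cost_and_grad is a hypothetical name. The test case and expected values are the ones from the answer above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lr_cost_and_grad(theta, X, y, reg_lambda):
    # theta: (n,), X: (m, n), y: (m,); theta[0] (bias) is not regularized
    m = len(y)
    h = sigmoid(X @ theta)
    reg_cost = (reg_lambda / (2 * m)) * np.sum(theta[1:] ** 2)
    J = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m + reg_cost
    grad = X.T @ (h - y) / m
    grad[1:] += (reg_lambda / m) * theta[1:]
    return J, grad

theta_t = np.array([-2.0, -1.0, 1.0, 2.0])
X_t = np.c_[np.ones(5), np.arange(1, 16).reshape(3, 5).T / 10]
y_t = np.array([1.0, 0.0, 1.0, 0.0, 1.0])
J, grad = lr_cost_and_grad(theta_t, X_t, y_t, 3)
print(round(J, 6), np.round(grad, 6))
```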

Equivalent of `polyfit` for a 2D polynomial in Python

I'd like to find a least-squares solution for the a coefficients in
z = (a0 + a1*x + a2*y + a3*x**2 + a4*x**2*y + a5*x**2*y**2 + a6*y**2 +
a7*x*y**2 + a8*x*y)
given arrays x, y, and z of length 20. Basically I'm looking for the equivalent of numpy.polyfit but for a 2D polynomial.
This question is similar, but the solution is provided via MATLAB.
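Written in matrix form, this is an ordinary linear least-squares problem: the model is nonlinear in x and y but linear in the coefficients, so each term becomes one column of a design matrix A (column order following the equation above), with one row per data point. numpy.linalg.lstsq solves exactly this kind of problem.

```latex
% Row k of A for data point (x_k, y_k, z_k), k = 1, ..., N (here N = 20)
A_{k,:} = \begin{bmatrix} 1 & x_k & y_k & x_k^2 & x_k^2 y_k & x_k^2 y_k^2 & y_k^2 & x_k y_k^2 & x_k y_k \end{bmatrix},
\qquad
\mathbf{a} = \begin{bmatrix} a_0 & a_1 & \cdots & a_8 \end{bmatrix}^{\mathsf T}

\hat{\mathbf{a}} = \arg\min_{\mathbf{a}} \lVert A\mathbf{a} - \mathbf{z} \rVert_2^2,
\qquad
A^{\mathsf T} A\,\hat{\mathbf{a}} = A^{\mathsf T}\mathbf{z}
\quad \text{(normal equations, when } A \text{ has full column rank)}
```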
Here is an example showing how you can use numpy.linalg.lstsq for this task:
import numpy as np
x = np.linspace(0, 1, 20)
y = np.linspace(0, 1, 20)
X, Y = np.meshgrid(x, y, copy=False)
Z = X**2 + Y**2 + np.random.rand(*X.shape)*0.01
X = X.flatten()
Y = Y.flatten()
A = np.array([X*0+1, X, Y, X**2, X**2*Y, X**2*Y**2, Y**2, X*Y**2, X*Y]).T
B = Z.flatten()
coeff, r, rank, s = np.linalg.lstsq(A, B, rcond=None)
The fitted coefficients coeff are:
array([ 0.00423365, 0.00224748, 0.00193344, 0.9982576 , -0.00594063,
0.00834339, 0.99803901, -0.00536561, 0.00286598])
Note that coeff[3] and coeff[6] correspond to X**2 and Y**2, respectively, and they are close to 1 because the example data was created with Z = X**2 + Y**2 + small_random_component.
Based on the answers from @Saullo and @Francisco, I have made a function which I have found helpful:
def polyfit2d(x, y, z, kx=3, ky=3, order=None):
    '''
    Two dimensional polynomial fitting by least squares.
    Fits the functional form f(x,y) = z.

    Notes
    -----
    Resultant fit can be plotted with:
    np.polynomial.polynomial.polygrid2d(x, y, soln.reshape((kx+1, ky+1)))

    Parameters
    ----------
    x, y: array-like, 1d
        x and y coordinates.
    z: np.ndarray, 2d
        Surface to fit.
    kx, ky: int, default is 3
        Polynomial order in x and y, respectively.
    order: int or None, default is None
        If None, all coefficients up to maximum kx, ky, i.e. up to and including x^kx*y^ky, are considered.
        If int, coefficients up to a maximum of kx+ky <= order are considered.

    Returns
    -------
    Return parameters from np.linalg.lstsq.

    soln: np.ndarray
        Array of polynomial coefficients.
    residuals: np.ndarray
    rank: int
    s: np.ndarray
    '''
    # grid coords
    x, y = np.meshgrid(x, y)
    # coefficient array, up to x^kx, y^ky
    coeffs = np.ones((kx+1, ky+1))
    # solve array
    a = np.zeros((coeffs.size, x.size))
    # for each coefficient produce array x^i, y^j
    for index, (j, i) in enumerate(np.ndindex(coeffs.shape)):
        # do not include powers greater than order
        if order is not None and i + j > order:
            arr = np.zeros_like(x)
        else:
            arr = coeffs[i, j] * x**i * y**j
        a[index] = arr.ravel()
    # do leastsq fitting and return leastsq result
    return np.linalg.lstsq(a.T, np.ravel(z), rcond=None)
And the resultant fit can be visualised with:
fitted_surf = np.polynomial.polynomial.polyval2d(x, y, soln.reshape((kx+1,ky+1)))
Excellent answer by Saullo Castro. Just to add the code to reconstruct the function using the least-squares solution for the a coefficients,
def poly2Dreco(X, Y, c):
    return (c[0] + X*c[1] + Y*c[2] + X**2*c[3] + X**2*Y*c[4] + X**2*Y**2*c[5] +
            Y**2*c[6] + X*Y**2*c[7] + X*Y*c[8])
You can also use scikit-learn for this.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
x = np.linspace(0, 1, 20)
y = np.linspace(0, 1, 20)
X, Y = np.meshgrid(x, y, copy=False)
X = X.flatten()
Y = Y.flatten()
# Generate noisy data
Z = X**2 + Y**2 + np.random.randn(*X.shape)*0.01
# Process 2D inputs
poly = PolynomialFeatures(degree=2)
input_pts = np.stack([X, Y]).T
assert input_pts.shape == (400, 2)
in_features = poly.fit_transform(input_pts)
# Linear regression
model = LinearRegression().fit(in_features, Z)
# Display coefficients
print(dict(zip(poly.get_feature_names_out(), model.coef_.round(4))))
# Check fit
print(f"R-squared: {model.score(poly.transform(input_pts), Z):.3f}")
# Make predictions
Z_predicted = model.predict(poly.transform(input_pts))
{'1': 0.0, 'x0': 0.003, 'x1': -0.0074, 'x0^2': 0.9974, 'x0 x1': 0.0047, 'x1^2': 1.0014}
R-squared: 1.000
Note that if kx != ky, the code will fail because the j and i indices are inverted in the loop.
You get (j, i) from enumerate(np.ndindex(coeffs.shape)), but then you address elements in coeffs as coeffs[i, j]. Since the shape of the coefficient matrix is given by the maximum polynomial order you ask for in each variable, the matrix is rectangular when kx != ky, and the swapped indices will exceed one of its dimensions (raising an IndexError).
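A minimal sketch of the function with consistent indexing (i is the power of x, j the power of y), so that kx != ky works. This is a reworking, not the original author's code: the all-ones coeffs array is dropped, columns excluded by order are left at zero (lstsq's minimum-norm solution then leaves those coefficients at zero), and z is assumed to have shape (len(y), len(x)) as produced by np.meshgrid. The coefficients reshape to (kx+1, ky+1) so that coef[i, j] multiplies x**i * y**j, which is the layout np.polynomial.polynomial.polyval2d expects. The test surface is made up.

```python
import numpy as np

def polyfit2d(x, y, z, kx=3, ky=3, order=None):
    # Column index = i * (ky + 1) + j holds x**i * y**j
    X, Y = np.meshgrid(x, y)  # z is assumed to have shape (len(y), len(x))
    a = np.zeros(((kx + 1) * (ky + 1), X.size))
    for index, (i, j) in enumerate(np.ndindex(kx + 1, ky + 1)):
        if order is not None and i + j > order:
            continue  # leave this column at zero
        a[index] = (X**i * Y**j).ravel()
    return np.linalg.lstsq(a.T, np.ravel(z), rcond=None)

# Made-up surface that is exactly a polynomial with kx=2, ky=1
x = np.linspace(0, 1, 15)
y = np.linspace(0, 2, 10)
X, Y = np.meshgrid(x, y)
Z = 1 + 2 * X + 3 * X**2 * Y

soln, residuals, rank, s = polyfit2d(x, y, Z, kx=2, ky=1)
coef = soln.reshape((3, 2))  # coef[i, j] multiplies x**i * y**j
fitted = np.polynomial.polynomial.polyval2d(X, Y, coef)
print(np.round(coef, 6))
```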

