Softmax and Hinge functions in Python - python

I'm trying to implement the hinge loss function in Python and have run into some confusion.
Some sources I have read (for example, "Regression Analysis in Python" by Luca Massoron) state that hinge is sometimes called the softmax function.
That seems strange to me, because hinge is:
and softmax is just an exponential function like:
I implemented softmax in Python this way:
def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)
I have two questions:
Can I use that softmax function as an equivalent of the hinge function?
If not, how can hinge be implemented in Python?

Can I use that softmax function as an equivalent of the hinge function?
No, they are not equivalent.
Hinge is a loss function and does not provide well-calibrated probabilities, whereas softmax is a mapping function: it maps a set of scores to a distribution that sums to one.
If not, how can hinge be implemented in Python?
The following snippet captures the essence of the hinge loss function:
import numpy as np
import matplotlib.pyplot as plt
xmin, xmax = -1, 2
xx = np.linspace(xmin, xmax, 100)
plt.plot(xx, np.where(xx < 1, 1 - xx, 0), label="Hinge loss")
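The plot above draws the loss as a function of the margin. As a minimal sketch of how that could become a reusable function, assuming binary labels encoded as -1/+1 and raw classifier scores (the names `hinge_loss`, `y_true` and `scores` are illustrative, not from the original post):

```python
import numpy as np

def hinge_loss(y_true, scores):
    """Mean binary hinge loss; y_true is assumed to hold labels in {-1, +1}."""
    y_true = np.asarray(y_true, dtype=float)
    scores = np.asarray(scores, dtype=float)
    margins = y_true * scores                 # positive when the sign is right
    return np.mean(np.maximum(0.0, 1.0 - margins))

# Hypothetical labels and scores, only for illustration
y = np.array([1, -1, 1, -1])
f = np.array([2.0, -0.5, 0.3, 1.0])
print(hinge_loss(y, f))
```

A sample contributes zero loss once its margin reaches 1, which is the flat part of the plotted curve.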

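You can also implement softmax in pure Python :)
import numpy as np
import math
def sofyMax(data):
    # pure python
    # math:: $rezult(powe,sumColumn) = \dfrac{powe(data)}{sumColumn(powe(data))}$
    def powe(data):
        # element-wise exponential, filled column by column
        outp = [[] for _ in range(len(data))]
        for column in range(len(data[0])):
            r = 0
            for row in data:
                # loop body was missing in the post; reconstructed from context
                outp[r].append(math.exp(row[column]))
                r += 1
        return outp
    def sumColumn(data):
        # sum of every column
        outps = []
        for column in range(len(data[0])):
            total = 0
            for row in data:
                # loop body was missing in the post; reconstructed from context
                total += row[column]
            outps += [total]
        return outps
    def rezult(data, sumcolumn):
        # divide every entry by the sum of its column
        outp = [[] for _ in range(len(data))]
        l = 0
        for row in data:
            for c, s in zip(row, sumcolumn):
                outp[l] += [c/s]
            l += 1  # move on to the next output row
        return outp
    et1 = powe(data)
    et2 = sumColumn(et1)
    return rezult(et1, et2)
data = np.random.randn(10,5)
# compare with a tolerance; exact float equality can fail on rounding differences
np.allclose(np.exp(data)/np.sum(np.exp(data),axis=0), np.array(sofyMax(data)))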


Multiclass logistic regression from scratch

I’m trying to implement multiclass logistic regression from scratch. The dataset is MNIST.
I built some functions such as the hypothesis, sigmoid, cost function, cost function derivative, and gradient descent. My code is below.
I’m struggling with the following:
All images are labeled with the digit they represent, so there are 10 classes in total.
Inside the gradient descent function, I need to loop through each class, but I do not know how to do that using the One-vs-All method.
In other words, what I need to know is:
How to filter each class inside the gradient descent function.
After that, how to build a function to predict the test set.
Here is my code.
import numpy as np
import pandas as pd
# Only training data set
# the test data will be load later.
url='' + url.split('/')[-2]
df = pd.read_csv(url,header = None)
X = df.values[:, 0:-1]
y = df.values[:, -1]
m = np.size(X, 0)
y = np.array(y).reshape(m, 1)
X = np.c_[ np.ones(m), X ] # Bias
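def hypothesis(X, thetas):
    # argument was lost in the post; X.dot(thetas) is reconstructed from context
    return sigmoid(X.dot(thetas)) #- 0.0000001
def sigmoid(z):
    return 1/(1+np.exp(-z))
def losscost(X, y, m, thetas):
    h = hypothesis(X, thetas)
    # first term was lost in the post; reconstructed to mirror the second term
    return -(1/m) * ( y.dot(np.log(h)) + (1-y).dot(np.log(1-h)) )
def derivativelosscost(X, y, m, thetas):
    h = hypothesis(X, thetas)
    return (h-y).dot(X)/m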
def descendinggradient(X, y, m, epoch, alpha, thetas):
    n = np.size(X, 1)
    J_historico = []
    for i in range(epoch):
        for j in range(0,10): # 10 classes
            # How to filter each class inside here (inside this def descendinggradient)?
            # 2 lines below are wrong.
            #thetas = thetas - alpha * derivativelosscost(X, y, m, thetas)
            #J_historico = J_historico + [losscost(X, y, m, thetas)]
            pass  # placeholder so the loop body is valid Python
    return [thetas, J_historico]
alpha = 0.01
epoch = 50
(thetas, J_historico) = descendinggradient(X, y, m, epoch, alpha)
# After that, how to build a function to predict the test set.
Let me explain this problem step by step.
First, since your code doesn't provide the actual data or a link to it, I've created a random dataset, followed by the same commands you used to create X and y:
batch_size = 20
num_classes = 10
rng = np.random.default_rng(seed=42)
df = pd.DataFrame(
    4 * rng.random((batch_size, num_classes + 1)) - 2,  # Create random array between -2 and 2
    columns=['X0','X1','X2','X3','X4','X5','X6','X7','X8', 'X9','Y']
)
X = df.values[:, 0:-1]
y = df.values[:, -1]
m = np.size(X, 0)
y = np.array(y).reshape(m, 1)
X = np.c_[ np.ones(m), X ] # Bias
Next, let's take a look at your hypothesis function. If we just run hypothesis and look at the first sample, we get a vector with 10 entries, one per class. I also needed to provide initial thetas for this case:
thetas = rng.random((X.shape[1],num_classes))
h = hypothesis(X, thetas)
>>>[0.89701729 0.90050806 0.98358408 0.81786334 0.96636732 0.97819512
0.89118488 0.87238045 0.70612173 0.30256924]
Basically the function calculates a "probability"[1] for each class.
At this point we reach the first issue in your code. The sigmoid function returns "probabilities" that are not "connected" to each other. To relate those "probabilities" to each other we need another function: softmax. You will find plenty of implementations of this function. In short: it calculates the "probabilities" based on the sigmoid output, so that the class "probabilities" sum to 1.
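One possible row-wise softmax is sketched below, assuming the scores arrive as a 2D array with one row per sample and one column per class. The array `demo_scores` is made up for illustration.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax: every row of the result sums to 1."""
    scores = np.asarray(scores, dtype=float)
    # subtracting the row maximum leaves the result unchanged but avoids overflow in exp
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Hypothetical scores for 2 samples and 3 classes
demo_scores = np.array([[1.0, 2.0, 3.0],
                        [0.5, 0.5, 0.5]])
p = softmax(demo_scores)
print(p)
print(p.sum(axis=1))  # each row sums to 1
```

Equal scores give equal "probabilities", and the largest score always keeps the largest share, so taking the argmax of the result picks the same class as taking the argmax of the scores.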
So for your second question, "how to implement a predict function after training", we only need to take the argmax to determine the class:
h = hypothesis(X, thetas)
p = softmax(h) # needs to be implemented
prediction = np.argmax(p, axis=1)
>>>[2 5 5 8 3 5 2 1 3 5 2 3 8 3 3 9 5 1 1 8]
Now that we know how to predict a class, we also need to know where to hook in the training. We want to do this directly after the softmax function, but instead of using the argmax to determine the winning class, we use the cost function and its derivative. The problem in your code: you used the cross-entropy loss for a binary problem. A binary problem also doesn't need the softmax function, because the sigmoid function already ties the two classes together. Since we are not interested in the value of the multiclass cross-entropy loss itself, only in its derivative, we want to calculate the derivative directly.
The step from binary cross-entropy to multiclass is rather unintuitive at first sight. I recommend reading a bit about it before implementing it. After this you basically use your line:
thetas = thetas - alpha * derivativelosscost(X, y, m, thetas)
for updating the thetas.
[1] These are not actual probabilities, but that is a completely different topic.
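As a rough sketch of what such a training loop could look like, here is a softmax regression trained with plain gradient descent. It is an illustration under stated assumptions, not the code from the question: the labels are integers 0..K-1, the data is a small synthetic set, and `one_hot`, `train` and the hyperparameters are hypothetical choices.

```python
import numpy as np

def softmax(scores):
    shifted = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def one_hot(labels, num_classes):
    """Turn integer labels of shape (m,) into an (m, num_classes) 0/1 matrix."""
    encoded = np.zeros((labels.size, num_classes))
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

def train(X, labels, num_classes, alpha=0.05, epochs=200):
    m, n = X.shape
    thetas = np.zeros((n, num_classes))      # one column of weights per class
    Y = one_hot(labels, num_classes)
    history = []
    for _ in range(epochs):
        P = softmax(X @ thetas)
        # multiclass cross-entropy, recorded only to monitor progress
        history.append(-np.mean(np.sum(Y * np.log(P + 1e-12), axis=1)))
        # gradient of the cross-entropy with respect to thetas
        thetas -= alpha * (X.T @ (P - Y) / m)
    return thetas, history

# Synthetic 3-class data, made up for this example
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=60)
centers = np.array([[0.0, 4.0], [4.0, 0.0], [-4.0, -4.0]])
X = np.c_[np.ones(60), centers[labels] + rng.normal(size=(60, 2))]  # bias column first
thetas, history = train(X, labels, num_classes=3)
accuracy = np.mean(np.argmax(X @ thetas, axis=1) == labels)
print(history[0], history[-1], accuracy)
```

The one-hot matrix plays the role of "filtering each class": column j of `P - Y` is the error for class j, so all classes are updated in one matrix product instead of an explicit loop over j.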

Why does my sigmoid function return values not in the interval ]0,1[?

I am implementing logistic regression in Python with numpy. I have generated the following data set:
# class 0:
# covariance matrix and mean
cov0 = np.array([[5,-4],[-4,4]])
mean0 = np.array([2.,3])
# number of data points
m0 = 1000
# class 1
# covariance matrix
cov1 = np.array([[5,-3],[-3,3]])
mean1 = np.array([1.,1])
# number of data points
m1 = 1000
# generate m gaussian distributed data points with
# mean and cov.
r0 = np.random.multivariate_normal(mean0, cov0, m0)
r1 = np.random.multivariate_normal(mean1, cov1, m1)
X = np.concatenate((r0,r1))
Now I have implemented the sigmoid function with the aid of the following methods:
def logistic_function(x):
    """ Applies the logistic function to x, element-wise. """
    return 1.0 / (1 + np.exp(-x))
def logistic_hypothesis(theta):
    # first argument was lost in the post; reconstructed from context
    return lambda x : logistic_function(np.dot(generateNewX(x), theta.T))
def generateNewX(x):
    x = np.insert(x, 0, 1, axis=1)
    return x
After applying logistic regression, I found out that the best thetas are:
best_thetas = [-0.9673200946417307, -1.955812236119612, -5.060885703369424]
However, when I apply the logistic function with these thetas, the output contains numbers that do not seem to lie inside the interval [0,1]:
data = logistic_hypothesis(np.asarray(best_thetas))(X)
This gives the following result:
[2.67871968e-11 3.19858822e-09 3.77845881e-09 ... 5.61325410e-03
2.19767618e-01 6.23288747e-01]
Can someone help me understand what has gone wrong with my implementation? I cannot understand why I am getting such big values. Isn't the sigmoid function supposed to only give results in the [0,1] interval?
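It does; the values are just printed in scientific notation. For example, 2.67871968e-11 means 2.67871968 × 10^-11, which is a tiny positive number very close to 0.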
'e' Exponent notation. Prints the number in scientific notation using
the letter ‘e’ to indicate the exponent.
>>> a = [2.67871968e-11, 3.19858822e-09, 3.77845881e-09, 5.61325410e-03]
>>> [0 <= i <= 1 for i in a]
[True, True, True, True]
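If the exponent notation is the only obstacle, NumPy can be asked to print fixed-point numbers instead. A small sketch, reusing a few of the values from the question:

```python
import numpy as np

values = np.array([2.67871968e-11, 3.19858822e-09, 5.61325410e-03, 6.23288747e-01])

print(values)                       # default: scientific notation
with np.printoptions(suppress=True, precision=8):
    print(values)                   # fixed-point notation, tiny values show as 0.
in_range = bool(np.all((values >= 0) & (values <= 1)))
print(in_range)
```

Only the display changes; the stored values are the same in both cases.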

Efficient way to implement simple filter with varying coefficients in Python/Numpy

I am looking for an efficient way to implement a simple filter with one coefficient that is time-varying and specified by a vector with the same length as the input signal.
The following is a simple implementation of the desired behavior:
def myfilter(signal, weights):
    output = np.empty_like(weights)
    val = signal[0]
    for i in range(len(signal)):
        val += weights[i]*(signal[i] - val)
        output[i] = val
    return output
weights = np.random.uniform(0, 0.1, (100,))
signal = np.linspace(1, 3, 100)
output = myfilter(signal, weights)
Is there a way to do this more efficiently with numpy or scipy?
You can trade in the overhead of the loop for a couple of additional ops:
import numpy as np
def myfilter(signal, weights):
    output = np.empty_like(weights)
    val = signal[0]
    for i in range(len(signal)):
        val += weights[i]*(signal[i] - val)
        output[i] = val
    return output
def vectorised(signal, weights):
    wp = np.r_[1, np.multiply.accumulate(1 - weights[1:])]
    sw = weights * signal
    sw[0] = signal[0]
    sws = np.add.accumulate(sw / wp)
    return wp * sws
weights = np.random.uniform(0, 0.1, (100,))
signal = np.linspace(1, 3, 100)
print(np.allclose(myfilter(signal, weights), vectorised(signal, weights)))
On my machine the vectorised version is several times faster. It uses a "closed form" solution of your recurrence equation.
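For readers who want to see where the closed form comes from, here is a short derivation. The symbols are introduced here for illustration: v_i is the filter output, s_i the signal, w_i the weights, and P_i the running product that the code stores in wp.

```latex
% Recurrence implemented by the loop (the first step leaves v_0 = s_0):
v_0 = s_0, \qquad v_i = (1 - w_i)\, v_{i-1} + w_i s_i \quad (i \ge 1)

% Running product, with P_0 = 1:
P_i = \prod_{k=1}^{i} (1 - w_k)

% Dividing the recurrence by P_i turns it into a plain cumulative sum:
\frac{v_i}{P_i} = \frac{v_{i-1}}{P_{i-1}} + \frac{w_i s_i}{P_i}

% Summing from 1 to i gives the closed form:
v_i = P_i \left( s_0 + \sum_{k=1}^{i} \frac{w_k s_k}{P_k} \right)
```

In the code, wp holds P_i, sw / wp holds the summands (with the first entry replaced by s_0), and the cumulative sum multiplied by wp gives v_i.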
Edit: For very long signals and weight vectors (100,000 samples, say) this method doesn't work because of overflow: the running product wp shrinks toward zero, so sw / wp blows up. In that regime you can still save a bit (more than 50% on my machine) using the following trick, which has the added bonus that you needn't solve the recurrence formula, only invert it.
from scipy import linalg
def solver(signal, weights):
    rw = 1 / weights[1:]
    v = np.r_[1, rw, 1-rw, 0]
    v.shape = 2, -1
    return linalg.solve_banded((1, 0), v, signal)
This trick uses the fact that your recurrence is formally similar to a Gauss elimination on a matrix with only one nonvanishing subdiagonal. It piggybacks on a library function that specialises in doing precisely that.
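To make the linear-system view concrete, the sketch below builds the same matrix densely with NumPy only and checks it against the loop. It is meant purely as an illustration for short signals, since a dense solve is far slower than the banded solver above. The helper name `dense_solver` is made up here.

```python
import numpy as np

def myfilter(signal, weights):
    output = np.empty_like(weights)
    val = signal[0]
    for i in range(len(signal)):
        val += weights[i] * (signal[i] - val)
        output[i] = val
    return output

def dense_solver(signal, weights):
    """Solve the recurrence as a lower-bidiagonal linear system A v = signal."""
    n = len(signal)
    rw = 1.0 / weights[1:]
    A = np.zeros((n, n))
    A[0, 0] = 1.0                 # first row: v_0 = s_0
    idx = np.arange(1, n)
    A[idx, idx] = rw              # main diagonal: 1 / w_i
    A[idx, idx - 1] = 1.0 - rw    # subdiagonal: 1 - 1 / w_i
    return np.linalg.solve(A, signal)

rng = np.random.default_rng(0)
weights = rng.uniform(0.01, 0.1, 50)
signal = np.linspace(1, 3, 50)
same = np.allclose(myfilter(signal, weights), dense_solver(signal, weights))
print(same)
```

Row i of the system reads (1 - 1/w_i) v_{i-1} + (1/w_i) v_i = s_i, which is the original update rule divided by w_i.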
Actually, quite proud of this one.

Precision, recall, F1 score equal with sklearn

I'm trying to compare different distance calculating methods and different voting systems in k-nearest neighbours algorithm. Currently my problem is that no matter what I do precision_recall_fscore_support method from scikit-learn yields exactly the same results for precision, recall and fscore. Why is that? I've tried it on different datasets (iris, glass and wine). What am I doing wrong? The code so far:
#!/usr/bin/env python3
from collections import Counter
from data_loader import DataLoader
from sklearn.metrics import precision_recall_fscore_support as pr
import random
import math
import ipdb
def euclidean_distance(x, y):
    return math.sqrt(sum([math.pow((a - b), 2) for a, b in zip(x, y)]))
def manhattan_distance(x, y):
    # abs() has to be applied per element, not to the whole list
    return sum([abs(a - b) for a, b in zip(x, y)])
def get_neighbours(training_set, test_instance, k):
    names = [instance[4] for instance in training_set]
    training_set = [instance[0:4] for instance in training_set]
    distances = [euclidean_distance(test_instance, training_set_instance) for training_set_instance in training_set]
    distances = list(zip(distances, names))
    print(list(filter(lambda x: x[0] == 0.0, distances)))
    # sorted() returns a new list; sort in place so the slice takes the k nearest
    distances.sort(key=lambda x: x[0])
    return distances[:k]
def plurality_voting(nearest_neighbours):
    classes = [nearest_neighbour[1] for nearest_neighbour in nearest_neighbours]
    count = Counter(classes)
    return count.most_common()[0][0]
def weighted_distance_voting(nearest_neighbours):
    distances = [(1/nearest_neighbour[0], nearest_neighbour[1]) for nearest_neighbour in nearest_neighbours]
    index = distances.index(min(distances))
    return nearest_neighbours[index][1]
def weighted_distance_squared_voting(nearest_neighbours):
    # parentheses added: 1 / x[0]*x[0] evaluates to 1
    distances = list(map(lambda x: 1 / (x[0]*x[0]), nearest_neighbours))
    index = distances.index(min(distances))
    return nearest_neighbours[index][1]
def main():
    data = DataLoader.load_arff("datasets/iris.arff")
    dataset = data["data"]
    # random.seed(42)
    train = dataset[:100]
    test = dataset[100:150]
    classes = [instance[4] for instance in test]
    predictions = []
    for test_instance in test:
        prediction = weighted_distance_voting(get_neighbours(train, test_instance[0:4], 15))
        predictions.append(prediction)
    print(pr(classes, predictions, average="micro"))
if __name__ == "__main__":
    main()
The problem is that you're using the 'micro' average.
As stated here:
As is written in the documentation: "Note that for “micro”-averaging
in a multiclass setting will produce equal precision, recall and
F, while “weighted” averaging may produce an F-score that is
not between precision and recall."
But if you drop a majority label, using the labels parameter, then
micro-averaging differs from accuracy, and precision differs from
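The effect can be checked by hand without scikit-learn. The following sketch uses made-up labels and a hypothetical helper `micro_macro`. It counts true positives, false positives and false negatives per class and compares micro with macro averaging:

```python
from collections import Counter

def micro_macro(y_true, y_pred):
    """Return (micro_p, micro_r, macro_p, macro_r) for single-label multiclass data."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # class p was predicted wrongly
            fn[t] += 1   # class t was missed
    total_tp = sum(tp.values())
    micro_p = total_tp / (total_tp + sum(fp.values()))
    micro_r = total_tp / (total_tp + sum(fn.values()))

    def ratio(a, b):
        return a / b if b else 0.0

    macro_p = sum(ratio(tp[c], tp[c] + fp[c]) for c in labels) / len(labels)
    macro_r = sum(ratio(tp[c], tp[c] + fn[c]) for c in labels) / len(labels)
    return micro_p, micro_r, macro_p, macro_r

# Hypothetical labels, only for illustration
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "c", "c"]
micro_p, micro_r, macro_p, macro_r = micro_macro(y_true, y_pred)
print(micro_p, micro_r)   # identical, and equal to the accuracy
print(macro_p, macro_r)   # differ from each other
```

Every wrong prediction counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so the summed counts are always equal and micro precision equals micro recall. Per-class averaging such as `average="macro"` does not have this property.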

Python lmfit: Fitting a 2D Model

I'm trying to fit a 2D-Gaussian to some greyscale image data, which is given by one 2D array.
The lmfit library implements an easy-to-use Model class that should be capable of doing this.
Unfortunately the documentation only provides examples for 1D fitting. For my case I simply construct the lmfit Model with 2 independent variables.
The following code seems valid to me, but causes scipy to throw a "minpack.error: Result from function call is not a proper array of floats."
To sum it up: how do I input 2D (x1,x2)->(y) data to an lmfit Model?
Here is my approach:
Everything is packed in a GaussianFit2D class, but here are the important parts:
That's the Gaussian function. The documentation says about user-defined functions:
Of course, the model function will have to return an array that will be the same size as the data being modeled. Generally this is handled by also specifying one or more independent variables.
I don't really get what this should mean, since for given values x1,x2 the only reasonable result is a scalar value.
def _function(self, x1, x2, amp, wid, cen1, cen2):
    val = (amp/(np.sqrt(2*np.pi)*wid)) * np.exp(-((x1-cen1)**2+(x2-cen2)**2)/(2*wid**2))
    return val
Here the model is generated:
def _buildModel(self, **kwargs):
    model = lmfit.Model(self._function, independent_vars=["x1", "x2"],
                        param_names=["amp", "wid", "cen1", "cen2"])
    return model
That's the function that takes the data, builds the model and params and calls lmfit fit():
def fit(self, data, freeX, **kwargs):
    freeX = np.asarray(freeX, float)
    model = self._buildModel(**kwargs)
    params = self._generateModelParams(model, **kwargs)
    # call was partly lost in the post; model.fit(data, ...) reconstructed from context
    model.fit(data, x1=freeX[0], x2=freeX[1], params=params)
And finally, here is where this fit function gets called:
data = np.asarray(img, float)
gaussFit = GaussianFit2D()
x1 = np.arange(len(img[0, :]))
x2 = np.arange(len(img[:, 0]))
fit = gaussFit.fit(data, [x1, x2])
OK, I wrote to the developers and got the answer from them (thanks to Matt here).
The basic idea is to flatten all the input to 1D data, hiding from lmfit the >1 dimensional input.
Here's how you do it.
Modify your function:
def function(self, x1, x2):
    return (x1+x2).flatten()
Flatten your 2D input array you want to fit to:
data = data.flatten()
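Modify the two 1D x-variables so that together they contain every combination of their values:
x1n = []
x2n = []
for i in x1:
    for j in x2:
        # loop body was missing in the post; reconstructed from context
        x1n.append(i)
        x2n.append(j)
x1n = np.asarray(x1n)
x2n = np.asarray(x2n)
And throw everything into the fitter:
model.fit(data, x1=x1n, x2=x2n, params=params)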
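The same coordinate expansion can be written with np.meshgrid, which makes it easier to keep the coordinates in the same order as the flattened image. The sketch below uses NumPy only and no lmfit call; `gaussian2d`, the grid size and the parameter values are hypothetical.

```python
import numpy as np

def gaussian2d(x1, x2, amp, wid, cen1, cen2):
    """Isotropic 2D Gaussian evaluated element-wise on x1, x2."""
    return amp * np.exp(-((x1 - cen1) ** 2 + (x2 - cen2) ** 2) / (2 * wid ** 2))

x1 = np.arange(6)                 # column coordinates (image width)
x2 = np.arange(4)                 # row coordinates (image height)
X1, X2 = np.meshgrid(x1, x2)      # both have shape (4, 6), like the image

# Synthetic "image" standing in for the measured data
image = gaussian2d(X1, X2, amp=2.0, wid=1.5, cen1=3.0, cen2=1.0)

# Flatten coordinates and data in the same (row-major) order
x1n, x2n = X1.ravel(), X2.ravel()
data = image.ravel()

# A model evaluated on the flattened coordinates lines up with the flattened data
model_flat = gaussian2d(x1n, x2n, 2.0, 1.5, 3.0, 1.0)
print(model_flat.shape, data.shape)
print(np.allclose(model_flat.reshape(image.shape), image))
```

Whatever expansion is used, the flattened coordinate arrays must follow the same ordering as the flattened data, otherwise each pixel is paired with the wrong (x1, x2).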
Here is an example for your reference; I hope it helps.
import numpy
from lmfit import Model
def gaussian(x, cenu, cenv, wid):
    # x holds one (u, v) coordinate pair per row
    u = x[:, 0]
    v = x[:, 1]
    return (1/(2*numpy.pi*wid**2)) * numpy.exp(-(u-cenu)**2 / (2*wid**2)-(v-cenv)**2 / (2*wid**2))
data = numpy.empty((25,3))
x = numpy.arange(-2,3,1)
y = numpy.arange(-2,3,1)
xx, yy = numpy.meshgrid(x, y)
data[:,0] = xx.flatten()
data[:,1] = yy.flatten()
data[:, 2]= gaussian(data[:,0:2],0,0,0.5)
print('xx\n', xx)
print('yy\n', yy)
print('data to be fit\n', data[:, 2])
cu = 0.9
cv = 0.5
wid = 1
gmod = Model(gaussian)
gmod.set_param_hint('cenu', value=cu, min=cu-2, max=cu+2)
gmod.set_param_hint('cenv', value=cv, min=cv -2, max=cv+2)
gmod.set_param_hint('wid', value=wid, min=0.1, max=5)
params = gmod.make_params()
# call was partly lost in the post; gmod.fit(...) reconstructed from context
result = gmod.fit(data[:, 2], x=data[:, 0:2], params=params)
print(result.fit_report(min_correl=0.25))
print(result.best_values)
print(result.best_fit)

