Translating Matlab # into Python code (RuntimeWarning: invalid value encountered in true_divide ) - python

I'm translating a Matlab code to python and I got stuck here.
I think that with # it's creating a local variable right?
I tried to traduce the code with nested function but it seems to have a problem in hanfling one of the variable.
#Code's variable:
#LL0 = first tentative value
#Sad and f = returns of psd estimation via welch's method
#Sad = is the psd of discrete time-series
#f = is an array of the sampled frequencies
#u_mean = is a scalar
% Matlab Code
f = #(LL, Sad, n, U_media)sum((Sad - 4 * n * LL / U_media .* (1 + 70.8 * (n * LL / U_media).^2).^(-5/6)).^2);
fun = #(LL)f(LL, Sad, n, U_media);
LL = fminsearch(fun, LL0);
def f1(LL, Sad, n, u_mean):
a = sum((Sad - 4 * n * LL / u_mean * (1 + 70.8 * (n * LL / u_mean)**2)**(-5/6))**2)
return(a)
f2 = lambda LL, Sad, u_mean, f: f1(LL, Sad, n, u_mean)
fun = lambda LL: f2(LL, Sad, n, u_mean)
LL = scipy.optimize.fmin(func=fun, x0=LL0, maxfun=100000, xtol=1e-6, maxiter=10000, disp=True)
The code is running but the iterations give me this output
RuntimeWarning: invalid value encountered in true_divide
sum((S_adim - 4 * f * LL / u_mean * (1 + 70.8 * (f * LL / u_mean**2)**(-5/6)))**2)
RuntimeWarning: invalid value encountered in reduce
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
Warning: Maximum number of iterations has been exceeded.
The value I get in return is the same I used as tentative value
Can anyone help me? Thanks in advance

finally found the solution; i'm not so sure why now it's working think i was assigning the same variable twice. Anyway this is the solution I've come to:
def f(LL, Sad, n, u_mean):
f = sum((Sad - (4 * n * LL / u_mean) *
((1 + 70.8 * ((n * LL / u_mean) ** (2)))**(-5 / 6)))**2)
return(f)
fun = lambda LL: f(LL, Sad, n, u_mean)
res = scipy.optimize.minimize(fun=fun, x0=LL0)
Lux = res.x
Btw, spyder idle keeps saying that I shouldn't use lambda function but instead define a function.. can anyone tell me why? I was able to translate the "fun" into a function; I've tried with:
def fun(LL):
f(LL, Sad, n, u_mean)
return(f)
res = scipy.optimize.minimize(fun=fun, x0=LL0)
but it doesn't work. Would be nice if anybody can tell me why.
Thanks for your help

Related

Multiprocessing python function for numerical calculations

Hoping to get some help here with parallelising my python code, I've been struggling with it for a while and come up with several errors in whichever way I try, currently running the code will take about 2-3 hours to complete, The code is given below;
import numpy as np
from scipy.constants import Boltzmann, elementary_charge as kb, e
import multiprocessing
from functools import partial
Tc = 9.2
x = []
g= []
def Delta(T):
'''
Delta(T) takes a temperature as an input and calculates a
temperature dependent variable based on Tc which is defined as a
global parameter
'''
d0 = (pi/1.78)*kb*Tc
D0 = d0*(np.sqrt(1-(T**2/Tc**2)))
return D0
def element_in_sum(T, n, phi):
D = Delta(T)
matsubara_frequency = (np.pi * kb * T) * (2*n + 1)
factor_d = np.sqrt((D**2 * cos(phi/2)**2) + matsubara_frequency**2)
element = ((2 * D * np.cos(phi/2))/ factor_d) * np.arctan((D * np.sin(phi/2))/factor_d)
return element
def sum_elements(T, M, phi):
'''
sum_elements(T,M,phi) is the most computationally heavy part
of the calculations, the larger the M value the more accurate the
results are.
T: temperature
M: number of steps for matrix calculation the larger the more accurate the calculation
phi: The phase of the system can be between 0- pi
'''
X = list(np.arange(0,M,1))
Y = [element_in_sum(T, n, phi) for n in X]
return sum(Y)
def KO_1(M, T, phi):
Iko1Rn = (2 * np.pi * kb * T /e) * sum_elements(T, M, phi)
return Iko1Rn
def main():
for j in range(1, 92):
T = 0.1*j
for i in range(1, 314):
phi = 0.01*i
pool = multiprocessing.Pool()
result = pool.apply_async(KO_1,args=(26000, T, phi,))
g.append(result)
pool.close()
pool.join()
A = max(g);
x.append(A)
del g[:]
My approach was to try and send the KO1 function into a multiprocessing pool but I either get a Pickling error or a too many files open, Any help is greatly appreciated, and if multiprocessing is the wrong approach I would love any guide.
I haven't tested your code, but you can do several things to improve it.
First of all, don't create arrays unnecessarily. sum_elements creates three array-like objects when it can use just one generator. First, np.arange creates a numpy array, then the list function creates a list object and and then the list comprehension creates another list. The function does 4 times the work it should.
The correct way to implement it (in python3) would be:
def sum_elements(T, M, phi):
return sum(element_in_sum(T, n, phi) for n in range(0, M, 1))
If you use python2, replace range with xrange.
This tip will probably help you in any python script you'll write.
Also, try to utilize multiprocessing better. It seems what you need to do is to create a multiprocessing.Pool object once, and use the pool.map function.
The main function should look like this:
def job(args):
i, j = args
T = 0.1*j
phi = 0.01*i
return K0_1(26000, T, phi)
def main():
pool = multiprocessing.Pool(processes=4) # You can change this number
x = [max(pool.imap(job, ((i, j) for i in range(1, 314)) for j in range(1, 92)]
Notice that I used a tuple in order to pass multiple arguments to job.
This is not an answer to the question, but if I may, I would propose how to speed up the code using simple numpy array operations. Have a look at the following code:
import numpy as np
from scipy.constants import Boltzmann, elementary_charge as kb, e
import time
Tc = 9.2
RAM = 4*1024**2 # 4GB
def Delta(T):
'''
Delta(T) takes a temperature as an input and calculates a
temperature dependent variable based on Tc which is defined as a
global parameter
'''
d0 = (np.pi/1.78)*kb*Tc
D0 = d0*(np.sqrt(1-(T**2/Tc**2)))
return D0
def element_in_sum(T, n, phi):
D = Delta(T)
matsubara_frequency = (np.pi * kb * T) * (2*n + 1)
factor_d = np.sqrt((D**2 * np.cos(phi/2)**2) + matsubara_frequency**2)
element = ((2 * D * np.cos(phi/2))/ factor_d) * np.arctan((D * np.sin(phi/2))/factor_d)
return element
def KO_1(M, T, phi):
X = np.arange(M)[:,np.newaxis,np.newaxis]
sizeX = int((float(RAM) / sum(T.shape))/sum(phi.shape)/8) #8byte
i0 = 0
Iko1Rn = 0. * T * phi
while (i0+sizeX) <= M:
print "X = %i"%i0
indices = slice(i0, i0+sizeX)
Iko1Rn += (2 * np.pi * kb * T /e) * element_in_sum(T, X[indices], phi).sum(0)
i0 += sizeX
return Iko1Rn
def main():
T = np.arange(0.1,9.2,0.1)[:,np.newaxis]
phi = np.linspace(0,np.pi, 361)
M = 26000
result = KO_1(M, T, phi)
return result, result.max()
T0 = time.time()
r, rmax = main()
print time.time() - T0
It runs a bit more than 20sec on my PC. One has to be careful not to use too much memory, that is why there is still a loop with a bit complicated construction to use only pieces of X. If enough memory is present, then it is not necessary.
One should also note that this is just the first step of speeding up. Much improvement could be reached still using e.g. just in time compilation or cython.

ln (Natural Log) in Python

In this assignment I have completed all the problems except this one. I have to create a python script to solve an equation (screenshot).
Unfortunately, in my research all over the internet I cannot figure out how in the world to either convert ln to log or anything usable, or anything. The code I have written so far is below. I will also post the answer that our teacher says we should get.
import math
p = 100
r = 0.06 / 12
FV = 4000
n = str(ln * ((1 + (FV * r) / p) / (ln * (1 + r))))
print ("Number of periods = " + str(n))
The answer I should get is 36.55539635919235
Any advice or help you have would be greatly appreciated!
Also, we are not using numpy. I already attempted that one.
Thanks!
math.log is the natural logarithm:
From the documentation:
math.log(x[, base]) With one argument, return the natural logarithm of
x (to base e).
Your equation is therefore:
n = math.log((1 + (FV * r) / p) / math.log(1 + r)))
Note that in your code you convert n to a str twice which is unnecessary
Here is the correct implementation using numpy (np.log() is the natural logarithm)
import numpy as np
p = 100
r = 0.06 / 12
FV = 4000
n = np.log(1 + FV * r/ p) / np.log(1 + r)
print ("Number of periods = " + str(n))
Output:
Number of periods = 36.55539635919235

python divide by zero encountered in log - logistic regression

I'm trying to implement a multiclass logistic regression classifier that distinguishes between k different classes.
This is my code.
import numpy as np
from scipy.special import expit
def cost(X,y,theta,regTerm):
(m,n) = X.shape
J = (np.dot(-(y.T),np.log(expit(np.dot(X,theta))))-np.dot((np.ones((m,1))-y).T,np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1))))) / m + (regTerm / (2 * m)) * np.linalg.norm(theta[1:])
return J
def gradient(X,y,theta,regTerm):
(m,n) = X.shape
grad = np.dot(((expit(np.dot(X,theta))).reshape(m,1) - y).T,X)/m + (np.concatenate(([0],theta[1:].T),axis=0)).reshape(1,n)
return np.asarray(grad)
def train(X,y,regTerm,learnRate,epsilon,k):
(m,n) = X.shape
theta = np.zeros((k,n))
for i in range(0,k):
previousCost = 0;
currentCost = cost(X,y,theta[i,:],regTerm)
while(np.abs(currentCost-previousCost) > epsilon):
print(theta[i,:])
theta[i,:] = theta[i,:] - learnRate*gradient(X,y,theta[i,:],regTerm)
print(theta[i,:])
previousCost = currentCost
currentCost = cost(X,y,theta[i,:],regTerm)
return theta
trX = np.load('trX.npy')
trY = np.load('trY.npy')
theta = train(trX,trY,2,0.1,0.1,4)
I can verify that cost and gradient are returning values that are in the right dimension (cost returns a scalar, and gradient returns a 1 by n row vector), but i get the error
RuntimeWarning: divide by zero encountered in log
J = (np.dot(-(y.T),np.log(expit(np.dot(X,theta))))-np.dot((np.ones((m,1))-y).T,np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1))))) / m + (regTerm / (2 * m)) * np.linalg.norm(theta[1:])
why is this happening and how can i avoid this?
The proper solution here is to add some small epsilon to the argument of log function. What worked for me was
epsilon = 1e-5
def cost(X, y, theta):
m = X.shape[0]
yp = expit(X # theta)
cost = - np.average(y * np.log(yp + epsilon) + (1 - y) * np.log(1 - yp + epsilon))
return cost
You can clean up the formula by appropriately using broadcasting, the operator * for dot products of vectors, and the operator # for matrix multiplication — and breaking it up as suggested in the comments.
Here is your cost function:
def cost(X, y, theta, regTerm):
m = X.shape[0] # or y.shape, or even p.shape after the next line, number of training set
p = expit(X # theta)
log_loss = -np.average(y*np.log(p) + (1-y)*np.log(1-p))
J = log_loss + regTerm * np.linalg.norm(theta[1:]) / (2*m)
return J
You can clean up your gradient function along the same lines.
By the way, are you sure you want np.linalg.norm(theta[1:]). If you're trying to do L2-regularization, the term should be np.linalg.norm(theta[1:]) ** 2.
Cause:
This is happening because in some cases, whenever y[i] is equal to 1, the value of the Sigmoid function (theta) also becomes equal to 1.
Cost function:
J = (np.dot(-(y.T),np.log(expit(np.dot(X,theta))))-np.dot((np.ones((m,1))-y).T,np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1))))) / m + (regTerm / (2 * m)) * np.linalg.norm(theta[1:])
Now, consider the following part in the above code snippet:
np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1)))
Here, you are performing (1 - theta) when the value of theta is 1. So, that will effectively become log (1 - 1) = log (0) which is undefined.
I'm guessing your data has negative values in it. You can't log a negative.
import numpy as np
np.log(2)
> 0.69314718055994529
np.log(-2)
> nan
There are a lot of different ways to transform your data that should help, if this is the case.
def cost(X, y, theta):
yp = expit(X # theta)
cost = - np.average(y * np.log(yp) + (1 - y) * np.log(1 - yp))
return cost
The warning originates from np.log(yp) when yp==0 and in np.log(1 - yp) when yp==1. One option is to filter out these values, and not to pass them into np.log. The other option is to add a small constant to prevent the value from being exactly 0 (as suggested in one of the comments above)
Add epsilon value[which is a miniature value] to the log value so that it won't be a problem at all.
But i am not sure if it will give accurate results or not .

What is the most pythonic way to conditionally compute?

I'm implementing Bayesian Changepoint Detection in Python/NumPy (if you are interested have a look at the paper). I need to compute likelihoods for data in ranges [a, b], where a and b can have all values from 1 to n. However I can prune the computation at some points, so that I don't have to compute every likelihood. On the other hand some likelihoods are used more than once, so that I can save time by saving the values in a matrix P[a, b]. Right now I check whether the value is already computed, whenever I use it, but I find that a bit of a hassle. It looks like this:
# ...
P = np.ones((n, n)) * np.inf # a likelihood can't get inf, so I use it
# as pseudo value
for a in range(n):
for b in range(a, n):
# The following two lines get annoying and error prone if you
# use P more than once
if P[a, b] == np.inf:
P[a, b] = likelihood(data, a, b)
Q[a] += P[a, b] * g[a] * Q[a - 1] # some computation using P[a, b]
# ...
I wonder, whether there is a more intuitive and pythonic way to achieve this, without having the if ... statement before every use of a P[a, b]. Something like an automagical function call if some condition is not met. I could of course make the likelihood function aware of the fact that it could save values, but then it needs some kind of state (e.g. becomes an object). I want to avoid that.
The likelihood function
Since it was asked for in a comment, I add the likelihood function. It actually computes the conjugate prior and then the likelihood. And all in log representation... So it is quite complicated.
from scipy.special import gammaln
def gaussian_obs_log_likelihood(data, t, s):
n = s - t
mean = data[t:s].sum() / n
muT = (n * mean) / (1 + n)
nuT = 1 + n
alphaT = 1 + n / 2
betaT = 1 + 0.5 * ((data[t:s] - mean) ** 2).sum() + ((n)/(1 + n)) * (mean**2 / 2)
scale = (betaT*(nuT + 1))/(alphaT * nuT)
# splitting the PDF of the student distribution up is /much/ faster. (~ factor 20)
prob = 1
for yi in data[t:s]:
prob += np.log(1 + (yi - muT)**2/(nuT * scale))
lgA = gammaln((nuT + 1) / 2) - np.log(np.sqrt(np.pi * nuT * scale)) - gammaln(nuT/2)
return n * lgA - (nuT + 1)/2 * prob
Although I work with Python 2.7, both answers for 2.7 and 3.x are appreciated.
I would use a sibling of defaultdict for this (you can't use defaultdict directly since it won't tell you the key that is missing):
class Cache(object):
def __init__(self):
self.cache = {}
def get(self, a, b):
key = (a,b)
result = self.cache.get(key, None)
if result is None:
result = likelihood(data, a, b)
self.cache[key] = result
return result
Another approach would be using a cache decorator on likelihood as described here.

Euler method (explicit and implicit)

I'd like to implement Euler's method (the explicit and the implicit one)
(https://en.wikipedia.org/wiki/Euler_method) for the following model:
x(t)' = q(x_M -x(t))x(t)
x(0) = x_0
where q, x_M and x_0 are real numbers.
I know already the (theoretical) implementation of the method. But I couldn't figure out where I can insert / change the model.
Could anybody help?
EDIT: You were right. I didn't understand correctly the method. Now, after a few hours, I think that I really got it! With the explicit method, I'm pretty sure (nevertheless: could anybody please have a look at my code? )
With the implicit implementation, I'm not very sure if it's correct. Could please anyone have a look at the implementation of the implicit method and give me a feedback what's correct / not good?
def explizit_euler():
''' x(t)' = q(xM -x(t))x(t)
x(0) = x0'''
q = 2.
xM = 2
x0 = 0.5
T = 5
dt = 0.01
N = T / dt
x = x0
t = 0.
for i in range (0 , int(N)):
t = t + dt
x = x + dt * (q * (xM - x) * x)
print '%6.3f %6.3f' % (t, x)
def implizit_euler():
''' x(t)' = q(xM -x(t))x(t)
x(0) = x0'''
q = 2.
xM = 2
x0 = 0.5
T = 5
dt = 0.01
N = T / dt
x = x0
t = 0.
for i in range (0 , int(N)):
t = t + dt
x = (1.0 / (1.0 - q *(xM + x) * x))
print '%6.3f %6.3f' % (t, x)
Pre-emptive note: Although the general idea should be correct, I did all the algebra in place in the editor box so there might be mistakes there. Please, check it yourself before using for anything really important.
I'm not sure how you come to the "implicit" formula
x = (1.0 / (1.0 - q *(xM + x) * x))
but this is wrong and you can check it by comparing your "explicit" and "implicit" results: they should slightly diverge but with this formula they will diverge drastically.
To understand the implicit Euler method, you should first get the idea behind the explicit one. And the idea is really simple and is explained at the Derivation section in the wiki: since derivative y'(x) is a limit of (y(x+h) - y(x))/h, you can approximate y(x+h) as y(x) + h*y'(x) for small h, assuming our original differential equation is
y'(x) = F(x, y(x))
Note that the reason this is only an approximation rather than exact value is that even over small range [x, x+h] the derivative y'(x) changes slightly. It means that if you want to get a better approximation of y(x+h), you need a better approximation of "average" derivative y'(x) over the range [x, x+h]. Let's call that approximation just y'. One idea of such improvement is to find both y' and y(x+h) at the same time by saying that we want to find such y' and y(x+h) that y' would be actually y'(x+h) (i.e. the derivative at the end). This results in the following system of equations:
y'(x+h) = F(x+h, y(x+h))
y(x+h) = y(x) + h*y'(x+h)
which is equivalent to a single "implicit" equation:
y(x+h) - y(x) = h * F(x+h, y(x+h))
It is called "implicit" because here the target y(x+h) is also a part of F. And note that quite similar equation is mentioned in the Modifications and extensions section of the wiki article.
So now going to your case that equation becomes
x(t+dt) - x(t) = dt*q*(xM -x(t+dt))*x(t+dt)
or equivalently
dt*q*x(t+dt)^2 + (1 - dt*q*xM)*x(t+dt) - x(t) = 0
This is a quadratic equation with two solutions:
x(t+dt) = [(dt*q*xM - 1) ± sqrt((dt*q*xM - 1)^2 + 4*dt*q*x(t))]/(2*dt*q)
Obviously we want the solution that is "close" to the x(t) which is the + solution. So the code should be something like:
b = (q * xM * dt - 1)
x(t+h) = (b + (b ** 2 + 4 * q * x(t) * dt) ** 0.5) / 2 / q / dt
(editor note:) Applying the binomial complement, this formula has the numerically more stable form for small dt, where then b < 0,
x(t+h) = (2 * x(t)) / ((b ** 2 + 4 * q * x(t) * dt) ** 0.5 - b)

Categories

Resources