The MSS maximum-likelihood method for the fluctuation test - Python

I am trying to implement an algorithm from the following paper (method 5), http://dx.doi.org/10.1016%2FS0076-6879(05)09012-9, in Python 2.7 to improve my programming skills. Other implementations exist, but apparently I cannot post that many links; if my reputation goes up, I will add them here.
Essentially, the algorithm is used in biological research to estimate the mutation rate of cells under some condition. Here is my attempt, which has errors (note: I have updated this code to remove one error, but I am still not getting the right answer):
import numpy as np
import sympy as sp
from scipy.optimize import minimize

def leeCoulson(nparray):
    median = np.median(nparray)
    x = sp.Symbol('x')
    M_est = sp.solve(sp.Eq(-x*sp.log(x) - 1.24*x + median, 0), x)
    return M_est

def ctArray(nparray, max):
    list = [0] * int(max+1)
    for i in range(int(max)+1):
        list[i] = nparray.count(i)
    return list

values = 'filename1.csv'
data = np.genfromtxt(values, delimiter=',')
mVal = int(max(data))
ctArray_ = ctArray(np.ndarray.tolist(data), mVal)

def mssCalc(estM, max=mVal, count=ctArray_):
    def rec(pi, r):
        pr = (estM/r) + sum([(pi[i]/(r-i+1)) for i in range(0, r-1)])
        return pr
    prod = 1
    pi = [0]*max
    pi[0] = np.exp(-1*estM)
    for r in range(1, max):
        pi[r] = rec(pi, r)
        prod = prod*(pi[r]**count[r])
    return -1*prod

finalM = minimize(mssCalc, leeCoulson(data), method='nelder-mead', options={'xtol': 1e-3, 'disp': True})
print finalM
This code gives the following errors:
mss-mle_calc.py:37: RuntimeWarning: overflow encountered in multiply
prod=prod*(pi[r]**count[r])
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/optimize.py:462: RuntimeWarning: invalid value encountered in subtract
numpy.max(numpy.abs(fsim[0] - fsim[1:])) <= ftol):
Warning: Maximum number of function evaluations has been exceeded.
Please help me make this code better if you have some time.

Thanks for looking; there were a couple of stupid mistakes in my code. Here is the working code (as far as I can tell):
import numpy as np
import sympy as sp
from scipy.optimize import minimize

def leeCoulson(nparray):
    # Lea-Coulson median estimator; used as the starting guess for the optimisation
    median = np.median(nparray)
    x = sp.Symbol('x')
    M_est = sp.solve(sp.Eq(-x*sp.log(x) - 1.24*x + median, 0), x)
    return float(M_est[0])

def ctArray(nparray, max):
    # histogram of the data: list[i] = number of cultures with exactly i mutants
    list = [0] * int(max+1)
    for i in range(int(max)+1):
        list[i] = nparray.count(i)
    return list

values = 'filename1.csv'
data = np.genfromtxt(values, delimiter=',')
mVal = int(max(data))
ctArray_ = ctArray(np.ndarray.tolist(data), mVal)

def mssCalc(estM, max=mVal, count=ctArray_):
    # MSS recursion: pi[r] is the probability of observing r mutants given m = estM
    def rec(pi, r):
        pr = (estM/r)*sum([(pi[i]/(r-i+1)) for i in range(0, r)])
        return pr
    prod = 1
    pi = [0]*(max+1)
    pi[0] = np.exp(-1.0*estM)   # p0 = e^(-m)
    for r in range(1, max+1):
        pi[r] = rec(pi, r)
        prod = prod*(pi[r]**count[r])   # likelihood contribution of the cultures with r mutants
    return -1*prod   # negated so that minimize() maximises the likelihood

finalM = minimize(mssCalc, leeCoulson(data), method='nelder-mead', options={'xtol': 1e-3, 'disp': True})
print finalM
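One remaining weakness: multiplying many pi[r]**count[r] terms is exactly the kind of product that produced the overflow warning earlier, and it can just as easily underflow to zero for larger data sets. A sketch of a log-likelihood variant (mssCalcLog is just an illustrative name, not from the paper; it reuses leeCoulson, data, mVal and ctArray_ from the script above):
def mssCalcLog(estM, max=mVal, count=ctArray_):
    # same MSS recursion as mssCalc, but accumulating sum(count[r]*log(pi[r]))
    # instead of the raw product, which is far less prone to overflow/underflow
    estM = float(estM)              # minimize may pass a length-1 array
    pi = [0.0]*(max+1)
    pi[0] = np.exp(-estM)
    logL = count[0]*np.log(pi[0])
    for r in range(1, max+1):
        pi[r] = (estM/r)*sum(pi[i]/(r-i+1) for i in range(0, r))
        if count[r]:
            logL += count[r]*np.log(pi[r])
    return -logL                    # minimise this to maximise the likelihood

finalM = minimize(mssCalcLog, leeCoulson(data), method='nelder-mead',
                  options={'xtol': 1e-3, 'disp': True})
print finalM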

Related

Sympy substitutes

I am a new user of SymPy, teaching myself the library for my undergraduate research, and in the middle of the process I am stuck on one piece of code.
I have defined a function with a subscript:
U_n = x^n + 1/x^n
When I consider (U_1)^3 (i.e. substitute n = 1) I get
(U_1)^3 = (x + 1/x)^3
and after simplifying this I get
(U_1)^3 = (x^3 + 1/x^3) + 3(x + 1/x),
which one can read as
(U_1)^3 = U_3 + 3 U_1
How do I get the output in terms of the U_n's?
from sympy import *
from sympy import sympify
import sympy

x = symbols('x')

def u(n):
    return x**n + 1/x**n

def unu(eq1):
    c = (eq1.subs(x, exp(x))).simplify()/2
    return c.subs(cosh, Function('u')).subs(x, 1)

def v(n):
    return x**n - 1/x**n

def vnu(eq2):
    c = (eq2.subs(x, exp(x))).simplify()/2
    return c.subs(sinh, Function('v')).subs(x, 1)
This is my current code. I have built it as two separate helpers for U_n and V_n, but I cannot combine them; roughly, something like the combined helper sketched below is what I am after, though I do not know how to do it properly.
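(This is only a sketch reusing the same substitute-and-simplify trick as unu and vnu; it assumes simplify() collapses the exponentials into cosh/sinh terms, which happens for simple inputs but is not guaranteed in general.)
def uvnu(eq):
    # same idea as unu/vnu above, but renaming cosh -> u and sinh -> v
    # in a single pass so that mixed expressions can be handled
    c = (eq.subs(x, exp(x))).simplify()/2
    return c.subs(cosh, Function('u')).subs(sinh, Function('v')).subs(x, 1)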
Can someone please give me an idea of how to build this properly in SymPy? It would be a very big help for my research.
Thank you very much.

CVXPY throws SolverError

When using CVXPY, I frequently get a "SolverError". The docs just say this is caused by numerical issues, but give no further information about how to avoid them.
The following code snippet is an example. The problem is trivial, but the 'CVXOPT' solver throws a "SolverError". It is true that if we change the solver to another one, like 'ECOS', the problem is solved as expected. But the point is that 'CVXOPT' should in principle be able to solve this trivial problem, and it really baffles me why it doesn't.
import numpy as np
import cvxpy as cv

np.random.seed(0)
temp = np.random.rand(5)
T = 2
x = cv.Variable(T)
u = cv.Variable(2, T)
pbs = []
for t in range(T):
    cost = cv.sum_squares(x[t] - temp[t])
    constr = [x[t] == u[0, t] + u[1, t]]
    pbs.append(cv.Problem(cv.Minimize(cost), constr))
prob = sum(pbs)
prob.solve(solver='CVXOPT')
Use prob.solve(solver='CVXOPT', kktsolver=cv.ROBUST_KKTSOLVER) to make the optimisation process more robust.
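For reference, applied to the snippet above only the final line changes (everything else stays the same):
prob.solve(solver='CVXOPT', kktsolver=cv.ROBUST_KKTSOLVER)   # CVXOPT's more robust KKT solver instead of the default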

ValueError: need more than 3 values to unpack when using optimize.minimize

I'm pretty new to Python and I'm stuck on this: I'd like to use scipy.optimize.minimize to maximize a function, and I'm having some problems with the extra arguments of the function I defined.
I looked for a solution in tons of answered questions, but I can't find anything that solves my problem.
I saw in Structure of inputs to scipy minimize function how to pass extra arguments that should stay constant during the minimization, and my code seems fine to me from that point of view.
This is my code:
import numpy as np
from scipy.stats import pearsonr
import scipy.optimize as optimize

def min_pears_function(a, exp):
    (b, c, d, e) = a
    return (1 - (pearsonr(b + exp[0] * c + exp[1] * d + exp[2], e)[0]))

a = (log_x, log_y, log_t, log_z)  # where log_x, log_y, log_t and log_z are numpy arrays of the same length
guess_PF = [0.6, 2.0, 0.2]
res = optimize.minimize(min_pears_function, guess_PF, args=(a,), options={'xtol': 1e-8, 'disp': True})
When running the code I get the following error:
ValueError: need more than 3 values to unpack
But I can't see which required argument I'm missing. The function seems to work fine, so I guess the problem is in the optimize.minimize call?
Your error occurs here:
def min_pears_function(a, exp):
    # XXX: This is your error line
    (b, c, d, e) = a
    return (1 - (pearsonr(b + exp[0] * c + exp[1] * d + exp[2], e)[0]))
This is because the initial value you pass to optimize.minimize is guess_PF, which has just three values ([0.6, 2.0, 0.2]), and that initial value is what gets passed to min_pears_function as the variable a.
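The failure can be reproduced in isolation (Python 3 phrases it as "not enough values to unpack"):
(b, c, d, e) = [0.6, 2.0, 0.2]   # raises: ValueError: need more than 3 values to unpack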
Did you mean for it to be passed as exp? Is it exp you wish to solve for? In that case, redefine the signature as:
def min_pears_function(exp, a):
    ...
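A sketch of how the pieces then fit together (the four arrays stand in for the question's log_x, log_y, log_t and log_z, synthetic here only so the snippet runs, and method='Nelder-Mead' is named explicitly because xtol is an option of that method; neither detail comes from the original post):
import numpy as np
from scipy.stats import pearsonr
import scipy.optimize as optimize

log_x, log_y, log_t, log_z = (np.random.rand(100) for _ in range(4))  # placeholder data

def min_pears_function(exp, a):
    # a carries the four fixed data arrays; exp holds the three
    # coefficients that the optimizer varies
    (b, c, d, e) = a
    return 1 - pearsonr(b + exp[0]*c + exp[1]*d + exp[2], e)[0]

a = (log_x, log_y, log_t, log_z)
guess_PF = [0.6, 2.0, 0.2]
res = optimize.minimize(min_pears_function, guess_PF, args=(a,),
                        method='Nelder-Mead', options={'xtol': 1e-8, 'disp': True})
print(res.x)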

__init__ got an unexpected keyword argument [closed]

I'm trying to run this script for chemistry research.
In case I didn't copy the script correctly, here's the link to download the files:
http://pubs.acs.org/doi/suppl/10.1021/acs.analchem.5b02258
This instruction may be helpful:
"fsolve_withPT.py takes two command line arguments: input file name of “MamPol2_titration_data.txt” and output file name of “Kaps_result.txt”"
When I run the script in IPython I get this error message:
__init__() got an unexpected keyword argument 'step_max'
#University of California San Francisco
#Supplemental for
#
#A model for specific and nonspecific binding of ligand to multi-protein
#complexes by native mass spectrometry
#
#Shenheng Guan, et al
#2015
#
import sys
import math
import numpy
import warnings
from scipy.optimize import fsolve,fmin
import matplotlib.pyplot as plt

warnings.filterwarnings('ignore')

input_fn='MamPol2_titration_data.txt'
output_fn='Kaps_result1.txt'
##input_fn=sys.argv[1]
##output_fn=sys.argv[1]

fid=open(input_fn,'r')
line=fid.readline()
line=line.strip('\n')
no_mam=[float(x) for x in line.split('\t')[1:]]
line=fid.readline()
data=[]
conc=[]
for line in fid:
    line=line.strip('\n')
    tmp0=line.split('\t')
    conc.append(float(tmp0[0]))
    tmp1=[float(x) for x in tmp0[1:]]
    data.append(tmp1)
fid.close()

class fsolve_withPT:
    def __init__(self,conc,data):
        self.conc = conc
        self.data = data
    def ff(self, x, Kas, LT, PT):#x[0]:[P];x[1]:[PL]...;x[n]:[PLn];x[-1]:[L] n+2 (10)
        fc=[]
        for j in range(0,len(x)-2):#setup equilibrium equations
            fc.append(Kas[j]*x[j]*x[-1]-x[j+1])
        #mass conservation for P
        tmpP=0.0#[P]
        for j in range(0,len(x)-1):#x[0] to x[8] or [P] to [PL8]
            tmpP=tmpP+x[j]
        fc.append(tmpP-PT)#PT equals to all P species combined
        #mass conservation for L
        tmpL=x[-1]#[L]
        for j in range(1,len(x)-1):
            tmpL=tmpL+j*x[j]
        fc.append(tmpL-LT)
        return fc
    def error(self,w):
        Kas=w[:-1]
        PT=w[-1]
        mySum=0.0
        for m in range(0,len(self.conc)):#over conc (LT)
            #print Kas,self.conc[m],PT
            F=fsolve(self.ff, [1.0]*10, args=(Kas, self.conc[m], PT))
            myPT=sum(F[:-1])
            for k in range(0, len(no_mam)):#over # of Mam
                mySum=mySum+(F[k]/myPT-self.data[m][k])**2
        return mySum

w0=[8,7,6,5,4,3,2,1,0]
w0=numpy.array(w0)
w0=w0*3.01e4
w0[-1]=5.e-6
myFclass=fsolve_withPT(conc,data)
w, fopt, iter, funcalls, warnflag = fmin(myFclass.error, w0, maxiter=2000,
                                         maxfun=2000, full_output=True,disp=True)

# http://nullege.com/codes/show/src#n#u#Numdifftools-0.6.0#numdifftools#speed_comparison#run_benchmarks.py/73/numdifftools.Hessian
import numdifftools as nd
#my_step_nom=[1.0e3]*8+[1.0e-6]*1
my_step_nom=w#*1.0e-3
hessian = nd.Hessian(myFclass.error,step_max=1.0e-2,step_nom=my_step_nom)#, step_max=1.0, step_nom=numpy.abs(w))
H = hessian(w)
covH=numpy.linalg.inv(H)

conc0=conc#numpy.linspace(0.0,6.0E-05,num=101).tolist()
y0=[]
for tmp in conc0:
    F=fsolve(myFclass.ff, [1.0]*10, args=(w[:-1], tmp,w[-1]))
    y0.append(F)
#y0=myFunc(conc0,w)

fid=open(output_fn,'w')
fid.write('Calculated complex conc. (M)\t'+str(w[-1])+'\n')
fid.write('# of Mam in Complex\t')
for j in no_mam:
    fid.write(str(j)+'\t')
fid.write('\n')
fid.write('Associate constants (Kas)\t\t')
for j in no_mam[:-1]:
    fid.write(str(w[j])+'\t')
fid.write('\n')
fid.write('Mam Conc. (M)\tSimulated abundances\n')
for k in range(0,len(y0)):
    fid.write(str(conc0[k])+'\t')
    yc=y0[k]
    tmp=sum(yc[:-1])
    for j in range(0,len(yc)-2):
        fid.write(str(yc[j]/tmp)+'\t')
    fid.write(str(yc[-2])+'\n')
fid.close()

from scipy import stats
SS=fopt
DF=len(data)*len(data[0])-len(w)
t_factor=stats.t.ppf(0.95, DF)
SE=[]
dw=[]
for j in range(0,len(w)):
    SE.append(numpy.sqrt(SS/DF*numpy.abs(covH[j,j])))
for j in range(0,len(w)):
    dw.append(SE[j]*t_factor)
You are unable to run this code because the paper's code is incorrect. I even downloaded the code from the link you posted to make sure I had the correct code, and I was able to reproduce your error. I'll try to explain what is going on and what you might be able to do in light of this.
The error __init__() got an unexpected keyword argument 'step_max' essentially means that the code is telling Python to create an object with some initial parameters, but Python does not recognize the 'step_max' argument.
The culprit line in the code is
hessian = nd.Hessian(myFclass.error,step_max=1.0e-2,step_nom=my_step_nom)
You can see that it is trying to tell Python to create an nd.Hessian object given three initial parameters: myFclass.error, step_max=1.0e-2, and step_nom=my_step_nom. The problem here is that the nd.Hessian initializer does not take parameters called step_max and step_nom.
So then, what does the nd.Hessian initializer take? nd.Hessian is the Hessian object from the numdifftools package, so I took a look at the source code. Sure enough, this is the source code for initializing a nd.Hessian object:
class Hessian(_Derivative):
    def __init__(self, f, step=None, method='central', full_output=False):
Take a look at the __init__. You can see that it takes f, step, method, and full_output. If it had taken in step_max and step_nom, those fields would have been included in the __init__.
One option is to use the nd.Hessian object correctly, i.e. use the step parameter and figure out what step you want.
For example, if you replace the
hessian = nd.Hessian(myFclass.error,step_max=1.0e-2,step_nom=my_step_nom)
with
hessian = nd.Hessian(myFclass.error,step=1.0e-2)
you will be able to run the code. It might not give the same results as the paper, though; you'll never really know exactly what code they ran to get their results.
If you want to continue using this code with the numdifftools package, I suggest taking a look at the source code, which has nice explanations, comments, and examples.
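A quick way to check this against whatever version you have installed, without hunting for the file by hand, is something along these lines:
import inspect
import numdifftools as nd

# where the installed numdifftools defines Hessian, and which arguments
# its __init__ actually accepts (on Python 3.3+ use inspect.signature)
print(inspect.getsourcefile(nd.Hessian))
print(inspect.getargspec(nd.Hessian.__init__))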

Use Scipy to find a target value Python

I am new to scipy and Python. I have searched quite extensively for a tool similar to Excel's Solver in Python, and scipy seems very powerful. My question is fairly simple: I am trying to find the discount rate for a series of cash flows so that the sum of the present values of the cash flows equals a specific value.
I get the error message below when I run the code. 1500 is my target value, so I try to minimize the squared difference between the target value and the discounted sum inside f(DR).
RuntimeWarning: overflow encountered in multiply
DRfactor[i] = DRfactor[i-1]*(1+DRs[i])
Any and all help is much appreciated.
import numpy as np
import scipy as sp
import scipy.optimize

def f(DR):
    CFs = [100]*50
    DRs = [np.nan]*50
    DRfactor = [np.nan]*50
    for i in range(0,50):
        if 0<=i<=4:
            DRs[i] = DR
        else:
            DRs[i] = (DRs[i-1]-0.1)*0.9+0.1
        if i == 0:
            DRfactor[i] = 1+DRs[i]
        else:
            DRfactor[i] = DRfactor[i-1]*(1+DRs[i])
    CFPV = np.divide(CFs, DRfactor)
    CFsum = np.sum(CFPV)
    return (CFsum - 1500)**2

print (f(0.05))
sol = sp.optimize.minimize(f, 0.05)
sol.x
I figured it out: scipy.optimize.newton can zero out f(DR) and gives 0.041611073570941355, which is the same answer given by Excel Solver.
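Roughly what that looks like (a sketch only; it assumes the objective is rewritten to return the raw difference CFsum - 1500 rather than its square, so that newton has a sign change to find):
import numpy as np
import scipy.optimize

def g(DR):
    # same cash-flow discounting as f(DR) above, but returning the
    # un-squared difference from the 1500 target
    CFs = [100]*50
    DRs = [np.nan]*50
    DRfactor = [np.nan]*50
    for i in range(50):
        DRs[i] = DR if i <= 4 else (DRs[i-1] - 0.1)*0.9 + 0.1
        DRfactor[i] = (1 + DRs[i]) if i == 0 else DRfactor[i-1]*(1 + DRs[i])
    return np.sum(np.divide(CFs, DRfactor)) - 1500

rate = scipy.optimize.newton(g, 0.05)
print(rate)   # should agree with the value quoted above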
