Could you please tell me how I can make the following code more Pythonic?
The code is correct. Full disclosure: it's problem 1b in Handout #4 of this machine learning course. I'm supposed to use Newton's algorithm on the two data sets to fit a logistic hypothesis. But they use MATLAB and I'm using SciPy.
For example, one question I have: the matrixes kept rounding to integers until I initialized one value to 0.0. Is there a better way?
Thanks
import os.path
import math
from numpy import matrix
from scipy.linalg import inv #, det, eig
x = matrix( '0.0;0;1' )
y = 11
grad = matrix( '0.0;0;0' )
hess = matrix('0.0,0,0;0,0,0;0,0,0')
theta = matrix( '0.0;0;0' )
# run until convergence=6or7
for i in range(1, 6):
    #reset
    grad = matrix( '0.0;0;0' )
    hess = matrix('0.0,0,0;0,0,0;0,0,0')
    xfile = open("q1x.dat", "r")
    yfile = open("q1y.dat", "r")
    #over whole set=99 items
    for i in range(1, 100):
        xline = xfile.readline()
        s= xline.split(" ")
        x[0] = float(s[1])
        x[1] = float(s[2])
        y = float(yfile.readline())
        hypoth = 1/ (1+ math.exp(-(theta.transpose() * x)))
        for j in range(0,3):
            grad[j] = grad[j] + (y-hypoth)* x[j]
            for k in range(0,3):
                hess[j,k] = hess[j,k] - (hypoth *(1-hypoth)*x[j]*x[k])
    theta = theta - inv(hess)*grad #update theta after construction
    xfile.close()
    yfile.close()
print "done"
print theta
One obvious change is to get rid of the "for i in range(1, 100):" loop and just iterate over the file lines. To iterate over both files (xfile and yfile) in step, zip them, i.e. replace that block with something like:
import itertools
for xline, yline in itertools.izip(xfile, yfile):
    s= xline.split(" ")
    x[0] = float(s[1])
    x[1] = float(s[2])
    y = float(yline)
    ...
This assumes you want the whole file. If you're deliberately restricting to the first 99 lines, which is what range(1, 100) gives you, you could use something like:
for i, xline, yline in itertools.izip(range(99), xfile, yfile):
However, it's also inefficient to re-read the same files on every pass of the outer loop. It is better to load the data into memory in advance and loop over it there, i.e. outside your loop, have:
xfile = open("q1x.dat", "r")
yfile = open("q1y.dat", "r")
# convert the x values to float here, so the inner loop gets numbers rather than strings
data = zip([map(float, line.split(" ")[1:3]) for line in xfile], map(float, yfile))
And inside just:
for (x1,x2), y in data:
    x[0] = x1
    x[1] = x2
    ...
import math
from numpy import matrix, zeros
from scipy.linalg import inv

x = matrix([[0.],[0],[1]])
theta = matrix(zeros([3,1]))
for i in range(5):
    grad = matrix(zeros([3,1]))
    hess = matrix(zeros([3,3]))
    [xfile, yfile] = [open('q1'+a+'.dat', 'r') for a in 'xy']
    for xline, yline in zip(xfile, yfile):
        x.transpose()[0,:2] = [map(float, xline.split(" ")[1:3])]
        y = float(yline)
        # logistic hypothesis; the exponent is negated, as in the question
        hypoth = 1 / (1 + math.exp(-(theta.transpose() * x)))
        grad += (y - hypoth) * x
        hess -= hypoth * (1 - hypoth) * x * x.transpose()
    # Newton step: theta := theta - H^-1 * grad, as in the question
    theta -= inv(hess) * grad
print "done"
print theta
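For comparison, the same Newton iteration can be written without the per-sample loop by working on whole NumPy arrays. This is only a sketch in Python 3 syntax: the function name newton_logistic and the synthetic data are made up for illustration (the q1x.dat and q1y.dat files are not read here), and np.linalg.solve is swapped in for inv.

```python
import numpy as np

def newton_logistic(X, y, n_iter=10):
    """Fit logistic-regression weights with Newton's method.

    X: (m, n) design matrix (include a column of ones for the intercept).
    y: (m,) array of 0/1 labels.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        h = 1.0 / (1.0 + np.exp(-X @ theta))   # hypothesis for every row
        grad = X.T @ (y - h)                   # gradient of the log-likelihood
        hess = -(X.T * (h * (1.0 - h))) @ X    # Hessian (negative definite)
        theta = theta - np.linalg.solve(hess, grad)
    return theta

# Synthetic, hypothetical data standing in for the course files.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=(200, 2)), np.ones(200)])
true_theta = np.array([1.0, -1.0, 0.5])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X @ true_theta))).astype(float)

theta = newton_logistic(X, y)
print(theta)
```

Solving the linear system instead of forming the inverse is a design choice made here, not something the answers above require; both give the same Newton step.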
the matrixes kept rounding to integers until I initialized one value
to 0.0. Is there a better way?
At the top of your code:
from __future__ import division
In Python 2, dividing two integers with / returns an integer; you only get a float if at least one operand is a floating point number. In Python 3 (and in Python 2 once you import division from __future__), / works more the way we humans might expect.
If you have imported division from __future__ and still want integer division, use a double slash, //. That is:
from __future__ import division
print 1//2 # prints 0
print 5//2 # prints 2
print 1/2 # prints 0.5
print 5/2 # prints 2.5
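Separately from division, NumPy infers an integer dtype when every initial value is an integer, and floats later assigned into such an array lose their fractional part. The sketch below, in Python 3 syntax, uses plain arrays rather than the matrix strings from the question (that substitution is an assumption made for brevity) and shows two ways of asking for floats up front.

```python
import numpy as np

ints = np.array([0, 0, 1])                      # dtype inferred as integer
ints[0] = 2.7                                   # fractional part is dropped

floats = np.zeros(3)                            # np.zeros defaults to float64
floats[0] = 2.7                                 # stored unchanged

also_floats = np.array([0, 0, 1], dtype=float)  # explicit dtype

print(ints.dtype.kind, floats.dtype, also_floats.dtype)
```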
You could make use of the with statement.
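A minimal sketch of what that could look like, in Python 3 syntax. The helper name read_pairs and the two tiny temporary files are made up so the example runs on its own; split() without arguments is used on the assumption that the columns are separated by whitespace.

```python
import os
import tempfile

def read_pairs(x_path, y_path):
    """Return a list of ((x1, x2), y) tuples read from two parallel files."""
    # Both files are closed automatically when the block exits,
    # even if a line fails to parse.
    with open(x_path) as xfile, open(y_path) as yfile:
        return [
            (tuple(float(v) for v in xline.split()), float(yline))
            for xline, yline in zip(xfile, yfile)
        ]

# Tiny made-up files so the sketch is self-contained.
tmpdir = tempfile.mkdtemp()
x_path = os.path.join(tmpdir, "x.dat")
y_path = os.path.join(tmpdir, "y.dat")
with open(x_path, "w") as fh:
    fh.write("  1.0  2.0\n  3.0  4.0\n")
with open(y_path, "w") as fh:
    fh.write("  0.0\n  1.0\n")

pairs = read_pairs(x_path, y_path)
print(pairs)
```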
The code that reads the files into lists could be drastically simpler:
for line in open("q1x.dat", "r"):
    x = map(float,line.split(" ")[1:])
y = map(float, open("q1y.dat", "r").readlines())
I'm new to sympy and I'm trying to use it to get the values of higher order Greeks of options (basically higher order derivatives). My goal is to do a Taylor series expansion. The function in question is the first derivative.
f(x) = N(d1)
N(d1) is the P(X <= d1) of a standard normal distribution. d1 in turn is another function of x (x in this case is the price of the stock to anybody who's interested).
d1 = (np.log(x/100) + (0.01 + 0.5*0.11**2)*0.5)/(0.11*np.sqrt(0.5))
As you can see, d1 is a function of only x. This is what I have tried so far.
import sympy as sp
from math import pi
from sympy.stats import Normal,P
x = sp.symbols('x')
u = (sp.log(x/100) + (0.01 + 0.5*0.11**2)*0.5)/(0.11*np.sqrt(0.5))
N = Normal('N',0,1)
f = sp.simplify(P(N <= u))
print(f.evalf(subs={x:100})) # This should be 0.5155
f1 = sp.simplify(sp.diff(f,x))
f1.evalf(subs={x:100}) # This should also return a float value
The last line of code however returns an expression, not a float value as I expected like in the case with f. I feel like I'm making a very simple mistake but I can't find out why. I'd appreciate any help.
Thanks.
If you define x with positive=True, it looks like you get almost the expected result. That assumption is implied by the log in the definition of u, given that u is real, which in turn is implied by the definition of f. In the version without the positive assumption, f1.subs({x:100}) shows that the trouble is with unevaluated polar_lift(0) terms.
import sympy as sp
from sympy.stats import Normal, P
x = sp.symbols('x', positive=True)
u = (sp.log(x/100) + (0.01 + 0.5*0.11**2)*0.5)/(0.11*sp.sqrt(0.5)) # changed np to sp
N = Normal('N',0,1)
f = sp.simplify(P(N <= u))
print(f.evalf(subs={x:100})) # 0.541087287864516
f1 = sp.simplify(sp.diff(f,x))
print(f1.evalf(subs={x:100})) # 0.0510177033783834
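As a cross-check that does not go through sympy.stats, the standard normal CDF can be written out with erf and differentiated directly. This is a sketch of an alternative route, not the approach used above; the names f_at_100 and f1_at_100 are made up for illustration.

```python
import math
import sympy as sp

x = sp.symbols('x', positive=True)
u = (sp.log(x/100) + (0.01 + 0.5*0.11**2)*0.5) / (0.11*sp.sqrt(0.5))

# Standard normal CDF written out explicitly: N(u) = (1 + erf(u/sqrt(2))) / 2
f = (1 + sp.erf(u / sp.sqrt(2))) / 2
f1 = sp.diff(f, x)

# Substitute first, then convert to a plain Python float.
f_at_100 = float(f.subs(x, 100))
f1_at_100 = float(f1.subs(x, 100))
print(f_at_100, f1_at_100)
```

Both numbers should agree with the values printed by the sympy.stats version above.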
I'm working on a Python script that takes a mathematical function as input and spits out useful information for drawing the curve of that function (tangents, intersection points, asymptotes, etc.). The first step is finding the domain of definition of that function, i.e. where the function is valid (e.g. for 1/(x-2), Df = ]-∞,2[ ∪ ]2,+∞[), and I need to do it using sympy.
Below is the WIP code:
from sympy import *
from fractions import *
x = symbols('x')
f = Function('f')
f = input('type function: ')
fp = diff(f)
sol = solve(f, x)
sol_p = solve(fp, x)
print(f"f(x)={f},f'(x)={fp}")
#print(f"x1={sol[0]},x2={sol[1]},x'={sol_p}")
print(f'{len(sol)}')
psol = {}
limits_at_edges = {}
df = solveset(f, x, domain=S.Reals)
for i in range(1,( len(sol) + 1)): # Prints out every item in sol[] after applying .evalf()
    psol["x" + str(i) ] = sol[i - 1].evalf()
for i in range(1, len(sol) + 1):
    limits_at_edges[f'limit x -> x{i} f(x)'] = limit(f, x, sol[i - 1])
print(f'Solution:{sol}')
print(f'Processes solution:{psol}')
print(f'Derivative solution:{sol_p}')
print(limits_at_edges)
print(f'Domain:{df}')
pprint(f, use_unicode=True)
This question is similar to "How to know whether a function is continuous with sympy?"
This can be done using sympy's continuous_domain.
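A minimal sketch of that call, using the 1/(x-2) example from the question. Note that continuous_domain reports where the expression is continuous, which for a simple rational function like this one coincides with where it is defined; treating the two as interchangeable in general is an assumption.

```python
from sympy import S, Symbol
from sympy.calculus.util import continuous_domain

x = Symbol('x')
f = 1 / (x - 2)

# Subset of the reals on which f is continuous.
domain = continuous_domain(f, x, S.Reals)
print(domain)
```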
I'm using python and apparently the slowest part of my program is doing simple additions on float variables.
It takes about 35 seconds to do around 400,000,000 additions/multiplications.
I'm trying to figure out the fastest way I can do this math.
This is what the structure of my code looks like.
Example (dummy) code:
def func(x, y, z):
    loop_count = 30
    a = [0,1,2,3,4,5,6,7,8,9,10,11,12,...35 elements]
    b = [0,11,22,33,44,55,66,77,88,99,1010,1111,1212,...35 elements]
    p = [0,0,0,0,0,0,0,0,0,0,0,0,0,...35 elements]
    for i in range(loop_count - 1):
        c = p[i-1]
        d = a[i] + c * a[i+1]
        e = min(2, a[i]) + c * b[i]
        f = e * x
        g = y + d * c
        .... and so on
        p[i] = d + e + f + s + g5 + f4 + h7 * t5 + y8
    return sum(p)
func() is called about 200k times. The loop_count is about 30. And I have ~20 multiplications and ~45 additions and ~10 uses of min/max
I was wondering if there is a method for me to declare all these as ctypes.c_float and do addition in C using stdlib or something similar ?
Note that the p[i] calculated at the end of the loop is used as c in the next loop iteration. For iteration 0, it just uses p[-1] which is 0 in this case.
My constraints:
I need to use python. While I understand plain math would be faster in C/Java/etc. I cannot use it due to a bunch of other things I do in python which cannot be done in C in this same program.
I tried writing this with cython, but it caused a bunch of issues with the environment I need to run this in. So, again - not an option.
I think you should consider using numpy. You did not mention any constraint.
Example case of a simple dot operation (x.y)
import datetime
import numpy as np
x = range(0,10000000,1)
y = range(0,20000000,2)
for i in range(0, len(x)):
    x[i] = x[i] * 0.00001
    y[i] = y[i] * 0.00001
now = datetime.datetime.now()
z = 0
for i in range(0, len(x)):
    z = z+x[i]*y[i]
print "handmade dot=", datetime.datetime.now()-now
print z
x = np.arange(0.0, 10000000.0*0.00001, 0.00001)
y = np.arange(0.0, 10000000.0*0.00002, 0.00002)
now = datetime.datetime.now()
z = np.dot(x,y)
print 'numpy dot =',datetime.datetime.now()-now
print z
outputs
handmade dot= 0:00:02.559000
66666656666.7
numpy dot = 0:00:00.019000
66666656666.7
numpy is more than 100 times faster.
The reason is that numpy wraps a C library that does the dot operation in compiled code. In pure Python you have a list of potentially generic objects, so every element has to be type-checked and converted before each operation.
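Because each p[i] depends on p[i-1], the loop inside func cannot itself be turned into a single array operation. If the roughly 200k calls are independent of each other, though, one option is to run the loop once over arrays holding the inputs of all calls. The sketch below, in Python 3 syntax, assumes exactly that; the recurrence is a shortened stand-in for the dummy code, and func_scalar, func_batch and the coefficient values are made up for illustration.

```python
import numpy as np

N_STEPS = 29  # loop_count - 1 in the question's dummy code
a = np.linspace(0.0, 0.01, 36)   # made-up coefficients
b = np.linspace(0.0, 0.01, 36)

def func_scalar(x, y):
    """One call at a time, scalar arithmetic."""
    c = 0.0
    total = 0.0
    for i in range(N_STEPS):
        d = a[i] + c * a[i + 1]
        e = min(2.0, a[i]) + c * b[i]
        f = e * x
        g = y + d * c
        c = d + e + f + g        # p[i], reused as c in the next step
        total += c
    return total

def func_batch(xs, ys):
    """All calls at once: each step operates on whole arrays."""
    c = np.zeros_like(xs)
    total = np.zeros_like(xs)
    for i in range(N_STEPS):
        d = a[i] + c * a[i + 1]
        e = min(2.0, a[i]) + c * b[i]
        f = e * xs
        g = ys + d * c
        c = d + e + f + g
        total = total + c
    return total

rng = np.random.default_rng(0)
xs = rng.random(1000) * 0.5
ys = rng.random(1000) * 0.5
batch = func_batch(xs, ys)
loop = np.array([func_scalar(x, y) for x, y in zip(xs, ys)])
print(np.allclose(batch, loop))
```

The Python-level loop then runs about 30 times in total instead of 30 times per call, with the per-call work done in compiled NumPy code.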
I have to write a function, s(x) = x * sin(3/x) in python that is capable of taking single values or vectors/arrays, but I'm having a little trouble handling the cases when x is zero (or has an element that's zero). This is what I have so far:
def s(x):
    result = zeros(size(x))
    for a in range(0,size(x)):
        if (x[a] == 0):
            result[a] = 0
        else:
            result[a] = float(x[a] * sin(3.0/x[a]))
    return result
Which...doesn't work for x = 0. And it's kinda messy. Even worse, I'm unable to use sympy's integrate function on it, or use it in my own simpson/trapezoidal rule code. Any ideas?
When I use integrate() on this function, I get the following error message: "Symbol" object does not support indexing.
This takes about 30 seconds per integrate call:
import sympy as sp
x = sp.Symbol('x')
int2 = sp.integrate(x*sp.sin(3./x),(x,0.000001,2)).evalf(8)
print int2
int1 = sp.integrate(x*sp.sin(3./x),(x,0,2)).evalf(8)
print int1
The results are:
1.0996940
-4.5*Si(zoo) + 8.1682775
Clearly you want to start the integration from a small positive number to avoid the problem at x = 0.
You can also assign x*sin(3./x) to a variable, e.g.:
s = x*sp.sin(3./x)
int1 = sp.integrate(s, (x, 0.00001, 2))
My original answer using scipy to compute the integral:
import scipy.integrate
import math
def s(x):
    if abs(x) < 0.00001:
        return 0
    else:
        return x*math.sin(3.0/x)
s_exact = scipy.integrate.quad(s, 0, 2)
print s_exact
See the scipy docs for more integration options.
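For the numerical side of the question (a version of s that accepts single values as well as arrays and returns 0 at x = 0), a NumPy sketch in Python 3 syntax could look like the following. It is one possible approach under the assumption that s(0) should be defined as 0; it does not help with symbolic integration.

```python
import numpy as np

def s(x):
    """x * sin(3/x) for scalars or arrays, defined as 0 where x == 0."""
    x = np.asarray(x, dtype=float)
    is_zero = (x == 0)
    safe = np.where(is_zero, 1.0, x)   # placeholder value avoids dividing by zero
    return np.where(is_zero, 0.0, x * np.sin(3.0 / safe))

print(s(0.0))
print(s([0.0, 1.0, 2.0]))
```

Because it works on whole arrays, the same function can be passed sample points from a hand-written trapezoidal or Simpson rule.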
If you want to use SymPy's integrate, you need a symbolic function. A wrong value at a point doesn't really matter for integration (at least mathematically), so you shouldn't worry about it.
It seems there is a bug in SymPy that gives an answer in terms of zoo at 0, because it isn't using limit correctly. You'll need to compute the limits manually. For example, the integral from 0 to 1:
In [14]: res = integrate(x*sin(3/x), x)
In [15]: ans = limit(res, x, 1) - limit(res, x, 0)
In [16]: ans
Out[16]:
  9⋅π   3⋅cos(3)   sin(3)   9⋅Si(3)
- ─── + ──────── + ────── + ───────
   4       2         2         2
In [17]: ans.evalf()
Out[17]: -0.164075835450162
import csv
import numpy
from sympy import *
import numpy as np
from numpy import *
import json
reader=csv.reader(open("/Users/61/Desktop/pythonlearning/generator1.csv","rU"),delimiter=',')
a=list(reader)
result=numpy.array(a)
print a
b = []
for n in range(3):
    b.append(a[n+1][0:3])
print b
e = np.array(b)
f = e.astype(np.float)
print f
x = Symbol("x")
y = Symbol("y")
coeffs = f
F1 = numpy.poly1d(f[0])
F12 = np.polyder(F1)
print F12
F2 = numpy.poly1d(f[1])
F22 = np.polyder(F2)
print F22
F3 = numpy.poly1d(f[2])
F32 = np.polyder(F3)
print F32
This is my code, and f is an array of numbers like this:
[[ 9.68000000e-04 6.95000000e+00 7.49550000e+02]
 [ 7.38000000e-04 7.05100000e+00 1.28500000e+03]
 [ 1.04000000e-03 6.53100000e+00 1.53100000e+03]]
Basically, I want to use the rows of f as coefficients to form polynomials, and then differentiate the polynomials. The result looks like this:
0.001936 x + 6.95
0.001476 x + 7.051
0.00208 x + 6.531
My question is how to write a loop for Fn if, instead of 3 polynomials, I have n polynomials. How could I write a loop that differentiates the n polynomials and still lets me refer to each polynomial easily by its own name, e.g. F1 for the first polynomial, F2 for the second, and so on?
I tried something like this, but it doesn't work:
i = 1
if i < 3:
    F(i)=numpy.poly1d(f[i-1])
else:
    i = i+1
You need to use a loop to deal with a variable number of polynomials and a data structure to store them. Try using a dictionary, iterating using a for loop.
numberPolynomials = 3
F = {}
for n in range(1, numberPolynomials+1):
    F[n] = np.poly1d(f[n-1])
    F[(n, 2)] = np.polyder(F[n])
    print F[(n, 2)]
Now you can refer to the polynomials not as F1, F2, etc. but as F[1], F[2], etc. What you had called F12, F22, F32 would then be F[(1,2)], F[(2,2)], F[(3,2)]. Though, if you aren't going to use the originals, you should overwrite them and probably just use a list.
This assumes you replace the three numpy imports with:
import numpy as np
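The list variant mentioned above could be sketched like this, in Python 3 syntax. The coefficient values are the ones quoted in the question; the names coeffs, polys and derivs are made up for illustration.

```python
import numpy as np

# Coefficient rows as quoted in the question (highest power first).
coeffs = np.array([
    [9.68e-04, 6.950, 749.55],
    [7.38e-04, 7.051, 1285.0],
    [1.04e-03, 6.531, 1531.0],
])

polys = [np.poly1d(row) for row in coeffs]   # works for any number of rows
derivs = [np.polyder(p) for p in polys]      # first derivative of each

for n, dp in enumerate(derivs, start=1):
    print(n, dp.coeffs)
```

With lists, the first polynomial is polys[0] and its derivative is derivs[0], so no numbered variable names are needed.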