I'm taking my first course in programming; the course is meant for physics applications and the like. We have an exam coming up, and my professor published a practice exam with the following question.
The Maxwell distribution in speed v for an ideal gas consisting of particles of mass m at Kelvin temperature T is given by
f(v) = 4*pi * (m/(2*pi*k*T))^(3/2) * v^2 * exp(-m*v^2/(2*k*T)),
where k is Boltzmann's constant, k = 1.3806503 × 10^-23 J/K.
(Stack Overflow doesn't use MathJax for formulas, and I can't quite figure out how to write a formula on this site, so here is a link to WolframAlpha:)
Write a Python script called maxwell.py which prints v and f(v) to standard output in two-column format. For the particle mass, choose the mass of a proton, m = 1.67262158 × 10^-27 kg. For the gas temperature, choose the temperature at the surface of the sun, T = 5778 K.
Your output should consist of 300 data points, ranging from v = 100 m/s to v = 30,000 m/s in steps of size dv = 100 m/s.
So, here is my attempt at the code.
import math as m
import sys

def f(v):
    n = 1.67262158e-27  # kg
    k = 1.3806503e-23   # J/K
    T = 5778            # Kelvin
    return (4*m.pi)*((n/(2*m.pi*k*T))**(3/2))*(v**2)*m.exp((-n*v**2)/(2*k*T))

v = 100  # m/s
for i in range(300):
    a = float(f(v))
    print(v, a)
    v = v + 100
But my professor's solution is:
import numpy as np

def f(v):
    m = 1.67262158e-27  # kg
    T = 5778.           # K
    k = 1.3806503e-23   # J/K
    return 4.*np.pi * (m/(2.*np.pi*k*T))**1.5 * v**2 * np.exp(-m*v**2/(2.*k*T))

v = np.linspace(100., 30000., 300)
fv = f(v)
vfv = zip(v, fv)
for x in vfv:
    print "%5.0f %.3e" % x
# print np.sum(fv*100.)
So, these are two very different pieces of code. From what I can tell, they produce the same result. I guess my question is, simply: why is my code incorrect?
Thank you!
EDIT:
So, I asked my professor about it and this was his response.
I think your code is fine. It would run much faster using numpy, but the problem didn't specify that you needed numpy. I might have docked one point for not looping through a list of v's (your variable i doesn't do anything). Also, you should have used v += 100. At most, these two things together would have been one point out of 10.
1st: Is there any better syntax for doing the range in my code, since my variable i doesn't do anything?
2nd: What is the purpose of v += 100?
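For reference, here is one way (my own sketch, not from the practice exam or the professor) to address both points, looping directly over the speeds instead of over an unused index i; v += 100 is simply the augmented-assignment shorthand for v = v + 100:

for v in range(100, 30001, 100):   # 300 speeds: 100, 200, ..., 30000 m/s
    print(v, f(v))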
One thing to be careful about when dealing with numbers is implicit integer division (getting an int result where you expected a float).
One instance of this in your code is that you use (3/2), which evaluates to 1 under Python 2's integer division, while the professor's code uses 1.5 directly.
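A quick illustration of the difference (a hypothetical snippet, assuming a Python 2 interpreter, which is what the professor's print syntax suggests):

print 3/2      # 1   -> integer division, so **(3/2) becomes **1
print 3.0/2    # 1.5 -> float division
print 1.5      # what the professor's code uses directly
# Under Python 3, 3/2 already evaluates to 1.5 and the two exponents agree.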
I am working on a code to solve for the optimum combination of diameter sizes for a number of pipelines. The objective function is to find the least sum of pressure drops in six pipelines.
Since I have 15 choices of discrete diameter sizes, [2,4,6,8,12,16,20,24,30,36,40,42,50,60,80], any of which can be used for any of the six pipelines in the system, the number of possible solutions becomes 15^6, which is equal to 11,390,625.
To solve the problem, I am using Mixed-Integer Linear Programming with the PuLP package. I am able to find the solution for a combination of identical diameters (e.g. [2,2,2,2,2,2] or [4,4,4,4,4,4]), but what I need is to go through all combinations (e.g. [2,4,2,2,4,2] or [4,2,4,2,4,2]) to find the minimum. I attempted to do this, but the process takes a very long time to go through all combinations. Is there a faster way to do this?
Note that I cannot calculate the pressure drop for each pipeline independently, as the choice of diameter affects the total pressure drop in the system. Therefore, at any time, I need to calculate the pressure drop of each combination in the system.
I also need to constrain the problem such that rate / pipeline cross-sectional area > 2.
Your help is much appreciated.
My first attempt at the code is the following:
from pulp import *
import random
import itertools
import numpy

rate = 5000
numberOfPipelines = 15

def pressure(diameter):
    diameterList = numpy.tile(diameter, numberOfPipelines)
    pressure = 0.0
    for pipeline in range(numberOfPipelines):
        pressure += rate/diameterList[pipeline]
    return pressure

diameterList = [2,4,6,8,12,16,20,24,30,36,40,42,50,60,80]
pipelineIds = range(0, numberOfPipelines)
pipelinePressures = {}

for diameter in diameterList:
    pressures = []
    for pipeline in range(numberOfPipelines):
        pressures.append(pressure(diameter))
    pressureList = dict(zip(pipelineIds, pressures))
    pipelinePressures[diameter] = pressureList
print 'pipepressure', pipelinePressures

prob = LpProblem("Warehouse Allocation", LpMinimize)

use_diameter = LpVariable.dicts("UseDiameter", diameterList, cat=LpBinary)
use_pipeline = LpVariable.dicts("UsePipeline", [(i,j) for i in pipelineIds for j in diameterList], cat=LpBinary)

## Objective function:
prob += lpSum(pipelinePressures[j][i] * use_pipeline[(i,j)] for i in pipelineIds for j in diameterList)

## Each pipeline must be connected to exactly one diameter:
for i in pipelineIds:
    prob += lpSum(use_pipeline[(i,j)] for j in diameterList) == 1

## The diameter is activated if at least one pipeline is assigned to it:
for j in diameterList:
    for i in pipelineIds:
        prob += use_diameter[j] >= lpSum(use_pipeline[(i,j)])

## Run the solution
prob.solve()
print("Status:", LpStatus[prob.status])

for i in diameterList:
    if use_diameter[i].varValue > pressureTest:
        print("Diameter Size", i)
for v in prob.variables():
    print(v.name, "=", v.varValue)
This is what I did for the combination part, which took a really long time.

xList = np.array(list(itertools.product(diameterList, repeat=numberOfPipelines)))
print len(xList)

for combination in xList:
    pressures = []
    for pipeline in range(numberOfPipelines):
        pressures.append(pressure(combination))
    pressureList = dict(zip(pipelineIds, pressures))
    pipelinePressures[combination] = pressureList
print 'pipelinePressures', pipelinePressures
I would iterate through all combinations; I think you would otherwise run into memory problems trying to model ALL combinations in a MIP.
If you iterate through the problems, perhaps using the multiprocessing library to use all cores, it shouldn't take long. Just remember to hold only the information on the best combination so far, and not to try to generate all combinations at once and then evaluate them.
If the problem gets bigger, you should consider dynamic programming algorithms or use PuLP with column generation.
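A minimal sketch of the iterate-and-keep-the-best approach (my own illustration, not the poster's code; it assumes a pressure_drop(combination) function that returns the total system pressure drop for one tuple of six diameters):

import itertools
from multiprocessing import Pool

diameterList = [2, 4, 6, 8, 12, 16, 20, 24, 30, 36, 40, 42, 50, 60, 80]

def pressure_drop(combination):
    # Placeholder objective - replace with the real system pressure model.
    rate = 5000
    return sum(rate / d for d in combination)

def evaluate(combination):
    # Return a (pressure, combination) pair so min() keeps the best tuple.
    return pressure_drop(combination), combination

if __name__ == '__main__':
    combos = itertools.product(diameterList, repeat=6)   # lazy generator, never a full list
    with Pool() as pool:
        best_pressure, best_combo = min(pool.imap_unordered(evaluate, combos, chunksize=10000))
    print('best combination:', best_combo, 'total pressure drop:', best_pressure)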
As I'm really struggling to get from R code to Python code, I would like to ask for some help. The code I want to use was provided to me on the Mathematics Stack Exchange forum.
https://math.stackexchange.com/questions/2205573/curve-fitting-on-dataset
I do understand what is going on, but I'm really having a hard time translating the R code, as I have never seen any of it before. I have written the function to return the sum of squares, but I'm stuck on how I could use a function similar to the optim function. Also, I don't really like the guesswork for the initial values. I would rather run and re-run a type of optim function until I get the wanted result, because my needs for a nearly perfect curve fit are really high.
def model(par, x):
    n = len(x)
    res = []
    for i in range(1, n):
        A0 = par[3] + (par[4]-par[1])*par[6] + (par[5]-par[2])*par[6]**2
        if(x[i] == par[6]):
            res[i] = A0 + par[1]*x[i] + par[2]*x[i]**2
        else:
            res[i] = par[3] + par[4]*x[i] + par[5]*x[i]**2
    return res
This is my model function...
def sum_squares(par, x, y):
    ss = sum((y-model(par,x))^2)
    return ss
And this is the sum of squares
But I have no idea on how to convert this:
#I found these initial values with a few minutes of guess and check.
par0 <- c(7,-1,-395,70,-2.3,10)
sol <- optim(par= par0, fn=sqerror, x=x, y=y)$par
To Python code...
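For what it's worth, a rough scipy counterpart of that optim() call could look like the sketch below (my own assumption, not from the linked answer; it presumes x, y and a scalar-returning sum_squares(par, x, y) are already defined, and uses Nelder-Mead, which is also optim's default method):

import numpy as np
from scipy.optimize import minimize

par0 = np.array([7, -1, -395, 70, -2.3, 10])
sol = minimize(sum_squares, par0, args=(x, y), method='Nelder-Mead')
fitted_par = sol.x   # the equivalent of optim(...)$par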
I wrote an open source Python package (BSD license) that has a genetic algorithm (Differential Evolution) front end to the scipy Levenberg-Marquardt solver; it functions similarly to what you describe in your question. The GitHub URL is:
https://github.com/zunzun/pyeq3
It comes with a "user-defined function" example that's fairly easy to use:
https://github.com/zunzun/pyeq3/blob/master/Examples/Simple/FitUserDefinedFunction_2D.py
along with command-line, GUI, cluster, parallel, and web-based examples. You can install the package with "pip3 install pyeq3" to see if it might suit your needs.
Seems like I have been able to fix the problem.
def model(par, x):
    n = len(x)
    res = np.array([])
    for i in range(0, n):
        A0 = par[2] + (par[3]-par[0])*par[5] + (par[4]-par[1])*par[5]**2
        if(x[i] <= par[5]):
            res = np.append(res, A0 + par[0]*x[i] + par[1]*x[i]**2)
        else:
            res = np.append(res, par[2] + par[3]*x[i] + par[4]*x[i]**2)
    return res

def sum_squares(par, x, y):
    ss = sum((y-model(par,x))**2)
    print('Sum of squares = {0}'.format(ss))
    return ss
And then I used the functions as follows:
parameter = sy.array([0.0,-8.0,0.0018,0.0018,0,200])
res = least_squares(sum_squares, parameter, bounds=(-360,360), args=(x1,y1),verbose = 1)
The only problem is that it doesn't produce the result I'm looking for, and that is mainly because my x values are in [0, 360] while the y values only vary by about 0.2, so it's a hard nut to crack for this function. It produces this (poor) result:
Result
I think that the range of x values [0, 360] and of y values (which you say vary by only ~0.2) is probably not the problem. Getting good initial values for the parameters is probably much more important.
In Python with numpy / scipy, you would definitely want to not loop over values of x but do something more like
def model(par, x):
    res = par[2] + par[3]*x + par[4]*x**2
    A0 = par[2] + (par[3]-par[0])*par[5] + (par[4]-par[1])*par[5]**2
    idx = np.where(x <= par[5])
    res[idx] = A0 + par[0]*x[idx] + par[1]*x[idx]**2
    return res
It's not clear to me that that form is really what you want: why should A0 (a value independent of x added to a portion of the model) be so complicated and interdependent on the other parameters?
More importantly, your sum_of_squares() function is actually not what least_squares() wants: you should return the residual array; you should not do the sum of squares yourself. So, that should be
def sum_of_squares(par, x, y):
    return (y - model(par, x))
But most importantly, there is a conceptual problem that is probably going to plague this model: your par[5] is meant to represent a breakpoint where the model changes form. This is going to be very hard for these optimization routines to find. These routines generally make a very small change to each parameter value to estimate the derivative of the residual array with respect to that variable, in order to figure out how to change that variable. With a parameter that is essentially used as a discrete cutoff, the small change in the initial value will have no effect at all, and the algorithm will not be able to determine the value for this parameter. With some of the scipy.optimize algorithms (notably, leastsq) you can specify a scale for the relative change to make; with leastsq that option is called epsfcn. You may need to set this as high as 0.3 or 1.0 for fitting the breakpoint to work. Unfortunately, this cannot be set per variable, only per fit. You might need to experiment with this and other options to least_squares or leastsq.
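A sketch of what that could look like with leastsq (hypothetical; it assumes the vectorized model() above and the x1, y1 data from the question):

from scipy.optimize import leastsq

def residual(par, x, y):
    return y - model(par, x)

par0 = [0.0, -8.0, 0.0018, 0.0018, 0.0, 200.0]
# epsfcn sets the relative step used for the numerical derivatives; a large
# value gives the breakpoint parameter par[5] a chance to move.
best_par, ier = leastsq(residual, par0, args=(x1, y1), epsfcn=0.3)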
"""Some simulations to predict the future portfolio value based on past distribution. x is
a numpy array that contains past returns.The interpolated_returns are the returns
generated from the cdf of the past returns to simulate future returns. The portfolio
starts with a value of 100. portfolio_value is filled up progressively as
the program goes through every loop. The value is multiplied by the returns in that
period and a dollar is removed."""
portfolio_final = []
for i in range(10000):
    portfolio_value = [100]
    rand_values = np.random.rand(600)
    interpolated_returns = np.interp(rand_values, cdf_values, x)
    interpolated_returns = np.add(interpolated_returns, 1)
    for j in range(1, len(interpolated_returns)+1):
        portfolio_value.append(interpolated_returns[j-1]*portfolio_value[j-1])
        portfolio_value[j] = portfolio_value[j]-1
    portfolio_final.append(portfolio_value[-1])
print(np.mean(portfolio_final))
I couldn't find a way to write this code using numpy. I was having a look at iterations using nditer but I was unable to move ahead with that.
I guess the easiest way to figure out how you can vectorize your stuff would be to look at the equations that govern your evolution and see how your portfolio actually iterates, finding patterns that can be vectorized instead of trying to vectorize the code you already have. You will then notice that a cumulative product (cumprod) appears quite naturally in the iteration.
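To spell that out (my own unrolling, not part of the original answer): writing r_k for the gross return in step k and R_n = r_1*r_2*...*r_n for its cumulative product, the recursion

pv_n = pv_{n-1} * r_n - 1

unrolls to

pv_n = R_n*pv_0 - (R_n/R_1 + R_n/R_2 + ... + R_n/R_n)
     = R_n * (pv_0 - sum_{k=1..n} 1/R_k),

which is exactly what the cumprod/cumsum line in the vectorized version below computes.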
Nevertheless, you can find the semi-vectorized code below. I included your code as well so that you can compare the results. I also included a simple loop version of your code, which is much easier to read and to translate into mathematical equations, so if you share this code with somebody else I would definitely use the simple loop option. If you want some fancy-pants vectorizing, you can use the vector version. In case you need to keep track of your single steps, you can also add an array to the simple loop option and append the pv at every step.
Hope that helps.
Edit: I have not tested anything for speed. That's something you can easily do yourself with timeit.
import numpy as np
from scipy.special import erf

# Prepare simple return model - normally distributed with mu & sigma = 0.01
x = np.linspace(-10, 10, 100)
cdf_values = 0.5*(1+erf((x-0.01)/(0.01*np.sqrt(2))))

# Prepare setup such that every code snippet uses the same number of steps
# and the same random numbers
nSteps = 600
nIterations = 1
rnd = np.random.rand(nSteps)

# Your code - Gives the (supposedly) correct results
portfolio_final = []
for i in range(nIterations):
    portfolio_value = [100]
    rand_values = rnd
    interpolated_returns = np.interp(rand_values, cdf_values, x)
    interpolated_returns = np.add(interpolated_returns, 1)
    for j in range(1, len(interpolated_returns)+1):
        portfolio_value.append(interpolated_returns[j-1]*portfolio_value[j-1])
        portfolio_value[j] = portfolio_value[j]-1
    portfolio_final.append(portfolio_value[-1])
print(np.mean(portfolio_final))

# Using vectors
portfolio_final = []
for i in range(nIterations):
    portfolio_values = np.ones(nSteps)*100.0
    rcp = np.cumprod(np.interp(rnd, cdf_values, x) + 1)
    portfolio_values = rcp * (portfolio_values - np.cumsum(1.0/rcp))
    portfolio_final.append(portfolio_values[-1])
print(np.mean(portfolio_final))

# Simple loop
portfolio_final = []
for i in range(nIterations):
    pv = 100
    rets = np.interp(rnd, cdf_values, x) + 1
    for i in range(nSteps):
        pv = pv * rets[i] - 1
    portfolio_final.append(pv)
print(np.mean(portfolio_final))
Forget about np.nditer. It does not improve the speed of iterations. Only use it if you intend to go on and use the C version (via Cython).
I'm puzzled about that inner loop. What is it supposed to be doing that's special? Why the loop?
In tests with simulated values these 2 blocks of code produce the same thing:
interpolated_returns = np.add(interpolated_returns, 1)
for j in range(1, len(interpolated_returns)+1):
    portfolio_value.append(interpolated_returns[j-1]*portfolio[j-1])
    portfolio_value[j] = portfolio_value[j]-1

interpolated_returns = (interpolated_returns+1)*portfolio - 1
portfolio_value = portfolio_value + interpolated_returns.tolist()
I am assuming that interpolated_returns and portfolio are 1d arrays of the same length.
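A quick way to sanity-check that equivalence (my own test with made-up arrays, not from the answer):

import numpy as np

interpolated_returns = np.random.rand(5)
portfolio = np.random.rand(5)

# loop version, written against a fresh list for the comparison
pv_loop = []
ir = np.add(interpolated_returns, 1)
for j in range(1, len(ir) + 1):
    pv_loop.append(ir[j-1] * portfolio[j-1])
    pv_loop[j-1] = pv_loop[j-1] - 1

# vectorized version
pv_vec = ((interpolated_returns + 1) * portfolio - 1).tolist()

print(np.allclose(pv_loop, pv_vec))   # True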
With the comments from the answer, I rewrote the code below (math.log1p(x) -> math.log(x)), which should now work and give a good approximation of the volatility.
I am trying to create a short code to calculate the implied volatility of a European Call option. I wrote the code below:
from scipy.stats import norm
import math

norm.cdf(1.96)

#c_p - Call(+1) or Put(-1) option
#P - Price of option
#S - Stock (spot) price
#E - Exercise (strike) price
#T - Time to expiration
#r - Risk-free rate

#C = S*N(d_1) - E*e^{-rT}*N(d_2)

def implied_volatility(Price, Stock, Exercise, Time, Rf):
    P = float(Price)
    S = float(Stock)
    E = float(Exercise)
    T = float(Time)
    r = float(Rf)
    sigma = 0.01
    print (P, S, E, T, r)
    while sigma < 1:
        d_1 = float(float((math.log(S/E)+(r+(sigma**2)/2)*T))/float((sigma*(math.sqrt(T)))))
        d_2 = float(float((math.log(S/E)+(r-(sigma**2)/2)*T))/float((sigma*(math.sqrt(T)))))
        P_implied = float(S*norm.cdf(d_1) - E*math.exp(-r*T)*norm.cdf(d_2))
        if P-(P_implied) < 0.001:
            return sigma
        sigma += 0.001
    return "could not find the right volatility"

print implied_volatility(15, 100, 100, 1, 0.05)
This yields a volatility of 0.595, which should be somewhere around 0.3203. That is a huge difference...
I know this is not a fast method by any means; I just want to demonstrate how the principle works, but I am not able to calculate a good approximation.
For some reason, when I call the function it gives me a really bad approximation of the actual implied volatility, which I calculated using a Matlab program and the following webpage: Implied Volatility. Could anyone please help me figure out where I made the mistake?
There are two problems I see, neither of which is directly Python related:
You are using log1p(x), which is the natural logarithm of 1+x, while you actually want log(x), which is the natural logarithm of x (cf. Wikipedia).
An option price of 100 is way too high considering the other parameters. Try to calculate the implied volatility for a price of 10 - which should be about 0.18 both by your program and the calculator you linked.
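As a quick check (a sketch, assuming the corrected implied_volatility function from the question is defined):

print(implied_volatility(10, 100, 100, 1, 0.05))   # roughly 0.18 with these parameters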
In Python 2, the result of 5 / 2 is 2; it uses floor division. To fix that, make every number a float. In your implied_volatility function, change P = Price to P = float(Price), S = Stock to S = float(Stock), and so on.
I have successfully generated .wav files in Python of sine waves at different frequencies.
If I wish to generate harmonies, for example a C major triad, am I supposed to add the sine waves of the individual notes together?
When adding 2 notes together, say C and G, the program creates the correct harmony. When I attempt to add a third note, though, there is an overflow error. How might this be successfully accomplished?
The Code:
I am placing the data for the sine waves into an array of signed short integers.
wave = array.array('h')
And then adding multiple waves together to generate the harmonies.

for i in range(len(data)):
    wave1[i] += wave2[i]
This works!
But when I add a third array (wave3), it overflows.
This is because the signed short integer has reached its maximum. I am working with 16-bit samples. Is the problem simply that the bit depth is too low? When creating complex audio with lots of harmonies, does the bit depth simply need to be much higher? Have I approached the problem in the completely wrong direction?
Full Source
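For illustration, a minimal example (my own, not from the linked source) of why the third note overflows a signed 16-bit array element:

import array

a = array.array('h', [30000])
try:
    a[0] += 30000             # 60000 exceeds the int16 maximum of 32767
except OverflowError as err:
    print('overflow:', err)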
I don't think it's the bit depth. You just have to normalize the values so that they fit properly. I've rewritten your code a bit so it uses lists at first, then rescales all the values into the range 0 to 32767 (taking the volume into account), and puts them into the array.
def normalize(nmin, nmax, nums):  # this could probably be done a bit shorter
    orange = max(nums)-min(nums)
    nrange = nmax-nmin
    nums = [float(num)/orange*nrange for num in nums]
    omin = min(nums)
    nums = [num-omin+nmin for num in nums]
    return nums

if __name__ == '__main__':
    data, data2, data3 = [], [], []
    data.extend(create_data(getTime(1), getFreq('C', 4)))
    data2.extend(create_data(getTime(1), getFreq('Bb', 4)))
    data3.extend(create_data(getTime(1), getFreq('G', 4)))
    for i in range(len(data)):
        data[i] += data2[i]
        data[i] += data3[i]
    data = array.array('h', [int(val) for val in normalize(0, 32767*VOLUME/100, data)])
    write_wave(data, int(len(data)/SAMPLE_RATE))
    winsound.PlaySound('output.wav', winsound.SND_FILENAME)