I'm gradually moving from Matlab to Python and would like to get some advice on optimising an iterative loop.
This is how I am currently running the loop, and for info I've included the code that defines the variables.
nh = 2000
h = np.array(range(nh))
nt = 10000
wmin = 1
wmax = 10
hw = np.array(wmin + (wmax-wmin)*invlogit(randn(1,nh)));
sl = np.array(zeros((nh,1))+radians(40))
fa = np.array(zeros((nh,1))+radians(35))
c = np.array(zeros((nh,1))+4.4)
y = np.array(zeros((nh,1))+17.6)
yw = np.array(zeros((nh,1))+9.81)
ir = 0.028
m = np.array(zeros((nh,nt)));
m[:,49] = 0.1
z = np.array(zeros((nh,nt)))
z[:,0] = 0+(3.0773-0)*rand(nh,1).T
reset = np.array(zeros((nh,nt)))
fs = np.array(zeros((nh,nt)))
for t in xrange(0, nt-1):
    fs[:,t] = (c.T+(y.T-m[:,t]*yw.T)*z[:,t]*(np.cos(sl.T)**2)*np.tan(fa.T))/(y.T*z[:,t]*np.sin(sl.T)*np.cos(sl.T))
    reset[fs[:,t]<=1,t+1] = 1
    z[fs[:,t]<=1,t+1] = 0
    z[fs[:,t]>1,t+1] = z[fs[:,t]>1,t]+(ir/hw[0,fs[:,t]>1]).T
This is how I would optimise the code in Matlab, but it runs fairly slowly in Python. I suspect there is a more pythonic way of doing this and would really appreciate a nudge in the right direction.
Many thanks!
This is not specifically about the loop, but you're doing a lot of extra work in calls that look like:
np.array(zeros((nh,nt)))
Just use:
np.zeros((nh,nt))
in its place. Additionally, you could replace:
h = np.array(range(nh))
with:
h = np.arange(nh)
Other comments:
You're calling np.sin(sl.T)*np.cos(sl.T) in every iteration of the loop, although sl does not appear to change at all. Just calculate it once and assign it to a variable that you use in the loop. The same applies to several of your other trig calls.
The expression
(c.T+(y.T-m[:,t]*yw.T)*z[:,t]*(np.cos(sl.T)**2)*np.tan(fa.T))/(y.T*z[:,t]*np.sin(sl.T)*np.cos(sl.T))
uses c, y, m, yw, sl, fa that do not change inside the loop. You could compute several subexpressions before the loop.
Also, most of those arrays contain one repeated value. You could compute with scalars instead:
sl = radians(40)
fa = radians(35)
c = 4.4
y = 17.6
yw = 9.81
Then, with precomputed subexpressions:
A = cos(sl)**2 * tan(fa) * (y - m*yw)
B = y*sin(sl)*cos(sl)
for t in xrange(0, nt-1):
    fs[:,t] = (c + A[:,t]*z[:,t]) / (B*z[:,t])
    less = fs[:,t]<=1
    more = np.logical_not(less)
    reset[less,t+1] = 1
    z[less,t+1] = 0
    z[more,t+1] = z[more,t]+(ir/hw[0,more]).T
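For reference, the precomputation idea above could be put together as a self-contained script along these lines. This is only a sketch under assumptions: it is written for Python 3, it uses smaller nh and nt than the question so it finishes quickly, it replaces invlogit with the logistic function 1/(1+exp(-x)), and the names rng, step, zt and failed are introduced here for illustration. Division by zero after a reset is silenced, since fs then evaluates to inf, just as it would in the original loop.

```python
import numpy as np

rng = np.random.default_rng(0)

nh, nt = 200, 1000                 # smaller than the question's 2000 x 10000
wmin, wmax = 1.0, 10.0
# invlogit(randn(nh)), assuming invlogit is the logistic function
hw = wmin + (wmax - wmin) / (1.0 + np.exp(-rng.standard_normal(nh)))

# Scalars instead of arrays filled with one repeated value
sl, fa = np.radians(40), np.radians(35)
c, y, yw, ir = 4.4, 17.6, 9.81, 0.028

m = np.zeros((nh, nt))
m[:, 49] = 0.1
z = np.zeros((nh, nt))
z[:, 0] = 3.0773 * rng.random(nh)
reset = np.zeros((nh, nt))
fs = np.zeros((nh, nt))

# Everything that does not depend on z is computed once, outside the loop
A = np.cos(sl) ** 2 * np.tan(fa) * (y - m * yw)
B = y * np.sin(sl) * np.cos(sl)
step = ir / hw

with np.errstate(divide="ignore"):  # z == 0 right after a reset gives fs == inf
    for t in range(nt - 1):
        zt = z[:, t]
        fs[:, t] = (c + A[:, t] * zt) / (B * zt)
        failed = fs[:, t] <= 1
        reset[failed, t + 1] = 1
        # One assignment covers both branches: reset to 0 or keep growing
        z[:, t + 1] = np.where(failed, 0.0, zt + step)
```

The time loop itself remains, because each column of z depends on the previous one; the saving comes from doing less work per iteration.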
Related
I cannot seem to get an output when I pass numbers to the function. I need to get the computed value and subtract it from the exact. Is there something I am not getting right?
def f1(x):
f1 = np.exp(x)
return f1;
def trapezoid(f,a,b,n):
'''Computes the integral of functions using the trapezoid rule
f = function of x
a = upper limit of the function
b = lower limit of the function
N = number of divisions'''
h = (b-a)/N
xi = np.linspace(a,b,N+1)
fi = f(xi)
s = 0.0
for i in range(1,N):
s = s + fi[i]
s = np.array((h/2)*(fi[0] + fi[N]) + h*s)
print(s)
return s
exactValue = np.full((20),math.exp(1)-1)
a = 0.0;b = 1.0 # integration interval [a,b]
computed = np.empty(20)
E=np.zeros(20)
exact=np.zeros(20)
N=20
def convergence_tests(f, a, b, N):
n = np.zeros(N, 1);
E = np.zeros(N, 1);
Exact = math.exp(1)-1
for i in range(N):
n[i] = 2^i
computed[i] = trapezoid(f, a, b, n[i])
E = abs(Exact - computed)
print(E, computed)
return E, computed
You have defined several functions, but your main program never calls any of them. In fact, your "parent" function convergence_tests cannot be called from the top-level code above it, because it's defined at the bottom of the program.
I suggest that you use incremental programming: write a few lines; test those before you proceed to the next mini-task in your code. In the posting, you've written about 30 lines of active code, without realizing that virtually none of it actually executes. There may well be several other errors in this; you'll likely have a difficult time fixing all of them to get the expected output.
Start small and grow incrementally.
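As an illustration of that advice, here is one way the program could look once each piece has been written and tested in turn. It is a sketch for Python 3, not the asker's intended design: the argument names exact and levels are assumptions, 2 ** i is used for the power of two (2^i is bitwise XOR in Python), and the function is actually called at the end.

```python
import math
import numpy as np

def trapezoid(f, a, b, n):
    """Composite trapezoid rule for f on [a, b] with n subintervals."""
    h = (b - a) / n
    x = np.linspace(a, b, n + 1)
    fx = f(x)
    # Endpoints get weight h/2, interior points get weight h
    return h * (0.5 * (fx[0] + fx[-1]) + fx[1:-1].sum())

def convergence_test(f, a, b, exact, levels):
    """Return n, the computed integrals and the absolute errors for n = 1, 2, 4, ..."""
    n_values = [2 ** i for i in range(levels)]
    computed = np.array([trapezoid(f, a, b, n) for n in n_values])
    errors = np.abs(exact - computed)
    return n_values, computed, errors

# The call that was missing from the original program
n_values, computed, errors = convergence_test(np.exp, 0.0, 1.0, math.e - 1.0, 10)
for n, err in zip(n_values, errors):
    print(n, err)
```

Each time n doubles the error should shrink by roughly a factor of four, which is a quick check that the rule is implemented correctly.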
I'm using python and apparently the slowest part of my program is doing simple additions on float variables.
It takes about 35 seconds to do around 400,000,000 additions/multiplications.
I'm trying to figure out the fastest way I can do this math.
This is what the structure of my code looks like.
Example (dummy) code:
def func(x, y, z):
    loop_count = 30
    a = [0,1,2,3,4,5,6,7,8,9,10,11,12,...35 elements]
    b = [0,11,22,33,44,55,66,77,88,99,1010,1111,1212,...35 elements]
    p = [0,0,0,0,0,0,0,0,0,0,0,0,0,...35 elements]
    for i in range(loop_count - 1):
        c = p[i-1]
        d = a[i] + c * a[i+1]
        e = min(2, a[i]) + c * b[i]
        f = e * x
        g = y + d * c
        .... and so on
        p[i] = d + e + f + s + g5 + f4 + h7 * t5 + y8
    return sum(p)
func() is called about 200k times. The loop_count is about 30. And I have ~20 multiplications and ~45 additions and ~10 uses of min/max
I was wondering if there is a method for me to declare all these as ctypes.c_float and do addition in C using stdlib or something similar ?
Note that the p[i] calculated at the end of the loop is used as c in the next loop iteration. For iteration 0, it just uses p[-1] which is 0 in this case.
My constraints:
I need to use python. While I understand plain math would be faster in C/Java/etc. I cannot use it due to a bunch of other things I do in python which cannot be done in C in this same program.
I tried writing this with cython, but it caused a bunch of issues with the environment I need to run this in. So, again - not an option.
I think you should consider using numpy. You did not mention any constraint against it.
Example case of a simple dot operation (x.y)
import datetime
import numpy as np
x = range(0,10000000,1)
y = range(0,20000000,2)
for i in range(0, len(x)):
    x[i] = x[i] * 0.00001
    y[i] = y[i] * 0.00001
now = datetime.datetime.now()
z = 0
for i in range(0, len(x)):
    z = z+x[i]*y[i]
print "handmade dot=", datetime.datetime.now()-now
print z
x = np.arange(0.0, 10000000.0*0.00001, 0.00001)
y = np.arange(0.0, 10000000.0*0.00002, 0.00002)
now = datetime.datetime.now()
z = np.dot(x,y)
print 'numpy dot =',datetime.datetime.now()-now
print z
outputs
handmade dot= 0:00:02.559000
66666656666.7
numpy dot = 0:00:00.019000
66666656666.7
In this run numpy is more than 100 times faster (0.019 s against 2.559 s).
The reason is that numpy wraps a C library that performs the dot operation in compiled code. In pure Python you have a list of potentially generic objects, so every element access involves type checks and conversions.
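The loop in the question carries p[i-1] into the next iteration, so it cannot be collapsed into a single dot product. One option, sketched below under assumptions, is to keep the short loop over i and let numpy work across the roughly 200k calls instead, passing every call's x and y as arrays. The names func_scalar and func_batch, the toy input values and the shortened formula are all hypothetical; the terms elided in the question are not reproduced, and the code is written for Python 3.

```python
import numpy as np

def func_scalar(x, y, a, b, loop_count):
    """Plain-Python reference: one (x, y) pair per call."""
    p = [0.0] * len(a)
    for i in range(loop_count - 1):
        c = p[i - 1]                      # p[-1] is 0.0 on the first pass
        d = a[i] + c * a[i + 1]
        e = min(2, a[i]) + c * b[i]
        p[i] = d + e + e * x + (y + d * c)
    return sum(p)

def func_batch(x, y, a, b, loop_count):
    """Same recurrence, but x and y are 1-D arrays holding every call's inputs."""
    c = np.zeros_like(x, dtype=float)     # plays the role of p[i-1]
    total = np.zeros_like(x, dtype=float)
    for i in range(loop_count - 1):
        d = a[i] + c * a[i + 1]
        e = min(2, a[i]) + c * b[i]
        c = d + e + e * x + (y + d * c)   # this is p[i], reused as c next time
        total += c
    return total

# Toy inputs, kept small so the recurrence stays well within float range
rng = np.random.default_rng(0)
x = rng.random(1000) * 0.1
y = rng.random(1000) * 0.1
a = np.linspace(0.0, 0.5, 6)
b = np.linspace(0.0, 0.25, 6)
batch = func_batch(x, y, a, b, loop_count=5)
```

Whether this helps depends on the inputs of all calls being available up front; if each call depends on the result of the previous one, this rearrangement does not apply.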
I have a pandas dataframe with 6 million rows. The columns are:
['x', 'y']
I need to apply a simple calculation between x and y, and append it to the dataframe.
This is what I've tried:
'''
Calculates the height of a pressure level in feet
'''
def pressure_to_elevation(P, T = None):
    sea_level_pressure = 1013.25
    if T is not None:
        # https://www.omnicalculator.com/physics/air-pressure-at-altitude
        P0 = sea_level_pressure
        g = 9.80665
        M = 0.0289644
        R0 = 8.31447
        m = (np.log(P/P0)*T) / -(g*M/R0)
        f = 3.28084 * m
        return f
    b = 0.190284
    c = 145366.45
    return (1-math.pow((P/sea_level_pressure), b)) * c
test_df['result'] = test_df.apply(lambda row: pressure_to_elevation(row['x'], row['y']),axis=1)
Unfortunately, this takes a ridiculous amount of time... in fact, I've yet to see it complete.
Is there a faster way to do this?
Try this:
def pressure_to_elevation(P, T):
    sea_level_pressure = 1013.25
    P0 = sea_level_pressure
    g = 9.80665
    M = 0.0289644
    R0 = 8.31447
    b = 0.190284
    c = 145366.45
    return np.where(T.notnull(),
                    3.28084 * ((np.log(P/P0)*T) / -(g*M/R0)),
                    (1-np.power((P/sea_level_pressure), b)) * c)
Usage:
test_df['result'] = pressure_to_elevation(test_df['x'], test_df['y'])
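To see the element-wise selection at work without a 6-million-row dataframe, the same idea can be tried on a few values. This is a sketch with assumptions: it uses plain numpy arrays rather than pandas columns (a column could be converted with .to_numpy()), missing temperatures are assumed to be stored as NaN, and the sample pressures and temperatures are made up.

```python
import numpy as np

def pressure_to_elevation(P, T):
    """Height of a pressure level in feet; uses T where present, else the fallback formula."""
    P0 = 1013.25                      # sea level pressure
    g, M, R0 = 9.80665, 0.0289644, 8.31447
    b, c = 0.190284, 145366.45
    with_temp = 3.28084 * ((np.log(P / P0) * T) / -(g * M / R0))
    without_temp = (1 - np.power(P / P0, b)) * c
    # Both branches are computed for every row; np.where then picks per element
    return np.where(np.isnan(T), without_temp, with_temp)

# Made-up sample data: the third row has no temperature
P = np.array([1013.25, 850.0, 500.0, 500.0])
T = np.array([288.15, 280.0, np.nan, 250.0])
result = pressure_to_elevation(P, T)
print(result)
```

The whole column is processed by a handful of array operations instead of one Python function call per row, which is where the speed-up over apply with axis=1 comes from.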
I believe if you break this out into separate steps and avoid iterating through the entire dataframe, the speed will increase dramatically. Give the following a shot.
sea_level_pressure = 1013.25
test_df['result_1'] = (test_df['x']/sea_level_pressure)
test_df['result_1'] = test_df['result_1']**0.190284
test_df['result_1'] = (1 - test_df['result_1'])*145366.45
test_df['result_2'] = 3.28084*((np.log(test_df['x']/sea_level_pressure)*test_df['y'])/(-1*(9.80665*0.0289644/8.31447)))
test_df['final_result'] = np.where(pd.isnull(test_df['y']), test_df['result_1'], test_df['result_2'])
I have following python code:
H1 = [[0.04,0.03,0.01,0.002],[0.02,0.04,0.001,0.5]]
H2 = [[0.06,0.02,0.02,0.004],[0.8,0.09,0.6,0.1]]
D1 = [0.01,0.02,0.1,0.01]
D2 = [0.1,0.3,0.01,0.4]
Tp = np.sum(D1)
Tn = np.sum(D2)
T = []
append2 = T.append
E = []
append3 = E.append
for h1,h2 in itertools.izip(H1,H2):
    Err = []
    append1 = Err.append
    for v in h1:
        L1 = [1 if i>=v else 0 for i in h1]
        L2 = [1 if i>=v else 0 for i in h2]
        Sp = np.dot(D1,L1)
        Sn = np.dot(D2,L2)
        err = min(Sp+Tn-Sn, Sn+Tp-Sp)
        append1(err)
    b = np.argmin(Err)
    append2(h1[b])
    append3(Err[b])
This is just example code. I need to run the inner for loop about 20,000 times (here it runs just twice). But the inner for loop takes so much time that it is impractical to use.
The line profiler shows that the lines Sp = np.dot(D1,L1), Sn = np.dot(D2,L2) and b = np.argmin(Err) are the most time consuming.
How can I reduce the time taken by the above code?
Any help will be much appreciated.
Thanks!
You can get a pretty big speed boost if you use numpy functions with numpy arrays instead of lists. Most numpy functions will convert lists to arrays internally and that adds a lot of overhead to the run time. Here is a simple example:
In [16]: a = range(10)
In [17]: b = range(10)
In [18]: aa = np.array(a)
In [19]: bb = np.array(b)
In [20]: %timeit np.dot(a, b)
10000 loops, best of 3: 54 us per loop
In [21]: %timeit np.dot(aa, bb)
100000 loops, best of 3: 3.4 us per loop
numpy.dot runs about 16 times faster when called with arrays in this case. Also, when you use numpy arrays you'll be able to simplify some of your code, which should also help it run faster. For example, if h1 is an array, L1 = [1 if i>=v else 0 for i in h1] can be written as h1 >= v, which returns a boolean array and should also run faster. Below I've gone ahead and replaced your lists with arrays so you can see what it would look like.
import numpy as np
H1 = np.array([[0.04,0.03,0.01,0.002],[0.02,0.04,0.001,0.5]])
H2 = np.array([[0.06,0.02,0.02,0.004],[0.8,0.09,0.6,0.1]])
D1 = np.array([0.01,0.02,0.1,0.01])
D2 = np.array([0.1,0.3,0.01,0.4])
Tp = np.sum(D1)
Tn = np.sum(D2)
T = np.zeros(H1.shape[0])
E = np.zeros(H1.shape[0])
for i in range(len(H1)):
    h1 = H1[i]
    h2 = H2[i]
    Err = np.zeros(len(h1))
    for j in range(len(h1)):
        v = h1[j]
        # >= matches the original "1 if i>=v else 0" comprehension
        L1 = h1 >= v
        L2 = h2 >= v
        Sp = np.dot(D1, L1)
        Sn = np.dot(D2, L2)
        err = min(Sp+Tn-Sn, Sn+Tp-Sp)
        Err[j] = err
    b = np.argmin(Err)
    T[i] = h1[b]
    E[i] = Err[b]
Once you're more comfortable with numpy arrays you might want to look into expressing at least your inner loop using broadcasting. For some applications, using broadcasting can be much more efficient than python loops. Good luck, hope that helps.
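As a rough idea of what a broadcast version could look like, the sketch below compares every element of each row against every threshold at once, which removes both Python loops. It is written for Python 3 and is only one possible formulation; the function name best_thresholds is made up for this example. Note that it builds boolean arrays of shape (rows, n, n), so memory grows with the square of the row length.

```python
import numpy as np

H1 = np.array([[0.04, 0.03, 0.01, 0.002], [0.02, 0.04, 0.001, 0.5]])
H2 = np.array([[0.06, 0.02, 0.02, 0.004], [0.8, 0.09, 0.6, 0.1]])
D1 = np.array([0.01, 0.02, 0.1, 0.01])
D2 = np.array([0.1, 0.3, 0.01, 0.4])

def best_thresholds(H1, H2, D1, D2):
    """For each row, return the threshold from H1 with the smallest error, and that error."""
    Tp = D1.sum()
    Tn = D2.sum()
    # thresholds[r, j, 0] is v = H1[r, j]; broadcasting compares it with every H[r, k]
    thresholds = H1[:, :, None]
    L1 = H1[:, None, :] >= thresholds      # shape (rows, n, n)
    L2 = H2[:, None, :] >= thresholds
    Sp = (L1 * D1).sum(axis=-1)            # same as np.dot(D1, L1) for each (r, j)
    Sn = (L2 * D2).sum(axis=-1)
    Err = np.minimum(Sp + Tn - Sn, Sn + Tp - Sp)
    b = Err.argmin(axis=1)
    rows = np.arange(H1.shape[0])
    return H1[rows, b], Err[rows, b]

T, E = best_thresholds(H1, H2, D1, D2)
print(T, E)
```

With 20,000 rows of four elements each the intermediate arrays stay small, so this should fit comfortably in memory.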
You need to keep the data in ndarray types. When you do a numpy operation on a list, it has to construct a new array each time. I modified your code to run a variable number of times and found it took ~1s for 10000 iterations. Changing the datatypes to ndarrays reduced that by about a factor of two, and I think there is still some improvement to make (the first version of this had a bug that made it execute too fast).
import itertools
import numpy as np
N = 10000
H1 = [np.array([0.04,0.03,0.01,0.002])] * N
H2 = [np.array([0.06,0.02,0.02,0.004])] * N
D1 = np.array([0.01,0.02,0.1,0.01] )
D2 = np.array([0.1,0.3,0.01,0.4] )
Tp = np.sum(D1)
Tn = np.sum(D2)
T = []
append2 = T.append
E = []
append3 = E.append
for h1,h2 in itertools.izip(H1,H2):
    Err = []
    append1 = Err.append
    for v in h1:
        #L1 = [1 if i>=v else 0 for i in h1]
        #L2 = [1 if i>=v else 0 for i in h2]
        L1 = h1 >= v
        L2 = h2 >= v
        Sp = np.dot(D1,L1)
        Sn = np.dot(D2,L2)
        err = min(Sp+Tn-Sn, Sn+Tp-Sp)
        append1(err)
    b = np.argmin(Err)
    append2(h1[b])
    append3(Err[b])
There's some low-hanging fruit in your list comprehensions:
L1 = [1 if i>=v else 0 for i in h1]
L2 = [1 if i>=v else 0 for i in h2]
The above could be written as:
L1 = [i>=v for i in h1]
L2 = [i>=v for i in h2]
Because Booleans are a subclass of integers, True and False are already 1 and 0, just wearing fancy clothes.
err = min(Sp+Tn-Sn, Sn+Tp-Sp)
append1(err)
You could combine the above two lines to avoid the variable assignment and access.
If you put the code in a function, all local variable usage will be slightly faster. Also, any global functions or methods you use (e.g. min, np.dot) can be converted to locals in the function signature using default arguments. np.dot is an especially slow call to make (outside of how long the operation itself takes) because it involves an attribute lookup. This would be similar to the optimization you already make with the list append methods.
Now I imagine none of this will really affect performance much, since your question really seems to be "how can I make NumPy faster?" (which others are on top of for you) but they might have some impact and be worth doing.
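A minimal sketch of those micro-optimizations, assuming the data is already held in numpy arrays: the inner loop is moved into a function, and np.dot and min are bound to local names through default arguments. The function name row_errors and the parameter names _dot and _min are invented for this example, and the code is written for Python 3.

```python
import numpy as np

def row_errors(h1, h2, D1, D2, Tp, Tn, _dot=np.dot, _min=min):
    """Error for every threshold v in h1, using only local name lookups in the loop."""
    errs = []
    append = errs.append                  # bound method looked up once
    for v in h1:
        Sp = _dot(D1, h1 >= v)
        Sn = _dot(D2, h2 >= v)
        append(_min(Sp + Tn - Sn, Sn + Tp - Sp))   # no intermediate variable
    return errs

h1 = np.array([0.04, 0.03, 0.01, 0.002])
h2 = np.array([0.06, 0.02, 0.02, 0.004])
D1 = np.array([0.01, 0.02, 0.1, 0.01])
D2 = np.array([0.1, 0.3, 0.01, 0.4])
errs = row_errors(h1, h2, D1, D2, D1.sum(), D2.sum())
print(errs)
```

As noted above, the gain from this is likely to be small next to the cost of the numpy calls themselves.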
If I have correctly understood what np.dot() does on two one-dimensional lists, it seems to me that the following code should do the same as yours.
Could you test its speed, please?
Its principle is to play on indices instead of the elements of lists, and to use the peculiarity of a list defined as a default value of a function
H1 = [[0.04,0.03,0.01,0.002],[0.02,0.04,0.001,0.5]]
H2 = [[0.06,0.02,0.02,0.004],[0.8,0.09,0.6,0.1]]
D1 = [0.01,0.02,0.1,0.01]
D2 = [0.1,0.3,0.01,0.4]
Tp = np.sum(D1)
Tn = np.sum(D2)
T,E = [],[]
append2 = T.append
append3 = E.append
ONE,TWO = [],[]
def zoui(v, ONE=ONE,TWO=TWO,
         D1=D1,D2=D2,Tp=Tp,Tn=Tn,tu0123 = (0,1,2,3)):
    diff = sum(D1[i] if ONE[i]>=v else 0 for i in tu0123)\
           -sum(D2[i] if TWO[i]>=v else 0 for i in tu0123)
    #or maybe
    #diff = sum(D1[i] * (ONE[i]>=v) for i in tu0123)\
    #       -sum(D2[i] * (TWO[i]>=v) for i in tu0123)
    return min(Tn+diff,Tp-diff)
for n in xrange(len(H1)):
    ONE[:] = H1[n]
    TWO[:] = H2[n]
    Err = map(zoui,ONE)
    b = np.argmin(Err)
    append2(ONE[b])
    append3(Err[b])
This is my first time making an algorithm for any of my projects.
I have an Xbox controller I will be using in a Python script using Pygame to read the output from the controller. Pygame outputs 0 when centered, -1 when full left, and 1 when full right.
For my application I need to translate this to values between 1000 and 2000 where 1000 is -1, 1500 is 0, and 2000 is 1.
Not asking necessarily for an answer, just some help with how to go about making an algorithm for myself.
If these are the only values possible, then you can create a dict to map Pygame outputs to your values.
positionMap = {-1:1000,0:1500,1:2000}
newVal = positionMap[oldVal]
But, if intermediate values are also possible then use this equation:
newVal = oldVal*500 + 1500
Your function can be of the form f(x) = ax^2 + bx + c. Put your transformations in a system and solve it:
a(-1)^2 + b(-1) + c = 1000
a*0^2 + b*0 + c = 1500
a*1^2 + b*1 + c = 2000
a - b + c = 1000
c = 1500
a + b + c = 2000
a - b = -500
a + b = 500
=> 2a = 0 => a = 0
=> b = 500
So you can use the function f(x) = 500x + 1500.
f(-1) = 1000
f(0) = 1500
f(1) = 2000
f(0.3) = 1650
There are many ways to do what you're asking. The simplest way is a linear mapping using the two point form of a line. You probably learned this in algebra but you might have forgotten it so here's a refresher: http://www.mathsisfun.com/algebra/line-equation-2points.html.
In your case, the x values are what you're given (-1 .. 1) and the y values are what you want (1000..2000).
Of course if you'd like to change the feel of the controller, you might choose not to use a linear function. You might want something that slows down as you approach the limits of the controller for example.
def mapInput(value):
    minInput = -1
    maxInput = 1
    minOutput = 1000
    maxOutput = 2000
    return (value - minInput)*(maxOutput-minOutput)/(maxInput-minInput) + minOutput
If those values are never going to change, then you could save a few processor cycles and just do:
def mapInput(value):
    return value * 500 + 1500
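If the receiving side expects whole numbers, and readings slightly outside -1..1 should not produce values outside 1000..2000, the linear mapping could be wrapped as follows. Both of those requirements are assumptions that go beyond the question, and the function name map_axis and its parameter names are made up for this sketch.

```python
def map_axis(value, in_min=-1.0, in_max=1.0, out_min=1000, out_max=2000):
    """Linearly map value from [in_min, in_max] to [out_min, out_max] as an int."""
    # Clamp first so an out-of-range reading cannot leave the output range
    value = max(in_min, min(in_max, value))
    scaled = (value - in_min) * (out_max - out_min) / (in_max - in_min) + out_min
    return int(round(scaled))

# Print the mapped value for a few sample axis readings
for reading in (-1, -0.5, 0, 0.3, 1):
    print(reading, map_axis(reading))
```

Keeping the ranges as parameters means the same function can be reused if another axis needs different limits.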
If you are new to Python too, then you might like to use an if/elif/else statement, as that might be more readable for you:
if in_value == -1:
    out_value = 1000
elif in_value == 0:
    out_value = 1500
else:
    out_value = 2000
The code could be wrapped in a function or used in-line.