Einsum in python for a complex loop

Einsum in python for a complex loop - python

I have complex loops in python that I'm trying to "vectorize" to improve computation time. I found the function np.einsum allowing it, I managed to use it, but I'm stuck with another loop.
In the following code, I put the loop I managed to "einsumize" (s1), and the other one where I didn't.
import numpy as np
Q = 6
P = 24
N = 40
bQ = np.arange(Q)
bP = np.arange(P)
uN = np.arange(N)
t1 = np.arange(P*Q*N).reshape([Q,P,N])
t2 = np.arange(Q*N*Q*N).reshape([Q,N,Q,N])
s1_ = np.einsum('p,q,n,qpn',bP, bQ, uN, t1)
s1 = 0
for p in range(P):
for q in range(Q):
for n in range(N):
s1 += bP[p] * bQ[q] * uN[n] * t1[q,p,n]
print(s1)
print(s1_)
print()
s2_ = np.einsum('p,q,n,m,pnqm', bQ, bQ, uN, uN, t2)
s2 = 0
for p in range(Q):
for q in range(Q):
for n in range(N):
for m in range(N):
s2 += bQ[q] * bQ[q] * uN[n] * uN[m] * t2[p,n,q,m]
print(s2)
print(s2_)
The result of the previous code is
13475451600
13475451600
6125547636000
5707354770000
The math formula to compute s1 is : s1 = \sum_p\sum_q\sum_n bP[p] * bQ[q] * uN[n] * t1[q,p,n]. And the one to compute s2 is s2 = \sum_q\sum_q'\sum_n\sum_n' bQ[q] bQ[q'] uN[n] uN[n'] t2[q,n,q',n'].
For the triple loop, if I well understood how einsum works,I tell the different indices of the tensors that will be multiplied, and telling no output indices tell that all will be summed. But it seems not to be working for the quadruple loop.
EDIT : I saw an answer (which seems to have been deleted), telling that it was just a mistake of index on the quadruple loop... I should have seen it :/

I see a typo in your code, you are not using the variable p outside of the order 4 tensor.
Try changing
s2 += bQ[q] * bQ[q] * uN[n] * uN[m] * t2[p,n,q,m]
for
s2 += bQ[p] * bQ[q] * uN[n] * uN[m] * t2[p,n,q,m]

Related

How to speed up this numpy.arange loop?

In a python program, the following function is called about 20,000 times from another function that is called about 1000 times from yet another function that executes 30 times. Thus the total number of times this particular function is called is about 600,000,000. In python it takes more than two hours (perhaps much longer; I aborted the program without waiting for it to finish), while essentially the same task coded in Java takes less than 5 minutes. If I change the 20,000 above to 400 (keeping everything else in the rest of the program untouched), the total time drops to about 4 minutes (this means this particular function is the culprit). What can I do to speed up the Python version, or is it just not possible? No lists are manipulated inside this function (there are lists elsewhere in the whole program, but in those places I tried to use numpy arrays as far as possible). I understand that replacing python lists with numpy arrays speeds things up, but there are cases in my program (not in this particular function) where I must build a list iteratively, using append; and those must-have lists are lists of objects (not floats or ints), so numpy would be of little help even if I converted those lists of objects to numpy arrays.
def compute_something(arr):
'''
arr is received as a numpy array of ints and floats (I think python upcasts them to all floats,
doesn’t it?).
Inside this function, elements of arr are accessed using indexing (arr[0], arr[1], etc.), because
each element of the array has its own unique use. It’s not that I need the array as a whole (as in
arr**2 or sum(arr)).
The arr elements are used in several simple arithmetic operations involving nothing costlier than
+, -, *, /, and numpy.log(). There is no other loop inside this function; there are a few if’s though.
Inside this function, use is made of constants imported from other modules (I doubt the
importing, as in AnotherModule.x is expensive).
'''
for x in numpy.arange(float1, float2, float3):
do stuff
return a, b, c # Return a tuple of three floats
Edit:
Thanks for all the comments. Here’s the inside of the function (I made the variable names short for convenience). The ndarray array arr has only 3 elements in it. Can you please suggest any improvement?
def compute_something(arr):
a = Mod.b * arr[1] * arr[2] + Mod.c
max = 0.0
for c in np.arange(a, arr[1] * arr[2] * (Mod.d – Mod.e), Mod.f):
i = c / arr[2]
m1 = Mod.A * np.log( (i / (arr[1] *Mod.d)) + (Mod.d/Mod.e))
m2 = -Mod.B * np.log(1.0 - (i/ (arr[1] *Mod.d)) - (Mod.d /
Mod.e))
V = arr[0] * (Mod.E - Mod.r * i / arr[1] - Mod.r * Mod.d -
m1 – m2)
p = c * V /1000.0
if p > max:
max = p
vmp = V
pen = Mod.COEFF1 * (Mod.COEFF2 - max) if max < Mod.CONST else 0.0
wo = Mod.COEFF3 * arr[1] * arr[0] + Mod.COEFF4 * abs(Mod.R5 - vmp) +
Mod.COEFF6 * arr[2]
w = wo + pen
return vmp, max, w

Python supports profiling of code. (module cProfile). Also there is option to use line_profiler to find most expensive part of code tool here.
So you do not need to guessing which part of code is most expensive.
In this code which you presten the problem is in usage for loop which generates many conversion between types of objects. If you use numpy you can vectorize your calculation.
I try to rewrite your code to vectorize your operation. You do not provide information what is Mod object, but I have hope it will work.
def compute_something(arr):
a = Mod.b * arr[1] * arr[2] + Mod.c
# start calculation on vectors instead of for lop
c_arr = np.arange(a, arr[1] * arr[2] * (Mod.d – Mod.e), Mod.f)
i_arr = c_arr/arr[2]
m1_arr = Mod.A * np.log( (i_arr / (arr[1] *Mod.d)) + (Mod.d/Mod.e))
m2_arr = -Mod.B * np.log(1.0 - (i_arr/ (arr[1] *Mod.d)) - (Mod.d /
Mod.e))
V_arr = arr[0] * (Mod.E - Mod.r * i_arr / arr[1] - Mod.r * Mod.d -
m1_arr – m2_arr)
p = c_arr * V_arr / 1000.0
max_val = p.max() # change name to avoid conflict with builtin function
max_ind = np.nonzero(p == max_val)[0][0]
vmp = V_arr[max_ind]
pen = Mod.COEFF1 * (Mod.COEFF2 - max_val) if max_val < Mod.CONST else 0.0
wo = Mod.COEFF3 * arr[1] * arr[0] + Mod.COEFF4 * abs(Mod.R5 - vmp) +
Mod.COEFF6 * arr[2]
w = wo + pen
return vmp, max_val, w

I would suggest to use range as it is approximately 2 times faster:
def python():
for i in range(100000):
pass
def numpy():
for i in np.arange(100000):
pass
from timeit import timeit
print(timeit(python, number=1000))
print(timeit(numpy, number=1000))
Output:
5.59282787179696
10.027646953771665

Fastest way to add/multiply two floating point scalar numbers in python

I'm using python and apparently the slowest part of my program is doing simple additions on float variables.
It takes about 35seconds to do around 400,000,000 additions/multiplications.
I'm trying to figure out what is the fastest way I can do this math.
This is how the structure of my code looks like.
Example (dummy) code:
def func(x, y, z):
loop_count = 30
a = [0,1,2,3,4,5,6,7,8,9,10,11,12,...35 elements]
b = [0,11,22,33,44,55,66,77,88,99,1010,1111,1212,...35 elements]
p = [0,0,0,0,0,0,0,0,0,0,0,0,0,...35 elements]
for i in range(loop_count - 1):
c = p[i-1]
d = a[i] + c * a[i+1]
e = min(2, a[i]) + c * b[i]
f = e * x
g = y + d * c
.... and so on
p[i] = d + e + f + s + g5 + f4 + h7 * t5 + y8
return sum(p)
func() is called about 200k times. The loop_count is about 30. And I have ~20 multiplications and ~45 additions and ~10 uses of min/max
I was wondering if there is a method for me to declare all these as ctypes.c_float and do addition in C using stdlib or something similar ?
Note that the p[i] calculated at the end of the loop is used as c in the next loop iteration. For iteration 0, it just uses p[-1] which is 0 in this case.
My constraints:
I need to use python. While I understand plain math would be faster in C/Java/etc. I cannot use it due to a bunch of other things I do in python which cannot be done in C in this same program.
I tried writing this with cython, but it caused a bunch of issues with the environment I need to run this in. So, again - not an option.

I think you should consider using numpy. You did not mention any constraint.
Example case of a simple dot operation (x.y)
import datetime
import numpy as np
x = range(0,10000000,1)
y = range(0,20000000,2)
for i in range(0, len(x)):
x[i] = x[i] * 0.00001
y[i] = y[i] * 0.00001
now = datetime.datetime.now()
z = 0
for i in range(0, len(x)):
z = z+x[i]*y[i]
print "handmade dot=", datetime.datetime.now()-now
print z
x = np.arange(0.0, 10000000.0*0.00001, 0.00001)
y = np.arange(0.0, 10000000.0*0.00002, 0.00002)
now = datetime.datetime.now()
z = np.dot(x,y)
print 'numpy dot =',datetime.datetime.now()-now
print z
outputs
handmade dot= 0:00:02.559000
66666656666.7
numpy dot = 0:00:00.019000
66666656666.7
numpy is more than 100x times faster.
The reason is that numpy encapsulates a C library that does the dot operation with compiled code. In the full python you have a list of potentially generic objects, casting, ...

Solving system of nonlinear equations with Python, Euler's formula

I am working on solving and analyzing a system of differential equations in Python. First did I solve it with help of scipy.integrate dopri5 and scopes Odeint. Which worked out fine. Then I tried to solve the equations with use of the Euler's method. The equations and code is as followed,
dj = -mu*(J**3 - (C - C0)*J - F)
dc = J + C*F + a*J**2
df = J*F - C
T = 100
dt = 0.001
t = np.linspace(0, T, int(T/dt)+1)
j = np.zeros(len(t))
c = np.zeros(len(t))
f = np.zeros(len(t))
# Initial condition
j[0] = 0.1
c[0] = -0.5
f[0] = 0.1
a = 0.3025
C0 = 0.5
mu = 50
for i in range(len(t)):
j[i+1] = j[i] + (-mu * (j[i]**3 - (c[i] - C0)*j[i] - f[i]))*dt
c[i+1] = c[i] + (j[i] + c[i] * f[i] + (a * j[i])**2)*dt
f[i+1] = f[i] + (j[i] * f[i] - c[i])*dt
Is there any reason why the Euler's method should not work when both the other two are?

In the first iteration, i is 0, and your first line of the loop essentially is:
j[0] = j[-1] + (-mu * (j[-1]**3 - (c[-1] - C0)*j[-1] - f[-1]))*dt
j[-1] is the last element of j, just like c[-1] is the last element of c, etc. Initially they are all zeros, so j[0] becomes a 0, too, which overwrites the initial conditions. To fix this problem, change range(len(t)) to range(1,len(t)). (The model diverges after the first 9200 steps, anyway.)

As DYZ says, your calculation is incorrect on the first loop iteration because j[-1] is the last element of j, which you've initialised to zero.
However, your code wastes a lot of RAM. I assume you just want arrays containing T results, plus the initial values, rather than the results calculated on every step. The code below achieves that via a double for loop. We aren't really getting any benefit from Numpy in this code, so I don't bother importing it.
Note that Euler integration is not very accurate, and you generally need to use a much smaller step size than what's required by more sophisticated integration algorithms. As DYZ mentions, with your current step size the calculation diverges before the loop finishes.
Here's a modified version of your code using a smaller step size.
T = 100
dt = 0.00001
steps = int(T / dt)
substeps = int(steps / T)
# Recalculate `dt` to compensate for possible truncation
# in the `steps` and `substeps` calculations
dt = 1.0 / substeps
print('steps, substeps, dt:', steps, substeps, dt)
a = 0.3025
C0 = 0.5
mu = 50
#dj = -mu*(J**3 - (C - C0)*J - F)
#dc = J + C*F + a*J**2
#df = J*F - C
# Initial condition
j = 0.1
c = -0.5
f = 0.1
jlst, clst, flst = [j], [c], [f]
for i in range(T):
for _ in range(substeps):
j1 = j + (-mu * (j**3 - (c - C0)*j - f))*dt
c1 = c + (j + c * f + (a * j)**2)*dt
f1 = f + (j * f - c)*dt
j, c, f = j1, c1, f1
jlst.append(j)
clst.append(c)
flst.append(f)
def round_seq(seq, places=6):
return [round(u, places) for u in seq]
print('j:', round_seq(jlst), end='\n\n')
print('c:', round_seq(clst), end='\n\n')
print('f:', round_seq(flst), end='\n\n')
output
steps, substeps, dt: 10000000 100000 1e-05
j: [0.1, 0.585459, 1.26718, 3.557956, -1.311867, -0.647698, -0.133683, 0.395812, 0.964856, 3.009683, -2.025674, -1.047722, -0.48872, 0.044296, 0.581284, 1.245423, 14.725407, -1.715456, -0.907364, -0.372118, 0.167733, 0.705257, 1.511711, -3.588555, -1.476817, -0.778593, -0.253874, 0.289294, 0.837128, 1.985792, -2.652462, -1.28088, -0.657113, -0.132971, 0.409071, 0.983504, 3.229393, -2.1809, -1.113977, -0.539586, -0.009829, 0.528546, 1.156086, 8.23469, -1.838582, -0.967078, -0.423261, 0.113883, 0.650319, 1.381138, 12.045565, -1.575015, -0.833861, -0.305952, 0.23632, 0.778052, 1.734888, -2.925769, -1.362437, -0.709641, -0.186249, 0.356775, 0.917051, 2.507782, -2.367126, -1.184147, -0.590753, -0.063942, 0.476121, 1.07614, 5.085211, -1.976542, -1.029395, -0.474206, 0.059772, 0.596505, 1.273214, 17.083466, -1.682855, -0.890842, -0.357555, 0.182944, 0.721096, 1.554496, -3.331861, -1.450497, -0.763182, -0.239007, 0.30425, 0.85435, 2.076595, -2.584081, -1.258788, -0.642362, -0.117774, 0.423883, 1.003181, 3.521072, -2.132709, -1.094792, -0.525123]
c: [-0.5, -0.302644, 0.847742, 12.886781, 0.177404, -0.423405, -0.569541, -0.521669, -0.130084, 7.97828, -0.109606, -0.363033, -0.538874, -0.61005, -0.506872, 0.05076, 216.678959, -0.198445, -0.408569, -0.566869, -0.603713, -0.451729, 0.58959, 2.252504, -0.246645, -0.451, -0.588697, -0.587898, -0.375758, 2.152898, -0.087229, -0.295185, -0.49006, -0.603411, -0.562389, -0.263696, 8.901196, -0.132332, -0.342969, -0.525087, -0.609991, -0.526417, -0.077251, 67.082608, -0.177771, -0.389092, -0.555341, -0.607658, -0.47794, 0.293664, 147.817033, -0.225425, -0.432796, -0.579951, -0.595996, -0.412269, 1.235928, -0.037058, -0.273963, -0.473412, -0.597912, -0.574782, -0.318837, 4.581828, -0.113301, -0.3222, -0.51029, -0.608168, -0.543547, -0.172371, 24.718184, -0.157526, -0.369151, -0.542732, -0.609811, -0.500922, 0.09504, 291.915024, -0.204371, -0.414, -0.56993, -0.602265, -0.443622, 0.700005, 0.740665, -0.25268, -0.456048, -0.590933, -0.585265, -0.36427, 2.528225, -0.093699, -0.301181, -0.494644, -0.60469, -0.558516, -0.245806, 10.941068, -0.137816, -0.348805, -0.52912]
f: [0.1, 0.68085, 1.615135, 1.01107, -2.660947, -0.859348, -0.134789, 0.476782, 1.520241, 4.892319, -9.514924, -2.041217, -0.61413, 0.060247, 0.792463, 2.510586, 11.393914, -6.222736, -1.559576, -0.438133, 0.200729, 1.033274, 3.348756, -39.664752, -4.304545, -1.201378, -0.282146, 0.349631, 1.331995, 4.609547, -20.169056, -3.104072, -0.923759, -0.138225, 0.513633, 1.716341, 6.739864, -11.717002, -2.307614, -0.699883, 7.4e-05, 0.700823, 2.22957, 11.017447, -7.434886, -1.751919, -0.512171, 0.138566, 0.922012, 2.9434, -30.549886, -5.028825, -1.346261, -0.348547, 0.282981, 1.19254, 3.987366, -26.554232, -3.566328, -1.0374, -0.200198, 0.439487, 1.535198, 5.645421, -14.674838, -2.619369, -0.792589, -0.060175, 0.615387, 1.985246, 8.779969, -8.991742, -1.972575, -0.590788, 0.077534, 0.820118, 2.599728, 8.879606, -5.928246, -1.509453, -0.417854, 0.218635, 1.066761, 3.477148, -36.053938, -4.124934, -1.163178, -0.263755, 0.369033, 1.37438, 4.811848, -18.741635, -2.987496, -0.893457, -0.120864, 0.535433, 1.771958, 7.117055, -11.027021, -2.227847, -0.674889]
That takes about 75 seconds on my old 2GHz machine.
Using dt = 0.000005 (which takes almost 2 minutes on this machine) the final values of j, c, and f are -0.524774, -0.529217, -0.674293, respectively, so it looks like we're beginning to get convergence.
Thanks to LutzL for pointing out that dt may need adjusting because of the rounding in the steps and substeps calculations.

Solving a mathematical equation recursively in Python

I want to solve an equation which I am supposed to solve it recursively, I uploaded the picture of formula (Sorry! I did not know how to write mathematical formulas here!)
I wrote the code in Python as below:
import math
alambda = 1.0
rho = 0.8
c = 1.0
b = rho * c / alambda
P0 = (1 - (alambda*b))
P1 = (1-(alambda*b))*(math.exp(alambda*b) - 1)
def a(n):
a_n = math.exp(-alambda*b) * ((alambda*b)**n) / math.factorial(n)
return a_n
def P(n):
P(n) = (P0+P1)*a(n) + sigma(n)
def sigma(n):
j = 2
result = 0
while j <= n+1:
result = result + P(j)*a(n+1-j)
j += 1
return result
It is obvious that I could not finish P function. So please help me with this.
when n=1 I should extract P2, when n=2 I should extract P3.
By the way, P0 and P1 are as written in line 6 and 7.
When I call P(5) I want to see P(0), P(1), P(2), P(3), P(4), P(5), P(6) at the output.

You need to reorganize the formula so that you don't have to calculate P(3) to calculate P(2). This is pretty easy to do, by bringing the last term of the summation, P(n+1)a(0), to the left side of the equation and dividing through by a(0). Then you have a formula for P(n+1) in terms of P(m) where m <= n, which is solvable by recursion.
As Bruce mentions, it's best to cache your intermediate results for P(n) by keeping them in a dict so that a) you don't have to recalculate P(2) etc everytime you need it, and b) after you get the value of P(n), you can just print the dict to see all the values of P(m) where m <= n.
import math
a_lambda = 1.0
rho = 0.8
c = 1.0
b = rho * c / a_lambda
p0 = (1 - (a_lambda*b))
p1 = (1-(a_lambda*b))*(math.exp(a_lambda*b) - 1)
p_dict = {0: p0, 1: p1}
def a(n):
return math.exp(-a_lambda*b) * ((a_lambda*b)**n) / math.factorial(n)
def get_nth_p(n, p_dict):
# return pre-calculated value if p(n) is already known
if n in p_dict:
return p_dict[n]
# Calculate p(n) using modified formula
p_n = ((get_nth_p(n-1, p_dict)
- (get_nth_p(0, p_dict) + get_nth_p(1, p_dict)) * a(n - 1)
- sum(get_nth_p(j, p_dict) * a(n + 1 - j) for j in xrange(2, n)))
/ a(0))
# Save computed value into the dict
p_dict[n] = p_n
return p_n
get_nth_p(6, p_dict)
print p_dict
Edit 2
Some cosmetic updates to the code - shortening the name and making p_dict a mutable default argument (something I try to use only sparingly) really makes the code much more readable:
import math
# Customary to distinguish variables that are unchanging by making them ALLCAP
A_LAMBDA = 1.0
RHO = 0.8
C = 1.0
B = RHO * C / A_LAMBDA
P0 = (1 - (A_LAMBDA*B))
P1 = (1-(A_LAMBDA*B))*(math.exp(A_LAMBDA*B) - 1)
p_value_cache = {0: P0, 1: P1}
def a(n):
return math.exp(-A_LAMBDA*B) * ((A_LAMBDA*B)**n) / math.factorial(n)
def p(n, p_dict=p_value_cache):
# return pre-calculated value if p(n) is already known
if n in p_dict:
return p_dict[n]
# Calculate p(n) using modified formula
p_n = ((p(n-1)
- (p(0) + p(1)) * a(n - 1)
- sum(p(j) * a(n + 1 - j) for j in xrange(2, n)))
/ a(0))
# Save computed value into the dict
p_dict[n] = p_n
return p_n
p(6)
print p_value_cache

You could fix if that way:
import math
alambda = 1.0
rho = 0.8
c = 1.0
b = rho * c / alambda
def a(n):
# you might want to cache a as well
a_n = math.exp(-alambda*b) * ((alambda*b)**n) / math.factorial(n)
return a_n
PCache={0:(1 - (alambda*b)),1:(1-(alambda*b))*(math.exp(alambda*b) - 1)}
def P(n):
if n in PCache:
return PCache[n]
ret= (P(0)+P(1))*a(n) + sigma(n)
PCache[n]=ret
return ret
def sigma(n):
# caching this seems smart as well
j = 2
result = 0
while j <= n+1:
result = result + P(j)*a(n+1-j)
j += 1
return result
void displayP(n):
P(n) # fill cache :-)
for x in range(n):
print ("%u -> %d\n" % (x,PCache[x]))
Instead of managing the cache manually, you might want to use a memoize decorator (see http://www.python-course.eu/python3_memoization.php )
Notes:
not tested, but you should get the idea behind it
your recurrence won't work P(n) depends on P(n+1) on your equation... This will never end
It looks like I misunderstood P0 and P1 as being Both constants (big numbers) and results (small numbers, indices)... Notation is not the best choice I guess...

Fastest possible method for the arcsin function on small, arbitrary floating-point values

I need to calculate the arcsine function of small values that are under the form of mpmath's "mpf" floating-point bignums.
What I call a "small" value is for example e/4/(10**7) = 0.000000067957045711476130884...
Here is a result of a test on my machine with mpmath's built-in asin function:
import gmpy2
from mpmath import *
from time import time
mp.dps = 10**6
val=e/4/(10**7)
print "ready"
start=time()
temp=asin(val)
print "mpmath asin: "+str(time()-start)+" seconds"
>>> 155.108999968 seconds
This is a particular case: I work with somewhat small numbers, so I'm asking myself if there is a way to calculate it in python that actually beats mpmath for this particular case (= for small values).
Taylor series are actually a good choice here because they converge very fast for small arguments. But I still need to accelerate the calculations further somehow.
Actually there are some problems:
1) Binary splitting is ineffective here because it shines only when you can write the argument as a small fraction. A full-precision float is given here.
2) arcsin is a non-alternating series, thus Van Wijngaarden or sumalt transformations are ineffective too (unless there is a way I'm not aware of to generalize them to non-alternating series).
https://en.wikipedia.org/wiki/Van_Wijngaarden_transformation
The only acceleration left I can think of is Chebyshev polynomials. Can Chebyshev polynomials be applied on the arcsin function? How to?

Can you use the mpfr type that is included in gmpy2?
>>> import gmpy2
>>> gmpy2.get_context().precision = 3100000
>>> val = gmpy2.exp(1)/4/10**7
>>> from time import time
>>> start=time();r=gmpy2.asin(val);print time()-start
3.36188197136
In addition to supporting the GMP library, gmpy2 also supports the MPFR and MPC multiple-precision libraries.
Disclaimer: I maintain gmpy2.

Actually binary splitting does work very well, if combined with iterated argument reduction to balance the number of terms against the size of the numerators and denominators (this is known as the bit-burst algorithm).
Here is a binary splitting implementation for mpmath based on repeated application of the formula atan(t) = atan(p/2^q) + atan((t*2^q-p) / (2^q+p*t)). This formula was suggested recently by Richard Brent (in fact mpmath's atan already uses a single invocation of this formula at low precision, in order to look up atan(p/2^q) from a cache). If I remember correctly, MPFR also uses the bit-burst algorithm to evaluate atan, but it uses a slightly different formula, which possibly is more efficient (instead of evaluating several different arctangent values, it does analytic continuation using the arctangent differential equation).
from mpmath.libmp import MPZ, bitcount
from mpmath import mp
def bsplit(p, q, a, b):
if b - a == 1:
if a == 0:
P = p
Q = q
else:
P = p * p
Q = q * 2
B = MPZ(1 + 2 * a)
if a % 2 == 1:
B = -B
T = P
return P, Q, B, T
else:
m = a + (b - a) // 2
P1, Q1, B1, T1 = bsplit(p, q, a, m)
P2, Q2, B2, T2 = bsplit(p, q, m, b)
T = ((T1 * B2) << Q2) + T2 * B1 * P1
P = P1 * P2
B = B1 * B2
Q = Q1 + Q2
return P, Q, B, T
def atan_bsplit(p, q, prec):
"""computes atan(p/2^q) as a fixed-point number"""
if p == 0:
return MPZ(0)
# FIXME
nterms = (-prec / (bitcount(p) - q) - 1) * 0.5
nterms = int(nterms) + 1
if nterms < 1:
return MPZ(0)
P, Q, B, T = bsplit(p, q, 0, nterms)
if prec >= Q:
return (T << (prec - Q)) // B
else:
return T // (B << (Q - prec))
def atan_fixed(x, prec):
t = MPZ(x)
s = MPZ(0)
q = 1
while t:
q = min(q, prec)
p = t >> (prec - q)
if p:
s += atan_bsplit(p, q, prec)
u = (t << q) - (p << prec)
v = (MPZ(1) << (q + prec)) + p * t
t = (u << prec) // v
q *= 2
return s
def atan1(x):
prec = mp.prec
man = x.to_fixed(prec)
return mp.mpf((atan_fixed(man, prec), -prec))
def asin1(x):
x = mpf(x)
return atan1(x/sqrt(1-x**2))
With this code, I get:
>>> from mpmath import *
>>> mp.dps = 1000000
>>> val=e/4/(10**7)
>>> from time import time
>>> start = time(); y1 = asin(x); print time() - start
58.8485069275
>>> start = time(); y2 = asin1(x); print time() - start
8.26498985291
>>> nprint(y2 - y1)
-2.31674e-1000000
Warning: atan1 assumes 0 <= x < 1/2, and the determination of the number of terms might not be optimal or correct (fixing these issues is left as an exercise to the reader).

A fast way is to use a pre-calculated look-up table.
But if you look at e.g. a Taylor series for asin;
def asin(x):
rv = (x + 1/3.0*x**3 + 7/30.0*x**5 + 64/315.0*x**7 + 4477/22680.0*x**9 +
28447/138600.0*x**11 + 23029/102960.0*x**13 +
17905882/70945875.0*x**15 + 1158176431/3958416000.0*x**17 +
9149187845813/26398676304000.0*x**19)
return rv
You'll see that for small values of x, asin(x) ≈ x.
In [19]: asin(1e-7)
Out[19]: 1.0000000000000033e-07
In [20]: asin(1e-9)
Out[20]: 1e-09
In [21]: asin(1e-11)
Out[21]: 1e-11
In [22]: asin(1e-12)
Out[22]: 1e-12
E.g. for the value us used:
In [23]: asin(0.000000067957045711476130884)
Out[23]: 6.795704571147624e-08
In [24]: asin(0.000000067957045711476130884)/0.000000067957045711476130884
Out[24]: 1.0000000000000016
Of course it depends on whether this difference is relevant to you.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Einsum in python for a complex loop - python

I see a typo in your code, you are not using the variable p outside of the order 4 tensor. Try changing s2 += bQ[q] * bQ[q] * uN[n] * uN[m] * t2[p,n,q,m] for s2 += bQ[p] * bQ[q] * uN[n] * uN[m] * t2[p,n,q,m]

Related

How to speed up this numpy.arange loop?

Fastest way to add/multiply two floating point scalar numbers in python

Solving system of nonlinear equations with Python, Euler's formula

Solving a mathematical equation recursively in Python

Fastest possible method for the arcsin function on small, arbitrary floating-point values

Categories

Resources