I have complex loops in Python that I'm trying to "vectorize" to reduce computation time. I found that np.einsum can do this and got it working for one loop, but I'm stuck on another.
In the following code, I show the loop I managed to "einsumize" (s1) and the one I couldn't (s2).
import numpy as np
Q = 6
P = 24
N = 40
bQ = np.arange(Q)
bP = np.arange(P)
uN = np.arange(N)
t1 = np.arange(P*Q*N).reshape([Q,P,N])
t2 = np.arange(Q*N*Q*N).reshape([Q,N,Q,N])
s1_ = np.einsum('p,q,n,qpn',bP, bQ, uN, t1)
s1 = 0
for p in range(P):
    for q in range(Q):
        for n in range(N):
            s1 += bP[p] * bQ[q] * uN[n] * t1[q,p,n]
print(s1)
print(s1_)
print()
s2_ = np.einsum('p,q,n,m,pnqm', bQ, bQ, uN, uN, t2)
s2 = 0
for p in range(Q):
    for q in range(Q):
        for n in range(N):
            for m in range(N):
                s2 += bQ[q] * bQ[q] * uN[n] * uN[m] * t2[p,n,q,m]
print(s2)
print(s2_)
The result of the previous code is
13475451600
13475451600
6125547636000
5707354770000
The math formula to compute s1 is: s1 = \sum_p\sum_q\sum_n bP[p] * bQ[q] * uN[n] * t1[q,p,n]. The one to compute s2 is: s2 = \sum_q\sum_{q'}\sum_n\sum_{n'} bQ[q] * bQ[q'] * uN[n] * uN[n'] * t2[q,n,q',n'].
For the triple loop, if I understand einsum correctly, I specify the indices of the tensors to be multiplied, and giving no output indices means that everything is summed. But this does not seem to work for the quadruple loop.
EDIT: I saw an answer (which seems to have been deleted) pointing out that it was just an index mistake in the quadruple loop... I should have seen it :/
I see a typo in your code: the variable p is not used anywhere except as an index of the order-4 tensor.
Try changing
s2 += bQ[q] * bQ[q] * uN[n] * uN[m] * t2[p,n,q,m]
to
s2 += bQ[p] * bQ[q] * uN[n] * uN[m] * t2[p,n,q,m]
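As a quick check of this fix, here is a minimal sketch with smaller sizes (Q = 3 and N = 4 are arbitrary values picked so the loop runs quickly, not the sizes from the question). It also writes the output side of the einsum subscripts explicitly with `->`, which makes the intent to sum over every index visible instead of relying on implicit mode.

```python
import numpy as np

Q, N = 3, 4  # small illustrative sizes (assumption, chosen for speed)
bQ = np.arange(Q)
uN = np.arange(N)
t2 = np.arange(Q * N * Q * N).reshape(Q, N, Q, N)

# Corrected loop: bQ[p] * bQ[q] instead of bQ[q] * bQ[q]
s2 = 0
for p in range(Q):
    for q in range(Q):
        for n in range(N):
            for m in range(N):
                s2 += bQ[p] * bQ[q] * uN[n] * uN[m] * t2[p, n, q, m]

# Same contraction with einsum; '->' with nothing after it requests a scalar
s2_ = np.einsum('p,q,n,m,pnqm->', bQ, bQ, uN, uN, t2)

print(s2 == s2_)
```

With the corrected index, the loop and the einsum call should agree.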
I wrote the following code:
test.py
import numpy
nc = 1; nb = 20; ni = 6; nc = 2; ia = 20; ib = 20; ic = 0
U1 = numpy.array((1,2,0,0,0,3))
U2 = numpy.array((2,2,1,0,0,1))
U3 = numpy.array((2,1,1,0,0,2))
U4 = numpy.array((2,1,0,1,0,2))
U5 = numpy.array((2,1,1,1,0,3))
for n in range(ni):
    a = nc*(nb*nc*ia+nc*ib+ic)+U1[n]
    a2 = ia + U1[n]
    b2 = ib + U3[n]
    c2 = ic + U4[n]
    b = nc*(nb*nc*a2+nc*b2+c2)+U2[n]
    A = str(numpy.array((a,b,U5[n])))
    print(A)
    with open("test.txt", 'w') as out:
        for o in A:
            out.write(o)
test.txt gives me the following:
[1683 1933 3]
But the print(A) call in test.py shows this:
[1681 1774 2]
[1682 1848 1]
[1680 1685 1]
[1680 1682 1]
[1680 1680 0]
[1683 1933 3]
How can I write the whole printed output to test.txt? I assume I need to do something like this:
ol = []
ol.append(o)
The basic problem is that you open the same file again on every iteration of the for loop, and mode 'w' overwrites it each time.
Use:
with open("test.txt", 'w') as out:
    for n in range(ni):
        a = nc*(nb*nc*ia+nc*ib+ic)+U1[n]
        a2 = ia + U1[n]
        b2 = ib + U3[n]
        c2 = ic + U4[n]
        b = nc*(nb*nc*a2+nc*b2+c2)+U2[n]
        A = str(numpy.array((a,b,U5[n])))
        out.write(f"{A}\n")
Now the test.txt file will contain:
[1681 1774 2]
[1682 1848 1]
[1680 1685 1]
[1680 1682 1]
[1680 1680 0]
[1683 1933 3]
The best solution - Fast and efficient
import numpy
nc = 1; nb = 20; ni = 6; nc = 2; ia = 20; ib = 20; ic = 0
U1 = numpy.array((1,2,0,0,0,3))
U2 = numpy.array((2,2,1,0,0,1))
U3 = numpy.array((2,1,1,0,0,2))
U4 = numpy.array((2,1,0,1,0,2))
U5 = numpy.array((2,1,1,1,0,3))
# No for loop here, we are using NumPy broadcasting features
a = nc*(nb*nc*ia+nc*ib+ic)+U1
a2 = ia + U1
b2 = ib + U3
c2 = ic + U4
b = nc * (nb * nc * a2 + nc * b2 + c2) + U2
# Transpose the matrix to get the result wanted in your case
A = numpy.array((a, b, U5)).T
# Use a separate name for the file handle so it does not shadow the array b
with open(file="res.txt", mode="w") as out:
    out.write(numpy.array2string(A))
Remarks about your code written in the question
In most cases, using NumPy broadcasting (removing loop over arrays) makes the code faster. It can also be easier to read.
Writing to a file inside a loop is bad practice. Even if you keep the file open with a with open context manager, performance is poor.
It is better to build your array, convert it to a string, and then write the whole thing to the file in one go.
Other solution using numpy builtin functions
Disclaimer: use this only if your array has a small number of rows (<500).
Dumping a NumPy array into a text file is built into NumPy.
See the API documentation for numpy.savetxt.
However, if you look at the source code of this function, you will see that it iterates over the array's rows, which hurts performance considerably as the number of rows grows (thanks to @hpaulj for the remark).
In your case, you could replace the last two lines of the snippet above with:
numpy.savetxt("a.txt", A) # just see the doc to add some formatting options
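As an illustration of the formatting options mentioned in that comment, here is a minimal sketch that passes `fmt="%d"` so the integers are written as integers rather than in the default `"%.18e"` notation. The temporary directory and file name are placeholders chosen for this example; the arrays are the ones from the snippet above.

```python
import os
import tempfile
import numpy

nc = 2; nb = 20; ia = 20; ib = 20; ic = 0
U1 = numpy.array((1, 2, 0, 0, 0, 3))
U2 = numpy.array((2, 2, 1, 0, 0, 1))
U3 = numpy.array((2, 1, 1, 0, 0, 2))
U4 = numpy.array((2, 1, 0, 1, 0, 2))
U5 = numpy.array((2, 1, 1, 1, 0, 3))

a = nc * (nb * nc * ia + nc * ib + ic) + U1
b = nc * (nb * nc * (ia + U1) + nc * (ib + U3) + (ic + U4)) + U2
A = numpy.array((a, b, U5)).T

# Placeholder output location for this sketch
path = os.path.join(tempfile.mkdtemp(), "a.txt")
numpy.savetxt(path, A, fmt="%d")  # one row per line, space-separated integers

with open(path) as fh:
    lines = fh.read().splitlines()
print(lines[0])
```

Unlike the str(...) output, the lines written this way have no surrounding brackets.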
In each iteration of the outer loop, you are asking the file system for a fresh, empty copy of "test.txt". So of course the final version only contains the content written in the last iteration.
Open the file with mode "a" (append) or, as in the other answer and more efficiently, open it once outside the loop.
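The two options can be compared side by side in a small sketch. The row strings, directory, and file names below are placeholders for illustration. Note that append mode also keeps whatever a previous run left in the file, which is one more reason to prefer opening once in write mode.

```python
import os
import tempfile

rows = ["[1681 1774 2]", "[1682 1848 1]", "[1680 1685 1]"]  # sample lines
tmpdir = tempfile.mkdtemp()  # placeholder location for this sketch

# Option 1: reopen in append mode on every iteration (works, but reopens each time)
path_a = os.path.join(tmpdir, "append.txt")
for row in rows:
    with open(path_a, "a") as out:
        out.write(row + "\n")

# Option 2: open once in write mode and loop inside the with block
path_w = os.path.join(tmpdir, "once.txt")
with open(path_w, "w") as out:
    for row in rows:
        out.write(row + "\n")

with open(path_a) as fa, open(path_w) as fw:
    content_a, content_w = fa.read(), fw.read()
print(content_a == content_w)
```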
I have two DataFrames which I would like to use to calculate the following:
w(ti,ti)*a(ti)^2 + w(tj,tj)*b(tj)^2 + 2*w(ti,tj)*a(ti)*b(tj)
The above uses two terms (a, b).
w is the weight DataFrame, where i and j run over its index and columns and correspond to the Tn index of a and b.
Set Up - Edit dynamic W
import pandas as pd
import numpy as np
I = ['i'+ str(i) for i in range(4)]
Q = ['q' + str(i) for i in range(5)]
T = ['t' + str(i) for i in range(3)]
n = 100
df1 = pd.DataFrame({'I': [I[np.random.randint(len(I))] for i in range(n)],
                    'Q': [Q[np.random.randint(len(Q))] for i in range(n)],
                    'Tn': [T[np.random.randint(len(T))] for i in range(n)],
                    'V': np.random.rand(n)}).groupby(['I','Q','Tn']).sum()
df1.head(5)
                 V
I  Q  Tn
i0 q0 t0  1.626799
      t2  1.725374
   q1 t0  2.155340
      t1  0.479741
      t2  1.039178
w = np.random.randn(len(T),len(T))
w = (w*w.T)/2
np.fill_diagonal(w,1)
W = pd.DataFrame(w, columns = T, index = T)
W
          t0        t1        t2
t0  1.000000  0.029174 -0.045754
t1  0.029174  1.000000  0.233330
t2 -0.045754  0.233330  1.000000
Effectively, I would like to use the Tn index of df1 to apply the above equation for every I and Q.
The end result for df1.loc['i0','q0'] in the example above should be:
W(t0,t0) * V(t0)^2
+ W(t2,t2) * V(t2)^2
+ 2 * W(t0,t2) * V(t0) * V(t2)
=
1.0 * 1.626799**2
+ 1.0 * 1.725374**2
+ 2 * (-0.045754) * 1.626799 * 1.725374
The end result for df1.loc['i0','q1'] in the example above should be:
W(t0,t0) * V(t0)^2
+ W(t1,t1) * V(t1)^2
+ W(t2,t2) * V(t2)^2
+ 2 * W(t0,t1) * V(t0) * V(t1)
+ 2 * W(t0,t2) * V(t0) * V(t2)
+ 2 * W(t1,t2) * V(t1) * V(t2)
=
1.0 * 2.155340**2
+ 1.0 * 0.479741**2
+ 1.0 * 1.039178**2
+ 2 * 0.029174 * 2.155340 * 0.479741
+ 2 * (-0.045754) * 2.155340 * 1.039178
+ 2 * 0.233330 * 0.479741 * 1.039178
This pattern repeats depending on the number of Tn terms in each Q, so the solution should be robust enough to handle as many Tn terms as needed (the example uses 3, but there could be 100 or more).
Each result should then be saved in a new DataFrame with index = [I, Q].
The solution should also not be slower than Excel as n increases.
Thanks in advance
One way is to first reindex your DataFrame df1 with all possible combinations of the lists I, Q and T using pd.MultiIndex.from_product, filling the missing values in column 'V' with 0. The column then has len(I)*len(Q)*len(T) elements. You can then reshape the values so that each row corresponds to one combination of I and Q:
ar = (df1.reindex(pd.MultiIndex.from_product([I,Q,T], names=['I','Q','Tn']),fill_value=0)
         .values.reshape(-1,len(T)))
To see the relation between my input df1 and ar, here are some related rows
print (df1.head(6))
                 V
I  Q  Tn
i0 q0 t1  1.123666
   q1 t0  0.538610
      t1  2.943206
   q2 t0  0.570990
      t1  0.617524
      t2  1.413926
print (ar[:3])
[[0. 1.1236656 0. ]
[0.53861027 2.94320574 0. ]
[0.57099049 0.61752408 1.4139263 ]]
Now, to perform the multiplication with the element of W, one way is to create the outer product of ar with itself but row-wise to get, for each row a len(T)*len(T) matrix. For example, for the second row:
[0.53861027 2.94320574 0. ]
becomes
[[0.29010102, 1.58524083, 0.        ],  # 0.29010102 = 0.53861027**2, 1.58524083 = 0.53861027*2.94320574 ...
 [1.58524083, 8.66246003, 0.        ],
 [0.        , 0.        , 0.        ]]
Several methods are possible, such as ar[:,:,None]*ar[:,None,:] or np.einsum with the right subscripts: np.einsum('ij,ik->ijk',ar,ar). Both give the same result.
The next step can be done with np.tensordot, specifying the right axes. With ar and W as inputs, you do:
print (np.tensordot(np.einsum('ij,ik->ijk',ar,ar),W.values,axes=([1,2],[0,1])))
array([ 1.26262437, 15.29352438, 15.94605435, ...
To check the second value here: 1*0.29010102 + 1*8.66246003 + 2.*2*1.58524083 == 15.29352438 (where the 1s are W(t0,t0) and W(t1,t1), the leading 2. counts the two symmetric cross terms, and the other 2 is W(t0,t1)).
Finally, to create the dataframe as expected, use again pd.MultiIndex.from_product:
new_df = pd.DataFrame({'col1': np.tensordot(np.einsum('ij,ik->ijk',ar,ar),
                                            W.values,axes=([1,2],[0,1]))},
                      index=pd.MultiIndex.from_product([I,Q], names=['I','Q']))
print (new_df.head(3))
            col1
I  Q
i0 q0   1.262624
   q1  15.293524
   q2  15.946054
...
Note: if you are SURE that each element of T appears at least once in the last level of df1, ar can be obtained using unstack, such as ar=df1.unstack(fill_value=0).values. But I would suggest using the reindex method above to prevent any error.
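To see that this computes the quantity asked for, note that contracting the row-wise outer product with W is the quadratic form v @ W @ v for each row v of ar, which expands to the diagonal terms plus twice the cross terms when W is symmetric. The following is a minimal sketch on random data (the sizes, the seed, and the way the symmetric matrix is built are assumptions made for illustration). It also shows a single-call einsum variant that avoids the intermediate 3-D array, which may matter when len(T) is large.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible
n_rows, n_t = 5, 3               # illustrative sizes (assumption)
ar = rng.random((n_rows, n_t))
w = rng.standard_normal((n_t, n_t))
w = (w + w.T) / 2                # any symmetric matrix will do here
np.fill_diagonal(w, 1)

# Route used above: row-wise outer product, then contraction with w
outer = np.einsum('ij,ik->ijk', ar, ar)
via_tensordot = np.tensordot(outer, w, axes=([1, 2], [0, 1]))

# Same quantity in one einsum call, without building the 3-D array
via_einsum = np.einsum('ij,jk,ik->i', ar, w, ar)

# Reference: explicit quadratic form for each row
reference = np.array([v @ w @ v for v in ar])

print(np.allclose(via_tensordot, reference), np.allclose(via_einsum, reference))
```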
I am working on solving and analyzing a system of differential equations in Python. I first solved it with scipy.integrate's dopri5 and with SciPy's odeint, which worked fine. Then I tried to solve the equations using Euler's method. The equations and code are as follows:
dj = -mu*(J**3 - (C - C0)*J - F)
dc = J + C*F + a*J**2
df = J*F - C
T = 100
dt = 0.001
t = np.linspace(0, T, int(T/dt)+1)
j = np.zeros(len(t))
c = np.zeros(len(t))
f = np.zeros(len(t))
# Initial condition
j[0] = 0.1
c[0] = -0.5
f[0] = 0.1
a = 0.3025
C0 = 0.5
mu = 50
for i in range(len(t)):
    j[i+1] = j[i] + (-mu * (j[i]**3 - (c[i] - C0)*j[i] - f[i]))*dt
    c[i+1] = c[i] + (j[i] + c[i] * f[i] + (a * j[i])**2)*dt
    f[i+1] = f[i] + (j[i] * f[i] - c[i])*dt
Is there any reason why Euler's method should not work when the other two do?
In the first iteration, i is 0, and your first line of the loop essentially is:
j[0] = j[-1] + (-mu * (j[-1]**3 - (c[-1] - C0)*j[-1] - f[-1]))*dt
j[-1] is the last element of j, just like c[-1] is the last element of c, etc. Initially they are all zeros, so j[0] becomes a 0, too, which overwrites the initial conditions. To fix this problem, change range(len(t)) to range(1,len(t)). (The model diverges after the first 9200 steps, anyway.)
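A minimal sketch of the loop with the corrected range is shown below. The right-hand sides are copied from the loop in the question, and T = 1 is an assumption made here to keep the run short and well inside the range where the explicit scheme stays bounded. The same update can equivalently be written as j[i+1] = j[i] + ... with range(len(t) - 1), so that neither i-1 nor i+1 ever leaves the valid index range.

```python
import numpy as np

T = 1          # short horizon for illustration (assumption; the question uses T = 100)
dt = 0.001
t = np.linspace(0, T, int(T / dt) + 1)
j = np.zeros(len(t))
c = np.zeros(len(t))
f = np.zeros(len(t))

# Initial condition
j[0], c[0], f[0] = 0.1, -0.5, 0.1
a, C0, mu = 0.3025, 0.5, 50

# Start at 1 so that index i-1 never wraps around to the last element
for i in range(1, len(t)):
    j[i] = j[i-1] + (-mu * (j[i-1]**3 - (c[i-1] - C0) * j[i-1] - f[i-1])) * dt
    c[i] = c[i-1] + (j[i-1] + c[i-1] * f[i-1] + (a * j[i-1])**2) * dt
    f[i] = f[i-1] + (j[i-1] * f[i-1] - c[i-1]) * dt

print(j[0], c[0], f[0])  # the initial conditions are no longer overwritten
```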
As DYZ says, your calculation is incorrect on the first loop iteration because j[-1] is the last element of j, which you've initialised to zero.
However, your code wastes a lot of RAM. I assume you just want arrays containing T results, plus the initial values, rather than the results calculated on every step. The code below achieves that via a double for loop. We aren't really getting any benefit from Numpy in this code, so I don't bother importing it.
Note that Euler integration is not very accurate, and you generally need to use a much smaller step size than what's required by more sophisticated integration algorithms. As DYZ mentions, with your current step size the calculation diverges before the loop finishes.
Here's a modified version of your code using a smaller step size.
T = 100
dt = 0.00001
steps = int(T / dt)
substeps = int(steps / T)
# Recalculate `dt` to compensate for possible truncation
# in the `steps` and `substeps` calculations
dt = 1.0 / substeps
print('steps, substeps, dt:', steps, substeps, dt)
a = 0.3025
C0 = 0.5
mu = 50
#dj = -mu*(J**3 - (C - C0)*J - F)
#dc = J + C*F + a*J**2
#df = J*F - C
# Initial condition
j = 0.1
c = -0.5
f = 0.1
jlst, clst, flst = [j], [c], [f]
for i in range(T):
    for _ in range(substeps):
        j1 = j + (-mu * (j**3 - (c - C0)*j - f))*dt
        c1 = c + (j + c * f + (a * j)**2)*dt
        f1 = f + (j * f - c)*dt
        j, c, f = j1, c1, f1
    jlst.append(j)
    clst.append(c)
    flst.append(f)
def round_seq(seq, places=6):
    return [round(u, places) for u in seq]
print('j:', round_seq(jlst), end='\n\n')
print('c:', round_seq(clst), end='\n\n')
print('f:', round_seq(flst), end='\n\n')
output
steps, substeps, dt: 10000000 100000 1e-05
j: [0.1, 0.585459, 1.26718, 3.557956, -1.311867, -0.647698, -0.133683, 0.395812, 0.964856, 3.009683, -2.025674, -1.047722, -0.48872, 0.044296, 0.581284, 1.245423, 14.725407, -1.715456, -0.907364, -0.372118, 0.167733, 0.705257, 1.511711, -3.588555, -1.476817, -0.778593, -0.253874, 0.289294, 0.837128, 1.985792, -2.652462, -1.28088, -0.657113, -0.132971, 0.409071, 0.983504, 3.229393, -2.1809, -1.113977, -0.539586, -0.009829, 0.528546, 1.156086, 8.23469, -1.838582, -0.967078, -0.423261, 0.113883, 0.650319, 1.381138, 12.045565, -1.575015, -0.833861, -0.305952, 0.23632, 0.778052, 1.734888, -2.925769, -1.362437, -0.709641, -0.186249, 0.356775, 0.917051, 2.507782, -2.367126, -1.184147, -0.590753, -0.063942, 0.476121, 1.07614, 5.085211, -1.976542, -1.029395, -0.474206, 0.059772, 0.596505, 1.273214, 17.083466, -1.682855, -0.890842, -0.357555, 0.182944, 0.721096, 1.554496, -3.331861, -1.450497, -0.763182, -0.239007, 0.30425, 0.85435, 2.076595, -2.584081, -1.258788, -0.642362, -0.117774, 0.423883, 1.003181, 3.521072, -2.132709, -1.094792, -0.525123]
c: [-0.5, -0.302644, 0.847742, 12.886781, 0.177404, -0.423405, -0.569541, -0.521669, -0.130084, 7.97828, -0.109606, -0.363033, -0.538874, -0.61005, -0.506872, 0.05076, 216.678959, -0.198445, -0.408569, -0.566869, -0.603713, -0.451729, 0.58959, 2.252504, -0.246645, -0.451, -0.588697, -0.587898, -0.375758, 2.152898, -0.087229, -0.295185, -0.49006, -0.603411, -0.562389, -0.263696, 8.901196, -0.132332, -0.342969, -0.525087, -0.609991, -0.526417, -0.077251, 67.082608, -0.177771, -0.389092, -0.555341, -0.607658, -0.47794, 0.293664, 147.817033, -0.225425, -0.432796, -0.579951, -0.595996, -0.412269, 1.235928, -0.037058, -0.273963, -0.473412, -0.597912, -0.574782, -0.318837, 4.581828, -0.113301, -0.3222, -0.51029, -0.608168, -0.543547, -0.172371, 24.718184, -0.157526, -0.369151, -0.542732, -0.609811, -0.500922, 0.09504, 291.915024, -0.204371, -0.414, -0.56993, -0.602265, -0.443622, 0.700005, 0.740665, -0.25268, -0.456048, -0.590933, -0.585265, -0.36427, 2.528225, -0.093699, -0.301181, -0.494644, -0.60469, -0.558516, -0.245806, 10.941068, -0.137816, -0.348805, -0.52912]
f: [0.1, 0.68085, 1.615135, 1.01107, -2.660947, -0.859348, -0.134789, 0.476782, 1.520241, 4.892319, -9.514924, -2.041217, -0.61413, 0.060247, 0.792463, 2.510586, 11.393914, -6.222736, -1.559576, -0.438133, 0.200729, 1.033274, 3.348756, -39.664752, -4.304545, -1.201378, -0.282146, 0.349631, 1.331995, 4.609547, -20.169056, -3.104072, -0.923759, -0.138225, 0.513633, 1.716341, 6.739864, -11.717002, -2.307614, -0.699883, 7.4e-05, 0.700823, 2.22957, 11.017447, -7.434886, -1.751919, -0.512171, 0.138566, 0.922012, 2.9434, -30.549886, -5.028825, -1.346261, -0.348547, 0.282981, 1.19254, 3.987366, -26.554232, -3.566328, -1.0374, -0.200198, 0.439487, 1.535198, 5.645421, -14.674838, -2.619369, -0.792589, -0.060175, 0.615387, 1.985246, 8.779969, -8.991742, -1.972575, -0.590788, 0.077534, 0.820118, 2.599728, 8.879606, -5.928246, -1.509453, -0.417854, 0.218635, 1.066761, 3.477148, -36.053938, -4.124934, -1.163178, -0.263755, 0.369033, 1.37438, 4.811848, -18.741635, -2.987496, -0.893457, -0.120864, 0.535433, 1.771958, 7.117055, -11.027021, -2.227847, -0.674889]
That takes about 75 seconds on my old 2GHz machine.
Using dt = 0.000005 (which takes almost 2 minutes on this machine) the final values of j, c, and f are -0.524774, -0.529217, -0.674293, respectively, so it looks like we're beginning to get convergence.
Thanks to LutzL for pointing out that dt may need adjusting because of the rounding in the steps and substeps calculations.
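The slow convergence seen above is typical of explicit Euler, whose global error shrinks only in proportion to the step size. A small sketch on a much simpler test problem illustrates this; the equation y' = -y with y(0) = 1 is an assumption picked here because its exact solution exp(-t) is known, and it is not the system from the question.

```python
import math

def euler_decay(n_steps):
    """Integrate y' = -y, y(0) = 1 up to t = 1 with explicit Euler."""
    dt = 1.0 / n_steps
    y = 1.0
    for _ in range(n_steps):
        y += -y * dt
    return y

exact = math.exp(-1.0)
errors = [abs(euler_decay(n) - exact) for n in (10, 20, 40, 80)]

# Halving the step size should roughly halve the error (first-order method)
ratios = [errors[k] / errors[k + 1] for k in range(len(errors) - 1)]
print(ratios)
```

Each ratio should come out close to 2, so gaining one more correct digit costs about ten times as many steps, which is why the run times above grow so quickly.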