Quadratic n term equation using multiindex - python

I have two DFs which I would like to use to calculate the following:
w(ti,ti)*a(ti)^2 + w(tj,tj)*b(tj)^2 + 2*w(ti,tj)*a(ti)*b(tj)
The above uses two terms (a, b).
w is the weight DataFrame; its index (i) and columns (j) correspond to the Tn index level of a and b.
Set Up - Edit dynamic W
import pandas as pd
import numpy as np
I = ['i'+ str(i) for i in range(4)]
Q = ['q' + str(i) for i in range(5)]
T = ['t' + str(i) for i in range(3)]
n = 100
df1 = pd.DataFrame({'I': [I[np.random.randint(len(I))] for i in range(n)],
'Q': [Q[np.random.randint(len(Q))] for i in range(n)],
'Tn': [T[np.random.randint(len(T))] for i in range(n)],
'V': np.random.rand(n)}).groupby(['I','Q','Tn']).sum()
df1.head(5)
I Q Tn V
i0 q0 t0 1.626799
t2 1.725374
q1 t0 2.155340
t1 0.479741
t2 1.039178
w = np.random.randn(len(T),len(T))
w = (w*w.T)/2
np.fill_diagonal(w,1)
W = pd.DataFrame(w, columns = T, index = T)
W
t0 t1 t2
t0 1.000000 0.029174 -0.045754
t1 0.029174 1.000000 0.233330
t2 -0.045754 0.233330 1.000000
Effectively I would like to use the index Tn in df1 to use the above equation for every I and Q.
The end result for df1.loc['i0','q0'] in the example above should be:
W(t0,t0) * V(t0)^2
+ W(t2,t2) * V(t2)^2
+ 2 * W(t0,t2) * V(t0) * V(t2)
=
1.0 * 1.626799**2
+ 1.0 * 1.725374**2
+ 2 * (-0.045754) * 1.626799 * 1.725374
The end result for df1.loc['i0','q1'] in the example above should be:
W(t0,t0) * V(t0)^2
+ W(t1,t1) * V(t1)^2
+ W(t2,t2) * V(t2)^2
+ 2 * W(t0,t1) * V(t0) * V(t1)
+ 2 * W(t0,t2) * V(t0) * V(t2)
+ 2 * W(t2,t1) * V(t1) * V(t2)
=
1.0 * 2.155340**2
+ 1.0 * 0.479741**2
+ 1.0 * 1.039178**2
+ 2 * 0.029174 * 2.155340 * 0.479741
+ 2 * (-0.045754) * 2.155340 * 1.039178
+ 2 * 0.233330 * 0.479741 * 1.039178
This pattern repeats depending on the number of Tn terms in each Q, so the solution should be robust enough to handle as many Tn terms as needed (the example uses 3, but there could be 100 or more).
Each result should then be saved in a new DF with Index = [I, Q]
The solution should also not be slower than Excel as n increases.
Thanks in advance

One way is to first reindex df1 with all possible combinations of the lists I, Q and T using pd.MultiIndex.from_product, filling the missing values in column 'V' with 0. The column then has len(I)*len(Q)*len(T) elements, and you can reshape its values so that each row corresponds to one combination of I and Q:
ar = (df1.reindex(pd.MultiIndex.from_product([I,Q,T], names=['I','Q','Tn']),fill_value=0)
.values.reshape(-1,len(T)))
To see the relation between my input df1 and ar, here are some related rows
print (df1.head(6))
V
I Q Tn
i0 q0 t1 1.123666
q1 t0 0.538610
t1 2.943206
q2 t0 0.570990
t1 0.617524
t2 1.413926
print (ar[:3])
[[0. 1.1236656 0. ]
[0.53861027 2.94320574 0. ]
[0.57099049 0.61752408 1.4139263 ]]
Now, to perform the multiplication with the elements of W, one way is to take the row-wise outer product of ar with itself, which gives a len(T)*len(T) matrix for each row. For example, the second row:
[0.53861027 2.94320574 0. ]
becomes
[[0.29010102, 1.58524083, 0. ], #0.29010102 = 0.53861027**2, 1.58524083 = 0.53861027*2.94320574 ...
[1.58524083, 8.66246003, 0. ],
[0. , 0. , 0. ]]
Several methods are possible, such as ar[:,:,None]*ar[:,None,:] or np.einsum with the right subscripts: np.einsum('ij,ik->ijk',ar,ar). Both give the same result.
The next step can be done with np.tensordot, specifying the right axes. With ar and W as inputs:
print (np.tensordot(np.einsum('ij,ik->ijk',ar,ar),W.values,axes=([1,2],[0,1])))
array([ 1.26262437, 15.29352438, 15.94605435, ...
To check for the second value here, 1*0.29010102 + 1*8.66246003 + 2.*2*1.58524083 == 15.29352438 (where 1 is W(t0,t0) and W(t1,t1), 2 is W(t0,t1))
Finally, to create the dataframe as expected, use again pd.MultiIndex.from_product:
new_df = pd.DataFrame({'col1': np.tensordot(np.einsum('ij,ik->ijk',ar,ar),
W.values,axes=([1,2],[0,1]))},
index=pd.MultiIndex.from_product([I,Q], names=['I','Q']))
print (new_df.head(3))
col1
I Q
i0 q0 1.262624
q1 15.293524
q2 15.946054
...
Note: if you are SURE that every element of T appears at least once in the last level of df1, ar can be obtained with unstack: ar=df1.unstack(fill_value=0).values. But I would suggest using the reindex method above to prevent any error.
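For reference, here is a self-contained sketch of the whole pipeline. It is a variant rather than the exact code above: it folds the outer product and the tensordot into a single np.einsum call, and it uses a seeded random generator (my assumption, only for reproducibility). The names full_index, quad and expected are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # hypothetical seed, only for reproducibility

I = ['i' + str(k) for k in range(4)]
Q = ['q' + str(k) for k in range(5)]
T = ['t' + str(k) for k in range(3)]
n = 100

df1 = (pd.DataFrame({'I': rng.choice(I, n),
                     'Q': rng.choice(Q, n),
                     'Tn': rng.choice(T, n),
                     'V': rng.random(n)})
       .groupby(['I', 'Q', 'Tn']).sum())

w = rng.standard_normal((len(T), len(T)))
w = (w * w.T) / 2              # symmetric off-diagonal weights, as in the question
np.fill_diagonal(w, 1)
W = pd.DataFrame(w, columns=T, index=T)

# One row per (I, Q) pair, one column per Tn; missing terms become 0.
full_index = pd.MultiIndex.from_product([I, Q, T], names=['I', 'Q', 'Tn'])
ar = df1.reindex(full_index, fill_value=0)['V'].to_numpy().reshape(-1, len(T))

# Quadratic form sum_jk ar[i, j] * W[j, k] * ar[i, k] for every row i.
quad = np.einsum('ij,jk,ik->i', ar, W.to_numpy(), ar)

new_df = pd.DataFrame({'col1': quad},
                      index=pd.MultiIndex.from_product([I, Q], names=['I', 'Q']))

# Brute-force check of the first row against the formula in the question.
v = ar[0]
expected = sum(W.iloc[p, q] * v[p] * v[q]
               for p in range(len(T)) for q in range(len(T)))
```

Because W is symmetric, summing over every (j, k) pair counts each cross term twice, which is where the factor 2 in the question's formula comes from. The single einsum also avoids building the intermediate len(T)*len(T) matrix for each row, which may matter if T grows to 100 or more.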

Related

Geometric series: calculate quotient and number of elements from sum and first & last element

Creating evenly spaced numbers on a log scale (a geometric progression) can easily be done for a given base and number of elements if the starting and final values of the sequence are known, e.g., with numpy.logspace and numpy.geomspace. Now assume I want to define the geometric progression the other way around, i.e., based on the properties of the resulting geometric series. If I know the sum of the series as well as the first and last element of the progression, can I compute the quotient and number of elements?
For instance, assume the first and last elements of the progression are 1 and 15 and the sum of the series should be equal to 50. I know from trial and error that it works out for n=9 and r≈1.404, but how could these values be computed?
You have enough information to solve it:
Sum of series = a + a*r + a*(r^2) ... + a*(r^(n-1))
= a*((r^n)-1)/(r-1)
= ((last element * r) - a)/(r-1)    (since last element = a*(r^(n-1)), so a*(r^n) = last element * r)
Given the sum of series, a, and the last element, you can use the above equation to find the value of r.
Plugging in values for the given example:
50 = 1 * ((15*r)-1) / (r-1)
50r - 50 = 15r - 1
35r = 49
r = 1.4
Then, using sum of series = a*((r^n)-1)/(r-1):
50 = 1*((1.4^n)-1)/(1.4-1)
21 = 1.4^n
n = log(21)/log(1.4) = 9.04
You can approximate n and recalculate r if n isn't an integer.
We have to reconstruct the geometric progression, i.e. obtain a, q, m (here ^ means raising to a power):
a, a * q, a * q^2, ..., a * q^(m - 1)
if we know first, last, total:
first = a # first item
last = a * q^(m - 1) # last item
total = a * (q^m - 1) / (q - 1) # sum
Solving these equation we can find
a = first
q = (total - first) / (total - last)
m = log(last / a) / log(q) + 1
Here m is the number of items, called n below.
Code:
import math
...
def Solve(first, last, total):
    a = first
    q = (total - first) / (total - last)
    n = math.log(last / a) / math.log(q) + 1
    return (a, q, n)
If you put your data (1, 15, 50) you'll get the solution
a = 1
q = 1.4
n = 9.04836151801382 # not integer
Since n is not an integer, you probably want to adjust: let last == 15 be exact and let total vary. In this case q = (last / first) ^ (1 / (n - 1)) and total = first * (q ^ n - 1) / (q - 1)
a = 1
q = 1.402850552006674
n = 9
total = 49.752 # now n is integer, but total <> 50
You have to solve the following two equations for r and n:
a:= An / Ao = r^(n - 1)
and
s:= Sn / Ao = (r^n - 1) / (r - 1)
You can eliminate n by
s = (r a - 1) / (r - 1)
and solve for r. Then n follows by log(a) / log(r) + 1.
In your case, from s = 50 and a = 15, we obtain r = 7/5 = 1.4 and n = 9.048...
It makes sense to round n to 9, but then r^8 = 15 (r ~ 1.40285) and r = 1.4 are not quite compatible.
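The two steps discussed in these answers (solve for the ratio and the count, then round the count and recompute) can be sketched as below. The function names solve_geometric and snap_to_integer are made up for illustration, and the sketch assumes first != last and total != last so that no division by zero occurs.

```python
import math

def solve_geometric(first, last, total):
    """Return (r, n) for a geometric series from its first term, last term and sum."""
    r = (total - first) / (total - last)
    n = math.log(last / first) / math.log(r) + 1
    return r, n

def snap_to_integer(first, last, n):
    """Round n, then recompute ratio and sum so that first and last stay exact."""
    n_int = round(n)
    r = (last / first) ** (1 / (n_int - 1))
    total = first * (r ** n_int - 1) / (r - 1)
    return r, n_int, total

r, n = solve_geometric(1, 15, 50)
r_adj, n_adj, total_adj = snap_to_integer(1, 15, n)
print(r, n)                     # ratio 1.4, n about 9.048
print(r_adj, n_adj, total_adj)  # ratio about 1.40285, n = 9, total about 49.752
```

Keeping first and last exact and letting the sum move is only one possible choice; you could just as well keep the sum fixed and let the last element vary.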

Constructing a 2D surface from a function that contains multiplication of dataframe elements

I have 2 tables (DataFrames), each with 2 columns. Let's say M1["a1","b1"] and M2["a2","b2"].
(M1 and M2 actually refer to the same csv. I describe them as two tables only because of the function G below.)
I also have a function G = a1*b1*a2*b2 + a1*b1.
Just to make my question more clear, I would write function G as G(n,m) = a1(n)*b1(n)*a2(m)*b2(m) + a1(n)*b1(n) and would like to mention that (a1[n],b1[n]) always come in fixed pair, i.e., there is no (a1[3],b1[5]).
Later, I want to plot this function G with n on the x-axis and m on the y-axis.
The value of G itself will be on the z-axis.
The final purpose is to find which (a,b) pair gives the minimum value of G, if any.
How should I write function G in Python?
Writing the following simply gives me an error.
for n in a1,b1:
for m in a2,b2:
G = a1*b1*a2*b2 + a1*b1 #works, but the result consists only 1 column
G[:,:] = a1[n]*b1[n]*a2[m]*b2[m] + a1[n]*b1[n] #error
print(G)
I used simpler variable names above to simplify the post.
Here is my real code.
NMOS_gm_gmid = pandas.read_csv('NMOS_gm_gmid.csv', sep=',' , encoding='UTF-8')
NMOS_gm_gmid = NMOS_gm_gmid.apply(pandas.to_numeric, errors='coerce')
NMOS_ro_gmid = pandas.read_csv('NMOS_ro_gmid.csv', sep=',' , encoding='UTF-8')
NMOS_ro_gmid = NMOS_ro_gmid.apply(pandas.to_numeric, errors='coerce')
gm1 = NMOS_gm_gmid.iloc[:10,2]
ro1 = 1 / NMOS_ro_gmid.iloc[:10,2] * 1e6
gm2 = NMOS_gm_gmid.iloc[:10,2]
ro2 = 1 / NMOS_ro_gmid.iloc[:10,2] * 1e6
Gm = gm1*ro1*gm2*ro2 + gm1*ro1
You can use NumPy broadcasting here:
# M1, M2 from the same dataframe:
a1,b1,a2,b2 = df[['a1','b1','a2','b2']].to_numpy().T
# G[n, m] = a1[n]*b1[n]*a2[m]*b2[m] + a1[n]*b1[n]
G = (a1*b1)[:,None] * (a2*b2 + 1)
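To make the broadcasting concrete, here is a small sketch that builds the surface term by term from G(n,m) as defined in the question and then locates its minimum, which was the stated goal. The sample values are hypothetical stand-ins for the CSV columns, and p, q, n_min and m_min are illustrative names.

```python
import numpy as np

# Hypothetical sample values; in the question they come from the CSV columns.
a1 = np.array([1.0, 2.0, 3.0])
b1 = np.array([0.5, 0.25, 2.0])
a2 = np.array([4.0, 5.0])
b2 = np.array([1.0, 0.1])

p = a1 * b1    # a1(n)*b1(n), shape (N,)
q = a2 * b2    # a2(m)*b2(m), shape (M,)

# G[n, m] = p[n]*q[m] + p[n] = p[n] * (q[m] + 1); p becomes a column so
# that it broadcasts against the row q + 1 and yields an (N, M) array.
G = p[:, None] * (q + 1.0)

# Row and column of the smallest value on the surface.
n_min, m_min = np.unravel_index(np.argmin(G), G.shape)
print(G)
print(n_min, m_min, G[n_min, m_min])
```

If several cells tie for the minimum, np.argmin reports the first one in row-major order. The same G array can be handed to a surface plot with n and m as the two axes.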

Solving system of nonlinear equations with Python, Euler's formula

I am working on solving and analyzing a system of differential equations in Python. First I solved it with scipy.integrate's dopri5 and SciPy's odeint, which worked out fine. Then I tried to solve the equations using Euler's method. The equations and code are as follows:
dj = -mu*(J**3 - (C - C0)*J - F)
dc = J + C*F + a*J**2
df = J*F - C
T = 100
dt = 0.001
t = np.linspace(0, T, int(T/dt)+1)
j = np.zeros(len(t))
c = np.zeros(len(t))
f = np.zeros(len(t))
# Initial condition
j[0] = 0.1
c[0] = -0.5
f[0] = 0.1
a = 0.3025
C0 = 0.5
mu = 50
for i in range(len(t)):
j[i+1] = j[i] + (-mu * (j[i]**3 - (c[i] - C0)*j[i] - f[i]))*dt
c[i+1] = c[i] + (j[i] + c[i] * f[i] + (a * j[i])**2)*dt
f[i+1] = f[i] + (j[i] * f[i] - c[i])*dt
Is there any reason why Euler's method should not work when the other two do?
In the first iteration, i is 0, and your first line of the loop essentially is:
j[0] = j[-1] + (-mu * (j[-1]**3 - (c[-1] - C0)*j[-1] - f[-1]))*dt
j[-1] is the last element of j, just like c[-1] is the last element of c, etc. Initially they are all zeros, so j[0] becomes a 0, too, which overwrites the initial conditions. To fix this problem, change range(len(t)) to range(1,len(t)). (The model diverges after the first 9200 steps, anyway.)
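A minimal sketch of the corrected loop, assuming the update reads the previous step at index i - 1. The horizon T is shortened to 1 here (my assumption) so that the run stays well inside the range where this step size is stable; jp, cp and fp are illustrative names for the previous values.

```python
import numpy as np

T = 1            # shortened horizon (an assumption) so the run stays stable
dt = 0.001
t = np.linspace(0, T, int(T / dt) + 1)

j = np.zeros(len(t))
c = np.zeros(len(t))
f = np.zeros(len(t))

# Initial condition
j[0], c[0], f[0] = 0.1, -0.5, 0.1

a, C0, mu = 0.3025, 0.5, 50

# Start at 1 so that index i - 1 never wraps around to the last element.
for i in range(1, len(t)):
    jp, cp, fp = j[i - 1], c[i - 1], f[i - 1]
    j[i] = jp + (-mu * (jp**3 - (cp - C0) * jp - fp)) * dt
    # (a * jp)**2 is kept as in the question's code; the stated equation reads a*J**2.
    c[i] = cp + (jp + cp * fp + (a * jp)**2) * dt
    f[i] = fp + (jp * fp - cp) * dt

print(j[:3], c[:3], f[:3])
```

Reading all three previous values before writing any new one keeps this a plain explicit Euler step, with no variable updated from an already-updated neighbour.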
As DYZ says, your calculation is incorrect on the first loop iteration because j[-1] is the last element of j, which you've initialised to zero.
However, your code wastes a lot of RAM. I assume you just want arrays containing T results, plus the initial values, rather than the results calculated on every step. The code below achieves that via a double for loop. We aren't really getting any benefit from Numpy in this code, so I don't bother importing it.
Note that Euler integration is not very accurate, and you generally need to use a much smaller step size than what's required by more sophisticated integration algorithms. As DYZ mentions, with your current step size the calculation diverges before the loop finishes.
Here's a modified version of your code using a smaller step size.
T = 100
dt = 0.00001
steps = int(T / dt)
substeps = int(steps / T)
# Recalculate `dt` to compensate for possible truncation
# in the `steps` and `substeps` calculations
dt = 1.0 / substeps
print('steps, substeps, dt:', steps, substeps, dt)
a = 0.3025
C0 = 0.5
mu = 50
#dj = -mu*(J**3 - (C - C0)*J - F)
#dc = J + C*F + a*J**2
#df = J*F - C
# Initial condition
j = 0.1
c = -0.5
f = 0.1
jlst, clst, flst = [j], [c], [f]
for i in range(T):
    for _ in range(substeps):
        j1 = j + (-mu * (j**3 - (c - C0)*j - f))*dt
        c1 = c + (j + c * f + (a * j)**2)*dt
        f1 = f + (j * f - c)*dt
        j, c, f = j1, c1, f1
    jlst.append(j)
    clst.append(c)
    flst.append(f)

def round_seq(seq, places=6):
    return [round(u, places) for u in seq]
print('j:', round_seq(jlst), end='\n\n')
print('c:', round_seq(clst), end='\n\n')
print('f:', round_seq(flst), end='\n\n')
output
steps, substeps, dt: 10000000 100000 1e-05
j: [0.1, 0.585459, 1.26718, 3.557956, -1.311867, -0.647698, -0.133683, 0.395812, 0.964856, 3.009683, -2.025674, -1.047722, -0.48872, 0.044296, 0.581284, 1.245423, 14.725407, -1.715456, -0.907364, -0.372118, 0.167733, 0.705257, 1.511711, -3.588555, -1.476817, -0.778593, -0.253874, 0.289294, 0.837128, 1.985792, -2.652462, -1.28088, -0.657113, -0.132971, 0.409071, 0.983504, 3.229393, -2.1809, -1.113977, -0.539586, -0.009829, 0.528546, 1.156086, 8.23469, -1.838582, -0.967078, -0.423261, 0.113883, 0.650319, 1.381138, 12.045565, -1.575015, -0.833861, -0.305952, 0.23632, 0.778052, 1.734888, -2.925769, -1.362437, -0.709641, -0.186249, 0.356775, 0.917051, 2.507782, -2.367126, -1.184147, -0.590753, -0.063942, 0.476121, 1.07614, 5.085211, -1.976542, -1.029395, -0.474206, 0.059772, 0.596505, 1.273214, 17.083466, -1.682855, -0.890842, -0.357555, 0.182944, 0.721096, 1.554496, -3.331861, -1.450497, -0.763182, -0.239007, 0.30425, 0.85435, 2.076595, -2.584081, -1.258788, -0.642362, -0.117774, 0.423883, 1.003181, 3.521072, -2.132709, -1.094792, -0.525123]
c: [-0.5, -0.302644, 0.847742, 12.886781, 0.177404, -0.423405, -0.569541, -0.521669, -0.130084, 7.97828, -0.109606, -0.363033, -0.538874, -0.61005, -0.506872, 0.05076, 216.678959, -0.198445, -0.408569, -0.566869, -0.603713, -0.451729, 0.58959, 2.252504, -0.246645, -0.451, -0.588697, -0.587898, -0.375758, 2.152898, -0.087229, -0.295185, -0.49006, -0.603411, -0.562389, -0.263696, 8.901196, -0.132332, -0.342969, -0.525087, -0.609991, -0.526417, -0.077251, 67.082608, -0.177771, -0.389092, -0.555341, -0.607658, -0.47794, 0.293664, 147.817033, -0.225425, -0.432796, -0.579951, -0.595996, -0.412269, 1.235928, -0.037058, -0.273963, -0.473412, -0.597912, -0.574782, -0.318837, 4.581828, -0.113301, -0.3222, -0.51029, -0.608168, -0.543547, -0.172371, 24.718184, -0.157526, -0.369151, -0.542732, -0.609811, -0.500922, 0.09504, 291.915024, -0.204371, -0.414, -0.56993, -0.602265, -0.443622, 0.700005, 0.740665, -0.25268, -0.456048, -0.590933, -0.585265, -0.36427, 2.528225, -0.093699, -0.301181, -0.494644, -0.60469, -0.558516, -0.245806, 10.941068, -0.137816, -0.348805, -0.52912]
f: [0.1, 0.68085, 1.615135, 1.01107, -2.660947, -0.859348, -0.134789, 0.476782, 1.520241, 4.892319, -9.514924, -2.041217, -0.61413, 0.060247, 0.792463, 2.510586, 11.393914, -6.222736, -1.559576, -0.438133, 0.200729, 1.033274, 3.348756, -39.664752, -4.304545, -1.201378, -0.282146, 0.349631, 1.331995, 4.609547, -20.169056, -3.104072, -0.923759, -0.138225, 0.513633, 1.716341, 6.739864, -11.717002, -2.307614, -0.699883, 7.4e-05, 0.700823, 2.22957, 11.017447, -7.434886, -1.751919, -0.512171, 0.138566, 0.922012, 2.9434, -30.549886, -5.028825, -1.346261, -0.348547, 0.282981, 1.19254, 3.987366, -26.554232, -3.566328, -1.0374, -0.200198, 0.439487, 1.535198, 5.645421, -14.674838, -2.619369, -0.792589, -0.060175, 0.615387, 1.985246, 8.779969, -8.991742, -1.972575, -0.590788, 0.077534, 0.820118, 2.599728, 8.879606, -5.928246, -1.509453, -0.417854, 0.218635, 1.066761, 3.477148, -36.053938, -4.124934, -1.163178, -0.263755, 0.369033, 1.37438, 4.811848, -18.741635, -2.987496, -0.893457, -0.120864, 0.535433, 1.771958, 7.117055, -11.027021, -2.227847, -0.674889]
That takes about 75 seconds on my old 2GHz machine.
Using dt = 0.000005 (which takes almost 2 minutes on this machine) the final values of j, c, and f are -0.524774, -0.529217, -0.674293, respectively, so it looks like we're beginning to get convergence.
Thanks to LutzL for pointing out that dt may need adjusting because of the rounding in the steps and substeps calculations.

getting dot products from numpy arrays and tuples simultaneously

I am trying to combine different parts of arrays and tuples to generate a series of products. Here is the tuple 'i':
i=(2,5)
Here is the first matrix 'w':
w=[array([[-1.95446441, 1.53904854, -0.3461807 ],
[-0.19153855, -1.63290931, -1.76897156]]),
array([[ 0.25648535],
[ 0.20186475],
[ 0.78002102]])]
Here is the second matrix 'b':
b=[array([[-0.02676943],
[ 0.25294377],
[-0.43625132]]),
array([[ 0.07763943]])]
I am trying to make a series of products from various parts of these datastructures in a list of lists or matrix called 'a'.
The list of these products should be equivalent to:
a[0][0] = (w[0][0][0]*i[0]) + (w[0][1][0]*i[1]) + b[0][0]
a[0][1] = (w[0][0][1]*i[0]) + (w[0][1][1]*i[1]) + b[0][1]
a[0][2] = (w[0][0][2]*i[0]) + (w[0][1][2]*i[1]) + b[0][2]
a[1][0] = (w[1][0] * a[0][0]) + (w[1][1] * a[0][1]) + (w[1][2] * a[0][2]) + b[1][0]
I am trying to use this as part of a neural network and have written a version that works perfectly well using iteration. However I am new to numpy and would like to build a matrix based version of this. The problem I am having is more to do with understanding the numpy syntax to perform the operation above. I tried adapting this from an online tutorial but am not sure where to go from here.
for b, w in zip(b, w):
layer = sigmoid(np.dot(w, layer)+b.T)
a.append(layer)
This throws an error:
ValueError: shapes (2,3) and (1,3) not aligned: 3 (dim 1) != 1 (dim 0)
Any pointers would be very helpful?
For a start let's split your 2 variables, w and b. They aren't really arrays, they are lists of arrays with different shapes
w0 = array([[-1.95446441, 1.53904854, -0.3461807 ],
[-0.19153855, -1.63290931, -1.76897156]])
w1 = array([[ 0.25648535],
[ 0.20186475],
[ 0.78002102]])
b0 = array([[-0.02676943],
[ 0.25294377],
[-0.43625132]])
b1 = array([[ 0.07763943]])
Maybe later you can iterate over them as 2 element lists, but for now that just complicates things.
Now your a calculation simplifies to:
a0[0] = w0[0,0]*i[0] + w0[1,0]*i[1] + b0[0]
a0[1] = w0[0,1]*i[0] + w0[1,1]*i[1] + b0[1]
a0[2] = w0[0,2]*i[0] + w0[1,2]*i[1] + b0[2]
a1[0] = w1[0]* a0[0] + w1[1]*a0[1] + w1[2]*a0[2] + b1[0]
which further simplifies to:
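a0 = w0[0,:]*i[0] + w0[1,:]*i[1] + b0[:,0]
a1 = np.sum(w1[:,0]*a0) + b1[:,0]
(b0 and w1 have shape (3,1); taking column 0 keeps the sums from broadcasting to a 3x3 array.)
or
I0 = np.array([i]).T
a0 = np.sum(w0*I0, axis=0) + b0[:,0]
Those sums could be turned into dots; I think this works:
a0 = np.dot(w0.T,i) + b0[:,0]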
But I doubt if it's much of an improvement.
You can't calculate a0 and a1 together, since the one uses the other. But you could cast it as an iteration like (not tested):
I0 = ...
w = [w0,w1]
b = [b0,b1]
a = [None,None]
for k in range(...):
    # I0 is a column vector, shape (n, 1)
    a[k] = np.sum(w[k]*I0, axis=0) + b[k][:,0]
    I0 = a[k][:,None]
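Here is a runnable sketch of that iteration using the arrays from the question. It is a variant, not the loop above: every quantity stays a column vector, so the (3,1) and (1,1) biases add without any reshaping. There is no sigmoid, to match the a[...] formulas in the question. The name layer is borrowed from the question's snippet.

```python
import numpy as np

i = (2, 5)
w = [np.array([[-1.95446441, 1.53904854, -0.3461807],
               [-0.19153855, -1.63290931, -1.76897156]]),
     np.array([[0.25648535],
               [0.20186475],
               [0.78002102]])]
b = [np.array([[-0.02676943],
               [0.25294377],
               [-0.43625132]]),
     np.array([[0.07763943]])]

layer = np.array(i, dtype=float).reshape(-1, 1)   # input as a (2, 1) column
a = []
for wk, bk in zip(w, b):
    # (n_in, n_out).T @ (n_in, 1) -> (n_out, 1), same shape as the bias bk
    layer = wk.T @ layer + bk
    a.append(layer)

print(a[0].ravel())   # the three values a[0][0], a[0][1], a[0][2]
print(a[1].ravel())   # the single value a[1][0]
```

The ValueError in the question comes from np.dot(w, layer) with w of shape (2,3): the weights are stored as (inputs, outputs), so they need transposing before multiplying a column of inputs.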

How to multiply arrays in pandas?

I have two arrays
x = [a,b,c]
y = [5,6,7]
I want to calculate the product such that the result of x * y is
x[0]* 5 + x[1] * 6 + x[2] * 7
Actually this is part of constraints equation that I have to form for optimization using scipy and pandas.
Also,
I have many numpy arrays that I created after reading a csv file. And I want to create my objective function on run time.
Here is the hard coded form of the objective function
def objFunc(x,sign=1.0) :
"""return sign*(sum(coeff[0:] *(wf[0:] + wv[0:] * decisionVars[0:])**power.values[0:]))"""
return sign* (( coeff.values[0]*(wf.values[0]+ wv.values[0] *x[0])**power.values[0] ) +
(coeff.values[1]*(wf.values[1]+ wv.values[1] *x[0])**power.values[1])+
(coeff.values[2]*(wf.values[2]+ wv.values[2] *x[0])**power.values[2]) +
(coeff.values[3]*(wf.values[3]+ wv.values[3] *x[0])**power.values[3]) +
(coeff.values[4]*(wf.values[4]+ wv.values[4] *x[0])**power.values[4] )+
(coeff.values[5]*(wf.values[5]+ wv.values[5] *x[0])**power.values[5]) +
(coeff.values[6]*(wf.values[6]+ wv.values[6] *x[1])**power.values[6]) +
(coeff.values[7]*(wf.values[7]+ wv.values[7] *x[1])**power.values[7]) +
(coeff.values[8]*(wf.values[8]+ wv.values[8] *x[1])**power.values[8]) +
(coeff.values[9]*(wf.values[9]+ wv.values[9] *x[2])**power.values[9]) +
(coeff.values[10]*(wf.values[10]+ wv.values[10] *x[2])**power.values[10]) +
(coeff.values[11]*(wf.values[11]+ wv.values[11] *x[2])**power.values[11]))
I used various ways to calculate it but to no avail.
df = pd.DataFrame.from_csv('C:\Users\prashant.mudgal\Downloads\T1 - Copy.csv')
df2 = pd.DataFrame.from_csv('C:\Users\prashant.mudgal\Downloads\T2.csv')
decisionVars= df2['DV']
coeff = df2['coef']
"""subset for power"""
power = df2['p']
wf = df2['weight_f']
wv = df2['weight_v']
def objFunc(x,sign=1.0) :
return sign*(sum(coeff[0:] *(wf[0:] + wv[0:] * decisionVars[0:])**power.values[0:]))
This works out of the box:
In [9]: df = pd.DataFrame([[1,5],[2,6],[3,7]], columns=list('ab'))
In [10]: df
Out[10]:
a b
0 1 5
1 2 6
2 3 7
In [11]: df.a * df.b
Out[11]:
0 5
1 12
2 21
dtype: int64
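The element-wise product still has to be summed to get x[0]*5 + x[1]*6 + x[2]*7, and the same idea removes the hard-coded terms from the objective function. The sketch below uses hypothetical stand-in data in place of T2.csv. The var_index array (which entry of x each term uses) is my assumption about the layout; in the hard-coded version it would be six 0s, three 1s and three 2s.

```python
import numpy as np
import pandas as pd

# Dot product: multiply element-wise, then sum.
df = pd.DataFrame([[1, 5], [2, 6], [3, 7]], columns=list('ab'))
dot = (df.a * df.b).sum()          # same value as df.a.dot(df.b)

# Hypothetical stand-ins for the columns read from T2.csv.
coeff = pd.Series([1.0, 2.0, 3.0, 4.0])
wf = pd.Series([0.5, 0.5, 1.0, 1.0])
wv = pd.Series([1.0, 2.0, 1.0, 2.0])
power = pd.Series([2.0, 1.0, 2.0, 1.0])
# var_index[k] says which entry of x term k uses (assumed layout).
var_index = np.array([0, 0, 1, 1])

def obj_func(x, sign=1.0):
    """sign * sum_k coeff[k] * (wf[k] + wv[k] * x[var_index[k]]) ** power[k]"""
    x = np.asarray(x, dtype=float)
    terms = coeff.to_numpy() * (wf.to_numpy() + wv.to_numpy() * x[var_index]) ** power.to_numpy()
    return sign * terms.sum()

print(dot)                     # 38
print(obj_func([1.0, 2.0]))    # 54.25
```

Because obj_func takes the decision variables as its first argument and returns a scalar, it should be usable as the objective for scipy.optimize.minimize, with the sign argument passed through args.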
