I have two arrays
x = [a,b,c]
y = [5,6,7]
I want to calculate the product such that the result of x * y is
x[0]* 5 + x[1] * 6 + x[2] * 7
Actually, this is part of a constraint equation that I have to form for an optimization using scipy and pandas.
Also,
I have many numpy arrays that I created after reading a CSV file, and I want to create my objective function at run time.
Here is the hard-coded form of the objective function:
def objFunc(x, sign=1.0):
    """return sign*(sum(coeff[0:] * (wf[0:] + wv[0:] * decisionVars[0:]) ** power.values[0:]))"""
    return sign * ((coeff.values[0] * (wf.values[0] + wv.values[0] * x[0]) ** power.values[0]) +
                   (coeff.values[1] * (wf.values[1] + wv.values[1] * x[0]) ** power.values[1]) +
                   (coeff.values[2] * (wf.values[2] + wv.values[2] * x[0]) ** power.values[2]) +
                   (coeff.values[3] * (wf.values[3] + wv.values[3] * x[0]) ** power.values[3]) +
                   (coeff.values[4] * (wf.values[4] + wv.values[4] * x[0]) ** power.values[4]) +
                   (coeff.values[5] * (wf.values[5] + wv.values[5] * x[0]) ** power.values[5]) +
                   (coeff.values[6] * (wf.values[6] + wv.values[6] * x[1]) ** power.values[6]) +
                   (coeff.values[7] * (wf.values[7] + wv.values[7] * x[1]) ** power.values[7]) +
                   (coeff.values[8] * (wf.values[8] + wv.values[8] * x[1]) ** power.values[8]) +
                   (coeff.values[9] * (wf.values[9] + wv.values[9] * x[2]) ** power.values[9]) +
                   (coeff.values[10] * (wf.values[10] + wv.values[10] * x[2]) ** power.values[10]) +
                   (coeff.values[11] * (wf.values[11] + wv.values[11] * x[2]) ** power.values[11]))
I have tried various ways to calculate it, but to no avail.
df = pd.read_csv(r'C:\Users\prashant.mudgal\Downloads\T1 - Copy.csv')
df2 = pd.read_csv(r'C:\Users\prashant.mudgal\Downloads\T2.csv')
decisionVars = df2['DV']
coeff = df2['coef']
# subset for power
power = df2['p']
wf = df2['weight_f']
wv = df2['weight_v']

def objFunc(x, sign=1.0):
    return sign * (sum(coeff[0:] * (wf[0:] + wv[0:] * decisionVars[0:]) ** power.values[0:]))
This works out of the box:
In [9]: df = pd.DataFrame([[1,5],[2,6],[3,7]], columns=list('ab'))
In [10]: df
Out[10]:
a b
0 1 5
1 2 6
2 3 7
In [11]: df.a * df.b
Out[11]:
0 5
1 12
2 21
dtype: int64
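For the sum of products itself, the element-wise result only needs to be summed, and the same idea lets the objective function be built at run time instead of hard-coding twelve terms. Below is a minimal sketch with dummy coefficient columns standing in for the ones read from T2.csv; the dv_index mapping of coefficient rows to decision variables is an assumption taken from the indices used in the hard-coded version (rows 0-5 -> x[0], 6-8 -> x[1], 9-11 -> x[2]):
import numpy as np
import pandas as pd

# sum of element-wise products (a dot product)
df = pd.DataFrame([[1, 5], [2, 6], [3, 7]], columns=list('ab'))
print((df.a * df.b).sum())     # 1*5 + 2*6 + 3*7 = 38
print(np.dot(df.a, df.b))      # same result

# run-time objective function, with dummy columns in place of df2['coef'] etc.
rows = 12
coeff = pd.Series(np.random.rand(rows))
wf = pd.Series(np.random.rand(rows))
wv = pd.Series(np.random.rand(rows))
power = pd.Series(np.random.randint(1, 3, rows))
dv_index = np.array([0]*6 + [1]*3 + [2]*3)   # assumed row -> decision variable map

def objFunc(x, sign=1.0):
    x = np.asarray(x)
    return sign * np.sum(coeff.values
                         * (wf.values + wv.values * x[dv_index]) ** power.values)

print(objFunc([0.1, 0.2, 0.3]))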
Related
I was trying to solve a random problem and used a relation that I worked out by hand. When I implemented it in Python, it gave me different results than the ones I calculated, so I tried to change it.
The thing is, I don't understand how Python evaluates each one.
These two expressions sometimes give different results:
((column+1)//2) * ((row+1)//2)
= (column+1)//2 * (row+1)//2
Here's an example:
rows, columns = 4, 4
for row in range(2, rows + 1):
    for column in range(1, columns + 1):
        print('*' * 15)
        result = ((column + 1) // 2) * ((row + 1) // 2)
        f_result = (column + 1) // 2 * (row + 1) // 2
        print('>> normal expression:', (column + 1) // 2, (row + 1) // 2)
        print('>> second expression:', ((column + 1) // 2), ((row + 1) // 2))
        print('>> row:', row)
        print('>> column:', column)
        print('>> Results:', result, f_result)
        print()
The last two entries in the results:
***************
>> normal expression: 2 2
>> second expression: 2 2
>> row: 4
>> column: 3
>> Results: 4 5
***************
>> normal expression: 2 2
>> second expression: 2 2
>> row: 4
>> column: 4
>> Results: 4 5
You need to understand operator precedence and associativity first.
In Python, * and // have the same precedence and are evaluated left to right, which is exactly why the two expressions can differ.
Now for the expressions
((col+1)//2) * ((row+1)//2)   versus   (col+1)//2 * (row+1)//2
((col+1)//2) * ((row+1)//2) = ((4+1)//2) * ((4+1)//2)
                            = (5//2) * (5//2)
                            = 2 * 2
                            = 4
(col+1)//2 * (row+1)//2 = (4+1)//2 * (4+1)//2
                        = 5//2 * 5//2
                        = 2 * 5//2      (the leftmost // is evaluated first)
                        = 10//2         (* and // have the same precedence, so evaluation continues left to right)
                        = 5
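A quick way to see this left-to-right evaluation is to add the parentheses explicitly:
row = column = 4
print(((column + 1) // 2) * ((row + 1) // 2))    # (5//2) * (5//2) = 2 * 2 = 4
print((column + 1) // 2 * (row + 1) // 2)        # ((5//2) * 5) // 2 = 10 // 2 = 5
print((((column + 1) // 2) * (row + 1)) // 2)    # the same grouping made explicit: 5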
I have 2 tables (DataFrames), each with 2 columns. Let's say M1["a1","b1"] and M2["a2","b2"].
(M1 and M2 actually refer to the same CSV; I just describe them as two tables because of the function G below.)
I also have a function G = a1*b1*a2*b2 + a1*b1.
Just to make my question more clear, I would write function G as G(n,m) = a1(n)*b1(n)*a2(m)*b2(m) + a1(n)*b1(n) and would like to mention that (a1[n],b1[n]) always come in fixed pair, i.e., there is no (a1[3],b1[5]).
Later, I want to plot this function G with n corresponds with x-axis and m corresponds with y-axis.
The value of G itself will correspond with z-axis.
The final purpose is to find which (a, b) pair gives the minimum value of G, if any.
How should I write the function G in Python?
Writing the following simply gives me an error.
for n in a1,b1:
    for m in a2,b2:
        G = a1*b1*a2*b2 + a1*b1  # works, but the result consists of only 1 column
        G[:,:] = a1[n]*b1[n]*a2[m]*b2[m] + a1[n]*b1[n]  # error
print(G)
I used simpler variable names above to simplify the post.
Here is my real code.
NMOS_gm_gmid = pandas.read_csv('NMOS_gm_gmid.csv', sep=',' , encoding='UTF-8')
NMOS_gm_gmid = NMOS_gm_gmid.apply(pandas.to_numeric, errors='coerce')
NMOS_ro_gmid = pandas.read_csv('NMOS_ro_gmid.csv', sep=',' , encoding='UTF-8')
NMOS_ro_gmid = NMOS_ro_gmid.apply(pandas.to_numeric, errors='coerce')
gm1 = NMOS_gm_gmid.iloc[:10,2]
ro1 = 1 / NMOS_ro_gmid.iloc[:10,2] * 1e6
gm2 = NMOS_gm_gmid.iloc[:10,2]
ro2 = 1 / NMOS_ro_gmid.iloc[:10,2] * 1e6
Gm = gm1*ro1*gm2*ro2 + gm1*ro1
You can use numpy broadcasting here:
# M1, M2 from the same dataframe:
a1, b1, a2, b2 = df[['a1','b1','a2','b2']].to_numpy().T
# G[n, m] = a1[n]*b1[n]*a2[m]*b2[m] + a1[n]*b1[n] = a1[n]*b1[n] * (a2[m]*b2[m] + 1)
G = (a1 * b1)[:, None] * (a2 * b2 + 1)
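A minimal, self-contained usage sketch (with made-up numbers standing in for the gm/ro columns read from the CSVs), including how to locate the (n, m) pair that minimizes G:
import numpy as np
import pandas as pd

# dummy data in place of the columns read from NMOS_gm_gmid.csv / NMOS_ro_gmid.csv
df = pd.DataFrame({'a1': [1.0, 2.0, 3.0], 'b1': [0.5, 0.2, 0.1],
                   'a2': [1.0, 2.0, 3.0], 'b2': [0.5, 0.2, 0.1]})

a1, b1, a2, b2 = df[['a1','b1','a2','b2']].to_numpy().T
G = (a1 * b1)[:, None] * (a2 * b2 + 1)        # G[n, m]

n_min, m_min = np.unravel_index(np.argmin(G), G.shape)
print(G)
print('minimum', G[n_min, m_min], 'at n =', n_min, ', m =', m_min)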
I have a data frame like the one below, to which I need to apply the following transformations:
Keep only 3 decimal places for the first number (the one before the comma), and also keep only 3 decimal places for the second number.
The comma should be replaced by :
If a number has only two decimal places, add one extra zero to make it 3 decimal places.
Input
df
[151.20732,-33.86785]
[81.67732,-09.86]
[1.2890,43.8]
[567.200,33.867]
[557.21,33.86]
Expected Output
151.207:-33.867
81.677:-09.860
1.289:43.800
567.200:33.867
557.210:33.860
How can this be done in pandas?
This is harder than I thought.
def func(y, n):
    # truncate y toward zero to n decimal places, then format with 3 decimals
    if y < 0:
        return "%0.3f" % (-(y * 10 ** n // -1 / 10 ** n))
    else:
        return "%0.3f" % (y * 10 ** n // 1 / 10 ** n)

df.apply(lambda x: ':'.join([func(y, 3) for y in x]))
Out[86]:
0 151.207:-33.867
1 81.677:-9.860
2 1.289:43.800
3 567.200:33.867
4 557.210:33.860
dtype: object
Input
data = [[151.20732,-33.86785],
[81.67732,-09.86],
[1.2890,43.8],
[567.200,33.867],
[557.21,33.86]]
df = pd.Series(data)
DataFrame Input Option 1:
data = [[[151.20732,-33.86785]],
[[81.67732,-09.86]],
[[1.2890,43.8]],
[[567.200,33.867]],
[[557.21,33.86]]]
df = pd.DataFrame(data, columns=['geo'])
DataFrame Input Option 2:
literal_eval is used when reading a CSV file that contains a list; otherwise the list is read in as a single string.
import ast
literal = lambda x: ast.literal_eval(x)
data = pd.read_csv('/Test_data.csv', converters={'geo.geometry.coordinates': literal})
df = pd.DataFrame(data, columns=['geo.geometry.coordinates'])
df.rename(columns = {'geo.geometry.coordinates':'geo'}, inplace = True)
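For illustration (with a made-up cell value), this is what literal_eval does with a cell that would otherwise stay a plain string:
import ast

cell = '[151.20732, -33.86785]'       # the cell as read without a converter
print(ast.literal_eval(cell))         # [151.20732, -33.86785] -> a real Python list
print(type(ast.literal_eval(cell)))   # <class 'list'>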
Algorithm:
import math

def trunc(f, d):
    # Truncate float (f) to d decimal places, unless NaN
    return 'nan' if math.isnan(f) else f"{int(f*10**d)/10**d:0.{d}f}"

df['geo_neo'] = df.apply(lambda r: trunc(r['geo'][0], 3) + ':'
                                   + trunc(r['geo'][1], 3), axis=1)
DataFrame Output:
geo geo_neo
0 [151.20732, -33.86785] 151.207:-33.867
1 [81.67732, -9.86] 81.677:-9.860
2 [1.289, 43.8] 1.289:43.800
3 [567.2, 33.867] 567.200:33.867
4 [557.21, 33.86] 557.210:33.860
So I'm having the following problem:
I have a dataframe like the one below where time_diff_float is the time difference between each row and the row above, in minutes. So, for example, I had value = 4 20 minutes after value = 1.
value | time_diff_float
1 NaN
4 20
3 13
2 55
5 08
7 15
First I have to check if the time difference between two rows is < 60 (one hour) and create a column using the formula rem = value (from the row above) * lambda ** time difference between the 2 rows. My lambda is a constant with the value 0.97.
Then, if the time difference between each row and the row 2 rows above is still less than 60, I have to do the same thing comparing each row with the row 2 rows above. Then the same thing comparing with 3 rows above, and so on.
To do that I wrote the following code:
df.loc[df['time_diff_float'] < 60, 'rem_1'] = df['value'].shift() * (lambda_ ** (df['time_diff_float'] - 1))
df.loc[df['time_diff_float'] + df['time_diff_float'].shift() < 60, 'rem_2'] = df['value'].shift(2) * (lambda_ ** (df['time_diff_float'] + df['time_diff_float'].shift() - 1))
df.loc[df['time_diff_float'] + df['time_diff_float'].shift() + df['time_diff_float'].shift(2) < 60, 'rem_3'] = df['value'].shift(3) * (lambda_ ** (df['time_diff_float'] + df['time_diff_float'].shift() + df['time_diff_float'].shift(2) - 1))
My question is: since I have to re-do this at least 10 times (even more) with the real values I have, is there a way to create the "rem columns" dynamically?
Thanks in advance!
You can keep a running sum of the time differences and update it on every iteration of the loop:
n = 3
for i in range(1, n + 1):
    if i == 1:
        cum_diff = df['time_diff_float']
    else:
        # add the time difference from i-1 rows above; avoid += so the
        # original column is not modified in place
        cum_diff = cum_diff + df['time_diff_float'].shift(i - 1)
    df.loc[cum_diff < 60, 'rem_' + str(i)] = (
        df['value'].shift(i) * (lambda_ ** (cum_diff - 1)))
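For reference, a small self-contained run with the sample data from the question (lambda_ = 0.97, as stated above), wrapping the loop in a helper so the number of rem columns stays a parameter:
import numpy as np
import pandas as pd

def add_rem_columns(df, lambda_, n):
    # add rem_1 ... rem_n, comparing each row with the rows 1..n positions above
    cum_diff = df['time_diff_float'].copy()
    for i in range(1, n + 1):
        if i > 1:
            cum_diff = cum_diff + df['time_diff_float'].shift(i - 1)
        df.loc[cum_diff < 60, 'rem_' + str(i)] = (
            df['value'].shift(i) * (lambda_ ** (cum_diff - 1)))
    return df

df = pd.DataFrame({'value': [1, 4, 3, 2, 5, 7],
                   'time_diff_float': [np.nan, 20, 13, 55, 8, 15]})
print(add_rem_columns(df, lambda_=0.97, n=3))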
I have two DFs which I would like to use to calculate the following:
w(ti,ti)*a(ti)^2 + w(tj,tj)*b(tj)^2 + 2*w(ti,tj)*a(ti)*b(tj)
The above uses two terms (a,b).
w is the weight df where i and j are index and column spaces pertaining to the Tn index of a and b.
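Written out for all the terms of one group at once, this is the quadratic form v' W v, where v holds the V values of the group and Tn levels missing from the group count as zero (this is also what the answer below computes for every (I, Q) group in one go). A minimal sketch with made-up numbers close to the example:
import numpy as np
import pandas as pd

T = ['t0', 't1', 't2']
W = pd.DataFrame([[1.00, 0.03, -0.05],
                  [0.03, 1.00, 0.23],
                  [-0.05, 0.23, 1.00]], index=T, columns=T)

# V values of one (I, Q) group, with the absent t1 treated as 0
v = pd.Series([1.626799, 0.0, 1.725374], index=T)

# v' W v == sum_i W(ti,ti)*V(ti)^2 + 2 * sum_{i<j} W(ti,tj)*V(ti)*V(tj)
print(v.values @ W.values @ v.values)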
Set Up - Edit dynamic W
import pandas as pd
import numpy as np
I = ['i'+ str(i) for i in range(4)]
Q = ['q' + str(i) for i in range(5)]
T = ['t' + str(i) for i in range(3)]
n = 100
df1 = pd.DataFrame({'I': [I[np.random.randint(len(I))] for i in range(n)],
                    'Q': [Q[np.random.randint(len(Q))] for i in range(n)],
                    'Tn': [T[np.random.randint(len(T))] for i in range(n)],
                    'V': np.random.rand(n)}).groupby(['I','Q','Tn']).sum()
df1.head(5)
I Q Tn V
i0 q0 t0 1.626799
t2 1.725374
q1 t0 2.155340
t1 0.479741
t2 1.039178
w = np.random.randn(len(T),len(T))
w = (w*w.T)/2
np.fill_diagonal(w,1)
W = pd.DataFrame(w, columns = T, index = T)
W
t0 t1 t2
t0 1.000000 0.029174 -0.045754
t1 0.029174 1.000000 0.233330
t2 -0.045754 0.233330 1.000000
Effectively, I would like to use the index Tn in df1 to apply the above equation for every I and Q.
The end result for df1.loc['i0','q0'] in the example above should be:
W(t0,t0) * V(t0)^2
+ W(t2,t2) * V(t2)^2
+ 2 * W(t0,t2) * V(t0) * V(t2)
=
1.0 * 1.626799**2
+ 1.0 * 1.725374**2
+ 2 * (-0.045754) * 1.626799 * 1.725374
The end result for df1.loc['i0','q1'] in the example above should be:
W(t0,t0) * V(t0)^2
+ W(t1,t1) * V(t1)^2
+ W(t2,t2) * V(t2)^2
+ 2 * W(t0,t1) * V(t0) * V(t1)
+ 2 * W(t0,t2) * V(t0) * V(t2)
+ 2 * W(t2,t1) * V(t1) * V(t2)
=
1.0 * 2.155340**2
+ 1.0 * 0.479741**2
+ 1.0 * 1.039178**2
+ 2 * 0.029174 * 2.155340 * 0.479741
+ 2 * (-0.045754) * 2.155340 * 1.039178
+ 2 * 0.233330 * 0.479741 * 1.039178
This pattern repeats depending on the number of Tn terms in each Q, so the solution should be robust enough to handle as many Tn terms as needed (in the example I use 3, but it could be as many as 100 or more).
Each result should then be saved in a new DF with Index = [I, Q].
The solution should also not be slower than Excel as n increases.
Thanks in advance
One way could be to first reindex your dataframe df1 with all the possible combinations of the lists I, Q and Tn using pd.MultiIndex.from_product, filling the missing values in the column 'V' with 0. The column then has len(I)*len(Q)*len(T) elements. You can then reshape the values so that each row corresponds to one combination of I and Q, such as:
ar = (df1.reindex(pd.MultiIndex.from_product([I,Q,T], names=['I','Q','Tn']),fill_value=0)
.values.reshape(-1,len(T)))
To see the relation between my input df1 and ar, here are some related rows
print (df1.head(6))
V
I Q Tn
i0 q0 t1 1.123666
q1 t0 0.538610
t1 2.943206
q2 t0 0.570990
t1 0.617524
t2 1.413926
print (ar[:3])
[[0. 1.1236656 0. ]
[0.53861027 2.94320574 0. ]
[0.57099049 0.61752408 1.4139263 ]]
Now, to perform the multiplication with the elements of W, one way is to create the outer product of ar with itself, but row-wise, to get a len(T)*len(T) matrix for each row. For example, the second row:
[0.53861027 2.94320574 0. ]
becomes
[[0.29010102, 1.58524083, 0. ], #0.29010102 = 0.53861027**2, 1.58524083 = 0.53861027*2.94320574 ...
[1.58524083, 8.66246003, 0. ],
[0. , 0. , 0. ]]
Several methods are possible, such as ar[:,:,None]*ar[:,None,:] or np.einsum with the right subscripts: np.einsum('ij,ik->ijk',ar,ar). Both give the same result.
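For example, on a small random ar both expressions agree:
import numpy as np

ar = np.random.rand(4, 3)                    # 4 (I, Q) groups, 3 Tn levels
outer_a = ar[:, :, None] * ar[:, None, :]    # row-wise outer products
outer_b = np.einsum('ij,ik->ijk', ar, ar)
print(np.allclose(outer_a, outer_b))         # True
print(outer_a.shape)                         # (4, 3, 3)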
The next step can be done with np.tensordot, specifying the right axes. So with ar and W as input, you do:
print (np.tensordot(np.einsum('ij,ik->ijk',ar,ar),W.values,axes=([1,2],[0,1])))
array([ 1.26262437, 15.29352438, 15.94605435, ...
To check for the second value here, 1*0.29010102 + 1*8.66246003 + 2.*2*1.58524083 == 15.29352438 (where 1 is W(t0,t0) and W(t1,t1), 2 is W(t0,t1))
Finally, to create the dataframe as expected, use again pd.MultiIndex.from_product:
new_df = pd.DataFrame({'col1': np.tensordot(np.einsum('ij,ik->ijk',ar,ar),
W.values,axes=([1,2],[0,1]))},
index=pd.MultiIndex.from_product([I,Q], names=['I','Q']))
print (new_df.head(3))
col1
I Q
i0 q0 1.262624
q1 15.293524
q2 15.946054
...
Note: if you are SURE that each element of T appears at least once in the last level of df1, ar can be obtained with unstack, such as ar = df1.unstack(fill_value=0).values. But I would suggest using the reindex method above to prevent any error.
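As an extra sanity check (not part of the original approach), the same quadratic form can be recomputed group by group with a plain groupby and compared against the tensordot result; this assumes the df1, T, W and new_df objects defined above:
import numpy as np

def group_quadratic(s):
    # v' W v for one (I, Q) group; Tn levels absent from the group count as 0
    v = s.droplevel(['I', 'Q']).reindex(T, fill_value=0).values
    return v @ W.values @ v

check = (df1.groupby(level=['I', 'Q'])['V'].apply(group_quadratic)
            .reindex(new_df.index, fill_value=0))
print(np.allclose(check.values, new_df['col1'].values))   # True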