I have a dataframe with a value called rate_multiplier that I need to look up and compare with the rate_multiplier I get from an AWS S3 bucket.
To look up the rate_multiplier in the dataframe, I need to take the random variables I created for a "person" and match them against the dataframe, which gives a specific rate_multiplier for each combination of characteristics.
For example:
Random variables created:
Life_term = 25
Gender = F
Rate_class = Best-AB
Age = 27
Coverage = 2310
Dataframe:
Life_term Benefit_Length_months Gender Rate_class Age State Coverage band (low end) Coverage band (high end) Coverage Band Rate_multiplier
0 15 180 M Best-AA 18 Default 500 1199 500-1199 2.31
1 15 180 M Best-AA 19 Default 500 1199 500-1199 2.21
2 15 180 M Best-AA 20 Default 500 1199 500-1199 2.11
3 15 180 M Best-AA 21 Default 500 1199 500-1199 2.03
4 15 180 M Best-AA 22 Default 500 1199 500-1199 1.95
... ... ... ... ... ... ... ... ... ... ...
34987 10 120 F Nicotine Average-CD 61 Default 3600 10000 3600+ 19.10
34988 10 120 F Nicotine Average-CD 62 Default 3600 10000 3600+ 21.27
34989 10 120 F Nicotine Average-CD 63 Default 3600 10000 3600+ 23.44
34990 10 120 F Nicotine Average-CD 64 Default 3600 10000 3600+ 25.61
34991 10 120 F Nicotine Average-CD 65 Default 3600 10000 3600+ 27.78
So for this example, my randomly generated person would get a rate_multiplier of:
0.93
My code is as follows:
rate_mult_df.loc[(rate_mult_df['Life_term'] == 15) & (rate_mult_df['Gender'] == 'F') & (rate_mult_df['Rate_class'] == 'Best-AB') & (rate_mult_df['Age'] == 27) & (rate_mult_df['Coverage band (low end)'] <= 2310) & (rate_mult_df['Coverage band (high end)'] >= 2310)]
Is this the right way to grab the rate_multiplier for the randomly generated person, or is there an easier way? Any and all help is appreciated. Please let me know if my question is not clear enough; I'm working on that every day.
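I'd use .query(): it is easier to read, and on large frames it can be faster (pandas evaluates it with numexpr when that package is installed). Note that the coverage has to fall inside the band, so the band columns are compared with <= and >=, not ==:
rate_multiplier = df.query(
    "Life_term == 15 &"
    " Gender == 'F' &"
    " Rate_class == 'Best-AB' &"
    " Age == 27 &"
    " `Coverage band (low end)` <= 2310 &"
    " `Coverage band (high end)` >= 2310"
)["Rate_multiplier"].squeeze()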
"Easier" depends on your workflow. For example, if you want to build the query from a dictionary, you could use:
def build_query(search_params: dict, coverage: float) -> str:
    # exact match on each attribute; backticks allow column names with spaces
    clauses = [f"`{k}` == {v!r}" for k, v in search_params.items()]
    # the coverage has to fall inside the band, not equal its endpoints
    clauses.append(f"`Coverage band (low end)` <= {coverage}")
    clauses.append(f"`Coverage band (high end)` >= {coverage}")
    return " and ".join(clauses)
random_person = {
    "Life_term": 15, "Gender": "F", "Rate_class": "Best-AB", "Age": 27
}
rate_multiplier = float(df.query(build_query(random_person, 2310))["Rate_multiplier"].squeeze())
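A self-contained sketch of the same lookup as a reusable function. The frame below is invented (only the column names come from the question, and the middle coverage band is made up), and lookup_rate_multiplier is a hypothetical helper. Raising an error when zero or several rows match is a deliberate choice: .squeeze() would otherwise hand back an empty or multi-row Series, which is easy to miss when comparing against the value from S3.

```python
import pandas as pd

# Invented example rows; only the column names match the question's dataframe.
rate_mult_df = pd.DataFrame({
    "Life_term": [15, 15, 15],
    "Gender": ["F", "F", "F"],
    "Rate_class": ["Best-AB", "Best-AB", "Best-AB"],
    "Age": [27, 27, 27],
    "Coverage band (low end)": [500, 1200, 3600],
    "Coverage band (high end)": [1199, 3599, 10000],
    "Rate_multiplier": [1.10, 0.93, 0.85],
})


def lookup_rate_multiplier(df: pd.DataFrame, person: dict, coverage: float) -> float:
    """Return the single Rate_multiplier matching `person` whose band contains `coverage`."""
    mask = pd.Series(True, index=df.index)
    for column, value in person.items():
        mask &= df[column] == value  # exact match on each attribute
    # coverage must lie inside [low end, high end]
    mask &= (df["Coverage band (low end)"] <= coverage) & (df["Coverage band (high end)"] >= coverage)
    matches = df.loc[mask, "Rate_multiplier"]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one matching row, found {len(matches)}")
    return float(matches.iloc[0])


person = {"Life_term": 15, "Gender": "F", "Rate_class": "Best-AB", "Age": 27}
print(lookup_rate_multiplier(rate_mult_df, person, 2310))
```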
I have a dataframe and need to implement the following.
I will run this script every month, so it should automatically pick the number of days based on the extracted date.
Input Dataframe
client_id expo_value value cal_value extracted_date
1 126 30 27.06 08/2022
2 135 60 36.18 08/2022
3 144 120 45 08/2022
4 162 30 54.09 08/2022
5 153 90 63.63 08/2022
6 181 120 72.9 08/2022
Expected Output Dataframe
client_id expo_value value cal_value extracted_date Output_Value
1 126 30 27.06 08/2022 126+26.18 = 152.18
2 135 60 36.18 08/2022 261.29+70.02 = 331.31
3 144 120 45 08/2022 557.4+174.19 = 731.59
4 162 30 54.09 08/2022 156.7+ 52.34 = 209.04
5 153 90 63.63 08/2022 444.19+ 182.9 =627.09
6 181 120 72.9 08/2022 700.64+282.19=982.83
I want to use 31, 30 or 28 days inside the function below. I tried manually entering 31 (days) in the calculation, but it should automatically pick the number of days the month has.
import numpy as np
def month_data(data):
    # 31 is hard-coded here; it should be the number of days in the month
    if (data['value'] <= 30).any():
        return data['expo_value'] * 30 / 31 + data['cal_value'] * 45 / 31
    elif (data['value'] <= 60).any():
        return data['expo_value'] * 60 / 31 + data['cal_value'] * 90 / 31
    elif (data['value'] <= 90).any():
        return data['expo_value'] * 100 / 31 + data['cal_value'] * 120 / 31
    else:
        return np.nan
Let me see if I understood you correctly. I tried to reproduce a small subset of your dataframe (you should do this next time you post something). The answer is as follows:
import pandas as pd
from datetime import datetime
import calendar
# I'll make a subset dataframe based on your example
data = [[30, '02/2022'], [60, '08/2022']]
df = pd.DataFrame(data, columns=['value', 'extracted_date'])
# First, turn the extracted_date column into a correct date format
date_correct_format = [datetime.strptime(i, '%m/%Y') for i in df['extracted_date']]
# Second, calculate the number of days per month
num_days = [calendar.monthrange(i.year, i.month)[1] for i in date_correct_format]
num_days
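A sketch of the remaining step, putting the day counts into the calculation. It assumes the tier multipliers from month_data (30/45, 60/90, 100/120) are the ones you want; the Output_Value column in the question appears to follow a different formula, so adjust the multipliers if needed. It also replaces the .any() tests, which look at the whole column at once rather than at each row, with a row-wise np.select, and uses pandas' dt.days_in_month instead of calendar.monthrange. The rows are adapted from the question, with one date and one value changed (invented) so every branch is exercised.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "client_id": [1, 2, 3, 4],
    "expo_value": [126, 135, 144, 181],
    "value": [30, 60, 90, 150],  # 150 is invented, to hit the NaN branch
    "cal_value": [27.06, 36.18, 45.0, 72.9],
    "extracted_date": ["02/2022", "08/2022", "08/2022", "08/2022"],  # 02/2022 is invented
})

# Days in the month of each row's extracted_date (28 for 02/2022, 31 for 08/2022)
days = pd.to_datetime(df["extracted_date"], format="%m/%Y").dt.days_in_month

# Row-wise tiers: the first matching condition wins, like the if/elif chain
conditions = [df["value"] <= 30, df["value"] <= 60, df["value"] <= 90]
choices = [
    (df["expo_value"] * 30 + df["cal_value"] * 45) / days,
    (df["expo_value"] * 60 + df["cal_value"] * 90) / days,
    (df["expo_value"] * 100 + df["cal_value"] * 120) / days,
]
df["Output_Value"] = np.select(conditions, choices, default=np.nan)
print(df)
```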
The data I used look like this
data
Subject 2000_X1 2000_X2 2001_X1 2001_X2 2002_X1 2002_X2
1 100 50 120 45 110 50
2 95 40 100 45 105 50
3 110 45 100 45 110 40
I want to calculate the year-over-year growth of each variable, so the result looks like this:
Subject 2001_X1_gro 2001_X2_gro 2002_X1_gro 2002_X2_gro
1 0.2 -0.1 -0.08333 0.11111
2 0.052632 0.125 0.05 0.11111
3 -0.09091 0 0.1 -0.11111
I currently do it manually for each variable and each year with code like this:
data['2001_X1_gro'] = (data['2001_X1'] - data['2000_X1']) / data['2000_X1']
data['2002_X1_gro'] = (data['2002_X1'] - data['2001_X1']) / data['2001_X1']
data['2001_X2_gro'] = (data['2001_X2'] - data['2000_X2']) / data['2000_X2']
data['2002_X2_gro'] = (data['2002_X2'] - data['2001_X2']) / data['2001_X2']
Is there a way to do this more efficiently, especially if I have more years and/or more variables?
import pandas as pd
df = pd.read_csv('data.txt', sep=',', header=0)
Input
Subject 2000_X1 2000_X2 2001_X1 2001_X2 2002_X1 2002_X2
0 1 100 50 120 45 110 50
1 2 95 40 100 45 105 50
2 3 110 45 100 45 110 40
Next, a loop is created and the columns are filled:
qqq = '_gro'
new_name = ''
year = ''
for i in range(1, len(df.columns) - 2):
    # e.g. '2000_X1' -> '2001_X1'
    year = str(int(df.columns[i][:4]) + 1) + df.columns[i][4:]
    new_name = year + qqq
    df[new_name] = (df[year] - df[df.columns[i]]) / df[df.columns[i]]
print(df)
Output
Subject 2000_X1 2000_X2 2001_X1 2001_X2 2002_X1 2002_X2 2001_X1_gro \
0 1 100 50 120 45 110 50 0.200000
1 2 95 40 100 45 105 50 0.052632
2 3 110 45 100 45 110 40 -0.090909
2001_X2_gro 2002_X1_gro 2002_X2_gro
0 -0.100 -0.083333 0.111111
1 0.125 0.050000 0.111111
2 0.000 0.100000 -0.111111
In the loop, the year is extracted from the column name, converted to int, and 1 is added to it. The result is converted back to a string and the suffix (e.g. '_X1') is re-attached, which gives the name of the matching column for the next year. The string '_gro' is appended to that to form new_name, and a new column with that name is filled with the calculated growth values.
If you want growth over a longer span, for example three years, add 3 instead of 1 (and stop the loop three years' worth of columns earlier). This assumes your columns are ordered by year. Also note that the loop does not visit every column: for i in range(1, len(df.columns) - 2): skips the Subject column and stops before the last two columns, i.e. the final year, which has no following year to compare with. That is, you need to know where to stop it; with more variables per year, the 2 becomes the number of variables.
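A variant that avoids hard-coding the loop bounds: group the columns by variable, sort each group by year, and compute the growth between each pair of consecutive years. This is a sketch assuming every column other than Subject is named <year>_<variable>; add_growth_columns is a hypothetical helper, and if a year is missing it simply compares against the previous available year.

```python
import pandas as pd


def add_growth_columns(df: pd.DataFrame, id_col: str = "Subject") -> pd.DataFrame:
    """Return a copy of df with one '<year>_<var>_gro' column per variable and consecutive year pair.

    Assumes every column except id_col is named '<year>_<variable>', e.g. '2001_X1'.
    """
    # Collect (year, column) pairs per variable: {'X1': [(2000, '2000_X1'), ...], ...}
    by_variable: dict[str, list[tuple[int, str]]] = {}
    for col in df.columns:
        if col == id_col:
            continue
        year, var = col.split("_", 1)
        by_variable.setdefault(var, []).append((int(year), col))

    out = df.copy()
    for year_cols in by_variable.values():
        year_cols.sort()  # chronological, whatever the column order in df
        for (_, prev), (_, curr) in zip(year_cols, year_cols[1:]):
            out[f"{curr}_gro"] = (df[curr] - df[prev]) / df[prev]
    return out


data = pd.DataFrame({
    "Subject": [1, 2, 3],
    "2000_X1": [100, 95, 110], "2000_X2": [50, 40, 45],
    "2001_X1": [120, 100, 100], "2001_X2": [45, 45, 45],
    "2002_X1": [110, 105, 110], "2002_X2": [50, 50, 40],
})
result = add_growth_columns(data)
print(result.filter(like="_gro"))
```

Adding a year or a variable only means adding columns; the function picks them up from the names.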
I have a dataframe with a sample of the employee survey results as shown below. The values in the delta columns are just the difference between the FY21 and FY20 columns.
Employee leadership_fy21 leadership_fy20 leadership_delta comms_fy21 comms_fy20 comms_delta
patrick.t#abc.com 88 50 38 90 80 10
johnson.g#abc.com 22 82 -60 80 90 -10
pamela.u#abc.com 41 94 -53 44 60 -16
yasmine.a#abc.com 90 66 24 30 10 20
I'd like to create new columns that
i. contain the fy21 values followed by a % sign, and
ii. merge them with the delta columns so that the delta values appear in parentheses ().
example output would be:
Employee leadership_fy21 leadership_delta leadership_final comms_fy21 comms_delta comms_final
patrick.t#abc.com 88 38 88% (38) 90 10 90% (10)
johnson.g#abc.com 22 -60 22% (-60) 80 -10 80% (-10)
pamela.u#abc.com 41 -53 41% (-53) 44 -16 44% (-16)
yasmine.a#abc.com 90 24 90% (24) 30 20 30% (20)
I have tried the following code, but it doesn't work. It might have to do with NumPy not being able to concatenate strings. I'd appreciate any help I can get, thank you.
#create a list of all the rating columns
ratingcollist = ['leadership','comms','wellbeing','teamwork']
#create a for loop to get all the columns that match the column list
for rat in ratingcollist:
    cols = df.filter(like=rat).columns
    fy21cols = df[cols].filter(like='_fy21').columns
    deltacols = df[cols].filter(like='_delta').columns
    if len(cols) > 0:
        df[f'{rat.lower()}final'] = (df[fy21cols].values.astype(str) + '%' + '(' + df[deltacols].values.astype(str) + ')')
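You can do this:
def yourfunction(ratingcol):
    # pick the <rating>_fy21 and <rating>_delta columns for one rating
    x = df.filter(regex=f'{ratingcol}(_delta|_fy21)')
    fy = x.filter(regex='_fy21').iloc[:, 0].astype(str)
    delta = x.filter(regex='_delta').iloc[:, 0].astype(str)
    return fy + "% (" + delta + ")"
yourfunction('leadership')
0 88% (38)
1 22% (-60)
2 41% (-53)
3 90% (24)
Then, using a for loop, you can create your columns. Ratings with no columns in df (wellbeing and teamwork in the sample) are skipped, since iloc[:, 0] fails on an empty selection:
for i in ratingcollist:
    if not df.filter(like=i).empty:
        df[f"{i}_final"] = yourfunction(i)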
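A self-contained sketch that builds the same columns for the sample data. It addresses the columns by exact name instead of with filter (which matches substrings), assuming the <rating>_fy21 / <rating>_delta naming holds for every rating; ratings missing from the frame are skipped rather than raising. Concatenating pandas Series of strings works element-wise, which sidesteps the NumPy string-array issue from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Employee": ["patrick.t#abc.com", "johnson.g#abc.com", "pamela.u#abc.com", "yasmine.a#abc.com"],
    "leadership_fy21": [88, 22, 41, 90],
    "leadership_fy20": [50, 82, 94, 66],
    "leadership_delta": [38, -60, -53, 24],
    "comms_fy21": [90, 80, 44, 30],
    "comms_fy20": [80, 90, 60, 10],
    "comms_delta": [10, -10, -16, 20],
})

ratingcollist = ["leadership", "comms", "wellbeing", "teamwork"]
for rating in ratingcollist:
    fy21, delta = f"{rating}_fy21", f"{rating}_delta"
    if fy21 not in df or delta not in df:
        continue  # rating not present in this sample
    # Series + str concatenation is element-wise once both sides are strings
    df[f"{rating}_final"] = df[fy21].astype(str) + "% (" + df[delta].astype(str) + ")"

result = df[["Employee", "leadership_fy21", "leadership_delta", "leadership_final",
             "comms_fy21", "comms_delta", "comms_final"]]
print(result.to_string(index=False))
```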
I am trying to do optimization with Pyomo (solver Ipopt).
I have 2 sets (J for the generators and T for the time periods) and 6 indexed parameters (A, B, C, P_min and P_max indexed on model.J; demand indexed on model.T).
What I want to do is generate 24 different costs based on 24 different demands.
After I ran the code, this error appeared:
TypeError: P_LoadgenBalance() takes 1 positional argument but 2 were given.
I don't know why this error occurs; I hope you can help me with it. Thanks for your help!
Vivi
from pyomo.environ import *
import matplotlib.pyplot as plt
import numpy as np
# create a model
model = AbstractModel()
# There are ten generators with different values of ABC.
# Minimizing the costs and find the optimal dispatch for 24 different demands changed with time
# obj:Cost= sum(ap^2 +bp+c)
# constraints: sum_i P(i,t) >= load(t) , (Pmin =< P =< Pmax).
# declare decision variables
model.M = Param(mutable=True)
model.T = RangeSet(model.M)
model.N = Param(mutable=True)
model.J = RangeSet(model.N)
model.A = Param(model.J)
model.B = Param(model.J)
model.C = Param(model.J)
model.D = Param(model.J)
model.E = Param(model.J)
model.F = Param(model.J)
model.P_min = Param(model.J, within=PositiveReals)
model.P_max = Param(model.J, within=PositiveReals)
model.demand = Param(model.T)
model.emission_value = Param(initialize=1000000, mutable=True)
# declare constraints_Pbounds (Pmin =< P =< Pmax)
def Pbounds(model, j, t):
    return (model.P_min[j], model.P_max[j])
model.P = Var(model.J, model.T, bounds=Pbounds, domain=NonNegativeReals)
# declare constraints_P_LoadgenBalance ( sum P >= demand)
def P_LoadgenBalance(model, t):
    return sum(model.P[j, t] for j in model.J) >= model.demand[t]
model.P_LoadgenBalance = Constraint(model.T, rule=P_LoadgenBalance)
# declare objective_cost
def obj_cost(model):
    return sum(model.A[j] * model.P[j, t] ** 2 + model.B[j] * model.P[j, t] + model.C[j] for j in model.J for t in model.T)
model.cost = Objective(rule=obj_cost, sense=minimize)
# declare objective_emission
def obj_emission(model):
    return sum(model.E[j] * model.P[j, t] ** 2 + model.D[j] * model.P[j, t] + model.F[j] for j in model.J for t in model.T)
model.emission = Objective(rule=obj_emission, sense=minimize)
model.emission.deactivate()
opt = SolverFactory('ipopt')
# raw string so the backslashes in the Windows path are kept literally
instance = model.create_instance(r"E:\pycharm_project\END-10units.dat")
results = opt.solve(instance)
print(value(instance.cost))
Data file
param M:=24;
param N:=10;
# Creating Parameters A, B, C,D,E,F, P_min, P_max:
param : A B C D E F P_min P_max:=
1 0.0148 12.1 82 2.15 3.59 -11.4 80 200
2 0.0289 12.6 49 3.63 2.02 -3.65 120 320
3 0.0135 13.2 100 3.3 4.7 -4.04 50 150
4 0.0127 13.9 105 3.73 1.61 -13.44 250 520
5 0.0261 13.5 72 2.27 2.29 -4.41 80 280
6 0.0212 15.4 29 2.37 2.77 -8.61 50 150
7 0.0382 14 32 2.03 4.86 -8.91 30 120
8 0.0393 13.5 40 2.4 3.32 -31.74 30 110
9 0.0396 15 25 2.5 4.03 -19.14 20 80
10 0.051 14.3 15 3.43 3.27 -21.02 20 60;
param demand:=
1 600
2 650
3 680
4 651
5 630
6 650
7 810
8 820
9 883
10 893
11 888
12 901
13 892
14 875
15 843
16 877
17 880
18 904
19 865
20 855
21 766
22 733
23 688
24 654;
Your rule function P_LoadgenBalance doesn't accept t. Because the constraint is indexed by model.T, Pyomo calls the rule with the model and the index t, so t is the second argument it passes. The code should be:
def P_LoadgenBalance(model, t):
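To see where the error message comes from, here is a toy that mimics how an indexed component calls its rule: once per index, as rule(model, *index). build_indexed is a hypothetical stand-in written for illustration, not Pyomo's actual implementation. It also shows that with two index sets the values arrive in the order the sets are declared, so a rule written as rule(model, j, t) should be declared over (model.J, model.T).

```python
from itertools import product


def build_indexed(rule, model, *index_sets):
    """Toy stand-in (hypothetical, not Pyomo's code) for building an indexed component.

    Calls rule(model, *index) once for every index in the cross product of index_sets.
    """
    built = {}
    for idx in product(*index_sets):
        key = idx[0] if len(idx) == 1 else idx  # single sets use scalar keys
        built[key] = rule(model, *idx)
    return built


T = [1, 2, 3]
J = [1, 2]
demand = {1: 600, 2: 650, 3: 680}


def balance_ok(model, t):  # one index set -> called as rule(model, t)
    return f"sum_j P[j,{t}] >= {demand[t]}"


def balance_broken(model):  # forgets t, like the original rule
    return "sum_j P[j,t] >= demand[t]"


def cost_rule(model, j, t):  # two index sets -> called as rule(model, j, t)
    return (j, t)


ok = build_indexed(balance_ok, None, T)
pairs = build_indexed(cost_rule, None, J, T)  # declared as (J, T), matching (j, t)
print(ok[2])
print(pairs[(2, 3)])
try:
    build_indexed(balance_broken, None, T)
except TypeError as exc:
    print(exc)  # same kind of message as in the question
```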
To store the individual costs, create a variable for them and build the new objective function from it (remove or deactivate the original model.cost objective, since only one objective can be active):
# Cost variable to store the individual cost per j through t
model.X = Var(model.J, model.T, domain=NonNegativeReals)
# Cost definition
def cost_rule(model, j, t):
    return model.X[j, t] == model.A[j] * model.P[j, t] ** 2 + model.B[j] * model.P[j, t] + model.C[j]
# index sets in the same order as the rule's arguments (j, t)
model.cost_def = Constraint(model.J, model.T, rule=cost_rule)
# Objective
def obj_cost(model):
    return sum(model.X[j, t] for j in model.J for t in model.T)
model.total_cost = Objective(rule=obj_cost, sense=minimize)
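Once the model is solved, the 24 hourly costs are the sums of A[j]*P[j,t]**2 + B[j]*P[j,t] + C[j] over the generators for each t. A plain-Python sketch of that post-processing, assuming the solved dispatch has been copied into a dict keyed by (j, t) (for example by reading value(instance.P[j, t]) in a loop); hourly_costs is a hypothetical helper, the coefficients are those of generators 1 and 2 from the data file, and the dispatch numbers are invented.

```python
from collections import defaultdict


def hourly_costs(dispatch, A, B, C):
    """Total quadratic generation cost per period t.

    dispatch maps (j, t) -> P[j, t]; A, B, C map generator j -> coefficient.
    """
    totals = defaultdict(float)
    for (j, t), p in dispatch.items():
        totals[t] += A[j] * p ** 2 + B[j] * p + C[j]
    return dict(sorted(totals.items()))


# Coefficients of generators 1 and 2 from the data file
A = {1: 0.0148, 2: 0.0289}
B = {1: 12.1, 2: 12.6}
C = {1: 82, 2: 49}
# Invented dispatch for two periods, within each generator's P_min/P_max
dispatch = {(1, 1): 200.0, (2, 1): 300.0, (1, 2): 180.0, (2, 2): 320.0}

costs = hourly_costs(dispatch, A, B, C)
print(costs)
```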
I have an ODE system of 7 equations describing the dynamics of a particular set of microorganisms. The state variables are the different chemical and microbial species involved (even subindices for chemical compounds), and the equations contain yield coefficients and pseudo-reaction rates.
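For reference, the right-hand sides below are read off from the constraint rules in the code further down (_x1dot to _x7dot), together with the least-squares objective _obj. The shorthand r_1, r_2, r_3 for the three saturating rate terms is introduced here and is not necessarily the notation of the original equations.

$$
\begin{aligned}
\dot{x}_1 &= -y_1 r_1 - y_2 r_2 \\
\dot{x}_2 &= r_1 - k_7 x_2 x_3 \\
\dot{x}_3 &= y_3 r_1 - y_4 r_3 \\
\dot{x}_4 &= r_2 - k_8 x_4 x_3 \\
\dot{x}_5 &= y_5 r_2 \\
\dot{x}_6 &= r_3 - k_9 x_6 x_7 \\
\dot{x}_7 &= y_6 r_3
\end{aligned}
\qquad
r_1 = \frac{k_1 x_1 x_2}{k_2 + x_1}, \quad
r_2 = \frac{k_3 x_1 x_4}{k_4 + x_1}, \quad
r_3 = \frac{k_5 x_3 x_6}{k_6 + x_3}
$$

$$
\min_{k_1,\dots,k_9,\; y_1,\dots,y_6} \; \sum_{t \in \mathrm{MEAS\_t}} \sum_{i=1}^{7} \bigl(x_i(t) - x_i^{\mathrm{meas}}(t)\bigr)^2
$$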
I am using Pyomo for the estimation of all my unknown parameters, which are basically all the yield coefficients and kinetic constants (15 in total).
The following code works perfectly when it is used with complete experimental time series for each of the dynamical variables:
from pyomo.environ import *
from pyomo.dae import *
m = AbstractModel()
m.t = ContinuousSet()
m.MEAS_t = Set(within=m.t) # Measurement times, must be subset of t
m.x1_meas = Param(m.MEAS_t)
m.x2_meas = Param(m.MEAS_t)
m.x3_meas = Param(m.MEAS_t)
m.x4_meas = Param(m.MEAS_t)
m.x5_meas = Param(m.MEAS_t)
m.x6_meas = Param(m.MEAS_t)
m.x7_meas = Param(m.MEAS_t)
m.x1 = Var(m.t,within=PositiveReals)
m.x2 = Var(m.t,within=PositiveReals)
m.x3 = Var(m.t,within=PositiveReals)
m.x4 = Var(m.t,within=PositiveReals)
m.x5 = Var(m.t,within=PositiveReals)
m.x6 = Var(m.t,within=PositiveReals)
m.x7 = Var(m.t,within=PositiveReals)
m.k1 = Var(within=PositiveReals)
m.k2 = Var(within=PositiveReals)
m.k3 = Var(within=PositiveReals)
m.k4 = Var(within=PositiveReals)
m.k5 = Var(within=PositiveReals)
m.k6 = Var(within=PositiveReals)
m.k7 = Var(within=PositiveReals)
m.k8 = Var(within=PositiveReals)
m.k9 = Var(within=PositiveReals)
m.y1 = Var(within=PositiveReals)
m.y2 = Var(within=PositiveReals)
m.y3 = Var(within=PositiveReals)
m.y4 = Var(within=PositiveReals)
m.y5 = Var(within=PositiveReals)
m.y6 = Var(within=PositiveReals)
m.x1dot = DerivativeVar(m.x1,wrt=m.t)
m.x2dot = DerivativeVar(m.x2,wrt=m.t)
m.x3dot = DerivativeVar(m.x3,wrt=m.t)
m.x4dot = DerivativeVar(m.x4,wrt=m.t)
m.x5dot = DerivativeVar(m.x5,wrt=m.t)
m.x6dot = DerivativeVar(m.x6,wrt=m.t)
m.x7dot = DerivativeVar(m.x7,wrt=m.t)
def _init_conditions(m):
    yield m.x1[0] == 51.963
    yield m.x2[0] == 6.289
    yield m.x3[0] == 0
    yield m.x4[0] == 6.799
    yield m.x5[0] == 0
    yield m.x6[0] == 4.08
    yield m.x7[0] == 0
m.init_conditions=ConstraintList(rule=_init_conditions)
def _x1dot(m,i):
    if i==0:
        return Constraint.Skip
    return m.x1dot[i] == - m.y1*m.k1*m.x1[i]*m.x2[i]/(m.k2+m.x1[i]) - m.y2*m.k3*m.x1[i]*m.x4[i]/(m.k4+m.x1[i])
m.x1dotcon = Constraint(m.t, rule=_x1dot)
def _x2dot(m,i):
    if i==0:
        return Constraint.Skip
    return m.x2dot[i] == m.k1*m.x1[i]*m.x2[i]/(m.k2+m.x1[i]) - m.k7*m.x2[i]*m.x3[i]
m.x2dotcon = Constraint(m.t, rule=_x2dot)
def _x3dot(m,i):
    if i==0:
        return Constraint.Skip
    return m.x3dot[i] == m.y3*m.k1*m.x1[i]*m.x2[i]/(m.k2+m.x1[i]) - m.y4*m.k5*m.x3[i]*m.x6[i]/(m.k6+m.x3[i])
m.x3dotcon = Constraint(m.t, rule=_x3dot)
def _x4dot(m,i):
    if i==0:
        return Constraint.Skip
    return m.x4dot[i] == m.k3*m.x1[i]*m.x4[i]/(m.k4+m.x1[i]) - m.k8*m.x4[i]*m.x3[i]
m.x4dotcon = Constraint(m.t, rule=_x4dot)
def _x5dot(m,i):
    if i==0:
        return Constraint.Skip
    return m.x5dot[i] == m.y5*m.k3*m.x1[i]*m.x4[i]/(m.k4+m.x1[i])
m.x5dotcon = Constraint(m.t, rule=_x5dot)
def _x6dot(m,i):
    if i==0:
        return Constraint.Skip
    return m.x6dot[i] == m.k5*m.x3[i]*m.x6[i]/(m.k6+m.x3[i]) - m.k9*m.x6[i]*m.x7[i]
m.x6dotcon = Constraint(m.t, rule=_x6dot)
def _x7dot(m,i):
    if i==0:
        return Constraint.Skip
    return m.x7dot[i] == m.y6*m.k5*m.x3[i]*m.x6[i]/(m.k6+m.x3[i])
m.x7dotcon = Constraint(m.t, rule=_x7dot)
def _obj(m):
    return sum((m.x1[i]-m.x1_meas[i])**2+(m.x2[i]-m.x2_meas[i])**2+(m.x3[i]-m.x3_meas[i])**2+(m.x4[i]-m.x4_meas[i])**2+(m.x5[i]-m.x5_meas[i])**2+(m.x6[i]-m.x6_meas[i])**2+(m.x7[i]-m.x7_meas[i])**2 for i in m.MEAS_t)
m.obj = Objective(rule=_obj)
m.pprint()
instance = m.create_instance('exp.dat')
instance.t.pprint()
discretizer = TransformationFactory('dae.collocation')
discretizer.apply_to(instance,nfe=30)#,ncp=3)
solver=SolverFactory('ipopt')
results = solver.solve(instance,tee=True)
However, I am trying to run the same estimation routine on another experimental data set that has missing values at the end of one, or at most two, of the time series of the dynamical variables.
For reference, the complete experimental data look like this (in the .dat file):
set t := 0 6 12 18 24 30 36 42 48 54 60 66 72 84 96 120 144;
set MEAS_t := 0 6 12 18 24 30 36 42 48 54 60 66 72 84 96 120 144;
param x1_meas :=
0 51.963
6 43.884
12 24.25
18 26.098
24 11.871
30 4.607
36 1.714
42 4.821
48 5.409
54 3.701
60 3.696
66 1.544
72 4.428
84 1.086
96 2.337
120 2.837
144 3.486
;
param x2_meas :=
0 6.289
6 6.242
12 7.804
18 7.202
24 6.48
30 5.833
36 6.644
42 5.741
48 4.568
54 4.252
60 5.603
66 5.167
72 4.399
84 4.773
96 4.801
120 3.866
144 3.847
;
param x3_meas :=
0 0
6 2.97
12 9.081
18 9.62
24 6.067
30 11.211
36 16.213
42 10.215
48 20.106
54 22.492
60 5.637
66 5.636
72 13.85
84 4.782
96 9.3
120 4.267
144 7.448
;
param x4_meas :=
0 6.799
6 7.73
12 7.804
18 8.299
24 8.208
30 8.523
36 8.507
42 8.656
48 8.49
54 8.474
60 8.203
66 8.127
72 8.111
84 8.064
96 6.845
120 6.721
144 6.162
;
param x5_meas :=
0 0
6 0.267
12 0.801
18 1.256
24 1.745
30 5.944
36 3.246
42 7.787
48 7.991
54 6.943
60 8.593
66 8.296
72 6.85
84 8.021
96 7.667
120 7.209
144 8.117
;
param x6_meas :=
0 4.08
6 4.545
12 4.784
18 4.888
24 5.293
30 5.577
36 5.802
42 5.967
48 6.386
54 6.115
60 6.625
66 6.835
72 6.383
84 6.605
96 5.928
120 5.354
144 4.975
;
param x7_meas :=
0 0
6 0.152
12 1.616
18 0.979
24 4.033
30 5.121
36 2.759
42 3.541
48 4.278
54 4.141
60 6.139
66 3.219
72 5.319
84 4.328
96 3.621
120 4.208
144 5.93
;
An incomplete data set, by contrast, could have all time series complete except one, like this:
param x6_meas :=
0 4.08
6 4.545
12 4.784
18 4.888
24 5.293
30 5.577
36 5.802
42 5.967
48 6.386
54 6.115
60 6.625
66 6.835
72 6.383
84 6.605
96 5.928
120 5.354
144 .
;
I know that one can tell Pyomo to take the derivative of certain variables with respect to a different time set. However, when I tried it, it didn't work, and I guess that is because these are coupled ODEs. So basically my question is whether there is a way to overcome this issue in Pyomo.
Thanks in advance.
I think all you need to do is slightly modify your objective function like this:
def _obj(m):
    sum1 = sum((m.x1[i]-m.x1_meas[i])**2 for i in m.MEAS_t if i in m.x1_meas.keys())
    sum2 = sum((m.x2[i]-m.x2_meas[i])**2 for i in m.MEAS_t if i in m.x2_meas.keys())
    sum3 = sum((m.x3[i]-m.x3_meas[i])**2 for i in m.MEAS_t if i in m.x3_meas.keys())
    sum4 = sum((m.x4[i]-m.x4_meas[i])**2 for i in m.MEAS_t if i in m.x4_meas.keys())
    sum5 = sum((m.x5[i]-m.x5_meas[i])**2 for i in m.MEAS_t if i in m.x5_meas.keys())
    sum6 = sum((m.x6[i]-m.x6_meas[i])**2 for i in m.MEAS_t if i in m.x6_meas.keys())
    sum7 = sum((m.x7[i]-m.x7_meas[i])**2 for i in m.MEAS_t if i in m.x7_meas.keys())
    return sum1+sum2+sum3+sum4+sum5+sum6+sum7
m.obj = Objective(rule=_obj)
This checks that i is a valid index for each set of measurements before adding that term to the sum. If you knew a priori which measurement sets were missing data, you could simplify this function by doing the check only on those sets and summing over the others as before.
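The same idea, independent of Pyomo: iterate over the points that were actually measured instead of over MEAS_t, so a missing entry never enters the sum. A plain-Python sketch; sse_over_measured is a hypothetical helper, the simulated values are invented, and the measured values are taken from the .dat file above (x6 has no point at t = 144).

```python
def sse_over_measured(simulated, measured):
    """Sum of squared residuals over the (species, time) points that were measured.

    simulated[species][t] is the model value; measured[species][t] is the data value.
    Missing measurements are simply absent from `measured`, so they contribute nothing.
    """
    return sum(
        (simulated[species][t] - value) ** 2
        for species, series in measured.items()
        for t, value in series.items()
    )


simulated = {"x1": {120: 2.9, 144: 3.4}, "x6": {120: 5.3, 144: 5.0}}  # invented
measured = {"x1": {120: 2.837, 144: 3.486}, "x6": {120: 5.354}}       # x6 at 144 is missing
print(sse_over_measured(simulated, measured))
```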