Pyomo optimization Investments/Revenue - python

I'm new to Pyomo and I'm trying to optimise investments depending on budgets.
I have a total budget, and I want to find the best way to split the budget on the different medias.
eg: total_budget = 5000 --> tv = 3000, cinema = 500, radio = 1500.
I'm struggling "connecting" a Budget with a corresponding Revenue.
The medias have different return curves (It might be better to invest in a specific media until a certain budget is reached, then other medias).
The revenue for the different media is returned by a function like the following: tv_1k_revenue = calculate_revenue(budget=1000, media="tv")
Let's say the only constraint I have is the total budget, to simplify the problem (I think I can manage other constraints).
Here is my code so far:
model = pyo.ConcreteModel(doc="Optimization model")
# Declaration of possible budgets
model.S1 = Set(initialize=[*df.TV_Budget.values])
model.tv_budget = Var(model.S1, initialize=0.0)
model.S2 = Set(initialize=[*df.Cinema_Budget.values])
model.cinema_budget = Var(model.S2, initialize=0.0)
model.S3 = Set(initialize=[*df.Radio_Budget.values])
model.radio_budget = Var(model.S3, initialize=0.0)
# Objective function
def func_objective(model):
    objective_expr = sum(model.tv_revenue +
                         model.cinema_revenue +
                         model.radio_revenue)
    return objective_expr
model.objective = pyo.Objective(rule=func_objective, sense=pyo.maximize)
So my problem is, how do I declare model.tv_revenue, model.cinema_revenue, model.radio_revenue so I can optimise TV, Cinema and Radio budgets to maximize the total revenue generated by TV, Cinema, Radio?
Right now I created a DataFrame with a Budget and Revenue column for each media, but the best way should be using my calculate_revenue function and set bounds=(min_budget, max_budget) on each media budget.
Thank you for your help!

Thank you very much @AirSquid!
That's exactly it.
It does make a lot of sense to throw pandas out in my case.
Also, Yes my revenue function is non-linear.
I might try to make a linear approximation and see if I can make that work.
I was going to try to declare my objective function as:
def func_objective(model):
    objective_expr = sum([calculate_revenue(model.budget[media], media=media) for media in model.medias])
    return objective_expr
model.objective = pyo.Objective(rule=func_objective, sense=pyo.maximize)
Would you know why I cannot declare it like this?

From what you are providing and your limited experience w/ pyomo, here are my recommendations...
You appear to have budgets and revenues, and those appear to be indexed by media type. It isn't clear what you are doing now with the indexing. So I would expect something like:
model.medias = pyo.Set(initialize=['radio', 'tv', ... ])
model.budget = pyo.Var(model.medias, domain=pyo.NonNegativeReals)
...
Throw pandas out the window. It is a great pkg, but not that helpful in setting up a model. Try something with just python dictionaries to hold your constants & parameters. (see some of my other examples if that is confusing).
The problem you will get to eventually, I'm betting, is that your revenue function is probably non-linear. Right? I would start with a simple linear approximation of it, see if you can get that model working, and then consider either making a piece-wise linear approximation or using a non-linear solver of some kind.
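One way to get the "simple linear approximation" is to sample the revenue function at a few budgets and fit a straight line through the samples. The sketch below assumes a made-up `calculate_revenue` with diminishing returns (the real one is not shown in the question), and `fit_line` and the sample budgets are hypothetical names and values chosen for illustration. It returns `(c0, c1)` pairs for `revenue = c1 * budget + c0`.

```python
def calculate_revenue(budget, media):
    # Hypothetical stand-in for the asker's function: diminishing returns.
    scale = {'tv': 40.0, 'radio': 25.0}[media]
    return scale * budget ** 0.5


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = c1 * x + c0; returns (c0, c1)."""
    n = len(xs)
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    return c0, c1


# Sample budgets over the range you expect to spend (assumed values).
budgets = [500, 1000, 1500, 2000, 2500]

# media -> (c0, c1), plain Python data ready to feed into a model
consts = {
    media: fit_line(budgets, [calculate_revenue(b, media) for b in budgets])
    for media in ('tv', 'radio')
}
```

A single line will be a poor fit for a strongly curved revenue function, which is why a piece-wise linear approximation or a non-linear solver is the natural next step.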
===================
Edit / Additional Info.
Regarding the obj function, you cannot just stuff in a reference to a non-linear function that returns a value. The objective needs to be a valid pyomo expression (linear or non-linear), comprised of model elements. I would start w/ something like this...
# media mix
import pyomo.environ as pyo
# data for linear approximations of form revenue = c1 * budget + c0
# media c0 c1
consts = { 'radio' : (4, 0.6),
           'tv'    : (12, 0.45)}
# a bunch of other parameters....?? limits, minimums, etc.
### MODEL
m = pyo.ConcreteModel('media mix')
### SETS
m.medias = pyo.Set(initialize=consts.keys())
### VARIABLES
m.budget = pyo.Var(m.medias, domain=pyo.NonNegativeReals)
### OBJ
m.obj = pyo.Objective(expr=sum(consts[media][1]*m.budget[media] + consts[media][0] for media in m.medias),
                      sense=pyo.maximize)
m.pprint()
Yields:
1 Set Declarations
medias : Size=1, Index=None, Ordered=Insertion
Key : Dimen : Domain : Size : Members
None : 1 : Any : 2 : {'radio', 'tv'}
1 Var Declarations
budget : Size=2, Index=medias
Key : Lower : Value : Upper : Fixed : Stale : Domain
radio : 0 : None : None : False : True : NonNegativeReals
tv : 0 : None : None : False : True : NonNegativeReals
1 Objective Declarations
obj : Size=1, Index=None, Active=True
Key : Active : Sense : Expression
None : True : maximize : 0.6*budget[radio] + 4 + 0.45*budget[tv] + 12
3 Declarations: medias budget obj

Related

How to implement conditional summing within Pyomo constraint

I am writing a model in Pyomo that optimizes for biomass production across ~100 different biomass types (model.Biomass = corn, wheat, wood scraps, etc.) on a county basis (model.SourceCounty). One of the constraints I am trying to write requires that the biomass output from my model equals production values that I've already obtained from another model. This other model does not have the granularity that my Pyomo model will have, however. It has biomass production values only on a regional (not county) basis (model.Zone) across more general biomass groupings (model.SimpBiomass = herbaceous biomass, woody biomass).
What I am trying to do in my constraint is sum up the biomass production decision variable (model.x) over the regions and more general biomass groupings from the other model before requiring that this sum equals the output from the other model so that my model produces a consistent result. However, what I'm learning is that the current way I've written the code (below) doesn't work because Pyomo calls constraints only once, when the value of the decision variables is yet to be solved for. Thus, my for loops with if statements just return a value of 0.
from pyomo.environ import *
# initialize model -- can rewrite as Concrete Model if that's better for what I'm trying to do
model = AbstractModel()
# initialize indices, including some manually
model.SourceCounty = Set() # county
model.Biomass = Set() # biomass type in my model
model.Year = Set(initialize=[2022, 2024, 2026, 2028, 2030, 2032, 2035, 2040]) # year
model.SimpBiomass = Set(initialize=['herbaceous biomass', 'waste biomass', 'woody biomass']) # type of feedstock resource - simplified (from separate model)
model.Zone = Set(initialize=['midwest','southeast']) # zones from separate model
# Create data import structure
data = DataPortal()
# Load indices that require data frame
data.load(filename='fips.csv', set=model.SourceCounty)
data.load(filename='Resources.csv', set=model.Biomass)
# initialize parameters
model.EERF = Param(model.SimpBiomass, model.Zone, model.Year) # biomass production from the other model that I'm trying to match in my model
model.QIJ = Param(model.SourceCounty) # mapping of county FIPS code from my model to zones from other model
model.AC = Param(model.Biomass) # mapping of specific resource type from my model into less specific from other model (values are those in SimpBiomass)
# load in parameters
data.load(filename="county_to_zone.csv", param=model.QIJ)
data.load(filename="BT16_to_zone_Resource.csv", param=model.AC)
# create decision variables (known as Var in Pyomo)
model.x = Var(model.Biomass, model.SourceCounty, model.Year, domain=PositiveReals) # feedstock production indexed by feedstock, source county, year
# I leave out the objective function for brevity
# Constraint in question
def feedstock_prod_rule(model, c, q, t):
    expr2 = 0 # initialize summing variable
    # for each biomass type (a) in my model, check if it belongs to a biomass category (c) from the other model
    for a in model.Biomass:
        if model.AC[a] == c:
            # for each county (i) in my model, check if it belongs to a zone (q) from the other model
            for i in model.SourceCounty:
                if model.QIJ[i] == q:
                    # if it belongs to q and c from other model, add to expr2
                    expr2 += model.x[a, i, t]
    # Sum of all biomass production from my model within zone q and biomass type c (expr2 at end of looping) should equal the output of the other model (EERF).
    return expr2 == model.EERF[c, q, t]
# Add as constraint
model.feedstock_prod = Constraint(model.SimpBiomass, model.Zone, model.Year, rule=feedstock_prod_rule)
I need help figuring out a different way to write this constraint such that it doesn't rely on building up an expression that depends on the value of my decision variable model.x that has yet to be solved for. Is there a way to have one line of code in the return line that accomplishes the same thing?
I'm not totally sure you diagnosed the problem correctly, and without some snippets of your data files, it is too difficult to recreate. (It is also unnecessary, because even if it was re-creatable, there is a better way. :) )
In general, you are correct that you cannot embed conditional statements into constraints that depend on the value of a variable, which is unknown when the constraint is encoded into the model. However, you are using conditionals based on parameters, which are fixed, so that should be OK. That said, you are comparing parameter values to items in a set, which is a bad plan even if it works (I have never tried it). The structural problem is that you are working with 1:1 pairings instead of labeling groups of stuff, which is what leads to the conditional statements. You have a structure like:
beans : food
lettuce : food
paper : trash
Where you'd really be happier to work with groups like below and avoid the "if" statements:
food: { beans, lettuce }
trash: { paper }
So this can be done, and you can load it into an abstract model. I don't think you can do it from a .csv file, however, as I don't think there is a way to express indexed sets in .csv. You can easily do it in .yaml or .json. See the pyomo docs for more examples. You can even commingle your data sources, so you can retain your other csv's, as long as they are consistent, so you need to be aware of that. Here is a working example that I think will clean up your model a bunch. (Specifically note the indexing and groupings in the constraint at the end.)
import pyomo.environ as pe
m = pe.AbstractModel()
# SETS
m.Group_names = pe.Set()
m.Items = pe.Set()
m.Groupings = pe.Set(m.Group_names, within=m.Items)
# PARAMS
m.Cost = pe.Param(m.Group_names)
# VARS
m.X = pe.Var(m.Items)
# Constraint "for each Group"
def C1(m, g_name):
    return sum(m.X[i] for i in m.Groupings[g_name]) <= 10
m.C1 = pe.Constraint(m.Group_names, rule=C1)
# load data from sources
data = pe.DataPortal()
data.load(filename='cost.csv', param=m.Cost)
data.load(filename='data.yaml')
data.load(filename='items.csv', set=m.Items)
instance = m.create_instance(data)
instance.pprint()
The yaml file (others not shown, they are straightforward):
Group_names: ['Waste', 'Food', 'Critters']
Groupings:
    'Waste': ['Trash', 'Coal']
    'Food': ['Salad', 'Compost', 'Fish']
    'Critters': ['Snails', 'Worms']
The resultant model:
3 Set Declarations
Group_names : Size=1, Index=None, Ordered=Insertion
Key : Dimen : Domain : Size : Members
None : 1 : Any : 3 : {'Waste', 'Food', 'Critters'}
Groupings : Size=3, Index=Group_names, Ordered=Insertion
Key : Dimen : Domain : Size : Members
Critters : 1 : Items : 2 : {'Snails', 'Worms'}
Food : 1 : Items : 3 : {'Salad', 'Compost', 'Fish'}
Waste : 1 : Items : 2 : {'Trash', 'Coal'}
Items : Size=1, Index=None, Ordered=Insertion
Key : Dimen : Domain : Size : Members
None : 1 : Any : 7 : {'Trash', 'Snails', 'Worms', 'Coal', 'Salad', 'Compost', 'Fish'}
1 Param Declarations
Cost : Size=2, Index=Group_names, Domain=Any, Default=None, Mutable=False
Key : Value
Food : 8.9
Waste : 4.2
1 Var Declarations
X : Size=7, Index=Items
Key : Lower : Value : Upper : Fixed : Stale : Domain
Coal : None : None : None : False : True : Reals
Compost : None : None : None : False : True : Reals
Fish : None : None : None : False : True : Reals
Salad : None : None : None : False : True : Reals
Snails : None : None : None : False : True : Reals
Trash : None : None : None : False : True : Reals
Worms : None : None : None : False : True : Reals
1 Constraint Declarations
C1 : Size=3, Index=Group_names, Active=True
Key : Lower : Body : Upper : Active
Critters : -Inf : X[Snails] + X[Worms] : 10.0 : True
Food : -Inf : X[Salad] + X[Compost] + X[Fish] : 10.0 : True
Waste : -Inf : X[Trash] + X[Coal] : 10.0 : True
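If your data already exists as 1:1 pairings (as in the county-to-zone and resource-to-category CSV files), the grouped form can be built with a few lines of plain Python before it goes anywhere near the model. A minimal sketch, using the toy food/trash pairs from above as assumed input:

```python
from collections import defaultdict

# 1:1 pairings, e.g. as read from a two-column CSV (toy data)
item_to_group = {
    'beans': 'food',
    'lettuce': 'food',
    'paper': 'trash',
}

# Invert into group -> list of members
groups = defaultdict(list)
for item, group in item_to_group.items():
    groups[group].append(item)
groups = dict(groups)
```

The resulting dictionary has the same shape as the `Groupings` entry in the yaml file, so it could be dumped to yaml/json for an abstract model. In a concrete model, I believe it can be passed directly as `initialize=groups` to an indexed `Set`.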

PYOMO Constraints - setting constraints over indexed variables

I have been trying to get into python optimization, and I have found that pyomo is probably the way to go; I had some experience with GUROBI as a student, but of course that is no longer possible, so I have to look into the open source options.
I basically want to solve a non-linear mixed-integer problem in which I will minimize a certain ratio. The problem itself is setting up a power purchase agreement (PPA) in a renewable energy scenario. Depending on the electricity generated, you will have to either buy or sell electricity according to the PPA.
The only starting data is the generation; the PPA is the main decision variable, but I will need others. "buy", "sell", "b1" and "b2" are unknown without the PPA value. These are the equations:
Equations that rule the problem (by hand).
Using pyomo, I was trying to set up the problem as:
# Dataframe with my Generation information:
January = Data['Full_Data'][(Data['Full_Data']['Month'] == 1) & (Data['Full_Data']['Year'] == 2011)]
Gen = January['Producible (MWh)']
Time = len(Gen)
M=100
# Model variables and definition:
m = ConcreteModel()
m.IDX = range(Time)
m.PPA = Var(initialize = 2.0, bounds =(1,7))
m.compra = Var(m.IDX, bounds = (0, None))
m.venta = Var(m.IDX, bounds = (0, None))
m.b1 = Var(m.IDX, within = Binary)
m.b2 = Var(m.IDX, within = Binary)
And then, the constraint; only the first one, as I was already getting errors:
m.b1_rule = Constraint(
    expr = (((Gen[i] - PPA)/M for i in m.IDX) <= m.b1[i])
)
which gives me the error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-5-5d5f5584ebca> in <module>
1 m.b1_rule = Constraint(
----> 2 expr = (((Generacion[i] - PPA)/M for i in m.IDX) <= m.b1[i])
3 )
pyomo\core\expr\numvalue.pyx in pyomo.core.expr.numvalue.NumericValue.__ge__()
pyomo\core\expr\logical_expr.pyx in pyomo.core.expr.logical_expr._generate_relational_expression()
AttributeError: 'generator' object has no attribute 'is_expression_type'
I honestly have no idea what this means. I feel like this should be a simple problem, but I am struggling with the syntax. I basically have to apply a constraint to each individual data point from "Generation"; there is no sum involved. All constraints are 1-to-1 constraints, set so that the physical energy requirements make sense.
How do I set up the constraints like this?
Thank you very much
You have a couple of things to fix. First, the error you are getting is because you have extra parentheses around an expression, which python then turns into a generator. So, step 1 is to remove the outer parentheses, but that will not solve your issue.
You said you want to generate this constraint "for each" value of your index. Any time you want to generate copies of a constraint "for each" you will need to either do that by making a constraint list and adding to it with some kind of loop, or use a function-rule combination. There are examples of each in the pyomo documentation and plenty on this site (I have posted a ton if you look at some of my posts.) I would suggest the function-rule combo and you should end up with something like:
def my_constr(m, i):
    return m.Gen[i] - m.PPA <= m.b1[i] * M
m.C1 = Constraint(m.IDX, rule=my_constr)

How do I setup an objective function in CPLEX Python containing indicator functions?

The following is the objective function:
The idea is that a mean-variance optimization has already been done on a universe of securities. This gives us the weights for a target portfolio. Now suppose the investor already is holding a portfolio and does not want to change their entire portfolio to the target one.
Let w_0 = [w_0(1),w_0(2),...,w_0(N)] be the initial portfolio, where w_0(i) is the fraction of the portfolio invested in
stock i = 1,...,N. Let w_t = [w_t(1), w_t(2),...,w_t(N)] be the target portfolio, i.e., the portfolio
that it is desirable to own after rebalancing. This target portfolio may be constructed using quadratic optimization techniques such as variance minimization.
The objective is to decide the final portfolio w_f = [w_f (1), w_f (2),..., w_f(N)] that satisfies the
following characteristics:
(1) The final portfolio is close to our target portfolio
(2) The number of transactions from our initial portfolio is sufficiently small
(3) The return of the final portfolio is high
(4) The final portfolio does not hold many more securities that our initial portfolio
An objective function which is to be minimized is created by summing together the characteristic terms 1 through 4.
The first term is captured by summing the absolute difference in weights from the final and the target portfolio.
The second term is captured by the sum of an indicator function multiplied by a user specified penalty. The indicator function is y_{transactions}(i) where it is 1 if the weight of security i is different in the initial portfolio and the final portfolio, and 0 otherwise.
The third term is captured by the total final portfolio return multiplied by a negative user specified penalty since the objective is minimization.
The final term is the count of assets in the final portfolio (ie. sum of an indicator function counting the number of positive weights in the final portfolio), multiplied by a user specified penalty.
Assuming that we already have the target weights as target_w how do I setup this optimization problem in docplex python library? Or if anyone is familiar with mixed integer programming in NAG it would be helpful to know how to setup such a problem there as well.
final_w = [0.]*n
final_w = np.array(final_w)
obj1 = np.sum(np.absolute(final_w - target_w))
pen_trans = 1.2
def ind_trans(final,inital):
    list_trans = []
    for i in range(len(final)):
        if abs(final[i]-inital[i]) == 0:
            list_trans.append(0)
        else:
            list_trans.append(1)
    return list_trans
obj2 = pen_trans*sum(ind_trans(final_w,initial_w))
pen_returns = 0.6
returns_np = np.array(df_secs['Return'])
obj3 = (-1)*np.dot(returns_np,final_w)
pen_count = 1.
def ind_count(final):
    list_count = []
    for i in range(len(final)):
        if final[i] == 0:
            list_count.append(0)
        else:
            list_count.append(1)
    return list_count
obj4 = sum(ind_count(final_w))
objective = obj1 + obj2 + obj3 + obj4
The main issue in your code is that final_w is not an array of variables but an array of data. So there will be nothing to optimize. To create an array of variables in docplex you have to do something like this:
from docplex.mp.model import Model
with Model() as m:
    final = m.continuous_var_list(n, 0.0, 1.0)
That creates n variables that can take values between 0 and 1. With that in hand you can start things. For example:
obj1 = m.sum(m.abs(target_w[i] - final[i]) for i in range(n))
For the next objective things become harder since you need indicator constraints. To simplify definition of these constraints first define a helper variable delta that gives the absolute difference between stocks:
delta = m.continuous_var_list(n, 0.0, 1.0)
m.add_constraints(delta[i] == m.abs(initial[i] - final[i]) for i in range(n))
Next you need an indicator variable that is 1 if a transaction is required to adjust stock i:
needtrans = m.binary_var_list(n)
for i in range(n):
    # If needtrans[i] is 0 then delta[i] must be 0.
    # Since needtrans[i] is penalized in the objective, the solver will
    # try hard to set it to 0. It will only set it to 1 if delta[i] != 0.
    # That is exactly what we want
    m.add_indicator(needtrans[i], delta[i] == 0, 0)
With that you can define the second objective:
obj2 = pen_trans * m.sum(needtrans)
Once all objectives have been defined, you can add their sum to the model:
m.minimize(obj1 + obj2 + obj3 + obj4)
and then solve the model and display its solution:
m.solve()
print(m.solution.get_values(final))
If any of the above is not (yet) clear to you then I suggest you take a look at the many examples that ship with docplex and also at the (reference) documentation.

Pyomo (v5.2) python(v3.7) script solving concrete model throws ValueError: No objective defined; cannot write legal LP file

I've built a pyomo concrete model and am trying to run it using a python script (Pyomo version 5.2 with python 3.7). When I try solving the model by running:
opt = SolverFactory('glpk')
results = opt.solve(model)
results.write()
then I get this error:
ValueError: ERROR: No objectives defined for input model; cannot
write legal LP file
Interestingly, I know the objective rule works. When I run Objective_rule(model) I get a value back, and I can also manually change the model variables by assigning them different values, such as:
model.system_capacity = 1000
with the return value from the objective rule changing in response.
Any thoughts? I'm pretty new to pyomo and Algebraic Modeling Languages (AMLs) in general.
Here's my model, simplified:
# pyomo model for fitting historical solar generation profile to PVWatts simulation
# initialize input parameters
system_capacity_init = 1.0
lims = [0.2, 3.0]
system_capacity_bounds = (system_capacity_init * lims[0], system_capacity_init * lims[1])
# define and initialize integer parameters
# module type
module_type_dict = {0: 'Standard', 1: 'Premium', 2: 'ThinFilm'}
module_type_vals = list(module_type_dict.keys())
module_type_index_init = 0
# initialize pyomo concrete model
model = ConcreteModel()
# define continuous variables
model.system_capacity = Var(initialize=system_capacity_init,
                            bounds=system_capacity_bounds,
                            domain=PositiveReals)
# define integer variables
# module type
model.module_type_vals = Set(initialize=module_type_vals)
model.module_type = Var(initialize=module_type_vals[module_type_index_init],
                        within=model.module_type_vals)
# define objective function
def Objective_rule(model):
    """get hourly modeled solar production from PVWatts5 simulation tool using hourly historical solar insolation
    defined in filename_solar comparing against hourly historical data in hourlysettlementdata"""
    system_capacity = value(model.system_capacity)
    module_type = value(model.module_type)
    hourlypvwatts = sf.gethourlysolarproduction(filename_solar, folder,
                                                system_capacity, module_type)
    leastsquared = np.sum((hourlypvwatts[:8760] - hourlysettlementdata[:8760])**2)
    return float(leastsquared)
# when passed function named as _rule then automatically assigns as rule, default sense=minimize
model.Objective = Objective()
# examine model
model.pprint()
And here's the pprint / model declaration (again I cut out some variables so it is shorter...):
1 Set Declarations
module_type_vals : Dim=0, Dimen=1, Size=3, Domain=None, Ordered=False, Bounds=(0, 2)
[0, 1, 2]
2 Var Declarations
module_type : Size=1, Index=None
Key : Lower : Value : Upper : Fixed : Stale : Domain
None : 0 : 0 : 2 : False : False : module_type_vals
system_capacity : Size=1, Index=None
Key : Lower : Value : Upper : Fixed : Stale : Domain
None : 12.0 : 60.0 : 180.0 : False : False : PositiveReals
1 Objective Declarations
Objective : Size=0, Index=None, Active=True
Key : Active : Sense : Expression
12 Declarations: system_capacity dc_ac_ratio inv_eff losses tilt azimuth gcr module_type_vals module_type array_type_vals array_type Objective
Thanks for any help.
As mentioned in the comment, you didn't pass in the objective function rule when declaring your objective function. Your objective function declaration should be:
model.Objective = Objective(rule=Objective_rule)
The other issue I see is that your objective function rule doesn't look like it depends on any of the variables in the model and will be a fixed float value the way it is currently written. Remember that the objective function rule is not a call-back function that is evaluated multiple times by the solver. The objective function rule should return an algebraic expression involving Pyomo variables.
I was going to post this as a comment to @bethany-nicholson, but there is enough additional information to warrant an additional answer.
First, the answer about returning an expression instead of a computed value (i.e., rules generate expressions and are not callbacks) is spot-on and correct.
The remainder of this answer targets the use of "implicit rules". While older versions of Pyomo supported "implicit rules" (i.e., a component foo automatically looking for and using a foo_rule() if no rule was explicitly provided), that behavior was deprecated in Pyomo 4.0, and the developers threatened to remove it completely in Pyomo 5.0 (although as of 5.5 the functionality - with deprecation warning - is still there).
That said, it appears that implicit rules have never worked for Objective components (the implicit rule functionality was built into the Block component, and relied on rules being stored as a _rule attribute in the Component ... and Objective uses rule instead of _rule). This fragility is in part why support for implicit rules was deprecated.
If you are looking for a concise notation for specifying models, you might consider the component decorator notation:
@model.Objective()
def obj(m):
    return # expression defining the objective
This creates an Objective component and attaches it to the model object using the name obj, assigning the obj() function as the component's rule.

Mapping and iterating nested dictionaries

I am not too familiar with python but have a working understanding of the basics. I believe that I need dictionaries, but what I am currently doing is not working and likely very ineffective time-wise.
I am trying to create a cross matrix that links reviews between users given: the list of reviewers, their individual reviews, metadata related to the reviews.
NOTE : This is written in Python 2.7.10 - I cannot use Python 3 because outdated systems this will be run on, yada yada.
For initialization I have the following:
print '\nCompiling Review Maps... ';
LbidMap = {};
TbidMap = {};
for user in reviewer_idx :
    for review in data['Reviewer Reviews'][user] :
        reviewInfo = data['Review Information'][review];
        stars = float(reviewInfo['stars']);
        bid = reviewInfo['business_id'];
        # Initialize lists where necessary
        # !!!! I know this is probably not effective, but am unsure of
        # a better method. Open to suggestions !!!!!
        if bid not in LbidMap:
            LbidMap[bid] = {};
            TbidMap[bid] = {};
        if stars not in LbidMap[bid] :
            LbidMap[bid][stars] = {};
        if user not in TbidMap[bid] :
            TbidMap[bid][user] = {};
        # Track information on ratings to each business
        LbidMap[bid][stars][user] = review;
        TbidMap[bid][user][review] = stars;
(where 'bid' is short for "Business ID", pos_list is an input given by user at runtime)
I then go on and try to create a mapping of users who gave a "positive" review to a business T who also gave business L a rating of X (e.g., 5 people rated business L 4/5 stars, how many of those people also gave a "positive" review to business T?)
For mapping I have the following:
# Determine and map all users who rated business L as rL
# and gave business T a positive rating
print '\nCross matching ratings across businesses';
cross_TrL = [];
for Tbid in TbidMap :
    for Lbid in LbidMap :
        # Ensure T and L aren't the same business
        if Tbid != Lbid :
            for stars in LbidMap[Lbid] :
                starSum = len(LbidMap[Lbid][stars]);
                posTbid = 0;
                for user in LbidMap[Lbid][stars] :
                    if user in TbidMap[Tbid] :
                        rid = LbidMap[Lbid][stars][user];
                        print 'Tbid:%s Lbid:%s user:%s rid:%s'%(Tbid, Lbid, user, rid);
                        reviewRate = TbidMap[Tbid][user][rid];
                        # If true, then we have pos review for T from L
                        if reviewRate in pos_list :
                            posTbid += 1;
                numerator = posTbid + 1;
                denominator = starSum + 1;
                probability = float(numerator) / denominator;
I currently receive the following error (print out of current vars also provided):
Tbid:OlpyplEJ_c_hFxyand_Wxw Lbid:W0eocyGliMbg8NScqERaiA user:Neal_1EVupQKZKv3NsC2DA rid:TAIDnnpBMR16BwZsap9uwA
Traceback (most recent call last):
File "run_edge_testAdvProb.py", line 90, in <module>
reviewRate = TbidMap[Tbid][user][rid];
KeyError: u'TAIDnnpBMR16BwZsap9uwA'
So, I know the KeyError is on what should be the rid (review ID) at that particular moment within TbidMap, however it seems to me that the Key was somehow not included within the first code block of initialization.
What am I doing wrong? Additionally, suggestions on how to speed up the second code block are welcome.
EDIT: I realized that I was trying to locate rid of Tbid using the rid from Lbid, however rid is unique to each review so you would not have a Tbid.rid == Lbid.rid.
Updated the second code block, as such:
cross_TrL = [];
for Tbid in TbidMap :
    for Lbid in LbidMap :
        # Ensure T and L aren't the same business
        if Tbid != Lbid :
            # Get number of reviews at EACH STAR rate for L
            for stars in LbidMap[Lbid] :
                starSum = len(LbidMap[Lbid][stars]);
                posTbid = 0;
                # For each review check if user rated the Tbid
                for Lreview in LbidMap[Lbid][stars] :
                    user = LbidMap[Lbid][stars][Lreview];
                    if user in TbidMap[Tbid] :
                        # user rev'd Tbid, get their Trid
                        # and see if they gave Tbid a pos rev
                        for Trid in TbidMap[Tbid][user] :
                            # Currently this does not account for multiple reviews
                            # given by the same person. Just want to get this
                            # working and then I'll minimize this
                            Tstar = TbidMap[Tbid][user][Trid];
                            print 'Tbid:%s Lbid:%s user:%s Trid:%s'%(Tbid, Lbid, user, Trid);
                            if Tstar in pos_list :
                                posTbid += 1;
                numerator = posTbid + 1;
                denominator = starSum + 1;
                probability = float(numerator) / denominator;
                evaluation = {'Tbid':Tbid, 'Lbid':Lbid, 'star':stars, 'prob':probability}
                cross_TrL.append(evaluation);
Still slow, but I no longer receive the error.
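On the "unsure of a better method" comment in the initialization block: one common option is `collections.defaultdict`, which creates the inner dictionaries on first access and removes the `if ... not in ...` checks. The sketch below uses a tiny invented `reviews` list in place of the real `data` structure, and the helper name `smoothed_probability` is hypothetical. It is written to run under both Python 2.7 and Python 3. Note one deliberate difference from the code above: it counts each user at most once, even if they reviewed business T several times.

```python
from collections import defaultdict

# Hypothetical sample data: (user, review_id, business_id, stars)
reviews = [
    ('u1', 'r1', 'L', 4.0),
    ('u1', 'r2', 'T', 5.0),
    ('u2', 'r3', 'L', 4.0),
    ('u2', 'r4', 'T', 2.0),
    ('u3', 'r5', 'L', 4.0),
]
pos_list = [4.0, 5.0]

# Inner dicts are created automatically on first access
LbidMap = defaultdict(lambda: defaultdict(dict))  # bid -> stars -> user -> review
TbidMap = defaultdict(lambda: defaultdict(dict))  # bid -> user -> review -> stars
for user, review, bid, stars in reviews:
    LbidMap[bid][stars][user] = review
    TbidMap[bid][user][review] = stars


def smoothed_probability(Tbid, Lbid, stars):
    """(positives + 1) / (raters + 1), the same smoothing as in the question."""
    # .get() avoids silently inserting empty entries into the defaultdicts
    raters = LbidMap.get(Lbid, {}).get(stars, {})
    t_users = TbidMap.get(Tbid, {})
    positives = sum(
        1 for user in raters
        if any(s in pos_list for s in t_users.get(user, {}).values())
    )
    return float(positives + 1) / (len(raters) + 1)


prob = smoothed_probability('T', 'L', 4.0)
```

Here three users gave L four stars and one of them gave T a positive review, so the smoothed value is (1 + 1) / (3 + 1).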
