PYOMO Constraints - setting constraints over indexed variables - python

I have been trying to get into python optimization, and I have found that pyomo is probably the way to go; I had some experience with GUROBI as a student, but of course that is no longer possible, so I have to look into the open source options.
I basically want to solve a mixed-integer non-linear problem in which I minimize a certain ratio. The problem itself is setting up a power purchase agreement (PPA) in a renewable energy scenario. Depending on the electricity generated, you will have to either buy or sell electricity according to the PPA.
The only starting data is the generation; the PPA is the main decision variable, but I will need others. "buy", "sell", "b1" and "b2" are unknown without the PPA value. These are the equations:
(Image: the equations that govern the problem, written out by hand.)
Using pyomo, I was trying to set up the problem as:
from pyomo.environ import *
# DataFrame with my generation information:
January = Data['Full_Data'][(Data['Full_Data']['Month'] == 1) & (Data['Full_Data']['Year'] == 2011)]
Gen = January['Producible (MWh)']
Time = len(Gen)
M = 100
# Model variables and definition:
m = ConcreteModel()
m.IDX = range(Time)
m.PPA = Var(initialize = 2.0, bounds =(1,7))
m.compra = Var(m.IDX, bounds = (0, None))
m.venta = Var(m.IDX, bounds = (0, None))
m.b1 = Var(m.IDX, within = Binary)
m.b2 = Var(m.IDX, within = Binary)
And then, the constraint; only the first one, as I was already getting errors:
m.b1_rule = Constraint(
expr = (((Gen[i] - PPA)/M for i in m.IDX) <= m.b1[i])
)
which gives me the error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-5-5d5f5584ebca> in <module>
1 m.b1_rule = Constraint(
----> 2 expr = (((Generacion[i] - PPA)/M for i in m.IDX) <= m.b1[i])
3 )
pyomo\core\expr\numvalue.pyx in pyomo.core.expr.numvalue.NumericValue.__ge__()
pyomo\core\expr\logical_expr.pyx in pyomo.core.expr.logical_expr._generate_relational_expression()
AttributeError: 'generator' object has no attribute 'is_expression_type'
I honestly have no idea what this means. I feel like this should be a simple problem, but I am struggling with the syntax. I basically have to apply a constraint to each individual value of "Generation"; there is no sum involved, and all constraints are one-to-one constraints set so that the physical energy requirements make sense.
How do I set up the constraints like this?
Thank you very much

You have a couple of things to fix. First, the error you are getting comes from the parentheses around (Gen[i] - PPA)/M for i in m.IDX: that is a generator expression, so Pyomo receives a generator object instead of an expression (hence 'generator' object has no attribute 'is_expression_type'). So step 1 is to get rid of those outer parentheses, but that alone will not solve your issue.
You said you want to generate this constraint "for each" value of your index. Whenever you want one copy of a constraint for each index value, you need to either build a ConstraintList and add to it in a loop, or use a rule function with an indexed Constraint. There are examples of each in the Pyomo documentation and plenty on this site (I have posted a ton if you look at some of my posts). I would suggest the rule-function approach, and you should end up with something like this (it assumes your generation data is stored as a Param m.Gen indexed by m.IDX):
def my_constr(m, i):
    return m.Gen[i] - m.PPA <= m.b1[i] * M
m.C1 = Constraint(m.IDX, rule=my_constr)

Related

How to iterate over external input list in pyomo objective function?

I am trying to run a simple LP Pyomo ConcreteModel with the Gurobi solver:
import pyomo.environ as pyo
from pyomo.opt import SolverFactory
model = pyo.ConcreteModel()
nb_years = 3
nb_mins = 2
step = 8760*1.5
delta = 10000
#Range of hour
model.h = pyo.RangeSet(0,8760*nb_years-1)
#Individual minimums
model.min = pyo.RangeSet(0, nb_mins-1)
model.mins = pyo.Var(model.min, within=model.h, initialize=[i for i in model.min])
def maximal_step_between_mins_constraint_rule(model, min):
    next_min = min + 1 if min < nb_mins-1 else 0
    if next_min == 0: # We need to take circularity into account
        return 8760*nb_years - model.mins[min] + model.mins[next_min] <= step + delta
    return model.mins[next_min] - model.mins[min] <= step + delta
def minimal_step_between_mins_constraint_rule(model, min):
    next_min = min + 1 if min < nb_mins-1 else 0
    if next_min == 0: # We need to take circularity into account
        return 8760*nb_years - model.mins[min] + model.mins[next_min] >= step - delta
    return model.mins[next_min] - model.mins[min] >= step - delta
model.input_list = pyo.Param(model.h, initialize=my_input_list, within=pyo.Reals, mutable=False)
def objective_rule(model):
    return sum([model.input_list[model.mins[min]] for min in model.min])
model.maximal_step_between_mins_constraint= pyo.Constraint(model.min, rule=maximal_step_between_mins_constraint_rule)
model.minimal_step_between_mins_constraint= pyo.Constraint(model.min, rule=minimal_step_between_mins_constraint_rule)
model.objective = pyo.Objective(rule=objective_rule, sense=pyo.minimize)
opt = SolverFactory('gurobi')
results = opt.solve(model, options={'Presolve':2})
Basically I am trying to find two hours in my input list (which looks like this), spanning 3 years of data, with constraints on the distance separating them, such that the sum of both values is minimized by the model.
I implemented my list as a parameter with fixed values; however, even with mutable set to False, running my model produces this error:
ERROR: Rule failed when generating expression for Objective objective with
index None: RuntimeError: Error retrieving the value of an indexed item
input_list: index 0 is not a constant value. This is likely not what you
meant to do, as if you later change the fixed value of the object this
lookup will not change. If you understand the implications of using non-
constant values, you can get the current value of the object using the
value() function.
ERROR: Constructing component 'objective' from data=None failed: RuntimeError:
Error retrieving the value of an indexed item input_list: index 0 is not a
constant value. This is likely not what you meant to do, as if you later
change the fixed value of the object this lookup will not change. If you
understand the implications of using non-constant values, you can get the
current value of the object using the value() function.
Any idea why I get this error and how to fix it?
Obviously, changing the objective function to sum([pyo.value(model.input_list[model.mins[min]]) for min in model.min]) is not a solution to my problem.
I also tried not using Pyomo parameters (with something like sum([input_list[model.mins[min]] for min in model.min])), but indexing a plain Python list with a Pyomo variable raises the following error:
ERROR: Constructing component 'objective' from data=None failed: TypeError:
list indices must be integers or slices, not _GeneralVarData
You have a couple of serious syntax and structure problems in your model. Not all of the elements are included in the code you provide, but you (minimally) need to fix these:
In this snippet, you are initializing the value of each variable to a list, which is invalid. Start with no variable initializations:
model.mins = pyo.Var(model.min, within=model.h, initialize=[i for i in model.min])
In this summation, you appear to be using a variable as the index for some data. This is an invalid construct: the value of the variable is unknown when the model is built, so you need to reformulate:
return sum([model.input_list[model.mins[min]] for min in model.min])
My suggestion: Start with a very small chunk of your data and pprint() your model and read it carefully for quality before you attempt to solve.
model.pprint()

KeyError in a dictionary when in the previous iteration of a while loop the key:value pair should have been created [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I am using a while loop to integrate various quantities from the surface of a star inwards using appropriate boundary conditions and stellar structure equations.
I am using dictionaries to represent physical variables such as pressure and density, where the plan is for the radii to be keys, and the value to be the pressure or density.
I have a key:value pair for the surface, and then I step inwards iteratively using a while loop updating the dictionaries as below:
import constants
import math
import matplotlib.pyplot as plt
mass=3*constants.solar_mass
radius=1.5*constants.solar_radius
#Variables to guess
core_temperature=1.4109*10**7
core_pressure= 2.6851*10**14
luminosity=pow(3,3.5)*constants.solar_luminosity
#Functions we are searching for
temperature={}
#guess
temperature[0]=core_temperature
# From the Stefan-Boltzmann law
temperature[radius]=pow(luminosity/(4*math.pi*pow(radius,2)*constants.stefan_boltzmann_constant),0.25)
pressure={}
#guess
pressure[0]=core_pressure
#Pressure surface boundary condition
pressure[radius]=(2*constants.gravitation_constant*mass)/(3*constants.opacity*pow(radius,2))
mass_enclosed={}
#boundary conditions
mass_enclosed[0]=0
mass_enclosed[radius]=mass
density={}
#density surface boundary condition
density[radius]=(constants.mean_molecular_weight*pressure[radius])/(constants.gas_constant*temperature[radius])
delta_radius=int(radius/100)
#Polytropic constant
K=(pressure[radius]*constants.mean_molecular_weight)/(constants.gas_constant*pow(density[radius],constants.adiabatic_constant))
def integrate_from_surface():
    i=0
    while radius-i*delta_radius>(0.5*radius):
        #temporary radius just for each loop through
        r=radius-i*delta_radius
        #updating pressure
        pressure[r-delta_radius]=pressure[r]+(density[r]*constants.gravitation_constant*mass_enclosed[r]*delta_radius)/pow(r,2)
        #updating density
        density[r-delta_radius]=pow((pressure[r-delta_radius]*constants.mean_molecular_weight)/(constants.gas_constant*K),1.0/constants.adiabatic_constant)
        #updating mass enclosed
        mass_enclosed[r-delta_radius]=mass_enclosed[r]-4*math.pi*pow(r,2)*delta_radius*density[r]
        i=i+1
integrate_from_surface()
(Above: the while loop; radius and the dictionaries are defined earlier in the script.)
I am getting a KeyError, as shown below:
Traceback (most recent call last):
File "main.py", line 63, in <module>
integrate_from_surface()
File "main.py", line 51, in integrate_from_surface
pressure[r-delta_radius]=pressure[r]+(density[r]*constants.gravitation_constant*mass_enclosed[r]*delta_radius)/pow(r,2)
KeyError: 1043966868.09
If I print out the variable r in the while loop, the process works perfectly until r is 1043966868.09. I do not understand: surely on the previous iteration I made this a key, so there should be no KeyError.
Constants file below:
solar_mass=1.9891*10**30
solar_radius=6.9598*10**8
solar_luminosity=3.8515*10**26
gas_constant=8.3145*10**3
gravitation_constant=6.6726*10**-11
radiation_constant=7.5646*10**-16
speed_of_light=2.9979*10**8
stefan_boltzmann_constant= radiation_constant*speed_of_light * 0.25
opacity=0.034
adiabatic_constant = 5.0/3
mean_molecular_weight = 8.0/13
Thanks in advance for any help.
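As I stated in the comments, this behaviour comes from floating-point rounding rather than from the dictionary itself: a float key is only found if the value you look up is exactly equal to the one you stored. The key written on one iteration is computed as r - delta_radius, while the next iteration recomputes r as radius - i*delta_radius; the two can differ in the last bits, so the lookup fails. In the end, using floats as dictionary keys is only as precise as your float arithmetic. There are already extensive articles about the consequences and mechanisms at work here, for instance this one.
As an example (credit to the aforementioned article), hash(10-9.8) == hash(.2) returns False: 10-9.8 actually evaluates to 0.1999999999999993, not 0.2, so using both as keys in a dictionary would create two entries.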
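A small sketch of the effect described above. key_mismatches is a hypothetical helper that compares the key the question's loop stores (r - delta_radius) with the key the next iteration looks up (radius - i*delta_radius) and reports the iterations where they differ.

```python
def key_mismatches(start, step, n):
    """Compare the key a loop stores with the key the next iteration reads.

    stored:    (start - (i-1)*step) - step  -> what the question's loop writes
    looked_up: start - i*step               -> what the next iteration reads
    Returns the iterations i where the two floats are not equal.
    """
    mismatches = []
    for i in range(1, n + 1):
        stored = (start - (i - 1) * step) - step
        looked_up = start - i * step
        if stored != looked_up:
            mismatches.append(i)
    return mismatches


print(10 - 9.8 == 0.2)           # False
print({10 - 9.8: "a"}.get(0.2))  # None: 0.2 is a different key
print(key_mismatches(1.0, 0.1, 20))
```

Every index listed by the last call is a step at which the original while loop would raise a KeyError.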
A workaround (if you want to stick to float keys) is to compute all the keys once, then reuse those same values every time you need them.
There is another good reason for this approach, as you will have to replace your while loop with a for loop: iterating directly over a precomputed sequence is typically faster in Python than a while loop with a manual counter (more hints in here).
In your case, you could compute your steps this way:
radiuses = [radius-i*delta_radius for i in range(1,50)]
Afterwards, switching the loop is straightforward (for r in radiuses:), and, more to the point, your code won't raise the KeyError any more.
Note that you could also store all your data in a pandas.DataFrame, which lets you access your steps either by position (using .iloc) or by radius (using .loc).
That could be used this way (note that I'm storing all intermediate results in lists rather than dictionaries, to match the pandas.DataFrame constructor):
import math
import pandas as pd
SOLAR_MASS = 1.9891*10**30
SOLAR_RADIUS = 6.9598*10**8
SOLAR_LUMINOSITY = 3.8515*10**26
RADIATION_CONSTANT = 7.5646*10**-16
SPEED_OF_LIGHT = 2.9979*10**8
STEFAN_BOLTZMANN = RADIATION_CONSTANT * SPEED_OF_LIGHT * 0.25
OPACITY = 0.034
GRAVITATION_CONSTANT = 6.6726*10**-11
MEAN_MOLECULAR_WEIGHT = 8.0/13
GAS_CONSTANT = 8.3145*10**3
ADIABATIC_CONSTANT = 5.0/3
mass=3*SOLAR_MASS
radius=1.5*SOLAR_RADIUS
luminosity=(3**3.5)*SOLAR_LUMINOSITY
TEMPERATURE_R = (luminosity/(4*math.pi*(radius**2)*STEFAN_BOLTZMANN))**0.25
PRESSURE_R = (2*GRAVITATION_CONSTANT*mass)/(3*OPACITY*(radius**2))
MASS_R = mass
DENSITY_R = (MEAN_MOLECULAR_WEIGHT*PRESSURE_R)/(GAS_CONSTANT*TEMPERATURE_R)
K=(PRESSURE_R*MEAN_MOLECULAR_WEIGHT)/(GAS_CONSTANT*DENSITY_R**ADIABATIC_CONSTANT)
def integrate_from_surface_df():
    delta_radius=int(radius/100)
    pressure=[PRESSURE_R]
    mass_enclosed=[MASS_R]
    density=[DENSITY_R]
    radiuses = [radius-i*delta_radius for i in range(1,50)]
    for r in radiuses:
        pressure.append(pressure[-1]+(density[-1]*GRAVITATION_CONSTANT*mass_enclosed[-1]*delta_radius)/r**2)
        density.append(((pressure[-2]*MEAN_MOLECULAR_WEIGHT)/(GAS_CONSTANT*K))**(1.0/ADIABATIC_CONSTANT))
        mass_enclosed.append(mass_enclosed[-1]-4*math.pi*r**2*delta_radius*density[-2])
    df = pd.DataFrame({"pressure":pressure, "density":density, "mass_enclosed":mass_enclosed}, index=[radius]+radiuses)
    return df
print(integrate_from_surface_df())
Which would return:
pressure density mass_enclosed
1.043970e+09 7.163524e+03 0.000043 5.967300e+30
1.033530e+09 1.743478e+05 0.000043 5.967300e+30
1.023091e+09 3.449615e+05 0.000292 5.967300e+30
1.012651e+09 1.527176e+06 0.000439 5.967300e+30
1.002211e+09 3.344824e+06 0.001072 5.967300e+30
9.917715e+08 7.876679e+06 0.001716 5.967300e+30
9.813318e+08 1.528565e+07 0.002870 5.967300e+30
9.708921e+08 2.793971e+07 0.004271 5.967299e+30
9.604524e+08 4.718766e+07 0.006134 5.967299e+30
9.500127e+08 7.543907e+07 0.008400 5.967298e+30
9.395730e+08 1.149941e+08 0.011132 5.967297e+30
9.291333e+08 1.685944e+08 0.014335 5.967296e+30
9.186936e+08 2.391984e+08 0.018035 5.967294e+30
9.082539e+08 3.300760e+08 0.022246 5.967292e+30
8.978142e+08 4.447982e+08 0.026988 5.967290e+30
8.873745e+08 5.872670e+08 0.032278 5.967287e+30
8.769348e+08 7.617399e+08 0.038133 5.967284e+30
8.664951e+08 9.728621e+08 0.044575 5.967280e+30
8.560554e+08 1.225701e+09 0.051622 5.967276e+30
8.456157e+08 1.525790e+09 0.059297 5.967271e+30
8.351760e+08 1.879167e+09 0.067624 5.967265e+30
8.247363e+08 2.292433e+09 0.076627 5.967259e+30
8.142966e+08 2.772804e+09 0.086334 5.967253e+30
8.038569e+08 3.328175e+09 0.096773 5.967245e+30
7.934172e+08 3.967188e+09 0.107976 5.967237e+30
7.829775e+08 4.699315e+09 0.119976 5.967229e+30
7.725378e+08 5.534939e+09 0.132808 5.967219e+30
7.620981e+08 6.485454e+09 0.146512 5.967209e+30
7.516584e+08 7.563373e+09 0.161127 5.967198e+30
7.412187e+08 8.782447e+09 0.176699 5.967187e+30
7.307790e+08 1.015780e+10 0.193274 5.967174e+30
7.203393e+08 1.170609e+10 0.210904 5.967161e+30
7.098996e+08 1.344566e+10 0.229642 5.967147e+30
6.994599e+08 1.539674e+10 0.249548 5.967133e+30
6.890202e+08 1.758168e+10 0.270684 5.967117e+30
6.785805e+08 2.002515e+10 0.293117 5.967101e+30
6.681408e+08 2.275445e+10 0.316921 5.967083e+30
6.577011e+08 2.579981e+10 0.342172 5.967066e+30
6.472614e+08 2.919473e+10 0.368956 5.967047e+30
6.368217e+08 3.297638e+10 0.397363 5.967027e+30
6.263820e+08 3.718607e+10 0.427491 5.967007e+30
6.159423e+08 4.186974e+10 0.459445 5.966985e+30
6.055026e+08 4.707856e+10 0.493339 5.966963e+30
5.950629e+08 5.286959e+10 0.529296 5.966940e+30
5.846232e+08 5.930656e+10 0.567451 5.966917e+30
5.741835e+08 6.646075e+10 0.607948 5.966892e+30
5.637438e+08 7.441197e+10 0.650945 5.966867e+30
5.533041e+08 8.324981e+10 0.696611 5.966841e+30
5.428644e+08 9.307487e+10 0.745135 5.966814e+30
5.324247e+08 1.040004e+11 0.796717 5.966786e+30
There is also another workaround, which you already guessed yourself: use integers as keys in your dictionary. I don't think this is awkward in any way, but I'd rather have keys with a real, direct meaning (that is, your radius) than the iteration step... but this is really up to you.
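A sketch of the integer-key variant, assuming the question's recurrences are kept unchanged: every dictionary is keyed by the step number i (0 = surface), and the radius of each step is stored as a value in its own dictionary r. The constants are copied from the question's constants file; integrate_from_surface_int_keys and its n_steps cap are illustrative names.

```python
import math

# Values from the question's constants file
SOLAR_MASS = 1.9891e30
SOLAR_RADIUS = 6.9598e8
SOLAR_LUMINOSITY = 3.8515e26
GAS_CONSTANT = 8.3145e3
GRAVITATION_CONSTANT = 6.6726e-11
RADIATION_CONSTANT = 7.5646e-16
SPEED_OF_LIGHT = 2.9979e8
STEFAN_BOLTZMANN = RADIATION_CONSTANT * SPEED_OF_LIGHT * 0.25
OPACITY = 0.034
ADIABATIC_CONSTANT = 5.0 / 3
MEAN_MOLECULAR_WEIGHT = 8.0 / 13


def integrate_from_surface_int_keys(n_steps=50):
    """Same recurrences as the question's loop, keyed by the integer step i."""
    mass = 3 * SOLAR_MASS
    radius = 1.5 * SOLAR_RADIUS
    luminosity = 3 ** 3.5 * SOLAR_LUMINOSITY
    delta_radius = int(radius / 100)

    # Surface boundary conditions (step 0)
    surface_temperature = (luminosity / (4 * math.pi * radius ** 2 * STEFAN_BOLTZMANN)) ** 0.25
    r = {0: radius}
    pressure = {0: (2 * GRAVITATION_CONSTANT * mass) / (3 * OPACITY * radius ** 2)}
    density = {0: MEAN_MOLECULAR_WEIGHT * pressure[0] / (GAS_CONSTANT * surface_temperature)}
    mass_enclosed = {0: mass}
    K = pressure[0] * MEAN_MOLECULAR_WEIGHT / (GAS_CONSTANT * density[0] ** ADIABATIC_CONSTANT)

    i = 0
    while r[i] > 0.5 * radius and i < n_steps:
        r[i + 1] = r[i] - delta_radius
        pressure[i + 1] = pressure[i] + density[i] * GRAVITATION_CONSTANT * mass_enclosed[i] * delta_radius / r[i] ** 2
        density[i + 1] = (pressure[i + 1] * MEAN_MOLECULAR_WEIGHT / (GAS_CONSTANT * K)) ** (1.0 / ADIABATIC_CONSTANT)
        mass_enclosed[i + 1] = mass_enclosed[i] - 4 * math.pi * r[i] ** 2 * delta_radius * density[i]
        i += 1  # the key written above is exactly the key read next time
    return r, pressure, density, mass_enclosed


r, pressure, density, mass_enclosed = integrate_from_surface_int_keys()
last = len(r) - 1
print(last, r[last], pressure[last], density[last])
```

Because the integer i + 1 written on one iteration is exactly the i read on the next, the lookups can no longer miss, and r[i] still gives you the physical radius of each step.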

Unknown result in z3 python for Int type

I was trying to solve a set of constraints using z3 in Python. My code:
import math
from z3 import *
### declaration
n_co2 = []
c_co2 = []
alpha = []
beta = []
m_dot_air = []
n_pir = []
pir_sensor = []
for i in range(2):
    c_co2.append(Real('c_co2_'+str(i)))
    n_pir.append(Real('n_pir_'+str(i)))
    n_co2.append(Real('n_co2_'+str(0)))
    alpha.append(Real('alpha_'+str(0)))
    beta.append(Real('beta_'+str(0)))
    m_dot_air.append(Real('m_dot_air_'+str(0)))
    pir_sensor.append(Real('pir_sensor_'+str(0)))
s = Solver()
s.add(n_co2[0]>0)
s.add(c_co2[0]>0)
s.add(c_co2[1]>=0.95*c_co2[0])
s.add(c_co2[1]<=1.05*c_co2[0])
s.add(n_co2[0]>=0.95*n_pir[1])
s.add(n_co2[0]<=1.05*n_pir[1])
s.add(c_co2[1]>0)
s.add(alpha[0]<=-1)
s.add(beta[0]>0)
s.add(m_dot_air[0]>0)
s.add(alpha[0]==-1*(1+ m_dot_air[0] + (m_dot_air[0]**2)/2.0 + (m_dot_air[0]**3)/6.0 ))
s.add(beta[0]== (1-alpha[0])/m_dot_air[0])
s.add(n_co2[0]== (c_co2[1]-alpha[0]*c_co2[0])/(beta[0]*19.6)-(m_dot_air[0]*339)/19.6)
s.add(n_pir[1]>=0)
s.add(pir_sensor[0]>=-1)
s.add(pir_sensor[0]<=1)
s.add(Not(pir_sensor[0]==0))
s.add(n_pir[1]==(n_pir[0]+pir_sensor[0]))
#### testing
s.add(pir_sensor[0]==1)
s.add(n_pir[1]==1)
s.add(n_co2[0]==1)
print(s.check())
print(s.reason_unknown())
print(s.model())
The output of the code:
sat
[c_co2_0 = 355,
c_co2_1 = 1841/5,
m_dot_air_0 = 1,
n_co2_0 = 1,
n_pir_1 = 1,
pir_sensor_0 = 1,
n_pir_0 = 0,
beta_0 = 11/3,
alpha_0 = -8/3,
/0 = [(19723/15, 1078/15) -> 1793/98,
(11/3, 1) -> 11/3,
else -> 0]]
What is the significance of the "/0 = ..." part of the output model?
But when I change the type of n_pir from Real to Int, z3 cannot solve it, even though the Real model above already gives integer values for n_pir (n_pir_0 = 0, n_pir_1 = 1). The reason reported for unknown is:
smt tactic failed to show goal to be sat/unsat (incomplete (theory arithmetic))
How can this problem be solved? Could anyone explain the reasoning behind it?
For the "/0" part: it comes from the divisions in your constraints (by m_dot_air_0 and by beta_0*19.6). z3 handles division by a non-constant term, which might be zero, through an internal function named /0, and the model prints the interpretation it picked for that function; the listed entries are just those divisions evaluated at the model's values (for example (11/3, 1) -> 11/3 is (1 - alpha_0)/m_dot_air_0). You can totally ignore it: it is not one of your variables, and it should probably be hidden from the user.
As for why you cannot switch from 'Real' to 'Int': you have a non-linear set of equations (you multiply or divide two variables), and non-linear integer arithmetic is undecidable in general, whereas non-linear real arithmetic is decidable. So when you use 'Int', the solver falls back on heuristics, which in this case fail, and it reports unknown. This is totally expected. Read this answer for more details: How does Z3 handle non-linear integer arithmetic?
Z3 does come with an NRA solver; you can give that a try. Declare your solver as:
s = SolverFor("NRA")
But again you're at the mercy of the heuristics, and you may or may not get a solution. Also, watch out for the z3py bindings coercing values between integer and real sorts when you mix and match arithmetic like that. A good way to check is to write:
print(s.sexpr())
before you call s.check(), and read the output to convince yourself that the translation has been done correctly. For details on that, see this question: Python and Z3: integers and floating, how to manage them in the correct way?

Inappropriate argument value (of correct type). in JES (Python/Jython)

Hey, so I am just working on some coding homework for my Python class using JES. Our assignment is to take a sound, add some white noise to the background, and add an echo as well. There are a few more specifics, but I believe I am fine with those. There are four functions that we are writing: a main, an echo function based on a user-defined length of time and number of echoes, a white noise generation function, and a function to merge the sounds.
Here is what I have so far, haven't started the merging or the main yet.
# put the following line at the top of your file. This will let
# you access the random module functions
import random
# White noise generation function, requires a sound to match sound length
def whiteNoiseGenerator(baseSound):
    noise = makeEmptySound(getLength(baseSound))
    index = 0
    for index in range(0, getLength(baseSound)):
        sample = random.randint(-500, 500)
        setSampleValueAt(noise, index, sample)
    return noise
def multipleEchoesGenerator(sound, delay, number):
    endSound = getLength(sound)
    newEndSound = endSound + (delay * number)
    len = 1 + int(newEndSound/getSamplingRate(sound))
    newSound = makeEmptySound(len)
    echoAmplitude = 1.0
    for echoCount in range(1, number):
        echoAmplitude = echoAmplitude * 0.60
        for posns1 in range(0, endSound):
            posns2 = posns1 + (delay * echoCount)
            values1 = getSampleValueAt(sound, posns1) * echoAmplitude
            values2 = getSampleValueAt(newSound, posns2)
            setSampleValueAt(newSound, posns2, values1 + values2)
    return newSound
I receive this error whenever I try to load it in.
The error was:
Inappropriate argument value (of correct type).
An error occurred attempting to pass an argument to a function.
Please check line 38 of C:\Users\insanity180\Desktop\Work\Winter Sophomore\CS 140\homework3\homework_3.py
That line of code is:
setSampleValueAt (newSound, posns2, values1 + values2)
Does anyone have an idea what might be happening here? Any assistance would be great, since I am hoping to give myself plenty of time to finish this assignment. I have gotten a similar error before and it was usually a syntax error, but I don't see any such errors here.
The sound is made before I run this program and I defined delay and number as values 1 and 3 respectively.
Check the arguments to setSampleValueAt; your sample value may be out of bounds (it must be within -32768 to 32767). You need to do some kind of output clamping in your algorithm.
Another possibility (which, according to further input, was indeed the error) is that the echo runs past the end of the new sound: for example, if your sample is 5 seconds long and the echo adds 0.5 seconds, posns2 = posns1 + delay * echoCount lands beyond the last sample of newSound, because the length of the new sound is not calculated correctly. newEndSound / getSamplingRate(sound) turns a sample count into a number of seconds, so if makeEmptySound is given a sample count (as whiteNoiseGenerator assumes), newSound ends up far too short.
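A plain-Python sketch of both fixes (sizing the output from the sample count and clamping the mixed values), assuming the sound is a list of integer samples and delay is given in samples rather than seconds. clamp_sample and mix_echoes are hypothetical helpers, not JES functions; in JES you would read and write samples with getSampleValueAt and setSampleValueAt instead of list indexing. Unlike the question's loop, this version also copies the original sound (echo 0) and then adds number echoes.

```python
MIN_SAMPLE, MAX_SAMPLE = -32768, 32767  # signed 16-bit sample range


def clamp_sample(value):
    """Round a mixed sample and clamp it into the signed 16-bit range."""
    return max(MIN_SAMPLE, min(MAX_SAMPLE, int(round(value))))


def mix_echoes(samples, delay, number, decay=0.6):
    """Original sound plus `number` echoes, each `delay` samples later and
    `decay` times quieter than the previous one."""
    # Size the output in samples: original length plus room for the last echo
    mixed = [0.0] * (len(samples) + delay * number)
    for echo_count in range(number + 1):  # 0 is the original sound
        amplitude = decay ** echo_count
        offset = delay * echo_count
        for pos, value in enumerate(samples):
            mixed[pos + offset] += value * amplitude
    # Clamp once, after all echoes have been summed
    return [clamp_sample(v) for v in mixed]


print(mix_echoes([1000, -1000], delay=1, number=2))
```

Clamping once at the end, rather than after every addition, keeps an intermediate sum from being cut off before a later, opposite-signed echo is added to it.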

put stockprices into groups when they are within 0.5% of each other

Thanks for the answers. I have not used StackOverflow before, so I was surprised by the number of answers and the speed of them; it's fantastic.
I have not been through the answers properly yet, but thought I should add some information to the problem specification. See the image below.
I can't post an image here because I don't have enough points, but you can see the image
at http://journal.acquitane.com/2010-01-20/image003.jpg
This image may describe more closely what I'm trying to achieve. The horizontal lines across the page are price points on the chart. Where you get a cluster of lines within 0.5% of each other, this is considered a good thing, and that is why I want to identify those clusters automatically. You can see on the chart that there is a cluster at S2 & MR1, and at R2 & WPP1.
So every day I produce these price points and can manually identify those that are within 0.5% of each other, but the purpose of this question is how to do it with a Python routine.
I have reproduced the list again (see below) with labels. Just be aware that the list price points don't match the price points in the image because they are from two different days.
[YR3,175.24,8]
[SR3,147.85,6]
[YR2,144.13,8]
[SR2,130.44,6]
[YR1,127.79,8]
[QR3,127.42,5]
[SR1,120.94,6]
[QR2,120.22,5]
[MR3,118.10,3]
[WR3,116.73,2]
[DR3,116.23,1]
[WR2,115.93,2]
[QR1,115.83,5]
[MR2,115.56,3]
[DR2,115.53,1]
[WR1,114.79,2]
[DR1,114.59,1]
[WPP,113.99,2]
[DPP,113.89,1]
[MR1,113.50,3]
[DS1,112.95,1]
[WS1,112.85,2]
[DS2,112.25,1]
[WS2,112.05,2]
[DS3,111.31,1]
[MPP,110.97,3]
[WS3,110.91,2]
[50MA,110.87,4]
[MS1,108.91,3]
[QPP,108.64,5]
[MS2,106.37,3]
[MS3,104.31,3]
[QS1,104.25,5]
[SPP,103.53,6]
[200MA,99.42,7]
[QS2,97.05,5]
[YPP,96.68,8]
[SS1,94.03,6]
[QS3,92.66,5]
[YS1,80.34,8]
[SS2,76.62,6]
[SS3,67.12,6]
[YS2,49.23,8]
[YS3,32.89,8]
I did make a mistake with the original list in that Group C is wrong and should not be included. Thanks for pointing that out.
Also, the 0.5% is not fixed; this value will change from day to day, but I have just used 0.5% as an example for spec'ing the problem.
Thanks Again.
Mark
PS. I will get cracking on checking the answers now.
Hi:
I need to do some manipulation of stock prices. I have just started using Python (but I think I would have trouble implementing this in any language). I'm looking for some ideas on how to implement this nicely in Python.
Thanks
Mark
Problem:
I have a list of lists (FloorLevels, see below) where each sublist has two items (stockprice, weight). I want to put the stock prices into groups when they are within 0.5% of each other. A group's strength will be determined by its total weight. For example:
Group-A
115.93,2
115.83,5
115.56,3
115.53,1
-------------
TotalWeight:11
-------------
Group-B
113.50,3
112.95,1
112.85,2
-------------
TotalWeight:6
-------------
FloorLevels = [
    [175.24, 8],
    [147.85, 6],
    [144.13, 8],
    [130.44, 6],
    [127.79, 8],
    [127.42, 5],
    [120.94, 6],
    [120.22, 5],
    [118.10, 3],
    [116.73, 2],
    [116.23, 1],
    [115.93, 2],
    [115.83, 5],
    [115.56, 3],
    [115.53, 1],
    [114.79, 2],
    [114.59, 1],
    [113.99, 2],
    [113.89, 1],
    [113.50, 3],
    [112.95, 1],
    [112.85, 2],
    [112.25, 1],
    [112.05, 2],
    [111.31, 1],
    [110.97, 3],
    [110.91, 2],
    [110.87, 4],
    [108.91, 3],
    [108.64, 5],
    [106.37, 3],
    [104.31, 3],
    [104.25, 5],
    [103.53, 6],
    [99.42, 7],
    [97.05, 5],
    [96.68, 8],
    [94.03, 6],
    [92.66, 5],
    [80.34, 8],
    [76.62, 6],
    [67.12, 6],
    [49.23, 8],
    [32.89, 8],
]
I suggest a repeated use of k-means clustering -- let's call it KMC for short. KMC is a simple and powerful clustering algorithm... but it needs to "be told" how many clusters, k, you're aiming for. You don't know that in advance (if I understand you correctly) -- you just want the smallest k such that no two items "clustered together" are more than X% apart from each other. So, start with k equal to 1 -- everything bunched together, no clustering pass needed ;-) -- and check the diameter of the cluster (a cluster's "diameter", borrowing the term from geometry, is the largest distance between any two members of the cluster).
If the diameter is > X%, set k += 1, perform KMC with k as the number of clusters, and repeat the check, iteratively.
In pseudo-code:
def markCluster(items, threshold):
    k = 1
    clusters = [items]
    maxdist = diameter(items)
    while maxdist > threshold:
        k += 1
        clusters = Kmc(items, k)
        maxdist = max(diameter(c) for c in clusters)
    return clusters
assuming of course we have suitable diameter and Kmc Python functions.
Does this sound like the kind of thing you want? If so, then we can move on to show you how to write diameter and Kmc (in pure Python if you have a relatively limited number of items to deal with, otherwise maybe by exploiting powerful third-party add-on frameworks such as numpy) -- but it's not worthwhile to go to such trouble if you actually want something pretty different, whence this check!-)
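A pure-Python sketch of diameter and Kmc for small inputs, assuming items are (price, weight) pairs and that the threshold is a relative spread (0.005 for 0.5%). Kmc here is a plain one-dimensional k-means (Lloyd's algorithm) with a deterministic start, which is one reasonable choice among many; markCluster is the pseudocode above, repeated so the block runs on its own.

```python
def diameter(cluster):
    """Relative spread of (price, weight) items: highest/lowest price - 1.

    Using a ratio (an assumption) makes a threshold of 0.005 mean 0.5%.
    """
    prices = [price for price, _ in cluster]
    return max(prices) / min(prices) - 1.0


def Kmc(items, k, iterations=100):
    """Plain 1-D k-means (Lloyd's algorithm) on the prices."""
    items = sorted(items)
    n = len(items)
    # Deterministic start: k centres spread evenly through the sorted prices
    centres = [items[(j * (n - 1)) // max(k - 1, 1)][0] for j in range(k)]
    clusters = [items]
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for item in items:
            nearest = min(range(len(centres)), key=lambda j: abs(item[0] - centres[j]))
            clusters[nearest].append(item)
        clusters = [c for c in clusters if c]  # drop clusters that ended up empty
        new_centres = [sum(price for price, _ in c) / len(c) for c in clusters]
        if new_centres == centres:
            break  # assignments are stable
        centres = new_centres
    return clusters


def markCluster(items, threshold):
    k = 1
    clusters = [items]
    maxdist = diameter(items)
    while maxdist > threshold:
        k += 1
        clusters = Kmc(items, k)
        maxdist = max(diameter(c) for c in clusters)
    return clusters


levels = [(100.0, 1), (100.2, 2), (100.4, 3), (110.0, 4), (110.3, 5)]
for group in markCluster(levels, 0.005):
    print(group, "TotalWeight:", sum(weight for _, weight in group))
```

Because k-means minimises the distance to each cluster's mean rather than the diameter, the grouping for a given k is a heuristic; the outer loop still terminates, since at worst (k equal to the number of items) every price ends up alone in its own cluster.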
A stock s belongs in a group G if, for each stock t in G, s * 1.05 >= t and s / 1.05 <= t, right? (I'm using 5%, i.e. 1.05, to keep the numbers simple; for 0.5% the factor is 1.005.)
How do we add the stocks to each group? If we have the stocks 95, 100, 101, and 105, and we start a group with 100, then add 101, we will end up with {100, 101, 105}. If we did 95 after 100, we'd end up with {100, 95}.
Do we just need to consider all possible permutations? If so, your algorithm is going to be inefficient.
You need to specify your problem in more detail. Just what does "put the stockprices into groups when they are within 0.5% of each other" mean?
Possibilities:
(1) each member of the group is within 0.5% of every other member of the group
(2) sort the list and split it where the gap is more than 0.5%
Note that 116.23 is within 0.5% of 115.93 -- abs((116.23 / 115.93 - 1) * 100) < 0.5 -- but you have put one number in Group A and one in Group C.
Simple example: a, b, c = (0.996, 1, 1.004) ... Note that a and b fit, b and c fit, but a and c don't fit. How do you want them grouped, and why? Is the order in the input list relevant?
Possibility (1) produces ab,c or a,bc ... tie-breaking rule, please
Possibility (2) produces abc (no big gaps, so only one group)
You won't be able to classify them into hard "groups". If you have prices (1.0,1.05, 1.1) then the first and second should be in the same group, and the second and third should be in the same group, but not the first and third.
A quick, dirty way to do something that you might find useful:
def make_group_function(tolerance = 0.05):
    from math import log10, floor
    # I forget why this works.
    tolerance_factor = -1.0/(-log10(1.0 + tolerance))
    # well ... since you might ask
    # we want: log(x)*tf - log(x*(1+t))*tf = -1,
    # so every 5% change has a different group. The minus is just so groups
    # are ascending .. it looks a bit nicer.
    #
    # tf = -1/(log(x)-log(x*(1+t)))
    # tf = -1/(log(x/(x*(1+t))))
    # tf = -1/(log(1/(1*(1+t)))) # solved .. but let's just be more clever
    # tf = -1/(0-log(1*(1+t)))
    # tf = -1/(-log(1+t))
    def group_function(value):
        # don't just use int - it rounds up below zero, and down above zero
        return int(floor(log10(value)*tolerance_factor))
    return group_function
Usage:
group_function = make_group_function()
import random
groups = {}
for i in range(50):
    v = random.random()*500+1000
    group = group_function(v)
    if group in groups:
        groups[group].append(v)
    else:
        groups[group] = [v]
for group in sorted(groups):
    print('Group', group)
    for v in sorted(groups[group]):
        print(v)
    print()
For a given set of stock prices, there is probably more than one way to group stocks that are within 0.5% of each other. Without some additional rules for grouping the prices, there's no way to be sure an answer will do what you really want.
Apart from the question of how to pick which values fit together, this is a problem where a little object orientation can make it a lot easier to deal with.
I made two classes here, with a minimum of desirable behaviours, but they can make the classification a lot easier: the Group class gives you a single place to tweak how items are matched.
I can see that the code below is incorrect, in the sense that the limits for group inclusion vary as new members are added. Even if the separation criterion remains the same, you have to rewrite the get_groups method to use a multi-pass approach. It should not be hard, but the code would be too long to be helpful here, and I think this snippet is enough to get you going:
from copy import copy
class Group(object):
    def __init__(self, data=None, name=""):
        if data:
            self.data = data
        else:
            self.data = []
        self.name = name
    def get_mean_stock(self):
        return sum(item[0] for item in self.data) / len(self.data)
    def fits(self, item):
        if 0.995 < abs(item[0]) / self.get_mean_stock() < 1.005:
            return True
        return False
    def get_weight(self):
        return sum(item[1] for item in self.data)
    def __repr__(self):
        return "Group-%s\n%s\n---\nTotalWeight: %d\n\n" % (
            self.name,
            "\n".join("%.02f, %d" % tuple(item) for item in self.data),
            self.get_weight())
class StockGrouper(object):
    def __init__(self, data=None):
        if data:
            self.floor_levels = data
        else:
            self.floor_levels = []
    def get_groups(self):
        groups = []
        floor_levels = copy(self.floor_levels)
        name_ord = ord("A") - 1
        while floor_levels:
            seed = floor_levels.pop(0)
            name_ord += 1
            group = Group([seed], chr(name_ord))
            groups.append(group)
            to_remove = []
            for i, item in enumerate(floor_levels):
                if group.fits(item):
                    group.data.append(item)
                    to_remove.append(i)
            for i in reversed(to_remove):
                floor_levels.pop(i)
        return groups
testing:
floor_levels = [ [stock, weight], ... <paste the data above> ]
s = StockGrouper(floor_levels)
s.get_groups()
For the grouping element, could you use itertools.groupby()? As the data is sorted, a lot of the work of grouping it is already done: you could test whether the current value in the iteration differs from the last one by less than 0.5%, and have itertools.groupby() start a new group every time that test fails.
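A sketch of the itertools.groupby idea, which implements Possibility (2) from an earlier answer (sort, then split wherever the gap exceeds the tolerance); group_by_gap is a hypothetical helper. Note that this lets groups chain: a run of prices in which each neighbour is within 0.5% can span more than 0.5% from end to end.

```python
from itertools import groupby


def group_by_gap(levels, tolerance=0.005):
    """Split [price, weight] levels wherever neighbouring prices are more
    than `tolerance` apart (0.005 = 0.5%), after sorting high to low."""
    ordered = sorted(levels, key=lambda item: item[0], reverse=True)
    state = {"group": 0, "previous": None}

    def group_key(item):
        # groupby calls this once per item, in order, so it can carry state
        price = item[0]
        previous = state["previous"]
        if previous is not None and previous / price - 1 > tolerance:
            state["group"] += 1  # gap too large: start a new group
        state["previous"] = price
        return state["group"]

    return [list(items) for _, items in groupby(ordered, key=group_key)]


sample = [[115.93, 2], [115.83, 5], [115.56, 3], [115.53, 1], [114.79, 2]]
for group in group_by_gap(sample):
    print(group, "TotalWeight:", sum(weight for _, weight in group))
```

Since the tolerance is a parameter, the day-to-day change from 0.5% only means passing a different value.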
