Algorithm for efficient portfolio optimization - python

I'm trying to find the best allocation for a portfolio based on backtesting data. As a general rule, I've divided stocks into large caps and small/mid caps and growth/value and want no more than 80% of my portfolio in large caps or 70% of my portfolio in value. I need an algorithm that will be flexible enough to use for more than two stocks. So far, what I have is (including a random class called Ticker):
randomBoolean = True
listOfTickers = []
listOfLargeCaps = []
listOfSmallMidCaps = []
largeCapAllocation = 0
listOfValue = []
listOfGrowthBlend = []
valueAllocation = 0
while randomBoolean:
    tickerName = input("What is the name of the ticker?")
    tickerCap = input("What is the cap of the ticker?")
    tickerAllocation = int(input("Around how much do you want to allocate in this ticker?"))
    tickerValue = input("Is this ticker a Value, Growth, or Blend stock?")
    ticker = Ticker(tickerCap, tickerValue, tickerAllocation, tickerName)
    listOfTickers.append(ticker)
    closer = input("Type DONE if you are finished. Type ENTER to continue entering tickers")
    if closer == "DONE":
        randomBoolean = False
for ticker in listOfTickers:
    # note: `ticker.cap == ("Large" or "large")` only compares against "Large";
    # lowercasing handles both spellings
    if ticker.cap.lower() == "large":
        listOfLargeCaps.append(ticker)
    else:
        listOfSmallMidCaps.append(ticker)
    if ticker.value.lower() == "value":
        listOfValue.append(ticker)
    else:
        listOfGrowthBlend.append(ticker)
for largeCap in listOfLargeCaps:
    largeCapAllocation += largeCap.allocation
if largeCapAllocation > 80:
    pass  # run a function that will readjust tickers and decrease allocation to large-cap stocks
for value in listOfValue:
    valueAllocation += value.allocation
if valueAllocation > 70:
    pass  # run a function that will readjust tickers and decrease allocation to value stocks
The "function" I have so far just iterates from -5 to 6 in a sort of brute-force sweep:
for i in range(-5, 6):
    ticker1AllocationPercent += i
    ticker2AllocationPercent -= i
    # update the bestBalance if the new allocation is better
How would I modify this algorithm to work for 3, 4, 5, etc. stocks, and how would I go about changing the allocations for the large/small-mid cap stocks and such?

As mentioned in the above answer, a quadratic solver is typically used for such problems. You can use the quadratic solver available in Pyportfolio. See this link for more details.
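The ±i sweep in the question only shifts weight between two tickers. One way to generalize it to any number of stocks is a proportional rebalance: scale an over-cap group down to its limit and spread the freed allocation over the remaining stocks. This is a minimal illustrative sketch, not the asker's Ticker class or the Pyportfolio API, and it assumes at least one stock outside the capped group:

```python
def rebalance(allocations, group_indices, cap):
    """allocations: list of percentages summing to 100.
    group_indices: indices of the constrained group (e.g. large caps).
    cap: maximum total percentage allowed for that group."""
    group_total = sum(allocations[i] for i in group_indices)
    if group_total <= cap:
        return allocations[:]  # constraint already satisfied
    excess = group_total - cap
    others = [i for i in range(len(allocations)) if i not in group_indices]
    other_total = sum(allocations[i] for i in others)  # assumed > 0
    result = allocations[:]
    for i in group_indices:  # shrink group members proportionally
        result[i] -= excess * allocations[i] / group_total
    for i in others:         # grow the rest proportionally
        result[i] += excess * allocations[i] / other_total
    return result

# stocks 0 and 1 are large caps holding 90% total; cap them at 80%
print([round(x, 2) for x in rebalance([50, 40, 10], [0, 1], 80)])  # → [44.44, 35.56, 20.0]
```

The same function can be called once for the large-cap cap (80%) and once for the value cap (70%), with any number of tickers in each group.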

Related

Iterating for multiple for loops from API in Python

self.flights_list = ApiConnector('airlabs', 'flights', 'dep_icao,arr_icao,flight_number,flag,aircraft_icao').get_data_from_api()
self.airports_list = ApiConnector('airlabs', 'airports', 'icao_code,name,lat,lng')

def get_airport_cordinates(self, airport_name):
    for i in self.airports_list.get_data_from_api():
        if i.get('icao_code') == airport_name:
            return i['lat'], i['lng']

def list_all_flights(self):
    for i in self.flights_list:  # flights_list already holds the API response
        if i.get('dep_icao') and i.get('arr_icao'):
            print(f"Flight Number is {i['flight_number']} and the airline is {i['flag']} and the aircraft is {i['aircraft_icao']} going from {i['dep_icao']} to {i['arr_icao']}")
            print(f'Flight distance is {Emissions().calculate_distance(ApiResponse().get_airport_cordinates(i["dep_icao"]), ApiResponse().get_airport_cordinates(i["arr_icao"]))} km')
            print(f'Flight CO2 emissions is {Emissions().calculate_co2_emissions(Emissions().calculate_distance(ApiResponse().get_airport_cordinates(i["dep_icao"]), ApiResponse().get_airport_cordinates(i["arr_icao"])))} kg')
I am trying to iterate over data from the Airlabs API. There are basically two queries: one for flights and one for airports (which has the latitude and longitude, matched against the airport code from each flight). The two responses together are only around 8 MB, but iterating through all of them takes ages.
Is there any way to speed it up?
Speed up the for loops inside. Generally,
print(f"Flight Number is {i['flight_number']} and the airline is {i['flag']} and the aircraft is {i['aircraft_icao']} going from {i['dep_icao']} to {i['arr_icao']}")
works flawlessly (10,000 queries within a second). However, it slows down on this section:
print(f'Flight distance is {Emissions().calculate_distance(ApiResponse().get_airport_cordinates(i["dep_icao"]), ApiResponse().get_airport_cordinates(i["arr_icao"]))} km')
which compares results from flights_list (around 8 MB) against airports_list (around 2 MB); the distance calculation itself is relatively fast. Any guidance on how to speed it up?
Problem solved by moving the static queries from the API to a file :)
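Independently of caching the responses, the main cost here is that get_airport_cordinates re-scans the whole airports list for every flight. Building a dict keyed by ICAO code once turns each lookup into O(1). A minimal sketch, assuming the same field names as in the question (the sample airport records below are made up):

```python
def build_airport_index(airports):
    """Index airports by icao_code once, so each lookup is a dict access
    instead of a full scan over the airports list."""
    return {a['icao_code']: (a['lat'], a['lng'])
            for a in airports if a.get('icao_code')}

airports = [
    {'icao_code': 'EKCH', 'lat': 55.6, 'lng': 12.6},
    {'icao_code': 'EGLL', 'lat': 51.5, 'lng': -0.45},
]
index = build_airport_index(airports)
print(index['EKCH'])  # → (55.6, 12.6)
```

With roughly F flights and A airports this drops the matching step from O(F·A) comparisons to O(F + A), which is usually the difference between "ages" and under a second at these sizes.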

statistics program gives out values different from test sample

I wrote a program for a statistics teaching problem. Simply put, I was supposed to predict prices for the next 250 days, then extract the lowest and highest price from 10,000 runs of 250-day predictions.
I followed the instructions in the problem to use the gauss method from the random module with the mean and std of the given sample.
The highest and lowest prices in the test are in the range of 45-55, but I predict 18-88. Is there a problem with my code, or is it just not a good method for prediction?
from random import gauss

with open('AAPL_train.csv','r') as sheet:  # we categorize the data here
    Date=[]
    Open=[]
    High=[]
    Low=[]
    Close=[]
    Adj_Close=[]
    Volume=[]
    for lines in sheet.readlines()[1:-1]:
        words=lines.strip().split(',')
        Date.append(words[0])
        Open.append(float(words[1]))
        High.append(float(words[2]))
        Low.append(float(words[3]))
        Close.append(float(words[4]))
        Adj_Close.append(float(words[5]))
        Volume.append(int(words[6]))

subtract=[]  # find the pattern of price changing by finding the day-to-day changes
for i in range(1,len(Volume)):
    subtract.append(Adj_Close[i]-Adj_Close[i-1])

mean=sum(subtract)/len(subtract)  # find the mean and std of the change pattern
accum=0
for amount in subtract:
    accum+=(amount-mean)**2
var=accum/len(subtract)
stdev=var**0.5

worst=[]
best=[]
def Getwb():  # a function to predict one 250-day path
    index=Adj_Close[-1]
    index_lst=[]
    for i in range(250):
        index+=gauss(mean,stdev)
        index_lst.append(index)
    worst=min(index_lst)
    best=max(index_lst)
    return worst,best

for i in range(10000):  # try predicting 10000 times, then extract highest and lowest result
    x,y=Getwb()
    worst.append(x)
    best.append(y)
print(min(worst))
print(max(best))
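The wide range is not necessarily a bug: after 250 independent Gaussian steps the price spread grows like sqrt(250) · stdev, and taking the min/max over 10,000 paths picks out the most extreme of 2.5 million samples. A quick vectorized sanity check (illustrative mean/std and a start price of 50, not the AAPL estimates):

```python
import numpy as np

rng = np.random.default_rng(0)
mean, stdev, start = 0.0, 1.0, 50.0          # illustrative values
steps = rng.normal(mean, stdev, size=(10_000, 250))
paths = start + np.cumsum(steps, axis=1)     # 10,000 paths of 250 cumulative steps
lowest, highest = paths.min(), paths.max()

print(round(stdev * 250 ** 0.5, 1))          # 1-sigma spread after 250 steps → 15.8
print(lowest < start - 15 and highest > start + 15)  # extremes well beyond 1 sigma → True
```

So a sample whose own min/max sits at 45-55 can easily produce simulated extremes like 18-88; the random-walk model simply has a much fatter range than the historical window, which says more about the method than about the code.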

Finding minimum value of a function with 11,390,625 variable combinations

I am working on code to solve for the optimum combination of diameter sizes for a number of pipelines. The objective function is to find the least sum of pressure drops across six pipelines.
Since I have 15 choices of discrete diameter sizes, [2,4,6,8,12,16,20,24,30,36,40,42,50,60,80], any of which can be used for any of the six pipelines in the system, the number of possible solutions is 15^6 = 11,390,625.
To solve the problem, I am using Mixed-Integer Linear Programming with the PuLP package. I am able to find the solution for combinations of identical diameters (e.g. [2,2,2,2,2,2] or [4,4,4,4,4,4]), but what I need is to go through all combinations (e.g. [2,4,2,2,4,2] or [4,2,4,2,4,2]) to find the minimum. I attempted to do this, but the process takes a very long time to go through all combinations. Is there a faster way to do this?
Note that I cannot calculate the pressure drop for each pipeline in isolation, as the choice of diameter affects the total pressure drop in the system. Therefore, at any time, I need to calculate the pressure drop of each combination in the system.
I also need to constrain the problem such that rate / pipeline cross-sectional area > 2.
Your help is much appreciated.
The first attempt for my code is the following:
from pulp import *
import random
import itertools
import numpy

rate = 5000
numberOfPipelines = 15

def pressure(diameter):
    diameterList = numpy.tile(diameter, numberOfPipelines)
    pressure = 0.0
    for pipeline in range(numberOfPipelines):
        pressure += rate/diameterList[pipeline]
    return pressure

diameterList = [2,4,6,8,12,16,20,24,30,36,40,42,50,60,80]
pipelineIds = range(0, numberOfPipelines)
pipelinePressures = {}
for diameter in diameterList:
    pressures = []
    for pipeline in range(numberOfPipelines):
        pressures.append(pressure(diameter))
    pressureList = dict(zip(pipelineIds, pressures))
    pipelinePressures[diameter] = pressureList
print('pipepressure', pipelinePressures)

prob = LpProblem("Warehouse Allocation", LpMinimize)
use_diameter = LpVariable.dicts("UseDiameter", diameterList, cat=LpBinary)
use_pipeline = LpVariable.dicts("UsePipeline", [(i,j) for i in pipelineIds for j in diameterList], cat=LpBinary)

## Objective Function:
prob += lpSum(pipelinePressures[j][i] * use_pipeline[(i,j)] for i in pipelineIds for j in diameterList)

## Each pipeline must be connected to exactly one diameter:
for i in pipelineIds:
    prob += lpSum(use_pipeline[(i,j)] for j in diameterList) == 1

## The diameter is activated if at least one pipeline is assigned to it:
for j in diameterList:
    for i in pipelineIds:
        prob += use_diameter[j] >= lpSum(use_pipeline[(i,j)])

## run the solution
prob.solve()
print("Status:", LpStatus[prob.status])
for i in diameterList:
    if use_diameter[i].varValue > 0:  # the binary is 1 when this diameter is used
        print("Diameter Size", i)
for v in prob.variables():
    print(v.name, "=", v.varValue)
This is what I did for the combination part, which took a really long time:
xList = numpy.array(list(itertools.product(diameterList, repeat=numberOfPipelines)))
print(len(xList))
for combination in xList:
    pressures = []
    for pipeline in range(numberOfPipelines):
        pressures.append(pressure(combination))
    pressureList = dict(zip(pipelineIds, pressures))
    pipelinePressures[tuple(combination)] = pressureList
print('pipelinePressures', pipelinePressures)
I would iterate through all combinations; I think you would otherwise run into memory problems trying to model ALL combinations in a MIP.
If you iterate through the combinations, perhaps using the multiprocessing library to use all cores, it shouldn't take long. Just remember to hold on to only the best combination found so far, and not to generate all combinations at once and then evaluate them.
If the problem gets bigger, you should consider dynamic programming algorithms, or use PuLP with column generation.
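The streaming approach described above can be sketched as follows. The pressure model here is the question's simplified rate/diameter sum, not a real hydraulic model, and N_PIPELINES is kept at 4 so the sketch runs in seconds; the question's 6 pipelines work the same way, just slower:

```python
import itertools

RATE = 5000
DIAMETERS = [2, 4, 6, 8, 12, 16, 20, 24, 30, 36, 40, 42, 50, 60, 80]
N_PIPELINES = 4  # use 6 for the question's actual problem size

def total_pressure(combo):
    # simplified model from the question: pressure drop ~ rate / diameter, summed
    return sum(RATE / d for d in combo)

best_combo, best_pressure = None, float('inf')
# itertools.product is a lazy iterator: combinations are produced one at a
# time and never materialized as a full 15**N list in memory
for combo in itertools.product(DIAMETERS, repeat=N_PIPELINES):
    p = total_pressure(combo)
    if p < best_pressure:
        best_combo, best_pressure = combo, p

print(best_combo, best_pressure)  # → (80, 80, 80, 80) 250.0
```

With this toy objective the optimum is trivially the largest diameters; a real hydraulic model together with the rate/area > 2 constraint (skip any combo that violates it before evaluating) is what makes the search non-trivial. Chunks of the product iterator can also be farmed out to multiprocessing workers, each keeping its own running best.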

cplex python model definition and variables

I recently started using CPLEX integrated in Python for my master's project, and I am having a hard time with one of my variables. I am modelling the charge and discharge of a battery as a function of wind and solar power as well as electricity market prices. All my variables for charge, discharge and production are well defined, but my battery state of charge ends up being null at all times after solving. When calling get_values on this variable (with sol the solution of the optimization and Ebes the name of the state of charge):
sol.get_values(Ebes[t] for t in time)
It should even be infeasible for this variable to be null, as I also have these constraints in the model:
for t in time:
    mdl.add_constraint(Ebes[t]>=Ebmin)
    mdl.add_constraint(Ebes[t]<=Ebmax)
When I display the model before solving with print(mdl.export_to_string()), it shows that Ebes is constrained to be higher than Ebmin (=20) for all time steps. The only hint I have is that the name of these variables is slightly different from the others: the Ebes variables are named _Ebes_date, whereas the other variables are named, for example, Pdischarge_date and not _Pdischarge_date. I guess this "_" before the name shows that there is a problem, but I can't manage to find what to change.
My variables are defined as:
Ebes=mdl.continuous_var_dict(time,name='Ebes')
for i in range(len(time)):
    if i==0:
        mdl.add_constraint(Ebes[time[0]]==Ebes[time[len(time)-1]]*(1-etaleak)+Pcha[time[0]]*etacha*dt-(Pdis[time[0]]/etadis*dt))
    else:
        t=time[i]
        tm=time[i-1]
        mdl.add_constraint(Ebes[t]==Ebes[tm]*(1-etaleak)+Pcha[t]*etacha*dt-(Pdis[t]/etadis*dt))
Thank you if you take time to answer me :)
The whole example:
import pandas as pd
from docplex.mp.model import Model

ind=['01_09_2016 00','01_09_2016 01','01_09_2016 02','01_09_2016 03','01_09_2016 04']#[1,2,3,4,5]
M=pd.Series(data=[10,30,30,15,30],index=ind)
P=pd.DataFrame(data={'Time':ind,'DK2_wind':[0.3,0.24,0.14,0.18,0.22],'DK2_solar':[0,0,0,0,0.01]}).set_index('Time',drop=True)
mdl=Model('dispatch')
time=P.index
Psolar=300 #MW
Pwind= 400 #MW
P.DK2_solar=P.DK2_solar*Psolar
P.DK2_wind=P.DK2_wind*Pwind
Pres=P.sum(axis=1) #MW
Pmax=800 #MW
#Battery parameters:
Pbmax= 50 #MW
Ebmax= 100 #MWh
Ebmin= 20 #MWh

Pbal =mdl.continuous_var_dict(time,name='Pbal')
Pcha =mdl.continuous_var_dict(time,name='Pcharge')
Pdis =mdl.continuous_var_dict(time,name='Pdischarge')
Ebes =mdl.continuous_var_dict(time,name='Ebes')
switch=mdl.binary_var_dict(time,name='switch')

for t in time:
    mdl.add_constraint(Pbal[t]==Pres[t]+Pdis[t]-Pcha[t])
    mdl.add_constraint(Pdis[t]<=Pbmax*(1-switch[t]))
    mdl.add_constraint(Pdis[t]>=0)
    mdl.add_constraint(Pcha[t]<=Pbmax*switch[t])
    mdl.add_constraint(Pcha[t]>=0)
    mdl.add_constraint(Ebes[t]>=Ebmin)
    mdl.add_constraint(Ebes[t]<=Ebmax)
for i in range(len(time)):
    if i==0:
        mdl.add_constraint(Ebes[time[0]]==Ebes[time[len(time)-1]]+Pcha[time[0]]-Pdis[time[0]])
    else:
        t=time[i]
        tm=time[i-1]
        mdl.add_constraint(Ebes[t]==Ebes[tm]+Pcha[t]-Pdis[t])

mdl.maximize(mdl.sum(Pbal[t]*M[t] for t in time))
sol=mdl.solve(url=URLmt,key=Mykey,log_output=True)
sol_Ebess=sol.get_values(Ebes[t] for t in time)
sol_Ebess
sol.solve_details.status
So here sol_Ebess is null for all indices. If I change ind to numbers instead, it works and Ebes is equal to the real value.
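One thing worth checking (a guess from the symptom, not a confirmed diagnosis): the date strings used as the variable index, such as '01_09_2016 00', contain a space, which is not a legal character in CPLEX LP-format names, so docplex has to rename the generated variables; that would explain the leading underscore and why a numeric index works. Sanitizing the index before building the variables would rule that out:

```python
# the question's string index contains spaces, e.g. '01_09_2016 00'
ind = ['01_09_2016 00', '01_09_2016 01', '01_09_2016 02', '01_09_2016 03', '01_09_2016 04']
# replace the space so every generated name like 'Ebes_01_09_2016_00' is LP-legal
clean = [s.replace(' ', '_') for s in ind]
print(clean[0])  # → 01_09_2016_00
```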

recursion to iteration in python

We are trying to run a cluster analysis on a large amount of data. We are fairly new to Python and found out that an iterative function is much more efficient than a recursive one. Now we are trying to make that change, but it is much harder than we thought.
The code underneath is the heart of our clustering function and takes over 90 percent of the runtime. Can you help us change it into an iterative one?
Some extra information: the taunach function collects the neighbours of a point, which will later form the clusters. The problem is that we have very many points.
def taunach(tau, delta, i, s, nach, anz):
    dis = tabelle[s].dist
    #delta=tau
    x = data[i]
    y = Skalarprodukt(data[tabelle[s].index]-x)
    a = tau-abs(dis)
    #LA.norm(data[tabelle[s].index]-x)
    if y < a*abs(a):
        nach.update({item.index for item in tabelle[tabelle[s].inner:tabelle[s].outer-1]})
        anz = anzahl(delta, i, tabelle[s].inner, anz)
        if dis > -1:
            b = dis-tau
            if y >= b*abs(b):  #*(1-0.001)
                nach, anz = taunach(tau, delta, i, tabelle[s].outer, nach, anz)
    else:
        if y < tau**2:
            nach.add(tabelle[s].index)
            if y < delta:
                anz += 1
        if tabelle[s].dist > -4:
            b = dis - tau
            if y >= b*abs(b):  #*(1-0.001)
                nach, anz = taunach(tau, delta, i, tabelle[s].outer, nach, anz)
        if tabelle[s].dist > -1:
            if y <= (dis+tau)**2:
                nach, anz = taunach(tau, delta, i, tabelle[s].inner, nach, anz)
    return nach, anz
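As a general pattern (a sketch of the transformation, not wired to taunach's internals): every place the function calls itself becomes a push of the call's arguments onto an explicit stack, and the function body becomes the loop body. Since nach and anz are threaded through every call unchanged in shape, only the node index s really needs to go on the stack:

```python
def visit_iterative(root, children, process):
    """Walk a tree iteratively with an explicit stack.
    `children(node)` yields child nodes to visit (the former recursive calls),
    `process(node, acc)` folds a node into the accumulator (the former body)."""
    acc = None
    stack = [root]
    while stack:
        node = stack.pop()
        acc = process(node, acc)          # work that the recursive body did
        stack.extend(children(node))      # schedule what the recursive calls visited
    return acc

# toy tree as a dict: node -> list of children; summing all node labels
tree = {1: [2, 3], 2: [4], 3: [], 4: []}
total = visit_iterative(1, lambda n: tree[n], lambda n, acc: (acc or 0) + n)
print(total)  # → 10
```

For taunach specifically, children(s) would return tabelle[s].outer and/or tabelle[s].inner depending on the same conditions as the two recursive calls, while nach and anz live outside the loop as ordinary locals. Visit order differs from the recursion, but since nach is a set and anz a running count, the result is order-independent.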
