python random_sample minimum value

I am currently using random_sample to generate weight allocations for 3 stocks, where each row of values adds up to 1.
import numpy as np

for portfolio in range(10):
    weights = np.random.random_sample(3)
    weights = weights / np.sum(weights)
    print(weights)
[0.39055438 0.44055996 0.16888567]
[0.22401792 0.26961926 0.50636282]
[0.67856154 0.21523207 0.10620639]
[0.33449127 0.36491387 0.30059486]
[0.55274192 0.23291811 0.21433997]
[0.20980909 0.38639029 0.40380063]
[0.24600751 0.199761 0.5542315 ]
[0.50743661 0.26633377 0.22622962]
[0.1154567 0.36803903 0.51650427]
[0.29092731 0.34675988 0.36231281]
This works, but is there any way to ensure that the minimum weight allocation is greater than 0.05? Meaning that the most extreme allocation could only be something like [0.05 0.9 0.05].

You can simply discard samples that violate the constraint and draw again:
import numpy as np

n = 0
while n < 10:
    weights = np.random.random_sample(3)
    weights = weights / np.sum(weights)
    if any(i < 0.05 for i in weights):
        continue
    n += 1
    print(weights)

Have a look at the docs
Results are from the "continuous uniform" distribution over the stated interval. To sample Unif[a, b), b > a, multiply the output of random_sample by (b - a) and add a.
In this case, 0.95 * weight + 0.05
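Note that this transform alone may not be enough: renormalizing the scaled samples so they sum to 1 can still push a weight back under 0.05. An alternative sketch (not from the answers above): reserve the 0.05 floor for every stock up front and spread only the remaining mass randomly, so the result still sums to 1 and every weight is at least 0.05.

import numpy as np

n_stocks = 3
floor = 0.05

raw = np.random.random_sample(n_stocks)
raw /= raw.sum()                                # random weights summing to 1
weights = floor + (1 - n_stocks * floor) * raw  # each weight >= floor, total still 1
print(weights, weights.sum())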

Related

How can I limit np.random to get values in a specific range 0-1

I'm coding a Markowitz efficient frontier with a group of 10 stocks. So far I have almost everything right, but when I print the weights given to each stock, the sum is bigger than 1, and I can't resolve this.
# normalization of returns
log_ret = np.log(carteira / carteira.shift(1))
log_ret.head()
log_ret.cov()

np.random.seed(42)
num_ports = 6000
all_weights = np.zeros((num_ports, len(carteira.columns)))
ret_arr = np.zeros(num_ports)
vol_arr = np.zeros(num_ports)
sharpe_arr = np.zeros(num_ports)

for x in range(num_ports):
    weights = np.array(np.random.random(10))
    weighs = weights / np.sum(weights)
    all_weights[x, :] = weights

    # expected return
    ret_arr[x] = np.sum(log_ret.mean() * weights * 264)

    # expected volatility
    vol_arr[x] = np.sqrt(np.dot(weights.T, np.dot(log_ret.cov() * 264, weights)))

    # sharpe ratio
    sharpe_arr[x] = ret_arr[x] / vol_arr[x]

print('Max sharpe ratio: {}'.format(sharpe_arr.max()))
print(' location array: {}'.format(sharpe_arr.argmax()))
print(all_weights[1506, :])
With these specific stocks, my weights sum to almost 4, but I need the sum to equal 1.
Can someone help me?
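A likely culprit, judging from the code as posted: inside the loop the normalized weights are assigned to a misspelled variable (weighs), so the unnormalized weights are what end up in all_weights and in the return/volatility calculations. A minimal sketch of the fix (variable names shortened for illustration):

import numpy as np

num_ports = 6000
n_assets = 10
all_weights = np.zeros((num_ports, n_assets))

for x in range(num_ports):
    weights = np.random.random(n_assets)
    weights = weights / np.sum(weights)   # assign back to `weights`, not `weighs`
    all_weights[x, :] = weights           # each stored row now sums to 1

print(all_weights.sum(axis=1)[:5])        # every row sums to 1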

Indexing dynamic vector of class probabilities

For my code, I have a large (up to 40,000) vector of class probabilities. This set of class probabilities also needs to be reweighted regularly, so assume it will change on every call of the code. The vector sums to 1. I need to efficiently search through this for the index corresponding to that probability.
As an example, say the vector was [0.25, 0.25, 0.25, 0.25], a uniform probability across 4 objects. My probability result is 0.67. This corresponds to index 3 (1-based), since 0.67 > sum(probvec[:2]) = 0.5 but 0.67 <= sum(probvec[:3]) = 0.75.
I'm open to changing the probability vector to make it the running sum, i.e. [0.25, 0.5, 0.75, 1], though then I'd also need a suggestion as to how to perform updates.
Any help would be appreciated.
Step 1: pre-compute all the partial sums up to the i-th index.
Step 2: scan sum_probvec with binary search to obtain the result in logarithmic time.
import numpy as np

probvec = np.full(4, 0.25)
prob = 0.67

# pre-compute all the partial sums up to the i-th index
sum_probvec = [probvec[0]]
for i in range(1, len(probvec)):
    sum_probvec.append(sum_probvec[i - 1] + probvec[i])

# use binary search for logtime results
i = -1   # start below the first bin so probabilities in the first bin map to index 1
j = len(sum_probvec)
while i != j - 1:
    mid = (i + j) // 2
    if prob > sum_probvec[mid]:
        i = mid
    else:
        j = mid

index = i + 2
print(index)  # 3
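Since NumPy is already in play, a shorter sketch of the same idea uses np.cumsum plus np.searchsorted; it also suggests the update path: recompute the cumulative sums whenever the probability vector is reweighted.

import numpy as np

probvec = np.full(4, 0.25)
prob = 0.67

cum = np.cumsum(probvec)                 # running sums: [0.25, 0.5, 0.75, 1.0]
index = np.searchsorted(cum, prob) + 1   # +1 to match the 1-based index used above
print(index)  # 3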

How to solve a system of equations and constraints for portfolio optimization?

I have a DataFrame as follows :
Name Volatility Return
a 0.0243 0.212
b 0.0321 0.431
c 0.0323 0.443
d 0.0391 0.2123
e 0.0433 0.3123
I'd like to have a Volatility of 0.035 and the maximized Return for that volatility.
That is, I'd like a new DataFrame with the Name and the percentage of each asset in my portfolio that gives the maximum Return for a Volatility equal to 0.035.
Therefore, I need to solve a system of equations with multiple conditions, to obtain the best solution (HighestReturn) for a fixed outcome (Volatility == 0.035).
The conditions are:
Each asset has a weight between 0 and 1.
The sum of the weights is 1.
The sum of the weights times the volatility of each asset is the "Desired Volatility".
The sum of the weights times the return of each asset is the "Total Return". This should be maximized.
Here is an approach using Z3Py, an open source SAT/SMT solver.
In a SAT/SMT solver you can write your code just as a list of conditions, and the program finds an optimal solution (or just a solution that satisfies all the conditions when Z3 is used as solver).
Originally SAT solvers only worked with pure boolean expressions, but modern SAT/SMT solvers also allow for fixed-bit and unlimited integers, fractions, reals and even functions as variables.
To write the given equations into Z3, they are converted quite literally into Z3 expressions. The code below comments each of the steps.
import pandas as pd
from z3 import *

DesiredVolatility = 0.035

df = pd.DataFrame(columns=['Name', 'Volatility', 'Return'],
                  data=[['a', 0.0243, 0.212],
                        ['b', 0.0321, 0.431],
                        ['c', 0.0323, 0.443],
                        ['d', 0.0391, 0.2123],
                        ['e', 0.0433, 0.3123]])

# create a Z3 instance to optimize something
s = Optimize()

# the weight of each asset, as a Z3 variable
W = [Real(row.Name) for row in df.itertuples()]

# the total volatility
TotVol = Real('TotVol')

# the total return, to be maximized
TotReturn = Real('TotReturn')

# weights between 0 and 1, and sum to 1
s.add(And([And(w >= 0, w <= 1) for w in W]))
s.add(Sum([w for w in W]) == 1)

# the total return is calculated as the weighted sum of the asset returns
s.add(TotReturn == Sum([w * row.Return for w, row in zip(W, df.itertuples())]))

# the volatility is calculated as the weighted sum of the asset volatility
s.add(TotVol == Sum([w * row.Volatility for w, row in zip(W, df.itertuples())]))

# the volatility should be equal to the desired volatility
s.add(TotVol == DesiredVolatility)

# we're maximizing the total return
h1 = s.maximize(TotReturn)

# we ask Z3 to do its magick
res = s.check()

# we check the result, hoping for 'sat': all conditions satisfied, a maximum is found
if res == sat:
    s.upper(h1)
    m = s.model()

    # for w in W:
    #     print(f'asset {w}: {m[w]} = {m[w].numerator_as_long() / m[w].denominator_as_long():.6f}')

    # output the total return
    print(f'Total Return: {m[TotReturn]} = {m[TotReturn].numerator_as_long() / m[TotReturn].denominator_as_long():.6f}')

    # get the proportions out of the Z3 model
    proportions = [m[w].numerator_as_long() / m[w].denominator_as_long() for w in W]

    # create a dataframe with the result
    df_result = pd.DataFrame({'Name': df.Name, 'Proportion': proportions})
    print(df_result)
else:
    print("No satisfiable solution found")
Result:
Total Return: 452011/1100000 = 0.410919
Name Proportion
0 a 0.000000
1 b 0.000000
2 c 0.754545
3 d 0.000000
4 e 0.245455
You can easily add additional constraints, for example "no asset can have more than 30% of the total":
# change
s.add(And([And(w >= 0, w <= 1) for w in W]))
# to
s.add(And([And(w >= 0, w <= 0.3) for w in W]))
Which would result in:
Total Return: 558101/1480000 = 0.377095
Name Proportion
0 a 0.082432
1 b 0.300000
2 c 0.300000
3 d 0.017568
4 e 0.300000
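For comparison, since the objective and all the constraints are linear, the same problem can also be posed as an ordinary linear program. Here is a sketch with scipy.optimize.linprog (not part of the Z3 answer above), which should reproduce the same optimum:

import numpy as np
from scipy.optimize import linprog

volatility = np.array([0.0243, 0.0321, 0.0323, 0.0391, 0.0433])
returns = np.array([0.212, 0.431, 0.443, 0.2123, 0.3123])

# maximize returns @ w  ->  minimize -returns @ w
res = linprog(
    c=-returns,
    A_eq=np.vstack([np.ones(len(returns)), volatility]),  # sum(w) == 1, w @ volatility == 0.035
    b_eq=[1.0, 0.035],
    bounds=[(0, 1)] * len(returns),
)
print(res.x)     # proportions per asset
print(-res.fun)  # maximized total return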

Exponential Moving Average by time interval [duplicate]

I have a range of dates and a measurement on each of those dates. I'd like to calculate an exponential moving average for each of the dates. Does anybody know how to do this?
I'm new to python. It doesn't appear that averages are built into the standard python library, which strikes me as a little odd. Maybe I'm not looking in the right place.
So, given the following code, how could I calculate the moving weighted average of IQ points for calendar dates?
from datetime import date
days = [date(2008,1,1), date(2008,1,2), date(2008,1,7)]
IQ = [110, 105, 90]
(there's probably a better way to structure the data, any advice would be appreciated)
EDIT:
It seems that the mov_average_expw() function from the scikits.timeseries.lib.moving_funcs submodule of SciKits (add-on toolkits that complement SciPy) better suits the wording of your question.
To calculate an exponential smoothing of your data with a smoothing factor alpha (it is (1 - alpha) in Wikipedia's terms):
>>> alpha = 0.5
>>> assert 0 < alpha <= 1.0
>>> today = max(days)
>>> av = sum(alpha**(today - day).days * iq for day, iq in zip(days, IQ))
>>> av
95.0
The above is not pretty, so let's refactor it a bit:
from collections import namedtuple
from operator import itemgetter

def smooth(iq_data, alpha=1, today=None):
    """Perform exponential smoothing with factor `alpha`.

    Time period is a day.
    Each time period the value of `iq` drops `alpha` times.
    The most recent data is the most valuable one.
    """
    assert 0 < alpha <= 1
    if alpha == 1:  # no smoothing
        return sum(map(itemgetter(1), iq_data))
    if today is None:
        today = max(map(itemgetter(0), iq_data))
    return sum(alpha**((today - date).days) * iq for date, iq in iq_data)

IQData = namedtuple("IQData", "date iq")

if __name__ == "__main__":
    from datetime import date
    days = [date(2008, 1, 1), date(2008, 1, 2), date(2008, 1, 7)]
    IQ = [110, 105, 90]
    iqdata = list(map(IQData, days, IQ))
    print("\n".join(map(str, iqdata)))
    print(smooth(iqdata, alpha=0.5))
Example:
$ python26 smooth.py
IQData(date=datetime.date(2008, 1, 1), iq=110)
IQData(date=datetime.date(2008, 1, 2), iq=105)
IQData(date=datetime.date(2008, 1, 7), iq=90)
95.0
I'm always calculating EMAs with Pandas:
Here is an example of how to do it:
import pandas as pd
import numpy as np

def ema(values, period):
    # pd.ewma() was removed in newer pandas versions; the Series.ewm() accessor replaces it
    values = pd.Series(np.array(values))
    return values.ewm(span=period).mean().iloc[-1]

values = [9, 5, 10, 16, 5]
period = 5
print(ema(values, period))
More info about Pandas EWMA:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.ewma.html
I did a bit of googling and I found the following sample code (http://osdir.com/ml/python.matplotlib.general/2005-04/msg00044.html):
import numpy as np

def ema(s, n):
    """
    returns an n period exponential moving average for
    the time series s

    s is a list ordered from oldest (index 0) to most
    recent (index -1)
    n is an integer

    returns a numeric array of the exponential
    moving average
    """
    s = np.array(s)
    ema = []
    j = 1

    # get n sma first and calculate the next n period ema
    sma = sum(s[:n]) / n
    multiplier = 2 / float(1 + n)
    ema.append(sma)

    # EMA(current) = ((Price(current) - EMA(prev)) x Multiplier) + EMA(prev)
    ema.append(((s[n] - sma) * multiplier) + sma)

    # now calculate the rest of the values
    for i in s[n+1:]:
        tmp = ((i - ema[j]) * multiplier) + ema[j]
        j = j + 1
        ema.append(tmp)

    return ema
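A short usage sketch for the function above (price data assumed for illustration): it needs at least n+1 samples, and the first value it returns is the n-period simple average used to seed the EMA.

prices = [22.27, 22.19, 22.08, 22.17, 22.18, 22.13, 22.23, 22.43, 22.24, 22.29, 22.15, 22.39]
print(ema(prices, 10))   # [10-period SMA, then the EMAs that follow]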
You can also use the SciPy filter method because the EMA is an IIR filter. This will have the benefit of being approximately 64 times faster as measured on my system using timeit on large data sets when compared to the enumerate() approach.
import numpy as np
from scipy.signal import lfilter
x = np.random.normal(size=1234)
alpha = .1 # smoothing coefficient
zi = [x[0]] # seed the filter state with first value
# filter can process blocks of continuous data if <zi> is maintained
y, zi = lfilter([1.-alpha], [1., -alpha], x, zi=zi)
I don't know Python, but for the averaging part, do you mean an exponentially decaying low-pass filter of the form
y_new = y_old + (input - y_old)*alpha
where alpha = dt/tau, dt = the timestep of the filter, tau = the time constant of the filter? (the variable-timestep form of this is as follows, just clip dt/tau to not be more than 1.0)
y_new = y_old + (input - y_old)*dt/tau
If you want to filter something like a date, make sure you convert to a floating-point quantity like # of seconds since Jan 1 1970.
My python is a little bit rusty (anyone can feel free to edit this code to make corrections, if I've messed up the syntax somehow), but here goes....
def movingAverageExponential(values, alpha, epsilon=0):
    if not 0 < alpha < 1:
        raise ValueError("out of range, alpha='%s'" % alpha)

    if not 0 <= epsilon < alpha:
        raise ValueError("out of range, epsilon='%s'" % epsilon)

    result = [None] * len(values)

    for i in range(len(result)):
        currentWeight = 1.0

        numerator = 0
        denominator = 0
        for value in values[i::-1]:
            numerator += value * currentWeight
            denominator += currentWeight

            currentWeight *= alpha
            if currentWeight < epsilon:
                break

        result[i] = numerator / denominator

    return result
For each output index, the function walks backward from that element toward the beginning of the list, accumulating weighted values until the weight coefficient for an element drops below the given epsilon.
The result list is preallocated to the full length and filled in place by index, so nothing needs to be reversed before returning it to the caller.
(SIDE NOTE: in Python lists, appending is much less expensive than prepending, which is why the result is preallocated and assigned by index rather than built front-to-back by insertion. Please correct me if I'm wrong.)
The 'alpha' argument is the decay factor on each iteration. For example, if you used an alpha of 0.5, then today's moving average value would be composed of the following weighted values:
today: 1.0
yesterday: 0.5
2 days ago: 0.25
3 days ago: 0.125
...etc...
Of course, if you've got a huge array of values, the values from ten or fifteen days ago won't contribute very much to today's weighted average. The 'epsilon' argument lets you set a cutoff point, below which you will cease to care about old values (since their contribution to today's value will be insignificant).
You'd invoke the function something like this:
result = movingAverageExponential(values, 0.75, 0.0001)
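Circling back to the dt/tau form sketched at the top of this answer, here is a minimal example applied to the dated IQ data from the question (the time constant tau, in days, is an assumed parameter, not something from the original post):

from datetime import date

days = [date(2008, 1, 1), date(2008, 1, 2), date(2008, 1, 7)]
IQ = [110, 105, 90]

tau = 2.0   # assumed time constant, in days
y = IQ[0]   # seed the filter with the first observation
for prev_day, curr_day, value in zip(days, days[1:], IQ[1:]):
    dt = (curr_day - prev_day).days
    y += (value - y) * min(dt / tau, 1.0)   # clip dt/tau at 1.0, as described above
print(y)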
The matplotlib.org examples (http://matplotlib.org/examples/pylab_examples/finance_work2.html) provide a good example of an Exponential Moving Average (EMA) function using numpy:
def moving_average(x, n, type):
    x = np.asarray(x)
    if type == 'simple':
        weights = np.ones(n)
    else:
        weights = np.exp(np.linspace(-1., 0., n))

    weights /= weights.sum()

    a = np.convolve(x, weights, mode='full')[:len(x)]
    a[:n] = a[n]
    return a
I found the above code snippet by @earino pretty useful - but I needed something that could continuously smooth a stream of values - so I refactored it to this:
def exponential_moving_average(period=1000):
    """ Exponential moving average. Smooths the values in v over the period.

    Send in values - at first it'll return a simple average, but as soon as
    it's gathered 'period' values, it'll start to use the Exponential Moving
    Average to smooth the values.

    period: int - how many values to smooth over (default=1000).
    """
    multiplier = 2 / float(1 + period)
    cum_temp = yield None  # we are being primed

    # start by just returning the simple average until we have enough data
    for i in range(1, period + 1):
        cum_temp += yield cum_temp / float(i)

    # grab the simple average
    ema = cum_temp / period

    # and start calculating the exponentially smoothed average
    while True:
        ema = (((yield ema) - ema) * multiplier) + ema
and I use it like this:
def temp_monitor(pin):
    """ Read from the temperature monitor - and smooth the value out.

    The sensor is noisy, so we use exponential smoothing.
    """
    ema = exponential_moving_average()
    next(ema)  # prime the generator

    while True:
        yield ema.send(val_to_temp(pin.read()))
(where pin.read() produces the next value I'd like to consume).
Maybe the shortest:
#Specify decay in terms of span
#data_series should be a DataFrame
ema=data_series.ewm(span=5, adjust=False).mean()
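A runnable sketch of that one-liner, with an assumed price series:

import pandas as pd

data_series = pd.Series([22.27, 22.19, 22.08, 22.17, 22.18, 22.13, 22.23, 22.43])
ema = data_series.ewm(span=5, adjust=False).mean()
print(ema)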
import pandas_ta as ta
data["EMA3"] = ta.ema(data["close"], length=3)
pandas_ta is a Technical Analysis Library: https://github.com/twopirllc/pandas-ta. The code above calculates the Exponential Moving Average (EMA) for a series. You can specify the lag value using 'length'. Specifically, the code above calculates a 3-day EMA.
Here is a simple sample I worked up based on http://stockcharts.com/school/doku.php?id=chart_school:technical_indicators:moving_averages
Note that unlike in their spreadsheet, I don't calculate the SMA, and I don't wait to generate the EMA after 10 samples. This means my values differ slightly, but if you chart it, it follows exactly after 10 samples. During the first 10 samples, the EMA I calculate is appropriately smoothed.
def emaWeight(numSamples):
    return 2 / float(numSamples + 1)

def ema(close, prevEma, numSamples):
    return ((close - prevEma) * emaWeight(numSamples)) + prevEma

samples = [
    22.27, 22.19, 22.08, 22.17, 22.18, 22.13, 22.23, 22.43, 22.24, 22.29,
    22.15, 22.39, 22.38, 22.61, 23.36, 24.05, 23.75, 23.83, 23.95, 23.63,
    23.82, 23.87, 23.65, 23.19, 23.10, 23.33, 22.68, 23.10, 22.40, 22.17,
]

emaCap = 10
e = samples[0]
for s in range(len(samples)):
    numSamples = emaCap if s > emaCap else s
    e = ema(samples[s], e, numSamples)
    print(e)
I'm a little late to the party here, but none of the solutions given were what I was looking for. Nice little challenge using recursion and the exact formula given in investopedia.
No numpy or pandas required.
prices = [{'i': 1, 'close': 24.5}, {'i': 2, 'close': 24.6}, {'i': 3, 'close': 24.8}, {'i': 4, 'close': 24.9},
          {'i': 5, 'close': 25.6}, {'i': 6, 'close': 25.0}, {'i': 7, 'close': 24.7}]

def rec_calculate_ema(n):
    k = 2 / (n + 1)
    price = prices[n]['close']
    if n == 1:
        return price
    res = (price * k) + (rec_calculate_ema(n - 1) * (1 - k))
    return res

print(rec_calculate_ema(3))
A fast way (copy-pasted from here) is the following:
import numpy as np

def ExpMovingAverage(values, window):
    """ Numpy implementation of EMA
    """
    weights = np.exp(np.linspace(-1., 0., window))
    weights /= weights.sum()
    a = np.convolve(values, weights, mode='full')[:len(values)]
    a[:window] = a[window]
    return a
I am using a list and a rate of decay as inputs. I hope this little function with just two lines may help you here, considering deep recursion is not stable in python.
def expma(aseries, ratio):
    return sum([ratio * aseries[-x-1] * ((1 - ratio)**x) for x in range(len(aseries))])
more simply, using pandas
def EMA(tw):
    for x in tw:
        data["EMA{}".format(x)] = data['close'].ewm(span=x, adjust=False).mean()

EMA([10, 50, 100])
Papahaba's answer was almost what I was looking for (thanks!) but I needed to match initial conditions. Using an IIR filter with scipy.signal.lfilter is certainly the most efficient. Here's my redux:
Given a NumPy vector, x
import numpy as np
from scipy import signal
period = 12
b = np.array((1,), 'd')
a = np.array((period, 1-period), 'd')
zi = signal.lfilter_zi(b, a)
y, zi = signal.lfilter(b, a, x, zi=zi*x[0:1])
Get the N-point EMA (here, 12) returned in the vector y

Bayes' rule - how to calculate likelihood

Given is some data, data, which corresponds to a binary sequence of coin flips, where heads are 1's and tails are 0's. Theta is a value between 0 and 1 representing the probability that a coin produces heads when flipped.
How does one go about calculating the likelihood? I faintly remember a formula where:
likelihood = (theta)^(h)*(1-theta)^(1-h)
where h is 1 if heads, and 0 if tails. I implemented the following code:
import numpy as np
(np.prod([theta*1 for i in data if i==1]) * np.prod([1-theta for i in data if i==0]))
This code works for some cases but not for some hidden cases (so I'm not sure what's wrong with it).
There are a couple of ways to interpret what you are trying to calculate:
1. Probability of exactly that sequence, including the order in which the heads occur (which is how your question is posed here)
2. Probability of the number of heads (let's call this X) occurring in your sequence, regardless of the order (which is what I think you were asking for)
option 1:
import numpy as np
theta = 0.2 # Probability of H is 0.2, hence NOT a fair coin
data = [0, 1, 0, 1, 1, 1, 0, 0, 1, 1] # T, H, T, H, H, ....
def likelihood(theta, h):
    return theta**h * (1 - theta)**(1 - h)
likelihood(theta, 1) # 0.2
likelihood(theta, 0) # 0.8
singlethrow = [likelihood(theta, x) for x in data]
prob1 = np.prod(singlethrow) # 2.6214400000000015e-05
prob1 will converge to zero pretty quickly, because every additional coin toss will multiply the existing probability with a number smaller than 1 (either 0.2 if heads, 0.8 if tails)
option 2:
This is a binomial distribution. It adds up the probabilities of all possible outcomes that result in a total of, say, 6 heads when tossing a coin 10 times. One particular sequence resulting in 6 heads out of 10 tosses was already evaluated in option 1 above. There are 210 such sequences (= 10! / (6! * (10−6)!)).
The scipy.stats.binom.pmf() functionality calculates this probability for you:
import scipy, scipy.stats
prob2 = scipy.stats.binom.pmf(6, 10, theta)
Or, more generally, if you rely on data in the form I defined above:
X = sum([toss == 1 for toss in data])
N = len(data)
prob3 = scipy.stats.binom.pmf(X, N, theta)
prob2 == prob3 # True
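To see how the two options relate: the binomial probability is just the single-sequence probability from option 1 multiplied by the number of orderings with the same head count. A quick check, assuming theta, data, prob1 and prob2 as defined above:

from scipy.special import comb

n_heads = sum(data)   # 6 heads out of 10 tosses
assert np.isclose(prob2, comb(len(data), n_heads) * prob1)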
If you're interested in the Bayesian approach, you might want to have a look at the conjugate_prior package
from conjugate_prior import BetaBinomial

heads, tails = sum(data), len(data) - sum(data)  # counts of 1s and 0s in the data above
prior_model = BetaBinomial(1, 1)  # uninformative prior
updated_model = prior_model.update(heads, tails)
credible_interval = updated_model.posterior(0.45, 0.55)
print("There's {p:.2f}% chance that the coin is fair".format(p=credible_interval * 100))
