I'm new here and this is my first question, so please forgive me if not everything is clearly formulated. Feedback on that is welcome as well.
I'm trying to run a backtest using bt.algos.
As an example I take two ETFs with a predefined allocation (80/20):
data_bt = bt.get("spy,agg", start="2010-01-01")
weights = data_bt.copy()
weights["spy"] = 0.2
weights["agg"] = 0.8
tolerance = 0.05
My algo looks like:
s = bt.Strategy("s1", [bt.algos.SelectAll(),
                       bt.algos.RunIfOutOfBounds(tolerance),
                       bt.algos.WeighTarget(weights),
                       bt.algos.Rebalance()])
It runs smoothly and I can create and run a backtest:
test = bt.Backtest(s, data_bt)
res = bt.run(test)
res.display()
The result (e.g. total return) is always the same, regardless of whether the tolerance is 0.5, 0.05, 0.005, or anything else.
Where is my mistake?
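One way I've tried to check whether the tolerance has any effect at all is to compare the realized weights and trades across runs (a sketch; I believe get_security_weights and get_transactions are part of bt's Result API):
# Sketch: if the tolerance matters, different values should produce
# different trade dates and different realized weights over time.
print(res.get_security_weights().tail())
print(res.get_transactions())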
I'm debugging my code right now, and since it runs with some data and not with other data, I wanted to set the max_iters option to 1 to see whether it converges in a single iteration or needs more. I realised the option doesn't even seem to be used: I tried passing the string "hello" instead of an int and it still worked. Does anyone know if this is a known problem?
self.prob.solve(solver="GLPK_MI", max_iters=1)
I'm using the CVXPY module with CVXOPT.
EDIT:
I want to do this because I don't get an error; it just keeps running forever. The project I'm working on can take a long time to run, so I wonder whether it's really not working or whether it's just a matter of time.
Wouldn't it be better to set the maximum number of iterations as a variable? (Just a suggestion.)
In any case, in CVXOPT you need to set the maximum number of iterations as
'maxiters' : 1
or you can set it in a variable and then call the solver as below:
opts = {'maxiters': 1}
self.prob.solve(solver="GLPK_MI", options=opts)
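If you are calling CVXOPT directly rather than through CVXPY, my understanding is that the same limit can be set through CVXOPT's module-level options dict, e.g.:
# Minimal sketch, assuming plain CVXOPT: iteration limits live in the
# module-level solvers.options dict rather than in the solve() call.
from cvxopt import solvers
solvers.options['maxiters'] = 1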
I've been working on Problem 31 from Project Euler. I'm almost finished, but my code is giving me what I know is the wrong answer. My dad showed me the right answer, and mine is completely different.
coins = (1,2,5,10,20,50,100,200)
amount = 200

def ways(target, avc):
    target = amount
    if avc <= 1:
        return 1
    res = 0
    while target >= 0:
        res = res + ways(target, avc-1)
        target = target - coins[avc]
    return res

print(ways(amount, 7))
This is the answer I get:
284130
The answer is supposed to be 73682.
EDIT: Thank you to everyone who answered my question. Thanks to all of you, I have figured it out!
I see several ways you can improve how you work on problems:
Copy the task description into your source code. This makes you more independent of Internet resources and allows you to work offline. Even if the page goes down or disappears completely, it will enable someone else to understand what problem you're trying to solve.
Use an IDE that does proper syntax highlighting and gives you hints about what might be wrong with your code. The IDE will tell you what #aronquemarr mentioned in the comments:
There's also a thing called magic numbers. Many of the numbers in your code can be explained, but why do you start with an avc of 7? Where does that number come from? 7 does not occur in the problem description.
Use better naming, so you understand better what your code does. What does avc stand for? If you think it's available_coins, that's not correct: there are 8 different coins, not 7.
If it still does not work, reduce the problem to a simpler one. Start with the simplest one you can think of, e.g. make only 1 type of coin available and set the amount to 2 cents. There should be only 1 solution.
To make sure this simple result will never break, introduce an assertion or unit test. They will help you when you change the code to work with larger datasets. They will also change the way you write code. In your case you'll find that accessing the variable coins from an outer scope is probably not a good idea, because it will break your assertion when you switch to the next level. The change that is needed will make your methods more self-contained and robust.
Then, increase the difficulty by having 2 different coins etc.
Examples for the first assertions that I used on this problem:
# Only 1 way of making 1 with only a 1 coin of value 1
assert(find_combinations([1], 1) == 1)
# Still only 1 way of making 1 with two coins of values 1 and 2
assert(find_combinations([1,2], 1) == 1)
# Two ways of making 2 with two coins of values 1 and 2
assert(find_combinations([1,2], 2) == 2)
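If you prefer unit tests over bare asserts, the same three checks could look like this (a sketch using the standard unittest module, against the same find_combinations):
import unittest

class TestFindCombinations(unittest.TestCase):
    def test_single_coin(self):
        # Only 1 way of making 1 with only a coin of value 1
        self.assertEqual(find_combinations([1], 1), 1)

    def test_unused_second_coin(self):
        # Still only 1 way of making 1 with coins of values 1 and 2
        self.assertEqual(find_combinations([1, 2], 1), 1)

    def test_two_ways_of_making_two(self):
        # Two ways of making 2 with coins of values 1 and 2
        self.assertEqual(find_combinations([1, 2], 2), 2)

if __name__ == '__main__':
    unittest.main()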
As soon as the result no longer matches your expectation, use the debugger and step through your code. After every line, check the values in the debugger against the values you think they should be.
One day the debugger will be your best friend, and you won't be able to imagine how you ever worked without it. You just need to learn how to use it.
Firstly, read Thomas Weller's answer. They give some excellent suggestions as to how to improve your coding and problem solving.
Secondly, your code works, and gives the correct answer after the following changes:
As suggested, remove the target = amount line. You're reassigning the argument of the function to the global amount on each call (even on the recursive calls). Having removed that, the answer comes to 3275, which is still not right.
The other thing you have to remember is that Python is 0-indexed. Therefore, your simplest-case condition ought to read if avc <= 0: (not <= 1).
Having made these two changes, your code gives the correct answer. Here is your code with these changes:
coins = (1,2,5,10,20,50,100,200)
amount = 200

def ways(target, avc):
    if avc <= 0:
        return 1
    res = 0
    while target >= 0:
        res = res + ways(target, avc-1)
        target = target - coins[avc]
    return res

print(ways(amount, 7))
Lastly, there are plenty of Project Euler answers out there. Having solved it yourself, it might be useful to have a look at what others did. For reference, I had not actually solved this Project Euler problem before, so I had to do that first. Here is my solution. I've added a pile of comments on top of it to explain what it does.
EDIT
I've just noticed something quite worrying: your code (after the fixes) works only if the first element of coins is 1. All the other elements can be shuffled:
# This still works ok
coins = (1,2,200,10,20,50,100,5)
# But this does not
coins = (2,1,5,10,20,50,100,200)
To ensure that this is always the case, you can just do the following:
coins = ... # <- Some not necessarily sorted tuple
coins = tuple(sorted(coins))
In principle there are a few other issues. Your solution breaks for coin tuples that contain duplicate values, and for ones that don't include 1. The former you can fix by using sets, and the latter by modifying your if avc <= 0: case (check the divisibility of the target by the remaining coin). Here is your piece of code with these changes implemented. I've also renamed the variables and the function to be a bit easier to read, and used coins as the argument rather than the avc pointer (which, by the way, I could not stop reading as avec):
unique = lambda x: sorted(set(x))  # Sorted unique list

def find_combinations(target, coins):
    ''' Find the number of ways coins can make up the target amount '''
    coins = unique(coins)
    if len(coins) == 1:  # Only one coin, therefore just check divisibility
        return 1 if not target % coins[0] else 0
    combinations = 0
    while target >= 0:
        combinations += find_combinations(target, coins[:-1])
        target -= coins[-1]
    return combinations

coins = [2,1,5,10,20,50,100,200]  # <- Now works
amount = 200
print(find_combinations(amount, coins))
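As a quick sanity check, the earlier assertions hold against this version too; note the argument order here is (target, coins), the reverse of the snippets above:
# Same checks as before, with the arguments swapped to match this signature
assert find_combinations(1, [1]) == 1
assert find_combinations(1, [1, 2]) == 1
assert find_combinations(2, [1, 2]) == 2
# And the full Problem 31 answer from the question
assert find_combinations(200, [1, 2, 5, 10, 20, 50, 100, 200]) == 73682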
I am trying to minimize a function of 3 input variables using scipy. The function looks like this:
from scipy.optimize import minimize

def myfunc(x):
    a = x[0]
    b = x[1]
    c = x[2]
    n = f(a, b, c)
    return n

bound1 = (80, 100)
bound2 = (10, 20)
bound3 = (312, 740)
guess = [a0, b0, c0]
bds = (bound1, bound2, bound3)
result = minimize(myfunc, guess, method='L-BFGS-B', bounds=bds)
The function I am currently trying to minimize reaches its minimum at a=100, b=10, c=740, which is at the edge of the bounds.
The minimize function keeps trying to iterate past the end of bound3 (it gets to a c value of 740.0000000149012 on its last iteration).
Is there any way to stop this from happening, i.e. to stop the iteration at the actual end of my bound?
This happens because of numerical differentiation, which is needed not only to infer the step direction and size, but also to reason about termination.
In general you can't do much without being very careful with respect to whichever solver (and there are many backend solvers) is being used. The basic idea is to replace the automatic numerical differentiation with one provided by you: yours can then respect the bounds, but it must be careful about the solver's internals, e.g. how it reasons about termination at a bound.
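To make that concrete, here is a minimal sketch of supplying your own gradient via the jac argument, with a made-up quadratic objective standing in for the real f (which isn't shown in the question):
import numpy as np
from scipy.optimize import minimize

# Made-up stand-in objective; its unconstrained minimum (90, 15, 750)
# deliberately lies beyond the upper end of bound3.
def myfunc(x):
    return (x[0] - 90)**2 + (x[1] - 15)**2 + (x[2] - 750)**2

# Analytic gradient supplied by us, so the solver never takes
# finite-difference steps outside the bounds just to estimate derivatives.
def mygrad(x):
    return np.array([2*(x[0] - 90), 2*(x[1] - 15), 2*(x[2] - 750)])

bds = ((80, 100), (10, 20), (312, 740))
result = minimize(myfunc, [90, 15, 500], method='L-BFGS-B',
                  jac=mygrad, bounds=bds)
print(result.x)  # c ends up clipped at 740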
Fix A:
Your problem should vanish automatically when using Pull-request #10673, which touches your configuration: L-BFGS-B.
It seems this PR is not part of the current release, SciPy 1.4.1 (which was released about two months before the PR).
See also #6026, where a milestone of 1.5.0 is mentioned in regard to some changes, including respecting bounds in numerical differentiation.
For the above PR, you will need to install scipy from source, which is:
quite doable on Linux (and maybe OS X)
not something you should try on Windows!
trust me...
See the documentation if needed.
Fix B:
Apart from that, as you are doing unconstrained optimization (with variable bounds), where more solver backends are available than for constrained optimization, you might try another solver, trust-constr, which has explicit support for this; see #9098.
Be careful to recognize that you need to signal this explicitly when setting up the bounds!
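A minimal sketch of what that could look like, reusing the toy objective from above (keep_feasible is the signal in question):
from scipy.optimize import minimize, Bounds

# keep_feasible=True asks trust-constr to keep every iterate, including
# the points used for numerical differentiation, inside the bounds.
bds = Bounds([80, 10, 312], [100, 20, 740], keep_feasible=True)
result = minimize(myfunc, [90, 15, 500], method='trust-constr', bounds=bds)
print(result.x)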
I have a hierarchical logit that has observations over time. Following Carter 2010, I have included time, time^2, and time^3 terms. The model mixes using Metropolis or NUTS before I add the time variables; HamiltonianMC fails. NUTS and Metropolis also work with time alone. But NUTS and Metropolis fail with time^2 and time^3, and they fail differently and in a puzzling way. However, unlike in other models that fail for more obvious model-specification reasons, ADVI still gives an estimate (and ELBO is not inf).
NUTS either stalls early (last time it made it to 60 iterations), or progresses too quickly and returns an empty traceplot with an error about varnames.
Metropolis errors out immediately with a dimension mismatch error. It looks like the one in this github error, but I'm using a Bernoulli outcome, not a negative binomial. The end of the error looks like: ValueError: Input dimension mis-match. (input[0].shape[0] = 1, input[4].shape[0] = 18)
I get an empty trace when I try HamiltonianMC. It returns the starting values with no mixing.
ADVI gives me a mean and a standard deviation.
I increased the ADVI iterations by a lot. It gave pretty close to the same starting points and NUTS still failed.
I double checked that the fix for the github issue is in place in the version of pymc3 I'm running. It is.
My intuition is that this has something to do with how huge the time^2 and time^3 variables get, since I'm looking over a large time-frame. Time^3 starts at 0 and goes to 64,000.
Here is what I've tried for sampling so far. Note that I'm using small sample sizes while testing, since it takes so long to run (if it finishes at all) and I'm just trying to get it to sample at all. Once I find a setup that works, I'll increase the number of iterations.
with my_model:
    mu, sds, elbo = pm.variational.advi(n=500000, learning_rate=1e-1)
    print(mu['mu_b'])
    step = pm.NUTS(scaling=my_model.dict_to_array(sds)**2,
                   is_cov=True)
    my_trace = pm.sample(500,
                         step=step,
                         start=mu,
                         tune=100)
I've also done the above with tune=1000.
I've also tried Metropolis and HamiltonianMC:
with my_model:
    my_trace = pm.sample(5000, step=pm.Metropolis())

with my_model:
    my_trace = pm.sample(5000, step=pm.HamiltonianMC())
Questions:
Is my intuition about the size and spread of the time variables reasonable?
Are there ways to sample square and cubed terms more effectively? I've searched, but can you perhaps point me to a resource on this so I can learn more about it?
What's up with Metropolis and the dimension mismatch error?
What's up with the empty trace plots for NUTS? Usually when it stalls, the trace up until the stall works.
Are there alternative ways to handle time that might be easier to sample?
I haven't posted a toy model, because it's hard to replicate without the data. I'll add one once I can replicate the problem with simulated data. The actual model is below:
with pm.Model() as my_model:
    mu_b = pm.Flat('mu_b')
    sig_b = pm.HalfCauchy('sig_b', beta=2.5)
    b_raw = pm.Normal('b_raw', mu=0, sd=1, shape=n_groups)
    b = pm.Deterministic('b', mu_b + sig_b*b_raw)
    t1 = pm.Normal('t1', mu=0, sd=100**2, shape=1)
    t2 = pm.Normal('t2', mu=0, sd=100**2, shape=1)
    t3 = pm.Normal('t3', mu=0, sd=100**2, shape=1)
    est = (b[data.group.values] * data.x.values) +\
          (t1*data.t.values) +\
          (t2*data.t2.values) +\
          (t3*data.t3.values)
    y = pm.Bernoulli('y', p=tt.nnet.sigmoid(est), observed=data.y)
BREAKTHROUGH 1: Metropolis error
Weird syntax issue. Theano seemed to be confused about a model with both constant and random effects. I created a constant column in data equal to 0, data['c'] = 0, and used it as an index for the time, time^2, and time^3 effects, as follows:
est = (b[data.group.values] * data.x.values) +\
      (t1[data.c.values]*data.t.values) +\
      (t2[data.c.values]*data.t2.values) +\
      (t3[data.c.values]*data.t3.values)
I don't think this is the whole issue, but it's a step in the right direction. I bet this is why my asymmetric specification didn't work, and if so, I suspect it may sample better.
UPDATE: It sampled! Will now try some of the suggestions for making it easier on the sampler, including using the specification suggested here. But at least it's working!
Without having the dataset to play with, it is hard to give a definite answer, but here is my best guess:
To me, it is a bit unexpected to hear about the third-order polynomial in there. I haven't read the paper, so I can't really comment on it, but I think this might be the reason for your problems. Even very small values for t3 will have a huge influence on the predictor. To keep this reasonable, I'd try to change the parametrization a bit: first make sure that your predictor is centered (something like data['t'] = data['t'] - data['t'].mean(), and after that define data.t2 and data.t3). Then try to set a more reasonable prior on t2 and t3. They should be pretty small, so maybe try something like
t1 = pm.Normal('t1',mu=0,sd=1,shape=1)
t2 = pm.Normal('t2',mu=0,sd=1,shape=1)
t2 = t2 / 100
t3 = pm.Normal('t3',mu=0,sd=1,shape=1)
t3 = t3 / 1000
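Putting the centering step mentioned above into code (assuming data is a pandas DataFrame with the raw time column t):
# Center time first, then rebuild the polynomial terms from the
# centered version, so t2 and t3 stay on a manageable scale.
data['t'] = data['t'] - data['t'].mean()
data['t2'] = data['t'] ** 2
data['t3'] = data['t'] ** 3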
If you want to look at other models, you could try to model your predictor as a GaussianRandomWalk or a Gaussian Process.
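For the GaussianRandomWalk idea, a rough sketch (n_timepoints and the integer index t_idx are assumptions here, not names from your model):
import pymc3 as pm

n_timepoints = 40  # hypothetical: number of distinct observed time points

with pm.Model() as rw_model:
    sigma_t = pm.HalfCauchy('sigma_t', beta=1)
    # One smoothly varying time effect per time point, replacing t1/t2/t3
    f_t = pm.GaussianRandomWalk('f_t', sd=sigma_t, shape=n_timepoints)
    # est would then use f_t[t_idx] (t_idx being an integer time index
    # per observation) instead of the polynomial terms.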
Updating pymc3 to the latest release candidate should also help; the sampler was improved quite a bit.
Update: I just noticed you don't have an intercept term in your model. Unless there is a good reason for that, you probably want to add
intercept = pm.Flat('intercept')
est = (intercept
       + b[..] * data.x
       + ...)
This sounds like it should be blindingly obvious but there's something quirky going on.
So I have two scripts, largely the same but with a few variances here and there. They both run big loops and they both use global variables. Yes, I know this is bad; tidying it up is on my to-do list, but I'm on a time constraint at the moment.
One script works fine, as expected.
The other works fine... for the most part. Then it bugs out once it iterates through the loop again. It's very strange, because the code has very few differences, but I did pick up on something I can't explain.
The variable I use to track my position in the loop is set like this in script 1 (which works flawlessly):
global CurrentMatchScan
while True:
    print "=Starting Loop="
    Loopcounter = 0
    CurrentMatchScan = -1
    LoadMatch()
The second script has:
global CurrentMatchScan
while True:
    print "=Starting Loop="
    Loopcounter = 0
    CurrentMatchScan = -1
    LoadMatch()
However, in the second script PyCharm highlights the = -1 part with:
Redeclared 'CurrentMatchScan' defined above without usage
So the obvious assumption is that something further up in the code is using it. I do call a function later on that is defined up there...
def LoadMatch():
    nextbox1 = LinkList[CurrentMatchScan + 1]
    nextbox2 = LinkList[CurrentMatchScan + 2]
But this is only called once CurrentMatchScan is set... and the first script has the exact same code.
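For reference, my understanding is that this inspection fires on a write-then-rewrite pattern, something like this minimal illustration (not code from either script):
# The kind of pattern behind "Redeclared 'x' defined above without usage":
x = 0   # assigned here...
x = -1  # ...and reassigned before anything ever read the first value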
Is my IDE just not noticing it in the other script, or something? There's a pretty big issue with the looping in the second one, and this is the only difference I can see between the two.
I know this doesn't look like a lot to go on but there's not much else to it that I can see. It's barely actually referenced.
I'd really appreciate anyone who can point out how much of an idiot I'm being and missing something really simple.
EDIT:
Based on the fact that it does use CurrentMatchScan and calls LoadMatch() properly for the first iterations, I'm starting to doubt this is anything but PyCharm warning me that I'm potentially doing something silly.
I think if this were the issue it wouldn't work at all, so the warning might be a bit of a red herring when it comes to the issue I'm actually facing.