I always run my code in Jupyter and it normally works fine, but recently some code has been taking a long time to execute, for example:
corr = train.corr()
highest_corr = corr[corr.index[abs(corr["claim"])>0.006]]
or even just:
train = pd.read_csv('Desktop\\train.csv')
So what is the problem, and how can I solve it?
PS: I read elsewhere that I should not use large values for:
pd.set_option('display.max_column', 120)
So I removed it, but the problem remains.
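One way to narrow down which statement is actually slow is to time each step separately. The following is only a minimal sketch using the standard library; the helper name `timed` and the `timings` dictionary are hypothetical, and the summed range is a stand-in for a real step such as `pd.read_csv(...)` or `train.corr()`.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    """Record the wall-clock seconds spent inside the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

timings = {}

# Stand-in workload; replace the body with the statement you want to measure.
with timed("sum of squares", timings):
    total = sum(i * i for i in range(100_000))

print(timings)
```

Wrapping each suspicious statement in its own `with timed(...)` block shows whether the time goes into reading the file, computing the correlation, or displaying the result.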
I'm new here and this is my first question. Therefore, please forgive me if not everything is clearly formulated. Feedback on that is welcome as well.
I am trying to run a backtest using bt.algos.
As an example, I take two ETFs with a predefined allocation (80 to 20):
data_bt = bt.get("spy,agg", start="2010-01-01")
weights=data_bt.copy()
weights["spy"]=0.2
weights["agg"]=0.8
tolerance=0.05
My algo looks like:
s = bt.Strategy("s1", [bt.algos.SelectAll(),
                       bt.algos.RunIfOutOfBounds(tolerance),
                       bt.algos.WeighTarget(weights),
                       bt.algos.Rebalance()])
It runs smoothly and I can create a backtest:
test = bt.Backtest(s,data_bt)
res=bt.run(test)
res.display()
The result (e.g. total return) is always the same, regardless of whether the tolerance is 0.5, 0.05, 0.005 or anything else.
Where is my mistake?
I'm debugging my code right now. Since it runs with some data sets and not with others, I wanted to set the 'max_iters' option to 1 to see whether it works in a single iteration or needs more. I realised it doesn't even seem to use the option: I tried passing the string "hello" instead of an int and it still ran. Does anyone know if this is a known problem?
self.prob.solve(solver="GLPK_MI", max_iters=1)
I'm using the CVXPY module with CVXOPT.
EDIT:
I want to do this because I don't get an error; it just runs forever. The project I'm working on can take a long time to run, so I wonder whether it's really not working or whether it's just a matter of time.
Wouldn't it be better to set the max iterations as a variable? (just a suggestion)
In any case, in CVXOPT you need to set the maximum number of iterations as
'maxiters' : 1
or you can set it as a variable and then call the solver as below
opts = {'maxiters' : 1}
self.prob.solve(solver="GLPK_MI", options = opts)
I have written some lines of code that run normally in a Jupyter notebook (code below):
umd_departments = requests.get("https://api.umd.io/v0/courses/departments")
umd_departments_list = umd_departments.json()
umd_departments_list2 = json.dumps(umd_departments_list, indent=1)
department_storage = [department['dept_id'] for department in umd_departments_list]
print(department_storage)
The code above gives me the output I want and prints it immediately. The issue I run into is that when I put the same code into a function of its own, it doesn't work:
def get_UMD_departments():
    umd_departments = requests.get("https://api.umd.io/v0/courses/departments")
    umd_departments_list = umd_departments.json()
    umd_departments_list2 = json.dumps(umd_departments_list, indent=1)
    department_storage = [department['dept_id'] for department in umd_departments_list]
    print(department_storage)
The problem with this version is that it never prints anything, as opposed to the first version. I also don't get an error when I run it: the * symbol doesn't show up and there is no error message, so I'm not sure what the problem is. The cell just shows how many times I have run it. Does anyone know how to make the function version of my code work? Any help is greatly appreciated.
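A likely explanation, assuming the cell contains only the `def` shown above, is that the function is defined but never called: running a `def` creates the function without executing its body. The sketch below illustrates this without touching the network; `get_dept_ids` and the `sample` records are hypothetical stand-ins for the real function and the API response.

```python
def get_dept_ids(departments):
    """Return the 'dept_id' of every department record."""
    return [department["dept_id"] for department in departments]

# Hypothetical sample data standing in for the JSON returned by the API.
sample = [{"dept_id": "AASP"}, {"dept_id": "CMSC"}]

# The definition above produces no output on its own;
# this call is what actually runs the function body.
ids = get_dept_ids(sample)
print(ids)
```

Applied to the original code, that would mean adding a line `get_UMD_departments()` after the definition, in the same cell or a later one.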
I have set up a problem with Pyomo that optimises the control strategy of a CHP unit, which is laid out (very roughly) in the following way:
class Problem:
    def OptiControl(self):
        <Pyomo Concrete model formulation>
        return (model.obj.value())
The Problem class is there because I'm investigating several control strategy methods, which I call using (for example) b = problem1.OptiControl().
The problem is that if I try to return the value of the objective, my script gets stuck, and when I Ctrl+C out of it I get (' Signal', 2, 'recieved, but no process queued'). If I write model.write() instead, the script ends normally but nothing is displayed in IPython. Trying to print model.obj.value() also doesn't work.
I assume this has something to do with the fact that I'm calling Pyomo inside a function, because the model worked successfully before, but I don't know how to get round this.
EDIT: writing the values to a file also doesn't work. If it helps, this is the excerpt of my code where I solve the model:
opt = SolverFactory("glpk") # Choose solver
solution = opt.solve(model) # Solve model
model.write()
with open('ffs.txt','w') as f:
    model.obj.expr()
    for t in model.P:
        # file.write() needs a string, so convert the numeric value
        f.write(str(model.f["CHP",t].value) + "\n")
I ended up working around this problem by re-writing my model as an AbstractModel, writing my data to a .dat file and reading that (clunky, but I need results now so it had to be done). For some reason it works now and does everything I expect it would do.
It also turned out my problem was infeasible which might have added to my problems, but at least with this method I could use results.write() and I was able to find that out which wasn't the case before.
I am trying to speed up a Python function using Numba, but I cannot get it to compile.
The input for my function is a 27x4 array of type np.int32.
My function is:
import numpy as np
import numba as nb

@nb.jit(nopython=True)
def edge_profile(input):
    pos = input[:,:3]
    val = input[:,3]
    centre = np.mean(pos,axis=0).astype(np.int32)
    diff = np.absolute(pos-centre).sum(axis=1)
    cell_edge = np.zeros(3)
    for i in range(3):
        idx = np.where(diff==i+1)[0]
        idy = np.where(val[idx]==1)[0]
        cell_edge[i] = len(idy)
    return cell_edge.astype(np.int32)
However, this produces an extremely large error message which I have been unable to use to diagnose the problem. I have tried specifying the input types as follows:
@nb.jit(nb.int32[:](nb.int32[:,:]))
def ...
However, this produces an equally large error message.
I feel that I am probably using some function or feature that is not supported in Numba, but I do not know enough about it to identify the problem. Any help would be greatly appreciated.
Numba should work OK as long as you stick to basic lists and arrays in the function you want to speed up. You appear to be using NumPy functions that are probably already well optimized, so it's unlikely you would see a speed-up even if you did get it to work. You haven't mentioned what your OS is. Under Ubuntu 14.04 you can get it to work through some steps outlined here.
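To follow up on the point that the NumPy calls are already optimized, the loop in the question can also be written with NumPy alone, with no Numba involved. This is only a sketch, not the asker's code: it swaps the `np.where` loop for `np.bincount`, and the 3x3x3 `grid` is a hypothetical test input. As far as I know, `np.mean` with an `axis` argument is one of the calls Numba's nopython mode has not supported, which may be what triggers the error.

```python
import numpy as np

def edge_profile(arr):
    """Count rows with value 1 at Manhattan distance 1, 2 and 3 from the centre."""
    pos = arr[:, :3]
    val = arr[:, 3]
    centre = np.mean(pos, axis=0).astype(np.int32)
    diff = np.abs(pos - centre).sum(axis=1)
    # Keep the distances of rows whose value is 1 and count each distance;
    # minlength=4 guarantees that slots for distances 0..3 exist.
    counts = np.bincount(diff[val == 1], minlength=4)
    return counts[1:4].astype(np.int32)

# Hypothetical input: all 27 cells of a 3x3x3 cube, every value set to 1.
grid = np.array(
    [[x, y, z, 1] for x in range(3) for y in range(3) for z in range(3)],
    dtype=np.int32,
)
print(edge_profile(grid))
```

For this cube the counts correspond to the 6 face-centre, 12 edge and 8 corner cells around the central cell.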