I've recently started trying out moving horizon estimation with GEKKO. My specified manipulated variables are used in a heat balance equation within my model, and I am having some issues with the matrix operations in the model.
Example code:
from gekko import GEKKO
import numpy as np
#creating a sample array of input values
nt = 51
u_meas = np.zeros(nt)
u_meas[3:10] = 1.0
u_meas[10:20] = 2.0
u_meas[20:40] = 0.5
u_meas[40:] = 3.0
p = GEKKO(remote=False)
p.time = np.linspace(0,10,nt)
n = 1 #process model order
#designating u as my input, and that I'm going to be using these measurements to estimate my parameters with MHE
p.u = p.MV(value=u_meas)
p.u.FSTATUS=1
#parameters I'm looking to modulate
p.K = p.FV(value=1, lb = 1, ub = 3) #gain
p.tau = p.FV(value=5, lb = 1, ub = 10) #time constant
p.x = [p.Intermediate(p.u)]
#constants within the model that do not change
X_O2 = 0.5
X_SiO2 = 0.25
X_N2 = 0.1
m_feed = 100
#creating an array with my feed separated into components. This creates a 1D array with the individual feed streams of my components.
mdot_F_i = (np.tile(m_feed,3)*np.array([X_O2, X_SiO2, X_N2])
#at this point, I want to add my MV values to the end of my component feed array for later heat and mass balance equations. Normally, in my previous model without MHE, I would put
mdot_c_i = np.concatenate(mdot_F_i, x, (other MV variables after))
However, now that u is a specified MV in GEKKO, and not a set value, I get an error at the mdot_c_i line that says that the array at index 0 has 1 dimension, and the array at index 1 has 2 dimensions.
I'm guessing that I have to specify mdot_c_i as an intermediate variable within Gekko. I've tried a couple different variations, alternately specifying mdot_c_i as an intermediate and trying to use only the values of the MV; however, I keep getting that error.
Has anyone experiences similar issues to this?
Thank you!
You can resolve this by using np.append() instead of np.concatenate(). Try something like:
mdot_c_i = np.append(mdot_F_i, p.u)
Here is a minimum and complete example if you'd like to try it.
import numpy as np
from gekko import GEKKO
m = GEKKO(remote=False)
x = m.Array(m.Var,3,lb=-10,ub=10)
y = m.Var(5,lb=-5,ub=5)
z = np.append(x,y)
m.Minimize(np.dot([1,1,-1,1],z))
m.solve(disp=False)
print([zi.value[0] for zi in z])
# solution: [-10.0, -10.0, 10.0, -5.0]
Gekko variables need to be stored as objects, not as numerical values. The error may be because the np.concatenate() function is trying to access the length of the Gekko manipulated variable data p.u.value to concatenate those values instead of concatenating p.u as an object.
Related
I want to implement MLE (Maximum likelihood estimation) with gekko package in python. Suppose that we have a DataFrame that contains two columns: ['Loss', 'Target'] and it length is equal to 500.
First we have to import packages that we need:
from gekko import GEKKO
import numpy as np
import pandas as pd
Then we simply create the DataFrame like this:
My_DataFrame = pd.DataFrame({"Loss":np.linspace(-555.795 , 477.841 , 500) , "Target":0.0})
My_DataFrame = My_DataFrame.sort_values(by=["Loss"] , ascending=False).reset_index(drop=True)
My_DataFrame
It going to be look like this:
Some components of [‘Target’] column should be calculated with a formula that I wrote it right down below in the picture(and the rest of them remains zero. I explained more in continue, please keep reading) so you can see it perfectly. Two main elements of formula are ‘Kasi’ and ‘Betaa’. I want to find best value for them that maximize sum of My_DataFrame[‘Target’]. So you got the idea and what is going to happen!
Now let me show you how I wrote the code for this purpose. First I define my objective function:
def obj_function(Array):
"""
[Purpose]:
+ it will calculate each component of My_DataFrame["Target"] column! then i can maximize sum(My_DataFrame["Target"]) and find best 'Kasi' and 'Betaa' for it!
[Parameters]:
+ This function gets Array that contains 'Kasi' and 'Betaa'.
Array[0] represents 'Kasi' and Array[1] represents 'Betaa'
[returns]:
+ returns a pandas.series.
actually it returns new components of My_DataFrame["Target"]
"""
# in following code if you don't know what is `qw`, just look at the next code cell right after this cell (I mean next section).
# in following code np.where(My_DataFrame["Loss"] == item)[0][0] is telling me the row's index of item.
for item in My_DataFrame[My_DataFrame["Loss"]>160]['Loss']:
My_DataFrame.iloc[np.where(My_DataFrame["Loss"] == item)[0][0] , 1] = qw.log10((1/Array[1])*( 1 + (Array[0]*(item-160)/Array[1])**( (-1/Array[0]) - 1 )))
return My_DataFrame["Target"]
if you got confused what's happening in for loop in obj_function function, check picture below, it contains a brief example! and if not, skip this part :
Then just we need to go through optimization. I use gekko package for this purpose. Note that I want to find best values of ‘Kasi’ and ‘Betaa’ so I have two main variables and I don’t have any kind of constraints!
So let’s get started:
# i have 2 variables : 'Kasi' and 'Betaa', so I put nd=2
nd = 2
qw = GEKKO()
# now i want to specify my variables ('Kasi' and 'Betaa') with initial values --> Kasi = 0.7 and Betaa = 20.0
x = qw.Array(qw.Var , nd , value = [0.7 , 20])
# So i guess now x[0] represents 'Kasi' and x[1] represents 'Betaa'
qw.Maximize(np.sum(obj_function(x)))
And then when I want to solve the optimization with qw.solve():
qw.solve()
But i got this error:
Exception: This steady-state IMODE only allows scalar values.
How can I fix this problem? (Complete script gathered in next section for the purpose of convenience)
from gekko import GEKKO
import numpy as np
import pandas as pd
My_DataFrame = pd.DataFrame({"Loss":np.linspace(-555.795 , 477.841 , 500) , "Target":0.0})
My_DataFrame = My_DataFrame.sort_values(by=["Loss"] , ascending=False).reset_index(drop=True)
def obj_function(Array):
"""
[Purpose]:
+ it will calculate each component of My_DataFrame["Target"] column! then i can maximize sum(My_DataFrame["Target"]) and find best 'Kasi' and 'Betaa' for it!
[Parameters]:
+ This function gets Array that contains 'Kasi' and 'Betaa'.
Array[0] represents 'Kasi' and Array[1] represents 'Betaa'
[returns]:
+ returns a pandas.series.
actually it returns new components of My_DataFrame["Target"]
"""
# in following code if you don't know what is `qw`, just look at the next code cell right after this cell (I mean next section).
# in following code np.where(My_DataFrame["Loss"] == item)[0][0] is telling me the row's index of item.
for item in My_DataFrame[My_DataFrame["Loss"]>160]['Loss']:
My_DataFrame.iloc[np.where(My_DataFrame["Loss"] == item)[0][0] , 1] = qw.log10((1/Array[1])*( 1 + (Array[0]*(item-160)/Array[1])**( (-1/Array[0]) - 1 )))
return My_DataFrame["Target"]
# i have 2 variables : 'Kasi' and 'Betaa', so I put nd=2
nd = 2
qw = GEKKO()
# now i want to specify my variables ('Kasi' and 'Betaa') with initial values --> Kasi = 0.7 and Betaa = 20.0
x = qw.Array(qw.Var , nd)
for i,xi in enumerate([0.7, 20]):
x[i].value = xi
# So i guess now x[0] represents 'Kasi' and x[1] represents 'Betaa'
qw.Maximize(qw.sum(obj_function(x)))
proposed potential script is here:
from gekko import GEKKO
import numpy as np
import pandas as pd
My_DataFrame = pd.read_excel("[<FILE_PATH_IN_YOUR_MACHINE>]\\Losses.xlsx")
# i'll put link of "Losses.xlsx" file in the end of my explaination
# so you can download it from my google drive.
loss = My_DataFrame["Loss"]
def obj_function(x):
k,b = x
target = []
for iloss in loss:
if iloss>160:
t = qw.log((1/b)*(1+(k*(iloss-160)/b)**((-1/k)-1)))
target.append(t)
return target
qw = GEKKO(remote=False)
nd = 2
x = qw.Array(qw.Var,nd)
# initial values --> Kasi = 0.7 and Betaa = 20.0
for i,xi in enumerate([0.7, 20]):
x[i].value = xi
# bounds
k,b = x
k.lower=0.1; k.upper=0.8
b.lower=10; b.upper=500
qw.Maximize(qw.sum(obj_function(x)))
qw.options.SOLVER = 1
qw.solve()
print('k = ',k.value[0])
print('b = ',b.value[0])
python output:
objective function = -1155.4861315885942
b = 500.0
k = 0.1
note that in python output b is representing "Betaa" and k is representing "Kasi".
output seems abit strange, so i decide to test it! for this purpose I used Microsoft Excel Solver!
(i put the link of excel file at the end of my explaination so you can check it out yourself if
you want.) as you can see in picture bellow, optimization by excel has been done and optimal solution
has been found successfully (see picture bellow for optimization result).
excel output:
objective function = -108.21
Betaa = 32.53161
Kasi = 0.436246
as you can see there is huge difference between python output and excel output and seems that excel is performing pretty well! so i guess problem still stands and proposed python script is not performing well...
Implementation_in_Excel.xls file of Optimization by Microsoft excel application is available here.(also you can see the optimization options in Data tab --> Analysis --> Slover.)
data that used for optimization in excel and python are same and it's available here (it's pretty simple and contains 501 rows and 1 column).
*if you can't download the files, let me know then I'll update them.
The initialization is applying the values of [0.7, 20] to each parameter. A scalar should be used to initialize value instead such as:
x = qw.Array(qw.Var , nd)
for i,xi in enumerate([0.7, 20]):
x[i].value = xi
Another issue is that gekko needs to use special functions to perform automatic differentiation for the solvers. For the objective function, switch to the gekko version of summation as:
qw.Maximize(qw.sum(obj_function(x)))
If loss is computed by changing the values of x then the objective function has logical expressions that need special treatment for solution with gradient-based solvers. Try using the if3() function for a conditional statement or else slack variables (preferred). The objective function is evaluated once to build a symbolic expressions that are then compiled to byte-code and solved with one of the solvers. The symbolic expressions are found in m.path in the gk0_model.apm file.
Response to Edit
Thanks for posting an edit with the complete code. Here is a potential solution:
from gekko import GEKKO
import numpy as np
import pandas as pd
loss = np.linspace(-555.795 , 477.841 , 500)
def obj_function(x):
k,b = x
target = []
for iloss in loss:
if iloss>160:
t = qw.log((1/b)*(1+(k*(iloss-160)/b)**((-1/k)-1)))
target.append(t)
return target
qw = GEKKO(remote=False)
nd = 2
x = qw.Array(qw.Var,nd)
# initial values --> Kasi = 0.7 and Betaa = 20.0
for i,xi in enumerate([0.7, 20]):
x[i].value = xi
# bounds
k,b = x
k.lower=0.6; k.upper=0.8
b.lower=10; b.upper=30
qw.Maximize(qw.sum(obj_function(x)))
qw.options.SOLVER = 1
qw.solve()
print('k = ',k.value[0])
print('b = ',b.value[0])
The solver reaches bounds at the solution. The bounds may need to be widened so that arbitrary limits are not the solution.
Update
Here is a final solution. That objective function in code had a problem so it should be fixed Here is the correct script:
from gekko import GEKKO
import numpy as np
import pandas as pd
My_DataFrame = pd.read_excel("<FILE_PATH_IN_YOUR_MACHINE>\\Losses.xlsx")
loss = My_DataFrame["Loss"]
def obj_function(x):
k,b = x
q = ((-1/k)-1)
target = []
for iloss in loss:
if iloss>160:
t = qw.log(1/b) + q* ( qw.log(b+k*(iloss-160)) - qw.log(b))
target.append(t)
return target
qw = GEKKO(remote=False)
nd = 2
x = qw.Array(qw.Var,nd)
# initial values --> Kasi = 0.7 and Betaa = 20.0
for i,xi in enumerate([0.7, 20]):
x[i].value = xi
qw.Maximize(qw.sum(obj_function(x)))
qw.solve()
print('Kasi = ',x[0].value)
print('Betaa = ',x[1].value)
Output:
The final value of the objective function is 108.20609317143486
---------------------------------------------------
Solver : IPOPT (v3.12)
Solution time : 0.031200000000000006 sec
Objective : 108.20609317143486
Successful solution
---------------------------------------------------
Kasi = [0.436245842]
Betaa = [32.531632983]
Results are close to the optimization result from Microsoft Excel.
qw.Maximize() only sets the objective of the optimization, you still need to call solve() on your model.
If I can see correctly, My_DataFrame has been defined in the global scope.
The problem is that the obj_funtion tries to access it (successful) and then, modify it's value (fails)
This is because you can't modify global variables from a local scope by default.
Fix:
At the beginning of the obj_function, add a line:
def obj_function(Array):
# comments
global My_DataFrame
for item .... # remains same
This should fix your problem.
Additional Note:
If you just wanted to access My_DataFrame, it would work without any errors and you don't need to add the global keyword
Also, just wanted to appreciate the effort you put into this. There's a proper explanation of what you want to do, relevant background information, an excellent diagram (Whiteboard is pretty great too), and even a minimal working example.
This should be how all SO questions are, it would make everyone's lives easier
I am solving multiple optimization problems with the Gekko optimization suite. I generate random numbers for three variables (book3.csv) and import them to python as initial values and parameters. I run optimization solver and collect the results. It works well. However, when I tried this about 20 times to check, I received an error message of Exception: #error: Solution not Found several times. It stops at the same location: ...gekko.py line 2174, in solve raise Exception(response)
I collected the value of control variables, u, at each round of optimization, i, at each time points, j, and saved it at a dataframe, utot, in order to track all the decisions made at each time for each round. I noticed that the last column (at the last time points) of utot has some positive values, which I think shouldn't be. I intentionally put Pe0=0, so that u=0 is optimum at all time points, but still the last column of utot has some positive value. I am not sure what am I missing here.
from gekko import GEKKO
import pandas as pd
import numpy as np
from numpy import random
file_path='./Book3.csv'
d=pd.read_csv("Book3.csv")
dftemp = pd.DataFrame(data=d)
na=len(dftemp)
# time points
n=51
year=50
# constants
Pa0 = 1
Pe0 = 0
C = 10
r = 0.05
k=50
shift=100
ll=0.5*random.rand(n)
index = range(0, na)
columns = range(0, n)
columns1 = range(0, 1)
#Dataframe to collect results
Atot=pd.DataFrame(index=index,columns=columns)
Etot=pd.DataFrame(index=index,columns=columns)
utot=pd.DataFrame(index=index,columns=columns)
PVtot=pd.DataFrame(index=index,columns=columns1)
for i in range(0,na):
# create GEKKO model
m = GEKKO(remote=False)
m.time = np.linspace(0,year,n)
t=m.time
A0=dftemp.loc[i][0]
h=dftemp.loc[i][1]
emax=dftemp.loc[i][2]
u = m.MV(value=0,lb=0, ub=emax)
u.STATUS = 1
u.DCOST = 0
A = m.SV(value=A0)
E = m.SV(value=0)
t = m.Param(value=m.time)
Pe = m.Var(value=Pe0)
d = m.Var(value=1)
l = m.Param(value=ll)
# Equation
m.Equation(A.dt()==-u)
m.Equation(E.dt()==u)
m.Equation(Pe.dt()==-Pe/k)
m.Equation(d==m.exp(-t*r))
m.Equation(A>=0)
# Objective (Utility)
J = m.Var(value=0)
# Final objective
Jf = m.FV()
Jf.STATUS = 1
m.Connection(Jf,J,pos2='end')
m.Equation(J.dt() == m.log(A+E*(1-l))*h*Pa0-r*C*u+E*Pe+shift)*d
# maximize U
m.Maximize(Jf)
# options
m.options.IMODE = 6 # optimal control
m.options.NODES = 3 # collocation nodes
m.options.SOLVER = 3 # solver (IPOPT)
# solve optimization problem
m.solve()
# print profit
print('Optimal Profit: ' + str(Jf.value[0]))
for j in range(0,n):
Atot.loc[i][j]=A.value[j]
Etot.loc[i][j]=E.value[j]
utot.loc[i][j]=u.value[j]
PVtot.loc[i][0]=Jf.value[0]
print(PVtot.sum())
Try setting m.options.TIME_SHIFT=0 to avoid advancing in time if the initial conditions should remain the same. The default is m.options.TIME_SHIFT=1 where all of the prior values are shifted left and the second horizon point becomes the first time point. This is the default for applications that are running in real-time to a physical process. Each cycle, the simulation or optimal control problem advances one time step ahead.
The last point in the horizon sometimes has no effect on the objective function so the solver isn't forced to move it to a particular value. This can be checked by fixing the last value with m.fix_final(utot,100) and see if this changes the objective function value. If the objective function value doesn't change then the solver will not find a unique solution for this value. If you'd like to make it unique then include an additional objective such as m.Minimize(1e-5 * utot**2).
When submitting questions, please include all of the files needed to reproduce the behavior. Here are some tips on creating a Minimal, Reproducible example.
I'm trying to plot a simple moving averages function but the resulting array is a few numbers short of the full sample size. How do I plot such a line alongside a more standard line that extends for the full sample size? The code below results in this error message:
ValueError: x and y must have same first dimension, but have shapes (96,) and (100,)
This is using standard matplotlib.pyplot. I've tried just deleting X values using remove and del as well as switching all arrays to numpy arrays (since that's the output format of my moving averages function) then tried adding an if condition to the append in the while loop but neither has worked.
import random
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
def movingaverage(values, window):
weights = np.repeat(1.0, window) / window
smas = np.convolve(values, weights, 'valid')
return smas
sampleSize = 100
min = -10
max = 10
window = 5
vX = np.array([])
vY = np.array([])
x = 0
val = 0
while x < sampleSize:
val += (random.randint(min, max))
vY = np.append(vY, val)
vX = np.append(vX, x)
x += 1
plt.plot(vX, vY)
plt.plot(vX, movingaverage(vY, window))
plt.show()
Expected results would be two lines on the same graph - one a simple moving average of the other.
Just change this line to the following:
smas = np.convolve(values, weights,'same')
The 'valid' option, only convolves if the window completely covers the values array. What you want is 'same', which does what you are looking for.
Edit: This, however, also comes with its own issues as it acts like there are extra bits of data with value 0 when your window does not fully sit on top of the data. This can be ignored if chosen, as is done in this solution, but another approach is to pad the array with specific values of your choosing instead (see Mike Sperry's answer).
Here is how you would pad a numpy array out to the desired length with 'nan's (replace 'nan' with other values, or replace 'constant' with another mode depending on desired results)
https://docs.scipy.org/doc/numpy/reference/generated/numpy.pad.html
import numpy as np
bob = np.asarray([1,2,3])
alice = np.pad(bob,(0,100-len(bob)),'constant',constant_values=('nan','nan'))
So in your code it would look something like this:
import random
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
def movingaverage(values,window):
weights = np.repeat(1.0,window)/window
smas = np.convolve(values,weights,'valid')
shorted = int((100-len(smas))/2)
print(shorted)
smas = np.pad(smas,(shorted,shorted),'constant',constant_values=('nan','nan'))
return smas
sampleSize = 100
min = -10
max = 10
window = 5
vX = np.array([])
vY = np.array([])
x = 0
val = 0
while x < sampleSize:
val += (random.randint(min,max))
vY = np.append(vY,val)
vX = np.append(vX,x)
x += 1
plt.plot(vX,vY)
plt.plot(vX,(movingaverage(vY,window)))
plt.show()
To answer your basic question, the key is to take a slice of the x-axis appropriate to the data of the moving average. Since you have a convolution of 100 data elements with a window of size 5, the result is valid for the last 96 elements. You would plot it like this:
plt.plot(vX[window - 1:], movingaverage(vY, window))
That being said, your code could stand to have some optimization done on it. For example, numpy arrays are stored in fixed size static buffers. Any time you do append or delete on them, the entire thing gets reallocated, unlike Python lists, which have amortization built in. It is always better to preallocate if you know the array size ahead of time (which you do).
Secondly, running an explicit loop is rarely necessary. You are generally better off using the under-the-hood loops implemented at the lowest level in the numpy functions instead. This is called vectorization. Random number generation, cumulative sums and incremental arrays are all fully vectorized in numpy. In a more general sense, it's usually not very effective to mix Python and numpy computational functions, including random.
Finally, you may want to consider a different convolution method. I would suggest something based on numpy.lib.stride_tricks.as_strided. This is a somewhat arcane, but very effective way to implement a sliding window with numpy arrays. I will show it here as an alternative to the convolution method you used, but feel free to ignore this part.
All in all:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
def movingaverage(values, window):
# this step creates a view into the same buffer
values = np.lib.stride_tricks.as_strided(values, shape=(window, values.size - window + 1), strides=values.strides * 2)
smas = values.sum(axis=0)
smas /= window # in-place to avoid temp array
return smas
sampleSize = 100
min = -10
max = 10
window = 5
v_x = np.arange(sampleSize)
v_y = np.cumsum(np.random.random_integers(min, max, sampleSize))
plt.plot(v_x, v_y)
plt.plot(v_x[window - 1:], movingaverage(v_y, window))
plt.show()
A note on names: in Python, variable and function names are conventionally name_with_underscore. CamelCase is reserved for class names. np.random.random_integers uses inclusive bounds just like random.randint, but allows you to specify the number of samples to generate. Confusingly, np.random.randint has an exclusive upper bound, more like random.randrange.
I am trying to define a multivariate custom distribution through pymc3.DensityDist(); however, I keep getting the following error that dimensions do not match:
"LinAlgError: 0-dimensional array given. Array must be two-dimensional"
I have already seen https://github.com/pymc-devs/pymc3/issues/535 but I could not find the answer to my question. Just for clarity, here is my simple example
import numpy as np
import pymc3 as pm
def pdf(x):
y = 0
print(x)
sigma = np.identity(2)
isigma = sigma
mu = np.array([[1,2],[3,4]])
for i in range(2):
x0 = x- mu[i,:]
xsinv = np.linalg.multi_dot([x0,isigma,x0])
y = y + np.exp(-0.5*xsinv)
return y
logp = lambda x: np.log(pdf(x))
with pm.Model() as model:
pm.DensityDist('x',logp, shape=2)
step = pm.Metropolis(tune=False, S=np.identity(2))
trace = pm.sample(100000, step=step, chain=1, tune=0,progressbar=False)
result = trace['x']
In this simple code I want to define an unnormilized pdf function, which is sum of two unnormalized normal distributions, and take samples from this pdf through Metropolis algorithm.
Thanks,
Try replacing numpy for theano in the following lines:
xsinv = tt.dot(tt.dot(x0, isigma), x0)
y = y + tt.exp(-0.5 * xsinv)
as a side note, try using NUTS instead of metropolis and let PyMC3 choose the sampling method for you, just do
trace = pm.sample(1000)
For future reference you can also ask questions here
Firstly, there are a few topics on this but they involve deprecated packages with pandas etc. Suppose I'm trying to predict a variable w with variables x,y and z. I want to run a multiple linear regression to try and predict w. There are quite a few solutions that will produce the coefficients but I'm not sure how to use these. So, in pseudocode;
import numpy as np
from scipy import stats
w = np.array((1,2,3,4,5,6,7,8,9,10)) # Time series I'm trying to predict
x = np.array((1,3,6,1,4,6,8,9,2,2)) # The three variables to predict w
y = np.array((2,7,6,1,5,6,3,9,5,7))
z = np.array((1,3,4,7,4,8,5,1,8,2))
def model(w,x,y,z):
# do something!
return guess # where guess is some 10 element array formed
# using multiple linear regression of x,y,z
guess = model(w,x,y,z)
r = stats.pearsonr(w,guess) # To see how good guess is
Hopefully this makes sense as I'm new to MLR. There is probably a package in scipy that does all this so any help welcome!
You can use the normal equation method.
Let your equation be of the form : ax+by+cz +d =w
Then
import numpy as np
x = np.asarray([[1,3,6,1,4,6,8,9,2,2],
[2,7,6,1,5,6,3,9,5,7],
[1,3,4,7,4,8,5,1,8,2],
[1,1,1,1,1,1,1,1,1,1]]).T
y = numpy.asarray([1,2,3,4,5,6,7,8,9,10]).T
a,b,c,d = np.linalg.pinv((x.T).dot(x)).dot(x.T.dot(y))
Think I've found out now. If anyone could confirm that this produces the correct results that'd be great!
import numpy as np
from scipy import stats
# What I'm trying to predict
y = [-6,-5,-10,-5,-8,-3,-6,-8,-8]
# Array that stores two predictors in columns
x = np.array([[-4.95,-4.55],[-10.96,-1.08],[-6.52,-0.81],[-7.01,-4.46],[-11.54,-5.87],[-4.52,-11.64],[-3.36,-7.45],[-2.36,-7.33],[-7.65,-10.03]])
# Fit linear least squares and get regression coefficients
beta_hat = np.linalg.lstsq(x,y)[0]
print(beta_hat)
# To store my best guess
estimate = np.zeros((9))
for i in range(0,9):
# y = x1b1 + x2b2
estimate[i] = beta_hat[0]*x[i,0]+beta_hat[1]*x[i,1]
# Correlation between best guess and real values
print(stats.pearsonr(estimate,y))