How to find slope of LinearRegression using sklearn on python? - python

I'm new to Python and would like to find the slope and intercept using the sklearn package. Below is my code.
import numpy as np
from sklearn.linear_model import LinearRegression
def findLinearRegression():
    x = [1, 2, 3, 4, 5]
    y = [5, 7, 12, 9, 15]
    lrm = LinearRegression()
    lrm.fit(x, y)
    m = lrm.coef_
    c = lrm.intercept_
    print(m)
    print(c)
I got the error ValueError: Expected 2D array, got 1D array instead. Any advice or guidance on this would be greatly appreciated. Thanks.

You'll need to reshape x into a 2D array (one row per sample, one column per feature). y may stay 1D, but reshaping it as well also works.
Replace the lines where you declare x and y with the code below and the function will work as intended.
x = np.array([1,2,3,4,5]).reshape(-1, 1)
y = np.array([5,7,12,9,15]).reshape(-1, 1)

x should be a column vector
x = np.array([1,2,3,4,5]).reshape(-1,1)

You need to reshape your inputs. Simply replace
x = [1,2,3,4,5]
by
x = np.array([1,2,3,4,5]).reshape(-1, 1)
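
Putting the answers above together, a minimal sketch might look like the following. It assumes only x is reshaped to 2D while y stays 1D, in which case coef_ is a length-1 array and intercept_ is a scalar. The names slope and intercept are illustrative and not part of the original code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# x must be 2D: one row per sample, one column per feature.
x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
# y can stay 1D (one target value per sample).
y = np.array([5, 7, 12, 9, 15])

lrm = LinearRegression()
lrm.fit(x, y)

slope = lrm.coef_[0]        # coef_ has shape (n_features,), here (1,)
intercept = lrm.intercept_  # a scalar when y is 1D

print(slope, intercept)
```

If y is reshaped to a column as well, coef_ becomes a 2D array of shape (1, 1) and the slope is coef_[0, 0].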

Related

Why does the error say a 2D array is required but a 1D array was given? pandas Series are 1D, right?

import pandas as pd
from sklearn import linear_model
my_train_list = {"area": [1000, 2000, 3000, 4000, 5000], "price": [1000000, 2000000, 3000000, 4000000, 5000000]}
my_df = pd.DataFrame(my_train_list)
my_x = my_df['area']
my_y = my_df.price
my_lin_pre = linear_model.LinearRegression()
my_lin_pre.fit([my_x], my_y)
Can anyone explain why the error says a 2D array is required when a 1D array was given? Series in pandas are 1D, right?
How can I solve this?
Scikit-learn estimators and transformers expect X to be a 2D array of shape (n_samples, n_features): one row per sample and one column per feature. A pandas Series is 1D, so it has to be converted to a 2D structure first.
You can fix your code as follows:
from sklearn.linear_model import LinearRegression
X = my_df[['area']]
y = my_df.price
model = LinearRegression()
model.fit(X, y)
The trick here is that my_df[['area']] returns a pd.DataFrame (2D) instead of a pd.Series (1D).
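
The difference between single and double brackets can be checked directly by looking at the shapes. This is a small sketch that reuses the data from the question; the variable names series_shape, frame_shape and array_shape are made up for illustration.

```python
import pandas as pd

my_df = pd.DataFrame({
    "area": [1000, 2000, 3000, 4000, 5000],
    "price": [1000000, 2000000, 3000000, 4000000, 5000000],
})

# Single brackets select a Series, which is 1D.
series_shape = my_df["area"].shape

# Double brackets select a one-column DataFrame, which is 2D.
frame_shape = my_df[["area"]].shape

# An explicit reshape of the underlying NumPy array is another option.
array_shape = my_df["area"].to_numpy().reshape(-1, 1).shape

print(series_shape, frame_shape, array_shape)
```

Either of the two 2D forms should be acceptable as X for fit, while the price column can stay a 1D Series.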

Python - y should be a 1d array, got an array of shape instead

Consider the following data:
import numpy as np
from sklearn.linear_model import LogisticRegression
x=np.linspace(0,2*np.pi,80)
x = x.reshape(-1,1)
y = np.sin(x)+np.random.normal(0,0.4,80)
y[y<1/2] = 0
y[y>1/2] = 1
clf=LogisticRegression(solver="saga", max_iter = 1000)
I want to fit a logistic regression where y is the dependent variable and x is the independent variable. But when I call:
clf.fit(x,y)
I see error
'y should be a 1d array, got an array of shape (80, 80) instead'.
I tried to reshape the data using
y=y.reshape(-1,1)
But I end up with an array of length 6400! (How come?)
Could you please give me a hand with performing this regression?
Change the order of your operations:
First generate x and y as 1-D arrays:
x = np.linspace(0, 2*np.pi, 8)
y = np.sin(x) + np.random.normal(0, 0.4, 8)
Then (after y was generated) reshape x:
x = x.reshape(-1, 1)
Edit following a comment as of 2022-02-20
The source of the problem in the original code is as follows:
x = np.linspace(0,2*np.pi,80) - generates a 1-D array.
x = x.reshape(-1,1) - reshapes it into a 2-D array with one column and as many rows as needed, here (80, 1).
y = np.sin(x) + np.random.normal(0,0.4,80) - adds a column array of shape (80, 1) and a 1-D array of shape (80,), which broadcasting treats as a single row of shape (1, 80).
The result is that y is a 2-D array of shape (80, 80).
The attempt to reshape y then gives a single-column array with 6400 rows.
The proper solution is to create both x and y as 1-D arrays, which is what my code does.
Afterwards only x needs to be reshaped into a column; y should stay 1-D, because that is the shape fit expects for the target.
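
The broadcasting effect described above can be reproduced with NumPy alone. The sketch below uses illustrative names (noise, y_wrong, y_right, X) that do not appear in the original code, and it reshapes x into a column only after y has been computed.

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 80)      # shape (80,)
noise = np.random.normal(0, 0.4, 80)   # shape (80,)

# Column (80, 1) plus 1-D (80,) broadcasts to a full (80, 80) grid.
y_wrong = np.sin(x.reshape(-1, 1)) + noise

# Flattening that grid into one column yields 80 * 80 = 6400 rows.
y_wrong_column = y_wrong.reshape(-1, 1)

# 1-D plus 1-D stays 1-D: one target value per sample.
y_right = np.sin(x) + noise

# Reshape x into a column afterwards, for use as the feature matrix.
X = x.reshape(-1, 1)

print(y_wrong.shape, y_wrong_column.shape, y_right.shape, X.shape)
```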
I encountered this error and tried to solve it with reshape, but it didn't work:
ValueError: y should be a 1d array, got an array of shape () instead.
It was actually caused by misplaced [] brackets around np.argmax. Below are the wrong code and the correct one; note the position of [] around np.argmax in both snippets.
Wrong Code
ax[i,j].set_title("Predicted Watch : "+str(le.inverse_transform([pred_digits[prop_class[count]]])) +"\n"+"Actual Watch : "+str(le.inverse_transform(np.argmax([y_test[prop_class[count]]])).reshape(-1,1)))
Correct Code
ax[i,j].set_title("Predicted Watch :"+str(le.inverse_transform([pred_digits[prop_class[count]]]))+"\n"+"Actual Watch : "+str(le.inverse_transform([np.argmax(y_test[prop_class[count]])])))

Cannot reshape numpy array to vector

I am trying to reshape an (N, 1) array d to an (N,) vector. According to this solution and my own experience with numpy, the following code should convert it to a vector:
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.datasets import make_circles
X, labels = make_circles(n_samples=150, noise=0.1, factor=0.2)
A = kneighbors_graph(X, n_neighbors=5)
d = np.sum(A, axis=1)
d = d.reshape(-1)
However, d.shape gives (1, 150)
The same happens when I exactly replicate the code for the linked solution. Why is the numpy array not reshaping?
The issue is that kneighbors_graph returns the nearest-neighbor graph as a sparse matrix (scipy.sparse.csr_matrix). Applying np.sum to it returns a numpy.matrix, a data type that (in my opinion) should no longer exist. numpy.matrix objects are always 2D, so reshape(-1) cannot produce a 1D result, and many numpy operations on them return unexpected results.
The solution was to convert the sparse csr_matrix to a regular numpy array:
A = kneighbors_graph(X, n_neighbors=5)
A = A.toarray()
d = np.sum(A, axis=1)
d = d.reshape(-1)
Now we have d.shape = (150,)
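
The reshape behaviour can be seen without sklearn or scipy. In this sketch an np.matrix is built by hand as a stand-in for what np.sum returned in the question. That stand-in is an assumption made to keep the example small, and the names d_matrix, stuck_shape, d_vector and d_flat are illustrative.

```python
import numpy as np

# Stand-in for the np.sum(A, axis=1) result on a sparse matrix:
# an np.matrix with 150 rows and one column.
d_matrix = np.matrix(np.ones((150, 1)))

# np.matrix is always 2D, so reshape(-1) gives a single row, not a vector.
stuck_shape = d_matrix.reshape(-1).shape

# Converting to a plain ndarray first lets reshape(-1) produce a 1D vector.
d_vector = np.asarray(d_matrix).reshape(-1)

# np.matrix also has the A1 attribute, which returns a flattened ndarray.
d_flat = d_matrix.A1

print(stuck_shape, d_vector.shape, d_flat.shape)
```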

ValueError: x and y must have same first dimension in linear regression in python

I wrote a linear regression model with a single variable, but it raises a value error after running the following code
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression as lr
import numpy as np
x=np.array([0,1,2,3,4,5,6,7,8,9])
y=np.array([1,3,2,5,7,8,8,9,10,12])
reg=lr().fit(x.reshape(10,1),y.reshape(10,1))
y_l = reg.intercept_ + reg.coef_ *x
plt.plot(x,y_l)
plt.show()
When I also reshaped x with x.reshape(10,1) in the linear equation, as below, it no longer raised the error. But I don't understand why.
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression as lr
import numpy as np
x=np.array([0,1,2,3,4,5,6,7,8,9])
y=np.array([1,3,2,5,7,8,8,9,10,12])
reg=lr().fit(x.reshape(10,1),y.reshape(10,1))
y_l = reg.intercept_ + reg.coef_ *x.reshape(10,1)
plt.plot(x,y_l)
plt.show()
Can anyone help me with this? Thanks in advance.
reg.coef_ is a 2D array with shape (1, 1) in this case. It is 2D because y was passed to fit as a 2D array of shape (10, 1), so coef_ has shape (n_targets, n_features); with a 1D y it would have shape (n_features,).
Broadcasting rules make the expression reg.coef_ * x return a 2D array of shape (1, 10), whose first dimension no longer matches that of x in plt.plot, resulting in the error you see.
In your case, I'd say the cleanest expression to fix this is:
y_l = reg.intercept_ + reg.coef_.reshape(1) * x
The error comes from multiplying the 1D array x by the 2D array reg.coef_. To multiply them elementwise, either reshape x into a column (as in your second snippet) or flatten reg.coef_ to 1D.
This should also work:
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression as lr
import numpy as np
x=np.array([0,1,2,3,4,5,6,7,8,9])
y=np.array([1,3,2,5,7,8,8,9,10,12])
reg=lr().fit(x.reshape(10,1),y.reshape(10,1))
y_l = reg.intercept_ + reg.coef_.reshape(1)*x
plt.plot(x,y_l)
plt.show()
print(reg.coef_.shape)
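
The shapes involved can be compared side by side. The sketch below fits the same data once with a 2D target and once with a 1D target, and also uses predict as an alternative to writing the line equation by hand. The names reg_2d, reg_1d, line_2d and line_1d are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
X = x.reshape(-1, 1)

# 2D target: coef_ is (n_targets, n_features), so broadcasting with the
# 1D x produces a (1, 10) array that plt.plot cannot pair with x.
reg_2d = LinearRegression().fit(X, y.reshape(-1, 1))
line_2d = reg_2d.intercept_ + reg_2d.coef_ * x

# 1D target: coef_ is (n_features,), so the same expression stays 1D.
reg_1d = LinearRegression().fit(X, y)
line_1d = reg_1d.intercept_ + reg_1d.coef_ * x

# predict() returns one fitted value per sample without manual broadcasting.
y_l = reg_1d.predict(X)

print(reg_2d.coef_.shape, line_2d.shape, reg_1d.coef_.shape, line_1d.shape, y_l.shape)
```

Keeping y 1D, or calling predict, avoids the shape mismatch without any reshape of coef_.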

use of numpy.newaxis in machine learning

I am trying to increase the dimensionality of my initial array:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
x = 10*rng.rand(50)
y = np.sin(x) + 0.1*rng.rand(50)
poly = PolynomialFeatures(7, include_bias=False)
poly.fit_transform(x[:,np.newaxis])
First, I know np.newaxis creates an additional column. Why is this necessary?
Now I will train a linear regression on the updated x data (poly):
test_x = np.linspace(0,10,1000)
from sklearn.linear_model import LinearRegression
model = LinearRegression()
# train with increased dimension(x=poly) with its target
model.fit(poly,y)
# testing
test_y = model.predict(x_test)
When I run this, it gives me ValueError: Expected 2D array, got scalar array instead on the model.fit(poly,y) line. I've already added a dimension to poly, so what is happening?
Also, what's the difference between x[:,np.newaxis] and x[:,None]?
In [55]: x=10*np.random.rand(5)
In [56]: x
Out[56]: array([6.47634068, 6.25520837, 7.58822106, 4.65466951, 2.35783624])
In [57]: x.shape
Out[57]: (5,)
newaxis does not add a column, it adds a dimension:
In [58]: x1 = x[:,np.newaxis]
In [59]: x1
Out[59]:
array([[6.47634068],
       [6.25520837],
       [7.58822106],
       [4.65466951],
       [2.35783624]])
In [60]: x1.shape
Out[60]: (5, 1)
np.newaxis has the value of None, so both work the same.
In [61]: x[:,None].shape
Out[61]: (5, 1)
One is a little clearer to human readers, the other a little easier to type.
https://www.numpy.org/devdocs/reference/constants.html
Whether x or x1 works depends on the expectations of the learning code. Some learning code expects inputs of the shape (samples, features). It could assume that a (50,) shape array is 50 samples, 1 feature, or 1 case, 50 features. But it's better if you tell exactly what you mean.
Look at the docs:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html#sklearn.preprocessing.PolynomialFeatures.fit_transform
poly.fit_transform
X : numpy array of shape [n_samples, n_features]
Sure looks like fit_transform expects a 2d input.
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.fit
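X is supposed to be 2d, of shape (n_samples, n_features); y may be 1d, of shape (n_samples,), or 2d, of shape (n_samples, n_targets). Note that in the question poly is the PolynomialFeatures object itself, not the array returned by poly.fit_transform(...), which is why model.fit(poly, y) reports a scalar instead of a 2d array.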
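
A corrected version of the question's workflow might look like the sketch below. It makes a few assumptions: rng is taken to be np.random.RandomState(0) because the question does not show how it was created, X_poly is a made-up name for the transformed features, and the test points are passed through the same transformer before predict.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)  # assumed; not shown in the question
x = 10 * rng.rand(50)
y = np.sin(x) + 0.1 * rng.rand(50)

poly = PolynomialFeatures(7, include_bias=False)

# Keep the array returned by fit_transform; poly itself is only the transformer.
X_poly = poly.fit_transform(x[:, np.newaxis])   # shape (50, 7)

model = LinearRegression()
model.fit(X_poly, y)                            # 2D features, 1D target

# New inputs need the same column shape and the same polynomial expansion.
test_x = np.linspace(0, 10, 1000)
test_y = model.predict(poly.transform(test_x[:, np.newaxis]))

print(X_poly.shape, test_y.shape)
```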
