Python Sklearn linear regression not callable

I am implementing simple linear regression and multiple linear regression using pandas and sklearn.
My code is as follows:
import pandas as pd
import numpy as np
import scipy.stats
from sklearn import linear_model
from sklearn.metrics import r2_score
df = pd.read_csv("Auto.csv", na_values='?').dropna()
lr = linear_model.LinearRegression()
y = df['mpg']
x = df['displacement']
X = x.values.reshape(-1,1)
sklearn_model = lr.fit(X,y)
This works fine, but for multiple linear regression it for some reason doesn't work with the () at the end of sklearn's LinearRegression; when I use it with the brackets I get the following error:
TypeError: 'LinearRegression' object is not callable
My multiple linear regression code is as follows:
lr = linear_model.LinearRegression
feature_1 = np.array(df[['displacement']])
feature_2 = np.array(df[['weight']])
feature_1 = feature_1.reshape(len(feature_1),1)
feature_2 = feature_2.reshape(len(feature_2),1)
X = np.hstack([feature_1,feature_2])
sklearn_mlr = lr(X,df['mpg'])
I want to know what I'm doing wrong. Additionally, I'm not able to print the various attributes of the linear regression model if I don't use the () at the end, e.g.:
print(sklearn_mlr.coef_)
Gives me the error:
AttributeError: 'LinearRegression' object has no attribute 'coef_'

Given this snippet:
lr = linear_model.LinearRegression
feature_1 = np.array(df[['displacement']])
feature_2 = np.array(df[['weight']])
feature_1 = feature_1.reshape(len(feature_1),1)
feature_2 = feature_2.reshape(len(feature_2),1)
X = np.hstack([feature_1,feature_2])
sklearn_mlr = lr(X,df['mpg'])
Your issue is that you have not initialized an instance of the LinearRegression class. You need to initialize it like you did in the first example. Then you can use the fit method like so:
lr = linear_model.LinearRegression()
feature_1 = np.array(df[['displacement']])
feature_2 = np.array(df[['weight']])
feature_1 = feature_1.reshape(len(feature_1),1)
feature_2 = feature_2.reshape(len(feature_2),1)
X = np.hstack([feature_1,feature_2])
sklearn_mlr = lr.fit(X,df['mpg'])
Once an instance has been fit it will have the attributes listed in the documentation (e.g. .coef_). As it was, you were trying to access .coef_ on an object that had never been fitted, because lr referred to the LinearRegression class itself.
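For example (a short continuation of the snippet above, assuming the fitted sklearn_mlr):
print(sklearn_mlr.coef_)       # one coefficient each for displacement and weight
print(sklearn_mlr.intercept_)  # fitted intercept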

lr is a class in your example.
You need to initialize it, and then call .fit(X,df['mpg']) from the instance.

Why not import it as follows:
from sklearn.linear_model import LinearRegression
In my opinion it is much cleaner than what you did. You can then use it like this:
lr = LinearRegression()

Related

How can I reconstruct the logistic regression probabilities of sklearn package when multi_class="multinomial"?

I am having a hard time figuring out the algorithm sklearn uses to calculate the predicted probabilities when the LogisticRegression instance is created with multi_class="multinomial". This is how I set up the logistic regression:
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
from sklearn.linear_model import LogisticRegression
clf1 = LogisticRegression(multi_class='ovr')
clf2 = LogisticRegression(multi_class='multinomial')
clf1.fit(X, y)
clf2.fit(X, y)
pred_prob_ovr = clf1.predict_proba(X)
pred_prob_multinomial = clf2.predict_proba(X)
I can reconstruct the probabilities when multi_class="ovr" using:
import numpy as np
logit = np.matmul(X, clf1.coef_.transpose()) + clf1.intercept_
A = np.exp(logit)
P = 1/(1+1/A)
Prob_ovr = P/P.sum(axis=1).reshape((-1, 1))
However, when I use multi_class="multinomial", the above procedure doesn't work and I need to use this method:
logit = np.matmul(X, clf2.coef_.transpose()) + clf2.intercept_
from sklearn.utils.extmath import softmax
Prob_multinomial = softmax(logit)
Could you please describe why the coefficients and intercept, obtained from the model with multi_class="multinomial", don't work with the first procedure?
Thanks

ValueError: Expected 2D array, got 1D array instead: array=[-1]

Here is the problem
Extract just the median_income column from the independent variables (from X_train and X_test).
Perform Linear Regression to predict housing values based on median_income.
Predict output for test dataset using the fitted model.
Plot the fitted model for training data as well as for test data to check if the fitted model satisfies the test data.
I did a linear regression earlier. Following is the code:
import pandas as pd
import os
os.getcwd()
os.chdir('/Users/saurabhsaha/Documents/PGP-AI:ML-Purdue/New/datasets')
df=pd.read_excel('California_housing.xlsx')
df.total_bedrooms=df.total_bedrooms.fillna(df.total_bedrooms.mean())
x = df.iloc[:,2:8]
y = df.median_house_value
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=.20)
from sklearn.linear_model import LinearRegression
california_model = LinearRegression().fit(x_train,y_train)
california_model.predict(x_test)
Prdicted_values = pd.DataFrame(california_model.predict(x_test),columns=['Pred'])
Prdicted_values
Final = pd.concat([x_test.reset_index(drop=True),y_test.reset_index(drop=True),Prdicted_values],axis=1)
Final['Err_pct'] = abs(Final.median_house_value - Final.Pred)/Final.median_house_value
Here is my dataset- https://docs.google.com/spreadsheets/d/1vYngxWw7tqX8FpwkWB5G7Q9axhe9ipTu/edit?usp=sharing&ouid=114925088866643320785&rtpof=true&sd=true
Following is my code.
x1_train=x_train.median_income
x1_train
x1_train.shape
x1_test=x_test.median_income
x1_test
type(x1_test)
x1_test.shape
from sklearn.linear_model import LinearRegression
california_model_new = LinearRegression().fit(x1_train,y_train)
I get an error right here, and when I try converting my 1-D array to 2-D as follows, I cannot:
import numpy as np
x1_train= x1_train.reshape(-1, 1)
x1_test = x1_train.reshape(-1, 1)
This is the error I get
AttributeError: 'Series' object has no attribute 'reshape'
I am new to data science, so if you can explain a bit it would be really helpful.
x1_train and x1_test are pandas Series objects, whereas the reshape() method is applied to numpy arrays.
Do this instead:
x1_train = x1_train.to_numpy().reshape(-1, 1)
x1_test = x1_test.to_numpy().reshape(-1, 1)
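Alternatively (a small sketch, not part of the original answer, assuming x_train, x_test and y_train from the earlier split): selecting the column with double brackets keeps it as a one-column DataFrame, which sklearn already treats as 2-D input, so no reshape is needed:
# one-column DataFrame instead of a Series
x1_train = x_train[['median_income']]
x1_test = x_test[['median_income']]
from sklearn.linear_model import LinearRegression
california_model_new = LinearRegression().fit(x1_train, y_train)
predictions = california_model_new.predict(x1_test)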

How do I manually `predict_proba` from logistic regression model in scikit-learn?

I am trying to manually predict a logistic regression model using the coefficient and intercept outputs from a scikit-learn model. However, I can't match up my probability predictions with the predict_proba method from the classifier.
I have tried:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from scipy.special import expit
import numpy as np
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(random_state=0).fit(X, y)
# use sklearn's predict_proba function
sk_probas = clf.predict_proba(X[:1, :])
# and attempting manually (using scipy's inverse logit)
manual_probas = expit(np.dot(X[:1], clf.coef_.T)+clf.intercept_)
# with a completely manual inverse logit
full_manual_probas = 1/(1+np.exp(-(np.dot(X[:1], clf.coef_.T)+clf.intercept_)))
outputs:
>>> sk_probas
array([[9.81815067e-01, 1.81849190e-02, 1.44120963e-08]])
>>> manual_probas
array([[9.99352591e-01, 9.66205386e-01, 2.26583306e-05]])
>>> full_manual_probas
array([[9.99352591e-01, 9.66205386e-01, 2.26583306e-05]])
I do seem to get the classes to match (using np.argmax), but the probabilities are different. What am I missing?
I've looked at this and this but haven't managed to figure it out yet.
The documentation states that
For a multi_class problem, if multi_class is set to be “multinomial” the softmax function is used to find the predicted probability of each class
That is, in order to get the same values as sklearn you have to normalize using softmax, like this:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import numpy as np
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(random_state=0, max_iter=1000).fit(X, y)
decision = np.dot(X[:1], clf.coef_.T)+clf.intercept_
print(clf.predict_proba(X[:1]))
print(np.exp(decision) / np.exp(decision).sum())
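Equivalently (a small aside, not part of the original answer, reusing clf and decision from the snippet above), scipy's softmax performs the same normalization:
from scipy.special import softmax
print(softmax(decision, axis=1))  # matches clf.predict_proba(X[:1])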
To use sigmoids instead you can do it like this:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import numpy as np
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(random_state=0, max_iter=1000, multi_class='ovr').fit(X, y) # Notice the extra argument
full_manual_probas = 1/(1+np.exp(-(np.dot(X[:1], clf.coef_.T)+clf.intercept_)))
print(clf.predict_proba(X[:1]))
print(full_manual_probas / full_manual_probas.sum())

Predicting claim number through GLM model

I'm conducting a case study where I have to predict the claim number per policy. Since my variable ClaimNb is not binary I can't use logistic regression, so I have to use a Poisson model.
My code for GLM model:
import statsmodels.api as sm
import statsmodels.formula.api as smf
formula= 'ClaimNb ~ BonusMalus+VehAge+Freq+VehGas+Exposure+VehPower+Density+DrivAge'
model = smf.glm(formula = formula, data=df,
family=sm.families.Poisson())
I have also split my data:
# train-test split
from sklearn.model_selection import train_test_split
train, test = train_test_split(df, test_size=0.2, random_state=0)
# separate the target and independent variables
train_x = train.drop(columns=['ClaimNb'],axis=1)
train_y = train['ClaimNb']
test_x = test.drop(columns=['ClaimNb'],axis=1)
test_y = test['ClaimNb']
My problem now is the prediction. I have used the following, but it did not work:
from sklearn.linear_model import PoissonRegressor
model = PoissonRegressor(alpha=1e-3, max_iter=1000)
model.fit(train_x,train_y)
predict = model.predict(test_x)
Please, is there any other way to predict and check the accuracy of the model?
Thanks
You need to assign the result of model.fit() and predict with that; it works differently from sklearn. Also, if you are using the formula interface, it is better to split your DataFrame into train and test sets and predict using those. For example:
import statsmodels.api as sm
import statsmodels.formula.api as smf
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,100,(50,4)),columns=['ClaimNb','BonusMalus','VehAge','Freq'])
#X = df[['BonusMalus','VehAge','Freq']]
#y = df['ClaimNb']
df_train = df.sample(round(len(df)*0.8))
df_test = df.drop(df_train.index)
formula= 'ClaimNb ~ BonusMalus+VehAge+Freq'
model = smf.glm(formula=formula, data=df_train, family=sm.families.Poisson())
result = model.fit()
And we can do the prediction:
result.predict(df_train)
Or:
result.predict(df_test)
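To check accuracy (a minimal sketch, not part of the original answer, assuming the df_test split and the fitted result from above), you can compare the predictions with the observed counts, for example using the mean Poisson deviance:
from sklearn.metrics import mean_poisson_deviance
pred_test = result.predict(df_test)
# lower deviance indicates a better fit on the held-out data
print(mean_poisson_deviance(df_test['ClaimNb'], pred_test))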

Trying to run regression code. Getting error about 'linear_model'

I am trying to run this regression code.
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn
import sklearn.cross_validation
# Load the data
oecd_bli = pd.read_csv("C:/Users/Excel/Desktop/Briefcase/PDFs/ALL PYTHON & R CODE SAMPLES/Hands-On Machine_Learning_with_Scikit_Learn_and_Tensorflow/GDP Per Capita/oecd_bli.csv", thousands=',')
gdp_per_capita = pd.read_csv("C:/Users/Excel/Desktop/Briefcase/PDFs/ALL PYTHON & R CODE SAMPLES/Hands-On Machine_Learning_with_Scikit_Learn_and_Tensorflow/GDP Per Capita/gdp_per_capita.csv",thousands=',')
# view first 10 rows of data frame
oecd_bli[:10]
gdp_per_capita[:10]
country_stats = pd.merge(oecd_bli, gdp_per_capita, left_index=True, right_index=True)
country_stats[:10]
X = np.c_[country_stats["GDP"]]
Y = np.c_[country_stats["VALUE"]]
print(X)
print(Y)
# Visualize the data
country_stats.plot(kind='scatter', x="GDP", y='VALUE')
plt.show()
# Select a linear model
lin_reg_model = sklearn.linear_model.LinearRegression()
# Train the model
lin_reg_model.fit(X, Y)
# Make a prediction for Cyprus
X_new = [[22587]] # Cyprus' GDP per capita
print(lin_reg_model.predict(X_new))
I get this error.
AttributeError: module 'sklearn' has no attribute 'linear_model'
I'm not sure what's going on. I am trying to learn about this from an example that I saw in a book.
# import the package and call the class
from sklearn.linear_model import LinearRegression
# build the model (create a regression object)
model = LinearRegression()
# fit the model
model.fit(x, y)
linear_model is a subpackage of sklearn. It won't work if you only imported it via import sklearn. Try import sklearn.linear_model instead.
Python does not automatically import all of its subpackages. When I explicitly import linear_model, it works:
from sklearn import linear_model
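Applied to the book example (a short sketch, assuming X and Y have been built from country_stats as in the question), either explicit import works:
# option 1: import the submodule explicitly
import sklearn.linear_model
lin_reg_model = sklearn.linear_model.LinearRegression()
# option 2: import the class directly
from sklearn.linear_model import LinearRegression
lin_reg_model = LinearRegression()
lin_reg_model.fit(X, Y)
print(lin_reg_model.predict([[22587]]))  # prediction for Cyprus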
