Linear Regression issues - python

I'm trying to run a linear regression on two columns of data (IMF_VALUE, BBG_FV).
I have this code:
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt
import pandas as pd
raw_data = pd.read_csv("IMF and BBG Fair Values.csv")
ISO_TH = raw_data[["IMF_VALUE","BBG_FV"]]
filtered_TH = ISO_TH[np.isfinite(raw_data['BBG_FV'])]
npMatrix = np.matrix(filtered_TH)
IMF_VALUE, BBG_FV = npMatrix[:,0], npMatrix[:,1]
regression = linear_model.LinearRegression
regression.fit(IMF_VALUE, BBG_FV)
When I run this as a test, I get this error and I really have no idea why:
TypeError Traceback (most recent call last)
<ipython-input-28-1ee2fa0bbed1> in <module>()
1 regression = linear_model.LinearRegression
----> 2 regression.fit(IMF_VALUE, BBG_FV)
TypeError: fit() missing 1 required positional argument: 'y'

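linear_model.LinearRegression is assigned without parentheses, so regression is the class itself rather than an instance. Calling fit on the class binds IMF_VALUE to self and BBG_FV to X, which leaves y missing. Instantiate the estimator, and make sure X is a two-dimensional array with one column per feature:
regression = linear_model.LinearRegression()
regression.fit(np.array(IMF_VALUE).reshape(-1,1), np.array(BBG_FV).reshape(-1,1))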
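For reference, here is a minimal self-contained sketch of the same fit. The data is synthetic (made-up numbers standing in for the IMF_VALUE and BBG_FV columns), and the names imf_value and bbg_fv are only illustrative.

```python
import numpy as np
from sklearn import linear_model

# Synthetic stand-ins for the two CSV columns (values are made up).
rng = np.random.default_rng(0)
imf_value = rng.uniform(0.5, 1.5, size=50)
bbg_fv = 2.0 * imf_value + 1.0 + rng.normal(0.0, 0.01, size=50)

# Note the parentheses: this creates an estimator instance, not a reference to the class.
regression = linear_model.LinearRegression()

# X must be 2-D with shape (n_samples, n_features); y may stay 1-D.
X = imf_value.reshape(-1, 1)
regression.fit(X, bbg_fv)

slope = regression.coef_[0]
intercept = regression.intercept_
print(slope, intercept)
```

With real data, the same pattern should apply after dropping the non-finite rows, as the question already does with np.isfinite.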

Related

SelectKBest is not working when I read CSV files

How can I use SelectKBest on data that I read from a CSV file on my desktop with pandas?
(I'm a beginner, so please be patient with me.)
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2
data = pd.read_csv(r"pima.csv")
X, y = data(return_X_y=True)
X.shape
X_new = SelectKBest(chi2, k=20).fit_transform(X, y)
X_new.shape
I've tried the file name with single quotes (') and double quotes ("), with and without the r prefix; nothing changed.
The file is the well-known Pima Indians Diabetes dataset, which is widely available online.
I get this error when I try to run it:
'DataFrame' object is not callable
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_4116\4011967154.py in <module>
2 from sklearn.feature_selection import SelectKBest, chi2
3 data = pd.read_csv(r"pima.csv")
----> 4 X, y = data(return_X_y=True)
5 X.shape
6
TypeError: 'DataFrame' object is not callable
If you're loading a DataFrame with pandas, X and y need to be selected as columns rather than obtained by calling the DataFrame, probably like this:
X = data.drop(['Outcome'], axis=1)
y = data['Outcome']
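A minimal sketch of the whole flow, assuming the label column is named Outcome as in the answer above. The small frame is made-up data standing in for pima.csv, and its feature column names are only illustrative. Note that chi2 expects non-negative features and that k should not exceed the number of feature columns, so the k=20 from the question would likely need to be lowered for a file with only a handful of features.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Tiny made-up frame standing in for pima.csv; "Outcome" is the label column.
data = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137, 116],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1, 25.6],
    "Age": [50, 31, 32, 21, 33, 30],
    "Outcome": [1, 0, 1, 0, 1, 0],
})

# Select features and labels by column instead of calling the DataFrame.
X = data.drop(columns=["Outcome"])
y = data["Outcome"]

# Keep the 2 highest-scoring features according to the chi-squared test.
selector = SelectKBest(chi2, k=2)
X_new = selector.fit_transform(X, y)

# get_support() returns a boolean mask over the original columns.
selected = list(X.columns[selector.get_support()])
print(X_new.shape, selected)
```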

dataset is not callable problems

I'm trying to impute NaN values, but first I want to check which method is best for estimating them. I'm new to these methods, so I want to use some code I found to compare the different regressors and choose the best one. The original code is this:
from sklearn.experimental import enable_iterative_imputer # noqa
from sklearn.datasets import fetch_california_housing
from sklearn.impute import SimpleImputer
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score
N_SPLITS = 5
rng = np.random.RandomState(0)
X_full, y_full = fetch_california_housing(return_X_y=True)
# ~2k samples is enough for the purpose of the example.
# Remove the following two lines for a slower run with different error bars.
X_full = X_full[::10]
y_full = y_full[::10]
n_samples, n_features = X_full.shape
fetch_california_housing is the dataset used in that example.
When I tried to adapt this code to my case, I wrote this:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from numpy import genfromtxt
data = genfromtxt('documents/datasets/df.csv', delimiter=',')
features = data[:, :2]
targets = data[:, 2]
N_SPLITS = 5
rng = np.random.RandomState(0)
X_full, y_full = data(return_X_y= True)
# ~2k samples is enough for the purpose of the example.
# Remove the following two lines for a slower run with different error bars.
X_full = X_full[::10]
y_full = y_full[::10]
n_samples, n_features = X_full.shape
I always get the same error:
AttributeError: 'numpy.ndarray' object is not callable
Earlier, when I used my DataFrame (from df.csv), the error was the same:
AttributeError: 'Dataset' object is not callable
The complete error is this:
TypeError Traceback (most recent call last)
<ipython-input-8-3b63ca34361e> in <module>
3 rng = np.random.RandomState(0)
4
----> 5 X_full, y_full = df(return_X_y=True)
6 # ~2k samples is enough for the purpose of the example.
7 # Remove the following two lines for a slower run with different error bars.
TypeError: 'DataFrame' object is not callable
I don't know how to make either of these errors go away.
I hope I have explained my problem well; my English is not very good.
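One possible way around this, sketched under the assumption that the first two columns are features and the third is the target (as the slicing in the question suggests): return_X_y=True is an argument of scikit-learn's dataset loaders such as fetch_california_housing, whereas an array or DataFrame is indexed rather than called. The in-memory CSV text below is made up and stands in for df.csv.

```python
import io
import numpy as np

# In-memory stand-in for df.csv (made-up numbers): two features, then the target.
csv_text = "1.0,2.0,10.0\n3.0,4.0,20.0\n5.0,6.0,30.0\n7.0,8.0,40.0\n"
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

# An ndarray is indexed, not called: slice out the features and the target.
X_full = data[:, :2]
y_full = data[:, 2]

# Optional subsampling, as in the original example (every 2nd row here).
X_full = X_full[::2]
y_full = y_full[::2]
n_samples, n_features = X_full.shape
print(n_samples, n_features)
```

With a pandas DataFrame, the equivalent would be selecting columns (for example with drop and a column lookup) instead of calling df(...).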

Encountering an error when trying to create an artificial dataframe in Python

This is my first post, so pardon any mistakes on my part.
I was trying to create an artificial data frame to use with k-means clustering. When I run the function that creates the data set and then view the data frame, I get the error below.
TypeError: _append_dispatcher() missing 1 required positional argument: 'values'
I would appreciate your help in resolving this.
from scipy.stats import norm
import random
from numpy import *
import numpy as np
from ast import literal_eval
from pandas import DataFrame
def create_clustered_data(N,k):
    random.seed(10)
    points_per_cluster=float(N)/k
    x=[]
    for i in range(k):
        income_centroid=random.uniform(20000,200000)
        age_centroid=random.uniform(20,70)
        for j in range(int(points_per_cluster)):
            x=np.append([random.normal(income_centroid,10000),random.normal(age_centroid,2)])
    x=np.array(x)
    return(x)
df=create_clustered_data(100,5)
df
Error Message
TypeError Traceback (most recent call last)
<ipython-input-204-0ff0b56b46c6> in <module>
18 return(x)
19
---> 20 df=create_clustered_data(100,5)
21 df
22
<ipython-input-204-0ff0b56b46c6> in create_clustered_data(N, k)
14 age_centroid=random.uniform(20,70)
15 for j in range(int(points_per_cluster)):
---> 16 x=np.append([random.normal(income_centroid,10000),random.normal(age_centroid,2)])
17 x=np.array(x)
18 return(x)
<__array_function__ internals> in append(*args, **kwargs)
TypeError: _append_dispatcher() missing 1 required positional argument: 'values'
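Here x=[] creates a list, not a NumPy array. Also check the signature of np.append: it expects both the array to append to and the values, as in np.append(arr, values), but the call above passes only one argument, which is why 'values' is reported as missing.
One way to solve the problem is to append each point to the list with list.append and then convert the list to a NumPy array at the end.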
from scipy.stats import norm
import random
from numpy import *
import numpy as np
from ast import literal_eval
from pandas import DataFrame
def create_clustered_data(N,k):
    random.seed(10)
    points_per_cluster=float(N)/k
    x=[]
    for i in range(k):
        income_centroid=random.uniform(20000,200000)
        age_centroid=random.uniform(20,70)
        for j in range(int(points_per_cluster)):
            x.append([random.normal(income_centroid,10000),random.normal(age_centroid,2)])
    ar = np.array(x)
    return(ar)
df=create_clustered_data(100,5)
df
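As an alternative sketch (not the approach from the answer above), the same data can be generated without the star import by using NumPy's Generator API and drawing each cluster in one call. The column names income and age and the seed parameter are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

def create_clustered_data(n_points, k, seed=10):
    """Return a DataFrame of (income, age) points drawn around k random centroids."""
    rng = np.random.default_rng(seed)
    per_cluster = n_points // k
    clusters = []
    for _ in range(k):
        income_centroid = rng.uniform(20000, 200000)
        age_centroid = rng.uniform(20, 70)
        # Draw a whole cluster at once instead of appending point by point.
        income = rng.normal(income_centroid, 10000, size=per_cluster)
        age = rng.normal(age_centroid, 2, size=per_cluster)
        clusters.append(np.column_stack([income, age]))
    # Stack the per-cluster blocks into one (n_points, 2) array.
    return pd.DataFrame(np.vstack(clusters), columns=["income", "age"])

df = create_clustered_data(100, 5)
print(df.shape)
```

Returning a DataFrame rather than a bare array keeps the column meaning visible when the result is passed on to a clustering step.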

Extra tree classifier missing argument y

So I was trying to use an Extra Trees classifier to find the feature importances in my dataset. I wrote this simple code, but for some reason I keep getting this error.
My Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.ensemble import ExtraTreesClassifier
df = pd.read_csv('C:\\Users\\ali97\\Desktop\\Project\\Database\\5-FINAL2\\Final After Simple Filtering.csv')
extra_tree_forest = ExtraTreesClassifier(n_estimators = 5, criterion ='entropy', max_features = 2)
extra_tree_forest.fit(df)
feature_importance = extra_tree_forest.feature_importances_
feature_importance_normalized = np.std([tree.feature_importances_ for tree in extra_tree_forest.estimators_], axis = 1)
plt.bar(X.columns, feature_importance_normalized)
plt.xlabel('Lbale')
plt.ylabel('Feature Importance')
plt.title('Parameters Importance')
plt.show()
The Error:
TypeError Traceback (most recent call last)
<ipython-input-7-4aad8882ce6d> in <module>
16 extra_tree_forest = ExtraTreesClassifier(n_estimators = 5, criterion ='entropy', max_features = 2)
17
---> 18 extra_tree_forest.fit(df)
19
20 feature_importance = extra_tree_forest.feature_importances_
TypeError: fit() missing 1 required positional argument: 'y'
Thank you
The fit method of a classifier needs both the attributes (X) and the labels (y), so you need to call extra_tree_forest.fit(X, y) to train this classifier.
I recommend splitting the labels and attributes into two separate objects after you load Final After Simple Filtering.csv.
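A minimal sketch of that split, using a made-up frame in place of the CSV. The target column name label and the feature names f1 to f3 are hypothetical. It also computes the spread across trees with axis=0, which gives one value per feature; the axis=1 in the question would instead give one value per tree and would not line up with X.columns in a bar plot.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier

# Made-up frame standing in for the CSV; "label" is a hypothetical target column.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(60, 3)), columns=["f1", "f2", "f3"])
df["label"] = (df["f1"] > 0).astype(int)

# Split attributes and labels before fitting.
X = df.drop(columns=["label"])
y = df["label"]

extra_tree_forest = ExtraTreesClassifier(
    n_estimators=5, criterion="entropy", max_features=2, random_state=0
)
extra_tree_forest.fit(X, y)

# One importance per feature, averaged over the trees.
feature_importance = extra_tree_forest.feature_importances_
# Spread across trees per feature: axis=0 keeps one value per column of X.
importance_std = np.std(
    [tree.feature_importances_ for tree in extra_tree_forest.estimators_], axis=0
)
print(dict(zip(X.columns, feature_importance)))
```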

How to read a csv file and plot confusion matrix in python

I have a CSV file with 2 columns as
actual,predicted
1,0
1,0
1,1
0,1
.,.
.,.
How do I read this file and plot a confusion matrix in Python?
I tried the following code from a program.
import pandas as pd
from sklearn.metrics import confusion_matrix
import numpy
CSVFILE='./mappings.csv'
test_df=pd.read_csv[CSVFILE]
actualValue=test_df['actual']
predictedValue=test_df['predicted']
actualValue=actualValue.values
predictedValue=predictedValue.values
cmt=confusion_matrix(actualValue,predictedValue)
print cmt
but it gives me this error.
Traceback (most recent call last):
File "confusionMatrixCSV.py", line 7, in <module>
test_df=pd.read_csv[CSVFILE]
TypeError: 'function' object has no attribute '__getitem__'
pd.read_csv is a function. You call a function in Python with parentheses, not square brackets.
You should use pd.read_csv(CSVFILE) instead of pd.read_csv[CSVFILE].
import pandas as pd
from sklearn.metrics import confusion_matrix
CSVFILE = './mappings.csv'
test_df = pd.read_csv(CSVFILE)
# Each column is already a 1-D array of class labels, so no argmax is needed.
actualValue = test_df['actual'].values
predictedValue = test_df['predicted'].values
cmt = confusion_matrix(actualValue, predictedValue)
print(cmt)
Here's a simple solution that calculates the accuracy and prints the confusion matrix. As written, it reads a tab-separated file, results.txt, with the actual value in the first column, the predicted value in the second, and no header row.
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
result = []
actual = []
with open("results.txt", "r") as file:
    for line in file:
        # Split each line into its two tab-separated integer labels.
        sent = line.split("\t")
        actual.append(int(sent[0]))
        result.append(int(sent[1]))
cnf_mat = confusion_matrix(actual, result)
print(cnf_mat)
print('Test Accuracy:', accuracy_score(actual, result))
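To cover the plotting part of the question, here is a hedged sketch that reads the comma-separated format with the actual,predicted header and defines a small helper for drawing the matrix. The in-memory CSV text is made up (the question's first rows plus one extra), and plot_confusion is a hypothetical helper name. It relies on ConfusionMatrixDisplay, which is available in recent scikit-learn releases.

```python
import io
import pandas as pd
from sklearn.metrics import confusion_matrix, accuracy_score

# In-memory stand-in for mappings.csv, using the header from the question.
csv_text = "actual,predicted\n1,0\n1,0\n1,1\n0,1\n0,0\n"
test_df = pd.read_csv(io.StringIO(csv_text))

# Rows are actual classes, columns are predicted classes (labels sorted: 0, 1).
cmt = confusion_matrix(test_df["actual"], test_df["predicted"])
accuracy = accuracy_score(test_df["actual"], test_df["predicted"])
print(cmt)
print("Accuracy:", accuracy)

def plot_confusion(cm):
    """Draw the matrix as a heatmap; matplotlib is imported only when needed."""
    import matplotlib.pyplot as plt
    from sklearn.metrics import ConfusionMatrixDisplay
    ConfusionMatrixDisplay(confusion_matrix=cm).plot()
    plt.show()
```

Calling plot_confusion(cmt) should open a figure with the counts annotated in each cell, provided matplotlib is installed.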
