Error in SVM decision boundaries plotting - python

I am trying to reproduce some code that I found on Kaggle for plotting SVM decision boundaries. I am using my own dataset with 608 samples and 10 features, with 2 classes. The 2 classes indicate, for instance, whether you're diabetic or not. I copied the SVM part of the code from this link (you can find it by scrolling all the way to the bottom), where it covers decision boundary visualisation. Here's the link to my reference.
However, I get an error saying that "X must be a Numpy array". Can someone explain what this means?
The code below is what I've done. Note that my dataset has been normalised beforehand. Also, I'm splitting the data in a 70:30 ratio.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.pyplot as show
import matplotlib as cm
import matplotlib.colors as colors
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn import svm
from mlxtend.plotting import plot_decision_regions
autism = pd.read_csv('diabetec.csv')
x = autism.drop(['TARGET'], axis = 1)
y = autism['TARGET']
x_train, X_test, y_train, y_test = train_test_split(x, y, test_size = 0.30, random_state=1)
t = np.array(y_train)
t = t.astype(int)  # np.integer is an abstract type; using it as a dtype is deprecated
clf_svm = SVC(C=1.3, gamma=0.8, kernel='rbf')
clf_svm.fit(x_train, t)
plt.figure(figsize=[15,10])
plot_decision_regions(x_train, t, clf = clf_svm, hide_spines = False, colors = 'purple,limegreen', markers = ['x','o'])
plt.title('Support Vector Machine')

plot_decision_regions expects a NumPy array, but x_train is a pandas DataFrame. Try x_train.values, i.e.
plot_decision_regions(x_train.values, t, clf = clf_svm, ...
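A minimal sketch of the conversion, using a small made-up frame in place of the asker's `diabetec.csv` (the column names `f1`, `f2` and all values are assumptions for illustration). Both `.values` and the newer `.to_numpy()` return a NumPy array:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data: two features and a binary target.
df = pd.DataFrame({
    "f1": [0.1, 0.4, 0.35, 0.8],
    "f2": [0.7, 0.2, 0.9, 0.5],
    "TARGET": [0, 1, 0, 1],
})

x = df.drop(columns=["TARGET"])
y = df["TARGET"]

# A DataFrame is not an ndarray, which is what triggers the error.
print(type(x))

# Either call returns a plain NumPy array with the same shape.
x_arr = x.to_numpy()               # equivalent to x.values
y_arr = y.to_numpy().astype(int)   # integer class labels

print(type(x_arr), x_arr.shape)
print(y_arr.dtype)
```

Note also that, as far as I know, `plot_decision_regions` draws a two-dimensional plot, so with 10 features you would probably still need to either train on two features or fix the remaining ones through its `feature_index` and `filler_feature_values` arguments.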

Related

Identifying arrays with a band structure

I would like to identify arrays with a band-like structure (first image), as opposed to the more homogeneous structure shown in the second image.
So far I have used skewness and RMS techniques to test for this, but they don't work well if the bands are evenly spaced. Are there any more refined ways of identifying such arrays in Python?
Try sns.pairplot.
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# The dataset: wages
# We fetch the data from OpenML. Note that setting the parameter as_frame to True will retrieve the data as a pandas dataframe.
from sklearn.datasets import fetch_openml
survey = fetch_openml(data_id=534, as_frame=True)
# Then, we identify features X and targets y: the column WAGE is our target variable (i.e., the variable which we want to predict).
X = survey.data[survey.feature_names]
X.describe(include="all")
y = survey.target.values.ravel()
survey.target.head()
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
train_dataset = X_train.copy()
train_dataset.insert(0, "WAGE", y_train)
sns.pairplot(train_dataset, kind='reg', diag_kind='kde')
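Coming back to the band question itself, here is one rough heuristic, offered as a sketch and not as an established method. It assumes the bands run along the rows: if so, most of the array's variance is explained by the row means, however evenly the bands are spaced. The function name `band_score` and the test arrays are made up for illustration.

```python
import numpy as np

def band_score(arr, axis=1):
    """Fraction of the total variance explained by the means along one axis.

    With axis=1 the means are taken per row, so horizontal bands score
    close to 1 and unstructured noise scores close to 0.
    """
    total = arr.var()
    if total == 0:
        return 0.0
    return arr.mean(axis=axis).var() / total

rng = np.random.default_rng(0)

# Homogeneous array: pure noise.
homogeneous = rng.normal(size=(64, 64))

# Evenly spaced horizontal bands (8 rows wide) plus a little noise.
rows = (np.arange(64) // 8) % 2
banded = rows[:, None] * 5.0 + rng.normal(scale=0.5, size=(64, 64))

print(band_score(homogeneous), band_score(banded))
```

For vertical bands the same function can be called with `axis=0`. Where to put the threshold between "banded" and "homogeneous" depends on the data and would need tuning.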

Dataset indices for predicted values is not matching with those for actual values

I am a Python novice trying to solve a regression problem with neural networks. I am at the stage where I want to plot the predicted vs actual values and then determine the regression coefficient.
Model training
#import statements
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
%matplotlib inline
#importing the dataset
data = pd.read_csv("PPV_dataset.csv")
X = np.array(data.drop(["PPV"], axis=1))  # pass axis by keyword; the positional form is not accepted by recent pandas
y = np.array(data["PPV"])
#model training & prediction
nn = MLPRegressor(hidden_layer_sizes=(100,), activation = 'logistic', solver = 'sgd')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
nn.fit(X_train, y_train)
pred = nn.predict(X_test)
#indices of test set
a = X_test
indices = []
for row in range(len(X)):
    for i in range(len(a)):
        if np.all(a[i]==X[row]):
            indices.append(row)
#listing actual values in an array
actual_values = []
for i in range(len(indices)):
    actual_values.append(y[indices[i]])
Comparing actual to predicted values
len(actual_values)
13
len(pred)
12
Image of dataset
You can use the matplotlib and seaborn libraries for plotting your graph.
For the coefficient of determination, note that score takes the test features and the true targets: r_sq = nn.score(X_test, y_test)
I recommend using seaborn.lmplot() in your case.
For Robert's particular case I suggest:
from sklearn.metrics import r2_score
r2_score(y_true, y_pred)
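One likely cause of the 13 vs 12 mismatch is that the row-matching loop appends an index for every row of `X` that equals a test row, so a duplicated row is counted more than once. The sketch below avoids the search entirely. It uses synthetic data in place of `PPV_dataset.csv` (the sizes and values are assumptions): `train_test_split` accepts any number of arrays, so the row positions can be split alongside `X` and `y`, and `y_test` already holds the actual values in the same order as the predictions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the asker's data (assumption): 48 rows, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(48, 3))
X[5] = X[17]  # a duplicated row, as can happen in real data
y = rng.normal(size=48)

# Split the row positions together with X and y.
indices = np.arange(len(X))
X_train, X_test, y_train, y_test, idx_train, idx_test = train_test_split(
    X, y, indices, test_size=0.25, random_state=0
)

# idx_test holds the original row numbers of the test set, in order,
# and y_test is exactly y[idx_test]; no row matching is needed.
print(len(idx_test), len(y_test))
print(np.array_equal(y[idx_test], y_test))
```

With the split done this way, `r2_score(y_test, pred)` compares like with like, since `pred` comes from `nn.predict(X_test)` in the same row order.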

Facing trouble while plotting continuous curve using matplotlib.pyplot

I was using matplotlib.pyplot to plot a continuous curve in jupyter notebook. I used the following code:
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
X_train, X_test, y_train, y_test = train_test_split(X.reshape(-1,1), y, random_state = 0)
poly = PolynomialFeatures(degree=9)
X_train_p = poly.fit_transform(X_train)
X_test_p = poly.transform(X_test)  # reuse the transformer fitted on the training data
plt.figure(figsize=(5,5))
plt.title("deg={}".format(9))
plt.plot(X_train, y_train.reshape(-1,1), 'r')
plt.show()
I expected the data points to be successively connected by straight lines, however the outcome turns up like this:
I tried multiple variations of reshaping X_train and y_train using .reshape(), but didn't get the expected outcome.
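A probable cause, assuming `X` was in ascending order before the split: `train_test_split` shuffles the rows, and `plt.plot` joins the points in the order they are given, so the line zigzags back and forth. Reshaping does not change that order. Sorting both arrays by the x-values before plotting restores a continuous curve. A minimal sketch with made-up data:

```python
import numpy as np

# Made-up data (assumption): a smooth curve sampled at shuffled x positions,
# mimicking what train_test_split returns.
rng = np.random.default_rng(0)
X_train = rng.permutation(np.linspace(0, 10, 11)).reshape(-1, 1)
y_train = np.sin(X_train.ravel()) + X_train.ravel() / 6

# argsort gives the row order that sorts the x-values ascending.
order = np.argsort(X_train.ravel())
x_sorted = X_train.ravel()[order]
y_sorted = y_train[order]

print(x_sorted)
```

Calling `plt.plot(x_sorted, y_sorted, 'r')` then draws the points from left to right. If only the individual points matter, `plt.scatter(X_train, y_train)` needs no sorting.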

Simple Linear Regression using Sklearn. Fit() is not working

I am using this dataset:
https://filebin.net/wr2jy0ass7rsl0vt
There are three columns: "Date", "Temperature", "Anomaly". I use "Date" to predict "Temperature". The code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
data_df = pd.read_csv("ave_yearly_temp_nyc_1895-2017.csv")
data_df.columns= ["Date","Temperature","Anomaly"]
data_df["Date"] = data_df["Date"]//100
regressor = LinearRegression()
X_train,X_test, y_train,y_test = train_test_split(data_df.iloc[:,0],data_df.iloc[:,1],test_size=0.2, random_state=0)
regressor.fit(X_train,y_train) #training the algorithm
The data_df:
The error:
How to fix it?
fit() needs a 2D array for X; with iloc[:,0] you get a 1D Series.
Instead, select the column with double brackets so that it stays a 2D DataFrame.
Try using:
X_train,X_test, y_train,y_test = train_test_split(data_df[['Date']],data_df['Temperature'],test_size=0.2, random_state=0)
Try to do what the error message tells you. The implementation expects X to be two-dimensional, with shape (n_samples, n_features), even when there is only one feature. Hence you'll need to reshape it like this:
X_train, X_test, y_train, y_test = train_test_split(np.array(data_df.iloc[:,0]).reshape(-1, 1),data_df.iloc[:,1],test_size=0.2, random_state=0)
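To make the shape requirement concrete, here is a small sketch with invented yearly temperatures (the numbers are assumptions, not the NYC dataset). scikit-learn estimators expect `X` with shape `(n_samples, n_features)`, so a single column has to be `(n, 1)` and not `(n,)`:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Invented data (assumption): temperature rises by 0.5 per year from 50.0.
data_df = pd.DataFrame({
    "Date": [2000, 2001, 2002, 2003, 2004],
    "Temperature": [50.0, 50.5, 51.0, 51.5, 52.0],
})

X_1d = data_df["Date"]      # Series, shape (5,): rejected by fit()
X_2d = data_df[["Date"]]    # DataFrame, shape (5, 1): accepted
print(X_1d.shape, X_2d.shape)

regressor = LinearRegression()
regressor.fit(X_2d, data_df["Temperature"])
print(regressor.coef_, regressor.intercept_)
```

Selecting with double brackets and calling `.reshape(-1, 1)` on the underlying array are two routes to the same 2D shape; the double-bracket form also keeps the column name.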

How to get predicted values along with test data, and visualize actual vs predicted?

from sklearn import datasets
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Perceptron
data = pd.read_csv('student_selection.csv')
x = data[['Average','Pass','Division','Domicile']]
y = data[['Selected']]
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=1,random_state=0)
ppn = Perceptron(eta0=1.0, fit_intercept=True, max_iter=1000, n_iter_no_change=5, random_state=0)
ppn.fit(x_train, y_train)
y_pred = ppn.predict(x_train)
x_train['Predicted'] = pd.Series(y_pred)
How can I see the actual vs predicted values as a table, along with a plot? I am getting the predicted values in x_train['Predicted'], but I am unable to merge them with the actual data to see the deviation.
Just run:
y_predict = ppn.predict(x)
data['y_predict'] = y_predict
This adds the predictions as a column in your dataframe. If you want to plot them, you can use:
import matplotlib.pyplot as plt
plt.scatter(data['Selected'], data['y_predict'])
plt.show()
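As for why `x_train['Predicted'] = pd.Series(y_pred)` does not line up: pandas aligns on index labels. The new Series is labelled 0..n-1, while `x_train` keeps the shuffled row labels from the split, so values are matched by label and the unmatched rows become NaN. A sketch with made-up values (the frame and predictions below are assumptions) that assigns the raw array instead:

```python
import numpy as np
import pandas as pd

# Made-up training slice (assumption); note the shuffled index labels,
# as left behind by train_test_split.
x_train = pd.DataFrame({"Average": [71, 64, 88, 59]}, index=[7, 2, 9, 4])
y_train = pd.Series([1, 0, 1, 0], index=[7, 2, 9, 4], name="Selected")
y_pred = np.array([1, 1, 1, 0])  # stand-in for ppn.predict(x_train)

# Aligns on labels 0..3; only label 2 exists in x_train, the rest become NaN.
misaligned = x_train.assign(Predicted=pd.Series(y_pred))

# Assigning the array (or a Series built with index=x_train.index)
# places the values by position.
result = x_train.assign(Actual=y_train, Predicted=y_pred)
result["Correct"] = result["Actual"] == result["Predicted"]
print(result)
```

The `result` frame is the actual vs predicted table, and its `Actual` and `Predicted` columns can be passed to `plt.scatter` in the same way as above.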
