How to implement/use an artificial immune system (AIS) in Python?

I'm new to machine learning algorithms and classification techniques.
I have created a dataset and trained a model with an SVM in Python using the sklearn module.
Now I have to change my approach from SVM to an artificial immune system. My question is: is there a Python module for AIS that I can use, the way sklearn provides SVM?
If there is none, where can I find an example or help on implementing one?
Below is my SVM code, in case anyone needs it.
# In the name of GOD
# SeyyedMahdi Hassanpour
# SeyyedMahdihp#gmail.com
# SeyyedMahdihp # github
import numpy as np
from sklearn import svm, model_selection
import pandas as pd
df = pd.read_csv('final_dataset123456.csv')
x = np.array(df.drop(columns=['label']))  # the positional axis argument no longer works in pandas 2.0+
y = np.array(df['label'])
x_train, x_test, y_train, y_test = model_selection.train_test_split(x, y, test_size=0.36, random_state=39)
clf = svm.SVC()
clf.fit(x_train, y_train)
accuracy = clf.score(x_test, y_test)
print(accuracy)
ar = [0,0,0,0,0,0,0,0,0,0,0,0,0.2,0.058824,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.25,0,0,0,0,0.020833,0.2,0.090909,0,0.032258,0,0,0,0,0,0.0625,0,0,0,0.058333,0,0,0.1,0,0.125,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
br = [0.5,1,1,0.254902,0.853933,1,1,0.254902,1,0.27451,0.2,1,0.4,0.176471,1,1,1,1,0.625,1,0.125,1,0.393939,0.857143,0.052632,1,0.75,0.847826,1,1,0.583333,0.7,1,1,1,0.729167,0.6,0.818182,1,0.193548,0.333333,1,0.674419,1,1,1,0.8,1,1,0.2,0.37037,1,0.8,0.529412,0.375,1,1,0.23913,1,1,1,1,0.666667,1,1,1,1,0,1,0,1,0.23913,0.7,0.7,1,1,1,1,1,1,1,1,0.23913,1,1,1,1,1,1,1,1,1,0.666667,1,0.7,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1]
example_measures = np.array([ar,br])
example_measures = example_measures.reshape(len(example_measures), -1)
prediction = clf.predict(example_measures)
print(prediction)

There are many artificial immune system implementations for the WEKA platform. You can download them from SourceForge.
Here is the link:
https://sourceforge.net/directory/?q=artificial+immune+system
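If you end up implementing one yourself, negative selection is one of the simpler AIS algorithms: generate random detectors, discard those that match "self" (normal) samples, and flag any new sample that a surviving detector matches. Below is a minimal sketch with NumPy, assuming real-valued features scaled to [0, 1] (as in the example vectors in the question) and Euclidean distance as the matching rule. The function names, detector count and radius are hypothetical choices for illustration, not part of any library.

```python
import numpy as np


def train_detectors(self_samples, n_detectors, radius, rng, max_tries=100_000):
    """Negative selection: keep random candidates that match no self sample."""
    dim = self_samples.shape[1]
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:
            break
        candidate = rng.random(dim)  # random point in [0, 1)^dim
        # Distance from the candidate to its nearest self sample
        nearest = np.min(np.linalg.norm(self_samples - candidate, axis=1))
        if nearest > radius:  # candidate does not match "self", so keep it
            detectors.append(candidate)
    return np.array(detectors)


def is_nonself(samples, detectors, radius):
    """Flag samples that fall within `radius` of at least one detector."""
    dists = np.linalg.norm(samples[:, None, :] - detectors[None, :, :], axis=2)
    return (dists <= radius).any(axis=1)


rng = np.random.default_rng(0)
# Toy "self" class: 200 points in the lower-left corner of the unit square
self_samples = rng.random((200, 2)) * 0.4
detectors = train_detectors(self_samples, n_detectors=300, radius=0.1, rng=rng)

test_points = np.array([[0.2, 0.2],   # inside the self region
                        [0.9, 0.9]])  # far from the self region
flags = is_nonself(test_points, detectors, radius=0.1)
print(flags)
```

With many features (the question's vectors have over a hundred), randomly placed detectors may cover the space poorly, so treat this as a starting point rather than a drop-in replacement for the SVM.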

Related

How to output feature names with XGBOOST feature selection

My model uses feature importance for feature selection with XGBoost. At the end, it outputs all the confusion matrices/results and how many features each model includes. That now works, but I also need to output the names of the features that were used in each model.
I get a warning that says "X has feature names, but SelectFromModel was fitted without feature names", so I know something needs to be added so the model has the names before I can output them, but I'm not sure how to handle either of those steps. I found several old questions about this, but I wasn't able to apply any of them to my code. I'd really appreciate any ideas you have. Thank you!
from numpy import loadtxt
from numpy import sort
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import classification_report, confusion_matrix
# load data
dataset = df_train
# split data into X and y
X_train = df[df.columns.difference(['IsDeceased','IsTotal','Deceased','Sick','Injured','Displaced','Homeless','MissingPeople','Other','Total'])]
y_train = df['IsDeceased'].values
X_test = df_test[df_test.columns.difference(['IsDeceased','IsTotal','Deceased','Sick','Injured','Displaced','Homeless','MissingPeople','Other','Total'])]
y_test = df_test['IsDeceased'].values
# fit model on all training data
model = XGBClassifier()
model.fit(X_train, y_train)
# make predictions for test data and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
# Fit model using each importance as a threshold
thresholds = sort(model.feature_importances_)
for thresh in thresholds:
    # select features using threshold
    selection = SelectFromModel(model, threshold=thresh, prefit=True)
    select_X_train = selection.transform(X_train)
    # train model
    selection_model = XGBClassifier()
    selection_model.fit(select_X_train, y_train)
    print(thresh)
    # eval model
    select_X_test = selection.transform(X_test)
    y_pred = selection_model.predict(select_X_test)
    report = classification_report(y_test, y_pred)
    print("Thresh= {} , n= {}\n {}".format(thresh, select_X_train.shape[1], report))
    cm = confusion_matrix(y_test, y_pred)
    print(cm)
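One way to get the names is to ask the selector for its boolean mask with get_support() and apply that mask to the DataFrame's columns. The sketch below assumes the training data is a pandas DataFrame; it uses synthetic data with made-up column names and a RandomForestClassifier as a stand-in for XGBClassifier so that it runs without xgboost installed. Selecting columns with the mask instead of calling transform() also keeps the data as a DataFrame, which should sidestep the feature-names warning.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier  # stand-in for XGBClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data with hypothetical column names
X_arr, y = make_classification(n_samples=200, n_features=6, n_informative=3,
                               random_state=0)
X_train = pd.DataFrame(X_arr, columns=[f"feat_{i}" for i in range(X_arr.shape[1])])

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y)

selected = []  # (threshold, feature names) pairs, one per threshold
for thresh in np.sort(model.feature_importances_):
    selection = SelectFromModel(model, threshold=thresh, prefit=True)
    mask = selection.get_support()          # boolean mask over the input columns
    names = X_train.columns[mask].tolist()  # names of the features that are kept
    select_X_train = X_train.loc[:, mask]   # still a DataFrame, names preserved
    selected.append((thresh, names))
    print(f"Thresh={thresh:.4f}, n={len(names)}, features={names}")
```

In the original loop, the same two lines (get_support() and the column lookup) can be placed right after the selector is created, and names can be added to the existing print call.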

How to optimise the structure and parameters of ANN (MLPClassifier)

For my University project, I was asked to optimise the structure and parameters of ANN using one or more of the following methods:
Random Search
Meta Learning
Adaptive Boosting
Cascade Correlation
Here is the original code to improve:
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
nof_prin_components = 200
pca = PCA(n_components=nof_prin_components, whiten=True).fit(X_train)
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
nohn = 200 # nof hidden neurons
clf = MLPClassifier(hidden_layer_sizes=(nohn,), solver='sgd', activation='tanh',
                    batch_size=256, early_stopping=True).fit(X_train_pca, y_train)
y_pred = clf.predict(X_test_pca)
print(classification_report(y_test, y_pred))
I haven't had any problems implementing Random Search and Grid Search: they make sense to me, they're well documented, and there are lots of examples of how to use them. Here's how I've implemented it:
When it comes to the rest of the methods, I have no idea how to use them. I can't find any useful examples that I can apply to my solution.
The question is: which of the listed methods (other than Random Search) is the easiest to implement and describe in the report? How can I implement it along with my MLPClassifier?
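The asker's random-search code did not survive in the post above. As a rough illustration only (not the asker's implementation), a random search over both the structure (hidden layer sizes, number of PCA components) and the training parameters of the MLP could look like the sketch below. It assumes the digits dataset as placeholder data and a small, arbitrary search space; RandomizedSearchCV and Pipeline are standard scikit-learn tools.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline

# Placeholder data; the original question does not say what X and y are
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

# PCA and the MLP in one pipeline, so PCA is refit inside every CV fold
pipe = Pipeline([
    ("pca", PCA(whiten=True)),
    ("mlp", MLPClassifier(solver="sgd", early_stopping=True,
                          max_iter=100, random_state=0)),
])

# Arbitrary example search space covering structure and parameters
param_distributions = {
    "pca__n_components": [16, 32, 48],
    "mlp__hidden_layer_sizes": [(32,), (64,), (64, 32)],
    "mlp__activation": ["tanh", "relu"],
    "mlp__learning_rate_init": [0.001, 0.01, 0.1],
    "mlp__batch_size": [64, 256],
}

search = RandomizedSearchCV(pipe, param_distributions, n_iter=4, cv=3,
                            random_state=0)
search.fit(X_train, y_train)

test_score = search.score(X_test, y_test)  # accuracy of the refit best model
print(search.best_params_)
print(test_score)
```

Raising n_iter explores more of the search space at the cost of run time.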

How to set 'update_weights' in OpenCV's MLP implementation?

I'm used to using sklearn, whose documentation I find very easy to follow. However, I now need to learn to use OpenCV - in particular, I need to be able to use an MLP classifier, and to update its weights as new training data comes in.
In sklearn, this can be done using the partial_fit method. According to the OpenCV documentation, there is an UPDATE_WEIGHTS flag that can be set, but I can't figure out how to include it in my code.
Here's an MCVE of what I have so far:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
import numpy as np
import cv2
from sklearn.neural_network import MLPClassifier
def softmax(x):
    softmaxes = np.zeros(x.shape)
    for i in range(x.shape[1]):
        softmaxes[:, i] = np.exp(x)[:, i]/np.sum(np.exp(x), axis=1)
    return softmaxes
data = load_breast_cancer()
X = data.data
y = data.target.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1729)
y2 = np.zeros((y_train.shape[0], 2))
y2[:,0] = np.where(y_train==0, 1, 0)
y2[:,1] = np.where(y_train==1, 1, 0)
ann = cv2.ml.ANN_MLP_create()
ann.setLayerSizes(np.array([X.shape[1], y2.shape[1]]))
ann.setActivationFunction(cv2.ml.ANN_MLP_SIGMOID_SYM)
ann.train(np.float32(X_train), cv2.ml.ROW_SAMPLE, np.float32(y2))
mlp = MLPClassifier()
mlp.fit(X_train, y_train)
preds_proba = softmax(ann.predict(np.float32(X_test))[1])
print(roc_auc_score(y_test, preds_proba[:,1]))
print(roc_auc_score(y_test, mlp.predict_proba(X_test)[:,1]))
As the scores of the OpenCV classifier and the sklearn one are comparable, I'm fairly confident it's implemented correctly.
How can I modify this code, so that when a new training sample comes in, I can update the weights based on that sample alone, rather than retraining on the entire train set?
The equivalent in sklearn would be:
mlp.partial_fit(X_new_sample, y_new_sample).
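Figured out the answer: pass the UPDATE_WEIGHTS flag (value 1) to train, with the sample layout given as ROW_SAMPLE (value 0). The syntax is the following:
ann.train(cv2.ml.TrainData_create(np.float32(X), cv2.ml.ROW_SAMPLE, np.float32(y)), flags=cv2.ml.ANN_MLP_UPDATE_WEIGHTS)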
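For comparison, here is a sketch of the sklearn behaviour the question refers to: after an initial fit, partial_fit updates the existing weights from the new samples alone. The split into "old" and "new" data is an assumption made for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_breast_cancer(return_X_y=True)
# Pretend the last 10% of the data arrives later, one sample at a time
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.1,
                                              random_state=1729)

mlp = MLPClassifier(random_state=0)
mlp.fit(X_old, y_old)
weights_before = [w.copy() for w in mlp.coefs_]

# Update the already-trained network on each new sample alone
for x_i, y_i in zip(X_new, y_new):
    mlp.partial_fit(x_i.reshape(1, -1), [y_i])

# The weights were adjusted in place rather than re-initialised
changed = any(not np.allclose(before, after)
              for before, after in zip(weights_before, mlp.coefs_))
print(changed)
```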

Sklearn Classifier

I started my machine learning journey a few months back. Today I was practicing and tried a few different algorithms: Linear Regression, Decision Tree Classifier and Support Vector Machine. My code is very simple and it's working just fine (I guess), but since I'm new, pardon me if this is a silly question. Linear Regression and the Decision Tree Classifier give me an accuracy from 1.04 to 1.22, but SVM gives me 0.72. I'm confused, since I read that SVM is better than Linear Regression in speed and performance. Can you please help me clarify this? :)
Thanks in Advance :)
THIS IS MY CODE:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
dataset = pd.read_csv("/home/jairo/Downloads/diabetes.csv")
dataset.shape
x = dataset.drop(['Outcome'], axis=1)
y = dataset['Outcome']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
classifier = DecisionTreeClassifier()
classifier.fit(x_train, y_train)
predic = classifier.predict(x_test)
score = accuracy_score(y_test, predic.round(), normalize=False)
print("Accuracy : {}".format(score/100))
THIS IS THE LAST OUTPUT THAT I GOT:
Accuracy : 1.15
Classification performance depends heavily on your input data and on what you want to classify; neither method is objectively "better" than the other. To add some insight into your results: an SVM works by trying to find a hyperplane that divides your data into classes. If positive and negative are your two possible outcomes, for example, it tries to find the hyperplane in n-dimensional space such that, as far as possible, the points on each side of it belong to a single class. Here n is the number of features.
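A separate observation about the posted code, which may explain the "accuracies" above 1: accuracy_score(..., normalize=False) returns the number of correctly classified samples, not a fraction, so dividing it by 100 only gives a proper accuracy when the test set has exactly 100 samples. A small sketch with made-up labels shows the difference:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up labels: 6 of the 8 predictions are correct
y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_pred = np.array([0, 1, 0, 0, 1, 1, 1, 1])

n_correct = accuracy_score(y_true, y_pred, normalize=False)  # count of correct predictions
fraction = accuracy_score(y_true, y_pred)                    # fraction between 0 and 1

print(n_correct)                # number of correct predictions
print(fraction)                 # accuracy as a fraction
print(n_correct / len(y_true))  # same fraction, computed from the count
```

Leaving normalize at its default (or dividing the count by len(y_test)) keeps the value between 0 and 1, which makes the scores of the different models comparable.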

Support vector machine overfitting my data

I am trying to make predictions for the iris dataset, and I have decided to use SVMs for this purpose. But it gives me an accuracy of 1.0. Is this a case of overfitting, or is it because the model is very good? Here is my code.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
svm_model = svm.SVC(kernel='linear', C=1,gamma='auto')
svm_model.fit(X_train,y_train)
predictions = svm_model.predict(X_test)
accuracy_score(predictions, y_test)
Here, accuracy_score returns a value of 1. Please help me. I am a beginner in machine learning.
You can try cross validation:
Example:
from sklearn.model_selection import LeaveOneOut
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
#load iris data
iris = datasets.load_iris()
X = iris.data
Y = iris.target
#build the model
svm_model = SVC( kernel ='linear', C = 1, gamma = 'auto',random_state = 0 )
#create the Cross validation object
loo = LeaveOneOut()
#calculate cross validated (leave one out) accuracy score
scores = cross_val_score(svm_model, X,Y, cv = loo, scoring='accuracy')
print( scores.mean() )
Result (the mean accuracy of the 150 folds since we used leave-one-out):
0.97999999999999998
Bottom line:
Cross validation (especially LeaveOneOut) is a good way to detect overfitting and to get a more robust performance estimate.
The iris dataset is not a particularly difficult one to get good results on. However, you are right not to trust a model with 100% classification accuracy. In your example, all 30 test points happen to be classified correctly, but that doesn't mean your model will generalise well to all new data instances. Just try changing test_size to 0.3 and the result is no longer 100% (it goes down to 97.78%).
A good way to check robustness and detect overfitting is cross validation. Here is how to do it easily, starting from your example:
from sklearn import datasets
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
iris = datasets.load_iris()
X = iris.data[:, :4]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
svm_model = svm.SVC(kernel='linear', C=1, gamma='auto')
scores = cross_val_score(svm_model, iris.data, iris.target, cv=10) #10 fold cross validation
Here cross_val_score iteratively uses different parts of the dataset as testing data (cross validation) while keeping all your previous parameters. If you check scores you will see that the 10 accuracies calculated now range from 87.87% to 100%. To report the final model performance you can, for example, use the mean of the scores.
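To look at overfitting more directly, one option is to compare the accuracy on the training folds with the accuracy on the held-out folds: a large gap suggests overfitting, a small one suggests the model generalises. A minimal sketch using cross_validate with return_train_score=True (gamma is omitted here because it has no effect with a linear kernel):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
svm_model = SVC(kernel="linear", C=1)

# 10-fold cross validation, also recording the score on the training folds
results = cross_validate(svm_model, X, y, cv=10, return_train_score=True)

train_mean = results["train_score"].mean()
test_mean = results["test_score"].mean()
print(f"mean train accuracy:    {train_mean:.3f}")
print(f"mean held-out accuracy: {test_mean:.3f}")
print(f"gap:                    {train_mean - test_mean:.3f}")
```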
Hope this helps and good luck! :)
