How to implement/use an Artificial Immune System (AIS) in Python?
I'm new to machine learning algorithms and classification techniques.
I have created a dataset and trained a model with an SVM in Python using the sklearn module.
But now I have to change my approach from SVM to an artificial immune system. My question, then, is: is there a module for AIS in Python that I can use, just like sklearn provides SVM?
If there is none, where can I find an example or help on implementing one?
Below is my SVM code, in case anyone needs it.
# In the name of GOD
# SeyyedMahdi Hassanpour
# SeyyedMahdihp#gmail.com
# SeyyedMahdihp # github
import numpy as np
from sklearn import svm, model_selection
import pandas as pd
df = pd.read_csv('final_dataset123456.csv')
x = np.array(df.drop(['label'], axis=1))  # pass axis by keyword; the positional form was removed in pandas 2.0
y = np.array(df['label'])
x_train, x_test, y_train, y_test = model_selection.train_test_split(x, y, test_size=0.36, random_state=39)
clf = svm.SVC()
clf.fit(x_train, y_train)
accuracy = clf.score(x_test, y_test)
print(accuracy)
ar = [0,0,0,0,0,0,0,0,0,0,0,0,0.2,0.058824,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.25,0,0,0,0,0.020833,0.2,0.090909,0,0.032258,0,0,0,0,0,0.0625,0,0,0,0.058333,0,0,0.1,0,0.125,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
br = [0.5,1,1,0.254902,0.853933,1,1,0.254902,1,0.27451,0.2,1,0.4,0.176471,1,1,1,1,0.625,1,0.125,1,0.393939,0.857143,0.052632,1,0.75,0.847826,1,1,0.583333,0.7,1,1,1,0.729167,0.6,0.818182,1,0.193548,0.333333,1,0.674419,1,1,1,0.8,1,1,0.2,0.37037,1,0.8,0.529412,0.375,1,1,0.23913,1,1,1,1,0.666667,1,1,1,1,0,1,0,1,0.23913,0.7,0.7,1,1,1,1,1,1,1,1,0.23913,1,1,1,1,1,1,1,1,1,0.666667,1,0.7,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1]
example_measures = np.array([ar,br])
example_measures = example_measures.reshape(len(example_measures), -1)
prediction = clf.predict(example_measures)
print(prediction)
There are many artificial immune system applications implemented on the WEKA platform. You can download and use them from SourceForge.
Here is the link:
https://sourceforge.net/directory/?q=artificial+immune+system
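If a pure-Python starting point is useful, below is a minimal sketch of one AIS family, clonal selection, written from scratch with NumPy. It is a simplified, AIRS-inspired toy, not a port of any library or of the WEKA implementations; the class name ToyClonalSelectionClassifier and its parameters (n_clones, mutation_rate, merge_radius, k, n_epochs) are hypothetical. It uses the iris dataset as a stand-in for final_dataset123456.csv and mimics the sklearn fit/predict/score interface so it could take the place of svm.SVC() in the code above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split


class ToyClonalSelectionClassifier:
    """Hypothetical, simplified clonal-selection classifier (AIRS-inspired toy).

    Memory cells are labelled feature vectors. Each antigen (training sample)
    stimulates the closest memory cell of its own class; that cell is cloned,
    the clones are mutated (more mutation when the match is poor), and the best
    clone is kept if it matches the antigen better than the original cell.
    Prediction is a majority vote among the k nearest memory cells.
    """

    def __init__(self, n_clones=10, mutation_rate=0.1, merge_radius=0.05,
                 k=3, n_epochs=3, random_state=None):
        self.n_clones = n_clones
        self.mutation_rate = mutation_rate
        self.merge_radius = merge_radius
        self.k = k
        self.n_epochs = n_epochs
        self.random_state = random_state

    def fit(self, X, y):
        rng = np.random.default_rng(self.random_state)
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Seed the memory with one random training sample per class.
        seeds = [rng.choice(np.flatnonzero(y == c)) for c in self.classes_]
        cells = [X[i].copy() for i in seeds]
        labels = [y[i] for i in seeds]
        for _ in range(self.n_epochs):
            for i in rng.permutation(len(X)):
                antigen, label = X[i], y[i]
                same = [j for j, lab in enumerate(labels) if lab == label]
                dists = [np.linalg.norm(cells[j] - antigen) for j in same]
                best = same[int(np.argmin(dists))]
                best_dist = min(dists)
                # Clonal expansion + hypermutation: poorer matches mutate more.
                sigma = self.mutation_rate * (1.0 + best_dist)
                clones = cells[best] + rng.normal(0.0, sigma, size=(self.n_clones, X.shape[1]))
                clone_dists = np.linalg.norm(clones - antigen, axis=1)
                winner = clones[np.argmin(clone_dists)]
                if clone_dists.min() < best_dist:
                    if np.linalg.norm(winner - cells[best]) < self.merge_radius:
                        cells[best] = winner      # refine the existing memory cell
                    else:
                        cells.append(winner)      # add a new memory cell
                        labels.append(label)
        self.cells_ = np.array(cells)
        self.labels_ = np.array(labels)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        preds = []
        for x in X:
            d = np.linalg.norm(self.cells_ - x, axis=1)
            nearest = self.labels_[np.argsort(d)[: self.k]]
            values, counts = np.unique(nearest, return_counts=True)
            preds.append(values[np.argmax(counts)])
        return np.array(preds)

    def score(self, X, y):
        return float(np.mean(self.predict(X) == np.asarray(y)))


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.36, random_state=39)
clf = ToyClonalSelectionClassifier(random_state=0).fit(X_train, y_train)
print("memory cells:", len(clf.cells_))
print("accuracy:", clf.score(X_test, y_test))
```

With the question's data, the x_train/y_train arrays from the existing split could be passed to fit in the same way. The accuracy of a toy like this depends on the seed and parameters, so it should be compared against the SVM with cross-validation rather than a single split.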
Related
How to output feature names with XGBoost feature selection
My model uses XGBoost feature importance for feature selection. At the end, it outputs all the confusion matrices/results and how many features each model includes. That now works, but I also need to output the names of the features used in each model. I get a warning that says "X has feature names, but SelectFromModel was fitted without feature names", so I know something needs to be added to keep the names in the model before I can output them, but I'm not sure how to handle either of those steps. I found several old questions about this, but I wasn't able to apply any of them to my particular code. I'd really appreciate any ideas. Thank you!
from numpy import sort
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.feature_selection import SelectFromModel
# columns that are not used as features
drop_cols = ['IsDeceased','IsTotal','Deceased','Sick','Injured','Displaced','Homeless','MissingPeople','Other','Total']
# split data into X and y
X_train = df_train[df_train.columns.difference(drop_cols)]
y_train = df_train['IsDeceased'].values
X_test = df_test[df_test.columns.difference(drop_cols)]
y_test = df_test['IsDeceased'].values
# fit model on all training data
model = XGBClassifier()
model.fit(X_train, y_train)
# make predictions for test data and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
# fit model using each importance as a threshold
thresholds = sort(model.feature_importances_)
for thresh in thresholds:
    # select features using threshold
    selection = SelectFromModel(model, threshold=thresh, prefit=True)
    select_X_train = selection.transform(X_train)
    # train model
    selection_model = XGBClassifier()
    selection_model.fit(select_X_train, y_train)
    print(thresh)
    # eval model
    select_X_test = selection.transform(X_test)
    y_pred = selection_model.predict(select_X_test)
    report = classification_report(y_test, y_pred)
    print("Thresh= {} , n= {}\n {}".format(thresh, select_X_train.shape[1], report))
    cm = confusion_matrix(y_test, y_pred)
    print(cm)
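One way to recover the names is SelectFromModel.get_support(), which returns a boolean mask over the original columns; indexing the DataFrame's columns with it gives the selected feature names. The sketch below assumes a DataFrame input and, so that it runs without xgboost installed, swaps in sklearn's RandomForestClassifier and the breast-cancer dataset as stand-ins (both expose feature_importances_ like XGBClassifier does). The results list is a hypothetical name.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Stand-in data: a DataFrame with named columns (as_frame=True).
data = load_breast_cancer(as_frame=True)
X_train, y_train = data.data, data.target

# Stand-in model: any fitted estimator exposing feature_importances_.
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

results = []  # (threshold, selected column names)
for thresh in np.sort(model.feature_importances_)[-5:]:  # only the 5 largest thresholds, for brevity
    selection = SelectFromModel(model, threshold=thresh, prefit=True)
    mask = selection.get_support()          # boolean mask over X_train.columns
    names = list(X_train.columns[mask])     # map the mask back to column names
    results.append((float(thresh), names))
    print("Thresh={:.4f}, n={}, features={}".format(thresh, len(names), names))
```

In the question's loop, the same two lines (mask and names) can sit right after the SelectFromModel call, with X_train being the feature DataFrame.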
How to optimise the structure and parameters of an ANN (MLPClassifier)
For my university project, I was asked to optimise the structure and parameters of an ANN using one or more of the following methods: Random Search, Meta Learning, Adaptive Boosting, Cascade Correlation. Here is the original code to improve:
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
nof_prin_components = 200
pca = PCA(n_components=nof_prin_components, whiten=True).fit(X_train)
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
nohn = 200  # nof hidden neurons
clf = MLPClassifier(hidden_layer_sizes=(nohn,), solver='sgd', activation='tanh', batch_size=256, early_stopping=True).fit(X_train_pca, y_train)
y_pred = clf.predict(X_test_pca)
print(classification_report(y_test, y_pred))
I haven't had any problems implementing Random Search and Grid Search; they make sense to me, they're well documented and there are lots of examples of how to use them. Here's how I've implemented it:
When it comes to the rest of the methods, I have no idea how to use them. I can't find any useful examples that I can apply to my solution. The question is: which of the listed methods (except Random Search) is the easiest to implement and describe in the report? How can I implement it along with my MLPClassifier?
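As a reference point for the random-search part, one possible way to search the PCA size and the MLP structure together is to put both in a Pipeline and hand it to RandomizedSearchCV. This is a sketch, not the asker's implementation: the digits dataset stands in for the unknown X and y, and the candidate values in param_distributions (and max_iter=100) are arbitrary assumptions chosen to keep it fast.

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline

# Stand-in data (assumption): the question's X and y are not shown.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# PCA and MLP in one pipeline, so PCA is refitted inside every CV fold.
pipe = Pipeline([
    ("pca", PCA(whiten=True)),
    ("mlp", MLPClassifier(solver="sgd", batch_size=256, early_stopping=True,
                          max_iter=100, random_state=0)),
])

# Structure (layers/neurons) and parameters are searched together.
param_distributions = {
    "pca__n_components": [20, 30, 40],
    "mlp__hidden_layer_sizes": [(50,), (100,), (100, 50)],
    "mlp__activation": ["tanh", "relu"],
    "mlp__learning_rate_init": [0.001, 0.01, 0.1],
}

search = RandomizedSearchCV(pipe, param_distributions, n_iter=5, cv=3, random_state=0)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    search.fit(X_train, y_train)

print(search.best_params_)
print("held-out accuracy:", search.score(X_test, y_test))
```

Keeping PCA inside the pipeline avoids fitting it on data that the cross-validation later treats as validation folds.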
How to set 'update_weights' in OpenCV's MLP implementation?
I'm used to using sklearn, whose documentation I find very easy to follow. However, I now need to learn to use OpenCV; in particular, I need to be able to use an MLP classifier and to update its weights as new training data comes in. In sklearn, this can be done using the partial_fit method. According to the OpenCV documentation, there is an UPDATE_WEIGHTS flag that can be set, but I can't figure out how to include it in my code. Here's an MCVE of what I have so far:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
import numpy as np
import cv2
from sklearn.neural_network import MLPClassifier
def softmax(x):
    softmaxes = np.zeros(x.shape)
    for i in range(x.shape[1]):
        softmaxes[:, i] = np.exp(x)[:, i]/np.sum(np.exp(x), axis=1)
    return softmaxes
data = load_breast_cancer()
X = data.data
y = data.target.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1729)
# one-hot targets for OpenCV; ravel() turns the (n, 1) column into shape (n,)
y2 = np.zeros((y_train.shape[0], 2))
y2[:, 0] = np.where(y_train.ravel() == 0, 1, 0)
y2[:, 1] = np.where(y_train.ravel() == 1, 1, 0)
ann = cv2.ml.ANN_MLP_create()
ann.setLayerSizes(np.array([X.shape[1], y2.shape[1]]))
ann.setActivationFunction(cv2.ml.ANN_MLP_SIGMOID_SYM)
ann.train(np.float32(X_train), cv2.ml.ROW_SAMPLE, np.float32(y2))
mlp = MLPClassifier()
mlp.fit(X_train, y_train.ravel())
preds_proba = softmax(ann.predict(np.float32(X_test))[1])
print(roc_auc_score(y_test, preds_proba[:, 1]))
print(roc_auc_score(y_test, mlp.predict_proba(X_test)[:, 1]))
As the scores of the OpenCV classifier and the sklearn one are comparable, I'm fairly confident it's implemented correctly. How can I modify this code so that, when a new training sample comes in, I can update the weights based on that sample alone, rather than retraining on the entire training set? The equivalent in sklearn would be: mlp.partial_fit(X_new_sample, y_new_sample).
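Figured out the answer. The syntax is the following:
ann.train(cv2.ml.TrainData_create(np.float32(X), 0, np.float32(y)), flags=1)
Here 0 is the value of cv2.ml.ROW_SAMPLE and 1 is the value of cv2.ml.ANN_MLP_UPDATE_WEIGHTS, so the named constants can be used instead of the bare numbers.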
Sklearn Classifier
I started my journey in machine learning a few months ago. Today I was practicing my skills and tried a few different algorithms: Linear Regression, Decision Tree Classifier and Support Vector Machine. My code is very simple and it's working just fine (I guess). Since I'm new, pardon me if this is a silly question: Linear Regression and Decision Tree Classifier give me an accuracy from 1.04 to 1.22, but SVM gives me 0.72. I'm confused, since I read that SVM is better than Linear Regression in speed and performance. Can you please help me clarify this? Thanks in advance! :)
This is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
dataset = pd.read_csv("/home/jairo/Downloads/diabetes.csv")
dataset.shape
x = dataset.drop(['Outcome'], axis=1)
y = dataset['Outcome']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
classifier = DecisionTreeClassifier()
classifier.fit(x_train, y_train)
predic = classifier.predict(x_test)
score = accuracy_score(y_test, predic.round(), normalize=False)
print("Accuracy : {}".format(score/100))
This is the last output that I got:
Accuracy : 1.15
Classification performance is highly dependent on your type of input and what you want to classify; one algorithm isn't objectively "better" than another. To add some insight into your results: an SVM works by trying to find a hyperplane that divides your data into classes. If you had positive and negative as your two possible outcomes, for example, it would try to find the hyperplane that divides your points in n-dimensional space such that the points on each side of the hyperplane belong to the same class. Here n is the number of features.
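For a linear kernel, the hyperplane can be inspected directly: sklearn's SVC exposes its normal vector as coef_ and its offset as intercept_, and the sign of w · x + b decides the predicted class. The sketch below uses synthetic two-feature data from make_blobs as a stand-in for the diabetes CSV. It also prints both forms of accuracy_score, because with normalize=False the function returns the number of correctly classified samples rather than a fraction, which is why dividing that count by 100 in the question can give values above 1.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic 2-feature, 2-class data (assumption: stands in for a real dataset).
X, y = make_blobs(n_samples=200, centers=2, n_features=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X_train, y_train)

# For a linear kernel the separating hyperplane is w . x + b = 0.
w, b = clf.coef_[0], clf.intercept_[0]
scores = X_test @ w + b          # positive on one side of the hyperplane, negative on the other
pred = clf.predict(X_test)

# Points with a positive score are assigned to clf.classes_[1].
agrees = np.array_equal(pred, np.where(scores > 0, clf.classes_[1], clf.classes_[0]))
print("w =", w, "b =", b, "sign rule matches predict():", agrees)

# Default: fraction of correct predictions. normalize=False: number of correct predictions.
frac = accuracy_score(y_test, pred)
count = accuracy_score(y_test, pred, normalize=False)
print("fraction correct:", frac, "number correct:", count, "of", len(y_test))
```

Comparing classifiers on the default (normalized) accuracy keeps every score between 0 and 1.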
Support vector machine overfitting my data
I am trying to make predictions for the iris dataset and have decided to use SVMs for this purpose. But it gives me an accuracy of 1.0. Is this a case of overfitting, or is it because the model is very good? Here is my code:
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
iris = datasets.load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
svm_model = svm.SVC(kernel='linear', C=1, gamma='auto')
svm_model.fit(X_train, y_train)
predictions = svm_model.predict(X_test)
accuracy_score(y_test, predictions)
Here, accuracy_score returns a value of 1. Please help me; I am a beginner in machine learning.
You can try cross-validation. Example:
from sklearn.model_selection import LeaveOneOut
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
# load iris data
iris = datasets.load_iris()
X = iris.data
Y = iris.target
# build the model
svm_model = SVC(kernel='linear', C=1, gamma='auto', random_state=0)
# create the cross-validation object
loo = LeaveOneOut()
# calculate the cross-validated (leave-one-out) accuracy score
scores = cross_val_score(svm_model, X, Y, cv=loo, scoring='accuracy')
print(scores.mean())
Result (the mean accuracy over the 150 folds, since we used leave-one-out):
0.97999999999999998
Bottom line: cross-validation (especially leave-one-out) is a good way to detect overfitting and to get a robust estimate of performance.
The iris dataset is not a particularly difficult one on which to get good results. However, you are right not to trust a model with 100% classification accuracy. In your example, the problem is that the 30 test points all happen to be correctly classified, but that doesn't mean your model is able to generalise well to all new data instances. Just try changing test_size to 0.3 and the result is no longer 100% (it goes down to 97.78%). The best way to get a robust estimate and to detect overfitting is cross-validation. Here is an example of how to do this easily, starting from your code:
from sklearn import datasets
from sklearn import svm
from sklearn.model_selection import cross_val_score
iris = datasets.load_iris()
X = iris.data[:, :4]
y = iris.target
svm_model = svm.SVC(kernel='linear', C=1, gamma='auto')
scores = cross_val_score(svm_model, X, y, cv=10)  # 10-fold cross-validation
Here cross_val_score iteratively uses different parts of the dataset as testing data (cross-validation) while keeping all your previous parameters. If you check scores, you will see that the 10 accuracies calculated now range from 87.87% to 100%. To report the final model performance, you can, for example, use the mean of the scores. Hope this helps and good luck! :)
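To see why a single 100% result says little, the same 80/20 evaluation can simply be repeated with different random splits; the spread of test accuracies shows how much one split depends on which 30 points happen to land in the test set. This is a small illustrative sketch: the 20 seeds are an arbitrary choice, and gamma is omitted because it has no effect with a linear kernel.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Repeat the same 80/20 train/test evaluation with 20 different random splits.
accuracies = []
for seed in range(20):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=seed)
    model = svm.SVC(kernel="linear", C=1).fit(X_train, y_train)
    accuracies.append(model.score(X_test, y_test))

accuracies = np.array(accuracies)
print("min %.3f, max %.3f, mean %.3f" % (accuracies.min(), accuracies.max(), accuracies.mean()))
```

If the minimum and maximum differ noticeably, a single split is too noisy to report on its own, which is what the cross-validated mean addresses.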