How to convert time series data into images? - Python

I have a dataset with 12,000+ data points and 25 features, of which the last feature is the class label. This is a classification problem. Now I want to convert every data point into an image, but I have no idea how to do that. Please help. I work in Python. If anyone could provide sample code I would be grateful. Thanks in advance.

There is already some work on this: you can use either Gramian Angular Fields (GAF) or Markov Transition Fields (MTF); a good description is in "Imaging Time-Series to Improve Classification and Imputation". Other work has used recurrence plots, e.g. "Deep-Gap: a deep learning framework". Imaging time series is an interesting way to think about them, since it lets you easily apply e.g. CNNs. But which method would you like to use? BTW, be aware this might not be an "efficient" way to classify time series :)
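To make the GAF idea concrete, here is a minimal NumPy sketch of the Gramian Angular Summation Field (the function name and the toy series are made up for the example): rescale the series to [-1, 1], take the polar angle phi = arccos(x), and build the matrix cos(phi_i + phi_j). Each row of your dataset (the 24 features, excluding the label) would become one small image.

```python
import numpy as np

def gasf(series):
    """Gramian Angular Summation Field of a 1-D series."""
    x = np.asarray(series, dtype=float)
    # Rescale to [-1, 1] so arccos is defined
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    # GASF[i, j] = cos(phi_i + phi_j)
    return np.cos(phi[:, None] + phi[None, :])

img = gasf([0.0, 0.5, 1.0, 0.5, 0.0])  # a 5x5 "image" you could feed to a CNN
```

Libraries such as pyts also ship ready-made `GramianAngularField` and `MarkovTransitionField` transformers if you prefer not to roll your own.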

Related

Support Vector Machine to classify whole DataFrames in python

I would like to create a Support Vector Machine to classify whole DataFrames, so in each cell there would be one DataFrame with a set of data.
I am working with Python.
Do you know if this is possible in any way?
I have not been able to find any examples.
Thank you in advance!

Lib for creating ROC-Curve and DET-Curve by giving it the match scores and non-match-scores?

I'm comparing some open-source face-recognition frameworks running with Python (dlib), and for that I wanted to create ROC and DET curves. For creating match scores I'm using the CASIA FaceV5 dataset. Everything is for educational purposes only.
My question is:
What's the best way to generate these kinds of curves? (Any good libs for that?)
I found this via Google (scikit-learn), but I still don't know how I should use it for face recognition.
I mean, which information do I have to pass? I know that ROC uses the true match rate and the false match rate, but from a developer's point of view I just don't know how to feed this information to that scikit-learn function.
My test:
I'm creating genuine match scores for every person in the CASIA dataset. For that I use different pictures of the same person. I save these scores in the array "genuineScores".
Example:
Person1_Picture1.jpg compared with Person1_Picture2.jpg
Person2_Picture1.jpg compared with Person2_Picture2.jpg etc.
I'm also creating impostor match scores. For this I use two pictures of different persons. I save these scores in the array "impostorScores".
Example:
Person1_Picture1.jpg compared with Person2_Picture1.jpg
Person2_Picture1.jpg compared with Person3_Picture1.jpg etc.
Now I'm just looking for a lib where I can pass the two arrays and it creates a ROC curve for me.
Or is there another method for doing so?
I appreciate any kind of help. Thank you.
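For reference, scikit-learn's `roc_curve` can consume exactly these two arrays once labels are attached: genuine comparisons get label 1, impostor comparisons label 0, and the scores become one concatenated vector. A minimal sketch (the score values are made up for the example):

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical similarity scores: higher means "more likely the same person"
genuineScores = np.array([0.9, 0.85, 0.8, 0.7])
impostorScores = np.array([0.3, 0.4, 0.2, 0.5])

# Genuine pairs are the positive class (1), impostor pairs the negative (0)
y_true = np.concatenate([np.ones_like(genuineScores),
                         np.zeros_like(impostorScores)])
y_score = np.concatenate([genuineScores, impostorScores])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
# fpr/tpr can be plotted directly, e.g. with matplotlib's plt.plot(fpr, tpr)
```

A DET curve is the same data with miss rate (1 - tpr) against fpr, usually on normal-deviate axes.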

Machine learning - generate new data from current dataset

I have created a dataset from some sensor measurements and some labels and did some classification on it with good results. However, since the amount of data in my dataset is relatively small (1400 examples), I want to generate more data based on it. Each row of my dataset consists of 32 numeric values and a label.
Which would be the best approach to generate more data based on the existing dataset I have? So far I have looked at Generative Adversarial Networks and Autoencoders, but I don't think these methods are suitable in my case.
Until now I have worked in Scikit-learn, but I could use other libraries as well.
The keyword here is Data Augmentation. You use your available data and modify it slightly to generate additional data that is a little bit different from your source data.
Please take a look at this link. The author uses data augmentation to rotate and flip a cat image, generating 6 additional images with different perspectives from a single source image.
If you transfer this idea to your sensor data, you can add some kind of random noise to your data to increase the dataset. You can find a simple example of data augmentation for time series data here.
Another approach is to window the data and move the window a small step, so the data in the window is a little bit different.
The folks on the statistics Stack Exchange have also written something about this; please check it for additional information.
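The noise idea can be sketched in a few lines of NumPy. This is only an illustration: the random matrix stands in for the real 1400x32 dataset, and the noise level `sigma` would need tuning to the scale of the actual sensor readings.

```python
import numpy as np

rng = np.random.default_rng(0)

def jitter(data, sigma=0.01):
    # Add small Gaussian noise; works on a single row or the whole matrix
    return data + rng.normal(0.0, sigma, size=data.shape)

X = rng.random((1400, 32))        # placeholder for the real sensor dataset
y = rng.integers(0, 2, 1400)      # placeholder labels

# Noisy copies keep the same labels, doubling the dataset
X_aug = np.vstack([X, jitter(X)])
y_aug = np.concatenate([y, y])
```

The same pattern works for the windowing approach: slide a window over each series and treat each shifted window as a new example with the original label.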

Converting images of alphabet to feature vector

I want to write Python code for Persian letter recognition. I have a dataset of the Farsi alphabet with 15 instances of each class, and there are 19 classes.
Actually, I don't have much experience in Python. I roughly know what the steps are theoretically, but I don't know how to code them.
First I want to convert the images to feature vectors, but I don't know how to do this. I've searched a lot but couldn't find anything useful.
Any help would be highly appreciated.
As you don't have enough data to train a deep convolutional network, I suggest you take a look at this Python/OpenCV tutorial on a dataset very similar to yours (MNIST): https://www.learnopencv.com/handwritten-digits-classification-an-opencv-c-python-tutorial/
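The simplest feature vector is just the resized, normalized raw pixels. The sketch below (the function name is made up, and the zero array stands in for a real scanned letter) shows the idea with plain NumPy; the linked tutorial then goes further and uses HOG descriptors, which are usually a stronger choice for handwriting.

```python
import numpy as np

def image_to_feature_vector(img, size=(20, 20)):
    """Crude nearest-neighbour resize of a 2-D grayscale array, then flatten."""
    rows = np.linspace(0, img.shape[0] - 1, size[0]).astype(int)
    cols = np.linspace(0, img.shape[1] - 1, size[1]).astype(int)
    resized = img[np.ix_(rows, cols)]
    return (resized / 255.0).ravel()   # 400-dimensional feature vector

letter = np.zeros((64, 48), dtype=np.uint8)  # stand-in for a real scan
vec = image_to_feature_vector(letter)
```

With 15 x 19 = 285 such vectors you can train any scikit-learn classifier (e.g. SVM or k-NN) directly.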

Preprocess large datafile with categorical and continuous features

First, thanks for reading this, and thanks a lot if you can give any clue to help me solve it.
As I'm new to Scikit-learn, don't hesitate to provide any advice that can help me improve the process and make it more professional.
My goal is to classify data between two categories. I would like to find a solution that gives me the most precise result. At the moment, I'm still looking for the most suitable algorithm and data preprocessing.
In my data I have 24 values: 13 are nominal, 6 are binarized and the others are continuous. Here is an example of a line:
"RENAULT";"CLIO III";"CLIO III (2005-2010)";"Diesel";2010;"HOM";"_AAA";"_BBB";"_CC";0;668.77;3;"Fevrier";"_DDD";0;0;0;1;0;0;0;0;0;0;247.97
I have around 900K lines for learning and I run my tests over 100K lines.
As I want to compare several algorithm implementations, I wanted to encode all the nominal values so they can be used in several classifiers.
I tried several things:
LabelEncoder : this was quite good, but it gives me ordered values that would be misinterpreted by the classifier.
OneHotEncoder : if I understand it well, it is quite perfect for my needs because I can select the columns to binarize. But as I have a lot of nominal values, it always runs into a MemoryError. Moreover, its input must be numerical, so it is compulsory to LabelEncode everything first.
StandardScaler : this is quite useful, but not for what I need. I decided to integrate it to scale my continuous values.
FeatureHasher : at first I didn't understand what it does; then I saw that it is mainly used for text analysis. I tried to use it for my problem by creating a new array containing the result of the transformation, but I think it was not built to work that way and it was not even logical.
DictVectorizer : could be useful, but it looks like OneHotEncoder and puts even more data in memory.
partial_fit : this method is provided by only 5 classifiers. I would like to be able to use it with Perceptron, KNearest and RandomForest at least, so it doesn't match my needs.
I looked in the documentation and found this information on the Preprocessing and Feature Extraction pages.
I would like a way to encode all the nominal values so that they will not be considered ordered, one that can be applied to large datasets with a lot of categories and weak resources.
Is there any way I haven't explored that could fit my needs?
Thanks for any clue and piece of advice.
To convert unordered categorical features you can try get_dummies in pandas; more details can be found in its documentation. Another way is to use CatBoost, which can handle categorical features directly without transforming them into a numerical type.
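A minimal sketch of the get_dummies route (the column names and values are made up, loosely following the RENAULT example line above): pass only the nominal columns, and numeric columns are left untouched.

```python
import pandas as pd

# Toy frame standing in for the real 900K-line dataset
df = pd.DataFrame({
    "brand": ["RENAULT", "PEUGEOT", "RENAULT"],
    "fuel":  ["Diesel", "Essence", "Diesel"],
    "price": [668.77, 540.00, 247.97],
})

# One-hot encode only the nominal columns; "price" passes through unchanged
encoded = pd.get_dummies(df, columns=["brand", "fuel"])
```

For the memory concern, `pd.get_dummies(..., sparse=True)` produces sparse columns, which helps when the nominal features have many categories.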
