Multidimensional LSTM - python

I'm having trouble building even one functional machine learning model; the examples I've found all over the web are either off topic or good but incomplete (missing dataset, explanations, ...).
The closest example related to my problem is this.
I'm trying to create a model based on accelerometer and gyroscope sensors, each with its own 3 axes. For example, if I lift the sensor parallel to gravity and then return it to its initial position, I should get a table like this.
Example
Now this whole table corresponds to one movement, which I call "Fade_away", and the duration of this same movement is variable.
I have only two main questions:
In which format do I need to save my dataset? I don't think a plain array can arrange this kind of data.
How can I implement a simple model with at least one hidden layer?
To make it easier, let's say I have 3 outputs: "Fade_away", "Punch" and "Rainbow".
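A minimal sketch of the data-format half of the question, in plain NumPy (the recordings here are randomly generated stand-ins): store each movement as one `(timesteps, 6)` array (3 accelerometer axes + 3 gyroscope axes), then zero-pad the variable-length recordings into a single 3-D tensor of shape `(samples, max_timesteps, 6)` with one-hot labels — the shape that recurrent layers conventionally expect.

```python
import numpy as np

# Hypothetical recordings: each movement is a (timesteps, 6) array
# (3 accelerometer axes + 3 gyroscope axes); durations vary.
rng = np.random.default_rng(0)
movements = [rng.normal(size=(t, 6)) for t in (50, 72, 64)]
labels = ["Fade_away", "Punch", "Rainbow"]

# Map class names to integer ids, then one-hot encode.
classes = sorted(set(labels))
y = np.eye(len(classes))[[classes.index(l) for l in labels]]

# Zero-pad every recording to the longest one -> (samples, max_len, 6).
max_len = max(m.shape[0] for m in movements)
X = np.zeros((len(movements), max_len, 6))
for i, m in enumerate(movements):
    X[i, :m.shape[0], :] = m

print(X.shape)  # (3, 72, 6)
print(y.shape)  # (3, 3)
```

An LSTM with one hidden layer (e.g. Keras `Masking` to skip the padding, an `LSTM` layer, then a 3-unit softmax `Dense` layer) consumes exactly this `(samples, timesteps, features)` tensor.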

Related

AI categorical prediction for time variant data

I'm currently trying to use a sensor to measure a process's consistency. The sensor output varies wildly in its actual reading, but displays features that are statistically different across three categories [dark, appropriate, light], with dark and light being out-of-control items. For example, one output could read approximately 0 V; the process repeats and the sensor then reads 0.6 V. Both the 0 V reading and the 0.6 V reading could represent an in-control process, but there is a consistent difference between sensor readings for out-of-control items and in-control items. An example set for an in-control item can be found here, and an example set for two out-of-control items can be found here.

Because of the wildness of the sensor and the characteristic shapes of each category's data, I think the best way to assess the readings is to process them with an AI model. This is my first foray into creating a model that makes a categorical prediction given a time-series window. I haven't been able to find anything on the internet with my searches (I'm possibly looking for the wrong thing). I'm certain that what I'm attempting is feasible and a strong use case for an AI model; I'm just not certain what the optimal way to build it is.

One idea I had was to treat the data similarly to how an image is treated by an object-detection model, with the readings as the input array and the category as the output, but I'm not certain this is the best way to solve the problem. If anyone can point me in the right direction or give me a resource, I would greatly appreciate it. Thanks for reading my post!
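One lightweight way to frame the idea in the question, sketched with made-up data (the window width, noise levels, and feature choice are all illustrative assumptions, not tuned values): cut each trace into fixed windows, summarize every window with a couple of statistics, and classify against per-category feature centroids before reaching for a full neural model.

```python
import numpy as np

rng = np.random.default_rng(1)

def window_features(signal, width=100):
    """Split a 1-D voltage trace into windows and return (mean, std) per window."""
    n = len(signal) // width
    wins = signal[: n * width].reshape(n, width)
    return np.column_stack([wins.mean(axis=1), wins.std(axis=1)])

# Made-up traces standing in for the three categories.
traces = {
    "dark":        rng.normal(0.0, 0.02, 1000),
    "appropriate": rng.normal(0.3, 0.10, 1000),
    "light":       rng.normal(0.6, 0.02, 1000),
}

# Per-category centroid in feature space (average of the window features).
centroids = {k: window_features(v).mean(axis=0) for k, v in traces.items()}

def classify(window):
    """Assign a new trace to the category with the nearest feature centroid."""
    f = window_features(window).mean(axis=0)
    return min(centroids, key=lambda k: np.linalg.norm(centroids[k] - f))

print(classify(np.full(300, 0.6)))  # "light"
```

If the category shapes are too subtle for summary statistics, the same windowed tensors can feed the image-style model the question proposes (e.g. a 1-D convolutional classifier).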

Compare 2 set of 3D cloud points

I am working on the classification of a 3D point cloud using several Python libraries (whitebox, PCL, PDAL). My goal is to classify the soil. The dataset has been classified by a company, so I am using their classification as ground truth.
For the moment I am able to classify the soil; to do that, I declassified the dataset and redid the classification with PDAL. Now I'm at the stage of comparing the two datasets to evaluate the quality of my classification.
I made a script which takes the XYZ coordinates of the 2 sets, puts them in a list, and compares them one by one. However, the dataset contains around 5 million points, and at the beginning it takes 1 minute per 5 points. After a few minutes everything crashes. Can anyone give me tips? Here is a picture of my clouds: the set on the left is the ground truth and the one on the right is the one classified by me.
Your problem is that you are not using any spatial data structure to speed up your point-proximity queries. There are several ways to mitigate this issue, such as a KD-tree or an octree.
By using such spatial structures, you can discard a large portion of unnecessary distance computations and greatly improve performance.
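A sketch of the KD-tree approach using SciPy's `cKDTree` (the point arrays here are randomly generated stand-ins for the two clouds): build the tree once on the ground-truth cloud, then query all of your points in one vectorized call instead of comparing pair by pair.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

# Stand-ins for the two clouds: ground truth and your classification.
ground_truth = rng.uniform(0, 100, size=(100_000, 3))  # XYZ coordinates
mine = ground_truth + rng.normal(0, 0.01, size=ground_truth.shape)

# Build the tree once (O(n log n)), then query every point at once.
tree = cKDTree(ground_truth)
dist, idx = tree.query(mine, k=1)  # nearest ground-truth point per query point

# idx[i] is the matching ground-truth index for point i;
# compare the class labels at those matched indices to score the classification.
print(dist.mean())  # small, since the two clouds nearly coincide
```

Unlike the quadratic pairwise loop in the question, this should remain tractable even at 5 million points, since each query costs roughly O(log n).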

Machine learning - generate new data from current dataset

I have created a dataset from some sensor measurements and some labels, and did some classification on it with good results. However, since the amount of data in my dataset is relatively small (1400 examples), I want to generate more data based on it. Each row of my dataset consists of 32 numeric values and a label.
What would be the best approach to generate more data based on the existing dataset I have? So far I have looked at Generative Adversarial Networks and Autoencoders, but I don't think these methods are suitable in my case.
Until now I have worked in scikit-learn, but I could use other libraries as well.
The keyword here is Data Augmentation: you take your available data and modify it slightly to generate additional data that is a little different from the source data.
Please take a look at this link. The author uses data augmentation to rotate and flip a cat image, generating 6 additional images with different perspectives from a single source image.
If you transfer this idea to your sensor data, you can add some random noise to your data to enlarge the dataset. You can find a simple example of data augmentation for time series data here.
Another approach is to window the data and move the window in small steps, so the data in each window is slightly different.
The statistics Stack Exchange has a discussion about this; please check it for additional information.
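Both suggestions can be sketched in a few lines of NumPy (the noise scale and window step below are arbitrary illustrative choices, not recommendations): jitter each 32-value row with small Gaussian noise while keeping its label, and slice overlapping fixed-width windows out of a longer recording.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the dataset in the question: 1400 rows of 32 values + labels.
X = rng.normal(size=(1400, 32))
y = rng.integers(0, 3, size=1400)

# 1) Jittering: add small Gaussian noise; the label stays the same.
def jitter(X, sigma=0.05, copies=2, rng=rng):
    return np.concatenate([X + rng.normal(0, sigma, X.shape) for _ in range(copies)])

X_aug = np.concatenate([X, jitter(X)])
y_aug = np.concatenate([y, np.tile(y, 2)])

# 2) Sliding window: cut overlapping length-32 snippets from a longer signal.
def windows(signal, width=32, step=4):
    starts = range(0, len(signal) - width + 1, step)
    return np.stack([signal[s:s + width] for s in starts])

print(X_aug.shape)                    # (4200, 32)
print(windows(np.arange(100)).shape)  # (18, 32)
```

The noise scale should stay small relative to the natural variation of each feature, or the augmented rows may cross class boundaries and hurt rather than help.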

KMeans: Extracting the parameters/rules that fill up the clusters

I have created a 4-cluster k-means customer segmentation in scikit-learn (Python). The idea is that every month, the business gets an overview of the shifts in size of our customers in each cluster.
My question is how to make these clusters 'durable'. If I rerun my script with updated data, the 'boundaries' of the clusters may slightly shift, but I want to keep the old clusters (even though they fit the data slightly worse).
My guess is that there should be a way to extract the parameters that decide which case goes to which cluster, but I haven't found the solution yet.
Got the answer in a different topic:
Just record the cluster means. Then, when new data comes in, compare it to each mean and assign it to the cluster with the closest mean.
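The answer above can be sketched in a few lines of NumPy (the cluster means here are illustrative values): save the means from the original fit — in scikit-learn these live in `kmeans.cluster_centers_` — and assign each new row to the nearest saved mean instead of refitting, so the old boundaries stay fixed.

```python
import numpy as np

# Means recorded from the original k-means fit (illustrative values);
# in scikit-learn these would be kmeans.cluster_centers_.
centers = np.array([
    [0.0, 0.0],
    [5.0, 0.0],
    [0.0, 5.0],
    [5.0, 5.0],
])

def assign(points, centers):
    """Nearest-mean assignment: index of the closest saved center per row."""
    # (n, 1, d) - (k, d) broadcasts to (n, k, d); reduce to distances (n, k).
    d = np.linalg.norm(points[:, None, :] - centers, axis=2)
    return d.argmin(axis=1)

new_data = np.array([[0.2, 0.1], [4.8, 5.3], [5.1, -0.2]])
print(assign(new_data, centers))  # [0 3 1]
```

This is exactly what `KMeans.predict` does in scikit-learn, so an alternative is to pickle the fitted estimator and call `predict` on each month's data without refitting.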

Incomplete feed dictionary for graph consisting of multiple separate parts?

I read somewhere around here that running multiple Tensorflow graphs in a single process is considered bad practice. Therefore, I now have a single graph which consists of multiple separate "sub-graphs" of the same structure. Their purpose is to generate specific models that describe production tolerances of multiple sensors of the same type. The tolerances are different for each sensor.
I'm trying to use TF to optimize a loss function in order to come up with a numerical description (i.e. a tensor) of that production tolerance for each sensor separately.
In order to achieve that and avoid having to deal with multiple graphs (i.e. avoid bad practice), I built a graph that contains a distinct sub-graph for each sensor.
The problem is that I only get data from a single sensor at a time, so I cannot build a feed_dict that fills the placeholders of all sub-graphs with numbers (all zeros wouldn't make sense).
TF now complains about missing values for certain placeholders, namely those of the other sensors for which I don't have data yet. So basically I would like to evaluate one sub-graph without feeding the other sub-graphs.
Is that at all possible and, if yes, what will I have to do in order to hand an incomplete feed_dict to the graph?
If it's not possible to train only parts of a graph, even if they have no connection to other parts, what's the royal road to create models with the same structure but different weights that can get trained separately but don't use multiple graphs?
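The fallback described in the last paragraph can be sketched framework-free (all names and the toy linear model below are made up for illustration): keep one parameter set per sensor in a dict, all with the same structure, and update only the set belonging to the sensor whose data just arrived, leaving the others untouched.

```python
import numpy as np

rng = np.random.default_rng(0)

# One weight vector per sensor: same structure, trained independently.
sensors = ["sensor_a", "sensor_b", "sensor_c"]
weights = {s: np.zeros(3) for s in sensors}

def train_step(sensor, x, y, lr=0.1):
    """One gradient step on a toy linear model, touching only this sensor's weights."""
    w = weights[sensor]
    grad = 2 * x.T @ (x @ w - y) / len(y)  # d/dw of mean squared error
    weights[sensor] = w - lr * grad

# Data arrives for one sensor at a time; the other sensors stay untouched.
true_w = np.array([1.0, -2.0, 0.5])
x = rng.normal(size=(64, 3))
for _ in range(200):
    train_step("sensor_a", x, x @ true_w)

print(np.round(weights["sensor_a"], 2))  # close to [ 1. -2.  0.5]
print(weights["sensor_b"])               # still [0. 0. 0.]
```

As for the feed_dict question itself: in TF1, `session.run` should only require feeds for placeholders that the fetched ops actually depend on, so fetching a single sensor's loss or train op while feeding only that sensor's placeholders is normally enough — the error usually indicates an unintended dependency between the sub-graphs.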
