I have two models trained with TensorFlow in Python and exported to binary files named export1.meta and export2.meta. Each file generates a single output when fed an input, say output1 and output2.
My question is whether it is possible to merge the two graphs into one big graph so that it generates output1 and output2 together in one execution.
Any comment will be helpful. Thanks in advance!
I kicked this around with my local TF expert, and the brief answer is "no"; TF doesn't have a built-in facility for this. However, you could write custom endpoint layers (input and output) with synch operations from Python's process management, so that they'd maintain parallel processing of each input, and concatenate the outputs.
Rationale
I like the way this could be used to get greater accuracy with multiple features, where the features have little or no correlation. For instance, you could train two character recognition models: one to identify the digit, the other to discriminate between left- and right-handed writers.
This would also allow you to examine the internal kernels that evolved for each individual feature, without interdependence with other features: the double-loop of an '8' vs the general slant of right-handed writing.
I also expect that the models for individual features will converge measurably faster than one over-arching training session.
Finally, it's quite possible that the individual models could be used in mix-and-match feature sets. For instance, you could train another model to differentiate letters, while your previously trained left/right flagger would still have a pretty good guess at the writer's handedness.
Related
Recurrent Neural Networks (RNNs) with an attention mechanism are generally used for machine translation and natural language processing. In Python, implementations of RNNs with attention are abundant for machine translation (e.g. https://talbaumel.github.io/blog/attention/); however, what I would like to do is use an RNN with attention on a temporal data file (not textual or sentence-based data).
I have a CSV file of dimensions 21392 x 1972, which I have loaded into a DataFrame using Pandas. The first column is in datetime format and the last column consists of target classes like "Class1", "Class2", "Class3" etc., which I would like to identify. So in total, there are 21392 rows (instances of data in 10-minute time steps) and 1971 features. The last (1972nd) column is the label column, with 14 different classes in total.
I have looked into available implementation documentation on Keras (https://medium.com/datalogue/attention-in-keras-1892773a4f22) as well as on Tensorflow (Visualizing attention activation in Tensorflow), but none of them seem to be doing what I want to accomplish. I understand that this is an unusual approach, but would want to try this and use the attention mechanism because many of my features are presumably redundant in the data.
import pandas as pd
mydataset = pd.read_csv('final_merged_data.csv')
It is evident from the existing literature that an attention mechanism works quite well when coupled with an RNN. I am unable to locate any implementation of an RNN with attention that also provides a visualisation. I also do not understand how to convert my data into sequences (a list of lists) so that I can one-hot encode the labels and feed everything to an RNN with attention. I am new to Python as well as Keras/TensorFlow, and am quite confused about how to convert or typecast my data into a form that mimics a sequence classification problem. My problem is basically multi-class classification, like one would normally solve with machine learning classifiers that predict labels, but using an RNN with attention. Any help in this regard would be highly appreciated. Cheers!
Kindly refer to this paper for using Sequence to Sequence Model with attention for time series classification.
https://www.computer.org/csdl/proceedings/icdmw/2016/5910/00/07836709.pdf
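On the narrower question of turning a table of time steps into sequences, one common approach is a sliding window. The following is only a minimal sketch, not the method from the paper above: the helper names make_windows and one_hot, the window length of 3, and the choice to label each window with the class of its last row are all assumptions made for illustration. The result has the layout (samples, time steps, features), which is the input layout Keras recurrent layers expect.

```python
from typing import List, Sequence, Tuple


def make_windows(
    features: Sequence[Sequence[float]],
    labels: Sequence[str],
    window: int,
) -> Tuple[List[List[List[float]]], List[str]]:
    """Cut a time-ordered table into overlapping windows of `window` rows.

    Assumption: each window is labelled with the class of its last row.
    """
    if window < 1 or window > len(features):
        raise ValueError("window must be between 1 and the number of rows")
    sequences, targets = [], []
    for end in range(window, len(features) + 1):
        sequences.append([list(row) for row in features[end - window:end]])
        targets.append(labels[end - 1])
    return sequences, targets


def one_hot(labels: Sequence[str]) -> Tuple[List[List[int]], List[str]]:
    """Encode class names as one-hot vectors; classes are sorted alphabetically."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    encoded = [
        [1 if index[label] == i else 0 for i in range(len(classes))]
        for label in labels
    ]
    return encoded, classes


# Toy data: 5 time steps with 2 features each (the real table is far larger).
rows = [[0.1, 1.0], [0.2, 1.1], [0.3, 1.2], [0.4, 1.3], [0.5, 1.4]]
row_labels = ["Class1", "Class1", "Class2", "Class2", "Class3"]

X, y = make_windows(rows, row_labels, window=3)
y_encoded, class_names = one_hot(y)
print(len(X), len(X[0]), len(X[0][0]))  # windows, time steps, features
```

With the DataFrame from the question, the feature rows could presumably be taken from the columns between the datetime column and the label column, for example with mydataset.iloc[:, 1:-1].values.tolist(), and the labels from the last column. A sensible window length depends on the data and has to be chosen by experiment.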
I am trying to apply the word2vec model implemented in the gensim 3.6 library, using Python 3.7 on a Windows 10 machine. After preprocessing, I have a list of sentences (each sentence is a list of words) as input to the model.
I have computed the results (obtaining 10 most similar words of a given input word using model.wv.most_similar) in Anaconda's Spyder followed by Sublime Text editor.
But, I am getting different results for the same source code executed in two editors.
Which result should I choose, and why?
I am attaching screenshots of the results obtained by running the same code in both Spyder and Sublime Text. The input word for which I need the 10 most similar words is #universe#
I am really confused how to choose the results, on what basis? Also, I have started learning Word2Vec recently.
Any suggestion is appreciated.
Results Obtained in Spyder:
Results Obtained using Sublime Text:
The Word2Vec algorithm makes use of randomization internally. Further, when (as is usual for efficiency) training is spread over multiple threads, some additional order-of-presentation randomization is introduced. These mean that two runs, even in the exact same environment, can have different results.
If the training is effective – sufficient data, appropriate parameters, enough training passes – all such models should be of similar quality when doing things like word-similarity, even though the actual words will be in different places. There'll be some jitter in the relative rankings of words, but the results should be broadly similar.
That your results are vaguely related to 'universe' but not impressively so, and that they vary so much from one run to another, suggest there may be problems with your data, parameters, or quantity of training. (We'd expect the results to vary a little, but not that much.)
How much data do you have? (Word2Vec benefits from lots of varied word-usage examples.)
Are you retaining rare words, by making min_count lower than its default of 5? (Such words tend not to get good vectors, and also wind up interfering with the improvement of nearby words' vectors.)
Are you trying to make very-large vectors? (Smaller datasets and smaller vocabularies can only support smaller vectors. Too-large vectors allow 'overfitting', where idiosyncrasies of the data are memorized rather than generalized patterns learned. Or, they allow the model to continue improving in many different non-competitive directions, so model end-task/similarity results can be very different from run-to-run, even though each model is doing about-as-well as the other on its internal word-prediction tasks.)
Have you stuck with the default of 5 training epochs (the iter parameter in gensim 3.x, renamed epochs in gensim 4) even with a small dataset? (A large, varied dataset requires fewer training passes, because all words appear many times, all throughout the dataset, anyway. If you're trying to squeeze results from thinner data, more epochs may help a little, but not as much as more varied data would.)
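To put a number on "broadly similar", you could compare the neighbour lists from two runs directly. This is just a rough sketch: topn_overlap is a hypothetical helper, not part of gensim, and the two lists below are made-up stand-ins for the (word, score) pairs that model.wv.most_similar returns.

```python
from typing import Iterable, Tuple


def topn_overlap(
    run_a: Iterable[Tuple[str, float]],
    run_b: Iterable[Tuple[str, float]],
) -> float:
    """Jaccard overlap of the words in two (word, score) neighbour lists."""
    words_a = {word for word, _ in run_a}
    words_b = {word for word, _ in run_b}
    if not words_a and not words_b:
        return 1.0  # two empty lists are trivially identical
    return len(words_a & words_b) / len(words_a | words_b)


# Made-up neighbour lists standing in for the results of two training runs.
first_run = [("galaxy", 0.81), ("cosmos", 0.79), ("planet", 0.70), ("star", 0.66)]
second_run = [("cosmos", 0.80), ("galaxy", 0.77), ("space", 0.69), ("star", 0.65)]

print(topn_overlap(first_run, second_run))
```

A value near 1 means the two runs agree on which words are close, even if the ranking shifts a little. A value near 0, as the screenshots in the question suggest, points back to the data and parameter questions above rather than to the editor used to run the code.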
I'm trying to train a very simple neural network to classify samples of data where some classes necessarily follow others, which is why I decided to feed the input data to the network in batches. TensorFlow apparently offers multiple ways of declaring batches, such as tf.data.Dataset.batch (which I currently use for training with the Adam optimizer) and tf.train.batch. What is the difference? Should the methods be used together or are they mutually exclusive? In the latter case, which one should I prefer?
tf.train.* is an older API, more complex and more error-prone than the tf.data.* one (you need to take care of queues, queue runners, the coordinator, etc. yourself). For your stated purpose (batching data and feeding it to a model), the two are functionally equivalent, in that both achieve your goal. However, you should consider using tf.data, as it is both simpler to use and the currently recommended way to handle input datasets.
I've been using tensorflow for a while now. At first I had stuff like this:
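import tensorflow as tf  # TensorFlow 1.x API

def myModel(training):
    # Create the variables on the first (training) call and reuse them afterwards.
    with tf.variable_scope('model', reuse=not training):
        model = ...  # build the model here
    return model

training_model = myModel(True)
validation_model = myModel(False)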
Mostly because I started with some MOOCs that taught me to do that. But they also didn't use TFRecords or Queues. And I didn't know why I was using two separate models. I tried building only one and feeding the data with the feed_dict: everything worked.
Ever since, I've usually been using only one model. My inputs are always placeholders and I just feed in either training or validation data.
Lately, I've noticed some weird behavior on models that use tf.layers.dropout and tf.layers.batch_normalization. Both functions have a 'training' parameter that I use with a tf.bool placeholder. I've seen tf.layers used generally with a tf.estimator.Estimator, but I'm not using it. I've read the Estimator code and it appears to create two different graphs for training and validation. It may be that those issues arise from not having two separate models, but I'm still skeptical.
Is there a clear reason I'm not seeing that implies that two separate-equivalent models have to be used?
You do not have to use two neural nets for training and validation. After all, as you noticed, tensorflow helps you keep a monolithic train-and-validate net by allowing the training parameter of some layers to be a placeholder.
However, why wouldn't you? By having separate nets for training and for validation, you set yourself on the right path and future-proof your code. Your training and validation nets might be identical today, but you might later see some benefit to having distinct nets, such as having different inputs, different outputs, removing intermediate layers, etc.
Also, because variables are shared between them, having distinct training and validation nets comes at almost no penalty.
So, keeping a single net is fine; in my experience though, any project other than playful experimentation is likely to implement a distinct validation net at some point, and tensorflow makes it easy to do just that with minimal penalty.
tf.estimator.Estimator classes indeed create a new graph for each invocation and this has been the subject of furious debates, see this issue on GitHub. Their approach is to build the graph from scratch on each train, evaluate and predict invocations and restore the model from the last checkpoint. There are clear downsides of this approach, for example:
A loop that calls train and evaluate will create two new graphs on every iteration.
One can't easily evaluate while training (there are workarounds such as train_and_evaluate, but they are not very elegant).
I tend to agree that having the same graph and model for all actions is convenient and I usually go with this solution. But in a lot of cases when using a high-level API like tf.estimator.Estimator, you don't deal with the graph and variables directly, so you shouldn't care how exactly the model is organized.
I'm trying to make an ANN to classify a PDF file as either malicious or clean, by utilising the 26,000 PDF samples (both clean and malicious) found on contagiodump. For each PDF file, I used PDFid.py to parse the file and return a vector of 42 numbers. The 26000 vectors are then passed into pybrain; 50% for training and 50% for testing. This is my source code:
https://gist.github.com/sirpoot/6805938
After much tweaking with the dimensions and other parameters I managed to get a false positive rate of about 0.90%. This is my output:
https://gist.github.com/sirpoot/6805948
My question is, is there any explicit way for me to decrease the false positive rate further? What do I have to do to reduce the rate to perhaps 0.05%?
There are several things you can try to increase the accuracy of your neural network.
Use more of your data for training. This will permit the network to learn from a larger set of training samples. The drawback of this is that having a smaller test set will make your error measurements more noisy. As a rule of thumb, however, I find that 80%-90% of your data can be used in the training set, with the rest for test.
Augment your feature representation. I'm not familiar with PDFid.py, but it only returns ~40 values for a given PDF file. It's possible that there are many more than 40 features that might be relevant in determining whether a PDF is malicious, so you could conceivably use a different feature representation that includes more values to increase the accuracy of your model.
Note that this can potentially involve a lot of work -- feature engineering is difficult! One suggestion I have if you decide to go this route is to look at the PDF files that your model misclassifies, and try to get an intuitive idea of what went wrong with those files. If you can identify a common feature that they all share, you could try adding that feature to your input representation (giving you a vector of 43 values) and re-train your model.
Optimize the model hyperparameters. You could try training several different models using training parameters (momentum, learning rate, etc.) and architecture parameters (weight decay, number of hidden units, etc.) chosen randomly from some reasonable intervals. This is one way to do what is called "hyperparameter optimization" and, like feature engineering, it can involve a lot of work. However, unlike feature engineering, hyperparameter optimization can largely be done automatically and in parallel, provided you have access to a lot of processing cores.
Try a deeper model. Deep models have become quite "hot" in the machine learning literature recently, especially for speech processing and some types of image classification. By using stacked RBMs, a second-order learning method, or a different nonlinearity like a rectified linear activation function, you can add multiple layers of hidden units to your model, and sometimes this will help improve your error rate.
These are the ones that come to mind right off the bat. Good luck!
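The random hyperparameter search suggested above could be sketched roughly as follows. Everything here is illustrative: sample_params, random_search and toy_error are hypothetical names, the sampling ranges are hand-picked guesses, and toy_error is a stand-in for the expensive step of training a pybrain network with the given settings and returning its validation error.

```python
import math
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def sample_params(rng: random.Random) -> Params:
    """Draw one hyperparameter setting from hand-picked, illustrative ranges."""
    return {
        # These two span orders of magnitude, so sample the exponent uniformly.
        "learning_rate": 10 ** rng.uniform(-4, -1),
        "weight_decay": 10 ** rng.uniform(-6, -2),
        "momentum": rng.uniform(0.0, 0.99),
        "hidden_units": rng.randint(5, 100),
    }


def random_search(
    evaluate: Callable[[Params], float], n_trials: int, seed: int = 0
) -> Tuple[Params, float]:
    """Try n_trials random settings and keep the one with the lowest error."""
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    rng = random.Random(seed)
    best_params, best_error = None, math.inf
    for _ in range(n_trials):
        params = sample_params(rng)
        error = evaluate(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


def toy_error(params: Params) -> float:
    """Stand-in for "train a network and return its validation error"."""
    return (
        abs(math.log10(params["learning_rate"]) + 2)
        + abs(params["hidden_units"] - 40) / 100
    )


best, error = random_search(toy_error, n_trials=50)
print(best, error)
```

Each trial is independent of the others, which is why this kind of search parallelises well across cores. The error passed back by evaluate should come from a validation split rather than the final test set, otherwise the reported false positive rate will be optimistic.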
Let me first say I am in no way an expert in neural networks. But I played with pyBrain once, and I called the .train() method in a loop until the error dropped below 0.001 to get the error rate I wanted. So you can try using all of your samples for training with that loop and then test it with other files.