I am working on a project using a Keras deep learning model that I need to port to PyTorch.
The goal of the project is to localize some elements in images. For training, I first use patches extracted from my images, and then I run inference on the full image. I read that this is possible with a (None, None, 1) input shape for the Keras input layer, and it currently works. However, the same training setup does not seem to work in PyTorch. So I was wondering: does the (None, None, 1) input layer do something specific when I start inferring on full images?
Thanks for your answers
As discussed in the linked thread, and quoting fchollet:
Of course, it's not always possible to have such free dimensions (for instance it's not possible to have variable-length sequences with TensorFlow, but it is with Theano).
One can assume that this is due to the architecture of the framework. As you stated, it may be accepted in Keras but not in PyTorch.
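As background on what a (None, None, 1) input shape relies on: the weights of a convolution do not depend on the height or width of its input, so the same kernel can be applied to a small training patch and to a full image. This is a property of the convolution operation itself rather than of a particular library. Below is a minimal sketch in plain Python, with no Keras or PyTorch involved; the helper `conv2d_valid`, the kernel values and the image sizes are all made up for illustration.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (both lists of lists) without padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# One fixed set of 3x3 weights (hypothetical values).
kernel = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]

# An 8x8 "training patch" and a 30x20 "full image" (synthetic data).
patch = [[float(r * c) for c in range(8)] for r in range(8)]
full = [[float(r * c) for c in range(20)] for r in range(30)]

patch_out = conv2d_valid(patch, kernel)
full_out = conv2d_valid(full, kernel)

# The output size follows the input size; the weights never change.
patch_shape = (len(patch_out), len(patch_out[0]))
full_shape = (len(full_out), len(full_out[0]))
print(patch_shape, full_shape)
```

Layers with a fixed number of input features (for example a dense layer applied after flattening) do not share this property, so a model containing them is tied to one input size.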
Hello there, I'm still struggling with Python.
Now I'm going to use the EfficientNet model to detect the ripeness of palm oil.
I'm using 5852 training pictures divided into 4 classes (1463 per class), with 132 testing pictures (33 per class).
After running 200 epochs, the result is far from good.
Is there any solution for me to improve the result?
Here's the result of my model accuracy and model loss.
Here's my code
https://colab.research.google.com/drive/18AtIP7aOycHPDR84PuQ7iS8aYUdclZIe?usp=sharing
Your help means a lot to me.
You have rescaling in your generators, and that may be the root of the problem.
The TensorFlow implementation of EfficientNet already contains a Rescaling layer, so you must not rescale images in your ImageDataGenerator. You can check this with the .summary() method.
Official documentation says:
Note: each Keras Application expects a specific kind of input preprocessing. For EfficientNet, input preprocessing is included as part of the model (as a Rescaling layer), and thus tf.keras.applications.efficientnet.preprocess_input is actually a pass-through function. EfficientNet models expect their inputs to be float tensors of pixels with values in the [0-255] range
ResNets, for example, don't have this layer, so you have to preprocess images yourself before feeding them to the model. It's tricky to remember these details for every single network in tf.keras.applications, so I suggest checking them before using a new model.
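To see why rescaling twice hurts, here is a small simulation in plain Python. It does not use Keras; `rescale` is a hypothetical stand-in for a divide-by-255 step, and the pixel values are made up. It only illustrates the arithmetic of applying the same scaling once in the generator and once more inside the model.

```python
def rescale(pixels, divisor=255.0):
    """Stand-in for a rescaling step: divide every pixel by `divisor`."""
    return [p / divisor for p in pixels]

raw = [0.0, 64.0, 128.0, 255.0]   # pixels in the expected [0, 255] range

once = rescale(raw)    # scaling applied a single time
twice = rescale(once)  # scaling applied again on already scaled data

print(max(once))   # the brightest pixel maps to 1.0
print(max(twice))  # the brightest pixel is now about 0.0039
```

After the second rescaling every image is squeezed into a tiny range near zero, so the model sees almost no contrast between pixels.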
I am using convolutional neural networks to predict vegetation growth. My input is a (n,51,51,1) terrain elevation tensor, and the label is a (n,51,51,1) vegetation tensor.
Since flow from directory uses folder names as labels, this is a bit of a problem. My network is performing well, but having to keep all the data in memory is a bit limiting. If anyone knows how to set up a flow from directory for this problem, I would appreciate it. I'm using R as an interface to Keras and TensorFlow, but solutions in Python are welcome too. I included the picture in case it wasn't clear what I'm doing. Thanks!
This is a complex problem you are trying to solve. Image creation is a different can of worms from classification (which is what you are talking about).
You can check this article, which talks in more depth about generative networks.
Another way to think about it is to give the final output layer 51*51 units and treat the task as a regression problem, where you do regression on each pixel individually.
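The per-pixel regression idea can be sketched without any framework: the 51x51 target is flattened into a vector of 2601 values, the network's final layer would produce a vector of the same length, and a mean squared error compares them pixel by pixel. Everything below (the helper names, the synthetic target and the made-up prediction) is an assumption for illustration, not code from the question.

```python
SIZE = 51  # height and width of the vegetation map

def flatten(grid):
    """Turn a SIZE x SIZE grid into one vector of SIZE * SIZE values."""
    return [value for row in grid for value in row]

def unflatten(vector, width=SIZE):
    """Turn a flat vector back into rows of length `width`."""
    return [vector[i:i + width] for i in range(0, len(vector), width)]

def mse(pred, target):
    """Mean squared error, averaged over all pixels."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

# Synthetic target grid standing in for one vegetation label.
target_grid = [[(r + c) / 100.0 for c in range(SIZE)] for r in range(SIZE)]
target_vec = flatten(target_grid)

# Made-up prediction: what a 2601-unit output layer might return.
pred_vec = [t + 0.1 for t in target_vec]

loss = mse(pred_vec, target_vec)
pred_grid = unflatten(pred_vec)  # back to 51 x 51 for display
print(len(target_vec), loss)
```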
I am working on a problem that requires me to build a deep learning model which takes an input image and outputs another image. It is worth noting that these two images are conceptually related but do not have the same dimensions.
At first I thought that a classical CNN with a final dense layer, whose number of units is the product of the height and width of the output image, would suit this case, but during training it produced strange figures such as an accuracy of 0.
While looking for some answers on the Internet I discovered the concepts of CNN autoencoders and I was wondering if this approach could help me solve my problem. Among all the examples I saw, the input and output of an autoencoder had the same size and dimensions.
At this point I wanted to ask whether there is a type of CNN autoencoder that produces an output image with different dimensions from the input image.
An autoencoder (AE) is an architecture that encodes your image into a lower-dimensional representation while simultaneously learning to reconstruct the data from that representation. AEs therefore rely on unsupervised data (no labels needed) that serves both as the input and as the target used in the loss.
You can try a U-Net-based architecture for your use case. A U-Net forwards intermediate data representations to later layers of the network, which should help the network learn the mapping of the inputs into a new domain faster.
You can also experiment with a simple architecture containing a few ResNet blocks without any downsampling layers, which might or might not be enough for your use-case.
If you want to dig a little deeper, you can look into DiscoGAN and related methods. They explicitly try to map an image into a new domain while maintaining the image information.
I was asked to create a machine learning algorithm using TensorFlow and Python that could detect anomalies by creating a range of 'normal' values. I have two parameters: a large array of floats around 1.5, and timestamps. I have not seen similar threads using TensorFlow in a basic sense, and since I am new to the technology I am looking to make a more basic machine. However, I would like it to be unsupervised, meaning that I do not specify what an anomaly is; a large amount of past data does. Thank you. I am running Python 3.5 and TensorFlow 1.2.1.
Deep Learning - Anomaly and Fraud Detection
https://exploreai.org/p/deep-learning-anomaly-and-fraud-detection
Simply normalize the values and feed them to the TensorFlow autoencoder model.
So, autoencoders are deep neural networks used to reproduce the input at the output layer i.e. the number of neurons in the output layer is exactly the same as the number of neurons in the input layer. Consider the image below
The autoencoders work in a similar way. The encoder part of the architecture breaks down the input data to a compressed version ensuring that important data is not lost but the overall size of the data is reduced significantly. This concept is called Dimensionality Reduction.
Check this repo for code: Autoencoder in TensorFlow
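The normalization step and the idea of a learned 'normal' range can be sketched without TensorFlow. In the sketch below the autoencoder is replaced by a trivial stand-in, `reconstruct`, that always returns 0.0 (the mean of the normalized data); a trained autoencoder would take its place. The readings, the 1.5 margin on the threshold and all function names are assumptions for illustration only.

```python
import statistics

def fit_normalizer(values):
    """Learn the mean and standard deviation from past data."""
    return statistics.mean(values), statistics.pstdev(values)

def normalize(values, mean, std):
    """Z-score normalization with previously learned statistics."""
    return [(v - mean) / std for v in values]

def reconstruct(z):
    """Stand-in for the autoencoder: always predicts the normalized mean."""
    return 0.0

# Past readings around 1.5, treated as "normal" (made-up numbers).
history = [1.50, 1.52, 1.49, 1.51, 1.48, 1.50, 1.53, 1.47]
mean, std = fit_normalizer(history)

# Reconstruction error on past data defines what counts as normal.
train_errors = [abs(z - reconstruct(z)) for z in normalize(history, mean, std)]
threshold = 1.5 * max(train_errors)  # hypothetical safety margin

# New readings are flagged when their error exceeds the threshold.
new_readings = [1.51, 1.49, 2.10, 1.50]
errors = [abs(z - reconstruct(z)) for z in normalize(new_readings, mean, std)]
flags = [e > threshold for e in errors]
print(flags)
```

No reading is ever labeled as an anomaly by hand; the threshold comes only from the past data, which matches the unsupervised setup asked for in the question.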
I'm in need of an artificial neural network library (preferably in Python) for one (simple) task. I want to train it so that it can tell whether a thing is in an image. I would train it by feeding it lots of pictures and telling it whether each one contains the thing I'm looking for or not:
These images contain this thing, return True (or probability of it containing the thing)
These images do not contain this thing, return False (or probability of it containing the thing)
Does such a library already exist? I'm fairly new to ANNs and image recognition; although I understand how they both work in principle I find it quite hard to find an adequate library for this task, and even research in this field has proven to be kind of a frustration - any advice towards the right direction is greatly appreciated.
There are several good Neural Network approaches in Python, including TensorFlow, Caffe, Lasagne, and sknn (Sci-kit Neural Network). sknn provides an easy, out of the box solution, although in my opinion it is more difficult to customize and can be slow on large datasets.
One thing to consider is whether you want to use a CNN (Convolutional Neural Network) or a standard ANN. With an ANN you will most likely have to "unroll" your images into a vector, whereas a CNN expects the image to be a cube (if in color, a square otherwise).
Here is a good resource on CNNs in Python.
However, since you aren't really doing multiclass image classification (for which CNNs are the current gold standard) but rather single-object recognition, you may consider a transformed-image approach, such as one using the Histogram of Oriented Gradients (HOG).
In any case, the accuracy of a neural network approach, especially when using CNNs, is highly dependent on successful hyperparameter tuning. Unfortunately, there isn't yet any general theory on which hyperparameter values (number and size of layers, learning rate, update rule, dropout percentage, batch size, etc.) are optimal in a given situation. So be prepared to set up proper training, validation, and test sets in order to fit a robust model.
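A training, validation, and test setup can be as simple as shuffling the file list once and cutting it into three parts. The sketch below uses only the standard library; the function name `split_dataset`, the 70/15/15 proportions and the file names are assumptions, not a recommendation from any particular library.

```python
import random

def split_dataset(items, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle once with a fixed seed, then cut into train/val/test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_fraction)
    n_val = round(len(items) * val_fraction)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

# Hypothetical image file names.
samples = [f"img_{i:03d}.png" for i in range(100)]
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))
```

Hyperparameters are then tuned against the validation set only, and the test set is used once at the end to estimate how the final model generalizes.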
I am unaware of any library which can do this for you. I use a lot of Caffe and can give you a solution till you find a single library which can do it for you.
I hope you know about ImageNet and that Caffe has a trained model based on ImageNet.
Here is the idea:
Define what the object is. Say object = "laptop".
Use Caffe's ImageNet trained model, change the code to display the required output you want (you mentioned TRUE or FALSE) when the object is in the output labels.
Here is a link to the ImageNet tutorial which I wrote.
Here is what you might try:
Take a look here. It is a stripped down version of the ImageNet program which I used in a prediction engine.
In line 80 you'll get the top-1 predicted output label. In line 86 you'll get the top-5 predicted labels. Write a line of code to check whether object is in the output_label and return TRUE or FALSE according to it.
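The check described above could look like the following sketch. It assumes the predicted labels are available as a list of strings in the comma-separated synonym style used by ImageNet class names; the function name `contains_object` and the example labels are hypothetical and not taken from the linked program.

```python
def contains_object(predicted_labels, target):
    """Return True if `target` matches any synonym in the predicted labels."""
    wanted = target.strip().lower()
    for label in predicted_labels:
        # A label such as "laptop, laptop computer" lists several synonyms.
        synonyms = [s.strip().lower() for s in label.split(",")]
        if wanted in synonyms:
            return True
    return False

# Made-up top-5 output for one image.
top5 = [
    "notebook, notebook computer",
    "laptop, laptop computer",
    "desk",
    "monitor",
    "mouse, computer mouse",
]

print(contains_object(top5, "laptop"))  # True
print(contains_object(top5, "banana"))  # False
```

Matching whole synonyms rather than substrings avoids false positives such as "cat" matching "catamaran".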
I understand that you are looking for a specific library, I will look for it, but this is something I would try out in the beginning.