I have a set of data (currently only 20 pairs, but I may be able to produce over 500). My inputs are a1, a2, a3, a4, a5, a6, a7 and my output is b. I have no idea what the underlying equation looks like.
I am new to machine learning. Which algorithm, library, or framework in Python should I use to predict the relationship in these data?
Thanks in advance.
Your problem is a regression problem. There are many approaches available for such problems. The easiest is to start with a LinearRegression model, as described here: http://benalexkeen.com/linear-regression-in-python-using-scikit-learn/
If you think the relationship between input and output is more complex, you can move on to nonlinear models such as XGBoost:
https://machinelearningmastery.com/develop-first-xgboost-model-python-scikit-learn/
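A minimal sketch of the linear-regression route with scikit-learn, assuming your 20 pairs are loaded into NumPy arrays (the toy data below is made up purely for illustration):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Placeholder data: 20 samples, 7 input features (a1..a7) and one output b.
    # Replace with your real measurements.
    X = np.random.rand(20, 7)
    y = X @ np.array([1.0, 2.0, 0.5, -1.0, 0.3, 0.0, 4.0]) + 0.1 * np.random.randn(20)

    model = LinearRegression()
    model.fit(X, y)

    print("coefficients:", model.coef_)   # one weight per input a1..a7
    print("intercept:", model.intercept_)
    print("prediction for a new sample:", model.predict(X[:1]))

The learned coefficients and intercept give you an explicit linear equation b = w1*a1 + ... + w7*a7 + c, which is often a useful first approximation even if the true relationship is not linear.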
There are many kinds of machine learning algorithms, and comprehensive libraries to go with them. TensorFlow is generally regarded as a good choice for implementing neural networks, but with so few samples (assuming you indeed mean samples and not features) you will probably not have enough data to train one.
You first need to decide whether you are classifying values or doing regression on them (do you have a finite set of output values, or are you predicting within a continuous range?). Since you are using Python, check out the scikit-learn library and start with simple linear or polynomial regression, or something like k-NN if it turns out to be a classification problem.
If you want a more comprehensive tutorial, Kaggle has some good resources (and data science tutorials) to get you started.
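As a rough illustration of the polynomial-regression option mentioned above (a sketch only; the degree, toy data and cross-validation setup are arbitrary choices, not a recommendation):

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    # Placeholder data standing in for the real (a1..a7, b) pairs.
    X = np.random.rand(20, 7)
    y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.05 * np.random.randn(20)

    # Degree-2 polynomial features followed by ordinary least squares.
    model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print("cross-validated R^2:", scores.mean())

    model.fit(X, y)
    print("prediction:", model.predict(X[:1]))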
I have an undirected graph whose edges are all of equal length, with 7 features per node. I want to train a neural network that takes this graph as input and outputs a scalar. What network architecture do I need so that the network analyses the graph locally (for example, a node and its neighbours) and generalises, much like convolutional neural networks operate on grid data? I have heard of Graph Neural Networks, but I don't know if this is what I'm looking for. Will one be able to analyse my graph much like a CNN does with an image, sharing the generalisation benefits that convolution kernels bring?
I want to implement the solution in TensorFlow, ideally with Keras.
Thank you.
The performance will most likely depend on the exact output that you're hoping to get. From your description, a 2D CNN should be good enough and is easier to implement with Keras than a GNN.
However, there are advantages to retaining the graph structure of your data. That is too much to present here, but you can find a proper explanation in "Spatio-Temporal Analysis and Prediction of Cellular Traffic in Metropolis" by Wang et al. The paper also has the benefit of describing how to process the data for input into the network.
If you don't want to assemble your own GNN from basic Keras models, you may also want to take a look at Spektral, a Python library for graph deep learning.
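If you do go the GNN route, a rough sketch with Spektral might look like the following. This assumes Spektral 1.x, its "batch" data mode (padded node-feature and adjacency tensors), and arbitrary layer sizes; treat it as an outline rather than a working pipeline:

    import tensorflow as tf
    from spektral.layers import GCNConv, GlobalSumPool

    N, F = 30, 7                               # nodes per graph, features per node

    x_in = tf.keras.Input(shape=(N, F))        # node features
    a_in = tf.keras.Input(shape=(N, N))        # (normalized) adjacency matrix

    h = GCNConv(32, activation="relu")([x_in, a_in])   # local neighbourhood aggregation
    h = GCNConv(32, activation="relu")([h, a_in])
    h = GlobalSumPool()(h)                     # graph-level readout
    out = tf.keras.layers.Dense(1)(h)          # scalar output

    model = tf.keras.Model(inputs=[x_in, a_in], outputs=out)
    model.compile(optimizer="adam", loss="mse")
    model.summary()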
Without any other constraints I would start with a CNN, because it is faster to implement with the almost ready-to-use models that Keras provides.
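That quicker CNN baseline in plain Keras could look like this sketch, assuming your graph can be laid out as an image-like grid with the 7 node features as channels (the grid shape below is made up):

    import tensorflow as tf
    from tensorflow.keras import layers

    H, W, C = 16, 16, 7    # grid height, width, and the 7 node features as channels

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(H, W, C)),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(1),   # single scalar output
    ])
    model.compile(optimizer="adam", loss="mse")
    model.summary()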
I recently started learning TensorFlow and I am wondering whether it is worth using for simple optimization problems (least squares, maximum likelihood estimation, ...) instead of more traditional libraries (scikit-learn, statsmodels)?
I have implemented a basic AR model estimator in TensorFlow with MLE and the AdamOptimizer, and the results are not convincing, either in quality or in computation speed.
What do you think?
This is somewhat opinion based, but TensorFlow and similar frameworks such as PyTorch are useful when you want to optimize an arbitrary, parameter-rich non-linear function (e.g., a deep neural network). For a 'standard' statistical model, I would use code that is already tailored to it instead of reinventing the wheel. This is especially true when there is a closed-form solution (as in linear least squares): why wade into the murky waters of local optimization when you don't have to? Another advantage of existing statistical libraries is that they usually provide measures of uncertainty about your point estimates.
I see one case in which you might want to use TensorFlow for a simple linear model: when the number of variables is so large that the model can't be estimated with closed-form approaches. Then gradient-descent-based optimization makes sense, and TensorFlow is a viable tool for that.
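To make the contrast concrete, here is a rough sketch (assuming TensorFlow 2.x and toy data invented for the example) comparing the closed-form least-squares solution with an Adam-based fit of the same linear model:

    import numpy as np
    import tensorflow as tf

    # Toy linear data: y = X w + noise.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3)).astype(np.float32)
    w_true = np.array([1.5, -2.0, 0.7], dtype=np.float32)
    y = X @ w_true + 0.1 * rng.normal(size=1000).astype(np.float32)

    # Closed-form ordinary least squares: one linear-algebra call.
    w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

    # The same model fitted by gradient descent in TensorFlow.
    w = tf.Variable(tf.zeros(3))
    opt = tf.keras.optimizers.Adam(learning_rate=0.1)
    for _ in range(500):
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(tf.square(tf.linalg.matvec(X, w) - y))
        grads = tape.gradient(loss, [w])
        opt.apply_gradients(zip(grads, [w]))

    print("closed form:", w_ols)
    print("gradient descent:", w.numpy())

Both reach essentially the same coefficients, but the closed-form route is a single call with no learning rate or iteration count to tune.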
I am in my final year project. In the project I will collect data from a specific road. I have chosen 5 points on that road. At each point I will collect GPS data on the day of the week, the time of day, and the time taken to travel from the previous point to that point.
I want to train a neural network on this data.
So the inputs are day of the week, time of day, source and destination, and the output is the time needed to reach the destination point from the source point.
What is the easiest way to do this in Python? Which library should I choose?
I don't actually know the requirements of your final year project, but just a few side notes:
With only 4 inputs (weekday, hour of day, source, destination) feeding a single output neuron (the travel time), you will most likely not need the non-linear power of a neural network.
If you collect the data yourself, you will most likely have too few observations to actually train a neural network, and with too few observations it will probably overfit to your data.
You are very likely perfectly fine with a linear regression (a minimal sketch follows below).
If you want to try a neural network anyway, take a look at h2o - it offers a broad variety of machine learning / AI functionality to train models and make predictions.
However, it seems to me that you may need some additional reading on this topic. You should understand how to interpret the results (if any) and know the pros and cons of each method - this includes knowing which data types and value ranges are appropriate for a given model.
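A minimal sketch of that linear-regression baseline, assuming the measurements end up in a pandas DataFrame with the column names used below (names and toy rows are placeholders):

    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import Pipeline

    # Placeholder rows; replace with the GPS measurements you collect.
    df = pd.DataFrame({
        "weekday": ["mon", "mon", "tue", "fri"],
        "hour": [8, 17, 8, 22],
        "source": [1, 2, 1, 3],
        "destination": [2, 3, 2, 4],
        "travel_time_s": [120, 180, 110, 95],
    })

    # Categorical columns are one-hot encoded; the hour is kept as a number.
    preprocess = ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"),
          ["weekday", "source", "destination"])],
        remainder="passthrough",
    )
    model = Pipeline([("prep", preprocess), ("reg", LinearRegression())])

    X = df.drop(columns="travel_time_s")
    y = df["travel_time_s"]
    model.fit(X, y)
    print(model.predict(X.head(1)))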
In my project we are supposed to use an SVM-based algorithm. To get a basic idea of how to implement an SVM, we are trying the following: given an array of 1000 integers where the first 95 integers have values ranging from 0-5, the next 5 are around 10,000, then again 95 integers from 0-5 and 5 around 10,000, and so on, the algorithm should predict the next 100 integers (the 1001st - 1100th), with the first 95 around 0-5 and the last 5 around 10,000.
How can I implement this? The preferred programming language is Python. Are there any SVM modules, like libsvm, that would facilitate this?
I know this might be a stupid question, but any help would be appreciated a lot!
Here are some resources on AI (SVM specifically) from the Python wiki:
Milk - a machine learning toolkit in Python. Its focus is on supervised classification, with several classifiers available: SVMs (based on libsvm), k-NN, random forests and decision trees. It also performs feature selection. These classifiers can be combined in many ways to form different classification systems.
LibSVM - integrated software for support vector classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR) and distribution estimation (one-class SVM). It supports multi-class classification. A Python interface is available by default.
Shogun - a machine learning toolbox whose focus is on large-scale kernel methods, especially Support Vector Machines (SVM). It provides a generic SVM object interfacing to several different SVM implementations, among them the state-of-the-art OCAS, Liblinear, LibSVM, SVMLight, SVMLin and GPDT. Each of the SVMs can be combined with a variety of kernels. The toolbox not only provides efficient implementations of the most common kernels, such as the Linear, Polynomial, Gaussian and Sigmoid kernels, but also comes with a number of recent string kernels. SHOGUN is implemented in C++ and interfaces to Matlab(tm), R, Octave and Python, and is released as Machine Learning Open Source Software.
I'd go with the SVM implementation in scikit-learn. Other options include SVMlight and LaSVM.
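A rough sketch of the scikit-learn route. Note that framing your sequence task as a classification problem, with the position inside the repeating 100-value cycle as the single feature, is my own assumption made purely for illustration:

    import numpy as np
    from sklearn import svm

    # Build the described training sequence: blocks of 95 small values (0-5)
    # followed by 5 large values (~10,000), repeated to 1000 integers.
    rng = np.random.default_rng(0)
    values = np.concatenate([
        np.concatenate([rng.integers(0, 6, 95), rng.integers(9900, 10100, 5)])
        for _ in range(10)
    ])

    # One possible supervised framing: feature = position within the 100-step
    # cycle, label = whether the value belongs to the "large" group.
    positions = (np.arange(1000) % 100).reshape(-1, 1)
    labels = (values > 1000).astype(int)

    clf = svm.SVC(kernel="rbf")      # libsvm-backed classifier
    clf.fit(positions, labels)

    # Predict the class of the next 100 integers (positions 1000..1099).
    next_positions = (np.arange(1000, 1100) % 100).reshape(-1, 1)
    print(clf.predict(next_positions))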
I was wondering whether, and how, it is possible to use a Python generator as the data input to a scikit-learn classifier's .fit() method. Given the huge amount of data, this seems to make sense to me.
In particular, I am about to implement a random forest approach.
Regards,
K
The answer is "no". To do out of core learning with random forests, you should
Split your data into reasonably-sized batches (restricted by the amount of RAM you have; bigger is better);
train separate random forests;
append all the underlying trees together in the estimators_ member of one of the trees (untested):
for i in xrange(1, len(forests)):
forests[0].estimators_.extend(forests[i].estimators_)`
(Yes, this is hacky, but no better solution to this problem has been found yet. Note that with very large datasets it might pay to sample just a number of training examples that fits in the RAM of a big machine instead of training on all of them. Another option is to switch to linear models trained with SGD; those implement a partial_fit method, but they are obviously limited in the kinds of functions they can learn.)
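Put together, the procedure above might look roughly like this (a sketch in the same untested spirit as the snippet above; the batch generator and sizes are placeholders for your real data source):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def batches(n_batches=5, n_samples=1000, n_features=10):
        """Placeholder generator standing in for your real data source."""
        rng = np.random.default_rng(0)
        for _ in range(n_batches):
            X = rng.normal(size=(n_samples, n_features))
            y = (X[:, 0] > 0).astype(int)
            yield X, y

    # Train one forest per batch.
    forests = []
    for X, y in batches():
        rf = RandomForestClassifier(n_estimators=20)
        rf.fit(X, y)
        forests.append(rf)

    # Merge all trees into the first forest (the hack described above).
    merged = forests[0]
    for other in forests[1:]:
        merged.estimators_.extend(other.estimators_)
    merged.n_estimators = len(merged.estimators_)

    print("total trees:", merged.n_estimators)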
The short answer is "No, you can't". Classical Random Forest classifier is not an incremental or online classifier, so you can't discard training data while learning, and have to provide all the dataset at once.
Due to popularity of RF in machine learning (not least because of the good prediction results for some interesting cases), there are some attempts to implement online variation of Random Forest, but to my knowledge those are not yet implemented in any python ML package.
See Amir Saffari's page for such an approach (not Python).
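If an incremental model is acceptable, the partial_fit route mentioned in the answer above can consume a generator directly. A minimal sketch with a linear SGD classifier, where the generator and data are placeholders:

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    def data_stream(n_batches=10, n_samples=500, n_features=20):
        """Placeholder generator yielding (X, y) batches."""
        rng = np.random.default_rng(0)
        for _ in range(n_batches):
            X = rng.normal(size=(n_samples, n_features))
            y = (X.sum(axis=1) > 0).astype(int)
            yield X, y

    clf = SGDClassifier()                 # linear model trained with SGD
    classes = np.array([0, 1])            # all classes must be known up front

    for X_batch, y_batch in data_stream():
        clf.partial_fit(X_batch, y_batch, classes=classes)

    print(clf.score(*next(data_stream(n_batches=1))))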