Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 months ago.
Improve this question
My friend is doing his thesis related to fish spawning in rivers, for this, he collects hours of data that he then analysis manually in Audacity and looks for specific patterns in the spectrograms that might indicate the sound of fish spawning.
Since he has days' worth of data I proposed a challenge to myself: to create an algorithm that might help him in detecting these patterns.
I am fairly new to Machine Learning, but a junior in programming and this sounds like a fun learning experience.
I identify the main problems as:
samples are 1 hour in length.
noise in the background (such as cars and the rivers)
Is this achievable with machine learning or should I look into other options? If yes which ones?
Thank you for taking the time to read!
the first step would be to convert the sound signals into features that machines can understand. Maybe look into MFCCs for that.
Given that you have an appropriate feature representation of your problem domain, the main thing to consider would be what kind of machine learning algorithm would you apply? Unless you would like to sit and annotate hours of data, naive supervised learning is out of the window.
I think your best bet would be to modify VAD (voice activity detection) algorithms or better yet, Speaker recognition/Identification modals.
You could also approach it by first having a complex enough representation that allows you to "see" the sound and comparing it with every frame in the test data of the specific length. Might be useful to check out DTW (Dynamic Time warping)
If you have not designed such modals before, it will be a bit difficult and might take quite a long time.
Related
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 months ago.
Improve this question
I try to summarize my problem as good as I can, but it is really don't easy for me because I do not really know which key words to ask about...
I am quite new in machine learning and used Keras to solve problems I know the answer.
Example:
As an input have tons of photos of different object like cars, birds... (x-value) and a label what they are (y-value). The program can then recognize objects on new photos. The usual stuff.
How can I write a program that discovers new ways to solve a problem?
Example:
Players play a game, and I record their moves as well as the result/score. How can I write program that finds the optimal strategy? I tried to just sort out games with bad scores but I guess there is a better way, something like
if score is really good, then learn this strongly
if score is slightly over average, then learn this
if score is slightly under average, then do not learn this
if score is really bad,then avoid this strongly
Can you give me a hint? Would be really nice.
I believe you are talking about reinforcement learning, your example about players play a game and return score is what exactly reinforcement learning is. Reinforcement learning works using punish and reward concept, the model will try to do something and when the model returns a good result, the model will get a reward, and otherwise, when the model returns a bad result, it will be punished. Perhaps you can do some research and google about it.
But in my opinion since you mentioned that you are quite new to machine learning and keras, you probably need to learn about supervised learning first. Your case about photo (x) and label (y) is all about supervised learning is.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
Improve this question
I'm doing a project where I have data of 100 sensors and its cycles until it breaks. It shows a lot of characteristcs until its failure, and then shows it for the replacement sensor. With this data, I have to built a model where I can predict for how long the sensor will work until its failure, but only with a few data, not the full cycle. I have no idea what machine learning model is suitable for this.
The type of problem you are describing is known as survival analysis. A wide range of both statistical and machine learning methods are available to help you solve these type of problems.
What is great about these methods is it also allow you to use data points where the event you are interested in has not occur. In your example, it means you can possibly extend your dataset by including data from sensors which has not failed yet.
When you look at the methods I suggest you also spend some time examining how to evaluate these types of models, since the evaluation methods are also slightly different then in typical machine learning problems.
A comprehensive range of techniques is available at: http://dmkd.cs.vt.edu/TUTORIAL/Survival/Slides.pdf
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
Improve this question
was just wondering whether anyone had any thoughts on best practices when working in databricks. It is financially costing a lot to develop within databricks, hence would like to know where else it would be best to develop python code in. With thought also to collaborative work, is there a similar set up to databricks for collaborative work that is free or of little cost to use.
Any suggestions, greatly appreciated!
The cost of Databricks is really related to the size of the clusters you are running (1 worker, 1 driver or 1 driver 32 workers?), the spec of the machines in the cluster (low RAM and CPU or high RAM and CPU), and how long you leave them running (always running or short time to live, aka "Terminate after x minutes of inactivity". I am also assuming you are not running the always on High Concurrency cluster mode.
Some general recommendations would be:
work with smaller datasets in dev, eg representative samples which would enable you to...
work with smaller clusters in dev, eg instead of working with large 32 node clusters, work with 2 node small clusters
set time to live as short eg 15 mins
which together would reduce your cost
Obviously there is a trade-off in assembling representative samples and making sure your outputs are still accurate and useful but that's up to you.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I have images of the following type
etc..
What would be the easiest way of identifying what piece it is and if it is black or white? Do I need to use machine learning or is there an easier way?
It depends on your inputs. If all of your data looks like this (nice and clean, with contours being identical, just background and color changes), you could probably use some kind of pixel + color matching and you could be good to go.
You definitely know that deep learning and machine learning only approximate function (functions of pixels in this case), if you can find it (the function) without using those methods (with sensible amount of work), you always should.
And no, machine learning is not a silver bullet, you get an image and you throw it into convolutional neural networks black-box magic and you get your results, that's not the point.
Sorry but deep learning might be just an overkill to recognize a known set of images. Use template matching!
https://machinelearningmastery.com/using-opencv-python-and-template-matching-to-play-wheres-waldo/
You could do this using machine learning (plan or convolutional neural nets). This isn't that hard of a problem, but you have to do manual work of creating proper dataset with lots of pictures.
For example: For each piece you need to create picture with white/black field color. And you have to do different combinations, different chess piece sets vs. different table color schema. In order to make the system more robust to color schema you can try different color channels.
There are lots of questions, will the pictures you test always be in same resolution? If they aren't then you should also take that into consideration when creating dataset.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I'm developing a little tool which is able to classify musical genres. To do this, I would like to use a K-nn algorithm (or another one, but this one seems to be good enough) and I'm using python-yaafe for the feature extraction.
My problem is that, when I extract a feature from my song (example: mfcc), as my songs are 44100Hz-sampled, I retrieve a lot (number of sample windows) of 12-values-array, and I really don't know how to deal with that. Is there an approach to get just one representative value per feature and per song?
One approach would be to take the least RMS energy value of the signal as a parameter for classification.
You should use a music segment, rather than using the whole music file for classification.Theoretically, the part of the music of 30 sec, starting after the first 30 secs of the music, is best representative for genre classification.
So instead of taking the whole array, what you can do is to consider the part which corresponds to this time window, 30sec-59sec. Calculate the RMS energy of the signal separately for every music file, averaged over the whole time. You may also take other features into account, eg. , MFCC.
In order to use MFCC, you may go for the averaged value of all signal windows for a particular music file. Make a feature vector out of it.
You may use the difference between the features as the distance between the data points for classification.