I'm trying to make a program for recognizing musical instruments and notes (like C, C#, B, ...) using machine learning in Python.
I got data from IRMAS and the Philharmonic Orchestra homepage.
How can I analyze the music? I want to remove noise and extract MFCC values. From a 20-second clip, I want to get at most 20 feature values. I'm planning to train an SVM on these data.
Sorry for the overly broad question... If there is something else I should mention, let me know and I'll answer immediately.
I also have Mathematica. I tried its 'MFCC encoder', but I have no idea how to normalize these data and set a threshold.
Take a look at this Mathematica example of using neural networks and MFCC encoding to classify music genres.
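On the Python side of the question, the same idea (MFCC features feeding an SVM) can be sketched with librosa and scikit-learn; the library choices, file names, and per-clip averaging here are assumptions, not part of the linked example:

import numpy as np
import librosa
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def clip_features(path, n_mfcc=20):
    y, sr = librosa.load(path, duration=20.0)  # read up to 20 s of audio
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)  # one 20-value feature vector per clip

# Hypothetical labelled training clips (e.g. cut from the IRMAS data):
paths = ["cello_C4.wav", "flute_C4.wav"]
labels = ["cello", "flute"]
X = np.array([clip_features(p) for p in paths])

scaler = StandardScaler().fit(X)  # the normalization step asked about
clf = SVC(kernel="rbf").fit(scaler.transform(X), labels)
print(clf.predict(scaler.transform([clip_features("unknown.wav")])))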
I have a number of .mp3 files which all start with a short voice introduction followed by piano music. I would like to remove the voice part and just be left with the piano part, preferably using a Python script. The voice part is of variable length, ie I cannot use ffmpeg to remove a fixed number of seconds from the start of each file.
Is there a way of detecting the start of the piano part, and then knowing how many seconds to remove, using ffmpeg or even Python itself?
Thank you
This is a non-trivial problem if you want a good outcome.
Quick and dirty solutions would involve inferred parameters like:
"there's usually 15 seconds of no or low-db audio between the speaker and the piano"
"there's usually not 15 seconds of no or low-db audio in the middle of the piano piece"
and then use those parameters to try to get something "good enough" using audio analysis libraries.
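For example, a quick-and-dirty version of that heuristic could look like the following pydub sketch; the library choice, the file name, and the 5-second/-40 dBFS parameters are all assumptions:

from pydub import AudioSegment
from pydub.silence import detect_silence

audio = AudioSegment.from_mp3("voice_then_piano.mp3")  # hypothetical input
# Look for stretches of at least 5 s that sit 40 dB below full scale.
gaps = detect_silence(audio, min_silence_len=5000, silence_thresh=-40)

if gaps:
    _, gap_end = gaps[0]  # assume the piano starts where the first long gap ends
    audio[gap_end:].export("piano_only.mp3", format="mp3")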
I suspect you'll be disappointed with that approach, given that I can think of many piano pieces with long pauses, and this reads like a classic ML problem.
The best solution here is to use ML with a classification model and a large data set. Here's a walk-through that might help you get started. However, this isn't going to be a few minutes of coding. This is a typical ML task that will involve collecting and tagging lots of data (or having access to pre-tagged data), building a ML pipeline, training a neural net, and so forth.
Here's another link that may be helpful. He's using a pretrained model to reduce the amount of data required to get started, but you're still going to put in quite a bit of work to get this going.
I actually have a photodiode connected to my PC and do the capturing with Audacity.
I want to improve this by using an old RPi 1 as a dedicated test station. As a result, the shutter speed should appear on the console. I would prefer a Python solution for capturing the signal and analysing it.
Can anyone give me some suggestions? I played around with oct2py, but I don't really understand how to calculate the time between the two peaks of the signal.
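A minimal sketch of that peak-timing calculation, assuming the photodiode signal has been captured to a WAV file at a known sample rate (the file name and thresholds are illustrative):

import numpy as np
from scipy.io import wavfile
from scipy.signal import find_peaks

rate, signal = wavfile.read("shutter_capture.wav")  # hypothetical capture file
signal = np.abs(signal.astype(float))

# Keep only prominent peaks: at least half the maximum amplitude,
# and at least 10 ms apart so one flash is not counted twice.
peaks, _ = find_peaks(signal, height=signal.max() / 2, distance=rate // 100)

if len(peaks) >= 2:
    seconds = (peaks[-1] - peaks[0]) / rate
    print("shutter open for %.1f ms (about 1/%.0f s)" % (seconds * 1000, 1 / seconds))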
I have no expertise in sound analysis with Python, but this is what I found doing some internet research, as I am interested in this topic.
pyAudioAnalysis
You can use pyAudioAnalysis, developed by Theodoros Giannakopoulos, for exactly this purpose.
For your purposes, the function mtFileClassification() from audioSegmentation.py can be a good start. This function:
splits an audio signal into successive mid-term segments and extracts mid-term feature statistics from each of these segments, using mtFeatureExtraction() from audioFeatureExtraction.py
classifies each segment using a pre-trained supervised model
merges successive fixed-size segments that share the same class label into larger segments
visualizes statistics regarding the results of the segmentation-classification process.
For instance:
from pyAudioAnalysis import audioSegmentation as aS
[flagsInd, classesAll, acc, CM] = aS.mtFileClassification("data/scottish.wav","data/svmSM", "svm", True, 'data/scottish.segments')
Note that the last argument of this function is a .segment file. This is used as ground truth (if available) in order to estimate the overall performance of the classification-segmentation method. If this file does not exist, the performance measure is not calculated. These files are simple comma-separated files of the format: <start time (seconds)>,<end time (seconds)>,<class label>. For example:
0.01,9.90,speech
9.90,10.70,silence
10.70,23.50,speech
23.50,184.30,music
184.30,185.10,silence
185.10,200.75,speech
...
If I have understood your question correctly, this is at least the kind of output you want to generate, isn't it? Although here, I rather think, you would have to provide it as input.
Most of this information is quoted directly from his wiki, which I suggest you read. And don't hesitate to reach out, as I am really interested in this topic.
Other available libraries for audio analysis:
I have a dataset for store inventory management. For every product, I have the history of renewal orders. For example, for a product A, I have:
A,last_time_of_renawal,volume_order,time_of_order
A,last_time_of_renawal1,volume_order1,time_of_order1
For every line, I also have other information (category of product, sales number, stock_volume, ...).
How can I use this dataset and TensorFlow (or another deep learning library) to predict the next time_of_order for a product, knowing the last_time_of_order?
That is too broad a question for Stack Overflow. Try to narrow it down using this guide.
But essentially you want to do a regression on the delta between time_of_order and last_time_of_order. That's your y. Then you build your features from the category of product etc. (your x).
Now you have a wide world of statistical analysis at your disposal.
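For instance, a minimal sketch of that regression setup with pandas and scikit-learn, reusing the column names from the question (the file name, feature columns, and model choice are assumptions):

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

df = pd.read_csv("orders.csv")  # hypothetical export of the dataset

# y: days between the last renewal and the current order
y = (pd.to_datetime(df["time_of_order"])
     - pd.to_datetime(df["last_time_of_renawal"])).dt.days

# x: the other per-line information, with categories one-hot encoded
X = pd.get_dummies(df[["category", "sales_number", "stock_volume"]])

model = RandomForestRegressor(n_estimators=200, random_state=0)
print(cross_val_score(model, X, y, cv=5).mean())  # quick sanity check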
If you insist on using deep learning: try setting up a "simple" neural network by following a YouTube playlist. When you succeed, you can try using your own data.
And if you encounter problems... come back to Stack Overflow with a specific programming question :) Have fun!
I'm currently working on a project where I take emails, strip out the message bodies using the email package, and then categorize them with labels like sports, politics, technology, etc.
I've successfully stripped the message bodies out of my emails; now I'm looking to start classifying. I've done the classic example of sentiment-analysis classification using the movie_reviews corpus, separating documents into positive and negative reviews.
I'm just wondering how I could apply this approach to my project. Can I create multiple classes like sports, technology, politics, entertainment, etc.? I have hit a roadblock here and am looking for a push in the right direction.
If this isn't an appropriate question for SO I'll happily delete it.
Edit: Hello everyone, I see that this post has gained a bit of popularity. I did end up successfully completing this project; here is a link to the code in the project's GitHub repo:
https://github.com/codyreandeau/Email-Categorizer/blob/master/Email_Categorizer.py
The task of text classification is a supervised machine learning problem. This means that you need labelled data. When you approached the movie_reviews problem, you used the +1/-1 labels to train your sentiment analysis system.
Getting back to your problem:
If you have labels for your data, approach the problem in the same manner. I suggest you use the scikit-learn library. You can draw some inspiration from here: Scikit-Learn for Text Classification
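For that labelled case, a minimal scikit-learn pipeline could look like this (the example bodies and labels are placeholders; in practice they are your labelled email bodies):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder training data; replace with real labelled email bodies.
bodies = ["The match went into overtime last night...",
          "The senate voted on the new bill today..."]
labels = ["sports", "politics"]

clf = make_pipeline(TfidfVectorizer(stop_words="english"), LinearSVC())
clf.fit(bodies, labels)
print(clf.predict(["Polls open across the country tomorrow"]))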
If you don't have labels, you can try an unsupervised learning approach. If you have any clue about how many categories you have (call that number K), you can try a KMeans approach. This means grouping the emails into K categories based on how similar they are; similar emails will end up in similar buckets. Then inspect the clusters by hand and come up with a label for each. New emails are assigned to the most similar cluster. If you need help with KMeans, check this quick recipe: Text Clustering Recipe
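And a sketch of that unsupervised variant (the two-document corpus is a placeholder, and K is the guess you supply):

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

bodies = ["the team won the game", "the senate passed the bill"]  # placeholder corpus
K = 2  # replace with your guess at the number of categories

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(bodies)
km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X)

print(km.labels_)  # one cluster id per email; inspect by hand and name them
print(km.predict(vec.transform(["a new email about the game"])))  # assign a new email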
Suggestion: Getting labels for emails can be easier than you think. For example, Gmail lets you export your emails with folder information. If you have categorised your email, you can take advantage of this.
To create a classifier, you need a training data set with the classes you are looking for. In your case, you may need to either:
create your own data set
use a pre-existing dataset
The Brown corpus is a seminal text collection with many of the categories you mention. It could be a starting point to help classify your emails, together with a package like gensim to find semantically similar texts.
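As a rough sketch of that idea, the Brown categories can serve as labelled training data for a simple scikit-learn pipeline (Naive Bayes here is just a stand-in, not a tuned system; nltk must have the corpus downloaded):

import nltk
from nltk.corpus import brown
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

nltk.download("brown")  # one-off corpus download

# One training document per Brown file, labelled with its category.
texts, labels = [], []
for cat in brown.categories():
    for fid in brown.fileids(categories=cat):
        texts.append(" ".join(brown.words(fileids=fid)))
        labels.append(cat)

clf = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
clf.fit(texts, labels)
print(clf.predict(["The government announced new tax measures today."]))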
Once you classify your emails, you can then train a system to predict a label for each unseen email.
I have a small number of similar types of sounds (I shall refer to these as DB_sounds) to which I need to match recordings (Rec_sounds). Each Rec_sound is short and unique and needs to be matched to its corresponding DB_sound. How do I go about matching them?
To illustrate my problem, consider the following:
Bob, with a deep voice, in room A (with some background noise) says Ma
Alice, with a high voice, in room B says Eh
A baby is learning to speak; his first word is Eh
Ma and Eh are 2 different types of DB_sounds, so I have to return 2 different results. I have several DB_sound samples of different people saying Ma and Eh to compare the Rec_sounds to.
The sounds that I am dealing with are voice recordings of single syllables like la, ba, ne, eh, ma etc.
How should I tackle this?
I don't think audio fingerprinting will work (see the spectrograms below), and existing voice recognition software like this Google API integration in Python won't work, since I am not trying to recognize human language, just sounds.
I don't mind building something from the ground up; just point me in a direction you think will work, and please add plenty of justification for why you think so.
Spectrograms of 8 samples of a baby saying EH
Time domain graphs of 8 samples of a baby saying EH
If you just want to recognize sounds, I would start with a simple procedure:
Crop silence from each sound sample (simple energy threshold).
Compute audio features for each sample in your database (e.g. MFCCs).
Perform a cross-validated classification procedure to map the audio features to the sound categories you want to recognize.
Helpful Python libs: scipy for reading WAV files, essentia for audio feature extraction, scikit-learn for classification and other machine learning.
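A minimal sketch of these three steps, using librosa in place of essentia for the feature extraction (the file names and parameter values are assumptions):

import numpy as np
import librosa
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def features(path):
    y, sr = librosa.load(path, sr=None)
    y, _ = librosa.effects.trim(y, top_db=25)            # 1. crop silence
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # 2. MFCCs per frame
    return np.hstack([mfcc.mean(axis=1), mfcc.std(axis=1)])  # per-clip summary

# Hypothetical (path, label) pairs; several samples per class are needed.
dataset = [("eh_01.wav", "eh"), ("eh_02.wav", "eh"),
           ("ma_01.wav", "ma"), ("ma_02.wav", "ma")]
X = np.array([features(p) for p, _ in dataset])
y = np.array([label for _, label in dataset])

# 3. cross-validated classification (use more folds once you have more data)
print(cross_val_score(SVC(kernel="rbf"), X, y, cv=2).mean())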