Analyse audio files with Python - python

I currently have a photodiode connected to my PC and do the capturing with Audacity.
I want to improve this by using an old RPi 1 as a dedicated test station. As a result, the shutter speed should appear on the console. I would prefer a Python solution for getting the signal and analysing it.
Can anyone give me some suggestions? I played around with oct2py, but I don't really understand how to calculate the time between the two peaks of the signal.
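For reference, here is a minimal sketch of that peak-timing calculation, assuming the photodiode signal has already been captured to a WAV file and using scipy.signal.find_peaks (the file name and threshold are placeholders):
import numpy as np
from scipy.io import wavfile
from scipy.signal import find_peaks

# Load the captured photodiode signal (mono WAV assumed)
rate, data = wavfile.read("capture.wav")
if data.ndim > 1:
    data = data[:, 0]  # keep one channel

# Find the dominant peaks (shutter opening and closing)
envelope = np.abs(data.astype(float))
peaks, _ = find_peaks(envelope, height=0.5 * envelope.max(), distance=rate // 100)

if len(peaks) >= 2:
    shutter_time = (peaks[-1] - peaks[0]) / rate
    print(f"Time between first and last peak: {shutter_time:.6f} s (~1/{1 / shutter_time:.0f})")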

I have no expertise in sound analysis with Python, but here is what I found doing some internet research, as I am interested in this topic.
pyAudioAnalysis, for exactly the purpose its name suggests
You can use pyAudioAnalysis, developed by Theodoros Giannakopoulos.
For your purpose, the function mtFileClassification() from audioSegmentation.py can be a good start. This function
splits an audio signal into successive mid-term segments and extracts mid-term feature statistics from each of these segments, using mtFeatureExtraction() from audioFeatureExtraction.py
classifies each segment using a pre-trained supervised model
merges successive fixed-size segments that share the same class label into larger segments
visualizes statistics regarding the results of the segmentation-classification process.
For instance
from pyAudioAnalysis import audioSegmentation as aS
[flagsInd, classesAll, acc, CM] = aS.mtFileClassification("data/scottish.wav","data/svmSM", "svm", True, 'data/scottish.segments')
Note that the last argument of this function is a .segment file. This is used as ground truth (if available) in order to estimate the overall performance of the classification-segmentation method. If this file does not exist, the performance measure is not calculated. These files are simple comma-separated files of the format: start time, end time, label. For example:
0.01,9.90,speech
9.90,10.70,silence
10.70,23.50,speech
23.50,184.30,music
184.30,185.10,silence
185.10,200.75,speech
...
If I have understood your question correctly, this is at least part of what you want to generate, isn't it? Though I rather think you have to provide such a file there as ground truth.
Most of this information is quoted directly from the project's wiki, which I suggest you read. Don't hesitate to get in touch, as I am really interested in this topic.
Other available libraries for audio analysis:

Related

AI categorical prediction for time variant data

I'm currently trying to use a sensor to measure a process's consistency. The sensor output varies wildly in its actual reading but displays features that are statistically different across three categories [dark, appropriate, light], with dark and light being out-of-control items. For example, one output could read approximately 0V, the process repeats, and the sensor then reads 0.6V. Both the 0V reading and the 0.6V reading could represent an in-control process. There is a consistent difference between sensor readings for out-of-control items vs in-control items. An example set of an in-control item can be found here, and an example set of two out-of-control items can be found here.
Because of the wildness of the sensor and the characteristic shapes of each category's data, I think the best way to assess the readings is to process them with an AI model. This is my first foray into creating a model that makes a categorical prediction given a time-series window. I haven't been able to find anything on the internet with my searches (I'm possibly looking for the wrong thing). I'm certain that what I'm attempting is feasible and has a strong case for an AI model, I'm just not certain what the optimal way to build it is.
One idea that I had was to treat the data similarly to how an image is treated by an object detection model, with the readings as the input array and the category as the output, but I'm not certain that this is the best way to go about solving the problem. If anyone can help point me in the right direction or give me a resource, I would greatly appreciate it. Thanks for reading my post!
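One way to sketch the "readings as input array, category as output" idea is a small 1D convolutional classifier, for example with Keras (the window length, class count, and random data below are placeholders):
import numpy as np
import tensorflow as tf

WINDOW_LEN = 256   # placeholder: samples per sensor window
NUM_CLASSES = 3    # dark, appropriate, light

# X: (num_windows, WINDOW_LEN, 1) sensor readings, y: integer labels in {0, 1, 2}
X = np.random.rand(100, WINDOW_LEN, 1).astype("float32")  # placeholder data
y = np.random.randint(0, NUM_CLASSES, size=100)

model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(16, 5, activation="relu", input_shape=(WINDOW_LEN, 1)),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(32, 5, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=10, validation_split=0.2)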

Decide whether a text is about "Topic A" or not - NLP with Python

I'm trying to write a Python program that will decide if a given post is about the topic of volunteering. My data sets are small (only the posts, which are examined one by one), so approaches like LDA do not yield results.
My end goal is a simple True/False, a post is about the topic or not.
I'm trying this approach:
Using Google's word2vec model, I'm creating a "cluster" of words that are similar to the word: "volunteer".
CLUSTER = [x[0] for x in MODEL.most_similar_cosmul("volunteer", topn=120)]
Getting the posts and translating them to English, using Google translate.
Cleaning the translated posts using NLTK (removing stopwords and punctuation, and lemmatizing the posts)
Making a BOW (bag of words) out of the translated, cleaned post.
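The cleaning and BOW steps might look roughly like this with NLTK (a sketch, assuming English text and the standard NLTK downloads):
import string
from collections import Counter

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# One-time downloads, if not already present locally
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

STOP = set(stopwords.words("english"))
LEMMA = WordNetLemmatizer()

def clean_and_bow(post):
    tokens = word_tokenize(post.lower())
    tokens = [t for t in tokens if t not in STOP and t not in string.punctuation]
    return Counter(LEMMA.lemmatize(t) for t in tokens)

print(clean_and_bow("We are looking for volunteers to help at the shelter!"))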
The next stage is difficult for me: I want to calculate a "distance" / "similarity" / something that will help me get the True/False answer that I'm looking for, but I can't think of a good way to do that.
Thank you for your suggestions and help in advance.
You are attempting to intuitively improvise a set of steps that, in the end, will classify these posts into the two categories, "volunteering" and "not-volunteering".
You should look for online examples that do "text classification" and are similar to your task, work through them (with their original demo data) for understanding, then adapt them incrementally to work with your data instead.
At some point, word2vec might be a helpful contributor to your task - but I wouldn't start with it. Similarly, eliminating stop-words, performing lemmatization, etc might eventually be helpful, but need not be important up front.
You'll typically want to start by acquiring (by hand-labeling if necessary) a training set of text for which you know the "volunteering" or "not-volunteering" value (known labels).
Then, create some feature vectors for the texts. A simple starting approach that offers a quick baseline for later improvements is a "bag of words" representation.
Then, feed those representations, with the known labels, to some existing classification algorithm. The popular scikit-learn package in Python offers many. That is: you don't yet need to worry about choosing ways to calculate a "distance" / "similarity" / something that will guide your own ad hoc classifier. Just feed the labeled data into one (or many) existing classifiers, and check how well they're doing. Many will use various kinds of similarity/distance calculations internally - but that's handled automatically by choosing & configuring the algorithm.
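As an illustration of that labeled-data-into-a-classifier step, here is a minimal sketch with scikit-learn's CountVectorizer and a logistic regression (the example posts and labels are made up):
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder hand-labeled data: 1 = about volunteering, 0 = not
posts = [
    "we are looking for volunteers this weekend",
    "join us to help clean the beach, no experience needed",
    "new smartphone released with a better camera",
    "the match ended in a draw after extra time",
]
labels = [1, 1, 0, 0]

# Bag-of-words features + a simple linear classifier
clf = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(posts, labels)

print(clf.predict(["volunteers needed at the animal shelter"]))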
Finally, when you have something working start-to-finish, no matter how modest the results, try alternate ways of preprocessing text (stop-word removal, lemmatization, etc.), featurizing text, and alternate classifiers/algorithm parameterizations - to compare results, and thus discover what works well given your specific data, goals, and practical constraints.
The scikit-learn "Working With Text Data" guide is worth reviewing & working-through, and their "Choosing the right estimator" map is useful for understanding the broad terrain of alternate techniques and major algorithms, and when different ones apply to your task.
Also, scikit-learn contributors/educators like Jake Vanderplas (github.com/jakevdp) and Olivier Grisel (github.com/ogrisel) have many online notebooks/tutorials/archived-video-presentations which step through all the basics, often including text-classification problems much like yours.

Detecting a noise in an audio stream

My goal is to be able to detect a specific noise that comes through the speakers of a PC using Python. That means the following, in pseudo code:
Sound is being played out of the speakers, by applications such as games for example,
ny "audio to detect" sound happens, and I want to detect that, and take an action
The specific sound I want to detect can be found here.
If I break that down, I believe I need two things:
A way to sample the audio that is being streamed to an audio device
I actually have this bit working -- with the code found here: https://gist.github.com/renegadeandy/8424327f471f52a1b656bfb1c4ddf3e8 -- it is based on the sounddevice example plot, which I combine with an audio loopback device. This allows my code to receive a callback with the data that is played to the speakers (a rough sketch of this kind of capture loop follows below the list).
A way to compare each sample with my "audio to detect" sound file.
The detection does not need to be exact - it just needs to be close. For example, there will be lots of other noises happening at the same time, so it's more about being able to detect the footprint of the "audio to detect" within an audio stream containing a variety of sounds.
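For illustration, here is a rough sketch of the capture side described in point 1, assuming sounddevice with a loopback device selected as the input (the sample rate, block size, and RMS printout are placeholders; the gist above is the real version):
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 44100
BLOCK_SIZE = 4096

def audio_callback(indata, frames, time, status):
    # indata: (frames, channels) block of whatever is currently being played,
    # provided a loopback device is selected as the input device
    if status:
        print(status)
    level = np.sqrt(np.mean(indata ** 2))  # crude RMS level of this block
    print(f"block RMS: {level:.4f}")

# device=... would select the loopback device; left out here as a placeholder
with sd.InputStream(samplerate=SAMPLE_RATE, blocksize=BLOCK_SIZE,
                    channels=2, callback=audio_callback):
    sd.sleep(5000)  # capture for 5 seconds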
Having investigated this, I found technologies mentioned in this post on SO and also this interesting article on Chromaprint. The Chromaprint article uses fpcalc to generate fingerprints, but because my "audio to detect" is around 1-2 seconds, fpcalc can't generate the fingerprint. I need something which works over shorter timespans.
Can somebody help me with the problem #2 as detailed above?
How should I attempt this comparison (ideally with a little example), based upon my sampling using sounddevice in the audio_callback function.
Many thanks in advance.

Amazon SageMaker: TrainingJobAnalytics returns only one timestamp for inbuilt xgboost

I am trying to use TrainingJobAnalytics to plot the training and validation loss curves for a training job using XGBoost on SageMaker. The training job completes successfully and I can see the training and validation rmse values in the CloudWatch logs.
However, when I try to get them in my notebook using TrainingJobAnalytics, I only get the metrics for a single timestamp and not all of them.
My code is as below:
from sagemaker.analytics import TrainingJobAnalytics

metrics_dataframe = TrainingJobAnalytics(training_job_name=job_name).dataframe()
What's going wrong and how can I fix it?
I went down the rabbit hole with this one, but let me share my experience with monitoring training data on SageMaker "out-of-the-box".
TL;DR: Monitoring runs at a 1-minute interval resolution, so any data points logged more frequently than once a minute are aggregated away. SageMaker Debugger is also explored as an alternative; a minimalistic SMD scalar example gist is linked below.
So, to begin with, the same issue has been mentioned a couple of times:
Amazon SageMaker: TrainingJobAnalytics returns only one timestamp for inbuilt xgboost
https://github.com/aws/amazon-sagemaker-examples/issues/945
https://github.com/aws/sagemaker-python-sdk/issues/1361
None of them, however, has received a good explanation of why this is happening. So I decided to read through Amazon's official documentation.
https://aws.amazon.com/premiumsupport/knowledge-center/cloudwatch-retrieve-data-point-metrics/
If the metric is a high-resolution metric (pushed at a sub-1-minute interval), confirm that the data points for the metric are pushed with the --storage-resolution parameter set to 1. Without this configuration, CloudWatch doesn't store the sub-minute data points and aggregates them into one-minute data points. In these cases, data points for a sub-minute period aren't retrievable.
https://aws.amazon.com/cloudwatch/faqs/
Q: What resolution can I get from a Custom Metric?
https://docs.aws.amazon.com/sagemaker/latest/dg/training-metrics.html#define-train-metrics
Amazon CloudWatch supports high-resolution custom metrics and its finest resolution is 1 second. However, the finer the resolution, the shorter the lifespan of the CloudWatch metrics. For the 1-second frequency resolution, the CloudWatch metrics are available for 3 hours. For more information about the resolution and the lifespan of the CloudWatch metrics, see GetMetricStatistics in the Amazon CloudWatch API Reference.
https://docs.aws.amazon.com/sagemaker/latest/dg/monitoring-cloudwatch.html#cloudwatch-metrics-jobs
Metrics are available at a 1-minute frequency.
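As a side note on the define-train-metrics page linked above: for a custom estimator, metric definitions are passed to the SageMaker Python SDK roughly like this (a sketch; the image URI, role, and regexes below are only illustrative, and the built-in XGBoost algorithm already ships with its own train:rmse / validation:rmse definitions):
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<training-image-uri>",   # placeholder
    role="<execution-role-arn>",        # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
    metric_definitions=[
        {"Name": "train:rmse",      "Regex": r"train-rmse:([0-9\.]+)"},
        {"Name": "validation:rmse", "Regex": r"validation-rmse:([0-9\.]+)"},
    ],
)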
So, basically, for my scenario Amazon CloudWatch wasn't the tool that fit my needs.
I decided to explore SageMaker Debugger, and oh man was that hard. In theory, it also works out of the box. And it probably does, but not in a trivial "call a logger" way. You need to:
configure it correctly first (what you need to monitor)
use preexisting conventions for the most popular libraries
hook it up to your model/pipeline
accept a lot of "behind-the-scenes" functionality
It feels like it was made specifically for the two scenarios that are always presented in any educational video about SageMaker Debugger.
I must admit though, it is quite powerful if you are an Amazon Engineer and know how to use it and when to use it.
Finally, I decided to write a simple local debugger which monitors a single value and then displays it - it took me around 8-10 hours, as I wasn't following their conventions (and the documentation never covered the "simplest example possible"). I am providing it here as a gist:
https://gist.github.com/yoandinkov/d431ffef708599cb7f24a653305d1b8f
This is based on the following references:
https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_debugger.html
https://github.com/aws/amazon-sagemaker-examples/tree/master/sagemaker-debugger
https://github.com/awslabs/sagemaker-debugger#run-debugger-in-custom-container
https://github.com/awslabs/sagemaker-debugger/blob/master/docs/pytorch.md
https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-debugger/build_your_own_container_with_debugger/debugger_byoc.html
To finish off this "Alice in the (not so) Wonderland" experience: use W&B or TensorBoard instead. Otherwise, you'll need a substantial amount of time and a steep learning curve to understand what is going on "out-of-the-box". It might be beneficial after a while, I don't know. (I personally won't use it for the time being.)
And let's not forget the most important part - have fun while exploring the myriad of possibilities in this vast weird internet place.

Finding speed and tone of speech in an audio file using Python

Given an audio file, I want to calculate the pace of the speech, i.e. how fast or slow it is.
Currently I am doing the following:
- convert speech to text and obtain a transcript (using a free tool).
- count number of words in transcript.
- calculate length or duration of file.
- finally, pace = (number of words in transcript / duration of file).
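That baseline boils down to something like this (a sketch, assuming the transcript is already available as a string and using soundfile for the duration; the file name is a placeholder):
import soundfile as sf

transcript = "text returned by the speech-to-text tool"  # placeholder
info = sf.info("speech.wav")                             # placeholder file

words = len(transcript.split())
pace_wpm = words / (info.duration / 60.0)
print(f"{pace_wpm:.1f} words per minute")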
However, the accuracy of the pace obtained depends purely on the transcription, which I think is an unnecessary step.
Is there any Python-library/SoX/FFmpeg way that will enable me to calculate, in a straightforward way:
- the speed/pace of talk in an audio file
- the dominant pitches/tones of that audio?
I referred to: http://sox.sourceforge.net/sox.html and https://digitalcardboard.com/blog/2009/08/25/the-sox-of-silence/
Your method sounds interesting as a quick first-order approximation, but it is limited by the transcript resolution. You can analyze the audio file directly.
I'm not familiar with SoX, but from its manual it seems like the stat option gives "... time and frequency domain statistical information about the audio".
SoX claims to be the "Swiss Army knife of audio manipulation", and just by skimming through its docs it seems like it might suit you for finding the general tempo.
If you want to run pitch analysis too, then you can develop your own algorithm with Python - I recently used librosa and found it very useful and well documented.
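If it helps, here is a rough sketch of both ideas with librosa (an approximation only: onset density as a proxy for speaking pace, and pyin for a dominant-pitch estimate; the file name is a placeholder):
import numpy as np
import librosa

y, sr = librosa.load("speech.wav", sr=None)  # placeholder file
duration = librosa.get_duration(y=y, sr=sr)

# Rough pace proxy: onset (roughly syllable/burst) rate per second
onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")
print(f"~{len(onsets) / duration:.2f} onsets per second")

# Dominant pitch: median of the voiced f0 track estimated by pyin
f0, voiced_flag, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                                  fmax=librosa.note_to_hz("C6"), sr=sr)
print(f"median f0: {np.nanmedian(f0[voiced_flag]):.1f} Hz")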
