Finding speed and tone of speech in an audio file using Python

Given an audio file, I want to calculate the pace of the speech, i.e. how fast or slow it is.
Currently I am doing the following:
- convert the speech to text and obtain a transcript (using a free tool).
- count the number of words in the transcript.
- calculate the length or duration of the file.
- finally, pace = (number of words in transcript / duration of file).
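For concreteness, the whole current pipeline boils down to a few lines; a minimal sketch (assuming the transcript is already available as a string, and using librosa just to read the duration):

import librosa

def words_per_minute(transcript, audio_path):
    # file duration from the decoded samples
    y, sr = librosa.load(audio_path, sr=None)
    duration_s = librosa.get_duration(y=y, sr=sr)
    return len(transcript.split()) / (duration_s / 60.0)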
However, the accuracy of the computed pace depends entirely on the transcription, which I think is an unnecessary step.
Is there any Python-library/SoX/ffmpeg way that will enable me to calculate, in a straightforward way:
- the speed/pace of the talking in an audio file?
- the dominant pitches/tones of that audio?
I referred to http://sox.sourceforge.net/sox.html and https://digitalcardboard.com/blog/2009/08/25/the-sox-of-silence/

Your method sounds interesting as a quick first-order approximation, but it is limited by the transcript's resolution. You can analyze the audio file directly instead.
I'm not familiar with SoX, but the manual says the stat option gives "... time and frequency domain statistical information about the audio".
SoX claims to be the "Swiss Army knife of audio manipulation", and just from skimming the docs it seems like it might suit you for finding the general tempo.
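For example, you could invoke the stat effect from Python and parse its output; a rough sketch, assuming sox is installed and on the PATH (note that stat writes to stderr, not stdout):

import subprocess

result = subprocess.run(["sox", "speech.wav", "-n", "stat"],
                        capture_output=True, text=True)
print(result.stderr)  # one "Name: value" pair per line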
If you want to run pitch analysis too, you can develop your own algorithm in Python; I recently used librosa and found it very useful and well documented.
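A hedged sketch of the transcript-free route with librosa (the file name is a placeholder, and the onset-density proxy and pitch range are assumptions to tune, not established settings):

import numpy as np
import librosa

y, sr = librosa.load("speech.wav", sr=None)
duration = librosa.get_duration(y=y, sr=sr)

# onsets roughly track syllable/word attacks in clean speech,
# giving a crude pace estimate without any transcript
onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")
print("pace proxy: %.2f onsets per second" % (len(onsets) / duration))

# per-frame fundamental frequency; the median over voiced frames
# serves as a "dominant pitch" summary
f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                             fmax=librosa.note_to_hz("C7"), sr=sr)
print("dominant pitch: %.1f Hz" % np.nanmedian(f0[voiced]))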

Related

Separate music instruments and estimate note density

I have a .wav file of a musical performance with three instruments: a clarinet, a bass, and drums. My goal is to detect the note speed played by the clarinet player (so I want the number of notes played every five seconds: e.g. between 0s and 5s = 4 notes played; between 5s and 10s = 2 notes played, etc.).
To separate the instruments I used Spleeter by Deezer, and to calculate the note density I used the librosa function librosa.onset.onset_detect(). However, I am not quite satisfied with the result.
Does anybody have a better idea to solve my problem?
This problem can be divided into two parts:
- transform the audio data into musical data (trying to keep the three instruments separated during this step);
- analyze the density of the musical notes.
To make the first step as consistent as possible, you should use a single library. For instance, you could use an audio-to-MIDI converter (there is one Python library here) to get the musical data. Then you could do the rest of the job yourself, counting MIDI notes and so on (which would be a lot simpler than working with audio); see the sketch below.
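A minimal sketch of the counting step, assuming the converter produced a file called clarinet.mid and using pretty_midi (my choice, not one the converter requires) to read it:

import pretty_midi

pm = pretty_midi.PrettyMIDI("clarinet.mid")
window = 5.0  # seconds per bin
starts = [note.start for inst in pm.instruments for note in inst.notes]

t = 0.0
while t < pm.get_end_time():
    count = sum(1 for s in starts if t <= s < t + window)
    print("%.0fs - %.0fs: %d notes" % (t, t + window, count))
    t += window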

Recognize start of piano music in an MP3 file which starts with a spoken introduction, and remove spoken part, using Python

I have a number of .mp3 files which all start with a short voice introduction followed by piano music. I would like to remove the voice part and just be left with the piano part, preferably using a Python script. The voice part is of variable length, i.e. I cannot use ffmpeg to remove a fixed number of seconds from the start of each file.
Is there a way of detecting the start of the piano part and then knowing how many seconds to remove, using ffmpeg or even Python itself?
Thank you
This is a non-trivial problem if you want a good outcome.
Quick and dirty solutions would involve inferred parameters like:
- "there's usually 15 seconds of no or low-dB audio between the speaker and the piano"
- "there's usually not 15 seconds of no or low-dB audio in the middle of the piano piece"
and then use those parameters to try to get something "good enough" using audio analysis libraries, as in the sketch below.
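A sketch of that quick-and-dirty route, using pydub (my assumption) and thresholds you would have to tune per collection:

from pydub import AudioSegment
from pydub.silence import detect_silence

audio = AudioSegment.from_mp3("recital.mp3")  # placeholder file name
# look for a ~15 s quiet stretch relative to the file's average loudness
gaps = detect_silence(audio, min_silence_len=15000,
                      silence_thresh=audio.dBFS - 16)
if gaps:
    cut_ms = gaps[0][1]  # end of the first long gap, in milliseconds
    audio[cut_ms:].export("piano_only.mp3", format="mp3")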
I suspect you'll be disappointed with that approach, though, given that I can think of many piano pieces with long pauses; this reads like a classic ML problem.
The best solution here is to use ML with a classification model and a large data set. Here's a walk-through that might help you get started. However, this isn't going to be a few minutes of coding: it is a typical ML task that will involve collecting and tagging lots of data (or having access to pre-tagged data), building an ML pipeline, training a neural net, and so forth.
Here's another link that may be helpful. He's using a pretrained model to reduce the amount of data required to get started, but you're still going to have to put in quite a bit of work to get this going.

Detecting a noise in an audio stream

My goal is to be able to detect a specific noise that comes through the speakers of a PC, using Python. That means the following, in pseudo code:
- sound is being played out of the speakers, by applications such as games for example;
- my "audio to detect" sound happens, and I want to detect that and take an action.
The specific sound I want to detect can be found here.
If I break that down, I believe I need two things:
- A way to sample the audio that is being streamed to an audio device. I actually have this bit working with the code found here: https://gist.github.com/renegadeandy/8424327f471f52a1b656bfb1c4ddf3e8 -- it is based on the sounddevice plot example, which I combine with an audio loopback device. This allows my code to receive a callback with the data that is played to the speakers.
- A way to compare each sample with my "audio to detect" sound file. The detection does not need to be exact, just close. There will be lots of other noises happening at the same time, so it's more about being able to detect the footprint of the "audio to detect" within an audio stream containing a variety of sounds.
Having investigated this, I found the technologies mentioned in this post on SO and also this interesting article on Chromaprint. The Chromaprint article uses fpcalc to generate fingerprints, but because my "audio to detect" is only around 1-2 seconds, fpcalc can't generate the fingerprint. I need something which works across smaller timespans.
Can somebody help me with problem #2 as detailed above?
How should I attempt this comparison (ideally with a little example), based upon my sampling using sounddevice in the audio_callback function?
Many thanks in advance.
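For reference, one simple shape the comparison in #2 could take is normalized cross-correlation of a buffered block against the target clip; a minimal, untested sketch (numpy/scipy assumed; the block must be buffered to at least the target's length, and the threshold needs tuning):

import numpy as np
from scipy.signal import correlate

def match_score(block, target):
    # normalize both signals to zero mean and unit variance
    block = (block - block.mean()) / (block.std() + 1e-9)
    target = (target - target.mean()) / (target.std() + 1e-9)
    # peak correlation, scaled so a verbatim hit scores near 1.0
    corr = correlate(block, target, mode="valid")
    return float(np.max(np.abs(corr)) / len(target))

# inside audio_callback, after buffering enough mono samples:
# if match_score(buffered, target_clip) > 0.6:
#     take_action()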

Analyse audio files with Python

I currently have a photodiode connected to my PC and do the capturing with Audacity.
I want to improve this by using an old RPi 1 as a dedicated test station. As a result, the shutter speed should appear on the console. I would prefer a Python solution for acquiring the signal and analysing it.
Can anyone give me some suggestions? I played around with oct2py, but I don't really understand how to calculate the time between the two peaks of the signal.
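For the specific sub-problem of timing the gap between the two peaks, a minimal sketch (assuming the capture is saved as capture.wav; the height and distance values are placeholders to tune for your setup):

import numpy as np
from scipy.io import wavfile
from scipy.signal import find_peaks

sr, x = wavfile.read("capture.wav")  # photodiode signal recorded as audio
x = np.abs(x.astype(float))          # rectify so both flanks peak upward

# keep only prominent peaks at least 1 ms apart
peaks, _ = find_peaks(x, height=0.5 * x.max(), distance=int(0.001 * sr))
if len(peaks) >= 2:
    dt = (peaks[1] - peaks[0]) / sr  # seconds between the first two peaks
    print("shutter open for %.2f ms (~1/%.0f s)" % (dt * 1000, 1 / dt))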
I have no expertise in sound analysis with Python; this is just what I found doing some internet research, as I am interested in this topic myself.
pyAudioAnalysis
You can use pyAudioAnalysis, developed by Theodoros Giannakopoulos.
Toward your goal, the function mtFileClassification() from audioSegmentation.py can be a good start. This function:
- splits an audio signal into successive mid-term segments and extracts mid-term feature statistics from each of these segments, using mtFeatureExtraction() from audioFeatureExtraction.py;
- classifies each segment using a pre-trained supervised model;
- merges successive fixed-sized segments that share the same class label into larger segments;
- visualizes statistics regarding the results of the segmentation-classification process.
For instance:

from pyAudioAnalysis import audioSegmentation as aS

# classify mid-term segments of scottish.wav with a pre-trained SVM model,
# plot the result, and score it against the ground-truth .segments file
[flagsInd, classesAll, acc, CM] = aS.mtFileClassification("data/scottish.wav", "data/svmSM", "svm", True, 'data/scottish.segments')
Note that the last argument of this function is a .segments file. This is used as ground truth (if available) in order to estimate the overall performance of the classification-segmentation method. If this file does not exist, the performance measure is not calculated. These files are simple comma-separated files of the format: <start time>,<end time>,<label>. For example:
0.01,9.90,speech
9.90,10.70,silence
10.70,23.50,speech
23.50,184.30,music
184.30,185.10,silence
185.10,200.75,speech
...
If I have understood your question correctly, this is at least what you want to generate, isn't it? Though I rather think you have to provide it there yourself.
Most of this information is quoted directly from the project's wiki, which I suggest you read. Don't hesitate to reach out, as I am really interested in this topic.
Other available libraries for audio analysis:

Get sound input & Find similar sound with Python

What I want to do is something just like 'Shazam' or 'SoundHound' in Python, only a sound version, not music.
For example, when I make a sound (e.g. a door slam), I want to find the most similar sound data in a sound list.
I don't know if you can understand that because my English is bad, but just imagine a sound version of 'Shazam'.
I know that 'Shazam' doesn't have an open API.
Is there any API like 'Shazam'?
Or,
How can I implement it?
There are several libraries you can use, but none of them will classify a sample as a 'door slam', for example. But you can use those libraries for feature extraction, build or obtain a data set of sounds, build a classifier, train it, and use it for sound classification.
The libraries:
Friture - Friture is a graphical program designed to do time-frequency analysis on audio input in real-time. It provides a set of visualization widgets to display audio data, such as a scope, a spectrum analyser, a rolling 2D spectrogram.
LibXtract - LibXtract is a simple, portable, lightweight library of audio feature extraction functions. The purpose of the library is to provide a relatively exhaustive set of feature extraction primitives that are designed to be 'cascaded' to create extraction hierarchies.
Yaafe - Yet Another Audio Feature Extractor is a toolbox for audio analysis. Easy to use and efficient at extracting a large number of audio features simultaneously. WAV and MP3 files supported, or embedding in C++, Python or Matlab applications.
Aubio - Aubio is a tool designed for the extraction of annotations from audio signals. Its features include segmenting a sound file before each of its attacks, performing pitch detection, tapping the beat and producing midi streams from live audio.
LibROSA - A python module for audio and music analysis. It is easy to use, and implements many commonly used features for music analysis.
If you do choose to follow my advice above, I recommend scikit-learn as the machine learning library; it contains a lot of classifiers you may want to use. A minimal sketch of the whole pipeline follows below.
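A hedged sketch of that pipeline, using librosa MFCCs as features and a random forest (both my choices for illustration; the file names and labels are placeholders for your own data set):

import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def mfcc_features(path):
    # mean MFCC vector as a crude fixed-length summary of a clip
    y, sr = librosa.load(path, sr=None, mono=True)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).mean(axis=1)

# placeholder training pairs; a real data set needs many examples per class
training = [("door_slam_01.wav", "door"), ("clap_01.wav", "clap")]
X = np.array([mfcc_features(f) for f, _ in training])
labels = [label for _, label in training]

clf = RandomForestClassifier(n_estimators=200).fit(X, labels)
print(clf.predict([mfcc_features("unknown.wav")]))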
The problem here is that music has structure, while the sounds you want to find may have quite different signatures. Using the door as an example, the weight, size, and material of the door alone will influence the kinds of acoustic signatures it produces. If you want to search by similarity, a bag-of-features approach may be the easy(ish) way to go. There are other approaches too, such as sliding a window along the spectrogram of a sound and trying to match it (by similarity) against a previously recorded sound, decomposing the sound, etc.
