How to detect pitch abnormalities in an audio stream? - python

I need to extract the audio stream from a video and check whether it has any pitch changes or abnormalities. Ideally, I want to quantify any pitch changes in the audio stream. I'm aware that I can use ffmpeg to extract the audio stream from the video, but what tools or programs (Python?) can then be used to identify and quantify pitch changes or abnormalities in that stream?

Pitch analysis is not an easy task, but luckily there are existing solutions for it. https://pypi.org/project/crepe/ is one example that looks promising.
You could read the resulting CSV of pitch data into a pandas DataFrame and perform whatever data analysis you can think of.
For example, for pitch-change analysis you could do
df['pitch_change'] = df.frequency.diff(periods=1)
to get a column representing the pitch change per time unit.
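For a more concrete starting point, here is a minimal sketch assuming CREPE's command-line tool has already been run on the extracted audio (it writes a CSV with time, frequency and confidence columns); the file names, the confidence cut-off and the 50 Hz threshold are illustrative assumptions:

import pandas as pd

# Extract the audio first, e.g.: ffmpeg -i video.mp4 -vn audio.wav
# Then run: crepe audio.wav   -> writes audio.f0.csv

df = pd.read_csv("audio.f0.csv")  # columns: time, frequency, confidence

# Frame-to-frame pitch change in Hz
df["pitch_change"] = df.frequency.diff(periods=1)

# Ignore low-confidence frames so unvoiced segments don't look like jumps
df = df[df.confidence > 0.5]

# Flag "abnormal" jumps, e.g. more than 50 Hz between consecutive frames
abnormal = df[df.pitch_change.abs() > 50.0]
print(abnormal[["time", "frequency", "pitch_change"]])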

Related

OpenCV - Apply multiple image analysis algorithms with different runtimes and synchronize the results to one image

I'm struggling with a real-time application I'm currently writing. I capture a webcam stream and apply multiple image processing algorithms to each individual frame, e.g. to get the emotion of a person in the frame and to detect objects in it.
Unfortunately, the algorithms have different runtimes, and since some are based on neural networks, those in particular are slow.
My goal is to show the video stream without lag. I don't care if an image processing algorithm grabs only every n-th frame or shows its results with a delay.
To get rid of the lag, I put the image processing in different threads, but I wonder whether there is a more sophisticated way to synchronize my analysis with the video stream's frames - or maybe even a library that helps build pipelines for real-time data analytics?
Every hint is welcome!
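For what it's worth, here is a minimal sketch of that threading pattern, under the assumption that dropping frames for the slow analyzers is acceptable: the capture/display loop just publishes the newest frame, and the worker always processes the most recent one available (cv2.mean stands in for a slow network inference):

import threading
import time

import cv2

latest_frame = None
frame_id = 0
lock = threading.Lock()
stop = threading.Event()

def analyzer():
    # Slow worker: always takes the newest frame, skipping any it missed
    last_seen = -1
    while not stop.is_set():
        with lock:
            frame, fid = latest_frame, frame_id
        if frame is None or fid == last_seen:
            time.sleep(0.005)  # no new frame yet
            continue
        last_seen = fid
        result = cv2.mean(frame)  # stand-in for a slow NN inference
        print(f"frame {fid}: {result}")

cap = cv2.VideoCapture(0)
threading.Thread(target=analyzer, daemon=True).start()

while True:
    ok, frame = cap.read()
    if not ok:
        break
    with lock:
        latest_frame, frame_id = frame, frame_id + 1
    cv2.imshow("stream", frame)  # the display never waits on the analyzer
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

stop.set()
cap.release()
cv2.destroyAllWindows()

The same pattern extends to several analyzers by giving each its own thread reading the shared latest frame.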

How do I use OpenPose data to segment a long clip?

I love the OpenPose library -- and I've been playing with the demo for a while. I like the option of it spitting out JSON file data of the poses.
I wanted to ask -- are there any examples I've missed, or solutions where someone takes that pose keypoint data and uses it to segment a long clip?
For example: if I wanted to cut a clip of one person punching another, and use that to train a network that segments a different, longer clip and trims out only the punch (if any) from it.
Any help would be appreciated. Using Python/Tensorflow
OpenPose analyzes each frame of the video, so you just need to hook into that loop to run your own analysis and decide whether or not to save that part.
You can open the video with cv2.VideoCapture, extract each frame as an OpenCV Mat, convert it for OpenPose (e.g. via CV2OPMAT), extract the keypoints, and run your "punch detection" on the frame. You can reference the OpenPose examples for image analysis. If the frame qualifies, save the frame from before the conversions (the OpenCV Mat) back to video using cv2.VideoWriter, as in this example: https://www.life2coding.com/convert-image-frames-video-file-using-opencv-python/
One extra consideration: you may need to convert the pixels to BGR format using cv2.cvtColor.
Let me know if it works :)
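As a rough illustration of that loop, here is a hedged sketch using OpenPose's Python bindings (pyopenpose); is_punch() is a hypothetical stand-in for your trained classifier, the model folder path and file names are assumptions, and the exact emplaceAndPop call differs between OpenPose versions:

import cv2
import pyopenpose as op

def is_punch(keypoints):
    # Placeholder: decide from the pose keypoints whether this frame is a punch
    return False

op_wrapper = op.WrapperPython()
op_wrapper.configure({"model_folder": "models/"})  # path is an assumption
op_wrapper.start()

cap = cv2.VideoCapture("long_clip.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
writer = cv2.VideoWriter("punches.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, size)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    datum = op.Datum()
    datum.cvInputData = frame
    op_wrapper.emplaceAndPop(op.VectorDatum([datum]))  # older versions take [datum]
    if datum.poseKeypoints is not None and is_punch(datum.poseKeypoints):
        writer.write(frame)  # save the original frame, as suggested above

cap.release()
writer.release()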

3D point cloud from continuous video stream of two (stereo) cameras

I have continuous videos taken from two cameras placed at the upper-right and upper-left corners of my car's windshield (please note that they are not fixed to each other, and I aligned them approximately straight). Now I am trying to make a 3D point cloud out of that and have no idea how to do it. I searched the internet a lot and still couldn't find any useful info. Can you give me some links or hints on how I can make this work in Python?
You can try the stereo matching and point cloud generation implementation in the OpenCV library. Start with this short Python sample.
I suppose that you have two independent video streams that are not exactly synchronized. You will have to synchronize them first, because the linked sample expects two images, not videos. Extract images from the videos using OpenCV or ffmpeg and find an image pair that shares exactly the same timepoint (e.g. green appearing on a traffic light). Alternatively, you can use the audio tracks for synchronization, see https://github.com/benkno/audio-offset-finder. Beware: synchronization based on a single frame pair or a short audio excerpt will probably hold only for a few minutes before and after the synchronized timepoint.
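To give a feel for what the linked sample does, here is a condensed sketch on a single synchronized image pair; the file names and matcher parameters are assumptions, and the hand-built Q matrix (with a guessed focal length) mirrors the one in OpenCV's stereo_match.py sample, where a real setup would take Q from calibration via stereoRectify:

import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=96,  # must be divisible by 16
    blockSize=5,
)
# compute() returns fixed-point disparities scaled by 16
disparity = stereo.compute(left, right).astype(np.float32) / 16.0

h, w = left.shape
f = 0.8 * w  # guessed focal length, as in the OpenCV sample
Q = np.float32([[1, 0, 0, -0.5 * w],
                [0, -1, 0, 0.5 * h],  # flip y so the axis points up
                [0, 0, 0, -f],
                [0, 0, 1, 0]])
points_3d = cv2.reprojectImageTo3D(disparity, Q)
mask = disparity > disparity.min()
print("recovered", mask.sum(), "3D points")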

Extracting features from audio signal

I have just started to work on data in the form of audio. I am using librosa as a tool. My project requires me to extract features like:
Total duration of the audio
Minimum Intensity of the audio signal
Maximum Intensity of the audio signal
Mean Intensity of the audio signal
Jitter
Rate of speaking
Number of Pauses
Maximum Duration of Pauses
Average Duration of Pauses
Total Duration of Pauses
I know what these terms mean, but I have no idea how to extract them from an audio file. Are these built into librosa.feature in some form, or do we need to calculate them manually? Can someone guide me on how to proceed?
I know that this job can be done with software like Praat, but I need to do it in Python.
Praat can be used for spectral analysis (spectrograms), pitch analysis, formant analysis, intensity analysis, jitter, shimmer, and voice breaks.
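To stay in Python, here is a hedged sketch of how a few of the requested features map onto librosa; the file name, the top_db silence threshold and the pitch range are assumptions, and the jitter value is only a rough frame-wise approximation of Praat's cycle-to-cycle jitter:

import librosa
import numpy as np

y, sr = librosa.load("speech.wav", sr=None)

duration = librosa.get_duration(y=y, sr=sr)  # total duration in seconds

rms = librosa.feature.rms(y=y)[0]            # frame-wise intensity proxy
min_int, max_int, mean_int = rms.min(), rms.max(), rms.mean()

# Pauses: gaps between non-silent intervals (top_db chosen arbitrarily)
intervals = librosa.effects.split(y, top_db=30)
pauses = [(s - e) / sr for (_, e), (s, _) in zip(intervals[:-1], intervals[1:])]
total_pause = sum(pauses)
max_pause = max(pauses) if pauses else 0.0
avg_pause = total_pause / len(pauses) if pauses else 0.0

# Jitter proper needs per-cycle pitch periods; a pyin f0 track only
# approximates it (relative mean absolute frame-to-frame f0 change)
f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=75, fmax=400, sr=sr)
f0 = f0[~np.isnan(f0)]
jitter_approx = np.mean(np.abs(np.diff(f0))) / np.mean(f0) if f0.size > 1 else 0.0

print(duration, mean_int, len(pauses), total_pause, max_pause, avg_pause, jitter_approx)

Rate of speaking would additionally need a syllable or word segmentation step, which librosa does not provide out of the box.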

Multitrack Recording in PsychoPy

Is it possible to record from a mic while playback of an audio file continues?
If so, can I use a headphone splitter to record exactly what the listener hears onto the same track?
Ideally, I would like a stereo audio file wherein one track is the original audio file, and the second track is what the mic simultaneously recorded.
Context:
In my experiment, participants will listen to audio clips, then attempt to synchronize with them using a musical instrument, while the audio clip continues to play.
It's really important that I'm able to analyze how closely they can reproduce/temporally coordinate with the stimulus. I'm not too concerned with audio quality, as long as I can accurately compare event onsets.
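On the recording side: recent PsychoPy releases include a Microphone class that can record while a Sound plays, so a hardware splitter may not be needed. Here is a hedged sketch; the file names and the fixed wait are assumptions, the exact behavior depends on your PsychoPy version and audio backend, and merging the stimulus and the response into one stereo file (aligned by the logged onset) would be an offline post-processing step:

from psychopy import core
from psychopy.sound import Sound, Microphone

stim = Sound("clip.wav")  # the audio the participant hears
mic = Microphone()        # default input device

mic.start()               # begin recording first
onset = core.getTime()    # log the stimulus onset for later alignment
stim.play()               # playback continues while the mic records

core.wait(10.0)           # length of clip.wav plus a margin (assumption)
stim.stop()
mic.stop()

clip = mic.getRecording() # an AudioClip of what the mic captured
clip.save("response.wav")
print("stimulus onset at", onset)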
