I am currently trying to write a program that will detect a notification sound from an app and press some keys in response. I want it to detect a specific sound, for which I have a sound file; the app also has background noise coming from it.
My research has been less than successful in finding audio recognition software. If worst comes to worst, I can also just use something that pulls audio from an input and finagle OBS.
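For what it's worth, one rough sketch of how this could work with sounddevice, soundfile, numpy, and pyautogui: capture blocks of audio from an input device (pointed at a loopback/virtual cable carrying the app's output), score each block against the notification clip with a cosine similarity, and fire a keypress when the score clears a threshold. The file name, key, and threshold below are placeholders, and the clip is assumed to be at the capture sample rate:

    import numpy as np
    import sounddevice as sd
    import soundfile as sf
    import pyautogui

    RATE = 44100
    # Load the notification clip and mix it down to a normalized mono template.
    # Assumes the clip is also 44.1 kHz.
    template, _ = sf.read("notification.wav", dtype="float32")  # placeholder file
    if template.ndim > 1:
        template = template.mean(axis=1)
    template /= np.abs(template).max() + 1e-9

    THRESHOLD = 0.6  # similarity cutoff; tune against the app's background noise

    def callback(indata, frames, time, status):
        block = indata[:, 0]
        # Cosine similarity between the incoming block and the template.
        score = np.dot(block, template) / (
            np.linalg.norm(block) * np.linalg.norm(template) + 1e-9)
        if score > THRESHOLD:
            pyautogui.press("f5")  # placeholder key to press in response

    # Capture from the default input; point this at the loopback device so it
    # hears the app. A real version would use overlapping windows so the
    # notification can't straddle a block boundary.
    with sd.InputStream(channels=1, samplerate=RATE,
                        blocksize=len(template), callback=callback):
        sd.sleep(int(1e9))  # run until interrupted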
I've been searching for quite some time now, but the only example I've seen where camera motion keyframes (movements) can be extracted from a video is in Blender. I'm looking for a script, library, Colab notebook, or any recommendation that lets me input a video (for example, of a drone flying around), extract the camera keyframe movements, and export them to any type of file. The purpose is twofold: I want to take the camera keyframe movements from the drone video and feed them into an AI art animation, and I want to learn from this Python tool or library and add it to a Colab notebook of AI tools I've been slowly building over the months. Any help from you masterful wizards will be appreciated. Thanks!
I'm a "hacker", "novice-intermediate" level python coder, meaning I can read code and understand most basic and some advanced type coding. I'm able to read code quite well and manipulate it to my preferences, but I just don't even know where to start when it comes to the request I'm asking, ergo my post to Stack.
I am working on a project to control the PC exclusively by voice control and gestures (via webcam). With voice control I open an app (for example, YouTube). Now, without typing anything in the search bar, I want to search by voice (without even touching the keyboard): if I say "search water videos", the cursor should automatically perform the search for me and give me the result.
Basically, I want to find a text box on a screen of an app using image processing.
There will be some predefined keywords, like search (for searching), delete (for deleting anything that is mistyped), go back (to return to the previous window), and exit (for exiting the app).
Can it be done with the help of OpenCV in Python?
Many thanks in advance!
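It can, at least for the "find the text box" part. One simple approach is template matching: save a screenshot crop of the search box once, locate it on the live screen with cv2.matchTemplate, then click and type into it with pyautogui. A rough sketch, with the speech part handled by the speech_recognition package; the template image and keyword handling below are placeholders:

    import cv2
    import numpy as np
    import pyautogui
    import speech_recognition as sr

    def find_on_screen(template_path, threshold=0.8):
        """Locate a saved crop of the text box on the current screen."""
        screen = cv2.cvtColor(np.array(pyautogui.screenshot()), cv2.COLOR_RGB2GRAY)
        template = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)
        result = cv2.matchTemplate(screen, template, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(result)
        if max_val < threshold:
            return None
        th, tw = template.shape
        return (max_loc[0] + tw // 2, max_loc[1] + th // 2)  # center of match

    recognizer = sr.Recognizer()
    with sr.Microphone() as mic:
        audio = recognizer.listen(mic)
    command = recognizer.recognize_google(audio)  # e.g. "search water videos"

    if command.lower().startswith("search"):
        pos = find_on_screen("searchbox.png")  # placeholder template image
        if pos:
            pyautogui.click(pos)
            pyautogui.typewrite(command[len("search"):].strip())
            pyautogui.press("enter")

Note that pyautogui.locateOnScreen does essentially the same matching in a single call; OCR (e.g. pytesseract) is an alternative when the box's appearance varies between apps.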
So currently I'm trying to make a Python script which reads the ALSA mixer's output (or rather the general audio output) for the volumes/amplitudes of the currently playing audio frequencies, to trigger the GPIO ports on my Raspberry Pi, so I can effectively make an EQ out of LEDs responding to the current audio output. I want to create a real-time analysis (which is not bound to the ALSA mixer; whatever works, works), so I can stream my music from my iPhone via AirPlay to the Raspberry Pi, or watch a YouTube video, and the LED EQ is, well, doing what it's supposed to do.
My problem is that I couldn't find any Python library or function on the Internet which lets me get the current audio output, or rather the frequency amplitudes. Does anybody have an idea of how to get this thing going?
P.S.: I tried Lightshowpi but I couldn't figure out how to use Shairport-sync with it, so if anyone has an answer to that, let me know. :)
Edit:
If there's a way to get the waveform from, e.g., the last 8 bytes of the audio stream, I could do a Fourier transform (could a Fourier transform really work in a real-time environment, given the heavy math load on the CPU?).
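To the edit: yes, a real-time FFT is entirely feasible; numpy's rfft on a ~2048-sample block is cheap even on a Raspberry Pi. One way to get at the system audio is the ALSA loopback module (snd-aloop), which makes whatever is being played available as a capture device that sounddevice can read. A rough sketch, with pin numbers, device name, and threshold as placeholders to tune, and linear frequency bands for simplicity (a real EQ would use logarithmic bands):

    import numpy as np
    import sounddevice as sd
    import RPi.GPIO as GPIO

    RATE = 44100
    BLOCK = 2048
    LED_PINS = [17, 27, 22, 23]  # placeholder BCM pin numbers, low to high band
    THRESHOLD = 0.02             # tune for your playback volume

    GPIO.setmode(GPIO.BCM)
    for pin in LED_PINS:
        GPIO.setup(pin, GPIO.OUT)

    def callback(indata, frames, time, status):
        # Magnitude spectrum of the current block, split into one band per LED.
        spectrum = np.abs(np.fft.rfft(indata[:, 0])) / frames
        for pin, band in zip(LED_PINS, np.array_split(spectrum, len(LED_PINS))):
            GPIO.output(pin, bool(band.mean() > THRESHOLD))

    try:
        # Device name depends on your ALSA setup, e.g. the snd-aloop capture end.
        with sd.InputStream(device="hw:Loopback,1", channels=1,
                            samplerate=RATE, blocksize=BLOCK, callback=callback):
            sd.sleep(int(1e9))
    finally:
        GPIO.cleanup()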
We have a Python program which outputs specific waveforms over the audio output to drive an LED, as an easy and cheap robot programming device.
On the Windows systems that we've tested, everything works fine, but on some systems the waveform seems to be altered. We've used the Control Panel to disable any "enhancements" for the audio output endpoint, but it doesn't seem to help.
So, is it possible, using python, to instruct Windows to play audio unchanged? Or do some of the audio gurus here have another theory of what could be affecting the audio?
Sound cards are for playing audio, not sending data. You can't rely on an arbitrary signal not being altered by the hardware, much less the software. For example, many sound cards have a capacitor in series with the output to filter out DC bias. If you try to pass a DC-biased (or very low frequency) signal through such a sound card, it will be distorted. And there's nothing you can do about it at the software level.
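The practical consequence is to keep the signal inside the audio band: instead of DC levels or very low frequencies, key the data onto an audio-frequency carrier, which an output coupling capacitor will pass unchanged. A toy illustration with on-off keying; the carrier frequency and bit duration are arbitrary choices:

    import numpy as np

    RATE = 44100
    CARRIER = 4000       # Hz, comfortably inside the audio band
    BIT_DURATION = 0.01  # seconds per bit

    def ook_encode(bits):
        """On-off key bits onto a sine carrier; the result has no DC component."""
        n = int(RATE * BIT_DURATION)
        t = np.arange(n) / RATE
        tone = np.sin(2 * np.pi * CARRIER * t).astype(np.float32)
        return np.concatenate([tone * b for b in bits])

    waveform = ook_encode([1, 0, 1, 1, 0])  # play back through any audio API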
Background
I'm attempting to craft a simple video playback script for a small cinema that automates the playing of videos and control of the projector, sound and lighting systems. I have two video outputs, one goes to a monitor in the projection booth, and the other directly to the projector. I desire to play video (and only video) fullscreen to the projector while putting controls and a small (~1/4 screen) preview on the monitor. This will allow the projectionist to view the video being output and control the playback from the monitor in the booth while all the audience ever sees is the video output.
Problem
I am currently using Python to control VLC player (with libvlc Python bindings) to playback videos. I have everything working fine except that I can't figure out how to get a preview (direct copy) of the video being played fullscreen on the projector output into my GUI.
I have tried using the clone filter, but I can't get the cloned window to automagically appear fullscreen, nor in my GUI. The clone filter seems like the logical choice, but it is VERY inflexible when it comes to specifying destination screens, fullscreen, etc. I must be able to open video windows fullscreen on the projector monitor. Professionalism is key, and it would look bad if the projectionist had to drag a window over and double-click on it when the movie started.
Currently Using:
Debian Linux
Python 2.7
wxPython
libvlc
I would like to continue using Python as I already have the code for controlling the projector, sound processor, lighting and curtain written and tested. I chose VLC because it really seems bulletproof when it comes to video playback, but I am not committed to its continued use. I also chose wxWidgets for my GUI as a result of past experience, but I am not stuck on that either.
This describes the direct solution and does not concentrate on any alternative or the overall design of your application.
As your application and VLC media player are separate processes, you will not be able to get what you want directly, because there is no shared memory between those two applications. Your best shot at "copying" the decoded frames out of VLC is to send them as a raw MPEG-TS video stream (TS is usually used for this kind of use case), e.g. to udp://localhost:1234.
In your application, you will then need to receive the TS stream, decode it, and display it at the spot of interest.
To start, I would check whether you can do this using two VLC players that you control manually. Once the first VLC streams to UDP and outputs on the main display at the same time, and the other VLC player receives and plays the UDP stream, you can go on:
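Once the manual test works, the same chain can be driven from the libvlc Python bindings you already use. A sketch of the sending side; the file name and port are placeholders, and the sout syntax is standard VLC stream-output (duplicate renders the video locally and simultaneously sends an MPEG-TS copy over UDP):

    import vlc

    SOUT = "sout=#duplicate{dst=display,dst=std{access=udp,mux=ts,dst=127.0.0.1:1234}}"

    instance = vlc.Instance()
    media = instance.media_new("movie.mp4")  # placeholder path
    media.add_option(SOUT)
    player = instance.media_player_new()
    player.set_media(media)
    player.set_fullscreen(True)
    player.play()

The preview side then opens the MRL udp://@:1234, first in a second VLC instance for the manual test, later in whatever player widget you embed in the wxPython GUI.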
Find a player library that you can use directly in your wxPython application and check whether it can receive the UDP stream as well, e.g.
https://wxpython.org/Phoenix/docs/html/wx.media.MediaCtrl.html
This player library, for example, requires GStreamer as a base.
As a result, the main display and the picture in your application might have a latency of a few seconds. To get around this latency, the best way that I currently know of is using WebRTC, but this is a much more complex setup than the above.
https://www.sipwise.org/news/technical/tv-over-webrt/
Of course, in case you do some encoding for WebRTC or even for UDP, you would need to use a hardware encoder, e.g. NVIDIA NVENC, in order to guarantee that the needed resources are always available.