I have been searching for and reading about approaches to representing a raw binary or executable as a spectrogram figure in Python. What I found covers turning an existing audio file, such as a .wav, into a spectrogram:
(https://fairyonice.github.io/implement-the-spectrogram-from-scratch-in-python.html)
(https://pythontic.com/visualization/signals/spectrogram)
My understanding is that any file on my computer is a list of 0's and 1's. So if I want to represent an executable as a spectrogram, it is important to first define the sampling rate and bit depth. I was thinking of using a sample rate of 8000 samples/second, with each sample being 1 byte. Then we can plot the generated wave signal and its spectrogram.
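Something like the following is what I have in mind (a rough sketch; 'program.exe', the 8000 samples/s rate, and the SciPy/matplotlib calls are my own assumptions, not a fixed requirement):

import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

# Read the raw bytes of any file and treat each byte as one unsigned 8-bit sample.
with open('program.exe', 'rb') as f:         # placeholder path
    data = np.frombuffer(f.read(), dtype=np.uint8)

rate = 8000                                  # assumed sample rate (samples/second)
samples = data.astype(np.float64) - 128.0    # centre the unsigned bytes around zero

# Plot the spectrogram of the resulting "signal".
f_axis, t_axis, Sxx = signal.spectrogram(samples, fs=rate)
plt.pcolormesh(t_axis, f_axis, 10 * np.log10(Sxx + 1e-12))
plt.xlabel('Time [s]')
plt.ylabel('Frequency [Hz]')
plt.show()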
Please let me know if I am misunderstanding anything.
I have a large library of pre-recorded music notes (around 1,200), all of consistent amplitude.
I'm researching methods of layering two notes over each other so that it sounds like a chord where both notes are played at the same time.
Samples with different attack times:
As you can see, these samples have different peak amplitude points, which need to line up in order to sound like a human-played chord.
Manually aligned attack points:
The second image shows the attack points manually aligned by ear, but this is an infeasible method for such a large data set, where I wish to create many permutations of chord samples.
I'm considering a method whereby I identify the time of peak amplitude of two audio samples, and then align those two peak amplitude times when mixing the notes to create the chord. But I am unsure of how to go about such an implementation.
I'm thinking of using a Python mixing solution, such as the one found here: Mixing two audio files together with python, with some tweaking, to mix audio samples over each other.
I'm looking for ideas on how to identify the times of peak amplitude in my audio samples. If you have any thoughts on other ways this could be implemented, I'd be very interested.
In case anyone is actually interested in this question, I have found a solution to my problem. It's a little convoluted, but it has yielded excellent results.
To find the time of peak amplitude of a sample, I found this thread: Finding the 'volume' of a .wav at a given time, where the top answer linked to a Scala library called AudioFile, which provides a method to find the peak amplitude by stepping through a sample in frame buffer windows. However, this library requires all files to be in .aiff format, so a second library of samples was created consisting of all the old .wav samples converted to .aiff.
After reducing the frame buffer window, I was able to determine in which frame the highest amplitude was found. Dividing this frame by the sample rate of the audio samples (which was known to be 48000), I was able to accurately find the time of peak amplitude. This information was used to create a file which stored both the name of the sample file, along with its time of peak amplitude.
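For anyone who wants to stay in Python rather than Scala, here is a minimal sketch of the same windowed peak search, assuming 16-bit .wav input and SciPy (peak_time and the 1024-frame window are illustrative choices, not part of my original pipeline):

import numpy as np
from scipy.io import wavfile

def peak_time(path, window=1024):
    """Return the time (in seconds) of the window holding the highest amplitude."""
    rate, data = wavfile.read(path)
    if data.ndim > 1:                        # mix stereo down to mono
        data = data.mean(axis=1)
    data = np.abs(data.astype(np.float64))
    pad = (-len(data)) % window              # pad so the length divides into windows
    data = np.pad(data, (0, pad))
    windows = data.reshape(-1, window)
    peak_frame = windows.max(axis=1).argmax() * window
    return peak_frame / rate                 # frame index / sample rate = seconds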
Once this was accomplished, a Python script was written using the Pydub library (http://pydub.com/) which would pair up two samples and find the difference (t) in their times of peak amplitude. The sample with the lower time of peak amplitude would have silence of length (t) prepended to it from a .wav containing only silence.
These two samples were then overlaid onto each other to produce the accurately mixed chord!
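A minimal sketch of that alignment step with Pydub (mix_chord is an illustrative name, and AudioSegment.silent stands in for the silence-only .wav described above):

from pydub import AudioSegment

def mix_chord(path_a, peak_a, path_b, peak_b):
    """Overlay two notes so their peak-amplitude times (in seconds) coincide."""
    a = AudioSegment.from_wav(path_a)
    b = AudioSegment.from_wav(path_b)
    t = abs(peak_a - peak_b) * 1000          # offset in milliseconds
    if peak_a < peak_b:                      # delay the earlier-peaking note
        a = AudioSegment.silent(duration=t) + a
    else:
        b = AudioSegment.silent(duration=t) + b
    return a.overlay(b)                      # result keeps the length of `a`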
I need to analyse a sound file in order to find out when the sound is loudest.
I have this:

from scipy.io import wavfile

rate, data = wavfile.read('test.wav')
I know the meaning of the rate value, but what exactly is in the data variable?
It works well when I want to retrieve the time intervals of the louder part of the audio, by looking at the data list, but I can't really find out the meaning of this list...
Thank you very much
data holds a numpy array representing the sound in your .wav file.
Some good explanations on how the sound is represented in that data can be found in the following question:
What do the bytes in a .wav file represent?
data in wav files is audio samples. Most of the time they are 16-bit signed integers. For wav files, you mostly care about rate (the sample rate) and the number of channels (if your wav file is not mono).
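For example, here is a quick way to inspect data and pull out the loud stretches (the 50 ms RMS window is just one possible loudness measure, not the only way to do this):

import numpy as np
from scipy.io import wavfile

rate, data = wavfile.read('test.wav')
print(data.dtype, data.shape)                # e.g. int16, (n_samples,) for mono

samples = data.astype(np.float64)
if samples.ndim > 1:                         # average the channels if stereo
    samples = samples.mean(axis=1)

window = int(rate * 0.05)                    # 50 ms windows
trimmed = samples[: len(samples) // window * window]
rms = np.sqrt((trimmed.reshape(-1, window) ** 2).mean(axis=1))

# Start times (in seconds) of windows louder than half the maximum RMS.
loud_times = np.where(rms > rms.max() * 0.5)[0] * window / rate
print(loud_times)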
I wired up the MCP3008 ADC chip to an electret microphone and to my Pi. I'm reading the input using bit-banging in Python, and I'm getting an integer from 0-1023.
I followed this tutorial to do the bit-banging: https://learn.adafruit.com/reading-a-analog-in-and-controlling-audio-volume-with-the-raspberry-pi/connecting-the-cobbler-to-a-mcp3008
My question is: how do I take this integer and convert it to something meaningful? Can I somehow write these bytes to a file in Python to get raw audio data that Audacity can play? Right now, when I try to write the values, they just show up as integers instead of binary. I'm really new to Python, and I've found this link for converting the raw data, but I'm having trouble generating the raw data first: Python open raw audio data file
I'm not even sure what these values represent. Are they PCM data that I have to do time-related math on?
What you are doing here is sampling a time-varying analogue signal, so yes, the values you obtain are PCM - but with a huge caveat (see below). If you write them to a WAV file (possibly using this to help you), you will be able to open it in Audacity. You could either convert the values to unsigned 8-bit (by truncation) or to 16-bit signed (with a shift and a subtraction).
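A minimal sketch of the 16-bit route, using the standard-library wave module (the 8 kHz rate and the function name are placeholders; the MCP3008 yields 10-bit readings, 0-1023):

import wave
import struct

def write_wav(path, readings, rate=8000):
    """Write 10-bit ADC readings (0-1023) as 16-bit signed mono PCM."""
    with wave.open(path, 'wb') as w:
        w.setnchannels(1)                    # mono
        w.setsampwidth(2)                    # 16-bit samples
        w.setframerate(rate)
        for r in readings:
            sample = (r - 512) << 6          # centre around zero, scale 10 -> 16 bits
            w.writeframes(struct.pack('<h', sample))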
The caveat is that PCM is the modulation of a sample clock with the signal. The clock signal in your case is the frequency with which you bit-bang the ADC.
Practically, it is very difficult to arrange for this to be regular in software - particularly when bit-banging the device from a high-level language such as Python. You need to sample at at least twice the bandwidth of the signal (the Nyquist criterion) - so realistically, 8 kHz for telephone speech quality.
An irregular sample clock will also result in significant artefacts - which you will hear as distortion.
I've got some raw ADPCM compressed audio streams and I want to play them with pygame, but as far as I know this isn't possible with pygame. How can I decompress them with python to normal PCM streams (or something else pygame can play) and then play them with pygame?
I already tried the audioop module as it has got something that converts ADPCM to linear streams but I neither know what linear streams are nor how to use the function that converts them.
The short version: "Linear" is what you want.* So, the function you want is adpcm2lin.
How do you use it?
Almost everything in audioop works the same way: you loop over frames, and call a function on each frame. If your input data has some inherent frame size, like when you're reading from an MP3 file (using an external library), or your output library demands some specific frame size, you're a bit constrained on how you determine your frames. But when you're dealing with raw PCM formats, the frames are whatever size you want, from a single sample to the whole file.**
Let's do the whole file first, for simplicity:
import audioop

with open('spam.adpcm', 'rb') as f:
    adpcm = f.read()
pcm, _ = audioop.adpcm2lin(adpcm, 2, None)
If your adpcm file is too big to load into memory and process all at once, you'll need to keep track of the state, so:
def adpcm_chunks(path, blocksize=4096):
    """Generator yielding decoded PCM blocks; the block size is arbitrary."""
    with open(path, 'rb') as f:
        state = None
        while True:
            adpcm = f.read(blocksize)
            if not adpcm:
                return
            pcm, state = audioop.adpcm2lin(adpcm, 2, state)
            yield pcm
Of course I'm assuming that you don't need to convert the sample rate or do anything else. If you do, any such conversions should come after the ADPCM decompression.***
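(For instance, a sample-rate conversion with audioop.ratecv, here assuming mono 16-bit PCM going from 8 kHz to 44.1 kHz - the rates are just an example:)

import audioop

state = None
converted, state = audioop.ratecv(pcm, 2, 1, 8000, 44100, state)
# width=2 bytes, 1 channel, 8000 -> 44100 Hz; carry `state` across chunks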
* The long version: "Linear" means the samples are encoded directly, rather than mapped through another algorithm. For example, if you have a 16-bit A-to-D and you save the audio in an 8-bit linear PCM file, you're just saving the top 8 bits of each sample. That gives you a very limited dynamic range, so quieter sounds get lost in the noise. There are various companding algorithms that give you a much wider dynamic range for the same number of bits (at the cost of losing other information elsewhere, of course); see the μ-law algorithm for details on how they work. But if you can stay in 16 bits, linear is fine.
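(audioop includes one such companding pair, if you want to see the difference - this assumes pcm holds 16-bit linear samples:)

import audioop

ulaw = audioop.lin2ulaw(pcm, 2)              # compand 16-bit linear to 8-bit μ-law
back = audioop.ulaw2lin(ulaw, 2)             # expand back to 16-bit (lossy)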
** Actually, with 4-bit raw ADPCM, you really can't do a single sample… but you can do 2 samples, which is close enough.
*** If you're really picky, you might want to convert to 32-bit first, then do the work, then convert back to 16-bit to avoid accumulating losses. But when you're starting with 4-bit ADPCM, you aren't going for audiophile sound here.
The module "wave" of python gives me a list of hexadecimal bytes, that I can read like numbers. Let's say the frequency of my sample is 11025. Is there a 'header' in those bytes that specify this? I know I can use the wave method to get the frequency, but I wanna talk about the .wav file structure. It has a header? If I get those bytes, how do I know wich ones are the music and the ones that are information? If I could play these numbers in a speaker 11025 times per second with the intensity from 0 to 255, could I play the sound just like it is in the file?
Thanks!
.wav files are actually RIFF files under the hood. The WAVE section contains both the format information and the waveform data. Reading the codec, sample rate, sample size, and sample polarity from the format information will allow you to play the waveform data assuming you support the codec used.
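To make that concrete, here is a minimal sketch that parses the canonical 44-byte header of a plain PCM .wav with Python's struct module (it assumes the fmt and data chunks come first, with no extra chunks in between; real-world files can have more chunks):

import struct

def read_wav_header(path):
    """Parse the canonical 44-byte RIFF/WAVE header of a plain PCM .wav file."""
    with open(path, 'rb') as f:
        riff, size, wave_id = struct.unpack('<4sI4s', f.read(12))
        assert riff == b'RIFF' and wave_id == b'WAVE'
        fmt_id, fmt_size = struct.unpack('<4sI', f.read(8))
        codec, channels, rate, byte_rate, align, bits = struct.unpack('<HHIIHH', f.read(16))
        data_id, data_size = struct.unpack('<4sI', f.read(8))
        return {'codec': codec, 'channels': channels, 'rate': rate,
                'bits': bits, 'data_bytes': data_size}

Everything after those 44 bytes is the waveform data itself.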