Is it possible to get python to generate a simple sound like a sine wave?
Is there a module available for this? If not, how would you go about creating your own?
Also, would you need some kind of host environment for Python to run in to play sound, or can it be achieved just by making calls from the terminal?
If the answer is OS-dependent, I'm using a mac.
I was looking for the same thing. In the end, I wrote this code, which works fine.
import math    # import needed modules
import pyaudio # sudo apt-get install python-pyaudio

PyAudio = pyaudio.PyAudio # initialize pyaudio

# See https://en.wikipedia.org/wiki/Bit_rate#Audio
BITRATE = 16000   # number of frames per second/frameset
FREQUENCY = 500   # Hz, waves per second; 261.63 = C4 note
LENGTH = 1        # seconds to play sound

BITRATE = max(BITRATE, FREQUENCY + 100)

NUMBEROFFRAMES = int(BITRATE * LENGTH)
RESTFRAMES = NUMBEROFFRAMES % BITRATE
WAVEDATA = ''

# generating waves (Python 2: xrange and a byte string of chr() values)
for x in xrange(NUMBEROFFRAMES):
    WAVEDATA = WAVEDATA + chr(int(math.sin(x / ((BITRATE / FREQUENCY) / math.pi)) * 127 + 128))

for x in xrange(RESTFRAMES):
    WAVEDATA = WAVEDATA + chr(128)

p = PyAudio()
stream = p.open(format=p.get_format_from_width(1),
                channels=1,
                rate=BITRATE,
                output=True)
stream.write(WAVEDATA)
stream.stop_stream()
stream.close()
p.terminate()
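If you are on Python 3, note that the snippet above relies on Python 2 behaviour (xrange, and building the buffer as a str of chr() bytes). A minimal sketch of the same idea for Python 3 (my adaptation, not part of the original answer) builds a bytes object instead:

import math
import pyaudio

BITRATE = 16000   # frames per second
FREQUENCY = 500   # Hz
LENGTH = 1        # seconds to play sound

BITRATE = max(BITRATE, FREQUENCY + 100)
NUMBEROFFRAMES = int(BITRATE * LENGTH)

# 8-bit unsigned samples, one byte per frame
WAVEDATA = bytes([int(math.sin(x / ((BITRATE / FREQUENCY) / math.pi)) * 127 + 128)
                  for x in range(NUMBEROFFRAMES)])

p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(1), channels=1,
                rate=BITRATE, output=True)
stream.write(WAVEDATA)
stream.stop_stream()
stream.close()
p.terminate()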
I know I'm a little late to the game on this one, but this is a pretty fantastic python project for synthesis and audio composition: https://github.com/hecanjog/pippi
It's still actively being developed, but it's been going for a while.
After wasting time on some uncompilable or non-existent projects, I discovered the python module wavebender, which offers generation of single or multiple channels of sine, square and combined waves. The results can be written either to a wavefile or to sys.stdout, from where they can be interpreted directly by aplay in real-time. Some useful examples are explained here, and are included at the project's github page.
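If you just want the sys.stdout / aplay idea without installing anything, here is a minimal sketch of my own (not wavebender itself; the aplay flags assume raw signed 16-bit little-endian mono input):

import math
import struct
import sys

RATE, FREQ, SECONDS = 44100, 440.0, 2
for n in range(RATE * SECONDS):
    value = int(32767 * math.sin(2 * math.pi * FREQ * n / RATE))
    sys.stdout.buffer.write(struct.pack('<h', value))  # raw little-endian 16-bit samples

# run it with e.g.:  python sine_stdout.py | aplay -t raw -r 44100 -f S16_LE -c 1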
The Python In Music wiki page has not been terribly well-kept-up, but it's a good starting point.
http://wiki.python.org/moin/PythonInMusic
I am working on a powerful synthesizer in Python. I used custom functions to write directly to a .wav file. There are built-in functions that can be used for this purpose. You will need to modify the .wav header to reflect the sample rate, bits per sample, number of channels, and duration of synthesis.
Here is an early version of a sine wave generator that outputs a list of values that, after applying bytearray, becomes suitable for writing to the data parameter of a wave file. [edit] A conversion function will need to transform the list into little-endian byte values before applying the bytearray; a sketch of such a conversion follows below. See the WAVE PCM soundfile format link below for details on the .wav specification. [/edit]
import math

def sin_basic(freq, time=1, amp=1, phase=0, samplerate=44100, bitspersample=16):
    bytelist = []
    TwoPiDivSamplerate = 2 * math.pi / samplerate
    increment = TwoPiDivSamplerate * freq
    incadd = phase * increment
    maxval = 2 ** (bitspersample - 1) - 1  # largest positive sample value, e.g. 32767 for 16 bits
    for i in range(int(samplerate * time)):
        # reflect incadd back into range if it ever grows past +/- maxval (original guard)
        if incadd > maxval:
            incadd = maxval - (incadd - maxval)
        elif incadd < -maxval:
            incadd = -maxval + (-maxval - incadd)
        bytelist.append(int(round(amp * maxval * math.sin(incadd))))
        incadd += increment
    return bytelist
A newer version can use waveforms to modulate the frequency, amplitude, and phase of the waveform parameters. The data format makes it trivial to blend and concatenate waves together. If this seems up your alley, check out WAVE PCM soundfile format.
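For illustration, here is a minimal sketch (my own, not part of the original answer) of the conversion step mentioned in the edit above: it packs the returned sample list as little-endian 16-bit values and writes a mono .wav using the standard-library wave module.

import struct
import wave

def write_wav(filename, samples, samplerate=44100):
    # pack each signed 16-bit sample little-endian ('<h') and write a mono file
    frames = struct.pack('<{}h'.format(len(samples)), *samples)
    wf = wave.open(filename, 'wb')
    wf.setnchannels(1)           # mono
    wf.setsampwidth(2)           # 2 bytes = 16 bits per sample
    wf.setframerate(samplerate)
    wf.writeframes(frames)
    wf.close()

# e.g. write_wav('sine.wav', sin_basic(440, time=2, amp=0.5))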
I like PyAudiere, which lets you play numpy arrays as sounds... I guess it jives well with my Matlab background. I believe it's cross-platform. I think scikits.audiolab does the same thing, and may be more current / better supported... it seems easier to me than trying to save things as wav files or write them to buffers and use Python's built-in sound library.
I found these two Python repositories very useful; you might want to have a look at them...
python: https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder
ipython: https://timsainb.github.io/spectrograms-mfccs-and-inversion-in-python.html
[EDIT] As pointed out, here is an explanation of the two links.
The python one seems to have an error, but many people were able to make it run, so I'm not sure. The ipython one worked like a charm, so I hope you can run it.
Both of the links are supposed to take audio as input, preferably a .wav file. They featurize it (using an FFT size of 512 and a step size of 512/8) to obtain spectrograms (you can even visualize them); the result is a 2D matrix, and you can then train your machine learning models or do whatever you want using a matrix that represents the original audio. If you want to know, at any point, what those vectors represent, you can also resynthesize the audio from them.
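As a rough illustration of that featurization step (my own sketch, not taken from either repository), assuming a mono 16-bit .wav file and using scipy with an FFT size of 512 and a step of 512/8:

import numpy as np
from scipy import signal
from scipy.io import wavfile

fs, audio = wavfile.read('speech.wav')   # assumes a mono 16-bit .wav file
f, t, Sxx = signal.spectrogram(audio.astype(np.float32), fs,
                               nperseg=512, noverlap=512 - 512 // 8)
spectrogram_db = 10 * np.log10(Sxx + 1e-10)   # 2D matrix: frequency bins x time frames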
Related
I am learning how to modify different types of audio files (.wav, .mp3, etc.) in Python 3 using the wave module; this question is specifically about the .wav format. I know there are ISO standards for audio formats, and as a side note, any references regarding the audio standards for the .wav format are greatly appreciated.
In terms of my question: I am simply ignoring the RIFF and FMT headers in a .wav file while using the Python 3 wave module.
Is there a more efficient way to skip the RIFF headers and other containers and go straight to the data chunk to modify its contents?
This crude example simply converts a two-channel audio .wav file to a single-channel audio .wav file while modifying all sample values to (0, 0).
import wave
import struct

# Open Files
inf = wave.open(r"piano2.wav", 'rb')
outf = wave.open(r"output.wav", 'wb')

# Input Parameters
ip = list(inf.getparams())
print('Input Parameters:', ip)
# Example Output: Input Parameters: [2, 2, 48000, 302712, 'NONE', 'not compressed']

# Output Parameters
op = ip[:]
op[0] = 1
outf.setparams(op)

number_of_channels, sample_width, frame_rate, number_of_frames, comp_type, comp_name = ip
format = '<{}h'.format(number_of_channels)
print('# Channels:', format)

# Read >> Second
for index in range(number_of_frames):
    frame = inf.readframes(1)
    data = struct.unpack(format, frame)

    # Here, I change data to (0, 0), testing purposes
    print('Before Audio Data:', data)
    print('After Modifying Audio Data', (0, 0))

    # Change Audio Data
    data = (0, 0)
    value = data[0]
    value = (value * 2) // 3
    outf.writeframes(struct.pack('<h', value))

# Close In File
inf.close()

# Close Out File
outf.close()
Is there a better practice or reference material for simply modifying the data segments of .wav files?
Say you wanted to literally add a sound at a specific timestamp; that would be a more appropriate example of the result I'm after.
Performance comparison
Let's first examine three ways to read WAVE files.
The slowest one - wave module
As you might have noticed already, the wave module can be painfully slow. Consider this code:
import wave
import struct

wavefile = wave.open('your.wav', 'r')  # check e.g. freesound.org for samples
length = wavefile.getnframes()
for i in range(0, length):
    wavedata = wavefile.readframes(1)
    data = struct.unpack("<h", wavedata)
For a WAVE as defined below:
Input File : 'audio.wav'
Channels : 1
Sample Rate : 48000
Precision : 16-bit
Duration : 00:09:35.71 = 27634080 samples ~ 43178.2 CDDA sectors
File Size : 55.3M
Bit Rate : 768k
Sample Encoding: 16-bit Signed Integer PCM
it took on average 27.7 s to load the full audio. The flip side is that the wave module is available out of the box and will work on any system.
The convenient one - audiofile
A much more convenient and faster solution is e.g. audiofile. According to the project description, its focus is on reading speed.
import audiofile as af
signal, sampling_rate = af.read('audio.wav')
This gave me on average 33 ms to read the mentioned file.
The fastest one - numpy
If we decide to skip the header (as the OP asks) and go solely for speed, numpy is a great choice:
import numpy as np
# the data chunk size (in bytes) sits at offset 40 in a canonical 44-byte WAV header
byte_length = np.fromfile(filename, dtype=np.int32, count=1, offset=40)[0]
# the samples themselves start at offset 44; this assumes 16-bit PCM
data = np.fromfile(filename, dtype=np.int16, count=byte_length // np.dtype(np.int16).itemsize, offset=44)
The header structure (that tells us what offset to use) is defined here.
The execution of that code takes ~6 ms, about 5x less than audiofile. Naturally it comes with a price / preconditions: we need to know in advance what the data type is.
Modifying the audio
Once you have the audio in a numpy array, you can modify it at will; you can also decide to stream the file rather than reading everything at once. Be warned though: since sound is a wave, simply injecting new data at an arbitrary time t will in a typical scenario distort that audio (unless the new data is silence).
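For instance, here is a minimal sketch (my own, with hypothetical names) that mixes a new sound into the array at second t by addition with clipping, rather than simply overwriting samples:

import numpy as np

def mix_at(data, new_sound, t, fs):
    # additively mix `new_sound` into `data` (both int16 mono) starting at second `t`
    start = int(t * fs)
    end = min(start + len(new_sound), len(data))
    mixed = data[start:end].astype(np.int32) + new_sound[:end - start].astype(np.int32)
    data[start:end] = np.clip(mixed, -32768, 32767).astype(np.int16)  # avoid int16 overflow
    return data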
As for writing the stream back, "modifying the container" would be terribly slow in Python. That's why you should either use arrays or switch to a more suitable language (e.g. C).
If we go with arrays, we should mind that numpy knows nothing about the WAVE format and therefore we'd have to define the header ourselves and write individual bytes. Perfectly feasible exercise, but clunky. Luckily, scipy provides a convenient function that has the benefits of numpy speed (it uses numpy underneath), while making the code much more readable:
from scipy.io.wavfile import write

fs = np.fromfile('audio.wav', dtype=np.int32, count=1, offset=24)[0]  # we need the sample rate (offset 24)
new_data = np.append(data, np.zeros(2 * fs, dtype=np.int16))          # append 2 seconds of silence
with open('audio_out.wav', 'wb') as fout:
    write(fout, fs, new_data)
It could be done in a loop, where you read a chunk with numpy / scipy, modify the array (data) and write it back out chunk by chunk.
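One way to sketch that loop is to let the standard-library wave module handle the container (so the header is taken care of) and use numpy for the math; this is my own example and was not part of the benchmarks above:

import wave
import numpy as np

CHUNK = 4096  # frames per iteration
inp = wave.open('audio.wav', 'rb')
out = wave.open('audio_out.wav', 'wb')
out.setparams(inp.getparams())
while True:
    frames = inp.readframes(CHUNK)
    if not frames:
        break
    samples = np.frombuffer(frames, dtype=np.int16).copy()
    samples //= 2                        # example modification: halve the volume
    out.writeframes(samples.tobytes())
inp.close()
out.close()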
I basically have an audio file that is 16-bit PCM WAV mono at 44100 Hz, and I'm trying to convert it into a spectrogram. I want a spectrogram of the audio every 20 ms (I'm trying this for speech recognition), but whenever I compare what I have to Audacity, it looks really different. I'm kind of new to Python, so I was trying to base this off of my Java knowledge. Any help would be appreciated. I think I'm splitting the read samples incorrectly (what I did was split the array every 220 elements, since I believe audio data is just samples in the time domain, to get 20 ms of audio).
Here's the code I have right now:
import librosa.display
import numpy
import matplotlib.pyplot as plt

audioPath = 'C:\\Users\\pawar\\Desktop\\Resister.wav'
audioData, sampleRate = librosa.load(audioPath, sr=None)
print(sampleRate)
new = numpy.zeros(shape=(220, 1))
counter = 0
for i in range(0, len(audioData), 882):
    new = audioData[i: i + 882]
    STFT = librosa.stft(new, n_fft=882)
    print(type(STFT))
    audioDatainDB = librosa.amplitude_to_db(abs(STFT))
    print(type(audioDatainDB))
    librosa.display.specshow(audioDatainDB, sr=sampleRate, x_axis='time', y_axis='hz')
    # plt.figure(figsize=(20,10))
    plt.show()
    counter += 1
    print("Your local debug print statement ", counter)
As for the values, I was playing around with them quite a bit trying to get it to work, but no luck so far ;/
Here's the output:
https://i.stack.imgur.com/EVntx.png
And here's what Audacity shows:
https://i.stack.imgur.com/GIGy8.png
I know it's not 20 ms in the Audacity one, but you can see the two don't look even remotely similar.
I am currently building a device based on a Raspberry Pi for measuring some properties of noise recorded with a sound card (e.g. variance), and while trying to do this in Python I got stuck figuring out how to get an audio sample as a float for further calculations.
What did I do:
Took a line-in-to-cinch adapter and touched the plugs to generate some sort of test signal.
Recording in, for example, Audacity or Matlab shows plausible results.
What I want to get:
Ideally, I want to get, for example, 5 frames of 1024 samples each from the sound card, and convert them into a list, tuple or numpy array of floats for further calculations.
When trying to achieve this with python/pyaudio with the code at the end of this post, I got something like this:
Since the values I got with Python seem to differ from those in Matlab (and others) by a factor of about two, I think I've overlooked something or done something wrong.
I think I made a mistake somewhere at the struct.unpack region, but can't figure out where exactly or why.
I'd like to ask you for help, pointing out where the error is and what I did wrong.
Little testcode for getting some samples and plotting them:
import pyaudio
import struct
import matplotlib.pyplot as plt
FORMAT = pyaudio.paFloat32
SAMPLEFREQ = 44100
FRAMESIZE = 1024
NOFFRAMES = 220
p = pyaudio.PyAudio()
print('running')
stream = p.open(format=FORMAT,channels=1,rate=SAMPLEFREQ,input=True,frames_per_buffer=FRAMESIZE)
data = stream.read(NOFFRAMES*FRAMESIZE)
decoded = struct.unpack(str(NOFFRAMES*FRAMESIZE)+'f',data)
stream.stop_stream()
stream.close()
p.terminate()
print('done')
plt.plot(decoded)
plt.show()
Try using "numpy.frombuffer" (the modern replacement for the deprecated "numpy.fromstring") instead of "struct.unpack":

import numpy

stream = p.open(format=FORMAT, channels=1, rate=SAMPLEFREQ, input=True, frames_per_buffer=FRAMESIZE)
data = stream.read(NOFFRAMES * FRAMESIZE)
decoded = numpy.frombuffer(data, dtype=numpy.float32)

Let me know if this works for you.
Using C or Python (Python preferred), how would I encode a binary file to audio that is then output through the headphone jack, and how would I decode the audio back to binary using input from the microphone jack? So far I have learned how to convert a text file to binary using Python. It would be similar to RTTY communication.
This is so that I can record data onto a cassette tape.
import binascii
a = open('/Users/kyle/Desktop/untitled folder/unix commands.txt', 'r')
f = open('/Users/kyle/Desktop/file_test.txt', 'w')
c = a.read()
b = bin(int(binascii.hexlify(c), 16))
f.write(b)
f.close()
From your comments, you want to process the binary data bit by bit, turning each bit into a high or low sound.
You still need to decide exactly what those high and low sounds are, and how long each one sounds for (and whether there's a gap in between, and so on). If you make it slow, like 1/4 of a second per sound, then you're treating them as notes. If you make it very fast, like 1/44100 of a second, you're treating them as samples. The human ear can't hear 44100 different sounds in a second; instead, it hears a single sound at up to 22050Hz.
Once you've made those decisions, there are two parts to your problem.
First, you have to generate a stream of samples—for example, a stream of 44100 16-bit integers for every second. For really simple things, like playing a chunk of a raw PCM file in 44k 16-bit mono format, or generating a square wave, this is trivial. For more complex cases, like playing a chunk of an MP3 file or synthesizing a sound out of sine waves and filters, you'll need some help. The audioop module, and a few others in the stdlib, can give you the basics; beyond that, you'll need to search PyPI for appropriate modules.
Second, you have to send that sample stream to the headphone jack. There's no built-in support for this in Python. On some platforms, you can do this just by opening a special file and writing to it. But, more generally, you will need to find a third-party library on PyPI.
The simpler modules work for one particular type of audio system. Mac and Windows each have their own standards, and Linux has a half dozen different ones. There are also some Python modules that talk to higher-level wrappers; you may have to install and set up the wrapper, but once you do that, your code will work on any system.
So, let's work through one really simple example. Let's say you've got PortAudio set up on your system, and you've installed PyAudio to talk to it. This code will play square waves of 441 Hz and 220.5 Hz (roughly the A above and the A below middle C) for just under 1/4 of a second per bit (just because that's really easy).
import binascii
import pyaudio  # needed for playback below

a = open('/Users/kyle/Desktop/untitled folder/unix commands.txt', 'r')
c = a.read()
b = bin(int(binascii.hexlify(c), 16))

sample_stream = []
high_note = (b'\xFF'*100 + b'\0'*100) * 50  # 220.5 Hz square wave, ~0.23 s at 44100 Hz
low_note = (b'\xFF'*50 + b'\0'*50) * 100    # 441 Hz square wave, ~0.23 s at 44100 Hz

for bit in b[2:]:
    if bit == '1':
        sample_stream.extend(high_note)
    else:
        sample_stream.extend(low_note)

sample_buffer = b''.join(sample_stream)

p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(1),  # 1 byte = 8-bit samples
                channels=1,
                rate=44100,
                output=True)
stream.write(sample_buffer)
So you want to transmit digital information using audio? Basically, you want to implement a modem in software (even if it is implemented purely in software, it's still called a modem).
A modem (MOdulator-DEModulator) is a device that modulates an analog carrier signal to encode digital information, and also demodulates such a carrier signal to decode the transmitted information. The goal is to produce a signal that can be transmitted easily and decoded to reproduce the original digital data. Modems can be used over any means of transmitting analog signals, from light emitting diodes to radio. [wikipedia]
There are modems everywhere you need to transmit data over an analog media, be it sound, light or radio waves. Your TV remote probably is an infrared modem.
Modems implemented in pure software are called soft-modems. Most soft-modems I see in the wild are using some form of FSK modulation:
Frequency-shift keying (FSK) is a frequency modulation scheme in which digital information is transmitted through discrete frequency changes of a carrier wave. The simplest FSK is binary FSK (BFSK). BFSK uses a pair of discrete frequencies to transmit binary (0s and 1s) information. With this scheme, the "1" is called the mark frequency and the "0" is called the space frequency. The time domain of an FSK modulated carrier is illustrated in the figures to the right. [wikipedia]
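To make the FSK idea concrete, here is a toy modulator in numpy (my own sketch, not taken from any of the projects mentioned below): each bit becomes a short burst of the mark or space tone. The frequencies and baud rate are arbitrary example values, and a real modem would also keep the phase continuous between bits.

import numpy as np

FS = 44100                     # sample rate
BAUD = 300                     # bits per second
MARK, SPACE = 1200.0, 2200.0   # example tone frequencies for '1' and '0'

def fsk_modulate(bits):
    samples_per_bit = FS // BAUD
    t = np.arange(samples_per_bit) / FS
    bursts = [np.sin(2 * np.pi * (MARK if b == '1' else SPACE) * t) for b in bits]
    return np.concatenate(bursts).astype(np.float32)

signal = fsk_modulate('10110010')   # could be written to a .wav or played via PyAudio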
There are very interesting applications for data transmission through the atmosphere via sound waves - I guess it is what shopkick uses to verify user presence.
For Python check the GnuRadio project.
For a C library, look at the work of Steve Underwood (but please don't contact him with silly questions). I used his soft-modem to bootstrap a FAX to email gateway for Asterisk (a fax transmission is not much more than a B/W TIFF file encoded in audio for transmission over a phone line).
So, here is abarnert's code, updated to Python 3.
import binascii
import pyaudio

a = open('/home/ian/Python-programs/modem/testy_mcTest.txt', 'rb')
c = a.read()
b = bin(int(binascii.hexlify(c), 16))

sample_stream = []
high_note = (b'\xFF'*100 + b'\0'*100) * 50  # 220.5 Hz square wave, ~0.23 s at 44100 Hz
low_note = (b'\xFF'*50 + b'\0'*50) * 100    # 441 Hz square wave, ~0.23 s at 44100 Hz

for bit in b[2:]:
    if bit == '1':
        sample_stream.extend(high_note)
    else:
        sample_stream.extend(low_note)

# iterating over bytes in Python 3 yields ints, so collect them back into a bytes object
sample_buffer = bytes(sample_stream)

p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(1),
                channels=1,
                rate=44100,
                output=True)
stream.write(sample_buffer)
# stop stream (4)
stream.stop_stream()
stream.close()
# close PyAudio (5)
p.terminate()
If you're looking for a library that does this, I'd recommend libquiet. It uses an existing SDR library to perform its modulation, and it provides binaries that will offer you a pipe at one end and will feed the sound right to your soundcard using PortAudio at the other. It has GMSK for "over the air" low bitrate transmissions and QAM for cable-based higher bitrate.
So, I'm trying to use the Python Wave module to get an audio file and basically get all of the frames from it, examine them, and then write them back to another file. I tried to output the sound that I'm reading to another file just now, but it came out either as noise, or as no sound at all. So, I'm pretty sure that I'm not analyzing the file and getting the correct frames...? I'm dealing with a stereo 16-bit sound file. While I could use a simpler file to just understand the process, I eventually want to be able to accept any kind of sound file to work with, so I need to understand what the problem is.
I also noted that 32-bit sound files wouldn't be read by the Wave module - it gave me an error of "Unknown Format". Any ideas about that? Is it something I can bypass so that I could at least, for example, read 32-bit audio files, even if I can only 'render' 16-bit files?
I'm somewhat aware that wave files are interleaved between the left and right channels (the first sample is for the left channel, the second for the right, etc.), but how do I separate the channels? Here's my code. I cut out the output code to just see if I'm reading the files correctly. I'm using Python 2.7.2:
import scipy
import wave
import struct
import numpy
import pylab

fp = wave.open('./sinewave16.wav', 'rb')  # Problem loading certain kinds of wave files in binary?
samplerate = fp.getframerate()
totalsamples = fp.getnframes()
fft_length = 2048  # Guess
num_fft = (totalsamples / fft_length) - 2
temp = numpy.zeros((num_fft, fft_length), float)
leftchannel = numpy.zeros((num_fft, fft_length), float)
rightchannel = numpy.zeros((num_fft, fft_length), float)

for i in range(num_fft):
    tempb = fp.readframes(fft_length / fp.getnchannels() / fp.getsampwidth())
    #tempb = fp.readframes(fft_length)
    up = (struct.unpack("%dB"%(fft_length), tempb))
    #up = (struct.unpack("%dB"%(fft_length * fp.getnchannels() * fp.getsampwidth()), tempb))
    #print (len(up))
    temp[i,:] = numpy.array(up, float) - 128.0

temp = temp * numpy.hamming(fft_length)
temp.shape = (-1, fp.getnchannels())
fftd = numpy.fft.rfft(temp)
pylab.plot(abs(fftd[:,1]))
pylab.show()

# Frequency of an FFT should be as follows:
# The first bin in the FFT is DC (0 Hz), the second bin is Fs / N, where Fs is the sample rate and N is the size of the FFT. The next bin is 2 * Fs / N. To express this in general terms, the nth bin is n * Fs / N.
# (It would appear to me that n * Fs / N gives you the hertz, and you can use sqrt(real portion of number*r + imaginary portion*i) to find the magnitude of the signal
Currently, this will load the sound file, unpack it into a struct, and plot the sound file so that I can look at it, but I don't think it's getting all of the audio file, or it's not getting it correctly. Am I reading the wave file into the struct correctly? Are there any up-to-date resources on using Python to read and analyze wave / audio files? Any help would be greatly appreciated.
Perhaps you should try the SciPy io.wavfile module:
http://docs.scipy.org/doc/scipy/reference/io.html
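For example, here is a minimal sketch (assuming a 16-bit stereo file like the one described) that reads the file with scipy and separates the interleaved channels, which also covers the channel-splitting part of the question:

import numpy as np
from scipy.io import wavfile

samplerate, data = wavfile.read('./sinewave16.wav')     # stereo data has shape (n_frames, 2)
left = data[:, 0].astype(np.float64)
right = data[:, 1].astype(np.float64)
spectrum = np.fft.rfft(left[:2048] * np.hamming(2048))  # FFT of one 2048-sample window of the left channel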