I am writing a program which takes audio input via PyAudio, analyses the input via FFT, extracts the dominant frequency, and then generates a sine wave as a byte string based on the obtained frequency. All of those steps work (kinda).
However, when it comes to playing the sine wave via a PyAudio callback stream, the signal gets very choppy. If I play the audio in a blocking stream, however, the output sounds fine.
This is the function that generates the sine wave (note that I left out quite a bit of surrounding code here):
import struct
import numpy as np

self.waveData = b''  # needs to be reset here, otherwise the content grows with every call
for i in range(int(fs * 0.01)):  # generate just a small sample of 0.01 s duration
    pcmValue = int(volume * np.sin(2 * np.pi * frequency * i / fs))  # i replaces self.x
    self.waveData += struct.pack('h', pcmValue)
self.waveData *= int(duration / 0.01)  # repeat the 0.01 s chunk to fill the requested duration
So I don't think the problem lies there.
This is the stream and callback function (they both sit in the same class obviously):
def __init__(self):
    self.stream2 = p.open(format=pyaudio.paInt16,
                          channels=1,
                          rate=fs,
                          output=True,
                          output_device_index=5,
                          frames_per_buffer=2**13,
                          stream_callback=self.actuallyplaystream)

def actuallyplaystream(self, in_data, frame_count, time_info, status):
    return (self.waveData, pyaudio.paContinue)
In the rest of the program I just use
self.stream2.start_stream()
and .stop_stream() to start and stop the playback. As said, the playback in and of itself works OK. However, the choppiness occurs at intervals tied to the frames_per_buffer=2**13 setting used when creating the stream. In plain English: when I increase the buffer chunk size mentioned above, the intervals between each 'chop' decrease, and they increase when I decrease the chunk size.
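For what it's worth, I wonder whether the callback needs to hand back exactly frame_count frames instead of the whole buffer. Here is a sketch of what I imagine that would look like, keeping a read offset between calls (self.pos is a hypothetical attribute initialised to 0 in __init__, and waveData is assumed non-empty; I have not verified this fix):

def actuallyplaystream(self, in_data, frame_count, time_info, status):
    nbytes = frame_count * 2  # paInt16 mono -> 2 bytes per frame
    chunk = b''
    while len(chunk) < nbytes:
        take = self.waveData[self.pos:self.pos + nbytes - len(chunk)]
        if not take:
            self.pos = 0  # wrap around so the sine stays continuous
            continue
        chunk += take
        self.pos += len(take)
    return (chunk, pyaudio.paContinue)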
This seems like a very easy problem to solve; however, I can't seem to find the answer in any SO post or even in the PyAudio docs. I would greatly appreciate any help, or even just a hint towards solving this issue.
Please don't hesitate to ask if you are confused by the code snippets. I will figure out a way to link to all my code if that's needed.
Thank you
(Also: if this question is inappropriate, I will immediately remove it)
I'm building a simple Python application that involves altering the speed of an audio track.
(I acknowledge that changing the frame rate of audio also makes the pitch sound different; I do not care about the pitch being altered.)
I have tried the solution from abhi krishnan using pydub, which looks like this:
from pydub import AudioSegment

sound = AudioSegment.from_file(…)

def speed_change(sound, speed=1.0):
    # Manually override the frame_rate. This tells the computer how many
    # samples to play per second.
    sound_with_altered_frame_rate = sound._spawn(sound.raw_data, overrides={
        "frame_rate": int(sound.frame_rate * speed)
    })
    # Convert the sound with the altered frame rate to a standard frame rate
    # so that regular playback programs will work right. They often only
    # know how to play audio at standard frame rates (like 44.1 kHz).
    return sound_with_altered_frame_rate.set_frame_rate(sound.frame_rate)
However, the audio with the changed speed sounds distorted or crackly, which does not happen when I do the same thing in Audacity. I hope to find a way to reproduce in Python how Audacity (or other digital audio editors) changes the speed of audio tracks.
I presume that the quality loss is caused by the original audio having a low frame rate (8 kHz), and that .set_frame_rate(sound.frame_rate) tries to sample the speed-altered audio at that original, low frame rate. Simple attempts at setting the frame rate of the original audio, of the speed-altered audio, and of the exported audio didn't work out.
Is there a way in Pydub or in other Python modules that perform the task in the same way Audacity does?
Assuming what you want to do is play audio back at, say, 1.5x the speed of the original: this is equivalent to resampling the audio down to 2/3 of the original number of samples and pretending that the sampling rate hasn't changed. Assuming this is what you are after, I suspect most DSP packages support it (search for "audio resampling" as the key phrase).
You can try scipy.signal.resample_poly(). Note that raw_data is a byte string, so it needs to be converted to a numeric array first (assuming 16-bit mono samples here):

import numpy as np
from scipy.signal import resample_poly

samples = np.frombuffer(sound.raw_data, dtype=np.int16)
dec_data = resample_poly(samples, up=2, down=3)
dec_data should have 2/3 of the number of samples of the original raw_data. If you play the dec_data samples at the sound's original sampling rate, you should get a sped-up version. The downside of resample_poly is that you need a rational factor, and a large numerator or denominator makes the output less ideal. You can also try scipy's resample function, or look for other packages that support audio resampling.
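If you want the result back in pydub for playback or export, a minimal sketch could look like this (assuming 16-bit mono audio; sped_up is a name I made up):

import numpy as np
from pydub import AudioSegment

sped_up = AudioSegment(
    data=dec_data.astype(np.int16).tobytes(),
    frame_rate=sound.frame_rate,      # keep the original rate so it plays faster
    sample_width=sound.sample_width,  # 2 bytes for 16-bit audio
    channels=sound.channels,
)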
I am trying to do a project, and in part of the project I have the user say a word which gets recorded. This word then gets the silence around it cut out, and there is a button that plays back their word without the silence. I am using librosa's librosa.effects.trim command to achieve this.
For example:
def record_audio():
    global myrecording
    global yt
    playsound(beep1)
    myrecording = sd.rec(int(seconds * fs), samplerate=fs, channels=1)
    sd.wait()
    playsound(beep2)
    # trimming the audio
    yt, index = librosa.effects.trim(myrecording, top_db=60)
However, when I play the audio back, I can tell that it is not trimming the recording. The variable explorer shows that myrecording and yt are the same length, and I can hear that nothing was removed when I play back what is supposed to be the trimmed clip. I don't get any error messages when this happens either. Is there any way to get librosa to actually clip the audio? I have tried adjusting top_db, and that did not fix it. Aside from that, I am not quite sure what I could be doing wrong.
For a real answer, you'd have to post a sample recording so that we could inspect what exactly is going on.
In lieu of that, I'd like to refer to this GitHub issue, where one of the main authors of librosa offers advice for a very similar issue.
In essence: You want to lower the top_db threshold and reduce frame_length and hop_length. E.g.:
yt, index = librosa.effects.trim(myrecording, top_db=50, frame_length=256, hop_length=64)
Decreasing hop_length effectively increases the resolution for trimming. Decreasing top_db makes the function less sensitive, i.e., low-level noise is also regarded as silence. Using a computer microphone, you probably do have quite a bit of low-level background noise.
If this all does not help, you might want to consider using SoX or its Python wrapper, pysox. It also has a trim function.
Update: Look at the waveform of your audio. Does it have a spike somewhere at the beginning? Some crackling sound, perhaps? That will keep librosa from trimming correctly. Perhaps manually throwing away the first second (= fs samples) and then trimming solves the issue:
librosa.effects.trim(myrecording[fs:], top_db=50, frame_length=256, hop_length=64)
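To eyeball the waveform, a quick matplotlib sketch (assuming myrecording is the NumPy array returned by sd.rec) could be:

import matplotlib.pyplot as plt

plt.plot(myrecording)  # look for a spike or crack near sample 0
plt.show()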
I'm using Python VLC to build a custom playback app in PyQt. I have painted a nice custom slider to track along with the video, but I've hit a bit of an annoying problem.
No matter how often I tell my slider to update, it's quite glitchy (jumping every 1/4 second or so) and looks choppy (just the timeline, not the video).
Digging into it, I learned that
media_player.get_position()
has quite a low refresh rate. It returns the same value quite often, then jumps a large amount the next time it gives a new value.
So I ran some test metrics and found that it tends to update every 0.25-0.3 seconds. I now have a system that basically stores the last returned value, the system time at which a new value last arrived, and the size of the last jump between returned values, and does some basic math with those to fake linear timeline data between polls, which makes for a very smooth timeline slider.
The problem is that this assumes my measured interval of 0.25-0.3 seconds is consistent across machines, hardware, video frame rates, etc.
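For reference, a stripped-down sketch of the interpolation idea (the names and structure are simplified from my actual code):

import time

class SmoothedPosition:
    def __init__(self, player):
        self.player = player
        self.last_pos = 0.0   # last value returned by get_position()
        self.last_time = 0.0  # wall-clock time when that value last changed
        self.rate = 0.0       # observed position change per second

    def value(self):
        pos = self.player.get_position()
        now = time.monotonic()
        if pos != self.last_pos:
            if self.last_time:
                # a fresh value arrived: estimate how fast the position advances
                self.rate = (pos - self.last_pos) / (now - self.last_time)
            self.last_pos, self.last_time = pos, now
        # extrapolate linearly between VLC's sparse updates
        return self.last_pos + self.rate * (now - self.last_time)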
Does anyone know of a better fix?
Maybe a way to increase the poll rate of VLC to give me better data to begin with - or some better math to handle smoothing?
Thanks
get_position() returns a value between 0.0 and 1.0, essentially a percentage of the current position measured against the total running time.
Instead you can use get_time() which returns the current position in 1000ths of a second.
i.e.
print(self.player.get_time()/1000) would print the current position in seconds.
You could also register a callback for the VLC event EventType.MediaPlayerTimeChanged, as mentioned in the other answer given by @mtz.
i.e.
Where self.player is defined as:
self.Instance = vlc.Instance()
self.player = self.Instance.media_player_new()
Then:
self.vlc_event_manager = self.player.event_manager()
self.vlc_event_manager.event_attach(vlc.EventType.MediaPlayerTimeChanged,
                                    self.media_time_changed)

def media_time_changed(self, event):
    print(event.u.new_time/1000)
    print(self.player.get_time()/1000)
    print(self.player.get_position())
Try using the libvlc_MediaPlayerPositionChanged or libvlc_MediaPlayerTimeChanged mediaplayer events instead.
https://www.videolan.org/developers/vlc/doc/doxygen/html/group__libvlc__event.html#gga284c010ecde8abca7d3f262392f62fc6a4b6dc42c4bc2b5a29474ade1025c951d
I am trying to transmit and receive a BPSK signal from an Ettus Research N210 to an Ettus Research B200. I run my received signal through gain control, clock sync, and a PLL, then try to demodulate the signal.
Here is my flowchart.
In simulation (passing the signal through a channel block instead of transmitting from one radio to the other), this flowchart works fine. Below are the results of the simulation. As you can see, the receiver sees the rotated constellation and the processing corrects for this. Everything is fine and the packets are successfully decoded.
However, when I transmit and receive from my two real radios, I no longer receive signals that resemble 2-PSK. Instead, the constellation plots of the RX signal just look like blobs.
Here is my flowchart again with the USRP blocks un-commented.
And here are the results of the transmission and receive.
I am very confused by the lack of a constellation pattern in the received signal. Sometimes when I send a packet, the RX constellation takes on a more orderly, oval-looking shape, but it does not look like a line. Unfortunately, I was unable to catch the oval pattern in a screenshot since it returns to the blob pattern very quickly.
I do not think this is a hardware issue because I have successfully used these radios before for UHF GMSK stuff.
Is there something wrong with my timing recovery / processing?
Thanks, y'all, in advance for any and all help.
Found the issue. I had set my sampling rate lower than the USRP's minimum supported sampling rate. After a day of frustration, I changed my sampling rate to 320k, adjusted a few things in my processing block, and now things work and I get a nice-looking constellation.
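For anyone hitting the same wall: a hypothetical way to catch this in a Python flowgraph (not taken from my own code) is to read back the rate the UHD driver actually applied, since it coerces unsupported rates to the nearest supported one:

from gnuradio import uhd

usrp = uhd.usrp_source(",".join(("", "")),
                       uhd.stream_args(cpu_format="fc32", channels=[0]))
usrp.set_samp_rate(320e3)
actual = usrp.get_samp_rate()
if abs(actual - 320e3) > 1.0:
    print("Requested rate was coerced to %f Hz; pick a supported rate" % actual)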
Here are my updated (working) flowchart and plots.
I want to make certain frequencies in a sequence of audio data louder. I have already analyzed the data using FFT and have gotten a value for each audio frequency in the data. I just have no idea how I can use the frequencies to manipulate the sound data itself.
From what I understand so far, data is encoded in such a way that the difference between every two consecutive readings determines the audio amplitude at that time instant. So making the audio louder at that time instant would involve making the difference between the two consecutive readings greater. But how do I know which time instants are involved with which frequency? I don't know when the frequency starts appearing.
(I am using Python, specifically PyAudio for getting the audio data and Num/SciPy for the FFT, though this probably shouldn't be relevant.)
You are looking for a graphic equalizer. Some quick Googling turned up rbeq, which seems to be a plugin for Rhythmbox written in Python. I haven't looked through the code to see whether the actual EQ part is written in Python or just controls something in the host, but I recommend looking through their source.
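As a rough illustration of the idea in the question (scale FFT bins, then transform back), here is a minimal sketch; boost_band and its parameters are hypothetical names, and a real equalizer would process overlapping windows rather than the whole signal at once:

import numpy as np

def boost_band(samples, fs, f_lo, f_hi, gain):
    # Scale every FFT bin whose frequency falls in [f_lo, f_hi] by `gain`,
    # then transform back to the time domain.
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / fs)
    spectrum[(freqs >= f_lo) & (freqs <= f_hi)] *= gain
    return np.fft.irfft(spectrum, n=len(samples))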