Playing audio from a constantly changing numpy array - python

I have a numpy array which is continuously growing in size, with a function adding data to it every so often. This array contains sound data, which I would like to play not after the array is complete, but while it is still growing. Is there a way I can do that using PyAudio? I have tried implementing a callback, but without success: it gives me choppy, delayed audio.

You could perhaps intercept the event or pipeline that appends the data to your array.
To get rid of the choppiness you will need some kind of intermediate buffer. Imagine that data arrives at random intervals: sometimes several data points are added at once, sometimes nothing arrives for a while, but over longer timescales there is some average inflow rate. Buffering like this is standard practice in streaming services to improve video quality.
Adjust the buffer size and this should eliminate the choppiness. It will of course introduce an initial delay in playing the data, i.e. it won't be truly "live", but it can be close to live with far less choppiness.
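A minimal sketch of such an intermediate buffer, using only numpy and a lock (the `AudioBuffer` name and chunk size are my own choices, not from the question; a PyAudio callback would simply return `buf.read_chunk().tobytes()` together with `pyaudio.paContinue`):

```python
import numpy as np
from threading import Lock

class AudioBuffer:
    """Intermediate buffer between the producer that grows the array
    and an audio callback that consumes fixed-size chunks."""

    def __init__(self, chunk_size=1024):
        self.chunk_size = chunk_size
        self._lock = Lock()
        self._data = np.empty(0, dtype=np.float32)

    def write(self, samples):
        """Called whenever new sound data is appended."""
        with self._lock:
            self._data = np.concatenate([self._data, samples])

    def read_chunk(self):
        """Called by the audio callback; pads with silence on
        underrun so playback never blocks."""
        with self._lock:
            chunk = self._data[:self.chunk_size]
            self._data = self._data[self.chunk_size:]
        if len(chunk) < self.chunk_size:
            chunk = np.pad(chunk, (0, self.chunk_size - len(chunk)))
        return chunk

buf = AudioBuffer(chunk_size=4)
buf.write(np.array([0.1, 0.2, 0.3], dtype=np.float32))
chunk = buf.read_chunk()
print(len(chunk))  # 4 -- one sample of silence was padded in (underrun)
```

The larger the chunk size (and the more data you let accumulate before starting playback), the larger the initial delay but the smoother the output.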

Related

Resampling without changing pitch and ratio

I'm doing speech recognition and denoising. In order to feed the data to my model I need to resample it and make it 2 channels, although I don't know the optimal resampling rate for each sound. When I use a fixed resampling rate (resr) like 20000 or 16000, sometimes it works and sometimes it makes the pitch wrong or makes the audio slow. How does resampling work in this case? Do I need an optimizer?
Also, what can I do if I have a phone call and one person's voice is so quiet that it gets recognized as noise?
This is my code:
num_channels = sig.shape[0]
# Resample first channel
resig = torchaudio.transforms.Resample(sr, resr)(sig[:1, :])
print(resig.shape)
if num_channels > 1:
    # Resample the second channel and merge both channels
    retwo = torchaudio.transforms.Resample(sr, resr)(sig[1:, :])
    resig = torch.cat([resig, retwo])
"I don't know the optimized resampling rate for each sound"
Sample rate is not a parameter you tune for each audio file; rather, you should use the same sample rate that was used to train the speech recognition model.
"sometimes it works and sometimes it makes the pitch wrong or makes it slow."
Resampling, when properly done, does not alter pitch or speed. My guess is that you are saving the resulting data with the wrong sample rate. The sample rate is not an arbitrary number you can pick; you have to pick one that conforms to the system you are working with.
That said, the proper way to do resampling, regardless of the number of channels, is to simply pass the waveform to the torchaudio.functional.resample function with the original and target sample rates. The function processes multiple channels at the same time, so there is no need to run resampling separately for each channel.
Then, if you know the sample rate of the input audio beforehand, and all the audio you process has the same sample rate, using torchaudio.transforms.Resample will make the process faster, because it caches the convolution kernel used for resampling.
resampler = torchaudio.transforms.Resample(original_sample_rate, target_sample_rate)
for sig in signals:
    resig = resampler(sig)
    # process the resulting resampled signal

numpy.load gets stuck, how to debug?

I have a function that generates data as a numpy array, but first checks if the data was previously generated and just loads it.
if os.path.exists(SUBJECT_DATA_PATH):
    print('Found prepared patches. Loading..')
    vol = np.load(SUBJECT_DATA_PATH)
    return vol
This is fast when there are ~10 arrays to load, but for some reason, when I increase the number to around 50, the loading gets increasingly slow, up to a point where it seems to freeze...
The size of the arrays is very manageable and I can generate the 50 arrays without any problem. If I don't load the data and generate it on the fly instead, it is done in seconds. There seems to be something about np.load() that gets VERY slow quickly...
Any obvious hints?

What is the best way to access specific tensors in a shuffled queue?

I am processing video data in Python using TensorFlow and want to run a loss calculation that uses temporal information from the current frame and the frames before and after it. After the images are read in, they are shuffled using tf.train.shuffle_batch, as is necessary for training. Later, however, I want to access the frames before and after the current one. Is there a way to access the specific tensors for those frames by maintaining (for want of a better phrase) a pointer to them?
At the moment I read in every frame three times: once for itself and once each for the frame before and after it, so that the triple can be shuffled together. This seems inefficient, since I'm reading in and storing the same frame data multiple times.
No, there is no other way than the one you already implemented. Shuffling uses a limited buffer in which items are stored and from which they are randomly sampled. If you shuffle individual frames, you don't even have a guarantee that the three frames are in the queue at the same time, let alone a way to know where they end up in it.
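A toy model of that buffer-based shuffling (not TensorFlow's actual implementation, just the sampling scheme the answer describes) makes the problem visible: once a frame enters the buffer, its output position depends on random draws you cannot predict.

```python
import random

def shuffle_buffer(stream, buffer_size, rng=None):
    """Mimic a shuffle queue: fill a bounded buffer, then emit a
    randomly chosen element each time a new one arrives."""
    rng = rng or random.Random(0)
    buffer = []
    for item in stream:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            yield buffer.pop(rng.randrange(len(buffer)))
    while buffer:  # drain what is left when the stream ends
        yield buffer.pop(rng.randrange(len(buffer)))

frames = list(range(10))
shuffled = list(shuffle_buffer(iter(frames), buffer_size=4))
print(sorted(shuffled) == frames)  # True: same frames, unpredictable order
```

This is why shuffling pre-assembled (previous, current, next) triples, as the question already does, is the workable approach: the triple travels through the buffer as one unit.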

Most performant data structure in Python to handle live streaming market data

I am about to handle live streaming stock market data, hundreds of "ticks" (dicts) per second, store them in an in-memory data structure and analyze the data.
I was reading up on pandas and got pretty excited about it, only to learn that pandas' append is not recommended, because it copies the whole DataFrame on each individual append.
So it seems pandas is pretty much unusable for real-time handling and analysis of high-frequency streaming data, e.g. financial or sensor data.
So I'm back to native Python, which is quite OK. To save RAM, I am thinking about storing the last 100,000 data points or so on a rolling basis.
What would be the most performant Python data structure to use?
I was thinking of using a list, inserting data point number 100,001 and then deleting the first element, as in del list[0]. That way I can keep a rolling history of the last 100,000 data points, but my indices will grow larger and larger. A native "rolling" data structure (as in C, with a 16-bit index that increments without overflow checks) does not seem possible in Python?
What would be the best way to implement my real-time data analysis in Python?
The workflow you describe makes me think of a deque: basically a list that allows appending on one end (e.g. the right) while popping (fetching/removing) items off the other end (e.g. the left). The reference documentation even has a short list of deque recipes illustrating such common use cases as implementing tail or maintaining a moving average (as a generator).
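A bounded deque gives you the rolling window in one line: with maxlen set, appending to a full deque silently discards the item at the opposite end, so there is no del list[0] and no ever-growing index (the tick contents below are made up for illustration):

```python
from collections import deque

ticks = deque(maxlen=100_000)  # oldest tick is evicted automatically

for i in range(100_005):
    ticks.append({"ts": i, "price": 100.0 + i})

print(len(ticks))      # 100000 -- never exceeds maxlen
print(ticks[0]["ts"])  # 5 -- the five oldest ticks were dropped
```

Both append and the implicit eviction are O(1), which is what makes deque suitable for this per-tick workload; random access by index is O(n) in the middle, so for heavy windowed analytics you may still want to copy the window into a numpy array periodically.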

Fast slicing .h5 files using h5py

I am working with .h5 files with little experience.
In a script I wrote, I load data from an .h5 file. The shape of the resulting array is [3584, 3584, 75], where 3584 is the number of pixels along each spatial axis and 75 is the number of time frames. Loading the data and printing the shape takes 180 ms. I obtained this time using os.times().
If I now want to look at the data at a specific time frame I use the following piece of code:
data_1 = data[:, :, 1]
The slicing takes a lot of time (1.76 s). I understand that my 2D array is huge, but at some point I would like to loop over time, which will take very long since I'm performing this slice inside the for loop.
Is there a more effective/less time consuming way of slicing the time frames or handling this type of data?
Thank you!
Note: I'm making assumptions here, since I'm unfamiliar with .h5 files and the Python code that accesses them.
I think that when you "load" the array, you're not actually loading an array. Instead, an object is constructed on top of the file. It probably reads the dimensions and information about how the file is organized, but it doesn't read the whole file.
That object mimics an array so well that when you later perform the slice operation, the normal Python slice syntax works, but only at that point is the actual data read. That's why the slice takes so long compared to "loading" all the data.
I arrive at this conclusion as follows.
If you're reading 75 frames of 3584x3584 pixels, I'm assuming they're uncompressed (H5 seems to be just a raw dump of data). In that case, 75 * 3584 * 3584 = 963,379,200, which is around 918 MB of data. Couple that with you "reading" it in 180 ms, and we get this calculation:
918 MB / 180 ms = 5.1 GB/second reading speed
Note that this number assumes 1-byte pixels, which is itself unlikely.
This speed thus seems highly implausible, as even the best SSDs today reach well below 1 GB/sec.
It seems much more plausible that an object is simply constructed on top of the file and that the slice operation incurs the cost of reading at least one frame's worth of data.
If we divide the speed by 75 to get a per-frame figure, we get 68 MB/sec for 1-byte pixels, and with 24- or 32-bit pixels up to roughly 270 MB/sec. Much more plausible.
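The laziness hypothesis is easy to check with h5py itself. A small sketch (the file name demo.h5, the dataset name frames, and the tiny stand-in shape are all made up for illustration):

```python
import numpy as np
import h5py

# Write a small stand-in for the real 3584 x 3584 x 75 dataset.
with h5py.File("demo.h5", "w") as f:
    f.create_dataset("frames", data=np.arange(8 * 8 * 5).reshape(8, 8, 5))

with h5py.File("demo.h5", "r") as f:
    dset = f["frames"]     # a lazy h5py Dataset: no pixel data read yet
    frame = dset[:, :, 1]  # the read from disk happens here, at slice time
print(frame.shape)         # (8, 8) -- now a real in-memory numpy array
```

If the time axis is sliced often, storing the dataset chunked along that axis (the chunks argument of create_dataset), or with time as the first dimension, typically makes per-frame reads much cheaper, since HDF5 can then read contiguous blocks instead of strided samples.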
