How to manipulate wav file data in Python? - python

I'm trying to read a wav file, then manipulate its contents, sample by sample
Here's what I have so far:
import scipy.io.wavfile
import math
rate, data = scipy.io.wavfile.read('xenencounter_23.wav')
for i in range(len(data)):
data[i][0] = math.sin(data[i][0])
print data[i][0]
The result I get is:
0
0
0
0
0
0
etc
It is reading properly, because if I write print data[i] instead I get usually non-zero arrays of size 2.

The array data returned by wavfile.read is a numpy array with an integer data type. The data type of a numpy array can not be changed in place, so this line:
data[i][0] = math.sin(data[i][0])
casts the result of math.sin to an integer, which will always be 0.
Instead of that line, create a new floating point array to store your computed result.
Or use numpy.sin to compute the sine of all the elements in the array at once:
import numpy as np
import scipy.io.wavfile
rate, data = scipy.io.wavfile.read('xenencounter_23.wav')
sin_data = np.sin(data)
print sin_data
From your additional comments, it appears that you want to take the sine of each value and write out the result as a new wav file.
Here is an example that (I think) does what you want. I'll use the file 'M1F1-int16-AFsp.wav' from here: http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/Samples.html. The function show_info is just a convenient way to illustrate the results of each step. If you are using an interactive shell, you can use it to inspect the variables and their attributes.
import numpy as np
from scipy.io import wavfile
def show_info(aname, a):
print "Array", aname
print "shape:", a.shape
print "dtype:", a.dtype
print "min, max:", a.min(), a.max()
print
rate, data = wavfile.read('M1F1-int16-AFsp.wav')
show_info("data", data)
# Take the sine of each element in `data`.
# The np.sin function is "vectorized", so there is no need
# for a Python loop here.
sindata = np.sin(data)
show_info("sindata", sindata)
# Scale up the values to 16 bit integer range and round
# the value.
scaled = np.round(32767*sindata)
show_info("scaled", scaled)
# Cast `scaled` to an array with a 16 bit signed integer data type.
newdata = scaled.astype(np.int16)
show_info("newdata", newdata)
# Write the data to 'newname.wav'
wavfile.write('newname.wav', rate, newdata)
Here's the output. (The initial warning means there is perhaps some metadata in the file that is not understood by scipy.io.wavfile.read.)
<snip>/scipy/io/wavfile.py:147: WavFileWarning: Chunk (non-data) not understood, skipping it.
WavFileWarning)
Array 'data'
shape: (23493, 2)
dtype: int16
min, max: -7125 14325
Array 'sindata'
shape: (23493, 2)
dtype: float32
min, max: -0.999992 0.999991
Array 'scaled'
shape: (23493, 2)
dtype: float32
min, max: -32767.0 32767.0
Array 'newdata'
shape: (23493, 2)
dtype: int16
min, max: -32767 32767
The new file 'newname.wav' contains two channels of signed 16 bit values.

Related

How to read float values from binary data in Python?

I have a binary file in which data segments are interspersed. I know locations (byte offsets) of every data segment, as well as size of those data segments, as well as the type of data points (float, float32 - meaning that every data point is coded by 4 bytes). I want to read those data segments into an array like structure (for example, numpy array or pandas dataframe), but I have trouble doing so. I've tried using numpy's memmap, but it short circuits on the last data segment, and numpy's from file just gets me bizzare results.
Sample of the code:
begin=datadf["$BEGINDATA"][0] #datadf is pandas.df that has where data begins and its size
buf.seek(begin) #buf is the file that is opened in rb mode
size=datadf["$DATASIZE"][0]+1 #same as the above
data=buf.read(size) #this should get me that data segment, but in binary
Is there a way to reliably convert to float32 from this binary data.
For further clarification, I'm including printout of first 10 data points.
buf.seek(begin)
print(buf.read(40)) #10 points of float32 (4bytes) means 40
>>>b'\xa5\x10[#\x00\x00\x88#a\xf3\xf7A\x00\x00\x88#&\x93\x9bA\x00\x00\x88#\x00\x00\x00#\xfc\xcd\x08?\x1c\xe2\xbe?\x03\xf9\xa4?'
If it's any value, while there are 4 bytes (32 bit width) for each float point, every float point is capped to maximum value of 10 000
If you want a numpy.ndarray, you can just use numpy.frombuffer
>>> import numpy as np
>>> data = b'\xa5\x10[#\x00\x00\x88#a\xf3\xf7A\x00\x00\x88#&\x93\x9bA\x00\x00\x88#\x00\x00\x00#\xfc\xcd\x08?\x1c\xe2\xbe?\x03\xf9\xa4?'
>>> np.frombuffer(data, dtype=np.float32)
array([ 3.422891 , 4.25 , 30.993837 , 4.25 , 19.44685 ,
4.25 , 2. , 0.5343931, 1.4912753, 1.2888492],
dtype=float32)

Applying a simple function to CSV and save multiple csv files

I am trying to replicate the data by multiplying every value with a range of values and save the results as CSV.
I have created a function "Replicate_Data" which takes the input numpy array and multiply with a random value between a range. What is the best way to create a 100 files and save as P3D1 , P4D1 and so on.
def Replicate_Data(data: np.ndarray) -> np.ndarray:
Rep_factor= random.uniform(-3,7)
data1 = data * Rep_factor
return data1
P2D1 = Replicate_Data(P1D1)
np.savetxt("P2D1.csv", P2D1, delimiter="," , dtype = complex)
Here is an example you can use as reference.
I generate toy data named toy, then I make n random values using np.random.uniform and call it randos, then I multiply these two objects to form out using numpy broadcasting. You could also do this multiplication in a loop (the same one you save in, in fact): depending on the size of your input array it could be very memory intensive as I've written it. A more complete answer probably depends on the shape of your input data.
import numpy as np
toy = np.random.random(size=(2,2)) # a toy input array
n = 100 # number of random values
randos = np.random.uniform(-3,7,size=n) # generate 100 uniform randoms
# now multiply all elements in toy by the randoms in randos
out = toy[None,...]*randos[...,None,None] # this depends on the shape.
# this will work only if toy has two dimensions. Otherwise requires modification
# it will take a lot of memory... 100*toy.nbytes worth
# now save in the loop..
for i,o in enumerate(out):
name = 'P{}D1'.format(str(i+1))
np.savetxt(name,o,delimiter=",")
# a second way without the broadcasting (slow, better on memory)
# more like 2*toy.nbytes
#for i,r in enumerate(randos):
# name = 'P{}D1'.format(str(i+1))
# np.savetxt(name,r*toy,delimiter=",")

Reading a wav file with scipy and librosa in python

I am trying to load a .wav file in Python using the scipy folder. My final objective is to create the spectrogram of that audio file. The code for reading the file could be summarized as follows:
import scipy.io.wavfile as wav
(sig, rate) = wav.read(_wav_file_)
For some .wav files I am receiving the following error:
WavFileWarning: Chunk (non-data) not understood, skipping it.
WavFileWarning) ** ValueError: Incomplete wav chunk.
Therefore, I decided to use librosa for reading the files using the:
import librosa
(sig, rate) = librosa.load(_wav_file_, sr=None)
That is working properly for all cases, however, I noticed a difference in the colors of the spectrogram. While it was the same exact figure, however, somehow the colors were inversed. More specifically, I noticed that when keeping the same function for calculation of the specs and changing only the way I am reading the .wav there was this difference. Any idea what can produce that thing? Is there a default difference between the way the two approaches read the .wav file?
EDIT:
(rate1, sig1) = wav.read(spec_file) # rate1 = 16000
sig, rate = librosa.load(spec_file) # rate 22050
sig = np.array(α*sig, dtype = "int16")
Something that almost worked is to multiple the result of sig with a constant α alpha that was the scale between the max values of the signal from scipy wavread and the signal derived from librosa. Still though the signal rates were different.
This sounds like a quantization problem. If samples in the wave file are stored as float and librosa is just performing a straight cast to an int, and value less than 1 will be truncated to 0. More than likely, this is why sig is an array of all zeros. The float must be scaled to map it into range of an int. For example,
>>> a = sp.randn(10)
>>> a
array([-0.04250369, 0.244113 , 0.64479281, -0.3665814 , -0.2836227 ,
-0.27808428, -0.07668698, -1.3104602 , 0.95253315, -0.56778205])
Convert a to type int without scaling
>>> a.astype(int)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
Convert a to int with scaling for 16-bit integer
>>> b = (a* 32767).astype(int)
>>> b
array([ -1392, 7998, 21127, -12011, -9293, -9111, -2512, -42939,
31211, -18604])
Convert scaled int back to float
>>> c = b/32767.0
>>> c
array([-0.04248177, 0.24408704, 0.64476455, -0.36655782, -0.28360851,
-0.27805414, -0.0766625 , -1.31043428, 0.9525132 , -0.56776635])
c and b are only equal to about 3 or 4 decimal places due to quantization to int.
If librosa is returning a float, you can scale it by 2**15 and cast it to an int to get same range of values that scipy wave reader is returning. Since librosa is returning a float, chances are the values going to lie within a much smaller range, such as [-1, +1], than a 16-bit integer which will be in [-32768, +32767]. So you need to scale one to get the ranges to match. For example,
sig, rate = librosa.load(spec_file, mono=True)
sig = sig × 32767
If you yourself do not want to do the quantization, then you could use pylab using the pylab.specgram function, to do it for you. You can look inside the function and see how it uses vmin and vmax.
It is not completely clear from your post (at least for me) what you want to achieve (as there is also neither a sample input file nor any script beforehand from you). But anyways, to check if the spectrogram of a wave file has significant differences depending on the case that the signal data returned from any of the read functions is float32 or int, I tested the following 3 functions.
Python Script:
_wav_file_ = "africa-toto.wav"
def spectogram_librosa(_wav_file_):
import librosa
import pylab
import numpy as np
(sig, rate) = librosa.load(_wav_file_, sr=None, mono=True, dtype=np.float32)
pylab.specgram(sig, Fs=rate)
pylab.savefig('spectrogram3.png')
def graph_spectrogram_wave(wav_file):
import wave
import pylab
def get_wav_info(wav_file):
wav = wave.open(wav_file, 'r')
frames = wav.readframes(-1)
sound_info = pylab.fromstring(frames, 'int16')
frame_rate = wav.getframerate()
wav.close()
return sound_info, frame_rate
sound_info, frame_rate = get_wav_info(wav_file)
pylab.figure(num=3, figsize=(10, 6))
pylab.title('spectrogram pylab with wav_file')
pylab.specgram(sound_info, Fs=frame_rate)
pylab.savefig('spectrogram2.png')
def graph_wavfileread(_wav_file_):
import matplotlib.pyplot as plt
from scipy import signal
from scipy.io import wavfile
import numpy as np
sample_rate, samples = wavfile.read(_wav_file_)
frequencies, times, spectrogram = signal.spectrogram(samples,sample_rate,nfft=1024)
plt.pcolormesh(times, frequencies, 10*np.log10(spectrogram))
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.savefig("spectogram1.png")
spectogram_librosa(_wav_file_)
#graph_wavfileread(_wav_file_)
#graph_spectrogram_wave(_wav_file_)
which produced the following 3 outputs:
which apart from the minor differences in size and intensity seem quite similar, no matter the read method, library or data type, which makes me question a little, for what purpose need the outputs be 'exactly' same and how exact should they be.
I do find strange though that the librosa.load() function offers a dtype parameter but works anyways only with float values. Googling in this regard led to me to only this issue which wasn't much help and this issue says that that's how it will stay with librosa, as internally it seems to only use floats.
To add on to what has been said, Librosa has a utility to convert integer arrays to floats.
float_audio = librosa.util.buf_to_float(sig)
I use this to great success when producing spectrograms of Pydub audiosegments. Keep in mind, one of its arguments is the number of bytes per sample. It defaults to 2. You can read about it more in the documentation here. Here is the source code:
def buf_to_float(x, n_bytes=2, dtype=np.float32):
"""Convert an integer buffer to floating point values.
This is primarily useful when loading integer-valued wav data
into numpy arrays.
See Also
--------
buf_to_float
Parameters
----------
x : np.ndarray [dtype=int]
The integer-valued data buffer
n_bytes : int [1, 2, 4]
The number of bytes per sample in `x`
dtype : numeric type
The target output type (default: 32-bit float)
Returns
-------
x_float : np.ndarray [dtype=float]
The input data buffer cast to floating point
"""
# Invert the scale of the data
scale = 1./float(1 << ((8 * n_bytes) - 1))
# Construct the format string
fmt = '<i{:d}'.format(n_bytes)
# Rescale and format the data buffer
return scale * np.frombuffer(x, fmt).astype(dtype)

Adding Elements to numpy array Changes Input Value

Code:
from PIL import Image
import numpy as np
img = Image.open('test.tif')
imarray = np.zeros(shape = (34,23,18))
for i in range(34): # there are 34 images in the .tif file
for j in range(18): # each slice has size 18x23
for k in range(23):
try:
img.seek(i)
imarray[i,k,j] = img.getpixel((k,j))
except EOFError:
break
The purpose of this code is to accept .tif greyscale stacks. I want to be able to work with them as numpy arrays, so storing the original pixel values is essential.
This code successfully copies each slice to the np.array "imarray." However, it changes the values. For example, I printed all of the "img.getpixel" values for a given slice, and the values (type int) ranged between 2000 and 65500. However, the values in imarray (type float64) did not exceed 2800. I tried casting, ie:
imarray[0,j,i] = np.float64(img.getpixel((j,i)))
But it did not help. How can I revise this code to avoid my input data (img.getpixels) changing? If there are better alternatives to this approach, I'm happy to hear

Convert large array of floats to colors and write to binary efficiently

I am working with large NetCDF4 files (about 1 GB and up but less than my 8 GB memory for now). 99% of the time the data type will be a float32. I want to map these values to an array of RGB colors which I will then write to a binary file to be read by another application for viewing. Because I only need 1 byte for each R, G, and B, I want to have an array of np.uint8 to represent this. In the end the array will take up 25% less space than the floats. However, as the original data is big, I don't want to keep both the original data and the color data in memory at the same time. For now I provide a color for the low value and the color for the high value. The problem is that in my program for a short period time, the color data consists of floats instead of np.uint8, which leads to taking up 3 times as much memory as the original data. Is there a way to skip the float conversion or at least only have one float in memory so that I don't take up this much memory? I have provided relevant code below:
from netCDF4 import Dataset
import numpy as np
import dask.array as da
import gc
import time
import sys
# Read file path
file_path = sys.argv[1]
# Default colors is blue for low and red for high
lowColor = np.array([0, 0, 255], dtype=int)
highColor = np.array([255, 0, 0], dtype=int)
data = Dataset(file_path)
allVariables = data.variables
# Sometimes we have time_bnds, lat_bnds, etc.
# Keep anything that doesn't have 'bnds'
varNames = list(filter(lambda x: 'bnds' not in x, list(allVariables.keys())))
# Remove the dimensions
varNames = list(filter(lambda x: x not in data.dimensions, varNames))
var = varNames[0]
flattened = allVariables[var][:].flatten()
origShape = allVariables[var].shape
if isinstance(flattened, np.ma.core.MaskedArray):
flattened = flattened.filled(np.nan)
# Find the minimum value and the range of values.
# Using these two we can make a percentage of how
# far 'up' each value and simply convert colors
# based on that. Because there's a chance of the data
# having NaNs, I can't use ptp().
lowVal = np.nanmin(flattened)
ptp = np.nanmax(flattened) - lowVal
# Subtract the min from each value and divide by ptp
# and add a dimension for dot product later.
percents = ((flattened - lowVal) / ptp)[np.newaxis, :]
# Remove flattened from memory as it is not needed anymore
flattened = None
gc.collect()
# Calculate the color difference
diff = (highColor - lowColor)[np.newaxis, :].T
# Do the dot product to create a list of colors
# Transpose so each color is each row. Also
# add the low color
colors = lowColor + np.dot(diff, percents).T # All floats here
# Round each value and cast to uint8 and finally reshape to
# the original data
colors = np.round(colors).astype(np.uint8)
colors = colors.reshape(origShape + (3,))
colors.tofile('colors_' + allVariables[var].name + '.bin')

Categories

Resources