Data changed after saving with soundfile and reading with librosa - python

I am processing an audio file with librosa as:
import librosa
import soundfile as sf
y, sr = librosa.load('test.wav', sr=22050)
y_processed = some_processing(y)
sf.write('test_processed.wav', y_processed , sr)
y_read, _ = librosa.load('test_processed.wav', sr=22050)
Now the issue is that y_processed and y_read do not match. My understanding is that this comes from some encoding done by soundfile library. Why is this happening and how can I get from y_processed to y_read without saving?

According to this article, librosa.load(), among other things, normalizes the bit depth to the range -1 to 1.
I experienced the same problem as you did, where the min and max values of the "loaded" signal were much closer to each other.
Since I don't know exactly how your data differs, this may not help you, but it has helped me.
y_processed_buf = librosa.util.buf_to_float(y_processed)
This seems to be the culprit, which normalizes your values (source code). It is also called during librosa.load(), which is how I stumbled over it.
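Here is a minimal in-memory sketch of that round trip, assuming the mismatch comes from soundfile writing WAV files as 16-bit PCM by default (you can check with sf.default_subtype('WAV')). Quantizing to int16 yourself and rescaling with buf_to_float should take you from y_processed to y_read without saving:
import numpy as np
import librosa
y_int16 = (y_processed * 32767).astype(np.int16)  # what a PCM_16 write stores
y_roundtrip = librosa.util.buf_to_float(y_int16)  # what librosa.load() gives back
Alternatively, passing subtype='FLOAT' to sf.write() should preserve the float data exactly, so the round trip no longer changes it.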

Related

Is there a way to add gain to an audio signal with Librosa in python?

I am currently working on augmenting audio in Python. I've been using Librosa due to its speed and simplicity, but I need to fall back on PyDub for some other utilities such as applying gain.
Is there a mathematical way to add gain to the Numpy array provided by librosa.load? In PyDub it is quite easy, but I have to constantly convert from PyDub's get_array_of_samples() to np.array and then to the proper 32-bit float representation on the [-1, 1) scale (which Librosa uses by default). I'd rather keep it all in one library for simplicity.
Also, normalizing an audio signal to 0 dB gain beforehand would be useful too. I am a bit new to a lot of the terminology used in audio signal processing.
This is what I am currently doing. Down the road I would like to make this a class method which starts from librosa's numpy array, so if there is a way to mathematically add a specified gain in a certain unit to a numpy array from librosa, that would be ideal.
Thanks
import librosa
import numpy as np
from pydub import AudioSegment, effects
pydub_audio = AudioSegment.from_file(audio_file_path)
pydub_audio = pydub_audio.set_frame_rate(16000) # resample to a 16 kHz frame rate
print("Original dBFS is {}".format(pydub_audio.dBFS))
pydub_audio = pydub_audio.apply_gain(20) # apply 20 dB of gain to introduce clipping
#pydub_audio = effects.normalize(pydub_audio)
print("New dBFS is {}".format(pydub_audio.dBFS))
pydub_array = pydub_audio.get_array_of_samples()
pydub_array = np.array(pydub_array)
print("PyDub audio type is {}".format(pydub_array.dtype))
pydub_array_32bitfloat = pydub_array.astype(np.float32, order = 'C') / 32768 # rescaling to between [-1, 1] like librosa
print("Rescaled Pydub type is {}".format(pydub_array_32bitfloat.dtype))
import soundfile as sf
sf.write(r"test_pydub_gain.wav", pydub_array_32bitfloat, samplerate = 16000, format = 'wav')
Thinking about it (if I am not wrong), mathematically the gain is:
dBFS = 20 * log10(level2 / level1)
so I would multiply all elements of the array by
10**(dBFS / 20) to apply the gain.
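A minimal NumPy sketch of that idea, assuming y is the float32 array from librosa.load (the clip at the end is my addition, to avoid wrap-around when saving):
import numpy as np
def apply_gain(y, gain_db):
    # multiply by the linear factor corresponding to gain_db decibels
    return y * 10.0 ** (gain_db / 20.0)
def normalize_peak(y):
    # scale so the peak sits at 0 dBFS, i.e. max(|y|) == 1.0
    return y / np.max(np.abs(y))
y_louder = np.clip(apply_gain(y, 20.0), -1.0, 1.0)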

Is there any problem with the OpenSlide.read_region function?

I am using the Python API of the OpenSlide package to read some NDPI files. When I use the read_region function, it sometimes returns an odd image. What could be going wrong?
I have tried reading the full image, and that works well. Therefore, I think there is no problem with the original file.
from openslide import OpenSlide
import cv2
import numpy as np
slide = OpenSlide('/Users/xiaoying/django/ndpi-rest-api/slide/read/21814102D-PAS - 2018-05-28 17.18.24.ndpi')
image = slide.read_region((1, 0), 6, (780, 960))
image.save('image1.png')
The output looks strange.
As the read_region documentation says, the x and y parameters are always in the coordinate space of level 0. For the behavior you want, you'll need to multiply those parameters by the downsample of the level you're reading.
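For example (a sketch; level_downsamples gives the scale factor for each level):
level = 6
ds = slide.level_downsamples[level]
# read_region takes (x, y) in level-0 coordinates, so scale the
# level-6 location (1, 0) up by that level's downsample factor
image = slide.read_region((int(1 * ds), int(0 * ds)), level, (780, 960))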
This appears to be a version-related bug; see also
https://github.com/openslide/openslide/issues/291#issuecomment-722935212
The problem seems to relate to libpixman versions 0.38.x. There is a workaround section written by GunnarFarneback suggesting to load a different version first, e.g.
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libpixman-1.so.0.34.0
Update: an easier solution is available.
We are using Python 3.6.8+ and this did the trick for us: conda install pixman=0.36.0

Error passing wav file to IPython.display

I am new to Python but I am studying it as a programming language for DSP. I recorded a wav file and have been trying to play it back using IPython.display.Audio:
import IPython.display
from scipy.io import wavfile
rate, s = wavfile.read('h.wav')
IPython.display.Audio(s, rate=rate)
But this gives the following error:
struct.error: ushort format requires 0 <= number <= 0xffff
I tried installing FFmpeg but it hasn't helped.
That's not a very useful error message; it took a bit of debugging to figure out what was going on! It is caused by the "shape" of the matrix returned from wavfile being the wrong way around.
The docs for IPython.display.Audio say it expects a:
Numpy 2d array containing waveforms for each channel. Shape=(NCHAN, NSAMPLES).
If I read a (stereo) wav file I have lying around:
rate, samples = wavfile.read(path)
print(samples.shape)
I get (141120, 2), showing this is of shape (NSAMPLES, NCHAN). Passing this array directly to Audio, I get a similar error to yours. Transposing the array flips these around, making it compatible with this method. The transpose of a matrix in Numpy is accessed via the .T attribute, e.g.:
IPython.display.Audio(samples.T, rate=rate)
works for me.
Thank you for your answer; it helped me.
Below is my code; maybe it can help someone.
import sounddevice as sd  # assumed from sd.rec / sd.wait
import streamlit as st    # assumed from st.audio
frequency = 44100  # sample rate in Hz
duration = 5       # seconds to record
# record mono audio; blocking=True waits until the recording finishes
record = sd.rec(frequency * duration, samplerate=frequency, channels=1, blocking=True, dtype='float64')
sd.wait()
st.audio(record.T, sample_rate=frequency)

Reading pre-processed cr2 RAW image data in python

I am trying to read raw image data from a CR2 (Canon raw image file). I want to read the data only (no header, etc.), pre-processed if possible (i.e. pre-Bayer/the most native unprocessed data), and store it in a numpy array. I have tried a bunch of libraries such as opencv, rawkit and rawpy, but nothing seems to work correctly.
Any suggestion on how I should do this? What I should use? I have tried a bunch of things.
Thank you
Since LibRaw/dcraw can read CR2, it should be easy to do. With rawpy:
#!/usr/bin/env python
import rawpy
raw = rawpy.imread("/some/path.cr2")
bayer = raw.raw_image # with border
bayer_visible = raw.raw_image_visible # just visible area
Both bayer and bayer_visible are then a 2D numpy array.
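For example, to inspect and keep the result (a sketch; raw_image is typically uint16):
import numpy as np
print(bayer.shape, bayer.dtype)              # e.g. (H, W), uint16
np.save('bayer_visible.npy', bayer_visible)  # persist for later processing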
You can use rawkit to get this data; however, you won't be able to use the actual rawkit module (which provides higher-level APIs for dealing with raw images). Instead, you'll want to use mostly the libraw module, which allows you to access the underlying LibRaw APIs.
It's hard to tell exactly what you want from this question, but I'm going to assume the following: raw Bayer data, including the "masked" border pixels (which aren't displayed, but are used to calculate various things about the image). Something like the following (completely untested) script will allow you to get what you want:
#!/usr/bin/env python
import ctypes
from rawkit.raw import Raw

with Raw(filename="some_file.CR2") as raw:
    raw.unpack()

    # For more information, see the LibRaw docs:
    # http://www.libraw.org/docs/API-datastruct-eng.html#libraw_rawdata_t
    rawdata = raw.data.contents.rawdata

    data_size = rawdata.sizes.raw_height * rawdata.sizes.raw_width
    data_pointer = ctypes.cast(
        rawdata.raw_image,
        ctypes.POINTER(ctypes.c_ushort * data_size)
    )
    data = data_pointer.contents

    # Grab the first few pixels for demonstration purposes...
    for i in range(5):
        print('Pixel {}: {}'.format(i, data[i]))
There's a good chance that I'm misunderstanding something and the size is off, in which case this will segfault eventually, but this isn't something I've tried to make LibRaw do before.
More information can be found in this question on the LibRaw forums, or in the LibRaw struct docs.
Storing it in a numpy array I leave as an exercise for the reader, or for a follow-up answer (I have no experience with numpy).
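As a possible follow-up sketch (untested; assumes the data and rawdata variables from the script above), numpy can wrap the ctypes array directly:
import numpy as np
# wrap the ctypes ushort array without copying, then reshape to 2D and copy
# so the result outlives the Raw context manager
bayer = np.ctypeslib.as_array(data).reshape(
    rawdata.sizes.raw_height, rawdata.sizes.raw_width
).copy()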

Sound generation / synthesis with python?

Is it possible to get python to generate a simple sound like a sine wave?
Is there a module available for this? If not, how would you go about creating your own?
Also, would you need some kind of host environment for python to run in in order to play sound, or can it be achieved just from making calls from the terminal?
If the answer is OS-dependent, I'm using a mac.
I was looking for the same thing. In the end, I wrote this code, which is working fine.
import math     # import needed modules
import pyaudio  # sudo apt-get install python3-pyaudio

BITRATE = 16000   # number of frames per second/frameset
FREQUENCY = 500   # Hz, waves per second; 261.63 = C4 note
LENGTH = 1        # seconds to play sound

BITRATE = max(BITRATE, FREQUENCY + 100)
NUMBEROFFRAMES = int(BITRATE * LENGTH)
RESTFRAMES = NUMBEROFFRAMES % BITRATE
WAVEDATA = bytearray()

# generate the waves as unsigned 8-bit samples
for x in range(NUMBEROFFRAMES):
    WAVEDATA.append(int(math.sin(x / ((BITRATE / FREQUENCY) / math.pi)) * 127 + 128))
for x in range(RESTFRAMES):
    WAVEDATA.append(128)

p = pyaudio.PyAudio()  # initialize pyaudio
stream = p.open(format=p.get_format_from_width(1),
                channels=1,
                rate=BITRATE,
                output=True)
stream.write(bytes(WAVEDATA))
stream.stop_stream()
stream.close()
p.terminate()
I know I'm a little late to the game on this one, but this is a pretty fantastic python project for synthesis and audio composition: https://github.com/hecanjog/pippi
It's still actively being developed, but it's been going for a while.
After wasting time on some uncompilable or non-existent projects, I discovered the Python module wavebender, which offers generation of single or multiple channels of sine, square and combined waves. The results can be written either to a WAV file or to sys.stdout, from where they can be interpreted directly by aplay in real time. Some useful examples are explained here, and are included at the project's github page.
The Python In Music wiki page has not been terribly well-kept-up, but it's a good starting point.
http://wiki.python.org/moin/PythonInMusic
I am working on a powerful synthesizer in Python. I used custom functions to write directly to a .wav file. There are built-in functions that can be used for this purpose. You will need to modify the .wav header to reflect the sample rate, bits per sample, number of channels, and duration of synthesis.
Here is an early version of a sine wave generator that outputs a list of values that, after applying bytearray, becomes suitable for writing to the data parameter of a wave file. [edit] A conversion function will need to transform the list into little-endian hex values before applying the bytearray. See the WAVE PCM soundfile format link below for details on the .wav specification. [/edit]
def sin_basic(freq, time=1, amp=1, phase=0, samplerate=44100, bitspersample=16):
    import math
    bytelist = []
    TwoPiDivSamplerate = 2 * math.pi / samplerate
    increment = TwoPiDivSamplerate * freq
    incadd = phase * increment
    for i in range(int(samplerate * time)):
        if incadd > (2 ** (bitspersample - 1) - 1):
            incadd = (2 ** (bitspersample - 1) - 1) - (incadd - (2 ** (bitspersample - 1) - 1))
        elif incadd < -(2 ** (bitspersample - 1) - 1):
            incadd = -(2 ** (bitspersample - 1) - 1) + (-(2 ** (bitspersample - 1) - 1) - incadd)
        bytelist.append(int(round(amp * (2 ** (bitspersample - 1) - 1) * math.sin(incadd))))
        incadd += increment
    return bytelist
A newer version can use waveforms to modulate the frequency, amplitude, and phase of the waveform parameters. The data format makes it trivial to blend and concatenate waves together. If this seems up your alley, check out WAVE PCM soundfile format.
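Here is a hedged sketch of that writing step using the standard-library wave module, which handles the header fields and the little-endian 16-bit packing described above ('sine.wav' and the parameters are placeholders):
import struct
import wave
samples = sin_basic(440, time=1)  # one second of A4 at the defaults above
with wave.open('sine.wav', 'wb') as wf:
    wf.setnchannels(1)   # mono
    wf.setsampwidth(2)   # 16 bits per sample
    wf.setframerate(44100)
    wf.writeframes(struct.pack('<{}h'.format(len(samples)), *samples))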
I like PyAudiere, which lets you play numpy arrays as sounds... I guess it jives well with my Matlab background. I believe it's cross-platform. I think scikits.audiolab does the same thing, and may be more current / better supported... it seems easier to me than trying to save things as wavfiles or write them to buffers and use Python's builtin sound library.
I found these two Python repositories very useful; you might want to have a look at them:
python: https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder
ipython: https://timsainb.github.io/spectrograms-mfccs-and-inversion-in-python.html
[EDIT] As pointed out, here is an explanation of the two links:
The Python one seems to have an error, but many people were able to make it run, so I'm not sure. The IPython one worked like a charm, so I hope you can run it.
Both links are supposed to take audio as input, preferably a .wav file. They featurize it (using an FFT of size 512 and a step size of 512/8) to obtain spectrograms (which you can even visualize); a spectrogram is a 2D matrix, and you can then train your machine learning models, or do whatever you want, using a matrix that represents the original audio. If you want to know what those vectors represent at any point, you can resynthesize the audio back as well.
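A minimal librosa sketch of that featurization (the exact parameters are my assumption; 'input.wav' is a placeholder path):
import numpy as np
import librosa
y, sr = librosa.load('input.wav', sr=None)
# FFT size 512 with step (hop) size 512 // 8 = 64, as described above
S = np.abs(librosa.stft(y, n_fft=512, hop_length=512 // 8))
print(S.shape)  # (257, n_frames): a 2D matrix representing the audio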
