I am using Librosa to transcribe monophonic guitar audio signals.
I thought it would be a good start to "slice" the signal at the onset times, so that note changes are detected at the correct moments.
Librosa provides a function that detects the local minima before the onset times. I checked those timings and they are correct.
Here is the waveform of the original signal and the times of the minima.
[ 266240 552960 840704 1161728 1427968 1735680 1994752]
The melody played is E4, F4, F#4 ..., B4.
Therefore the results should ideally be: 330Hz, 350Hz, ..., 493Hz (approximately).
As you can see, the times in the minima array represent the moments just before each note was played.
However, on the sliced signal (10-12 seconds, with only one note per slice), my frequency detection methods give really poor results. I am confused because I can't see any bugs in my code:
import librosa
import numpy as np

y, sr = librosa.load(filename, sr=40000)
onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
oenv = librosa.onset.onset_strength(y=y, sr=sr)
onset_bt = librosa.onset.onset_backtrack(onset_frames, oenv)
# Convert the backtracked onsets from frames to samples.
new_onset_bt = librosa.frames_to_samples(onset_bt)
# One slice per note.
slices = np.split(y, new_onset_bt[1:])
for i in range(len(slices)):
    print(freq_from_hps(slices[i], 40000))
    print(freq_from_autocorr(slices[i], 40000))
    print(freq_from_fft(slices[i], 40000))
Where the freq_from functions are taken directly from here.
I would assume this is just bad precision from the methods, but I get some crazy results. Specifically, freq_from_hps returns:
1.33818658287
1.2078047577
0.802142642257
0.531096911977
0.987532329094
0.559638134414
0.953497587952
0.628980979055
These values are supposed to be the 8 pitches of the 8 corresponding slices (in Hz!).
freq_from_fft returns similar values whereas freq_from_autocorr returns some more "normal" values but also some random values near 10000Hz:
242.748000585
10650.0394232
275.25299319
145.552578747
154.725859019
7828.70876515
174.180627765
183.731497068
This is the spectrogram from the whole signal:
And this is, for example, the spectrogram of slice 1 (the E4 note):
As you can see, the slicing has been done correctly. However, there are several issues. First, there is an octave issue in the spectrogram; I was expecting problems of that sort. But the results I get from the three methods mentioned above are just very weird.
Is this an issue with my signal processing understanding or my code?
Your code looks fine to me.
The frequencies you want to detect are the fundamental frequencies of your pitches (the problem is also known as "f0 estimation").
So before using something like freq_from_fft I'd bandpass filter the signal to get rid of garbage transients and low frequency noise—the stuff that's in the signal, but irrelevant to your problem.
Think about which range your fundamental frequencies are going to be in. For an acoustic guitar that's E2 (82 Hz) to F6 (1,397 Hz). That means you can get rid of anything below ~80 Hz and above ~1,400 Hz (for a bandpass example, see here, and the sketch after the next paragraph). After filtering, do your peak detection to find the pitches (assuming the fundamental actually has the most energy).
Another strategy might be to ignore the first X samples of each slice, as they tend to be percussive rather than harmonic in nature and won't give you much information anyway. So, of each slice, just look at the last ~90% of the samples.
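As a rough illustration of both ideas (the band-pass and the trimming), here is a minimal sketch using scipy.signal; the filter order, the 80/1,400 Hz cutoffs, and the 10% trim are assumptions, not tuned values:

import numpy as np
from scipy.signal import butter, sosfiltfilt

def preprocess_slice(slice_, sr, low=80.0, high=1400.0):
    # Band-pass to the plausible f0 range of a guitar (roughly E2..F6).
    sos = butter(4, [low, high], btype='bandpass', fs=sr, output='sos')
    filtered = sosfiltfilt(sos, slice_)
    # Drop the first ~10% of the slice, i.e. the percussive attack.
    return filtered[len(filtered) // 10:]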
That all said, there is a large body of work on f0 (fundamental frequency) estimation. A good starting point is the ISMIR papers.
Last, but not least, Librosa's piptrack function may do just what you want.
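For instance, a minimal sketch (picking the strongest bin per frame is my own simplification, and the fmin/fmax values are assumptions; y and sr are the variables from your code):

import numpy as np
import librosa

pitches, magnitudes = librosa.piptrack(y=y, sr=sr, fmin=80, fmax=1400)
# For each frame, keep the pitch of the strongest bin.
best = magnitudes.argmax(axis=0)
f0_per_frame = pitches[best, np.arange(pitches.shape[1])]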
Related
I have 2 pink noise signals created with a random generator
and I put that into a for-loop like:
import numpy as np

n, b = 1000, 0.99                     # b's value is not given in the question
inp = np.random.uniform(-1, 1, n)     # renamed from `input`, which shadows a builtin
z = np.zeros(n)
for i in range(1, n):
    z[i] = z[i-1] + (1 - b) * (z[i-1] - inp[i-1])   # assumes z[i-1], not z[i], on the right
Now I am trying to convert this via the snntorch library. I have already used the rate-coding part of the library and want to compare it with the latency-coding part. So I want to use snntorch.spikegen.latency(), but I don't know how to use it correctly. I changed all the parameters and got no good result.
Do you have any tips for the encoding/decoding part, i.e. converting this noise into a spike train and converting it back?
Thanks to everyone!
Can you share how you're currently trying to use the latency() function?
It should be similar to rate() in that you just pass z to the latency function, though there are many more options involved (e.g., normalize=True finds the time constant that ensures all spike times occur within the num_steps time window).
Each element in z will correspond to one spike. So if it is of dimension N, then the output should be T x N.
The value/intensity of each element corresponds to the time at which that spike occurs. Negative intensities don't make sense here, so either take the absolute value of z before passing it in, or level-shift it.
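A minimal sketch of that (num_steps=100 and linear=True are assumptions; z is the filtered noise from the question):

import numpy as np
import torch
from snntorch import spikegen

data = torch.from_numpy(np.abs(z)).float()   # latency coding needs non-negative intensities
# One spike per element: larger intensity -> earlier spike time.
spk = spikegen.latency(data, num_steps=100, normalize=True, linear=True)
print(spk.shape)                             # (100, len(z)), i.e. T x N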
I want to generate many randomized realizations of a low-discrepancy sequence with scipy.stats.qmc. I only know this way, which directly provides a randomized sequence:
from scipy.stats import qmc
ld = qmc.Sobol(d=2, scramble=True)
r = ld.random_base2(m=10)
But if I run
r = ld.random_base2(m=10)
twice I get
The balance properties of Sobol' points require n to be a power of 2. 2048 points have been previously generated, then: n=2048+2**10=3072. If you still want to do this, the function 'Sobol.random()' can be used.
It seems that using Sobol.random() is discouraged by the docs.
What I would like (and it should be faster) is to first get
ld = qmc.Sobol(d=2, scramble=False)
and then generate, say, 1000 scramblings (or other randomizations) from this initial sequence.
That would avoid regenerating the Sobol' sequence for each sample; only the scrambling would be redone.
How to do that?
It seems to me that this is the proper way to do many randomized-QMC draws, but I might be wrong and there might be other ways.
As the warning suggests, Sobol' is a sequence, meaning that each sample is linked to the previous ones. You have to respect the 2^m properties. It's perfectly fine to use Sobol.random() if you understand how to use it; this is why we created Sobol.random_base2(), which prints a warning if you try to do something that would break the properties of the sequence. Remember that with Sobol' you cannot skip 10 points and then sample 5, or do arbitrary things like that. If you do, you will not get the convergence rate guaranteed by Sobol'.
In your case, what you want to do is reset the sequence between draws (Sobol.reset). A new draw will be different from the previous one if scramble=True. Another way (using a non-scrambled sequence, for instance) is to sample 2^k points and skip the first 2^(k-1); then you can sample 2^n with n < k-1.
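A minimal sketch of the reset-based loop described above (the 1000 realizations and m=10 are just the numbers from the question):

from scipy.stats import qmc

ld = qmc.Sobol(d=2, scramble=True)
realizations = []
for _ in range(1000):
    realizations.append(ld.random_base2(m=10))   # 2**10 scrambled points
    ld.reset()                                   # rewind the engine before the next draw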
This is my first question on Stack Overflow.
For EEG filtering I try to use lfilter from SciPy via the following function:
from scipy.signal import butter, lfilter

def butter_lowpass_filter(data):
    b, a = butter(3, 0.05)   # 3rd-order low-pass, normalized cutoff 0.05
    y = lfilter(b, a, data)
    return y
But every time I call the function and pass it the data as a NumPy array, I get a result that starts from zero. Why does the Butterworth filter always start from 0? I need to measure in real time.
I already tried to solve this problem here, but without success:
How to filter/smooth with SciPy/Numpy?
It does not work for me, because every time I get the following picture:
[picture: the filtered signal starting from zero]
This behavior is fine. However, it would create a spike at the beginning of your data. To avoid this, you should subtract the first value (or the mean of the first N values) of your EEG so the data itself also starts at zero, or close to it. The process is referred to as baseline correction or, in cases where you remove a straight line from start to finish, as detrending.
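A minimal sketch of that correction (subtracting the mean of the first 100 samples is an arbitrary choice; the rest is the function from the question):

import numpy as np
from scipy.signal import butter, lfilter

def butter_lowpass_filter(data, n_baseline=100):
    data = data - np.mean(data[:n_baseline])   # baseline correction: start near zero
    b, a = butter(3, 0.05)
    return lfilter(b, a, data)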
Note that filtering EEG is a whole science; you may want to look at packages designed for it, such as MNE-Python (here is their summary on filters).
I am playing around with images and the random number generator in Python, and I would like to understand the strange output picture my script is generating.
The larger script iterates through each pixel (left to right, then next row down) and changes the color. I used the following function to offset the given input red, green, and blue values by a randomly determined integer between 0 and 150 (so this formula is invoked 3 times for R, G, and B in each iteration):
import random

def colCh(cVal):
    random.seed()                    # reseed the PRNG on every call
    rnd = random.randint(0, 150)
    newVal = max(min(cVal - 75 + rnd, 255), 0)   # offset and clip to [0, 255]
    return newVal
My understanding is that random.seed() without arguments uses the system clock as the seed value. Given that it is invoked prior to the calculation of each offset value, I would have expected a fairly random output.
When reviewing the numerical output, it does appear to be quite random:
Scatter plot of every 100th R value as x and R' as y:
However, the picture this script generates has a very peculiar grid effect:
Output picture with grid effect hopefully noticeable:
Furthermore, fwiw, this grid effect seems to appear or disappear at different zoom levels.
I will be experimenting with new methods of creating seed values, but I can't just let this go without trying to get an answer.
Can anybody explain why this is happening? THANKS!!!
Update: Per Dan's comment about possible issues from JPEG compression, the input file format is .jpg and the output file format is .png. I would assume only the output file format would potentially create the issue he describes, but I admittedly do not understand how JPEG compression works at all. In order to try and isolate JPEG compression as the culprit, I changed the script so that the colCh function that creates the randomness is excluded. Instead, it merely reads the original R,G,B values and writes those exact values as the new R,G,B values. As one would expect, this outputs the exact same picture as the input picture. Even when, say, multiplying each R,G,B value by 1.2, I am not seeing the grid effect. Therefore, I am fairly confident this is being caused by the colCh function (i.e. the randomness).
Update 2: I have updated my code to incorporate Sascha's suggestions. First, I moved random.seed() outside of the function so that it is not reseeding based on the system time in every iteration. Second, while I am not quite sure I understand how there is bias in my original function, I am now sampling from a positive/negative distribution. Unfortunately, I am still seeing the grid effect. Here is my new code:
import random

random.seed()   # seed once, outside the per-pixel function

def colCh(cVal):
    rnd = random.uniform(-75, 75)    # symmetric offset
    newVal = int(max(min(cVal + rnd, 255), 0))
    return newVal
Any more ideas?
As imgur is down for me right now, some guessing:
Your usage of PRNGs is a bit scary. Don't use time-based seeds in very frequently called loops. It is very much possible that the same seeds are generated, and of course this will produce patterns (the granularity of the clock and the number of random bits used matter here).
So: seed your PRNG once! Don't do it every time, and don't do it for every channel. Seed one global PRNG and use it for all operations.
There should be no pattern then.
(If there is, also check the effect of interpolation, i.e. image-size changes.)
Edit: As imgur is working again, I recognized the macro-block-like patterns Dan mentioned in the comments. Please change your PRNG usage first before further analysis, and maybe show more complete code.
It may be possible that you recompressed the output and JPEG compression emphasized the effects observed before.
Another thing: in
newVal = max(min(cVal - 75 + rnd, 255), 0)
there is a bit of a bias (better approach: sample from a symmetric negative/positive distribution and clip to [0, 255]), which can also emphasize some effects (what did those macroblocks look like before?).
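As a side note, a vectorized sketch of the seed-once idea with NumPy (the uint8 HxWx3 image array is an assumption; the zeros are a stand-in for the real picture):

import numpy as np

rng = np.random.default_rng()                  # seeded once for the whole image
img = np.zeros((64, 64, 3), dtype=np.uint8)    # placeholder for the real image
offsets = rng.uniform(-75, 75, size=img.shape)
out = np.clip(img.astype(float) + offsets, 0, 255).astype(np.uint8)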
This question is related to:
DSP : audio processing : squart or log to leverage fft?
in which I was unsure about the right algorithm to choose.
Now,
Goal:
I want to get all the frequencies of my signal, which I read from an audio file.
Context:
I use numpy and scikits.audiolab. I have done a lot of reading on the DSP subject, visited dspguru.com as well, and read papers and nice blogs across the net.
The code I use is this:
import numpy as np
from scikits.audiolab import Sndfile
f = Sndfile('first.ogg', 'r')
# Sndfile instances can be queried for the audio file meta-data
fs = f.samplerate
nc = f.channels
enc = f.encoding
print(fs,nc,enc)
# Reading is straightforward
data = f.read_frames(10)
print(data)
print(np.fft.rfft(data))
I am new to DSP.
My question
I would like to be able to separate all the frequencies of a signal so that I can compare different signals.
I use numpy.fft.rfft on the array of sound, but this operation alone is not enough. So, what is the best way to get all the frequency magnitudes correctly?
I saw that multiplying each resulting value by its complex conjugate gets rid of the imaginary part and turns the whole thing into a real number.
Now what, please? Is that it?
If you need me to clarify anything, just ask.
Thanks a lot!
You say "I want to get all the frequencies of my signal, that I get from an audio file." but what you really want is the magnitude of the frequencies.
In your code, it looks like (I don't know python) you only read the first 10 samples. Assuming your file is mono, that's fine, but you probably want to look at a larger set of samples, say 1024 samples. Once you do that, of course, you'll want to repeat on the next set of N samples. You may or may not want to overlap the sets of samples, and you may want to apply a window function, but what you've done here is a good start.
What sleepyhead says is true. The output of the fft is complex. To find the magnitude of a given frequency, you need to find the length or absolute value of the complex number, which is simply sqrt( r^2 + i^2 ).
Mathematically, the Fourier transform returns complex values because it is a transform with the kernel exp(-iωt). So the PC gives you the spectrum as complex numbers corresponding to the cosine and sine transforms. To get the amplitude you just take the absolute value: np.abs(spectrum). To get the power spectrum, square the absolute value. The complex representation is valuable because it gives you not only the amplitude but also the phase of each frequency, which may be useful in DSP as well.
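For example, a minimal sketch of those steps on one 1024-sample frame (f and fs come from the question's code; the Hann window is the optional one mentioned in the previous answer):

import numpy as np

frame = f.read_frames(1024) * np.hanning(1024)   # one windowed frame
spectrum = np.fft.rfft(frame)
magnitude = np.abs(spectrum)                     # sqrt(r**2 + i**2) per bin
power = magnitude ** 2                           # power spectrum
phase = np.angle(spectrum)                       # phase per bin
freqs = np.fft.rfftfreq(1024, d=1.0 / fs)        # bin frequencies in Hz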
If I got it right, you want to walk over all the data (sound) and capture the amplitudes; for this, make a while loop over the data, reading 1024 samples at a time:
data = f.read_frames(1024)
while data.size:   # stop when no frames are left (read_frames may raise near EOF)
    print(data)
    print(np.fft.rfft(data))
    data = f.read_frames(1024)