Strange lines in specgram using matplotlib - python

I am using matplotlib.specgram to create spectrograms of recordings of spoken words. For a reason unknown to me, the spectrograms have strange lines in them, as seen in the images below.
I am wondering what is causing these lines and how I can get rid of them.

I think that @farenorth is right.
When the spectrogram is calculated, a grayscale value is picked for each intensity at every timestep (x-axis). Suppose the grayscale mapping were fixed globally: if a new timestep suddenly contained higher intensities, the grayscale would saturate.
That would be a real problem if you were working in real time, since the audio could start very quiet and suddenly become loud, yet you would have to pick a grayscale-to-intensity ratio at the beginning based only on knowledge of past audio tracks.
So the approach of mlab.specgram is to scale all timesteps independently. Therefore, if there is a sudden change during a timestep, it no longer looks comparable to the neighboring steps, which is what @farenorth pointed out.
A synthetic example is below: the top plot is just a chirped sine wave, and the bottom plots are the same signal with a sudden bang added. For reference, the specgram call signature is:
specgram(x, NFFT=256, Fs=2, detrend=mlab.detrend_none,
         window=mlab.window_hanning, noverlap=128,
         cmap=None, xextent=None, pad_to=None, sides='default',
         scale_by_freq=None, mode='default')
import numpy as np
import matplotlib.pyplot as p
%matplotlib inline

# 16k samples spanning 1 to 5 (arbitrary time units)
time = np.linspace(1, 5, 1024*16)
# linearly increasing frequency -> a chirp
f = 50 + time*50
# add a bang: amplify the third quarter of the signal 100x
bang = np.ones(len(time))
bang[len(time)//2 : len(time)*3//4] = 100
chirp1 = np.sin(2*np.pi*f*time)
chirp2 = np.sin(2*np.pi*f*time) * bang
p.figure(figsize=(20, 8))
p.subplot(221)
p.plot(chirp1)
p.subplot(222)
p.specgram(chirp1, noverlap=0, cmap=p.cm.gray)
p.subplot(223)
p.plot(chirp2)
p.subplot(224)
p.specgram(chirp2, noverlap=0, cmap=p.cm.gray)
p.show()
You can't get rid of that with specgram, since there is no option for global scaling. But you could easily roll your own STFT or, better, a Gabor spectrogram (an STFT with a Gaussian window, if I understand it right).
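A minimal sketch of such a hand-rolled STFT with one global gray scale (the stft helper and hop parameter are my own names, not matplotlib's):

import numpy as np
import matplotlib.pyplot as plt

def stft(x, nfft=256, hop=128):
    # magnitude STFT with a Hann window, one frame per column;
    # swapping in a Gaussian window would give the Gabor variant
    window = np.hanning(nfft)
    frames = [x[i:i + nfft] * window for i in range(0, len(x) - nfft, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)).T

S = stft(chirp2)
# vmin/vmax pin the gray scale globally for the whole image,
# so a loud burst cannot rescale its quieter neighbors
plt.imshow(S, origin='lower', aspect='auto', cmap='gray',
           vmin=S.min(), vmax=S.max())
plt.show()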

Related

Customizing matplotlib plots for high resolution and custom scaling

I have a 1D signal with many samples (millions). I also have its wavelet-transform coefficients (in float64) and frequencies stored in arrays. I am trying to make a high-resolution plot of both the signal level vs. time and of the scalogram. The default parameters for size etc. are too small for effective visualization. I am exporting to both PNG and PDF using savefig.
I would like to make the plots higher resolution (1920x1080 or equivalent sizes, depending on the aspect ratio). I find matplotlib's arguments and objects hard to grasp, and unfortunately I have not been able to follow the tutorials available online: there is so much functionality, and so many different ways of doing the same thing, that moving from one resource to another meant relearning something new for the same task.
So far I have understood interpolation choices, colormap choices, the figure label, and the x and y labels. I do not yet understand the difference between imshow and plot, how to set the size and fidelity of the plot, or how to pass the axis scales (currently my scales are off). figsize is given in inches, and I am not sure how it relates to pixels. I would love guidance on this.
I would also like to plot an STFT of my samples with high image fidelity and a custom window size etc. I am currently using specgram, but would like to know how to pass the window size, overlap, colormap, and interpolation scheme, and whether other alternatives are available.
I'm plotting these for multiple data sets in a single loop and would like all images to be of uniform size and on the same scale, since all data sets have equal sample counts.
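(For reference: a saved figure's pixel size in matplotlib is figsize in inches times dpi, so 1920x1080 is, for example, figsize=(19.2, 10.8) at dpi=100. A minimal sketch; the file name and parameter values here are illustrative only:)

import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(19.2, 10.8), dpi=100)  # 19.2 in * 100 dpi = 1920 px wide
signal = np.random.randn(100_000)                # stand-in for the real samples
# NFFT is the window size, noverlap the overlap; both are specgram keywords
plt.specgram(signal, NFFT=1024, Fs=44100, noverlap=512, cmap='gray')
plt.xlabel('time [s]')
plt.ylabel('frequency [Hz]')
fig.savefig('stft.png', dpi=100)                 # dpi here sets the PNG pixel size

Keeping figsize, dpi, and the color limits (vmin/vmax on the plotting call) fixed across the loop gives uniformly sized, uniformly scaled images.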

How can I improve the look of scipy's spectrograms?

I need to generate spectrograms for audio files with Python, and I'm following the solution given here. However, the spectrograms I'm getting don't look very "populated", and not at all like the spectrograms I get from other software.
This is the code I used for the particular image I'm showing here:
import matplotlib.pyplot as plt
from matplotlib import cm
from scipy import signal
from scipy.io import wavfile
sample_rate, samples = wavfile.read('audio-mono.wav')
frequencies, times, spectrogram = signal.spectrogram(samples[:700000], sample_rate)
cMap = cm.get_cmap('gray', 3000) # Maybe I'm not understanding this very well
fig = plt.figure(figsize=(4,2), dpi=400, frameon=False)
plt.pcolormesh(times, frequencies, spectrogram, cmap=cMap)
plt.savefig('spectrogram.png')
The following images are spectrograms from Audacity and Aegisub, respectively, both for the same file for which the third image's spectrogram was created (with scipy).
To create this spectrogram, and to see whether it was a figure-size/resolution issue, I tried a couple of things, one by one; the end result is this image (with both of them applied).
First, when extracting the .wav file from the .mp4 file, I set the sampling rate to 10 kHz to avoid having such a big y-axis in the plot and to see whether that would help; this is why you see a maximum of 5,000. I thought I could live with some frequencies being discarded, given that I care, most of all, about speech frequencies.
Then, to get a better zoom, I created a spectrogram with only the first 700,000 elements of the samples array (see code), which for this file represents about 70 seconds. This didn't help either. I even tried creating the spectrogram from the same slice of the samples array but taking only every tenth value, then every twentieth, and so on, but this only made the spectrogram show horizontal lines instead of dots; it is not applied in the figure shown here, because I realized it was far from helping. I also tinkered with the figure size and resolution, but that didn't really help either.
As you can see in the first figure, the y-axis goes from 0 to 5 kHz, and many frequencies have some intensity at that level. Also, the only moment of complete silence in that 70-second span is around the 35-second mark; its accuracy becomes obvious when listening to the file.
In the second figure there is no y-axis mark, but I can see it covers a bigger range than 5 kHz, which I think accounts for the difference from the first figure. I'm pretty sure that, unfortunately, I can't change this view range. However, this spectrogram also shows the moment of complete silence accurately, and the rest of it is at least properly "populated".
Looking at the third figure (the one I generated with scipy), one could easily think there are several stretches of complete silence in those first 70 seconds, which is far from true. I'd like it to look more like the ones above it, because I know they're much more accurate, but as it stands this one won't work at all.
I'm pretty sure there is something I can do, but I think I still don't know scipy enough to know what it is.
Thanks in advance.
EDIT 1
PLOTTED THE SPECTROGRAM WITHOUT SPECIFYING A COLORMAP
You can see the plot looks a bit more populated, but still not even close to the other ones.
EDIT 2
Following the idea given in the first comment on this question, I used a modified version of the gray colormap: black as the first entry (as normal), but with the second entry being the color that is normally halfway, and then 2,999 colors from there up to white. Please excuse me if I'm using the wrong terminology or phrasing this poorly; I'm still trying to understand how to work with colormaps.
The code used to create and plot the spectrogram is the same. The only difference is the colormap used, which I manipulated as follows:
import numpy as np
from matplotlib import cm
from matplotlib.colors import ListedColormap

# take the top half of 'gray' (mid-gray to white) as 3000 entries,
# then force the first entry back to black
cMap = cm.get_cmap('gray', 3000)
new_colors = cMap(np.linspace(0.5, 1, 3000))
black = [0, 0, 0, 1]
new_colors[0, :] = black
new_cmp = ListedColormap(new_colors)
Using new_cmp as the colormap for the pcolormesh() function, I get the following spectrogram.
This is much, much better than the original and looks much more like the ones from Audacity and Aegisub. However, I'd like to know whether there is a better approach, whether something else is keeping my spectrograms from looking like the sample ones, and whether there is a cleaner way to do what I did with the colormap. As I said, I'm still struggling with colormaps.
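One more idea I am considering (my own guess, not from the comments): audio editors such as Audacity typically display spectrograms on a logarithmic (dB) scale, so compressing the dynamic range before plotting might achieve a similar effect without touching the colormap. A minimal sketch using the arrays from the code above (the epsilon guard is mine):

import numpy as np
import matplotlib.pyplot as plt

# convert power to decibels; the small epsilon avoids log10(0)
spectrogram_db = 10 * np.log10(spectrogram + 1e-12)
plt.pcolormesh(times, frequencies, spectrogram_db, cmap='gray')
plt.savefig('spectrogram-db.png')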
EDIT 3
I'm now sharing the audio I used to create these spectrograms here.

Remove outliers in an image after applying threshold

Here's the deal. I want to create a mask that visualizes all the changes between two images (GeoTiffs which are converted to 2D numpy arrays).
For that I simply subtract the pixel values and normalize the absolute value of the subtraction.
Since the result is covered in noise, I apply a threshold and remove all pixels with a value below a certain limit:
def threshold(array, threshold_limit):
    print("Threshold...")
    # keep only values above the limit; everything else becomes 0
    result = (array > threshold_limit) * array
    return result
This works without a problem. Now comes the issue: when applying the threshold, outliers remain, which is not intended.
What is a good way to remove those outliers?
Sometimes the outliers are small chunks of pixels, like 5-6 pixels together; how could those be removed?
Additionally, the images I use are about 10000x10000 pixels.
I would appreciate any advice!
EDIT:
Both images are Landsat satellite images, covering the exact same area.
The difference here is that one image shows cloud coverage and the other one is free of clouds.
The bright snaking line in the top right is part of a river that has been covered by a cloud. Since water bodies such as the ocean or rivers appear black in these images, the difference between the bright cloud and the dark river makes the river show a high degree of change.
I hope the following images make this clear:
Source tiffs:
Subtraction result:
I also tried to smooth the result of the thresholding with a median filter, but the result was still covered in outliers:
import numpy as np
from scipy.ndimage import median_filter

def filter(array, limit):
    print("Median-Filter...")
    # size is the side length of the square filter footprint
    filteredImg = median_filter(array, size=limit).astype(np.float32)
    return filteredImg
I would suggest the following:
1. Before proceeding, double-check that the two images are 100% registered. To check this, overlay them using e.g. different color channels. Even minimal registration errors can render your task impossible.
2. Smooth both input images slightly (before the subtraction). For that I would suggest standard implementations. Play around with the filter parameters to find an acceptable compromise between smoothness (i.e. reduction of the graininess of source image 1) and resolution.
3. Then try to match the image statistics by applying histogram normalization, using the histogram of image 2 as the target for the histogram of image 1. For this you can also use e.g. the OpenCV implementation.
4. Subtract the images.
5. If you then still observe obvious noise, look at the histogram of the subtraction result and see whether you can relate the noise to intensity outliers. If you can clearly separate signal and noise by intensity, apply thresholding again (informed by your histogram). Alternatively (or additionally), if the noise is structurally different from your signal (e.g. clustered), look into morphological operations to remove it, as sketched below.
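A minimal sketch of that last step, assuming the thresholded difference is a 2D numpy array diff and that connected clusters below min_size pixels (e.g. the 5-6 pixel chunks) should be dropped; min_size is an illustrative parameter:

import numpy as np
from scipy import ndimage

def remove_small_clusters(diff, min_size=10):
    # label connected components of the non-zero mask
    mask = diff > 0
    labeled, num_features = ndimage.label(mask)
    # pixel count of each component (component labels start at 1)
    sizes = ndimage.sum(mask, labeled, index=range(1, num_features + 1))
    # keep only components with at least min_size pixels
    keep = np.zeros(num_features + 1, dtype=bool)
    keep[1:] = sizes >= min_size
    return np.where(keep[labeled], diff, 0)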

Using Fourier Transform to Transform Image into Sound (I don't think it's working)

Backstory
I started messing with electronics and realized I need an oscilloscope. I went to buy an oscilloscope (for about $40) online and watched tutorials on how to use it. I stumbled upon a video using the "X-Y" function of the oscilloscope to draw images, and I thought that was cool. I searched for how to do this from scratch and learned that you need to convert the image into the frequency domain, somehow convert that to an audio signal, and send the signal to the two channels on the oscilloscope from the left and right channels of the audio output. So now I am trying to do the image-processing part.
What I Got So Far
Choosing an Image
The first thing I did was create an NxN image using some drawing software. I've read online that the total number of pixels in the image should be a power of two. I don't know why, but I created 256x256-pixel images to minimize calculation time. Here is the image I used for this example.
I kept the image simple so I could vividly see the symmetry when it is transformed. Therefore, if there is no symmetry, something must be wrong.
The MATLAB Code
The first thing I did was read the image, convert it to gray scale, change the data type, and grab the size of the image (so other sizes can be used later).
%Read image
img = imread('tets.jpg');
%Convert image to gray scale
grayImage = rgb2gray(img);
%Incompatibility of data types: uint8 vs double
grayImage = double(grayImage);
%Grab size of image
[nx, ny, nz] = size(grayImage);
The Algorithm
This is where things get a bit hazy. I am somewhat familiar with the Fourier transform from some mechanical engineering classes, but the topic was only broadly introduced and never really a fundamental part of the course. It was more like, "Hey, check out this thing; but use the Laplace transform instead."
So somehow you have to incorporate spatial position, amplitude, frequency, and time in the calculation. I understand that the spatial coordinates are just the location of each pixel in the image matrix or bitmap. I also understand that the amplitude is just the grayscale value, from 0 to 255, of a certain pixel. However, I don't know how to incorporate frequency and time based on the pixel itself. I think I read somewhere that the frequency increases with the y location of the pixel, and the time variable increases with the x location. Here's the link (read the first part of Part II).
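For reference, the standard 2D DFT of an NxN image f(x, y) sums over every pixel for each output frequency pair (u, v):

F(u, v) = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x, y) \, e^{-i 2\pi (u x + v y) / N}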
So I tried following that formula, as well as other formulas online, and this is what I got for the MATLAB code.
if nx ~= ny
    error('Image size must be NxN.'); %for some reason
else
    %prepare transformation matrix
    DFT = zeros(nx, ny);
    %compute transformation for each pixel
    for ii = 1:1:nx
        for jj = 1:1:ny
            amplitude = grayImage(ii, jj);
            DFT(ii, jj) = amplitude * exp(-1i * 2 * pi * ((ii*ii/nx) + (jj*jj/ny)));
        end
    end
    %plot of complex numbers
    plot(DFT, '*');
    %calculate magnitude and phase
    magnitudeAverage = abs(DFT) / nx;
    phase = angle(DFT);
    %plot magnitudes and phase
    figure;
    plot(magnitudeAverage);
    figure;
    plot(phase);
end
This code simply tries to follow the discrete Fourier transform example video that I found on YouTube. After the calculation I plotted the complex numbers in the complex plane. It appears to be in polar coordinates; I don't know why. As stated in the video about the Nyquist limit, I plotted the average magnitude too, as well as the phase angles of the complex numbers. I'll just show you the plots!
The Plots
Complex Numbers
This is the complex plot; I believe it's in polar form instead of Cartesian, but I don't know. It appears symmetric, too.
Average Amplitude Vs. Sample
The vertical axis is amplitude and the horizontal axis is the sample number. This looks like the deconstruction of the signal, but then again, I don't really know what I am looking at.
Phase Angle Vs. Sample
The vertical axis is the phase angle and the horizontal axis is the sample number. This looks the most promising because it resembles a plot in the frequency domain, but it isn't supposed to be in the frequency domain; rather, it's a plot in the sample domain? Again, I don't know what I am looking at.
I Need Help Understanding
I need to somehow make sense of these plots, so I know whether I am getting the right result. I believe there may be something very wrong in the algorithm, because it doesn't really implement the frequency and time components. So maybe you can tell me how that is done? Or at least guide me?
TLDR;
I am trying to convert images into sound files to display on an oscilloscope. I am stuck on the image-processing part. I believe there is something wrong with the MATLAB code (see above) because it doesn't really include the frequency and time component of each pixel. I need help with the code and with understanding how to interpret the result, so I know the transformations are correct-ish.

Higher sampling for image's projection

My software should judge spectrum bands and, given the location of the bands, find the peak point and width of each band.
I learned to take the projection of the image and to find the width of each peak.
But I need a better way to find the projection.
The method I used reduces a 1600-pixel-wide image (e.g. 1600x40) to a 1600-point sequence. Ideally I would want to reduce the image to a 10000-point sequence using the same image.
I want a longer sequence because 1600 points provide too low a resolution: a single point makes a large difference to the measure (a 4% difference if a band is judged at 18 rather than 19).
How do I get a longer projection from the same image?
Code I used: https://stackoverflow.com/a/9771560/604511
from PIL import Image
import numpy as np
from scipy.optimize import leastsq  # used for the peak fitting in the linked answer

# Load the picture with PIL, process if needed
pic = np.asarray(Image.open("band2.png"))
# Average the pixel values over the color channels
pic_avg = pic.mean(axis=2)
# Sum along the vertical axis to get the projection
projection = pic_avg.sum(axis=0)
# Normalize and set the min value to zero for a nice fit
projection /= projection.mean()
projection -= projection.min()
What you want to do is called interpolation. Scipy has an interpolate module with a whole bunch of functions for different situations; take a look here, or specifically for images here.
Here is a recently asked question that has some example code and a graph that shows what happens.
But it is really important to realise that interpolation will not make your data more accurate, so it will not help you in this situation.
If you want more accurate results, you need more accurate data; there is no other way. You need to start with a higher-resolution image. (If you resample or interpolate, your results will actually be less accurate!)
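If a denser sequence is still wanted (e.g. for display), a minimal sketch with scipy's interpolate module, assuming the 1600-point projection array from the code above:

import numpy as np
from scipy.interpolate import interp1d

x = np.arange(len(projection))                       # 0 .. 1599
x_dense = np.linspace(0, len(projection) - 1, 10000)
# cubic resampling onto a 10000-point grid; this adds no
# information, it only smooths between existing samples
projection_dense = interp1d(x, projection, kind='cubic')(x_dense)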
Update - as the question has changed
@Hooked has made a nice point. Another way to think about it: instead of immediately averaging (which throws away the variance in the data), you can produce 40 graphs (like the lower one in your posted image), one from each horizontal row in your spectrum image. All these graphs will be pretty similar, but with some variation in peak position, height, and width. You should calculate the position, height, and width of each peak in each of these 40 rows, then combine this data (matching peaks across the 40 graphs) and, via the central limit theorem, use the appropriate variance as an error estimate for peak position, height, and width. That way you can get the most out of your data. However, this assumes some independence between the rows of the spectrum image, which may or may not be the case. A minimal sketch of the per-row idea follows.
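This sketch handles a single dominant peak, assuming pic_avg is the 40x1600 array from the earlier code (argmax-based peak picking is a simplification of mine):

import numpy as np

# one peak estimate per horizontal row of the spectrum image
peak_positions = pic_avg.argmax(axis=1)
peak_heights = pic_avg.max(axis=1)
# combine the 40 per-row estimates; by the central limit theorem
# the standard error of the mean shrinks as 1/sqrt(n)
position = peak_positions.mean()
position_error = peak_positions.std() / np.sqrt(len(peak_positions))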
I'd like to offer some more detail on @fraxel's answer (too long for a comment). He's right that you can't get more information out than you put in, but I think it needs some elaboration...
You are projecting your data from 1600x40 down to 1600, which looks like you are throwing data away. While technically correct, the whole point of a projection is to bring higher-dimensional data down to a lower dimension. This only makes sense if your data can be adequately represented in the lower dimension. Correct me if I'm wrong, but it looks like your data is indeed one-dimensional: the vertical axis is a measure of the variability of that particular point on the x-axis (wavelength?).
Given that the projection makes sense, how can we best summarize the data at each wavelength point? In my previous answer I took the average at each point. In the absence of other information about the particular properties of the system, this is a reasonable first-order approximation.
You can keep more of the information if you like. Below I've plotted the variance along the y-axis. This tells me that your measurements have more variability when the signal is high and less when it is low (which seems useful!):
What you need to decide, then, is what to do with those extra 40 pixels of data before the projection. They mean something physically, and your job as a researcher is to interpret and project that data in a meaningful way!
The code to produce the image is below; the spectrum data was taken from the screen capture in your original post:
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

# Load the picture with PIL, process if needed
pic = np.asarray(Image.open("spec2.png"))
# Average the pixel values over the color channels
pic_avg = pic.mean(axis=2)
# Project onto the x-axis by summing along the vertical axis
projection = pic_avg.sum(axis=0)
# Compute the variance along the vertical axis
variance = pic_avg.var(axis=0)

scale = 1/40.
X_val = range(projection.shape[0])
plt.errorbar(X_val, projection*scale, yerr=variance*scale)
plt.imshow(pic, origin='lower', alpha=.8)
plt.axis('tight')
plt.show()
