I need to generate spectrograms for audio files with Python and I'm following the solution given here. However, the spectrograms I'm getting don't look very "populated," and not at all like other spectrograms I get from other software.
This is the code I used for the particular image I'm showing here:
import matplotlib.pyplot as plt
from matplotlib import cm
from scipy import signal
from scipy.io import wavfile
sample_rate, samples = wavfile.read('audio-mono.wav')
frequencies, times, spectrogram = signal.spectrogram(samples[:700000], sample_rate)
cMap = cm.get_cmap('gray', 3000) # Maybe I'm not understanding this very well
fig = plt.figure(figsize=(4,2), dpi=400, frameon=False)
plt.pcolormesh(times, frequencies, spectrogram, cmap=cMap)
plt.savefig('spectrogram.png')
The following images are spectrograms from Audacity and Aegisub, respectively, both for the same file for which the third image's spectrogram was created (with scipy).
To create this spectrogram, and to see whether it was a figure-size/resolution issue, I tried a couple of things, one by one, and the end result is this image (with both of them applied).
First, when extracting the .wav file from the .mp4 file, I set the sampling rate to 10 kHz to avoid having such a big y-axis in the plot and to see if this would help. This is why you see a max of 5,000. I thought I could live with some frequencies being neglected, given that I care, most of all, about speech frequencies.
Then, to get a better zoom, I created a spectrogram with only the first 700,000 elements of the samples array (see code), which, in the case of this file, represent about 70 seconds. This didn't help either. I even tried to create the spectrogram with the same slice of the samples array, but by taking only every tenth value, then every twentieth, and so on, but this only made the spectrogram have horizontal lines instead of dots. This is not applied here in the figure I'm showing you, because I realized it's far from helping. I also tinkered with the figure size and the resolution, but it didn't really help either.
As you can see in the first figure, the y-axis goes from 0 to 5 KHz, and many frequencies have some intensity at that level. Also, the only moment in that 70-second span with complete silence is around the 35-second mark. The accuracy of this becomes obvious when listening to the file.
In the second figure, there is no y-axis mark, but I can see it has a bigger range than the 5 KHz, which I think accounts for the difference with the first figure. I'm pretty sure that, unfortunately, I can't change this view range. However, this spectrogram also shows the moment of complete silence accurately, and it is at least properly "populated" in the rest of it.
Looking at the third figure (the one I generated with scipy), one could easily think there are several parts of complete silence in those first 70 seconds, which is far from true. I'd like it to look more like the ones above it, because I know they're much more accurate, but I don't really know how to do that, and as it stands this one won't work at all.
I'm pretty sure there is something I can do, but I think I still don't know scipy well enough to know what it is.
Thanks in advance.
EDIT 1
I plotted the spectrogram without specifying a colormap.
You can see the plot looks a bit more populated, but still not even close to the other ones.
EDIT 2
Considering the idea given in the first comment on this question, I used a manipulated version of the gray colormap: black as the first entry (as normal), but with the second entry being the color that normally sits halfway, and then 2,999 colors from there up to white. Please excuse me if I'm using the wrong terminology here or if this is not phrased correctly. I'm still trying to understand how to work with colormaps.
The code used to create and plot the spectrogram is the same. The only difference is the colormap used, which I manipulated as follows:
import numpy as np
from matplotlib import cm
from matplotlib.colors import ListedColormap

# take the upper half of 'gray' (mid-gray up to white) as 3,000 entries...
cMap = cm.get_cmap('gray', 3000)
new_colors = cMap(np.linspace(0.5, 1, 3000))
# ...then force the very first entry back to black
black = [0, 0, 0, 1]
new_colors[0, :] = black
new_cmp = ListedColormap(new_colors)
Using new_cmp as the colormap for the pcolormesh() function, I get the following spectrogram.
This is much, much better than the original, and looks much more like the ones from Audacity and Aegisub. However, I'd like to know if there is a better approach I can take to make my spectrograms look better, if there could be something else causing this one not to look as much like the sample ones, and if there is a better way to do what I did with the colormap. As I said, I'm still struggling with colormaps.
EDIT 3
I'm now sharing the audio I used to create these spectrograms here.
Related
I am using matplotlib.specgram to create spectrograms of recordings of spoken words. For a reason unknown to me, the spectrograms have strange lines in them, as seen in the images below.
I am wondering what is causing these lines and how I can get rid of them.
I think that @farenorth is right.
When the spectrogram is calculated, a certain grayscale value is picked for each timestep (x-axis) for a given intensity. Let's assume the grayscale is set globally. If you suddenly had higher intensities in a new timestep, the grayscale would saturate.
This would be a real problem if you were working in real time, since you could start with very quiet audio that suddenly becomes loud, yet you would have to pick a grayscale-vs-intensity ratio at the beginning based on knowledge of past audio tracks.
So the approach of mlab.specgram is to scale all timesteps independently. Therefore, if there is a sudden change during a timestep, things don't look comparable to the neighboring steps, which is what @farenorth pointed out.
A synthetic example is below. The top plot is just a chirped sine wave; the bottom plots are the same with a sudden bang added.
# signature of specgram, for reference:
# specgram(x, NFFT=256, Fs=2, detrend=mlab.detrend_none,
#          window=mlab.window_hanning, noverlap=128,
#          cmap=None, xextent=None, pad_to=None, sides='default',
#          scale_by_freq=None, mode='default')
import numpy as np
import matplotlib.pyplot as p
%matplotlib inline
time = np.linspace(1, 5, 1024*16)        # 16384 samples between t=1 and t=5
f = 50 + time*50                         # linearly increasing frequency (chirp)

# add a sudden loud "bang" to the second signal
bang = np.ones(len(time))
bang[len(time)//2:len(time)*3//4] = 100  # integer division so the slice indices are ints

chirp1 = np.sin(2*np.pi*f*time)
chirp2 = np.sin(2*np.pi*f*time) * bang
p.figure(figsize=((20,8)))
p.subplot(221)
p.plot(chirp1)
p.subplot(222)
p.specgram(chirp1 ,noverlap=0,cmap=p.cm.gray)
p.subplot(223)
p.plot(chirp2)
p.subplot(224)
p.specgram(chirp2 ,noverlap=0,cmap=p.cm.gray)
p.show()
You can't get rid of that with specgram, since there is no option for global scaling. But you could easily roll your own STFT, or better, a Gabor spectrogram (an STFT with a Gaussian window, if I understand it right).
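For what it's worth, here is a minimal sketch of such a globally scaled short-time spectrum built directly from numpy (the window length, hop size, Hann window, and dB floor are arbitrary choices of mine, not anything specgram prescribes):
import numpy as np
import matplotlib.pyplot as plt

def stft_magnitude(x, n_fft=256, hop=128):
    """Magnitude STFT with a Hann window; no per-frame normalisation."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T       # shape: (freq, time)

# chirp with a sudden bang, as in the example above
t = np.linspace(1, 5, 1024 * 16)
x = np.sin(2 * np.pi * (50 + t * 50) * t)
x[len(t) // 2:len(t) * 3 // 4] *= 100

spec = stft_magnitude(x)
spec_db = 20 * np.log10(spec / spec.max() + 1e-12)     # dB relative to the global peak
# one vmin/vmax for the whole image keeps every frame on the same scale
plt.pcolormesh(spec_db, cmap='gray', vmin=-80, vmax=0)
plt.show()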
I've collected sensor data every 5 minutes for a month (30 days).
That means I have a time series with 288*30 data points in total.
I'd like to scatterplot the data (x-axis: time, y-axis: sensor value).
The following code is for testing.
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
# generate time series randomly (length: 1 month)
rng=pd.date_range("2015-11-11",periods=288*30,freq="5min")
ts=pd.Series(np.random.randn(len(rng)),rng)
nr=3
nc=1
fig=plt.figure(1)
fig.subplots_adjust(left=0.04,top=1,bottom=0.02,right=0.98,wspace=0.1,hspace=0.1)
for i in range(3):
    ctr=i+1
    ax=fig.add_subplot(nr,nc,ctr)
    ax.scatter(ts.index,ts.values)
    ax.set_xlim(ts.index.min(),ts.index.max())
plt.show()
I've generated random time series data with 288*30 observations and tried to draw it in a scatter plot. However, as you can see, it is impossible to analyze the figure.
I want to redraw it satisfying the following conditions:
I want a zoomed-in version of the figure. In other words, only the data points within some time range (e.g., 2-3 hours) should be shown at once. Then there would be enough space between adjacent points.
I want to save the figure as a png or pdf file. Then, if I open the file, the image (or pdf) viewer should have a horizontal scroll bar which enables me to explore the whole figure.
Is there anyone who can solve it?
I do not think it will be hard for a matplotlib expert, but it is quite hard for me, a beginner.
note to readers: answer changed significantly from v1 due to clarification of the question
I want a zoomed-in version of the figure. In other words, only the data points within some time range (e.g., 2-3 hours) should be shown at once. Then there would be enough space between adjacent points.
Zooming in matplotlib is implemented with the x and y limits of the axes. So you can simply change the arguments to your call to ax.set_xlim such that the corresponding times differ by 2-3 hours, or however long you want. Knowing that you have a sample every 5 minutes, 2 hours corresponds to 2*60/5 = 24 samples, so you could use
ax.set_xlim(ts.index.min(), ts.index[24])
to get a 2-hour range. (Note that ts.index is a DatetimeIndex, so you can't add a plain integer to it; ts.index.min() + pd.Timedelta(hours=2) would work as well.)
I want to save the figure as a png or pdf file. Then, if I open the file, the image (or pdf) viewer should have a horizontal scroll bar which enables me to explore the whole figure.
Use savefig to save the figure to a file. Note that if you have set the axis limits using set_xlim or xlim or equivalent, this will save only the portion of the figure that is visible within the given limits. So to save the entire figure (with all data points visible), you will need to set the axis limits to the minimum and maximum values, respectively.
When you open the image/PDF file in a viewer, whether it displays a scroll bar (and how much of the figure is shown) is entirely up to the viewer. You cannot control this in Python. But you can give it some chance of showing up with a horizontal scroll bar by making the figure very large in the horizontal direction. To do so, you can pass the figsize=(width, height) keyword argument when creating the figure, or use the set_size_inches(width, height) method on an existing Figure object. The measurements are in inches in both cases. Pass a value for width that is much larger than that for height and you will get a very wide figure; for example, 40 for width and 4 for height. You'll have to experiment with these values to find which ones give your figure the proportions you want.
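A rough sketch of that last idea, using the same synthetic data as in the question (the 40x4-inch size, marker size, and file name are just illustrative values I picked):
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# same synthetic data as in the question
rng = pd.date_range("2015-11-11", periods=288 * 30, freq="5min")
ts = pd.Series(np.random.randn(len(rng)), rng)

fig, ax = plt.subplots(figsize=(40, 4))      # very wide, fairly short figure
ax.scatter(ts.index, ts.values, s=2)
ax.set_xlim(ts.index.min(), ts.index.max())  # keep the whole month visible
fig.savefig("sensor_month.png", dpi=150)     # wide PNG; scroll horizontally in a viewer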
I am using the arcgisimage API to add map layers to my scatterplot.
However, the documentation for the API, found here http://basemaptutorial.readthedocs.org/en/latest/backgrounds.html is not that good, especially concerning sizing of the images:
xpixels actually sets the zoom of the image. A bigger number will ask a bigger image, so the image will have more detail. So when the zoom is bigger, the xsize must be bigger to maintain the resolution.
dpi is the image resolution at the output device. Changing its value will change the number of pixels, but not the zoom level.
The xsize mentioned is not defined anywhere, and doubling the DPI from 300 to 600 doesn't affect the size of the image.
Anyone have a better documentation/tutorial?
I am learning about some similar things... and I am new to this, so all I can do is offer some simple ideas. I hope this helps you (although it seems unlikely to ^_^).
The following code is from the example in the tutorial, with some adjustments (centered on LA).
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
map = Basemap(llcrnrlon=-118.5,llcrnrlat=33.15,urcrnrlon=-117.15,urcrnrlat=34.5, epsg=4269)
#http://server.arcgisonline.com/arcgis/rest/services
#EPSG Number of America is 4269
map.arcgisimage(service='World_Physical_Map', xpixels = 1500, verbose= True)
plt.show()
Firstly, I guess that "xsize" here means "xpixels", or perhaps should just be "size" (a misspelling? I'm not sure). As the tutorial says, what "xpixels" influences is the resolution of the final figure and its size.
When xpixels=150, you'll get the following picture (about 206KB):
However, when xpixels=1500, you'll get a picture with higher resolution (about 278KB). And when you zoom in to look at the detail, this picture is clearer than the former one.
If you want to view the picture at a bigger zoom, you need to set "xpixels" to a larger value to keep it clear (that is, to maintain the resolution; I guess the tutorial gave only a brief explanation without many details). And I have no idea what "dpi" is used for... It is as if you first cut a cake into 300 grids, then the second time into 600 grids: the figure won't be any clearer. And from what you say, I gather it won't become a bigger graph either.
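For what it's worth, a small sketch of that comparison, reusing the map and plt objects from the snippet above (the output file names are only illustrative):
# reuses `map` and `plt` from the snippet above; the PNG names are only illustrative
for xp in (150, 1500):
    plt.figure()
    map.arcgisimage(service='World_Physical_Map', xpixels=xp, verbose=True)
    plt.savefig('la_xpixels_%d.png' % xp)   # the xpixels=1500 file shows more detail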
I want to display an image file using imshow. It is a 1600x1200 grayscale image, and I found out that matplotlib uses float32 to decode the values. It takes about 2 seconds to load the image, and I would like to know if there is any way to make this faster. The point is that I do not really need a high-resolution image; I just want to mark certain points and draw the image as a background. So,
First question: Is 2 seconds good performance for such an image, or can I speed it up?
Second question: If it is good performance, how can I make the process faster by reducing the resolution? Important point: I still want the image to stretch over 1600x1200 pixels in the end.
My code:
import matplotlib
import matplotlib.pyplot
import numpy
plotfig = matplotlib.pyplot.figure()
plotwindow = plotfig.add_subplot(111)
plotwindow.axis([0,1600,0,1200])
plotwindow.invert_yaxis()
img = matplotlib.pyplot.imread("lowres.png")
im = matplotlib.pyplot.imshow(img,cmap=matplotlib.cm.gray,origin='centre')
plotfig.set_figwidth(200.0)
plotfig.canvas.draw()
matplotlib.pyplot.show()
This is what I want to do. Now, if the picture saved in lowres.png has a lower resolution than 1600x1200 (e.g. 400x300), it is displayed in the upper corner, as it should be. How can I scale it to the whole area of 1600x1200 pixels?
If I run this program, the slow part is the canvas.draw() command. Is there maybe a way to speed up this command?
Thank you in advance!
Following your suggestions, I have updated to the newest version of matplotlib:
version 1.1.0svn, checkout 8988
And I also use the following code:
img = matplotlib.pyplot.imread(pngfile)
img *= 255
img2 = img.astype(numpy.uint8)
im = self.plotwindow.imshow(img2,cmap=matplotlib.cm.gray, origin='centre')
and still it takes about 2 seconds to display the image... Any other ideas?
Just to add: I found the following feature:
zoomed_inset_axes
So, in principle, matplotlib should be able to do the task. With it, one can also plot a picture in a "zoomed" fashion...
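A minimal sketch of what I mean, in case it's useful (the zoom factor, the region, and the random placeholder image are arbitrary values, not from my real code):
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1.inset_locator import zoomed_inset_axes

img = np.random.rand(300, 400)                # placeholder image
fig, ax = plt.subplots()
ax.imshow(img, cmap=plt.cm.gray)

axins = zoomed_inset_axes(ax, zoom=4, loc=1)  # 1 = upper-right corner
axins.imshow(img, cmap=plt.cm.gray)
axins.set_xlim(100, 150)                      # the region to magnify
axins.set_ylim(150, 100)                      # y inverted to match image orientation
plt.show()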
The size of the data is independent of the pixel dimensions of the final image.
Since you say you don't need a high-resolution image, you can generate the image quicker by down-sampling your data. If your data is in the form of a numpy array, a quick and dirty way would be to take every nth column and row with data[::n,::n].
You can control the output image's pixel dimensions with fig.set_size_inches and plt.savefig's dpi parameter:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
data=np.arange(300).reshape((10,30))
plt.imshow(data[::2,::2],cmap=cm.Greys)
fig=plt.gcf()
# Unfortunately, had to find these numbers through trial and error
fig.set_size_inches(5.163,3.75)
ax=plt.gca()
extent=ax.get_window_extent().transformed(fig.dpi_scale_trans.inverted())
plt.savefig('/tmp/test.png', dpi=400,
bbox_inches=extent)
You can disable the default interpolation of imshow by adding the following line to your matplotlibrc file (typically at ~/.matplotlib/matplotlibrc):
image.interpolation : none
The result is much faster rendering and crisper images.
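If you'd rather not edit the matplotlibrc file, the same setting can be applied from inside the script, either globally via rcParams or per call (the random image here is just a stand-in for your data):
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

matplotlib.rcParams['image.interpolation'] = 'none'       # global default for this session

img = np.random.rand(8, 8)                                # stand-in for your image data
plt.imshow(img, cmap=plt.cm.gray, interpolation='none')   # or set it per call instead
plt.show()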
I found a solution as long as one needs to display only low-resolution images. One can do so using the line
im = matplotlib.pyplot.imshow(img,cmap=matplotlib.cm.gray, origin='centre',extent=(0,1600,0,1200))
where the extent-parameter tells matplotlib to plot the figure over this range. If one uses an image which has a lower resolution, this speeds up the process quite a lot. Nevertheless it would be great if somebody knows additional tricks to make the process even faster in order to use a higher resolution with the same speed.
Thanks to everyone who thought about my problem, further remarks are appreciated!!!
I am displaying a jpg image (I rotate this by 90 degrees, if this is relevant) and, of course, the axes display the pixel coordinates. I would like to convert the axes so that instead of displaying the pixel number, they display my unit of choice - be it radians, degrees, or in my case an astronomical coordinate. I know the conversion from pixels to (e.g.) degrees. Here is a snippet of what my code looks like currently:
import matplotlib.pyplot as plt
from PIL import Image
import matplotlib
thumb = Image.open(self.image)
thumb = thumb.rotate(90)
dpi = plt.rcParams['figure.dpi']
figsize = thumb.size[0]/dpi, thumb.size[1]/dpi
fig = plt.figure(figsize=figsize)
plt.imshow(thumb, origin='lower',aspect='equal')
plt.show()
...so following on from this, can I take each value that matplotlib would print on the axis, and change/replace it with a string to output instead? I would want to do this for a specific coordinate format - eg, rather than an angle of 10.44 (degrees), I would like it to read 10 26' 24'' (ie, degrees, arcmins, arcsecs)
Finally, on this theme, I'd want control over the tick frequency on the plot. Matplotlib might print the axis value every 50 pixels, but I'd really want it every (for example) degree.
It sounds like I would like to define some kind of array with the pixel values and their converted values (degrees etc.) that I want to be displayed, having control over the sampling frequency across the xmin/xmax range.
Are there any matplotlib experts on Stack Overflow? If so, thanks very much in advance for your help! To make this a more learning experience, I'd really appreciate being prodded in the direction of tutorials etc on this kind of matplotlib problem. I've found myself getting very confused with axes, axis, figures, artists etc!
Cheers,
Dave
It looks like you're dealing with the matplotlib.pyplot interface, which means that you'll be able to bypass most of the dealing with artists, axes, and the like. You can control the values and labels of the tick marks by using the matplotlib.pyplot.xticks command, as follows:
tick_locs = [list of locations where you want your tick marks placed]
tick_lbls = [list of corresponding labels for each of the tick marks]
plt.xticks(tick_locs, tick_lbls)
For your particular example, you'll have to compute what the tick marks are relative to the units (i.e. pixels) of your original plot (since you're using imshow) - you said you know how to do this, though.
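As a concrete but hypothetical illustration, suppose each pixel corresponded to 0.002 degrees; then ticks every half degree could be placed and relabelled in degrees/arcminutes/arcseconds roughly like this (the conversion factor, the random image, and the label formatting are placeholders for your own):
import numpy as np
import matplotlib.pyplot as plt

deg_per_pixel = 0.002                 # hypothetical conversion; replace with your own
img = np.random.rand(1200, 1600)      # stand-in for your rotated image

plt.imshow(img, origin='lower', aspect='equal', cmap=plt.cm.gray)

def fmt_dms(deg):
    """Format a decimal angle as degrees, arcminutes, arcseconds."""
    d = int(deg)
    m = int((deg - d) * 60)
    s = (deg - d - m / 60.0) * 3600
    return "%d %d' %.0f''" % (d, m, s)

# put a tick every 0.5 degrees along x, i.e. every 250 pixels with this conversion
tick_locs = np.arange(0, img.shape[1] + 1, 0.5 / deg_per_pixel)
tick_lbls = [fmt_dms(loc * deg_per_pixel) for loc in tick_locs]
plt.xticks(tick_locs, tick_lbls)
plt.show()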
I haven't dealt with images much, but you may be able to use a different plotting method (e.g. pcolor) that allows you to supply x and y information. That may give you a few more options for specifying the units of your image.
For tutorials, you would do well to look through the matplotlib gallery - find something you like, and read the code that produced it. One of the guys in our office recently bought a book on Python visualization - that may be worthwhile looking at.
The way that I generally think of all the various pieces is as follows (a tiny sketch follows the list):
A Figure is a container for all the Axes
An Axes is the space where what you draw (i.e. your plot) actually shows up
An Axis is the actual x and y axes
Artists? That's too deep in the interface for me: I've never had to worry about those yet, even though I rarely use the pyplot module in production plots.
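A tiny sketch of how those pieces relate in code (nothing specific to your image, just the object hierarchy):
import matplotlib.pyplot as plt

fig = plt.figure()              # Figure: the whole canvas, the container for Axes
ax = fig.add_subplot(111)       # Axes: one plotting area inside the Figure
ax.plot([0, 1, 2], [0, 1, 4])   # drawing happens on an Axes
ax.xaxis.set_ticks([0, 1, 2])   # Axis: the x- (or y-) axis object itself
plt.show()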