I'm using the matplotlib function psd to generate the power spectral density of a bunch of radio signals I'm receiving. All I want are the returned values but the function plots the whole spectrum not matter what. Is there a way to prevent it from plotting? Is there another function that could do this without the plot? I'm trying to run this as rapidly as possible so anything to speed it up (aka preventing the plot entirely) would be very useful.
The code is pretty straightforward but I'm not sure how to suppress this plotting and ideally prevent it from doing it entirely because I want this code to run as fast as possible:
from pylab import *
power, psd_frequencies = psd(radio_samples, NFFT=256, Fs=samples_rate, Fc=center_frequency)
Alternatives to running psd() that would be faster are very welcome too.
To reproduce exactly what matplotlib plots in the psd plot, you may use its own method:
from matplotlib.mlab import psd
power, psd_frequencies = psd(radio_samples, NFFT=256, Fs=samples_rate)
psd_frequencies += center_frequency
This gives you the data, but without the plot.
Related
lineslist, below, represents a set of lines (for some chemical spectrum, let's say), in MHz. I know the linewidth of the laser used to probe these lines to be 5 MHz. So, naively, the kernel density estimate of these lines with a bandwidth of 5 should give me the continuous distribution that would be produced in an experiment using the aforementioned laser.
The following code:
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
lineslist=np.array([-153.3048645 , -75.71982528, -12.1897835 , -73.94903264,
-178.14293936, -123.51339541, -118.11826988, -50.19812838,
-43.69282206, -34.21268228])
sns.kdeplot(lineslist, shade=True, color="r",bw=5)
plt.show()
yields
Which looks like a Gaussian with bandwidth much larger than 5 MHz.
I'm guessing that for some reason, the bandwidth of the kdeplot has different units than the plot itself. The separation between the highest and lowest line is ~170.0 MHz. Supposing that I need to rescale the bandwidth by this factor:
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
lineslist=np.array([-153.3048645 , -75.71982528, -12.1897835 , -73.94903264,
-178.14293936, -123.51339541, -118.11826988, -50.19812838,
-43.69282206, -34.21268228])
sns.kdeplot(lineslist, shade=True, color="r",bw=5/(np.max(lineslist)-np.min(lineslist)))
plt.show()
I get:
With lines that seem to have the expected 5 MHz bandwidth.
As dandy as that solution is, I've pulled it from my arse, and I'm curious whether someone more familiar with seaborn's kdeplot internals can comment on why this is.
Thanks,
Samuel
One thing to note is that Seaborn doesn't actually handle the bandwidth itself - it passes the setting on more-or-less as-is to either SciPy or the Statsmodels packages, depending on what you have installed. (It prefers Statsmodels, but will fall back to SciPy.)
The documentation for this parameter in the various sub-packages is a little confusing, but from what I can tell, the key issue here is that the setting for SciPy is a bandwidth factor, rather than a bandwidth itself. That is, this factor is (effectively) multiplied by the standard deviation of the data you're plotting to give you the actual bandwidth used in the kernels.
So with SciPy, if you have a fixed number which you want to use as your bandwidth, you need to divide through by your data standard deviation. And if you're trying to plot multiple datasets consistently, you need to adjust for the standard deviation of each dataset. This adjustment effectively what you did by scaling by the range -- but again, it's not the range of the data that's the number used, but the standard deviation of the data.
To make things all the more confusing, Statsmodels expects the true bandwidth when given a scalar value, rather than a factor that's multiplied by the standard deviation of the sample. So depending on what backend you're using, Seaborn will behave differently. There's no direct way to tell Seaborn which backend to use - the best way to test is probably trying to import statsmodels, and seeing if that succeeds (takes bandwidth directly) or fails (takes bandwidth factor).
By the way, these results were tested against Seaborn version 0.7.0 - I expect (hope?) that versions in the future might change this behavior.
I hope this question isn't too elementary. I've searched extensively for a solution but haven't discovered one yet.
I've recently begun using Jupyter Notebook with Sympy to take notes and do my homework in my Calculus II class (and what a HUGE BENEFIT this has been!).
However, my sole problem with it is that I'm unable to figure out how to configure the size (i.e. the dimensions in pixels) of the plot figure.
It's easy enough to do using matplotlib directly (matplotlib.pyplot.figure() specifically), but I'm using the Sympy.mpmath.plotmodule because Sympy works much better for the symbolic manipulation we're doing in this course. I know Sympy has its own plotting module, but the one in mpmath seems easier to use so far (with the exception of this one issue, of course).
However, I've looked through the mpmath documentation and have googled the problem repeatedly, without a solution.
How can you change the size of the image that results from plotting a function using the mpmath API?
You may try changing the size of sympy's plots via pyplot's rcParams:
import sympy
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = 10, 3
sympy.mpmath.plot([cos, sin], [-4, 4])
The following code gives me a very nice violinplot (and boxplot within).
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
foo = np.random.rand(100)
sns.violinplot(foo)
plt.boxplot(foo)
plt.show()
So far so good. However, when I look at foo, the variable does not contain any negative values. The seaborn plot seems misleading here. The normal matplotlib boxplot gives something closer to what I would expect.
How can I make violinplots with a better fit (not showing false negative values)?
As the comments note, this is a consequence (I'm not sure I'd call it an "artifact") of the assumptions underlying gaussian KDE. As has been mentioned, this is somewhat unavoidable, and if your data don't meet those assumptions, you might be better off just using a boxplot, which shows only points that exist in the actual data.
However, in your response you ask about whether it could be fit "tighter", which could mean a few things.
One answer might be to change the bandwidth of the smoothing kernel. You do that with the bw argument, which is actually a scale factor; the bandwidth that will be used is bw * data.std():
data = np.random.rand(100)
sns.violinplot(y=data, bw=.1)
Another answer might be to truncate the violin at the extremes of the datapoints. The KDE will still be fit with densities that extend past the bounds of your data, but the tails will not be shown. You do that with the cut parameter, which specifies how many units of bandwidth past the extreme values the density should be drawn. To truncate, set it to 0:
sns.violinplot(y=data, cut=0)
By the way, the API for violinplot is going to change in 0.6, and I'm using the development version here, but both the bw and cut arguments exist in the current released version and behave more or less the same way.
I use matplotlib for a signal processing application and I noticed that it chokes on large data sets. This is something that I really need to improve to make it a usable application.
What I'm looking for is a way to let matplotlib decimate my data. Is there a setting, property or other simple way to enable that? Any suggestion of how to implement this are welcome.
Some code:
import numpy as np
import matplotlib.pyplot as plt
n=100000 # more then 100000 points makes it unusable slow
plt.plot(np.random.random_sample(n))
plt.show()
Some background information
I used to work on a large C++ application where we needed to plot large datasets and to solve this problem we used to take advantage of the structure of the data as follows:
In most cases, if we want a line plot then the data is ordered and often even equidistantial. If it is equidistantial, then you can calculate the start and end index in the data array directly from the zoom rectangle and the inverse axis transformation. If it is ordered but not equidistantial a binary search can be used.
Next the zoomed slice is decimated, and because the data is ordered we can simply iterate a block of points that fall inside one pixel. And for each block the mean, maximum and minimum is calculated. Instead of one pixel, we then draw a bar in the plot.
For example: if the x axis is ordered, a vertical line will be drawn for each block, possibly the mean with a different color.
To avoid aliasing the plot is oversampled with a factor of two.
In case it is a scatter plot, the data can be made ordered by sorting, because the sequence of plotting is not important.
The nice thing of this simple recipe is that the more you zoom in the faster it becomes. In my experience, as long as the data fits in memory the plots stays very responsive. For instance, 20 plots of timehistory data with 10 million points should be no problem.
It seems like you just need to decimate the data before you plot it
import numpy as np
import matplotlib.pyplot as plt
n=100000 # more then 100000 points makes it unusable slow
X=np.random.random_sample(n)
i=10*array(range(n/10))
plt.plot(X[i])
plt.show()
Decimation is not best for example if you decimate sparse data it might all appear as zeros.
The decimation has to be smart such that each LCD horizontal pixel is plotted with the min and the max of the data between decimation points. Then as you zoom in you see more an more detail.
With zooming this can not be done easy outside matplotlib and thus is better to handle internally.
Since some years ago I use matlab for my plots (mostly density plots), but now I want to change to matplotlib. I have a problem trying to figure out how to get analogous plots in matplotlib. I have to represent a 2D array. In matlab I used to use the surf function, and then change to view(2) (az=0 and el=90). An example:
surf(X,Y,log10(z),'FaceColor','interp','EdgeColor','none')
view(2)
In matplotlib I have tried some functions, but I have not got the same feeling. m3plot is a computationally expensive toolkit and it is not the same as using surf. imshow does not allow to use log functions in his arguments (like the example), and log values is something mandatory for me. Then it is pcolor, but I can not find a 'FaceColor'-like option to smooth the edges. I would like to know if someone knows what is the best equivalent in matplotlib.
Thank you for your time!
Try installing mayavi which has the surf function (mayavi is a fully-blown 3D visualisation library using hardware acceleration)
Finally, the solution that suits me is to use the routine pcolormesh(). This combined with the option shading='gouraud' interpolates the data and smooth the edges. In addition, it works pretty well with large arrays in comparision with pcolor.