I stumbled across this today: plotting lines in matplotlib seems to be much faster when the linewidth is less than 1.0. I have only tested this on a Mac, but the effect seems very strong.
For instance, if you try this code, you will see that the data plots about 10x faster with a linewidth of 0.5 than with a linewidth of 1.0.
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 20000)
y = np.sin(x) + np.random.random(len(x))*0.1

plt.ion()
plt.show()

plt.plot(x, y, lw=0.5)   # renders almost instantly
plt.draw()

plt.figure()
plt.plot(x, y, lw=1.0)   # roughly 10x slower
plt.draw()
I used this code to make a graph of the relationship between linewidth and plotting time:
import numpy as np
import matplotlib.pyplot as plt
import time

x = np.linspace(0, 10, 10000)
y = np.sin(x) + np.random.random(len(x))*0.1

plt.ion()
plt.show()

linewidths = np.linspace(2, 0, 20)
times = []
for lw in linewidths:
    t = time.time()
    plt.plot(x, y, lw=lw)
    plt.draw()
    times.append(time.time() - t)

# skip the first measurement, which includes figure-creation overhead
plt.figure()
plt.ioff()
plt.plot(linewidths[1:], times[1:], 'ro')
plt.xlabel('Linewidth (points)')
plt.ylabel('Time (seconds)')
plt.show()
And here is the result:
Using a linewidth less than 1.0 provides a ~10x speedup; beyond 1.0, the time increases linearly with linewidth. I only observe this effect when the number of data points is large, above roughly 5000 points. It makes sense to me that asking matplotlib to display more pixels might take a little longer, but I was not expecting the huge speedup from a slightly smaller linewidth (0.5 versus 1.0).
Can anyone explain why this occurs? I am happy to have discovered it, as it makes displaying large datasets much faster.
Some have suggested that this might be specific to the MacOSX backend, which seems likely. If I save the plots in png format instead of drawing them to the screen, the times look more randomly distributed:
Someone can probably replace this with a more thorough answer, but it appears that this effect is unique to the MacOSX backend, since it does not appear when saving the figures as png. The plotting time also seems to be affected by the version of Matplotlib (1.3.x versus 1.3.0). But it seems Mac users can enjoy a speedup for large datasets by decreasing the linewidth to a value smaller than 1.0.
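As a quick way to check the backend hypothesis yourself, you can force a non-interactive backend and time the render directly. A minimal sketch; the 'Agg' backend renders to an off-screen buffer, bypassing the MacOSX backend entirely:
import time
import matplotlib
matplotlib.use('Agg')       # off-screen rendering, no MacOSX backend involved
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 20000)
y = np.sin(x) + np.random.random(len(x))*0.1

for lw in (0.5, 1.0):
    fig, ax = plt.subplots()
    t = time.time()
    ax.plot(x, y, lw=lw)
    fig.savefig('lw_%.1f.png' % lw)   # rendering happens here
    print(lw, time.time() - t)
    plt.close(fig)
If the timings come out similar for both linewidths here but differ on screen, the effect is in the MacOSX backend's draw path rather than in matplotlib itself.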
Suppose I have the following script that produces a plot (as shown below) where some datapoints have hatching. At DPI = 200, the hatching frequency (space between dots) is good, but if I want to increase the resolution of the plot (DPI = 600 for example), the dots become very fine. Is there a way to set the gap between dots? Thanks in advance.
import numpy as np
from matplotlib import pyplot as plt
from mpl_toolkits.basemap import Basemap   # not used in this snippet

Sig = np.random.rand(50, 50)
Sig = np.ma.masked_greater(Sig, 0.25)      # draw (and hatch) only cells <= 0.25

f, ax1 = plt.subplots(1, 1)
# alpha=0 hides the face colors, leaving only the hatch dots
ax1.pcolor(np.linspace(0, 90, 50), np.linspace(0, 50, 50), Sig,
           hatch=".", alpha=0)

fig = plt.gcf()
fig.set_size_inches(8, 8)
fig.savefig('Trial.png', bbox_inches='tight', dpi=200)
There is no way to accurately control the spacing between hatch patterns. You do have the option to increase the hatch density, though: instead of hatch="." you can repeat the symbol, hatch="..."; this produces a denser pattern, as the sketch below shows.
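A minimal sketch comparing the two densities side by side, with random data standing in for the question's Sig array:
import numpy as np
import matplotlib.pyplot as plt

Sig = np.ma.masked_greater(np.random.rand(50, 50), 0.25)

f, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
ax1.pcolor(Sig, hatch=".", alpha=0)      # sparse dots
ax2.pcolor(Sig, hatch="...", alpha=0)    # symbol repeated: denser dots
plt.savefig('hatch_density.png', dpi=100)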
The figure above was produced with the standard dpi of 100.
Changing the dpi to 300 gives the following image:
As can be seen, the hatch density no longer changes with dpi; this was brought up in this issue and has since been fixed. The solution is thus to update to the newest matplotlib version.
I am generating plots like this one:
When using fewer ticks, the plot fits nicely and the bars are wide enough to be seen correctly. However, when there are lots of ticks, instead of making the plot larger, matplotlib just compresses the y axis, resulting in thin bars and overlapping tick text.
This happens with both plt.show() and plt.savefig().
Is there any solution that draws the figure at a scale which guarantees the bars have the specified thickness, not more (if there are too few ticks) and not less (too many, overlapping)?
EDIT:
Yes, I'm using barh, and yes, I'm setting height to a fixed value (8):
height = 8
ax.barh(yvalues-width/2, xvalues, height=height, color='blue', align='center')
ax.barh(yvalues+width/2, xvalues, height=height, color='red', align='center')
I don't quite understand your code: it seems you make two plots with the same (only shifted) yvalues, but the image doesn't look like that. And are you sure you want to shift by width/2 if you have align='center'? Anyway, on to changing the image size:
I am not sure there is no other way, but I don't see anything in the manual at a glance. To set the image size by hand:
fig = plt.figure(figsize=(5, 80))
ax = fig.add_subplot(111)
# ... your plotting code
The size is in inches. You can compute it beforehand; try for example
import numpy as np
# number of minimum-spacing intervals the ticks span
fig_height = (max(yvalues) - min(yvalues)) / np.diff(yvalues).min()
this would (approximately) set the minimum distance between ticks to one inch, which is too much, but you can scale it down.
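For instance, here is a sketch that scales the figure height with the number of bars; the 0.4-inch-per-bar spacing is an arbitrary choice and the data are stand-ins:
import numpy as np
import matplotlib.pyplot as plt

yvalues = np.arange(50)        # stand-in tick positions
xvalues = np.random.rand(50)   # stand-in bar lengths

inches_per_bar = 0.4           # assumed vertical space per bar
fig, ax = plt.subplots(figsize=(5, len(yvalues)*inches_per_bar))
ax.barh(yvalues, xvalues, height=0.8, color='blue', align='center')
fig.savefig('bars.png', bbox_inches='tight')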
I can think of two solutions for your case:
If you are trying to plot a histogram, use the hist function [1]. This will automatically bin your data. You can even plot multiple overlapping histograms as long as you set an alpha value lower than 1. See this post
import matplotlib.pyplot as plt
import numpy as np

mu, sigma = 0, 1                        # assumed distribution parameters
x = mu + sigma*np.random.randn(10000)
# density=True replaces the deprecated normed=1
plt.hist(x, 50, density=True, facecolor='green',
         alpha=0.75, orientation='horizontal')
You can also set the interval of your axis ticks. The following places a tick every 10 items, but I doubt this will solve your problem.
import matplotlib.ticker as ticker
...
ax.yaxis.set_major_locator(ticker.MultipleLocator(10))
I'm trying to plot the contour map of a given function f(x,y), but since the function's output grows really fast, I'm losing a lot of information for lower values of x and y. I found on the forums that this can be worked around with vmax=vmax; that actually worked, but only for a specific range of x and y and a specific number of colormap levels.
Say I have this plot:
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
u = np.linspace(-2, 2, 1000)
x, y = np.meshgrid(u, u)
z = (1 - x)**2 + 100*(y - x**2)**2      # Rosenbrock function

cont = plt.contour(x, y, z, 500, colors='black', linewidths=.3)
cont = plt.contourf(x, y, z, 500, cmap="jet", vmax=100)
plt.colorbar(cont)
plt.show()
I want to uncover what's beyond the axis limits while keeping the same scale, but if I change the x and y limits to -3 and 3 I get:
See how I lost most of my levels, since the maximum value of the function at these limits is much higher. A workaround is to increase the number of levels to 1000, but that takes a lot of computational time.
Is there a way to plot only the contour levels that I need? That is, between 0 and 100.
An example of a desired output would be:
With the white space being the continuation of the plot, without rescaling the levels.
The code I'm using is the one given after the first image.
There are a few possible ideas here. The one I very much prefer is a logarithmic representation of the data. An example would be
from matplotlib import ticker

# x, y, z as defined in the question
fig = plt.figure(1)
cont1 = plt.contourf(x, y, z, cmap="jet",
                     locator=ticker.LogLocator(numticks=10))   # log-spaced levels
plt.colorbar(cont1)
plt.show()

fig = plt.figure(2)
cont2 = plt.contourf(x, y, np.log10(z), 100, cmap="jet")       # plot log10(z) directly
plt.colorbar(cont2)
plt.show()
The first example uses matplotlib's LogLocator to place log-spaced contour levels. The second one just directly computes the logarithm of the data and plots that normally.
The third example simply caps all data above 100.
fig = plt.figure(3)
zcapped = z.copy()
zcapped[zcapped > 100] = 100            # clip everything above 100
cont3 = plt.contourf(x, y, zcapped, 100, cmap="jet")
cbar = plt.colorbar(cont3)
plt.show()
I'm charting the progress of a differential equation solver (boundary value problem). Each iteration yields a complete set of function evaluations f(x), which can then be plotted against x. Each graph is (supposedly) closer to the correct solution than the last until convergence is reached. A sequential colormap is used to make earlier graphs faded and later ones saturated.
This works fine when the number of iterations is predetermined:
import matplotlib.pyplot as plt

ax = plt.subplot(111)
cm = plt.get_cmap('OrRd')
# n_iter is the predetermined number of iterations
# (set_prop_cycle(color=...) is the equivalent in newer matplotlib)
ax.set_color_cycle([cm(1.*i/(n_iter + 1)) for i in range(1, n_iter + 2)])
ax.plot(x, y)                           # initial guess
for k in range(n_iter):
    # iterative solve updating y
    ax.plot(x, y)
However, if I use a convergence criterion instead of a predetermined number of iterations, I won't be able to set_color_cycle beforehand. And putting that line after the loop doesn't work.
I know that I can store my intermediate results and plot only after convergence is reached, but this strikes me as heavy-handed because I really have no use for all the intermediate results other than to see them on the plot.
So here are my questions:
1. How do I change the colormap of the existing graphs after plotting? (This is easy in MATLAB.)
2. How do I do the same thing with another collection of graphs on the same plot (e.g. from a different initial guess, converging to a different solution) without disturbing the first collection, so that two colormaps distinguish the collections from one another. (This should be obvious with the answer to Question 1, but just in case.)
Many thanks.
You can also use plt.set_cmap, see here or (more elaborately, scroll down) here:
import numpy as np
import matplotlib.pyplot as plt

plt.imshow(np.random.random((10, 10)), cmap='magma')
plt.colorbar()
plt.set_cmap('viridis')   # retroactively switches the current image's colormap
Use update_colors() below to update the colors of all existing lines:
import pylab as pl
import numpy as np

cm = pl.get_cmap('OrRd')
x = np.linspace(0, 1, 100)

def update_colors(ax):
    lines = ax.lines
    # spread the colormap evenly over however many lines exist so far
    colors = cm(np.linspace(0, 1, len(lines)))
    for line, c in zip(lines, colors):
        line.set_color(c)

fig, ax = pl.subplots()
for i in range(10):
    ax.plot(x, x**(1 + i*0.1))
update_colors(ax)
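For question 2, the same idea works per collection by slicing ax.lines. A sketch, assuming you know how many lines belong to the first collection; n1 and the second colormap are illustrative choices:
def update_two_collections(ax, n1, cm1=pl.get_cmap('OrRd'), cm2=pl.get_cmap('PuBu')):
    # recolor the first n1 lines with cm1 and the rest with cm2
    for lines, cmap in ((ax.lines[:n1], cm1), (ax.lines[n1:], cm2)):
        colors = cmap(np.linspace(0, 1, max(len(lines), 1)))
        for line, c in zip(lines, colors):
            line.set_color(c)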
One trick you could consider: rather than trying to change the colour values after plotting, you can use a black overlay with less than 100% transparency to "fade" the past plots. For example, an alpha of 10% would reduce the brightness of each past plot sequentially.
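A minimal sketch of that idea; note the stacked overlays also darken the axes background, so this is more of a quick hack than a clean solution:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

fig, ax = plt.subplots()
x = np.linspace(0, 1, 100)
z = 2                                   # zorder bookkeeping
for k in range(10):                     # stands in for the solver loop
    y = x**(1 + k*0.1)                  # dummy "iteration result"
    # translucent black rectangle dims everything drawn so far by ~10%
    ax.add_patch(Rectangle((0, 0), 1, 1, transform=ax.transAxes,
                           facecolor='black', alpha=0.1, zorder=z))
    ax.plot(x, y, color='red', zorder=z + 1)
    z += 2
plt.show()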
I would like to use Matplotlib to generate a scatter plot with a huge amount of data (about 3 million points). I have 3 vectors of the same dimension, and I plot them in the following way.
import matplotlib.pyplot as plt
import matplotlib.cm as cm

# delta, vf and dS are the three equal-length data vectors
fig = plt.figure()
fig.subplots_adjust(bottom=0.2)
ax = fig.add_subplot(111)
plt.scatter(delta, vf, c=dS, alpha=0.7, cmap=cm.Paired)
Nothing special, actually, but it takes too long to generate (I'm working on my MacBook Pro with 4 GB RAM, Python 2.7 and Matplotlib 1.0). Is there any way to improve the speed?
Unless your graphic is huge, many of those 3 million points are going to overlap.
(A 400x600 image only has 240K pixels...)
So the easiest thing to do would be to take a sample of, say, 1000 points from your data:
import random
# sample 1000 random indices so the same rows can be taken from all three vectors
idx = random.sample(range(len(delta)), 1000)
delta_sample = delta[idx]
and just plot that.
For example:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
import random
fig = plt.figure()
fig.subplots_adjust(bottom=0.2)
ax = fig.add_subplot(111)
N=3*10**6
delta=np.random.normal(size=N)
vf=np.random.normal(size=N)
dS=np.random.normal(size=N)
idx=random.sample(range(N),1000)
plt.scatter(delta[idx],vf[idx],c=dS[idx],alpha=0.7,cmap=cm.Paired)
plt.show()
Or, if you need to pay more attention to outliers, you could bin your data using np.histogram and then compose a delta_sample that has representatives from each bin.
Unfortunately, when using np.histogram I don't think there is any easy way to associate bins with individual data points. A simple but approximate solution is to use the location of the bin edge itself as a proxy for the points in it:
xedges = np.linspace(-10, 10, 100)
yedges = np.linspace(-10, 10, 100)
zedges = np.linspace(-10, 10, 10)
hist, edges = np.histogramdd((delta, vf, dS), (xedges, yedges, zedges))
# plot one representative point (the lower bin edge) for each non-empty bin
xidx, yidx, zidx = np.where(hist > 0)
plt.scatter(xedges[xidx], yedges[yidx], c=zedges[zidx], alpha=0.7, cmap=cm.Paired)
plt.show()
What about trying pyplot.hexbin? It generates a sort of heatmap based on point density in a set number of bins.
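A minimal hexbin sketch, with random data standing in for the question's vectors:
import matplotlib.pyplot as plt
import numpy as np

# dummy stand-ins for the question's three vectors
N = 3*10**6
delta = np.random.normal(size=N)
vf = np.random.normal(size=N)
dS = np.random.normal(size=N)

# colored by point density; pass C=dS, reduce_C_function=np.median to color
# each hexagon by the median dS value of the points it contains instead
plt.hexbin(delta, vf, gridsize=100, cmap='jet')
plt.colorbar(label='points per bin')
plt.show()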
You could take the heatmap approach shown here. In that example the color represents the number of points in each bin, not the median value of the dS array, but that should be easy to change. More later if you are interested.
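A sketch of that heatmap idea using np.histogram2d, again with dummy data:
import numpy as np
import matplotlib.pyplot as plt

N = 3*10**6
delta = np.random.normal(size=N)
vf = np.random.normal(size=N)

# 2D histogram of point counts over a 200x200 grid
counts, xedges, yedges = np.histogram2d(delta, vf, bins=200)
plt.pcolormesh(xedges, yedges, counts.T, cmap='viridis')   # transpose so rows map to y
plt.colorbar(label='points per bin')
plt.show()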