How can I work around overflow error in matplotlib?

I'm solving a set of coupled differential equations with odeint from scipy.integrate.
For the integration time I have:
t = numpy.linspace(0, 8e+9, int(5e+06))
where 5e+06 is the number of time points.
I then plot the equations I have as such:
plt.xscale('symlog') #x axis logarithmic scale
plt.yscale('log',basey=2) #Y axis logarithmic scale
plt.gca().set_ylim(8, 100000) #Changing y axis ticks
ax = plt.gca()
ax.yaxis.set_major_formatter(matplotlib.ticker.ScalarFormatter())
ax.xaxis.set_major_formatter(matplotlib.ticker.ScalarFormatter())
plt.title("Example graph")
plt.xlabel("time (yr)")
plt.ylabel("quantity a")
plt.plot(t,a,"r-", label = 'Example graph')
plt.legend(loc='best')
where a is time dependent variable. (This is just one graph from many.)
However, the graphs look a bit jagged, rather than oscillatory and I obtain this error:
OverflowError: Exceeded cell block limit (set 'agg.path.chunksize' rcparam)
I'm not sure what this error means. I've looked at other answers but don't know how to set 'agg.path.chunksize'.
Also, the integration + plotting takes around 7 hours and that is with some CPU processing hacks, so I really do not want to implement anything that would increase the time.
How can I overcome this error?
I have attempted to reduce the timestep, however I obtain this error instead:
Excess work done on this call (perhaps wrong Dfun type).
Run with full_output = 1 to get quantitative information.

As the error message suggests, you may set the chunksize to a larger value.
plt.rcParams['agg.path.chunksize'] = 1000
However, you should also consider why this error occurs in the first place. It only occurs when you try to plot an unreasonably large amount of data. If you try to plot 200000000 points, the renderer may have trouble keeping them all in memory. But it is worth asking why so many points need to be plotted. A screen can display some 2000 points horizontally, a printed page maybe 6000. Using more points than that generally does not make sense.
Now if the solution of your differential equations requires a large point density, it does not automatically mean that you need to plot them all.
E.g. one could just plot every 100th point,
plt.plot(x[::100], y[::100])
most probably without even affecting the visual plot appearance.
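As a minimal sketch of the slicing idea above, the stride can be derived from a target point count instead of being hard-coded. The helper name `decimate_for_plot`, the `max_points` default and the sine test signal are assumptions made for illustration, not part of the original question.

```python
import numpy as np

def decimate_for_plot(t, a, max_points=5000):
    """Keep every k-th sample so that at most max_points samples remain."""
    # Ceiling division: the smallest stride that respects max_points.
    step = max(1, -(-len(t) // max_points))
    return t[::step], a[::step]

# Stand-in data; in the question, `a` would come from odeint.
t = np.linspace(0, 8e9, 500_000)
a = 100 + 50 * np.sin(t / 1e8)

t_small, a_small = decimate_for_plot(t, a)
print(len(t), "->", len(t_small))
```

The reduced arrays can then be passed to `plt.plot(t_small, a_small, "r-")`. Plain striding can hide oscillations that are faster than the stride, so the target count should stay well above the number of cycles that need to be visible.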

Related

Creating a packed bubble / scatter plot in python (jitter based on size to avoid overlapping)

I have come across a number of plots (end of page) that are very similar to scatter / swarm plots, which jitter the y-axis in order to avoid overlapping dots / bubbles.
How can I get the y values (ideally in an array) based on a given set of x and z values (dot sizes)?
I found the python circlify library but it's not quite what I am looking for.
Example of what I am trying to create
EDIT: For this project I need to be able to output the x, y and z values so that they can be plotted in the user's tool of choice. Therefore I am more interested in solutions that generate the y-coords rather than the actual plot.

Answer:
What you describe in your text is known as a swarm plot (or beeswarm plot). There are Python implementations of these (see seaborn in particular), as well as implementations in other languages such as R. These plots adjust the y-position of each data point so that the points don't overlap but are otherwise closely packed.
Seaborn swarm plot:
Discussion:
But the plots that you show aren't standard swarm plots (which almost always have the weird-looking "arms"). Instead they seem to be driven by some type of physics engine that allows for motion along x as well as y, which produces the well-packed structures you see in the plots (e.g., like a water drop on a spider's web).
That is, in the plot above, if you imagine moving points only along the vertical axis so that they pack better, you can see that, for the most part, you can't really do it. (Honestly, maybe the data shown could be packed a bit better, but not dramatically so: the first arm from the left couldn't be improved, and if any of them could, it's only by moving one or two points inward.) Instead, to get a plot like the one you show, you'll need some motion in x, such as a physics engine would provide, which hopefully holds x close to its original value but also allows for some variation. But that's a trade-off that needs to be decided on a data level, not a programming level.
For example, here's a plotting library, RAWGraphs, which produces a compact beeswarm plot like the Politico graphs in the question:
But critically, they give the warning:
"It’s important to keep in mind that a Beeswarm plot uses forces to avoid collision between the single elements of the visual model. While this helps to see all the circles in the visualization, it also creates some cases where circles are not placed in the exact position they should be on the linear scale of the X Axis."
Or, similarly, in notes from this D3 package: "Other implementations use force layout, but the force layout simulation naturally tries to reach its equilibrium by pushing data points along both axes, which can be disruptive to the ordering of the data." And here's a nice demo based on D3 force layout where sliders adjust the relative forces pulling the points to their correct values.
Therefore, this plot is a compromise between a swarm plot and a violin plot (which shows a smoothed average for the distribution envelope). Both of those plots give an honest representation of the data, whereas the closely packed representation comes at the cost of misrepresenting the x-position of the individual data points. Its advantage seems to be that you can color and click on the individual points (where, if you wanted, you could give the actual x-data, although that's not done in the linked plots).
Seaborn violin plot:
Personally, I'm really hesitant to misrepresent the data in some unknown way (that's the outcome of a physics engine calculation but not obvious to the reader). Maybe a better compromise would be a violin filled with non-circular patches, or something like a Raincloud plot.

I created an Observable notebook to calculate the y values of a beeswarm plot with variable-sized circles. The image below gives an example of the results.
If you need to use the JavaScript code in a script, it should be straightforward to copy and paste the code for the AccurateBeeswarm class.
The algorithm simply places the points one by one, as close as possible to the central axis (the y=0 line) while avoiding overlaps. There are also options to add a little randomness to improve the appearance. x values are never altered; this is the one big advantage of this approach over force-directed algorithms such as the one used by RAWGraphs.
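The placement idea described above can be sketched in Python roughly as follows. This is not the notebook's AccurateBeeswarm code: the function name `beeswarm_y`, the processing order (sorted by x) and the brute-force neighbour search are assumptions chosen to keep the sketch short, and the randomness options are left out.

```python
import math

def beeswarm_y(xs, radii, eps=1e-9):
    """Greedy sketch: return a y value per point so that no circles overlap.

    x values are never changed. Each point takes the free position
    closest to the y = 0 line.
    """
    placed = []                      # (x, y, r) of circles already positioned
    ys = [0.0] * len(xs)
    for i in sorted(range(len(xs)), key=lambda k: xs[k]):
        x, r = xs[i], radii[i]
        # Only circles close enough in x can ever collide with this one.
        near = [(px, py, pr) for px, py, pr in placed if abs(px - x) < pr + r]
        # Candidates: the centre line, plus the positions just above and
        # just below each neighbour where the two circles touch.
        candidates = [0.0]
        for px, py, pr in near:
            dy = math.sqrt((pr + r) ** 2 - (px - x) ** 2)
            candidates += [py + dy, py - dy]

        def is_free(y):
            return all((px - x) ** 2 + (py - y) ** 2 >= (pr + r) ** 2 - eps
                       for px, py, pr in near)

        y = min((c for c in candidates if is_free(c)), key=abs)
        ys[i] = y
        placed.append((x, y, r))
    return ys

xs = [0.0, 0.5, 1.0, 0.2]
radii = [1.0, 1.0, 1.0, 0.5]
ys = beeswarm_y(xs, radii)
```

The returned `ys` line up with the input order, so `(xs, ys, radii)` can be exported to whichever plotting tool is used. The search is quadratic in the number of points, which is fine for a sketch but would need a spatial index for large data sets.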

How to speed up real time plotting in pyqtgraph

In short, I'm trying to find a faster way to plot real-time data coming through a serial input. The data looks like a coordinate (x,y) and about 40 are coming in each second. The stream stores the data in an array, using x as the index and setting y as the value for it. This portion is threaded. While the stream can read in data immediately, the pyqtgraph library isn't able to keep up with this speed.
Here's the portion of the code where I am plotting the data. The distances and theta variables are arrays with 6400 indexes. They have been transformed into polar values and plotted with each iteration. I added a delay there just to help keep it real-time, though it's only a temporary solution.
while True:
    x = distances * np.cos(theta)
    y = distances * np.sin(theta)
    plot.plot(x, y, pen=None, symbol='o', clear=True)
    pg.QtGui.QApplication.processEvents()
    #sleep(0.025)
While it's going the way I expect it to, it's not able to plot the most recent data from the serial input. It's easily several seconds behind from the most recent reads, probably because it can not plot 6400 points every 1/40 of a second. I'm wondering if there's a way to only update 1 point rather than having to re-plot the entire scatter every time in pyqtgraph.
It may be possible to plot point by point, but if so, is there a way to keep track of each individual point? No two points should share the same angle with different distances; a new reading at a given angle should simply overwrite the old one.
I'm also wondering if there are other graphing animation libraries out there that may be a possible solution worth considering.
This is what it looks like, if you're wondering:

Threading allows you to always have data available to plot, but the plot speed is bottlenecked by the paintEvent latency of each plot iteration. From my understanding, there is no way to update 1 point per paint event with setData instead of replotting the entire data set on each iteration. So if you have 6400 points, you must repaint all of them even if you are updating the data with just 1 additional point.
Potential workarounds include downsampling your data or plotting only once every X data points. Essentially, you are capped at the speed at which you can plot data to the screen, but you can alter your data set to display the most relevant information with fewer screen refreshes.
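Both workarounds can be sketched with a small buffer class. The name `ScanBuffer`, its parameters, and the assumption that the 6400 angles are evenly spaced over a full turn are all hypothetical and not taken from the question.

```python
import numpy as np

class ScanBuffer:
    """Hypothetical helper: stores the latest distance for each angle index
    and asks for a redraw only once every `redraw_every` updates."""

    def __init__(self, n_angles=6400, redraw_every=40):
        # Assumption: angles are evenly spaced over one full revolution.
        self.theta = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
        self.distances = np.zeros(n_angles)
        self.redraw_every = redraw_every
        self._pending = 0

    def update(self, index, distance):
        """Overwrite one reading; return True when a redraw is due."""
        self.distances[index] = distance
        self._pending += 1
        if self._pending >= self.redraw_every:
            self._pending = 0
            return True
        return False

    def xy(self, step=1):
        """Cartesian coordinates of every `step`-th point (downsampling)."""
        d = self.distances[::step]
        th = self.theta[::step]
        return d * np.cos(th), d * np.sin(th)

buf = ScanBuffer(n_angles=8, redraw_every=3)
due = [buf.update(i, 1.0) for i in range(6)]
x, y = buf.xy(step=2)
```

In the plotting loop, `plot.plot(x, y, pen=None, symbol='o', clear=True)` would then be called only when `update` returns True, with `x, y` taken from `buf.xy(step)`. Both `redraw_every` and `step` trade display freshness and detail against paint time.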

Controlling brightness in pcolormesh

I am attempting to plot the electron's probability density (in a hydrogen atom), using python and matplotlib's pcolormesh.
All is well, except that the distribution drops so rapidly that some details are not visible. For example, the surroundings of the zeroes of the radial function (in the higher energy states) are too faint, making it hard to notice that the wave function actually vanishes at some radii.
I know I can handle this with some rescaling and "adjustments" to the wave function, but I would rather tweak my plotting skills and realize how to do this with matplotlib.
I want to adjust the heat map so that more of the map would be bright.
Is there a way to control its sensitivity?
Thanks in advance.

You can use gamma correction to do that. I've used it in quite similar situations with very good results.
One way to do that:
gamma = 0.5 # exponents below 1 brighten the faint regions
normalized = original / original.max() # rescale to between 0 and 1
corrected = numpy.power(normalized, gamma) # try values between 0.5 and 2 as a starting point
plt.imshow(corrected)
This works because raising values in the interval between 0 and 1 to a given positive exponent yields monotonically increasing results that pass through (0, 0) and (1, 1). This is similar to moving the middle slider of the Photoshop/GIMP "levels" dialog.
EDIT: better yet, it seems that Matplotlib already has a class for that.
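The effect of the exponent can be checked on a few numbers without plotting anything. The helper name `gamma_correct` and the sample array are assumptions for illustration; the data is assumed to be non-negative with a positive maximum.

```python
import numpy as np

def gamma_correct(values, gamma=0.5):
    """Rescale to [0, 1], then apply a power law (gamma < 1 brightens)."""
    values = np.asarray(values, dtype=float)
    normalized = values / values.max()
    return normalized ** gamma

density = np.array([[1.0, 0.25],
                    [0.01, 0.0]])
brightened = gamma_correct(density, gamma=0.5)
print(brightened)  # 0.25 maps to 0.5 and 0.01 maps to 0.1
```

Matplotlib's `matplotlib.colors.PowerNorm(gamma=...)`, passed as the `norm=` argument of `pcolormesh`, applies the same kind of power-law mapping at the colormap stage without altering the data. Whether that is the class the answer had in mind is an assumption.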

What is the Pythonic way to plot large numbers of points with varying appearance?

I'm trying to plot large numbers of points that have varying appearance (shape, edge color, face color, etc.) and am finding that plotting the obvious way (using plot for each point) takes a very long time. I see various ways to improve performance, but find that these either reduce flexibility in point appearance, or end up being far more low level than seems correct to me.
For example, if I have
fig, ax = matplotlib.pyplot.subplots()
rands = numpy.random.random_sample((n,))
where n is some large number, then using plot to plot each point
for x in range(n):
    ax.plot(x, rands[x], 'o', color=str(rands[x]), mec=str(1-rands[x]))
takes a very long time and seems very inefficient. Much faster results can be achieved by plotting many points at once
ax.plot(range(n), rands, 'o', color='b', mec='r')
but with a loss of control over many features of the individual points (here, for example neither color nor mec can be a list, and many other aspects suffer the same limitation). Using convenience methods like scatter
ax.scatter(range(n), rands, marker='o', color=[str(y) for y in rands])
also produces fast results, but again at the loss of considerable flexibility (though points can be colored individually, plot's remaining options for setting features of individual points are not supported) and of some automatic axis limiting (use of set_xlim and set_ylim seems necessary to accomplish what plot does automatically).
Finally, I see many examples that use graphic elements like circles in conjunction with collections which, while "fast for common use cases", result in code that looks too "low level" to me
patches = []
colors = []
for x in range(n):
    circ = matplotlib.patches.Circle((x/float(n), rands[x]), .01)
    colors.append([rands[x],rands[x],rands[x]])
    patches.append(circ)
collection = matplotlib.collections.PatchCollection(patches)
collection.set_facecolor(colors)
collection.set_edgecolor([[1-h for h in c] for c in colors])
ax.add_collection(collection)
since it not only breaks the abstraction of plotting points, but also requires considerable scaling and adjustment to restore (even partially) the appearance provided automatically by plot (here for example matplotlib.pyplot.axis('equal') is necessary to avoid distorted "points").
This is frustrating because plot seems the natural method to use as it provides all the right customization of individual points, and results in figures that are nicely scaled and with axes that are naturally bounded — it's just too slow when used a point at a time, and doesn't accept lists as arguments for most properties.
What is the correct Pythonic way to plot large numbers of points where features of each point (marker, edge color, face color, alpha, size, etc.) must potentially be customized? Is using circles (or other shapes) and collections (followed by scaling and other tweaking of the figure) really the preferred (or at least necessary) approach?

I've done python plotting in two ways:
pychart: Hasn't been released since 2006, so I stopped using it.
gnuplot: Written in C, may be fast for your purposes, and is pretty flexible. This amounts to writing a gnuplot file and feeding it to the gnuplot binary.
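A rough sketch of the gnuplot route, assuming a gnuplot binary with a `png` terminal is installed: Python only builds the script text, with one packed RGB value per point read through `lc rgb variable`. The function name `gnuplot_script` and the output file name are made up for illustration, and only the colour varies per point here.

```python
def gnuplot_script(points, output="points.png"):
    """Build a gnuplot script for (x, y, (r, g, b)) points, r/g/b in 0..255."""
    lines = [
        "set terminal png",
        f"set output '{output}'",
        # Column 3 holds a 24-bit packed RGB value used as the point colour.
        "plot '-' using 1:2:3 with points pt 7 lc rgb variable notitle",
    ]
    for x, y, (r, g, b) in points:
        lines.append(f"{x} {y} {(r << 16) | (g << 8) | b}")
    lines.append("e")  # terminates the inline data block
    return "\n".join(lines) + "\n"

script = gnuplot_script([(0, 0.5, (255, 0, 0)), (1, 0.25, (0, 0, 255))])
print(script)
```

The script can then be fed to the binary, for example with `subprocess.run(["gnuplot"], input=script, text=True)`.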

Fitting bokeh-plot's ordinate in a line to each other

Which setting do I need to align the ordinate (y-axis) of the middle plot with those of the other two? Its wider y-axis scale labels push it out of line, sadly.
I am creating the graphs with:
plotting.gridplot(rows)
Where
rows.append(l)
with
l = line('x', 'y', source=datasource,
         x_range=x_range[0], ...)
x_range[0] = l.x_range
for multiple 'y' in the datasource.
The graphs' ranges are coupled via the x_range.

That's a bit hard to do at the moment, unfortunately. We are in the process of integrating cassowary.js for much better layout options of subplots, guides, annotations, etc. In the meantime, you can set the min_border (and min_border_left, min_border_top, etc.) on your plots. This makes the border area a minimum size even if it could be smaller. So if you set it large enough to accommodate any labels you expect to see, it should help make the plot sizes consistent.
