I need to plot a function whose analytic form I know, for example y=exp(-x^2/2).
If I plot this function over the interval [-5,5] with 11 points, I get the blue curve below. Notice how the points are connected by straight line segments (linear interpolation), which misrepresents the function. If I use more points, I get a more accurate plot, like the yellow curve below. On the other hand, if I use a cubic spline, the curve looks more realistic even at a low sampling rate (green curve).
This is not a problem in most cases, but if I want vector output such as SVG or PDF, I'd like to keep the number of points as low as possible to minimize the output size. So, I'd like to know whether matplotlib or any other Python-based plotting library has an option to either connect the dots using splines, or specify the slope and curvature at each point (similar to Photoshop or MS PowerPoint).
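One possible approach, sketched here under the assumption that the derivative is available analytically: each interval between two samples can be drawn as a single cubic Bezier segment whose control points encode the slopes at its ends (the standard Hermite-to-Bezier conversion). matplotlib can draw such segments through `Path` with `CURVE4` codes, so the vector file stores roughly three vertices per interval instead of a dense polyline. The helper names `hermite_to_bezier` and `draw_bezier` are made up for this illustration.

```python
import math

def hermite_to_bezier(xs, ys, slopes):
    """Return one tuple of four (x, y) Bezier control points per interval.

    Each segment reproduces the cubic Hermite interpolant defined by the
    sample values `ys` and the derivatives `slopes` at the sample points.
    """
    segments = []
    for i in range(len(xs) - 1):
        h = xs[i + 1] - xs[i]
        p0 = (xs[i], ys[i])
        p1 = (xs[i] + h / 3, ys[i] + slopes[i] * h / 3)
        p2 = (xs[i + 1] - h / 3, ys[i + 1] - slopes[i + 1] * h / 3)
        p3 = (xs[i + 1], ys[i + 1])
        segments.append((p0, p1, p2, p3))
    return segments

def draw_bezier(ax, segments, **kwargs):
    """Add the segments to a matplotlib Axes as a single unfilled path."""
    # Imported here so the conversion above works without matplotlib.
    from matplotlib.path import Path
    from matplotlib.patches import PathPatch
    verts = [segments[0][0]]
    codes = [Path.MOVETO]
    for _, p1, p2, p3 in segments:
        verts += [p1, p2, p3]
        codes += [Path.CURVE4] * 3
    patch = PathPatch(Path(verts, codes), fill=False, **kwargs)
    ax.add_patch(patch)
    return patch

# y = exp(-x^2/2) sampled at 11 points; dy/dx = -x * y is known analytically.
xs = [float(x) for x in range(-5, 6)]
ys = [math.exp(-x * x / 2) for x in xs]
slopes = [-x * y for x, y in zip(xs, ys)]
segments = hermite_to_bezier(xs, ys, slopes)
```

To render it, something like `fig, ax = plt.subplots(); draw_bezier(ax, segments); ax.set_xlim(-5, 5); ax.set_ylim(0, 1.1)` should work; the limits are set by hand because patches do not always trigger autoscaling. If no analytic derivative is available, the slopes would have to be estimated, for example from a fitted spline.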
I have come across a number of plots (end of page) that are very similar to scatter / swarm plots, which jitter the y-axis in order to avoid overlapping dots / bubbles.
How can I get the y values (ideally in an array) based on a given set of x and z values (dot sizes)?
I found the Python circlify library, but it's not quite what I am looking for.
Example of what I am trying to create
EDIT: For this project I need to be able to output the x, y and z values so that they can be plotted in the user's tool of choice. Therefore I am more interested in solutions that generate the y-coords rather than the actual plot.
Answer:
What you describe in your text is known as a swarm plot (or beeswarm plot), and there are Python implementations of these (see seaborn in particular), as well as implementations in other languages such as R. These plots adjust the y-position of each data point so that the points don't overlap but are otherwise closely packed.
Seaborn swarm plot:
Discussion:
But the plots that you show aren't standard swarm plots (which almost always have the weird-looking "arms"). Instead, they seem to be driven by some type of physics engine that allows motion along x as well as y, which produces the well-packed structures you see in the plots (e.g., like a water drop on a spider's web).
That is, in the plot above, by imagining moving points only along the vertical axis so that it packs better, you can see that, for the most part, you can't really do it. (Honestly, maybe the data shown could be packed a bit better, but not dramatically so -- eg, the first arm from the left couldn't be improved, and if any of them could, it's only by moving one or two points inward). Instead, to get the plot like you show, you'll need some motion in x, like would be given by some type of physics engine, which hopefully is holding x close to its original value, but also allows for some variation. But that's a trade-off that needs to be decided on a data level, not a programming level.
For example, here's a plotting library, RAWGraphs, which produces a compact beeswarm plot like the Politico graphs in the question:
But critically, they give the warning:
"It’s important to keep in mind that a Beeswarm plot uses forces to avoid collision between the single elements of the visual model. While this helps to see all the circles in the visualization, it also creates some cases where circles are not placed in the exact position they should be on the linear scale of the X Axis."
Or, similarly, in notes from this D3 package: "Other implementations use force layout, but the force layout simulation naturally tries to reach its equilibrium by pushing data points along both axes, which can be disruptive to the ordering of the data." And here's a nice demo based on D3 force layout where sliders adjust the relative forces pulling the points to their correct values.
This plot is therefore a compromise between a swarm plot and a violin plot (which shows a smoothed average for the distribution envelope). Both of those plots give an honest representation of the data, whereas the closely packed representation comes at the cost of misrepresenting the x-position of the individual data points. Its advantage seems to be that you can color and click on the individual points (where, if you wanted, you could show the actual x-data, although that's not done in the linked plots).
Seaborn violin plot:
Personally, I'm really hesitant to misrepresent the data in some unknown way (that's the outcome of a physics engine calculation but not obvious to the reader). Maybe a better compromise would be a violin filled with non-circular patches, or something like a Raincloud plot.
I created an Observable notebook to calculate the y values of a beeswarm plot with variable-sized circles. The image below gives an example of the results.
If you need to use the JavaScript code in a script, it should be straightforward to copy and paste the code for the AccurateBeeswarm class.
The algorithm simply places the points one by one, as close as possible to the central axis (the y=0 line) while avoiding overlaps. There are also options to add a little randomness to improve the appearance. The x values are never altered; this is the one big advantage of this approach over force-directed algorithms such as the one used by RAWGraphs.
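For readers who want the y-coordinates in Python, the greedy placement described above can be sketched roughly as follows. This is not the code of the AccurateBeeswarm class, only a simplified illustration of the same idea: it assumes the baseline is y=0, that `xs` and `radii` are in the same units, and it omits the randomness options. The function name `beeswarm_y` is made up here, and the search is brute force, so it is only suitable for modest numbers of points.

```python
import math

def beeswarm_y(xs, radii):
    """Return a y value for every circle; the x values are never changed."""
    placed = []                # (x, y, r) of circles that already have a position
    ys = [0.0] * len(xs)
    eps = 1e-9                 # tolerance so touching circles do not count as overlapping
    for i in sorted(range(len(xs)), key=lambda k: xs[k]):
        x, r = xs[i], radii[i]
        # Candidate positions: the baseline, plus every position in which the
        # new circle would exactly touch an already placed neighbour.
        candidates = [0.0]
        for px, py, pr in placed:
            dx, reach = x - px, r + pr
            if abs(dx) < reach:
                dy = math.sqrt(reach * reach - dx * dx)
                candidates += [py + dy, py - dy]

        def is_free(y):
            return all((x - px) ** 2 + (y - py) ** 2 >= (r + pr) ** 2 - eps
                       for px, py, pr in placed)

        # Keep the non-overlapping candidate closest to the baseline.
        ys[i] = min((y for y in candidates if is_free(y)), key=abs)
        placed.append((x, ys[i], r))
    return ys

print(beeswarm_y([0, 0, 0], [1, 1, 1]))  # [0.0, 2.0, -2.0]
```

The returned list is in the same order as the input, so the x, y and size values can be exported together and plotted in any tool. Processing the points in order of x is an arbitrary choice in this sketch; other orders give different but equally valid packings.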
I created a graph in MATLAB (see figure below) such that around every data point there is a data distribution plotted (grey area plots). The way I did it in MATLAB was to create a set of axes for every distribution curve and then plot the curves without showing those axes at every point of the data curve. I also used a command 'linkaxes' to set figure limits for all the curves at once.
I must say that this is far from an elegant solution, and I had a lot of trouble saving this figure with the correct aspect ratio. All in all, I couldn't find any other useful option in MATLAB.
Is there a more elegant solution for this type of graph in Python? I am not so much interested in how to shade the highlighted areas as in how to place a set of curves (distributions) exactly at the positions of the points of the main data curve.
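As a rough sketch of one way this might look in matplotlib: instead of creating a set of axes per distribution, each density curve can be drawn sideways in the same axes, shifted so that its baseline sits at the x position of its data point. The example assumes Gaussian distributions described by a mean and a standard deviation per point, which the question does not state; the names `distribution_outline` and `draw_distributions` and the `width` scale factor are made up for the illustration.

```python
import math

def distribution_outline(x0, mean, sigma, width, n=41, span=3.0):
    """Return (xs, ys) of a Gaussian density drawn sideways and anchored at x0.

    ys covers mean +/- span*sigma; xs is x0 plus the density, rescaled so
    that its peak lies `width` data units to the right of x0.
    """
    ys = [mean + sigma * span * (2 * k / (n - 1) - 1) for k in range(n)]
    xs = [x0 + width * math.exp(-0.5 * ((y - mean) / sigma) ** 2) for y in ys]
    return xs, ys

def draw_distributions(ax, x_data, y_data, sigmas, width):
    """Plot the main curve and one shaded distribution at each data point."""
    ax.plot(x_data, y_data, "o-")
    for x0, mean, sigma in zip(x_data, y_data, sigmas):
        xs, ys = distribution_outline(x0, mean, sigma, width)
        # Shade between the vertical line x = x0 and the sideways density.
        ax.fill_betweenx(ys, x0, xs, color="0.7", alpha=0.6)
```

Because everything lives in one axes, the axis limits and aspect ratio are set once for the whole figure, for example with `fig, ax = plt.subplots()` followed by `draw_distributions(ax, [1, 2, 3], [5, 6, 4], [1.0, 0.5, 0.8], width=0.4)`.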
Thank you!
Plotly currently supports Catmull-Rom splines for interpolation of the lines between markers on a Scatter plot.
I have graphs where the data is fundamentally a normal distribution. Cubic or Hermite interpolation works very well for this type of data in other graphing frameworks. Unfortunately, the Catmull-Rom splines (or at least Plotly's implementation of them) really don't.
I've experimented with values of "smoothing" between 0.0 and 1.0 (it seems, though this is not documented, that values over 1.0 make no further difference). Unfortunately, they all look bad.
I've seen a suggestion elsewhere that it might make sense to do my own interpolation using scipy's interpolate.interp2d, and graph that line separately. However, this fails for my use case, since I want the color of the line to be paired with the color of the markers, and for both to appear on the legend as a single item, as shown above.
Has anyone had any experience making the Plotly splines look nicer than they do on a quasi-normal distribution using smoothing=1.0?
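One possible workaround for the legend concern, sketched under the assumption that the interpolated curve is computed separately (for example with a SciPy interpolator) and passed in as `x_fine`/`y_fine`: Plotly traces have `legendgroup` and `showlegend` attributes, so the marker trace and the line trace can share a colour and a single legend entry. The helper name `paired_traces` is made up for this illustration, and the traces are built as plain dicts so the function itself does not need Plotly installed.

```python
def paired_traces(name, color, x, y, x_fine, y_fine):
    """Build a line trace and a marker trace that share one legend entry."""
    line = dict(
        type="scatter", mode="lines", name=name,
        x=list(x_fine), y=list(y_fine),
        line=dict(color=color),
        legendgroup=name,      # toggled together with the markers
        showlegend=False,      # only the marker trace appears in the legend
        hoverinfo="skip",      # hover only on the real data points
    )
    markers = dict(
        type="scatter", mode="markers", name=name,
        x=list(x), y=list(y),
        marker=dict(color=color),
        legendgroup=name,
    )
    return [line, markers]

# Hypothetical data: three measured points and a finer interpolated curve.
traces = paired_traces("run A", "crimson",
                       x=[0, 1, 2], y=[0.1, 1.0, 0.1],
                       x_fine=[0, 0.5, 1, 1.5, 2], y_fine=[0.1, 0.7, 1.0, 0.7, 0.1])
```

The list can be handed to `plotly.graph_objects.Figure(data=traces)`. One limitation of this sketch: the legend entry shows only the marker symbol rather than a combined line and marker, although clicking it hides or shows both traces.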
In summary, I'd like to plot a model I made using sympy.
I'm currently trying to animate the evolution of a bicycle model along a curve in 2-space.
So far I have three matplotlib patches that represent the body and two tires of the bicycle, as well as the dynamic equations created with the sympy mechanics module.
I'm currently able to numerically integrate the equations generated with KanesMethod; however, plotting these objects is a different challenge.
This is what I've come up with so far, and I feel like there has to be a better way to go about all this.
position_vector.subs(...).to_matrix(inertial_reference_frame)
Then I'd have to plot each of the two coordinates over time.
I don't even know where to start with plotting reference frames. I guess I'd somehow turn the reference frame into a set of 3 unit vectors, substitute numerical values, and plot them using matplotlib's quiver command.
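A minimal sketch of that idea, assuming a planar model with hypothetical generalized coordinates `x`, `y` and `heading` (the actual bicycle model is not shown in the question): rather than calling `subs` at every time step, the position vector and the unit vectors of a body frame can be expressed in the inertial frame once with `to_matrix` and then turned into numeric functions with `sympy.lambdify`.

```python
import math
import sympy as sp
from sympy.physics.mechanics import ReferenceFrame, Point, dynamicsymbols

# Hypothetical coordinates and frames standing in for the real model.
x, y, heading = dynamicsymbols("x y heading")
N = ReferenceFrame("N")                        # inertial frame
B = N.orientnew("B", "Axis", (heading, N.z))   # body frame, rotated about N.z
origin = Point("O")
body_point = origin.locatenew("P", x * N.x + y * N.y)

# Express the position and the unit vectors of B in N as lists of expressions.
position_in_N = list(body_point.pos_from(origin).to_matrix(N))
axes_in_N = [list(unit.to_matrix(N)) for unit in (B.x, B.y, B.z)]

# Compile once; the resulting functions take plain numbers.
coords = (x, y, heading)
position_func = sp.lambdify(coords, position_in_N, modules="math")
axes_func = sp.lambdify(coords, axes_in_N, modules="math")

# Evaluate for one state, e.g. a single row of the integrated trajectory.
px, py, pz = position_func(1.0, 2.0, math.pi / 2)
bx, by, bz = axes_func(1.0, 2.0, math.pi / 2)
```

For each time step, `ax.plot` can then trace `(px, py)` and `ax.quiver(px, py, bx[0], bx[1])` can draw the body's x axis at that position; the same numbers could also be used to update the transforms of the matplotlib patches in an animation.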
Any insight would be greatly appreciated!