One of my projects requires order analysis of vibration signals in Python instead of Matlab. The data should be visualized as a colormap, which usually has frequency on the horizontal axis and rotational speed on the vertical axis, just like this picture:
How can I do this?
Let's say I have a large DataFrame with lots of rows and I want to do a simple line plot of the data using hvplot/holoviews.
Example:
pd.DataFrame([i for i in range(1000000)]).hvplot()
On my machine this plot is slow to render and very slow to navigate with pan, zoom and so on. Is there an option to make the plot lighter to handle, similar to what the datashade option does for multidimensional plotting?
At the moment, sampling is not an option for my real data; I want to keep all of my original data.
Datashader is not limited to multidimensional data, so just add datashade=True or rasterize=True (which is preferred in most cases, since it lets Bokeh provide colorbars and hover information).
I have come across a number of plots (end of page) that are very similar to scatter / swarm plots which jitter the y-axis in order to avoid overlapping dots / bubbles.
How can I get the y values (ideally in an array) based on a given set of x and z values (dot sizes)?
I found the python circlify library but it's not quite what I am looking for.
Example of what I am trying to create
EDIT: For this project I need to be able to output the x, y and z values so that they can be plotted in the user's tool of choice. Therefore I am more interested in solutions that generate the y-coords rather than the actual plot.
Answer:
What you describe in your text is known as a swarm plot (or beeswarm plot) and there are python implementations of these (esp see seaborn), but also, eg, in R. That is, these plots allow adjustment of the y-position of each data point so they don't overlap, but otherwise are closely packed.
Seaborn swarm plot:
Discussion:
But the plots that you show aren't standard swarm plots (which almost always have the weird looking "arms"). Instead they seem to be driven by some type of physics engine which allows for motion along x as well as y, which produces the well packed structures you see in the plots (eg, like a water drop on a spider's web).
That is, in the plot above, by imagining moving points only along the vertical axis so that it packs better, you can see that, for the most part, you can't really do it. (Honestly, maybe the data shown could be packed a bit better, but not dramatically so -- eg, the first arm from the left couldn't be improved, and if any of them could, it's only by moving one or two points inward). Instead, to get the plot like you show, you'll need some motion in x, like would be given by some type of physics engine, which hopefully is holding x close to its original value, but also allows for some variation. But that's a trade-off that needs to be decided on a data level, not a programming level.
For example, here's a plotting library, RAWGraphs, which produces a compact beeswarm plot like the Politico graphs in the question:
But critically, they give the warning:
"It’s important to keep in mind that a Beeswarm plot uses forces to avoid collision between the single elements of the visual model. While this helps to see all the circles in the visualization, it also creates some cases where circles are not placed in the exact position they should be on the linear scale of the X Axis."
Or, similarly, in notes from this D3 package: "Other implementations use force layout, but the force layout simulation naturally tries to reach its equilibrium by pushing data points along both axes, which can be disruptive to the ordering of the data." And here's a nice demo based on D3 force layout where sliders adjust the relative forces pulling the points to their correct values.
Therefore, this plot is a compromise between a swarm plot and a violin plot (which shows a smoothed average for the distribution envelope). Both of those plots give an honest representation of the data, whereas the closely packed representation comes at the cost of misrepresenting the x-position of the individual data points. Its advantage seems to be that you can color and click on the individual points (where, if you wanted, you could give the actual x-data, although that's not done in the linked plots).
Seaborn violin plot:
Personally, I'm really hesitant to misrepresent the data in some unknown way (that's the outcome of a physics engine calculation but not obvious to the reader). Maybe a better compromise would be a violin filled with non-circular patches, or something like a Raincloud plot.
I created an Observable notebook to calculate the y values of a beeswarm plot with variable-sized circles. The image below gives an example of the results.
If you need to use the JavaScript code in a script, it should be straightforward to copy and paste the code for the AccurateBeeswarm class.
The algorithm simply places the points one by one, as close as possible to the y=0 line while avoiding overlaps. There are also options to add a little randomness to improve the appearance. x values are never altered; this is the one big advantage of this approach over force-directed algorithms such as the one used by RAWGraphs.
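For readers who want the y-coordinates in Python, here is a simplified sketch of the same greedy idea. It is not the notebook's AccurateBeeswarm code: the function name `beeswarm_y`, the left-to-right placement order and the tolerance `eps` are my own assumptions, and the dot sizes are taken to be radii in the same units as x.

```python
import math

def beeswarm_y(xs, radii, eps=1e-9):
    """Return one y per point so that no two circles overlap; x is left untouched."""
    placed = []                      # circles already positioned, as (x, y, r)
    ys = [0.0] * len(xs)
    for i in sorted(range(len(xs)), key=lambda k: xs[k]):   # place left to right
        x, r = xs[i], radii[i]
        # Candidate heights: the centre line, plus every height at which the
        # new circle would just touch an already placed circle.
        candidates = [0.0]
        for px, py, pr in placed:
            dx, reach = x - px, r + pr
            if abs(dx) < reach:
                dy = math.sqrt(reach * reach - dx * dx)
                candidates += [py + dy, py - dy]
        # Take the candidate closest to y = 0 that overlaps nothing.
        for y in sorted(candidates, key=abs):
            if all((x - px) ** 2 + (y - py) ** 2 >= (r + pr) ** 2 - eps
                   for px, py, pr in placed):
                ys[i] = y
                placed.append((x, y, r))
                break
    return ys

xs = [0.0, 0.5, 1.0, 1.2, 3.0]
radii = [1.0, 0.5, 0.8, 0.3, 1.0]   # dot sizes (the "z" values) as radii
ys = beeswarm_y(xs, radii)
```

The result is a plain list of y values in the original order, which can be exported together with x and the sizes and plotted in any tool. The sketch compares every new point against all placed points, so it is quadratic in the number of points and meant for illustration rather than large datasets.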
I have the recurrent issue of matplotlib bar graphs containing too many categorical values on the X axis. "Resize a figure automatically in matplotlib" and "Python matplotlib multiple bars" do not do the trick because my x values are not x. My idea is to split the graph into two graphs when it gets past a certain number of data points. I cannot find anything about this in the matplotlib documentation, nor anywhere else.
Is there a matplotlib tool to do that? Or would I need to write an algorithm that detects the length of the dataset?
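A minimal sketch of the splitting idea described above, assuming the split is done by hand: the helper name `plot_bars_in_chunks` and the threshold `max_per_plot` are hypothetical, not part of matplotlib.

```python
import math
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

def plot_bars_in_chunks(labels, values, max_per_plot=20):
    """Draw one bar chart per chunk of at most max_per_plot categories."""
    n_plots = max(1, math.ceil(len(labels) / max_per_plot))
    fig, axes = plt.subplots(n_plots, 1, figsize=(10, 3 * n_plots), squeeze=False)
    for k, ax in enumerate(axes[:, 0]):
        chunk = slice(k * max_per_plot, (k + 1) * max_per_plot)
        ax.bar(labels[chunk], values[chunk])
        ax.tick_params(axis="x", labelrotation=90)  # keep long labels readable
    fig.tight_layout()
    return fig

# 45 categories with a limit of 20 per chart gives three stacked charts.
labels = [f"cat_{i:02d}" for i in range(45)]
values = list(range(45))
fig = plot_bars_in_chunks(labels, values, max_per_plot=20)
```

The number of subplots follows from the length of the dataset, so the same call works whether the data fits in one chart or needs several.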
I have a 3D scatter that I want to plot using Plotly in Python. The problem is that the dataframe is too large, so I want to use WebGL to plot the graph. As far as I know, Plotly has the go.Scatter3d function to plot 3D scatters, and there is also go.Scattergl to plot large datasets. However, I can't find something like go.Scatter3Dgl. What should I do?
I believe 3D scatter plots use WebGL by default. If you inspect a scatter_3d you'll find that it is in a class="gl-container". Likewise, a regular Scatter is in a class="main-svg" and a Scattergl is in a class="gl-container".
From plotly:
Note: It is important to note that any figures containing WebGL traces
(i.e. of type scattergl, heatmapgl, contourgl, scatter3d, surface,
mesh3d, scatterpolargl, cone, streamtube, splom, or parcoords) that
are exported in a vector format will include encapsulated rasters,
instead of vectors, for some parts of the image.