I love Matplotlib, but sometimes the lack of 'idiot's guide' examples is infuriating.
Long story short, I have several large lists of XYZ positional data from simulated motion through 3D space for multiple entities. I currently plot this statically, i.e.
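from operator import itemgetter

for entity in entities:
    # Pull the x, y and z columns out of this entity's position log
    x = list(map(itemgetter(0), positionLog(entity)))
    y = list(map(itemgetter(1), positionLog(entity)))
    z = list(map(itemgetter(2), positionLog(entity)))
    ax.plot(x, y, z, label=nameLookup(entity))
plt.show()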
What I'd like to do is have these lists 'step' out, i.e. show where all the entities are at t(0), then add in the t(1) points, and so on.
However, it's not clear in any of the examples I've found how to accomplish this. The examples that I see show how to do individual runs, i.e. for one entity, but I can't see how to do all (N) in lock-step.
Suggestions please? :D
One way to do what I think you want is to build x, y, and z lists. Add the t(0) points to the plot and show the plot. Next, append t(1) to your original x, y, z lists, update the plot with the new x, y, z coordinates, and then refresh the plot (which is the old way of doing animations in Matplotlib).
This example: http://matplotlib.sourceforge.net/examples/animation/basic_example.html
uses the built-in animation function to generate an animation the new way, which I think is exactly what you want; just add your third coordinate.
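For what it's worth, here is a minimal sketch of that idea for N entities stepping together, using FuncAnimation on 3D axes. It assumes each entity's log is already available as a list of (x, y, z) tuples. The `tracks` dict and the function names `draw_up_to` and `animate_tracks` are made up for illustration and are not from the question.

```python
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation


def draw_up_to(lines, tracks, frame):
    """Show every entity's path from t(0) up to and including t(frame)."""
    for line, positions in zip(lines, tracks.values()):
        xs, ys, zs = zip(*positions[: frame + 1])
        line.set_data(xs, ys)        # x and y of the 3D line
        line.set_3d_properties(zs)   # z is set separately on 3D lines
    return lines


def animate_tracks(tracks, interval_ms=50):
    """tracks: hypothetical dict mapping a label to a list of (x, y, z) tuples."""
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    # One empty line per entity; the animation fills them in lock-step.
    lines = [ax.plot([], [], [], label=name)[0] for name in tracks]
    # Fix the axis limits up front, because updating line data
    # does not rescale the view.
    points = [p for positions in tracks.values() for p in positions]
    xs, ys, zs = zip(*points)
    ax.set_xlim(min(xs), max(xs))
    ax.set_ylim(min(ys), max(ys))
    ax.set_zlim(min(zs), max(zs))
    ax.legend()
    # Stop at the shortest log so every entity has a point for each frame.
    n_frames = min(len(positions) for positions in tracks.values())
    anim = FuncAnimation(fig, lambda f: draw_up_to(lines, tracks, f),
                         frames=n_frames, interval=interval_ms, blit=False)
    return fig, lines, anim
```

Something like `fig, lines, anim = animate_tracks({nameLookup(e): positionLog(e) for e in entities})` followed by `plt.show()` should then step all entities together. Keep a reference to `anim` around, otherwise the animation can be garbage-collected before it runs.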
Related
Let me describe my problem. I am creating a simple graphing calculator. Each y coordinate is calculated by passing x to a function f(x), and I then plot the point (x, f(x)).
To keep things simple, whenever I want to shift the graph or zoom in, I just adjust the dimensions of the current view and recalculate all the points on the screen. For example, going from this to this by zooming in and shifting the screen means that every single point is recalculated. To make the graph look like it is formed by actual lines instead of just points, I divide the width of the screen into about 1,000 to 10,000 points and plot them; with enough points it just looks like lines. The points are stored as tuples of two floats.
As you can imagine, there is a lot of overlap, and the recalculation may be slowing down the program. I am wondering what the best way is to calculate an (x, f(x)) point, store it, and then, whenever I change the view of the graph and that x happens to be in view, retrieve f(x) and skip the calculation. There are going to be thousands and thousands of these points, so I figure list operations like "i in lst" are not efficient enough.
I am trying to make my graph as fast as possible so any suggestions would be helpful! Thanks.
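One possible shape for the store-and-retrieve idea described above, as a rough sketch only: sample x on a fixed grid and keep the results in a dict keyed by grid index, so that membership checks are average O(1) instead of a list scan. The class name `SampledFunction`, the `step` parameter, and the `evaluations` counter are all hypothetical.

```python
import math


class SampledFunction:
    """Caches f(x) on a fixed grid so that panning reuses earlier evaluations."""

    def __init__(self, f, step):
        self.f = f
        self.step = step        # grid spacing in x units (an assumption)
        self.cache = {}         # grid index -> f(x)
        self.evaluations = 0    # counts how often f was actually called

    def points(self, x_min, x_max):
        """Return (x, f(x)) for every grid point inside [x_min, x_max]."""
        first = math.ceil(x_min / self.step)
        last = math.floor(x_max / self.step)
        pts = []
        for i in range(first, last + 1):
            if i not in self.cache:          # dict lookup, not a list scan
                self.cache[i] = self.f(i * self.step)
                self.evaluations += 1
            pts.append((i * self.step, self.cache[i]))
        return pts
```

Panning from [0, 2] to [1, 3] with a step of 0.5 would then only evaluate the two new grid points. Zooming changes the step, so in this sketch a separate cache per zoom level would be needed.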
I have come across a number of plots (end of page) that are very similar to scatter / swarm plots, which jitter the y-axis in order to avoid overlapping dots / bubbles.
How can I get the y values (ideally in an array) based on a given set of x and z values (dot sizes)?
I found the Python circlify library, but it's not quite what I am looking for.
Example of what I am trying to create
EDIT: For this project I need to be able to output the x, y and z values so that they can be plotted in the user's tool of choice. Therefore I am more interested in solutions that generate the y-coords rather than the actual plot.
Answer:
What you describe in your text is known as a swarm plot (or beeswarm plot). There are Python implementations of these (see seaborn especially), and also implementations in other languages, e.g. R. These plots adjust the y-position of each data point so that the points don't overlap but are otherwise closely packed.
Seaborn swarm plot:
Discussion:
But the plots that you show aren't standard swarm plots (which almost always have the weird-looking "arms"). Instead they seem to be driven by some type of physics engine that allows for motion along x as well as y, which produces the well-packed structures you see in the plots (e.g. like a water drop on a spider's web).
That is, if you imagine moving the points in the plot above only along the vertical axis so that they pack better, you can see that, for the most part, you can't really do it. (Honestly, maybe the data shown could be packed a bit better, but not dramatically so: the first arm from the left couldn't be improved, and if any of the others could, it's only by moving one or two points inward.) Instead, to get a plot like the one you show, you'll need some motion in x, such as a physics engine would give, which hopefully holds x close to its original value but also allows for some variation. But that's a trade-off that needs to be decided on a data level, not a programming level.
For example, here's a plotting library, RAWGraphs, which produces a compact beeswarm plot like the Politico graphs in the question:
But critically, they give the warning:
"It’s important to keep in mind that a Beeswarm plot uses forces to avoid collision between the single elements of the visual model. While this helps to see all the circles in the visualization, it also creates some cases where circles are not placed in the exact position they should be on the linear scale of the X Axis."
Or, similarly, in the notes from this D3 package: "Other implementations use force layout, but the force layout simulation naturally tries to reach its equilibrium by pushing data points along both axes, which can be disruptive to the ordering of the data." And here's a nice demo based on D3 force layout where sliders adjust the relative forces pulling the points to their correct values.
Therefore, this plot is a compromise between a swarm plot and a violin plot (which shows a smoothed average for the distribution envelope). Both of those plots give an honest representation of the data, whereas the closely packed representation comes at the cost of misrepresenting the x-position of the individual data points. Its advantage seems to be that you can color and click on the individual points (where, if you wanted, you could give the actual x-data, although that's not done in the linked plots).
Seaborn violin plot:
Personally, I'm really hesitant to misrepresent the data in some unknown way (that's the outcome of a physics engine calculation but not obvious to the reader). Maybe a better compromise would be a violin filled with non-circular patches, or something like a Raincloud plot.
I created an Observable notebook to calculate the y values of a beeswarm plot with variable-sized circles. The image below gives an example of the results.
If you need to use the JavaScript code in a script, it should be straightforward to copy and paste the code for the AccurateBeeswarm class.
The algorithm simply places the points one by one, as close as possible to the y=0 line (the x-axis) while avoiding overlaps. There are also options to add a little randomness to improve the appearance. The x values are never altered; this is the one big advantage of this approach over force-directed algorithms such as the one used by RAWGraphs.
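To give a feel for that kind of placement in Python, here is a small greedy sketch. It is not the notebook's AccurateBeeswarm code. It assumes circles are placed in the order given and that "as close as possible to the line" means the smallest |y|; the function name `beeswarm_y` and the `eps` tolerance are made up for illustration.

```python
import math


def beeswarm_y(xs, radii, eps=1e-9):
    """Greedy sketch: give each circle the y nearest 0 that overlaps no earlier circle."""
    placed = []  # (x, y, r) of the circles positioned so far
    ys = []
    for x, r in zip(xs, radii):
        # Candidate positions: the axis itself, plus the two positions that
        # just touch each already-placed circle within horizontal reach.
        candidates = [0.0]
        for px, py, pr in placed:
            dx = x - px
            reach = r + pr
            if abs(dx) < reach:
                dy = math.sqrt(reach * reach - dx * dx)
                candidates.extend((py + dy, py - dy))

        def fits(y):
            # True if a circle at (x, y) overlaps none of the placed circles.
            return all((x - px) ** 2 + (y - py) ** 2 >= (r + pr) ** 2 - eps
                       for px, py, pr in placed)

        # The outermost candidate always fits, so this never runs empty.
        y = min((c for c in candidates if fits(c)), key=abs)
        placed.append((x, y, r))
        ys.append(y)
    return ys
```

The x values are passed through untouched and only a list of y values comes back, so the result can be plotted in any tool. This version is quadratic in the number of points, which is fine for a few thousand circles but not much more.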
I'm making a simple word game simulation in Python and need a way to visualise a grid of coordinates. The input would be a simple 2D array with either '' or a character in each spot.
I need each spot in the grid to either be blank or have one letter. The x and y axes need to have arbitrary start points, such as -20. It seems like Matplotlib might do what I want, but having looked around a bunch of Stack Overflow questions and Matplotlib help pages, I can't find what I need.
The question here partly has what I need:
Show the values in the grid using matplotlib
Except I want no colour, the values are single characters, and the axis labels need to allow arbitrary start points.
Does anyone know whether Matplotlib is the right library for this sort of thing, or should I try something else? Performance matters, but it's not the most important thing. I don't need any interactivity with the display window; it's purely read-only.
In short, I'm trying to find a faster way to plot real-time data coming in through a serial input. Each data point is a coordinate (x, y), and about 40 arrive each second. The stream stores the data in an array, using x as the index and y as the value. This part runs in a separate thread. While the stream can read in data immediately, the pyqtgraph library isn't able to keep up with this speed.
Here's the portion of the code where I am plotting the data. The distances and theta variables are arrays with 6400 indexes. They have been transformed into polar values and plotted with each iteration. I added a delay there just to help keep it real-time, though it's only a temporary solution.
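import numpy as np
import pyqtgraph as pg

while True:
    # Convert the polar readings (distances, theta) to Cartesian coordinates
    x = distances * np.cos(theta)
    y = distances * np.sin(theta)
    # Redraw the whole scatter, clearing the previous frame
    plot.plot(x, y, pen=None, symbol='o', clear=True)
    # Let Qt process pending events so the window repaints
    pg.QtGui.QApplication.processEvents()
    #sleep(0.025)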
While it's going the way I expect it to, it's not able to plot the most recent data from the serial input. It's easily several seconds behind the most recent reads, probably because it cannot plot 6400 points every 1/40 of a second. I'm wondering if there's a way in pyqtgraph to update only 1 point rather than re-plotting the entire scatter every time.
It may be possible to plot point by point, but if so, is there a way to keep track of each individual point? No two points should share the same angle with different distances; a new reading at an existing angle should simply overwrite the old one.
I'm also wondering if there are other graphing animation libraries out there that may be a possible solution worth considering.
This is what it looks like, if you're wondering:
Threading allows you to always have data available to plot, but the plot speed is bottlenecked by the paintEvent latency of each plot iteration. From my understanding, there is no way to update just 1 point per paint event with setData; you have to replot the entire data set on each iteration. So if you have 6400 points, you must repaint all of them even if you are only updating the data with 1 additional point.
Potential workarounds include downsampling your data or only plotting once every X data points. Essentially, you are capped at the speed at which you can plot data to the screen, but you can alter your data set to display the most relevant information with fewer screen refreshes.
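A rough sketch of both workarounds combined, kept independent of the GUI so it can be tried on its own: it redraws only once every `every` new samples, and only on every `keep_every`-th point. The class name `ThrottledPlotter`, its parameters, and the evenly spaced `theta` array are assumptions, not taken from the question. In pyqtgraph, the `draw` callback would presumably be the `setData` method of a curve created once up front.

```python
import numpy as np


class ThrottledPlotter:
    """Calls draw(x, y) once every `every` new samples, on downsampled data."""

    def __init__(self, draw, n_points=6400, every=40, keep_every=4):
        self.draw = draw                    # e.g. a curve's setData method
        # Assumed: one evenly spaced angle per array index.
        self.theta = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
        self.distances = np.zeros(n_points)
        self.every = every                  # redraw after this many samples
        self.keep_every = keep_every        # plot only every k-th point
        self.pending = 0

    def add_sample(self, index, distance):
        # Overwrite in place, so each angle only ever holds one distance.
        self.distances[index] = distance
        self.pending += 1
        if self.pending >= self.every:
            self.pending = 0
            self.redraw()

    def redraw(self):
        d = self.distances[:: self.keep_every]
        t = self.theta[:: self.keep_every]
        # Polar to Cartesian, as in the question's loop.
        self.draw(d * np.cos(t), d * np.sin(t))
```

With about 40 samples per second and `every=40`, this would redraw roughly once per second with 1600 points instead of 6400; both numbers are knobs to tune against how stale the display is allowed to get.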
I have a table of three-dimensional data. It's like a grid, just points (x, y, z).
I want to input two of the values and get back the third, e.g. input x and y and return z. But the x and y I input may not lie exactly on the grid points, so z needs to be calculated by linear interpolation or some other method.
In addition, I want to plot the data as a surface.
I googled and found numpy.meshgrid(), but I am confused about how to use it. Could you recommend a package or function that can do this task?
Thank you so much!