Move ax.text a bit away from data point - python

I am trying to create a type of map that plots a route between points.
As such I have something that looks like the image below:
And as you can see, the bll label is very close to the data point. I would like it to be a bit further away, so you can actually see the dot.
Also, the text is just a regular ax.text plot with x and y values.
My problem is that, yes, I could just shift the text by some percentage of the x-value. However, a percentage does not give the same shift everywhere: it depends on whether the point is far left, where x approaches 0, or far right, where x approaches the maximum value of the plot. You could argue that I could just add 10 or 20 units instead, but I produce different maps for different routes, so the x-axis range is not the same from map to map. Sometimes the map is big, and sometimes it's small, so using the same value in all maps would move the bll text either very much or very little depending on the map's size.
Also, if I ever zoom in on the map, a fixed offset in data units would be affected as well, since the distance between the data point and the text would grow with the zoom level, like in this figure:
Can this be solved in a somewhat simple manner?
Best regards
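One possible approach, sketched here under assumptions rather than taken from the original thread: Matplotlib's ax.annotate accepts textcoords="offset points", which places the text a fixed number of points (1/72 inch) away from the data point on screen. The gap then stays the same regardless of the axis range or the zoom level. The route data and the (6, 6) offset below are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical route: (label, x, y) in data coordinates.
route = [("A", 0.0, 1.0), ("bll", 250.0, 3.0), ("C", 900.0, 2.0)]

fig, ax = plt.subplots()
ax.plot([x for _, x, _ in route], [y for _, _, y in route], "-o")

labels = []
for name, x, y in route:
    ann = ax.annotate(
        name,
        xy=(x, y),                   # anchor: the data point itself
        xytext=(6, 6),               # shift right and up, measured in points
        textcoords="offset points",  # offset is in screen units, not data units
        ha="left",
        va="bottom",
    )
    labels.append(ann)

fig.savefig("route.png")
```

If the labels have to stay plain ax.text calls, matplotlib.transforms.offset_copy can build a shifted copy of ax.transData that is passed as the transform argument, which gives a similar fixed on-screen offset.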

Related

Splitting a graph

I have an array of 575 points.
When I plot it, I get the curve shown in the attached image.
I want to split it into sub-graphs wherever the slope becomes 0, in other words wherever the graph becomes parallel to the x-axis.
Thanks in advance.
I understand you want to have some degree of smoothness, otherwise you will have as a result many small separated regions of the graph.
You also may need to specifically define what you want to consider as parallel to the x-axis.
I suggest starting by moving a running window of a certain length over the data and categorizing each window as horizontal if it meets a certain condition.
This condition can be something like "all values are inside a certain range". It may take into account characteristics like the variance and the mean of the points inside the window. For example, "all values are between 99% and 101% of the mean."
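A minimal sketch of the running-window idea, assuming NumPy and a relative tolerance around the window mean. The function names flat_mask and split_runs, the window length of 15 and the 1% tolerance are illustrative choices, not part of the original answer.

```python
import numpy as np

def flat_mask(y, window=15, tol=0.01):
    """Mark samples covered by at least one window whose values all lie
    within tol (relative) of that window's mean."""
    y = np.asarray(y, dtype=float)
    mask = np.zeros(len(y), dtype=bool)
    for start in range(len(y) - window + 1):
        seg = y[start:start + window]
        mean = seg.mean()
        if np.all(np.abs(seg - mean) <= tol * abs(mean)):
            mask[start:start + window] = True
    return mask

def split_runs(mask):
    """Return (start, stop, is_flat) for each maximal run of equal mask values."""
    runs = []
    start = 0
    for i in range(1, len(mask) + 1):
        if i == len(mask) or mask[i] != mask[start]:
            runs.append((start, i, bool(mask[start])))
            start = i
    return runs

# Synthetic curve: ramp, plateau, ramp.
y = np.concatenate([np.linspace(1, 9, 50), np.full(40, 10.0), np.linspace(11, 20, 50)])
runs = split_runs(flat_mask(y))
```

Each tuple in runs gives the half-open index range of one sub-graph. A relative tolerance breaks down when the mean is close to zero, so an absolute tolerance or a variance threshold may suit such data better.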

How to efficiently store, check for inclusion and retrieve large amounts of float numbers in python?

Let me describe my problem. I am creating a simple graphing calculator. Every y coordinate is calculated by putting x into a function f(x), and then I graph the point (x, f(x)).
To make things simple for myself, whenever I want to shift the graph or zoom in, I just adjust the dimensions of the current view and then recalculate all the new points on the screen. For example, going from This to this by zooming in and shifting the screen means that every single point has been recalculated. To make the graph look like it is formed by actual lines instead of just points, I divide the width of the screen into about 1000 ~ 10000 points and plot them; if there are enough points, it just looks like lines. These points are stored as tuple pairs of floats.
As you can imagine, there is a lot of overlap and recalculation that may be slowing down the program. So I am wondering what the best way is to calculate an (x, f(x)) point, store it, and, any time I change the view of the graph, retrieve f(x) and skip the calculation if that x happens to be in view. There are going to be thousands and thousands of these points, so I figured that list operations like "i in lst" are not efficient enough.
I am trying to make my graph as fast as possible so any suggestions would be helpful! Thanks.
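One way this could be approached, sketched under assumptions that are not in the question: a dict gives average constant-time membership tests and lookups, unlike "i in lst" on a list, which scans every element. Because float x values from two different views rarely match exactly, this sketch snaps samples to a fixed grid and uses the integer grid index as the key. The class name GridCache and the fixed step are hypothetical.

```python
import math

class GridCache:
    """Cache f(x) on a fixed sample grid x = k * step, keyed by the integer k."""

    def __init__(self, func, step):
        self.func = func
        self.step = step
        self._values = {}       # k -> f(k * step)
        self.evaluations = 0    # counts real calls to func, for illustration

    def points(self, x_min, x_max):
        """Return (x, f(x)) for every grid sample inside [x_min, x_max]."""
        first = math.ceil(x_min / self.step)
        last = math.floor(x_max / self.step)
        pts = []
        for k in range(first, last + 1):
            if k not in self._values:   # dict lookup, O(1) on average
                self._values[k] = self.func(k * self.step)
                self.evaluations += 1
            pts.append((k * self.step, self._values[k]))
        return pts

cache = GridCache(lambda x: x * x, step=0.5)
first_view = cache.points(0, 2)    # computes 5 samples
second_view = cache.points(1, 3)   # reuses 3 of them, computes 2 new ones
```

A fixed step only helps when panning. Zooming changes the sampling density, so a real implementation would need something extra, for example one cache per zoom level.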

Creating a packed bubble / scatter plot in python (jitter based on size to avoid overlapping)

I have come across a number of plots (end of page) that are very similar to scatter / swarm plots, which jitter the y-axis in order to avoid overlapping dots / bubbles.
How can I get the y values (ideally in an array) based on a given set of x and z values (dot sizes)?
I found the python circlify library but it's not quite what I am looking for.
Example of what I am trying to create
EDIT: For this project I need to be able to output the x, y and z values so that they can be plotted in the user's tool of choice. Therefore I am more interested in solutions that generate the y-coords rather than the actual plot.
Answer:
What you describe in your text is known as a swarm plot (or beeswarm plot) and there are python implementations of these (esp see seaborn), but also, eg, in R. That is, these plots allow adjustment of the y-position of each data point so they don't overlap, but otherwise are closely packed.
Seaborn swarm plot:
Discussion:
But the plots that you show aren't standard swarm plots (which almost always have the weird looking "arms"), but instead seem to be driven by some type of physics engine which allows for motion along x as well as y, which produces the well packed structures you see in the plots (eg, like a water drop on a spider's web).
That is, in the plot above, by imagining moving points only along the vertical axis so that it packs better, you can see that, for the most part, you can't really do it. (Honestly, maybe the data shown could be packed a bit better, but not dramatically so -- eg, the first arm from the left couldn't be improved, and if any of them could, it's only by moving one or two points inward). Instead, to get the plot like you show, you'll need some motion in x, like would be given by some type of physics engine, which hopefully is holding x close to its original value, but also allows for some variation. But that's a trade-off that needs to be decided on a data level, not a programming level.
For example, here's a plotting library, RAWGraphs, which produces a compact beeswarm plot like the Politico graphs in the question:
But critically, they give the warning:
"It’s important to keep in mind that a Beeswarm plot uses forces to avoid collision between the single elements of the visual model. While this helps to see all the circles in the visualization, it also creates some cases where circles are not placed in the exact position they should be on the linear scale of the X Axis."
Or, similarly, in notes from this D3 package: "Other implementations use force layout, but the force layout simulation naturally tries to reach its equilibrium by pushing data points along both axes, which can be disruptive to the ordering of the data." And here's a nice demo based on D3 force layout where sliders adjust the relative forces pulling the points to their correct values.
Therefore, this plot is a compromise between a swarm plot and a violin plot (which shows a smoothed average for the distribution envelope). Both of those plots give an honest representation of the data, whereas the closely packed representation comes at the cost of misrepresenting the x-position of the individual data points. Its advantage seems to be that you can color and click on the individual points (where, if you wanted, you could give the actual x-data, although that's not done in the linked plots).
Seaborn violin plot:
Personally, I'm really hesitant to misrepresent the data in some unknown way (that's the outcome of a physics engine calculation but not obvious to the reader). Maybe a better compromise would be a violin filled with non-circular patches, or something like a Raincloud plot.
I created an Observable notebook to calculate the y values of a beeswarm plot with variable-sized circles. The image below gives an example of the results.
If you need to use the JavaScript code in a script, it should be straightforward to copy and paste the code for the AccurateBeeswarm class.
The algorithm simply places the points one by one, as close as possible to the y=0 line (the x-axis) while avoiding overlaps. There are also options to add a little randomness to improve the appearance. x values are never altered; this is the one big advantage of this approach over force-directed algorithms such as the one used by RAWGraphs.
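For readers who want the y-coordinates in Python, here is a rough sketch of the same greedy idea. It is not a port of the AccurateBeeswarm class; the function name beeswarm_y, the left-to-right placement order and the tolerance are assumptions made for illustration. Each circle keeps its x and takes the free position nearest the horizontal centre line, chosen from the centre line itself and the positions where it would just touch an already placed circle.

```python
import math

def beeswarm_y(xs, radii, rel_tol=1e-9):
    """Greedy beeswarm: return one y per circle so that no two circles overlap.

    x values are never changed. Circles are placed in order of increasing x.
    """
    placed = []                       # (x, y, r) of circles already positioned
    ys = [0.0] * len(xs)
    for i in sorted(range(len(xs)), key=lambda k: xs[k]):
        x, r = xs[i], radii[i]
        # Only circles whose x-extent overlaps this one can ever collide with it.
        near = [(px, py, pr) for px, py, pr in placed if abs(px - x) < pr + r]

        def is_free(y):
            return all(
                (px - x) ** 2 + (py - y) ** 2 >= (pr + r) ** 2 * (1 - rel_tol)
                for px, py, pr in near
            )

        # Candidates: the centre line, plus tangent positions above and below
        # every nearby circle.
        candidates = [0.0]
        for px, py, pr in near:
            dy = math.sqrt((pr + r) ** 2 - (px - x) ** 2)
            candidates += [py + dy, py - dy]

        y = min((c for c in candidates if is_free(c)), key=abs)
        ys[i] = y
        placed.append((x, y, r))
    return ys

ys = beeswarm_y([0, 0, 0], [1, 1, 1])
```

The topmost tangent position is always free, so a candidate always exists. The sketch does quadratic work per point at worst, which is fine for a few thousand points but would need a spatial index beyond that.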

Displaying a grid of letters with coordinates on axes

I'm making a simple word game simulation in Python and need a way to visualise a grid of coordinates. The input would be a simple 2D array with either '' or a character in each spot.
I need each spot in the grid to either be blank or have one letter. The x and y axes need to have arbitrary start points, such as -20. It seems like Matplotlib might do what I want, but having looked around on a bunch of Stackoverflow questions and Matplotlib help pages, I can't find what I need.
The question here partly has what I need:
Show the values in the grid using matplotlib
Except I want no colour, the values are single characters, and the axis labels need to allow arbitrary start points.
Does anyone know whether Matplotlib is the right library to do this sort of thing, or if I should try something else? Performance matters but it's not the most important thing. I don't need any interactivity with the display window, it's purely read only.
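Matplotlib can probably do this with plain ax.text calls and tick positions set by hand. The following is only a sketch: the grid contents and the origin (x0, y0) = (-20, -20) are made-up values, and row 0 is assumed to be drawn at the top.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; no display window is needed
import matplotlib.pyplot as plt

# Hypothetical input: '' for an empty cell, otherwise a single character.
grid = [
    ["", "C", "A", "T"],
    ["", "", "R", ""],
    ["", "", "T", ""],
]
x0, y0 = -20, -20  # assumed coordinates of the top-left cell

n_rows, n_cols = len(grid), len(grid[0])
fig, ax = plt.subplots()

letters = []
for r, row in enumerate(grid):
    for c, ch in enumerate(row):
        if ch:
            letters.append(ax.text(x0 + c, y0 + r, ch, ha="center", va="center"))

# Each cell is centred on an integer coordinate and spans +/- 0.5 around it.
ax.set_xlim(x0 - 0.5, x0 + n_cols - 0.5)
ax.set_ylim(y0 + n_rows - 0.5, y0 - 0.5)   # reversed so row 0 is at the top
ax.set_xticks(range(x0, x0 + n_cols))      # axis labels at the cell centres
ax.set_yticks(range(y0, y0 + n_rows))
# Minor ticks on the cell edges carry the grid lines.
ax.set_xticks([x0 + c - 0.5 for c in range(n_cols + 1)], minor=True)
ax.set_yticks([y0 + r - 0.5 for r in range(n_rows + 1)], minor=True)
ax.grid(which="minor")
ax.tick_params(which="minor", length=0)
ax.set_aspect("equal")

fig.savefig("letter_grid.png")
```

No colour map is involved, and changing x0 and y0 shifts the axis labels to any start point.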

Swarmplot with more than just one categorical level (Python)

I am trying to make a swarm plot that contains more information than a single categorical level and two variables. I am looking to create something like this
So ideally, something like this would work (but it does not):
ax = sns.swarmplot(x="round_id", y="independent_error_abs", hue="difficulty", hue_order=['easy','medium','hard'], size="followers", markershape="rank",data=df)
where "difficulty", "followers", and "rank" determine the color of the point, the size of the point, and the shape of the point, respectively.
No, this is not possible with swarmplot. Personally I find this kind of plot very difficult to interpret: a good statistical plot should make the patterns in the data immediately apparent, whereas plots with multiple categorical variables that manipulate the size or shape of the points quickly become more like puzzles. My recommendation in these cases (following Andrew Gelman) is to make more than one plot, each with relatively simple semantics.
You don't have to agree, of course, but you will have to make it yourself using matplotlib.
I am facing the same issue, and actually the solution seems to be pretty simple at least for the marker type!
Just divide your dataframe into sub-dataframes, one for each marker type. Then you make the swarmplots on top of each other, and that's it.
If the size of the dot is also a categorical variable, you just need to do the same as above, where each sub-dataframe represents a marker and a different size.
If size is continuous, then it seems you would need to plot each dot independently in a for loop, but for that I would use matplotlib.pyplot.
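A sketch of the sub-dataframe approach for marker shape, assuming seaborn and pandas are installed. The column names follow the question; the data values and the markers mapping are invented for illustration. Passing the same order to every call keeps the categories in the same positions across the overlaid plots.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data using the column names from the question.
df = pd.DataFrame({
    "round_id": ["r1", "r1", "r1", "r2", "r2", "r2", "r3", "r3", "r3"],
    "independent_error_abs": [0.10, 0.40, 0.30, 0.20, 0.50, 0.25, 0.35, 0.15, 0.45],
    "rank": ["low", "high", "low", "high", "low", "high", "low", "high", "low"],
})
markers = {"low": "o", "high": "^"}   # hypothetical rank -> marker mapping
order = sorted(df["round_id"].unique())

fig, ax = plt.subplots()
for rank, sub in df.groupby("rank"):
    # One swarmplot per marker type, all drawn on the same axes.
    sns.swarmplot(
        data=sub,
        x="round_id",
        y="independent_error_abs",
        order=order,
        marker=markers[rank],
        ax=ax,
    )

fig.savefig("swarm_markers.png")
```

Each call computes its swarm independently, so points from different sub-dataframes can still overlap one another.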
