I have a set of PDFs (probability density functions) that I need to plot over a certain section of their domain. However, when I plot the lines on a 3D plot I get tails for each PDF.
Is there a clean way to avoid plotting the tails that fall outside my plot limits? I know I can change the data to NaNs to achieve the same effect, but I would like to do this in matplotlib itself. Here is my current workaround code:
# trim the data by masking values outside the y-limits with NaN
import numpy as np
import matplotlib.pyplot as plt

y = np.ones(PDF_x.shape) * PDF_x
y[y > 95] = np.nan
y[y < 75] = np.nan

# plot the data
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection='3d') was removed in matplotlib 3.6
for i in range(PDF_capacity.shape[1]):
    ax.plot(life[i] * np.ones(PDF_x.shape), y, PDF_capacity[:, i],
            label='parametric curve')

# set the axis limits
ax.set_ylim(75, 95)

# add axis labels
ax.set_xlabel('charge cycles to failure point of 75% capacity')
ax.set_ylabel('capacity at 100 charge cycles')
ax.set_zlabel('probability')
After trimming, I can make the following plot:
Masking the data with NaN in the way you're doing it is a good and practical solution.
Since matplotlib 3D plots are projections into 2D space, automatic clipping would be hard to implement. While I do think it would be possible, I'm not convinced it's worth the effort: first, you would need to treat different kinds of plots differently; second, at least in some cases masking the data would probably still turn out to be the best choice. Complex subclassing of the plotting objects just to do the same thing that can be done manually in one or two lines is probably overkill.
My clear recommendation would therefore be to use the solution you already have, especially since it does not seem to have any drawbacks so far.
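For reference, the masking can even be collapsed into a single line; a minimal sketch using the variable names from your question:

y = np.where((PDF_x >= 75) & (PDF_x <= 95), PDF_x, np.nan)  # NaN outside [75, 95]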
Related
Is there any way to decrease the density of data labels in Matplotlib? Right now, it looks like this.
This is my code:
import string

countries_list.insert(0, "(0,0)")
arrowprops = dict(arrowstyle='<-', color='blue', linewidth=1, mutation_scale=10)
for i, txt in enumerate(countries_list):
    ax.annotate(string.capwords(txt), (x_list[i], y_list[i]), arrowprops=arrowprops)
Thanks.
Edit: I'm thinking more along the lines of: is there an automatic option to rearrange the arrows so they point to different locations around the plot, making the labels more readable?
I don't think there is really much you can do as far as adjusting the text size, since you would need a tiny, unreadable font to keep each word separate. I think what you are going to want to do instead is change the scale of your y axis. Right now you have a linear scale on your y axis with a very nonlinear distribution of your data, hence the mass of data points squished near the bottom.
Set your y axis scale with something like the following:
ax.set_yscale('log')
Check out more about axes and scaling in the matplotlib documentation.
I also just found another option that will probably produce a much nicer-looking plot than log scaling, especially since I don't know what kind of distribution we are looking at with your data. You can use it to scale your y axis relative to your dataset and its extreme values.
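To make the log-scale suggestion concrete, here is a minimal self-contained sketch; the sample data, the scatter call, and the xytext offset are assumptions standing in for your real code:

import string
import matplotlib.pyplot as plt

countries_list = ['france', 'japan', 'brazil']  # made-up sample data
x_list = [1.0, 2.0, 3.0]
y_list = [10, 1_000, 100_000]                   # spans several decades

fig, ax = plt.subplots()
ax.scatter(x_list, y_list)
arrowprops = dict(arrowstyle='<-', color='blue', linewidth=1, mutation_scale=10)
for i, txt in enumerate(countries_list):
    ax.annotate(string.capwords(txt), (x_list[i], y_list[i]),
                xytext=(5, 5), textcoords='offset points',  # assumed offset
                arrowprops=arrowprops)
ax.set_yscale('log')  # spreads out the points squished near the bottom
plt.show()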
I have come across a number of plots (end of page) that are very similar to scatter / swarm plots which jitter the y-axis in order avoid overlapping dots / bubbles.
How can I get the y values (ideally in an array) based on a given set of x and z values (dot sizes)?
I found the python circlify library but it's not quite what I am looking for.
Example of what I am trying to create
EDIT: For this project I need to be able to output the x, y and z values so that they can be plotted in the user's tool of choice. Therefore I am more interested in solutions that generate the y-coords rather than the actual plot.
Answer:
What you describe is known as a swarm plot (or beeswarm plot), and there are Python implementations of these (see especially seaborn), as well as implementations in, e.g., R. These plots adjust the y-position of each data point so the points don't overlap but otherwise stay closely packed.
Seaborn swarm plot:
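A minimal sketch with made-up data (seaborn computes the packed y-offsets for you):

import numpy as np
import seaborn as sns

values = np.random.default_rng(0).gamma(2, size=200)  # made-up sample data
sns.swarmplot(x=values)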
Discussion:
But the plots you show aren't standard swarm plots (which almost always have the weird-looking "arms"); instead they seem to be driven by some type of physics engine that allows motion along x as well as y, which produces the well-packed structures you see in the plots (e.g., like a water drop on a spider's web).
That is, in the plot above, if you imagine moving points only along the vertical axis so that they pack better, you can see that, for the most part, you can't really do it. (Honestly, maybe the data shown could be packed a bit better, but not dramatically so; e.g., the first arm from the left couldn't be improved, and where improvement is possible, it's only by moving one or two points inward.) Instead, to get a plot like the ones you show, you need some motion in x, as given by some type of physics engine that holds x close to its original value while allowing some variation. But that's a trade-off that needs to be decided at the data level, not the programming level.
For example, here's a plotting library, RAWGraphs, which produces a compact beeswarm plot like the Politico graphs in the question:
But critically, they give the warning:
"It’s important to keep in mind that a Beeswarm plot uses forces to avoid collision between the single elements of the visual model. While this helps to see all the circles in the visualization, it also creates some cases where circles are not placed in the exact position they should be on the linear scale of the X Axis."
Or, similarly, in notes from this D3 package: "Other implementations use force layout, but the force layout simulation naturally tries to reach its equilibrium by pushing data points along both axes, which can be disruptive to the ordering of the data." And here's a nice demo based on D3 force layout where sliders adjust the relative forces pulling the points to their correct values.
This kind of plot is therefore a compromise between a swarm plot and a violin plot (which shows a smoothed average of the distribution envelope). Both of those plots give an honest representation of the data, whereas in these closely packed plots the representation comes at the cost of misrepresenting the x-position of the individual data points. Their advantage seems to be that you can color and click on the individual points (where, if you wanted, you could show the actual x-data, although that's not done in the linked plots).
Seaborn violin plot:
Personally, I'm really hesitant to misrepresent the data in some unknown way (the outcome of a physics engine calculation that is not obvious to the reader). Maybe a better compromise would be a violin filled with non-circular patches, or something like a Raincloud plot.
I created an Observable notebook to calculate the y values of a beeswarm plot with variable-sized circles. The image below gives an example of the results.
If you need to use the JavaScript code in a script, it should be straightforward to copy and paste the code for the AccurateBeeswarm class.
The algorithm simply places the points one by one, as close as possible to the x=0 line while avoiding overlaps. There are also options to add a little randomness to improve the appearance. x values are never altered; this is the one big advantage of this approach over force-directed algorithms such as the one used by RAWGraphs.
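For reference, a rough Python sketch of that greedy placement rule (this is not the notebook's AccurateBeeswarm code, and it omits the randomization options):

import numpy as np

def beeswarm_y(x, r):
    """Greedy beeswarm: keep x fixed, give each point the y nearest 0
    that avoids overlap with previously placed circles."""
    x, r = np.asarray(x, float), np.asarray(r, float)
    y = np.zeros(len(x))
    placed = []  # indices of circles placed so far
    for i in range(len(x)):
        # candidates: the baseline, plus positions tangent to every
        # placed circle whose x-interval overlaps this one
        candidates = [0.0]
        for j in placed:
            dx, d = abs(x[i] - x[j]), r[i] + r[j]
            if dx < d:
                h = np.sqrt(d * d - dx * dx)
                candidates += [y[j] + h, y[j] - h]
        # take the candidate closest to the baseline that hits nothing
        for c in sorted(candidates, key=abs):
            if all((x[i] - x[j]) ** 2 + (c - y[j]) ** 2
                   >= (r[i] + r[j]) ** 2 - 1e-9 for j in placed):
                y[i] = c
                break
        placed.append(i)
    return y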
I have a large number of diagrams (all of the same size) which I want to display in a single file.
So I used something like
fig, ax = plt.subplots(100)
for i in range(100):  # range(0, 99) would miss the last subplot
    ax[i] = draw_my_diagram(i)
The problem I have is that the plotted diagrams get really small if I do it this way. But I want to keep their original size.
I tried to set their figsize manually to the value what I thought I needed them to be, but this messed up the plot. In particular, I drew some patches in the plot, which were not in the desired position and shape anymore.
So what would be the best way to solve this? Maybe there is even a flag that I can set to just keep the original size of the subplots?
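I'm not aware of a flag for this, but a minimal sketch of the usual workaround, assuming each diagram should be about 6 by 4 inches, is to scale the figure height with the number of rows so every subplot keeps that size:

import matplotlib.pyplot as plt

n = 100
width, height = 6, 4  # assumed size of one diagram in inches
fig, ax = plt.subplots(n, 1, figsize=(width, height * n))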
This is just a small question. I use the statement below to control the ranges of the three axes.
mlab.axes(xlabel='x', ylabel='y', zlabel='z', ranges=(0, 10000, 0, 10000, 0, 22), nb_labels=10)
In fact the real data ranges are (3000, 4000), (5000, 6000), and (0, 22) respectively, but the axes of the figure I plot are scaled to (0, 10000, 0, 10000, 0, 22).
I did not find a parameter of mlab.axes that can control this.
Do I have to calculate the data ranges every time? Without knowing the real data ranges in advance, is there a way to make the axis ranges follow the real data?
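One workaround, a minimal sketch assuming x, y, and z are the arrays you plotted, is to compute the ranges from the data itself rather than hard-coding them:

ranges = (x.min(), x.max(), y.min(), y.max(), z.min(), z.max())
mlab.axes(xlabel='x', ylabel='y', zlabel='z', ranges=ranges, nb_labels=10)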
For an article I am generating plots of deformed finite element meshes, which I visualize using matplotlib's PolyCollection. The images are saved as PDF.
Problems arise for high-density meshes, for which the naive approach results in files that are too large and too rendering-intensive to be practical.
For these meshes it really makes no sense to plot each element as a polygon; they could easily be rasterized, as is done when saving the image as JPG or PNG. However, for print I would like to keep a sharp frame, labels, and annotations.
Does anyone know if it is possible to achieve this kind of hybrid rasterization in matplotlib?
I can think of solutions involving imshow, and bypassing polycollection, but I would much prefer to use matplotlib's built-in components.
Thanks for your advice.
Just pass the rasterized=True keyword to your collection constructor. Example:
col = collections.PolyCollection(<arguments>, rasterized=True)
This allows a selective rasterization of that element only (e.g., if you did a normal plot on top of it, it would be vectorized by default). Most commands like plot or imshow can also take the rasterized keyword. If one wants to rasterize the whole figure (including labels and annotations), this would do it:
fig = plt.figure()
a = fig.add_subplot(1,1,1, rasterized=True)
(But this is not what you want, as stated in the question.)
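For completeness, a minimal self-contained sketch of the selective approach; the unit-square quads are made-up stand-ins for a real mesh:

import matplotlib.pyplot as plt
from matplotlib.collections import PolyCollection

# made-up unit quads standing in for the finite element mesh
verts = [[(x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1)]
         for x in range(50) for y in range(50)]
col = PolyCollection(verts, facecolors='lightgray', edgecolors='k',
                     rasterized=True)

fig, ax = plt.subplots()
ax.add_collection(col)
ax.autoscale_view()
ax.set_xlabel('x')  # frame and labels stay vector in the PDF
fig.savefig('mesh.pdf', dpi=300)  # dpi sets the resolution of the rasterized layer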