I am using Python's matplotlib.pyplot.contourf to create a contour plot of my data with a color bar. I have done this successfully countless times, even with other layers of the same variable. However, when the values get small (on the order of 1E-12), parts of the contour show up white. The white color does not show up in the color bar either. Does anyone know what causes this and how to fix this? The faulty contour is attached below.
a1 = plt.contourf(np.linspace(1,24,24),np.linspace(1,20,20),np.transpose(data[:,:,15]))
plt.colorbar(a1)
plt.show()
tl;dr
Given the new information, matplotlib couldn't set the right number of levels (see parameters in the documentation) for your data leaving data unplotted. To fix that you need to tell matplotlib to extend the limits with either plt.contourf(..., extend="max") or plt.contourf(..., extend="both")
Extensive answer
There are a few reasons why contourf() is showing white zones with a colormap that doesn't include white.
NaN values
NaN values are never plotted.
Masked data
If you mask data before plotting, it won't appear in the plot. But you should know if you masked your data.
Although, you may have unnoticed mask your data if you use something like Tick locator = LogLocator().
Matplotlib couldn't set the right levels for your data
Sometimes matplotlib doesn't set the right levels, leaving some of your data without plotting.
To fix that you can user plt.contourf(..., extend=EXTENDS) where EXTENDS can be "neither", "both", "min", "max"
Coarse grid
contourf plots whitespace over finite data. Past answers do not correct
One remark, white section in the plot can also occur if the X and Y vectors data points are not equally spaced. In that case best to use function tricontourf().
I was facing the same problem recently, when there was data available even higher/lower than the levels I have set. So, the plt.contourf fills the contours exclusively given by you, and it neglects any other higher or lower values present in your data.
I solved this by adding a key word argument extend="both", which for your case would be something like this:
a1 = plt.contourf(np.linspace(1,24,24),np.linspace(1,20,20),np.transpose(data[:,:,15]), extend="both")
or in general form:
a1 = plt.contourf(x,y,variable[:,:,15],extend="both")
By doing this, you're instructing the module to plot the higher(/lower) values according to the highest(/lowest) filled contour.
If you want only to extend in the lower or higher range, you can change the keyword argument to
extend="min" or extend ="max"
Related
I have come across a number of plots (end of page) that are very similar to scatter / swarm plots which jitter the y-axis in order avoid overlapping dots / bubbles.
How can I get the y values (ideally in an array) based on a given set of x and z values (dot sizes)?
I found the python circlify library but it's not quite what I am looking for.
Example of what I am trying to create
EDIT: For this project I need to be able to output the x, y and z values so that they can be plotted in the user's tool of choice. Therefore I am more interested in solutions that generate the y-coords rather than the actual plot.
Answer:
What you describe in your text is known as a swarm plot (or beeswarm plot) and there are python implementations of these (esp see seaborn), but also, eg, in R. That is, these plots allow adjustment of the y-position of each data point so they don't overlap, but otherwise are closely packed.
Seaborn swarm plot:
Discussion:
But the plots that you show aren't standard swarm plots (which almost always have the weird looking "arms"), but instead seem to be driven by some type of physics engine which allows for motion along x as well as y, which produces the well packed structures you see in the plots (eg, like a water drop on a spiders web).
That is, in the plot above, by imagining moving points only along the vertical axis so that it packs better, you can see that, for the most part, you can't really do it. (Honestly, maybe the data shown could be packed a bit better, but not dramatically so -- eg, the first arm from the left couldn't be improved, and if any of them could, it's only by moving one or two points inward). Instead, to get the plot like you show, you'll need some motion in x, like would be given by some type of physics engine, which hopefully is holding x close to its original value, but also allows for some variation. But that's a trade-off that needs to be decided on a data level, not a programming level.
For example, here's a plotting library, RAWGraphs, which produces a compact beeswarm plot like the Politico graphs in the question:
But critically, they give the warning:
"It’s important to keep in mind that a Beeswarm plot uses forces to avoid collision between the single elements of the visual model. While this helps to see all the circles in the visualization, it also creates some cases where circles are not placed in the exact position they should be on the linear scale of the X Axis."
Or, similarly, in notes from this this D3 package: "Other implementations use force layout, but the force layout simulation naturally tries to reach its equilibrium by pushing data points along both axes, which can be disruptive to the ordering of the data." And here's a nice demo based on D3 force layout where sliders adjust the relative forces pulling the points to their correct values.
Therefore, this plot is a compromise between a swarm plot and a violin plot (which shows a smoothed average for the distribution envelope), but both of those plots give an honest representation of the data, and in these plots, these closely packed plots representation comes at a cost of a misrepresentation of the x-position of the individual data points. Their advantage seems to be that you can color and click on the individual points (where, if you wanted you could give the actual x-data, although that's not done in the linked plots).
Seaborn violin plot:
Personally, I'm really hesitant to misrepresent the data in some unknown way (that's the outcome of a physics engine calculation but not obvious to the reader). Maybe a better compromise would be a violin filled with non-circular patches, or something like a Raincloud plot.
I created an Observable notebook to calculate the y values of a beeswarm plot with variable-sized circles. The image below gives an example of the results.
If you need to use the JavaScript code in a script, it should be straightforward to copy and paste the code for the AccurateBeeswarm class.
The algorithm simply places the points one by one, as close as possible to the x=0 line while avoiding overlaps. There are also options to add a little randomness to improve the appearance. x values are never altered; this is the one big advantage of this approach over force-directed algorithms such as the one used by RAWGraphs.
I am trying to make a swam plot that contains more information than a single categorical level and two variables. I am looking to create something like this
So ideally, something like this would work (but it does not):
ax = sns.swarmplot(x="round_id", y="independent_error_abs", hue="difficulty", hue_order=['easy','medium','hard'], size="followers", markershape="rank",data=df)
where "difficulty", "followers", and "rank" determine the color of the point, the size of the point, and the shape of the point, respectively.
No, this is not possible with swarmplot. Personally I find this kind of plot very difficult to interpret: a good statistical plot should make the patterns in the data immediately apparent, whereas plots with multiple categorical variables that manipulate the size or shape of the points quickly become more like puzzles. My recommendation in these cases (following Andrew Gelman) is to make more than one plot, each with relatively simple semantics.
You don't have to agree, of course, but you will have to make it yourself using matplotlib.
I am facing the same issue, and actually the solution seems to be pretty simple at least for the marker type!
Just divide your dataframe in subdataframes, each for a different marker type. The you make a swarmplot on top of each other, and that's it.
If the size of the dot, is also a categorical variable, you just need to do the same as above where each subdtaframe will represent a marker and a different size.
If size is continuous, then it seems you would need to plot each dot independently in a for loop, but for that I would use matplotlib.pyplot.
Is it possible to force plt.scatter into the same color levels as plt.contourf and plt.contour? For example, I have code that makes a plot like this:
to make the first subplot, I use
cs=m[0].scatter(xs,ys,c=obsData,cmap=plt.cm.jet)
m.colorbar(cs)
To make the second subplot, I use
cs2=m[1].contourf(x,y,areaData,cmap=cs.cmap)
And for each subsequent subplot, I use
m[ind].contourf(x,y,areaData,cmap=cs.cmap,levels=cs2.levels
where areaData is recalculated within a loop.
My question is, how can I force the first subplot to have the same colors as the other subplots? I am looking for an equivalent to the levels=cs2.levels keyword argument.
As you noted in a comment, your scatter and contour data are not directly related, but you want to display them on the same colormap.
I suggest setting a common colour span that contains both sets of data. Since obsData refers to the scatter points and areaData to the contours, I'd set
vmin,vmax = (fun(np.concatenate([obsData,areaData])) for fun in (np.min,np.max))
to determine the span of the collected data set (obviously, to be generalized for multiple input data sets). These can be passed to scatter and contourf to set the limits of the colour mapping:
cs = m[0].scatter(xs,ys,c=obsData,cmap=plt.cm.viridis,vmin=vmin,vmax=vmax)
cs2 = m[1].contourf(x,y,areaData,cmap=cs.cmap,vmin=vmin,vmax=vmax)
Some manual increase of the span might be in order to obtain a pretty result.
Note that I changed the colormap to viridis. If you really want to fairly represent your data, this should be your first step.
I am producing an imshow plot in Python.
In the final plot, I have strips/columns of data, between which there is no data at all - kind of like a barcode.
At the moment, where I have no data, I have just set all values to zero. The color of these regions of no data is therefore whatever colour represents zero in my colorbar - green in my case.
What I really want is for these columns/strips just to be white, and to make to really clear that these are regions of NO data.
I realise that I could change the colorbar so that the zero is white, but I really want to distinguish the regions of no data from any zeros that might be in the data.
Thank you.
try setting those values to np.nan ( or float('nan')); you may also want to pass interpolation='nearest' to imshow as an argument.
I am displaying a jpg image (I rotate this by 90 degrees, if this is relevant) and of course
the axes display the pixel coordinates. I would like to convert the axis so that instead of displaying the pixel number, it will display my unit of choice - be it radians, degrees, or in my case an astronomical coordinate. I know the conversion from pixel to (eg) degree. Here is a snippet of what my code looks like currently:
import matplotlib.pyplot as plt
import Image
import matplotlib
thumb = Image.open(self.image)
thumb = thumb.rotate(90)
dpi = plt.rcParams['figure.dpi']
figsize = thumb.size[0]/dpi, thumb.size[1]/dpi
fig = plt.figure(figsize=figsize)
plt.imshow(thumb, origin='lower',aspect='equal')
plt.show()
...so following on from this, can I take each value that matplotlib would print on the axis, and change/replace it with a string to output instead? I would want to do this for a specific coordinate format - eg, rather than an angle of 10.44 (degrees), I would like it to read 10 26' 24'' (ie, degrees, arcmins, arcsecs)
Finally on this theme, I'd want control over the tick frequency, on the plot. Matplotlib might print the axis value every 50 pixels, but I'd really want it every (for example) degree.
It sounds like I would like to define some kind of array with the pixel values and their converted values (degrees etc) that I want to be displayed, having control over the sampling frequency over the range xmin/xmax range.
Are there any matplotlib experts on Stack Overflow? If so, thanks very much in advance for your help! To make this a more learning experience, I'd really appreciate being prodded in the direction of tutorials etc on this kind of matplotlib problem. I've found myself getting very confused with axes, axis, figures, artists etc!
Cheers,
Dave
It looks like you're dealing with the matplotlib.pyplot interface, which means that you'll be able to bypass most of the dealing with artists, axes, and the like. You can control the values and labels of the tick marks by using the matplotlib.pyplot.xticks command, as follows:
tick_locs = [list of locations where you want your tick marks placed]
tick_lbls = [list of corresponding labels for each of the tick marks]
plt.xticks(tick_locs, tick_lbls)
For your particular example, you'll have to compute what the tick marks are relative to the units (i.e. pixels) of your original plot (since you're using imshow) - you said you know how to do this, though.
I haven't dealt with images much, but you may be able to use a different plotting method (e.g. pcolor) that allows you to supply x and y information. That may give you a few more options for specifying the units of your image.
For tutorials, you would do well to look through the matplotlib gallery - find something you like, and read the code that produced it. One of the guys in our office recently bought a book on Python visualization - that may be worthwhile looking at.
The way that I generally think of all the various pieces is as follows:
A Figure is a container for all the Axes
An Axes is the space where what you draw (i.e. your plot) actually shows up
An Axis is the actual x and y axes
Artists? That's too deep in the interface for me: I've never had to worry about those yet, even though I rarely use the pyplot module in production plots.