Matplotlib scatterplot axis autoscale fails for small data values - python

When using Matplotlib's scatter plot, autoscaling sometimes works and sometimes doesn't.
How do I fix it?
As in the example provided in the bug report, this code works:
import numpy as np
import matplotlib.pyplot as plt

plt.figure()
x = np.array([0, 1, 2, 3])
y = np.array([2, 4, 5, 9])
plt.scatter(x, y)
But when using smaller values, the scaling fails to work:
plt.figure()
x = np.array([0, 1, 2, 3])
y = np.array([2, 4, 5, 9])
plt.scatter(x/10000, y/10000)
Edit: An example can be found here. I have not named the specific cause in the question, because when you encounter the error it is not obvious what causes it. I have described the cause and the solution in my own answer below.

In at least Matplotlib 1.5.1, there is a bug where autoscale fails for small data values, as reported here.
The workaround is to use .set_ylim(bottom, top) (documentation) to set the data limits manually (this example is for the y axis; to set the x axis, use .set_xlim(left, right)).
To automatically find data limits that are pleasing to the eye, the following helper function can be used:
def set_axlims(series, marginfactor):
    """
    Fix for a scaling issue with matplotlib's scatterplot and small values.
    Takes in a pandas Series and a marginfactor (float).
    A marginfactor of 0.2 would, for example, set a 20% margin on both sides.
    Output: (bottom, top)
    To be used with .set_ylim(bottom, top)
    """
    minv = series.min()
    maxv = series.max()
    datarange = maxv - minv
    border = abs(datarange * marginfactor)
    maxlim = maxv + border
    minlim = minv - border
    return minlim, maxlim
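For example, a minimal usage sketch with the data from the question placed in a pandas Series (the variable names here are made up):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

x = np.array([0, 1, 2, 3]) / 10000
y = pd.Series([2, 4, 5, 9]) / 10000

fig, ax = plt.subplots()
ax.scatter(x, y)

# bypass the broken autoscale by setting the y limits explicitly
bottom, top = set_axlims(y, 0.2)
ax.set_ylim(bottom, top)
plt.show()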

Related

Contour line error with plt.contour in python 3

I am plotting a contour plot in python 3 with matplotlib, and I am getting a strange result. At first I was using plt.contourf, and noticed there was a strange north-south linear artifact in the data that I knew shouldn't be there (I used simulated data). So I changed plt.contourf to plt.contour, and the problem seems to be that some of the edge contours are deformed for some reason (see picture).
Unfortunately, it is hard for me to paste a simple version of my code because this is part of a large GUI-based app. Here is what I am doing, though.
# grid the x, y, z data so it can be used in the contouring
# (this is matplotlib's griddata, not scipy.interpolate.griddata)
self.beta_zi = griddata(self.output_df['x'].values,
                        self.output_df['y'].values,
                        self.output_df['Beta'].values,
                        self.cont_grid_x,
                        self.cont_grid_y,
                        interp='linear')

# call to the contour itself
self.beta_contour = self.beta_cont_ax.contour(self.cont_grid_x,
                                              self.cont_grid_y,
                                              self.beta_zi,
                                              levels=np.linspace(start=0, stop=1, num=11, endpoint=True),
                                              cmap=cm.get_cmap(self.user_beta_cmap.get()))
This seems like a simple problem based on the edges. Has anyone seen this before and can help? I am using a Tk backend, which works better with the tkinter-based GUI I wrote.
UPDATE: I also tried changing to scipy.interpolate.griddata because matplotlib's griddata is deprecated, but the problem persists, so it must be with the actual contour plotting function.
I found that the problem had to do with how I was interpreting the inputs of contour and griddata.
plt.contour and matplotlib's griddata take
x = x location of sample data
y = y location of sample data
z = height or z value of sample data
xi = locations of x tick marks on grid
yi = locations of y tick marks on grid
Typically xi and yi are all the locations of each grid node, which is what I was supplying, but in this case you only need the unique tick marks on each axis (see the sketch below).
Thanks to this post I figured it out.
Matplotlib contour from xyz data: griddata invalid index
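Here is a hedged sketch of that difference with made-up data (the variable names are stand-ins, and scipy.interpolate.griddata is used since matplotlib's own griddata is deprecated): contour only needs the 1D axis vectors xi and yi, not a full array of grid-node coordinates.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import griddata

# made-up scattered samples standing in for the x, y, Beta columns
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = rng.uniform(0, 1, 200)
z = np.sin(3 * x) * np.cos(3 * y)

# 1D axis vectors: the unique tick positions of the target grid
xi = np.linspace(0, 1, 50)
yi = np.linspace(0, 1, 50)

# scipy's griddata interpolates onto the full mesh built from those vectors
Xi, Yi = np.meshgrid(xi, yi)
zi = griddata((x, y), z, (Xi, Yi), method='linear')

# contour itself takes just the 1D vectors, as long as their lengths match zi's shape
plt.contour(xi, yi, zi, levels=10)
plt.show()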

Opacity misleading when plotting two histograms at the same time with matplotlib

Let's say I have two histograms and I set the opacity using the parameter of hist: 'alpha=0.5'
I have plotted two histograms yet I get three colors! I understand this makes sense from an opacity point of view.
But it makes it very confusing to show someone a graph of two things with three colors. Can I just somehow set the smallest bar for each bin to be in front, with no opacity?
Example graph
The usual way this issue is handled is to have the plots with some small separation. This is done by default when plt.hist is given multiple sets of data:
import pylab as plt
x = 200 + 25*plt.randn(1000)
y = 150 + 25*plt.randn(1000)
n, bins, patches = plt.hist([x, y])
You instead wish to stack them (this could be done above using the argument histtype='barstacked'), but notice that the ordering is incorrect.
This can be fixed by individually checking each pair of points to see which is larger and then using zorder to set which one comes first. For simplicity I am using the output of the code above (e.g n is two stacked arrays of the number of points in each bin for x and y):
n_x = n[0]
n_y = n[1]

for i in range(len(n[0])):
    if n_x[i] > n_y[i]:
        zorder = 1
    else:
        zorder = 0
    plt.bar(bins[:-1][i], n_x[i], width=10)
    plt.bar(bins[:-1][i], n_y[i], width=10, color="g", zorder=zorder)
Here is the resulting image:
Changing the ordering like this makes the image look very odd indeed, which is probably why it is not implemented and needs a hack to do. I would stick with the small-separation method; anyone used to these plots assumes the bars share the same x-values.

Using fill_between() with a Pandas Data Series

I have graphed (using matplotlib) a time series and its associated upper and lower confidence interval bounds (which I calculated in Stata). I used Pandas to read the stata.csv output file and so the series are of type pandas.core.series.Series.
Matplotlib allows me to graph these three series on the same plot, but I wish to shade between the upper and lower confidence bounds to generate a visual confidence interval. Unfortunately I get an error, and the shading doesn't work. I think this is to do with the fact that the functions between which I wish to fill are pandas.core.series.Series.
Another post on here suggests that passing my_series.values instead of my_series will fix this problem; however, I cannot get this to work. I'd really appreciate an example.
As long as you don't have NaN values in your data, you should be okay (this is an IPython pylab session, with pandas' Series also imported):
In [77]: from pandas import Series
In [78]: x = Series(linspace(0, 2 * pi, 10000))
In [79]: y = sin(x)
In [80]: fill_between(x.values, y.min(), y.values, alpha=0.5)
Which yields:
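For the confidence-interval shading the question asks about, the same .values approach works when the upper and lower bounds are themselves pandas Series; a minimal sketch with made-up data:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# made-up time series with symmetric confidence bounds
t = pd.Series(np.linspace(0, 2 * np.pi, 200))
mid = np.sin(t)          # still a pandas Series
lower = mid - 0.3
upper = mid + 0.3

plt.plot(t.values, mid.values)
plt.fill_between(t.values, lower.values, upper.values, alpha=0.5)
plt.show()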

Colormap is being ignored for matplotlib contourf plot with custom levels

I am trying to create a filled contour plot in matplotlib (Win7, 1.1.0). I want to highlight certain values, and the levels are closer to log than linear.
There are numerous colormaps that would suit me, but my choice of cmap is ignored.
Do I need to create a custom "normalize"? If so, is each contour colored according to its edge value and then filled with the same color down to the next lower level? And why is the symptom of this that my colormap is ignored ... is some exception during construction being caught so that my request is silently ignored?
My original data had missing values. I have played with making these NaN, large and small ... in each case I have tried masking them and not masking the "outside" values. I have also tried all permutations using the default levels and norm.
lev = [0.1,0.2,0.5,1.0,2.0,4.0,8.0,16.0,32.0]
norml = colors.normalize(0,32)
cs = plt.contourf(x,z,data,cmap=cm.gray, levels=lev, norm = norml)
I hope this snippet is sufficient to at least start the conversation.
Thanks,
Eli
If I understood you correctly, you need to rescale your data to colors using your levels as the basis rather than default linear scaling. If that's right, then you need to use colors.BoundaryNorm as the norm factor. Consider the following example:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm, colors

x = np.arange(0, 8, 0.1)
y = np.arange(0, 8, 0.1)
z = (x[:, None] - 4) ** 2 + (y[None, :] - 4) ** 2
lev = [0.1, 0.2, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
norml = colors.BoundaryNorm(lev, 256)
cs = plt.contourf(x, y, z, cmap=cm.jet, levels=lev, norm=norml)
plt.show()
This yields
Compare it to default Normalize behaviour:
Hope that helps.

Matlab, Python: Fixing colormap to specified values

This is a simple but common task: fixing a colormap to a specified range of values when displaying a 2D matrix.
To demonstrate, consider the problem in Matlab; the solution does not need to be in Matlab (i.e., the code presented here is only for demonstration purposes).
x = [0,1,2; 3,4,5; 6,7,8];
imagesc(x)
axis square
axis off
The output looks like this:
When some values go above the previous maximum, for example:
x = [0,1,2; 3,4,5; 6,7,18];
the colormap rescales, which looks logical but causes problems when we wish to compare/trace elements across two maps. Since the colormap association has changed, it is almost impossible to track an individual cell for comparison.
The solution I implemented is to mask the matrix as:
x = [0,1,2; 3,4,5; 6,7,18];
m = 8;
x(x>=m) = m;
which works perfectly.
Since the provided code requires searching/filtering (extra time!), I wonder whether there is a more general/efficient way to do this in Matlab, Python, etc.
One case where this issue arises is when we run many simulations sequentially and wish to make a meaningful animation of the progress; each color should then keep its association fixed.
In Python, using Matplotlib, the solution is as follows:
import pylab as pl

x = [[0, 1, 2], [3, 4, 5], [6, 7, 18]]
pl.matshow(x, vmin=0, vmax=8)
pl.axis('image')
pl.axis('off')
pl.show()
So vmin and vmax are the boundary limits for the full range of the colormap.
The indexing is pretty quick, so I don't think you need to worry.
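If you do want the explicit capping in Python (the equivalent of the Matlab masking x(x>=m) = m), numpy can do it in one vectorised call rather than a search; a minimal sketch:
import numpy as np
import pylab as pl

x = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 18]])

# np.clip caps every element at 8 without an explicit search/filter step
x_clipped = np.clip(x, 0, 8)

pl.matshow(x_clipped, vmin=0, vmax=8)
pl.axis('off')
pl.show()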
However, in Matlab, you can pass in the clims argument to imagesc:
imagesc(x,[0 8]);
This maps all values above 8 to the top colour in the colour scale, and all values below 0 to the bottom colour in the colour scale, and then stretches the scale for colours in-between.
imagesc documentation.
Alternatively, in Matlab you can copy the color limits from a reference figure and apply them to a second one:
f1 = figure;
x = [0,1,2; 3,4,5; 6,7,8];
imagesc(x)
axis square
axis off
% grab the color limits of the first figure's axes
limits = get(gca, 'CLim');
f2 = figure;
z = [0,1,2; 3,4,5; 6,7,18];
imagesc(z)
axis square
axis off
% apply the same limits so colors stay comparable across figures
caxis(limits)
