Extra bar in the first bin of a pyplot histogram - python

When plotting a histogram, there is an extra bar that shouldn't be there. The bar in the first bin has a non-zero height, even though the frequency as reported by hist output is zero.
Here is a minimal example:
import numpy as np
import matplotlib.pyplot as plt
import random
t=np.array([random.random() for _ in range(10000)])
bins=np.linspace(-0.1, 1.1, 101)
plt.hist(t, bins)
plt.show()
A bar is produced in the first bin, which can be seen midway between the left edge of this figure and the main bulk of the histogram (difficult to see on the thumbnail, enlarge the image):
Printing the value with print("%2.32f" % plt.hist(t, bins)[0][1]) shows it to be precisely zero.

This is a small bug in matplotlib that was first introduced in this commit. Basically, the vertices of all of the bin edges are set to 'snap' to the nearest pixel center, with the exception of the first bin. This was done in order to fix another bug, where snapping the first bin edge prevented the histogram bins from aligning properly with corresponding line plots.
There is an open issue relating to this on the matplotlib GitHub page, so it should hopefully be resolved soon.
In the meantime, you could either use plt.bar (as I mentioned in the comments), or manually turn snapping on for the first histogram patch:
counts, edges, patches = plt.hist(t, bins)
patches[0].set_snap(True)
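The plt.bar alternative mentioned above could look roughly like the sketch below. It assumes the counts are computed separately with np.histogram; the seeded generator and the Agg backend are choices made here only so the example is reproducible without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)      # seeded only for reproducibility
t = rng.random(10000)               # uniform samples in [0, 1)
bins = np.linspace(-0.1, 1.1, 101)  # same bin edges as in the question

# Compute the counts without drawing anything, then draw them as plain bars.
counts, edges = np.histogram(t, bins)
fig, ax = plt.subplots()
bars = ax.bar(edges[:-1], counts, width=np.diff(edges), align="edge")
fig.canvas.draw()
```

Each bar starts at the left edge of its bin (align="edge") and spans the bin width, so the layout matches what plt.hist would draw.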

Related

Rotate Matplotlib x-axis label text

I have a plot that looks like this (this is the famous Wine dataset):
As you can see, the x-axis labels overlap and thus need to be rotated.
NB! I am not interested in rotating the x-ticks (as explained here), but the label text, i.e. alcohol, malic_acid, etc.
The logic of creating the plot is the following: I create a grid using axd = fig.subplot_mosaic(...) and then for the bottom plots I set the labels with axd[...].set_xlabel("something"). Would be great if set_xlabel would take a rotation parameter, but unfortunately that is not the case.
Based on the documentation set_xlabel accepts text arguments, of which rotation is one.
The example I used to test this is shown below.
import matplotlib.pyplot as plt
import numpy as np
plt.plot()
plt.gca().set_xlabel('Test', rotation='vertical')
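The same keyword should also work on axes created with subplot_mosaic, as in the question. Here is a minimal sketch; the mosaic layout and the label names are made up for illustration, and the Agg backend is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption so this runs without a display
import matplotlib.pyplot as plt

fig = plt.figure()
# Hypothetical two-panel layout standing in for the grid from the question.
axd = fig.subplot_mosaic([["left", "right"]])

# Extra keyword arguments of set_xlabel are passed on to the label's Text object.
axd["left"].set_xlabel("alcohol", rotation=45, ha="right")
axd["right"].set_xlabel("malic_acid", rotation="vertical")

left_rotation = axd["left"].xaxis.label.get_rotation()
right_rotation = axd["right"].xaxis.label.get_rotation()
fig.canvas.draw()
```

The ha="right" argument is optional; it tends to keep a rotated label from drifting away from its axis.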

Can matplotlib only update the newest point to the figure?

Is it possible for matplotlib to update only the newest point on the figure instead of redrawing the whole figure?
For example, this may be the fastest way to do dynamic plotting.
Initialization:
fig1 = Figure(figsize = (8.0,8.0),dpi = 100)
axes1 = fig1.add_subplot(111)
line1, = axes1.plot([],[],animated = True)
when new data is coming:
line1.set_data(new_xarray,new_yarray)
axes1.draw_artist(line1)
fig1.canvas.update()
fig1.canvas.flush_events()
But this will redraw the whole figure! I'm wondering whether something like this is possible:
when new data is coming:
axes1.draw_only_last_point(new_x,new_y)
update_the_canvas()
It would only add the new point (new_x, new_y) to the axes instead of redrawing every point.
And if you know which graphic library for python can do that, please answer or comment, thank you so much!!!!!
Really appreciate your help!
Is redrawing the entire figure the only problem, i.e. is it OK to redraw the line itself as long as the rest of the figure is unchanged? Is the data known beforehand?
If the answers to those questions are NO and YES, then it might be worth looking into matplotlib's animation module (matplotlib.animation). One example where the data is known beforehand, but the points are plotted one by one, is this example. In the example, the figure is redrawn if the newest point is outside of the current x-lim. If you know the range of your data, you can avoid that by setting the limits beforehand.
You might also want to look into this answer, the animation example list or the animation documentation.
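If redrawing just the line is acceptable, a manual blitting loop is one way to avoid redrawing the whole figure. The sketch below is not taken from the linked examples: the sine data is made up, the axis limits are fixed beforehand as suggested above, and it runs on the Agg backend so no window is needed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.set_xlim(0, 10)   # limits fixed up front so no rescale (and full redraw) is needed
ax.set_ylim(-1, 1)
(line,) = ax.plot([], [], "o-", animated=True)  # animated artists are skipped by draw()

fig.canvas.draw()                                # draw axes, ticks and labels once
background = fig.canvas.copy_from_bbox(ax.bbox)  # cache the static background

xs, ys = [], []
for x in np.linspace(0, 10, 50):           # made-up data arriving point by point
    xs.append(x)
    ys.append(np.sin(x))
    line.set_data(xs, ys)
    fig.canvas.restore_region(background)  # paste the cached background back
    ax.draw_artist(line)                   # redraw only the line
    fig.canvas.blit(ax.bbox)               # push the updated region to the canvas
```

Note that this still redraws the full line on every update; only the axes, ticks and labels are reused from the cached background.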
This is my (so far limited) experience.
I started some months ago with Python (2.x) and OpenCV (2.4.13) as a graphics library. I found in my first project that OpenCV for Python works with numpy structures, as does matplotlib, and (with slight differences) they can work together.
I had to update some pixels when certain conditions were met. I first processed the images with OpenCV, obtaining a 2D numpy array, like a matrix.
The trick is: OpenCV mainly thinks about input as images, in terms of X (width) first, then Y (height). The numpy structure wants rows and columns, which in fact is Y before X.
With this in mind, I updated the image matrix A pixel by pixel and plotted it again with a colormap:
import matplotlib.pyplot as plt
import cv2
A = cv2.imread('your_image.png', 0)  # 0 means grayscale
# now the image is loaded into a numpy array A
# new_pixels is a hypothetical list of (x, y, intensity) updates
new_pixels = [(10, 20, 255)]
for x, y, value in new_pixels:
    A[y, x] = value  # numpy indexes rows (y) before columns (x)
plot = plt.imshow(A, 'CMRmap')
plt.show()
If you want an image file again, consider using this:
import matplotlib.image as mpimg
#previous code
mpimg.imsave("newA.png", A)
If you want to work with colors, remember that colour images are height by width by 3 numpy arrays, and that matplotlib expects the channels in RGB order while OpenCV works in BGR order. So:
C = cv2.imread('colour_reference.png',1) # 1 means BGR
A[y,x,0] = newRedvalue = C[y,x][2]
A[y,x,1] = newGreenvalue = C[y,x][1]
A[y,x,2] = newBluevalue = C[y,x][0]
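For a whole image, the same channel swap can be done in one step by reversing the last axis. This is a small numpy-only sketch; the tiny synthetic array stands in for what cv2.imread would return.

```python
import numpy as np

# Synthetic 2x3 "image" in BGR order: pure blue everywhere.
bgr = np.zeros((2, 3, 3), dtype=np.uint8)
bgr[..., 0] = 255  # channel 0 is blue in BGR order

# Reversing the last axis turns BGR into RGB for every pixel at once.
rgb = bgr[..., ::-1]
```

After the swap, the blue value sits in channel 2, which is where matplotlib's imshow expects it.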
I hope this will help you in some way

How to increase plot y-range with matplotlib? [duplicate]

I would like to plot a set of points using pyplot in matplotlib but have none of the points be on the edge of my axes. The autoscale (or something) sets the xlim and ylim such that often the first and last points lie at x = xmin or xmax making it difficult to read in some situations.
This is more often problematic with loglog() or semilog() plots because the autoscale would like xmin and xmax to be exact powers of ten, but if my data contains only three points, e.g. at xdata = [10**2,10**3,10**4] then the first and last points will lie on the border of the plot.
Attempted Workaround
This is my solution to add a 10% buffer to either side of the graph. But is there a way to do this more elegantly or automatically?
from numpy import array, log10
from matplotlib.pyplot import *
xdata = array([10**2,10**3,10**4])
ydata = xdata**2
figure()
loglog(xdata,ydata,'.')
xmin,xmax = xlim()
xbuff = 0.1*log10(xmax/xmin)
xlim(xmin*10**(-xbuff),xmax*10**(xbuff))
I am hoping for a one- or two-line solution that I can easily use whenever I make a plot like this.
Linear Plot
To make clear what I'm doing in my workaround, I should add an example in linear space (instead of log space):
plot(xdata,ydata)
xmin,xmax = xlim()
xbuff = 0.1*(xmax-xmin)
xlim(xmin-xbuff,xmax+xbuff)
which is identical to the previous example but for a linear axis.
Limits too large
A related problem is that sometimes the limits are too large. Say my data is something like ydata = xdata**0.25 so that the variance in the range is much less than a decade but ends at exactly 10**1. Then, the autoscale ylim are 10**0 to 10**1 though the data are only in the top portion of the plot. Using my workaround above, I can increase ymax so that the third point is fully within the limits but I don't know how to increase ymin so that there is less whitespace at the lower portion of my plot. i.e., the point is that I don't always want to spread my limits apart but would just like to have some constant (or proportional) buffer around all my points.
@askewchan I just successfully changed the matplotlib settings by editing the matplotlibrc configuration file and running Python directly from the terminal. I don't know the reason yet, but matplotlibrc does not take effect when I run Python from spyder3 (my IDE). Just follow the steps at matplotlib.org/users/customizing.html.
1) Solution one (default for all plots)
Try putting this in matplotlibrc and you will see the buffer increase:
axes.xmargin : 0.1 # x margin. See `axes.Axes.margins`
axes.ymargin : 0.1 # y margin See `axes.Axes.margins`
Values must be between 0 and 1.
Note: due to bugs, the scaling does not work correctly yet. It should be fixed in matplotlib 1.5 (I am still on 1.4.3). More info:
axes.xmargin/ymargin rcParam behaves differently than pyplot.margins() #2298
Better auto-selection of axis limits #4891
2) Solution two (individually for each plot inside the code)
There is also the margins function (to put directly in the code). Example:
import numpy as np
from matplotlib import pyplot as plt
t = np.linspace(-6,6,1000)
plt.plot(t,np.sin(t))
plt.margins(x=0.1, y=0.1)
plt.savefig('plot.png')
Note: here the scaling works (0.1 adds a 10% buffer before and after the x-range and the y-range).
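To see what margins does to the limits, here is a small sketch on the three-point data from the question, once on linear axes and once on log-log axes. It assumes a reasonably recent matplotlib with default settings, and uses the Agg backend only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Linear axes: three points whose x-range is 1..3.
x = np.array([1.0, 2.0, 3.0])
fig, ax = plt.subplots()
ax.plot(x, x**2, ".")
ax.margins(x=0.1, y=0.1)   # pad each side by 10% of the data range
xmin, xmax = ax.get_xlim()

# Log-log axes: the data from the question, sitting on exact powers of ten.
xdata = np.array([10**2, 10**3, 10**4], dtype=float)
fig_log, ax_log = plt.subplots()
ax_log.loglog(xdata, xdata**2, ".")
ax_log.margins(x=0.1, y=0.1)
log_xmin, log_xmax = ax_log.get_xlim()

print(xmin, xmax)          # padded linear limits
print(log_xmin, log_xmax)  # padded log limits, no longer on the data points
```

With this, the first and last points no longer sit on the border of either plot, which is what the one- or two-line solution in the question was after.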
A similar question was posed to the matplotlib-users list earlier this year. The most promising solution involves implementing a Locator (based on MaxNLocator in this case) to override MaxNLocator.view_limits.

How to put data on a graph like powerlaw.plot_pdf?

I need some basic help with the powerlaw package (https://pypi.python.org/pypi/powerlaw).
I have a list of data samples.
When I use powerlaw.plot_pdf(data), I get a graph (* sorry, can't upload the graphs here as I dont have enough reputation yet).
However, when trying to create the same graph on my own (with this code):
ax.plot(data)
ax.set_yscale('log')
ax.set_xscale('log')
I get a different graph.
Why is that?
Maybe I should normalize the data first (if so, how)?
Or am I missing something more crucial?
(If I understand correctly, using powerlaw.plot_pdf(data) means plotting the data before fitting.)
Another option would be to somehow get the x and y values that produce the graph of powerlaw.plot_pdf(data), but I did not succeed with that either.
Thanks for your kind help,
Alon
Solved.
After downloading the powerlaw code (https://code.google.com/p/powerlaw/source/checkout), things become clearer.
To get the values of both the x and y axes that produce the graph of powerlaw.plot_pdf(data):
edges, hist = powerlaw.pdf(data)
Now, instead of just plotting the data on log-log axes as described in the question, we should first compute the centers of the bins. These centers will be the values on the x axis; hist contains the values of the y axis:
bin_centers = (edges[1:]+edges[:-1])/2.0
Now plot on log-log axes:
plt.loglog(bin_centers, hist)
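To illustrate why plotting the raw samples gives a different graph, here is a rough numpy-only stand-in for the binning step. It is not the powerlaw package's own algorithm: the Pareto sample, the 20 logarithmically spaced bins and the use of np.histogram with density=True are all assumptions made for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.pareto(2.5, 5000) + 1.0  # made-up heavy-tailed sample, all values >= 1

# Logarithmically spaced bin edges covering the data.
edges = np.logspace(np.log10(data.min()), np.log10(data.max()), 21)

# density=True normalizes the counts so the histogram integrates to 1.
hist, edges = np.histogram(data, bins=edges, density=True)
bin_centers = (edges[1:] + edges[:-1]) / 2.0

# Empty bins cannot be shown on a log axis, so plot only the non-empty ones.
nonzero = hist > 0
fig, ax = plt.subplots()
ax.loglog(bin_centers[nonzero], hist[nonzero])
fig.canvas.draw()
```

The point is that a PDF plot shows normalized bin heights against bin centers, whereas ax.plot(data) shows the sample values against their index in the list.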

weird range value in the colorbar, matplotlib

I am new to Python and matplotlib. This might be a simple question, but I searched the internet for hours and couldn't find a solution.
I am plotting precipitation data, which is in the NetCDF format. What I find weird is that the data doesn't have any negative values in it (I checked many times, just to make sure), but the colorbar starts with a negative value (like -0.0000312). It doesn't make sense, because I don't manipulate the data other than selecting a part of it from the big file and plotting it.
There isn't much to my code. Here it is:
from mpl_toolkits.basemap import Basemap
import numpy as np
import matplotlib.pyplot as plt
from netCDF4 import Dataset
cd progs
f=Dataset('V21_GPCP.1979-2009.nc')
lats=f.variables['lat'][:]
lons=f.variables['lon'][:]
prec=f.variables['PREC'][:]
la=lats[31:52]
lo=lons[18:83]
pre=prec[0,31:52,18:83]
m = Basemap(width=06.e6,height=05.e6,projection='gnom',lat_0=15.,lon_0=80.)
x, y = m(*np.meshgrid(lo,la))
m.drawcoastlines()
m.drawmapboundary(fill_color='lightblue')
m.drawparallels(np.arange(-90.,120.,5.),labels=[1,0,0,0])
m.drawmeridians(np.arange(0.,420.,5.),labels=[0,0,0,1])
cs=m.contourf(x,y,pre,50,cmap=plt.cm.jet)
plt.colorbar()
The output of that code was a beautiful plot, with the colorbar starting from the value -0.00001893; the rest are positive values, which I believe are correct. It's just the minimum value that's bugging me.
I would like to know:
Is there anything wrong in my code? I know that the data is right.
Is there a way to manually change the value to 0?
Is it right for the values in the colorbar to change every time we run the code? For the same data, the next time I run the code, the values go like this: "-0.00001893, 2.00000000, 4.00000000, 6.00000000 etc"
I want to customize them to "0.0, 2.0, 4.0, 6.0 etc"
Thanks,
Vaishu
Yes, you can manually format everything about the colorbar. See this:
import matplotlib.colors as mc
import matplotlib.pyplot as plt
plt.imshow(X, norm=mc.Normalize(vmin=0))
plt.colorbar(ticks=[0,2,4,6], format='%0.2f')
Many plotting functions including imshow, contourf, and others include a norm argument that takes a Normalize object. You can set the vmin or vmax attributes of that object to adjust the corresponding values of the colorbar.
colorbar takes the ticks and format arguments to adjust which ticks to display and how to display them.
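Since the question uses contourf, another option is to pass explicit contour levels that start at 0 instead of only a level count. The sketch below uses random synthetic data in place of the precipitation array, and the 0 to 6 range is just an assumption for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
pre = 6.0 * rng.random((20, 30))  # synthetic non-negative stand-in for the data

# Explicit levels pin the lowest contour boundary to exactly 0.
levels = np.linspace(0.0, 6.0, 13)

fig, ax = plt.subplots()
cs = ax.contourf(pre, levels=levels, cmap="jet")
cbar = fig.colorbar(cs, ticks=[0, 2, 4, 6], format="%0.1f")
fig.canvas.draw()
```

With fixed levels and ticks, the colorbar should also look the same on every run for the same data.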
