I would like to plot a large sample stored in the arrays a and b with matplotlib's hist2d feature. However, generating H, xedges, yedges, img does not work directly for this data, as it uses too much memory. It works for half the number of samples, though, so I would like to do something like
H_1, xedges_1, yedges_1, img_1 = plt.hist2d(a[:len(a)//2], b[:len(b)//2], bins = 10)
followed by
H_2, xedges_2, yedges_2, img_2 = plt.hist2d(a[len(a)//2:], b[len(b)//2:], bins = 10)
while perhaps deleting the first half of the arrays after calculating the first set of variables. Is there a way to merge these two sets of variables and generate a combined plot for the data?
If (and only if!) you specify the bin edges manually, then your histograms will be compatible. You can simply add the occurrences in each bin for both subsets, and you'll recover the full histogram:
import numpy as np
import matplotlib.pyplot as plt
# uniform sample data
a = np.random.rand(200) * 10
b = np.random.rand(200) * 10
# shared bin edges, so that every histogram uses the same grid
binmin = min(a.min(), b.min())
binmax = max(a.max(), b.max())
edges = np.linspace(binmin, binmax, 10 + 1)
half = len(a) // 2  # integer division: a float slice index raises TypeError in Python 3
H_1, xedges_1, yedges_1, img_1 = plt.hist2d(a[:half], b[:half], bins=edges)
H_2, xedges_2, yedges_2, img_2 = plt.hist2d(a[half:], b[half:], bins=edges)
H_3, xedges_3, yedges_3, img_3 = plt.hist2d(a, b, bins=edges)
Result:
In [150]: (H_1+H_2==H_3).all()
Out[150]: True
You can then plot the summed histogram with plt.pcolor. That is essentially what hist2d does internally (it calls pcolormesh), with an additional transpose of the data:
plt.figure()
plt.pcolor((H_1+H_2).T)
img_3 (left) vs (H_1+H_2).T (right):
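The same idea extends to any number of chunks, which helps when even half of the data is too large. The following is a minimal sketch, not part of the original answer: it uses np.histogram2d (no plotting) instead of plt.hist2d, and the helper name chunked_hist2d and the chunk_size parameter are made up for illustration.

```python
import numpy as np

def chunked_hist2d(a, b, edges, chunk_size):
    """Accumulate a 2D histogram over slices of a and b (hypothetical helper)."""
    total = np.zeros((len(edges) - 1, len(edges) - 1))
    for start in range(0, len(a), chunk_size):
        stop = start + chunk_size
        # identical edges for every chunk, so the counts can simply be added
        counts, _, _ = np.histogram2d(a[start:stop], b[start:stop],
                                      bins=(edges, edges))
        total += counts
    return total

rng = np.random.default_rng(0)
a = rng.random(1000) * 10
b = rng.random(1000) * 10
edges = np.linspace(0, 10, 10 + 1)

H_chunked = chunked_hist2d(a, b, edges, chunk_size=128)
H_full, _, _ = np.histogram2d(a, b, bins=(edges, edges))
```

The accumulated array can then be drawn with something like plt.pcolormesh(edges, edges, H_chunked.T), so the full sample never has to be histogrammed in one call.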
I have tens of thousands of images. I want to generate a histogram for each pixel. I have come up with the following code using NumPy to do this that works:
import numpy as np
import matplotlib.pyplot as plt
nimages = 1000
im_shape = (64,64)
nbins = 100
#predefine the histogram bins
hist_bins = np.linspace(0,1,nbins)
#create an array to store histograms for each pixel
perpix_hist = np.zeros((64,64,nbins))
for ni in range(nimages):
    #create a simple image with normally distributed pixel values
    im = np.random.normal(loc=0.5,scale=0.05,size=im_shape)
    #sort each pixel into the predefined histogram
    bins_for_this_image = np.searchsorted(hist_bins, im.ravel())
    bins_for_this_image = bins_for_this_image.reshape(im_shape)
    #this next part adds one to each of those bins
    #but this is slow as it loops through each pixel
    #how to vectorize?
    for i in range(im_shape[0]):
        for j in range(im_shape[1]):
            perpix_hist[i,j,bins_for_this_image[i,j]] += 1
#plot histogram for a single pixel
plt.plot(hist_bins,perpix_hist[0,0])
plt.xlabel('pixel values')
plt.ylabel('counts')
plt.title('histogram for a single pixel')
plt.show()
I would like to know if anyone can help me vectorize the for loops? I can't think of how to index into the perpix_hist array properly. I have tens/hundreds of thousands of images and each image is ~1500x1500 pixels, and this is too slow.
You can vectorize it using np.meshgrid and providing indices for first, second and third dimension (the last dimension you already have).
y_grid, x_grid = np.meshgrid(np.arange(64), np.arange(64))
for i in range(nimages):
    #create a simple image with normally distributed pixel values
    im = np.random.normal(loc=0.5,scale=0.05,size=im_shape)
    #sort each pixel into the predefined histogram
    bins_for_this_image = np.searchsorted(hist_bins, im.ravel())
    bins_for_this_image = bins_for_this_image.reshape(im_shape)
    #one fancy-indexed update replaces the double loop over pixels
    perpix_hist[x_grid, y_grid, bins_for_this_image] += 1
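To check that the indexed update really matches the nested loops, the two can be compared on a small set of images. This is only a sketch under a few assumptions: the function names are invented here, np.indices is used in place of np.meshgrid to build the row and column index arrays, and the bin indices are clipped so that an out-of-range pixel value cannot index past the last bin.

```python
import numpy as np

def accumulate_loop(bin_idx, nbins):
    """Reference version: explicit loops over images and pixels."""
    rows, cols = bin_idx.shape[1:]
    hist = np.zeros((rows, cols, nbins), dtype=np.int64)
    for img in bin_idx:
        for i in range(rows):
            for j in range(cols):
                hist[i, j, img[i, j]] += 1
    return hist

def accumulate_indexed(bin_idx, nbins):
    """One fancy-indexed update per image instead of a loop per pixel."""
    rows, cols = bin_idx.shape[1:]
    hist = np.zeros((rows, cols, nbins), dtype=np.int64)
    row_idx, col_idx = np.indices((rows, cols))
    for img in bin_idx:
        # each (row, col) pair appears once, so += is safe here
        hist[row_idx, col_idx, img] += 1
    return hist

rng = np.random.default_rng(0)
nbins = 20
hist_bins = np.linspace(0, 1, nbins)
images = rng.normal(loc=0.5, scale=0.05, size=(50, 8, 8))
# clip so that values outside [0, 1] still land in a valid bin
bin_idx = np.clip(np.searchsorted(hist_bins, images), 0, nbins - 1)

loop_hist = accumulate_loop(bin_idx, nbins)
indexed_hist = accumulate_indexed(bin_idx, nbins)
```

The remaining loop runs once per image rather than once per pixel, which is where most of the time was going.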
I wrote a function with this purpose:
1. to create a matplotlib figure, but not display it
2. with no frames, axes, etc.
3. to plot in the figure an input 2D array using a user-passed colormap
4. to save the colormapped 2D array from the canvas to a numpy array
5. to make the output array the same size as the input
There are lots of questions with answers for tasks similar to either points 1-2 or point 4; for me it was also important to automate point 5. So I started by combining parts from both #joe-kington's answer and #matehat's answer and the comments on it, and with small modifications I got to this:
def mk_cmapped_data(data, mpl_cmap_name):
    # This is to define figure & output dimensions from input
    r, c = data.shape
    dpi = 72
    w = round(c/dpi, 2)
    h = round(r/dpi, 2)
    # This part modified from #matehat's SO answer:
    # https://stackoverflow.com/a/8218887/1034648
    fig = plt.figure(frameon=False)
    fig.set_size_inches((w, h))
    ax = plt.Axes(fig, [0., 0., 1., 1.])
    ax.set_axis_off()
    fig.add_axes(ax)
    plt.set_cmap(mpl_cmap_name)
    ax.imshow(data, aspect='auto', cmap = mpl_cmap_name, interpolation = 'none')
    fig.canvas.draw()
    # This part is to save the canvas to numpy array
    # Adapted from Joe Kington's SO answer:
    # https://stackoverflow.com/a/7821917/1034648
    mat = np.frombuffer(fig.canvas.tostring_rgb(), dtype=np.uint8)
    mat = mat.reshape(fig.canvas.get_width_height()[::-1] + (3,))
    mat = normalise(mat) # this is just using a helper function to normalize output range
    plt.close(fig) # close this figure explicitly rather than whichever one is current
    return mat
The function does what it is supposed to do and is fast enough.
My question is whether I can make it more efficient and/or more pythonic in any way.
If you're wanting RGB output that exactly matches the shape of the input array, it's probably easiest to not create a figure, and instead use the colormap objects directly. For example:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
# Random data with a non 0-1 range.
data = 500 * np.random.random((100, 100)) - 200
# We'll use `Colormap` and `Normalize` instances directly
cmap = plt.get_cmap('viridis')
norm = plt.Normalize(data.min(), data.max())
# The norm instance scales data to a 0-1 range, cmap makes it RGBA
rgb = cmap(norm(data))
# MPL uses a 0-1 float RGBA representation, so we'll scale to 0-255
rgb = (255 * rgb).astype(np.uint8)
Image.fromarray(rgb).save('test.png')
Note that you likely don't want the additional step of saving it as a PNG, but I wanted to be able to show the result visually. This is exactly a 100x100 image where each pixel corresponds to the original input data.
This is what matplotlib does behind-the-scenes when you call imshow. The data is first run through a Normalize instance to scale it from its original range to 0-1. Then any Colormap instance can be called directly with the 0-1 results to turn the scalar data into RGB data.
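Wrapped up as a function, the approach might look like the sketch below. The name colormap_array is made up for this example, and dropping the alpha channel to get a 3-channel result is an assumption about what the caller wants (the figure-based version returned RGB).

```python
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.colors import Normalize

def colormap_array(data, cmap_name="viridis"):
    """Map a 2D array to a uint8 RGB array of the same height and width."""
    # scale the data from its own range to 0-1
    norm = Normalize(vmin=data.min(), vmax=data.max())
    # calling a colormap on 0-1 values returns float RGBA in the 0-1 range
    rgba = plt.get_cmap(cmap_name)(norm(data))
    # drop alpha (an assumption: the caller wants RGB) and scale to 0-255
    return (255 * rgba[..., :3]).astype(np.uint8)

data = 500 * np.random.random((100, 120)) - 200
rgb = colormap_array(data)
```

No figure or canvas is created, so the output shape follows the input shape directly and no dpi arithmetic is involved.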
One-letter variable names are hard to understand.
Change:
r -> n_rows
c -> n_cols
w -> width
h -> height
I have a two-dimensional histogram with 10 bins along each axis. I wish to know whether there is a numpy function (or any faster method) to obtain the points that lie in each bin of the 2D grid. Is there a way to access the bin elements?
I hope this solves your problem. However, I believe others can improve my code, because I am new to Python.
Create Histogram with matplotlib
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.RandomState(10) # deterministic random data
a = np.hstack((rng.normal(size=100), rng.normal(loc=5, scale=2, size=1000)))
n, bins, patches = plt.hist(a, bins=10) # arguments are passed to np.histogram
plt.title("Histogram with '10' bins")
plt.show()
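Reshape the arrays so that every sample is compared against every bin edge:
newbin = np.repeat(np.reshape(bins,(-1, len(bins))), a.shape, axis=0)
newa = np.repeat(np.reshape(a,(len(a),-1)),len(bins),axis=1)
#index_bin = (np.where(newbin[:,0] >np.reshape(a,(1,-1))[:,0] ) )[0][0]
# index of the first bin edge greater than each sample, i.e. its 1-based bin number
# (the largest sample equals the last edge, so it gets 0 instead)
index_bin = (newbin>newa).argmax(axis=1).T
Test:
print(a[0] , bins)
print(index_bin[0])
Output: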
1.331586504129518 [-2.13171211 -0.88255884 0.36659444 1.61574771 2.86490098 4.11405425
5.36320753 6.6123608 7.86151407 9.11066734 10.35982062]
3
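For the two-dimensional case asked about, np.digitize gives the bin index of every point without building the repeated arrays. The following is a rough sketch rather than a drop-in answer: the sample data and the helper points_in_bin are invented for illustration, and the interior edges are passed to np.digitize so that the indices run from 0 to 9 and the largest value falls into the last bin, as np.histogram2d counts it.

```python
import numpy as np

rng = np.random.default_rng(10)
x = rng.normal(size=1000)
y = rng.normal(size=1000)

counts, xedges, yedges = np.histogram2d(x, y, bins=10)

# bin index (0..9) of every point along each axis;
# using only the interior edges keeps the maximum inside the last bin
ix = np.digitize(x, xedges[1:-1])
iy = np.digitize(y, yedges[1:-1])

def points_in_bin(i, j):
    """Return the (x, y) pairs that fall into bin (i, j) (hypothetical helper)."""
    mask = (ix == i) & (iy == j)
    return np.column_stack((x[mask], y[mask]))

# rebuild the counts from the indices as a consistency check
rebuilt = np.zeros_like(counts)
np.add.at(rebuilt, (ix, iy), 1)
```

Calling points_in_bin(4, 5), for example, returns an array with one row per point in that cell; its length should equal counts[4, 5].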
I linearly interpolate data and then contour it. I do the calculations in floating point because I do not know how many decimal places the input data will have: sometimes none, sometimes one, sometimes more than 10.
Unfortunately, because of floating-point error, interpolating and then contouring identical values produces unwanted artifacts. How can I fix my code so that it does not produce contour artifacts where there should not be any?
Simple code example:
import numpy as np
from scipy.interpolate import griddata
import matplotlib.pyplot as plt
interval_in = np.linspace(1, 100, 10)
interval_out = np.linspace(1, 100, 100)
xin, yin = np.meshgrid(interval_in, interval_in)
zin = np.ones((10, 10))*10
xout, yout = np.meshgrid(interval_out, interval_out)
zout = griddata((xin.flatten(),yin.flatten()),zin.flatten(),(xout,yout),method='linear')
contours = plt.contour(xout, yout, zout, levels=[10])
plt.show()
With your example, zout should be 10 everywhere, but it actually varies between 9.9999999999999982 and 10.000000000000002, so contour tries to plot this. You can use NumPy to round to a given precision,
zout_ = np.round(zout, decimals=3)
contours = plt.contour(xout, yout, zout_, levels=[10])
plt.show()
although, if your data has a large range, contour should work correctly...
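The effect of the rounding can be seen without running the interpolation. In this small sketch the array zout is a stand-in, not real griddata output: it is a nominally constant field with perturbations of about one unit in the last place, which is an assumption about what the round-off looks like.

```python
import numpy as np

level = 10.0
rng = np.random.default_rng(0)

# stand-in for interpolated output: constant 10 plus round-off sized noise
zout = level + rng.uniform(-2e-15, 2e-15, size=(100, 100))
spread_before = zout.max() - zout.min()

# rounding to 3 decimals collapses the noise back onto the level
zout_rounded = np.round(zout, decimals=3)
spread_after = zout_rounded.max() - zout_rounded.min()
```

Before rounding the field straddles the contour level, so contour finds crossings; afterwards every value is exactly 10 and there is nothing left to trace. The number of decimals has to stay coarser than the round-off but finer than any real variation in the data.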
I have four hexbin plots which have all been normalized. How do I add them together to make one big distribution?
I have tried concatenating the input vectors and then creating the hexbin plot, but this throws off the normalization of the individual distributions:
So how do I add the individual hexbin distributions whilst still maintaining the individual normalization?
The relevant part of my code is:
def hex_plot(x,y,max_v):
    # 2.0 forces float division so that the 3-sigma bound is exp(-4.5)
    bounds = [0,max_v*m.exp(-(3**2)/2.0),max_v*m.exp(-2),max_v*m.exp(-0.5),max_v] # The sigma bounds
    norm = mpl.colors.BoundaryNorm(bounds, ncolors=4)
    hex_ = plt.hexbin(x, y, C=None, gridsize=gridsize,reduce_C_function=np.mean,cmap=cmap,mincnt=1,norm=norm)
    print("Hex plot max: ", hex_.norm.vmax)
    return hex_
gridsize=50
cmap = mpl.colors.ListedColormap(['grey','#6A92D4','#1049A9','#052C6E'])
hex_plot(x_tot,y_tot,34840)
Thank you.
I've written a bit of code that does what you're after. From the snippet in your question, it looks like you already know the height (max_v) of your distribution given your binning scheme, so I worked under that assumption. Depending on the data you're applying this to, this might not actually be the case, in which case the following will fail (it's only as good as your guess/knowledge of the height of the distributions). For the purposes of my example data, I've just taken a reasonable guess (based on a quick plot) for the values of max_v1 and max_v2. Switching the c1 and c2 I've defined for the commented versions should reproduce your original problem.
import math
import numpy as np
import matplotlib.pyplot as pyplot
import matplotlib.colors
#need to know the height of the distributions a priori
max_v1 = 850 #approximate height of distribution 1 (defined below) with binning defined below
max_v2 = 400 #approximate height of distribution 2 (defined below) with binning defined below
max_v = max(max_v1,max_v2)
#make 2 differently sized datasets (so will require different normalizations)
#all normal distributions with assorted means/variances
x1 = np.random.randn(50000)/6.0+0.5
y1 = np.random.randn(50000)/3.0+0.5
x2 = np.random.randn(100000)/2.0-0.5
y2 = np.random.randn(100000)/2.0-0.5
#c1 = np.ones(len(x1)) #I don't assign meaningful weights here
#c2 = np.ones(len(x2)) #I don't assign meaningful weights here
#float() keeps the ratio from being truncated by integer division
c1 = np.ones(len(x1))*(float(max_v)/max_v1) #highest distribution: no net change in normalization here
c2 = np.ones(len(x2))*(float(max_v)/max_v2) #renormalized to same height as highest distribution
#define plot boundaries
xmin=-2.0
xmax=2.0
ymin=-2.0
ymax=2.0
#custom colormap
cmap = matplotlib.colors.ListedColormap(['grey','#6A92D4','#1049A9','#052C6E'])
#the bounds of 1sigma, 2sigma, etc. regions (2.0 forces float division)
bounds = [0,max_v*math.exp(-(3**2)/2.0),max_v*math.exp(-2),max_v*math.exp(-0.5),max_v]
norm = matplotlib.colors.BoundaryNorm(bounds, ncolors=4)
#make the hexbin plot
normalized = pyplot
hexplot = normalized.subplot(111)
normalized.hexbin(np.concatenate((x1,x2)), np.concatenate((y1,y2)), C=np.concatenate((c1,c2)), cmap=cmap, mincnt=1, extent=(xmin,xmax,ymin,ymax),gridsize=50, reduce_C_function=np.sum, norm=norm) #combine distributions and weights
hexplot.axis([xmin,xmax,ymin,ymax])
cax = pyplot.axes([0.86, 0.1, 0.03, 0.85])
clims = cax.axis()
cb = normalized.colorbar(cax=cax)
cax.set_yticklabels([' ','3','2','1',' '])
normalized.subplots_adjust(wspace=0, hspace=0, bottom=0.1, right=0.78, top=0.95, left=0.12)
normalized.show()
Here's the result without the fix (commented c1 and c2 used),
and the result with the fix (code as-is);
Hope that helps.
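If the heights are not known in advance, they can be measured first and the weights derived from them. The sketch below illustrates that weighting step only, and it swaps in a different technique: rectangular bins via np.histogram2d instead of hexbin, so the peak values will not match what hexbin reports for the same gridsize. The helper peak_count and the variable names are invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
x1, y1 = rng.normal(0.5, 1 / 6.0, 50000), rng.normal(0.5, 1 / 3.0, 50000)
x2, y2 = rng.normal(-0.5, 0.5, 100000), rng.normal(-0.5, 0.5, 100000)
edges = np.linspace(-2.0, 2.0, 51)  # 50 rectangular bins per axis

def peak_count(x, y):
    """Highest bin count of one distribution on the shared grid (hypothetical helper)."""
    counts, _, _ = np.histogram2d(x, y, bins=(edges, edges))
    return counts.max()

peak1, peak2 = peak_count(x1, y1), peak_count(x2, y2)
target = max(peak1, peak2)

# per-point weights that bring each distribution's peak up to the common height
w1 = np.full(len(x1), target / peak1)
w2 = np.full(len(x2), target / peak2)

scaled1, _, _ = np.histogram2d(x1, y1, bins=(edges, edges), weights=w1)
scaled2, _, _ = np.histogram2d(x2, y2, bins=(edges, edges), weights=w2)
combined = scaled1 + scaled2
```

The arrays w1 and w2 play the role of c1 and c2 above: concatenated and passed as C with a summing reduce_C_function, they let each distribution contribute on its own normalization.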