Not sure if this question has been asked before; I looked through similar examples and they weren't exactly what I need to do.
I have an array of positions (shape = (8855470, 3)) in a cube, with physical coordinates between 0 and 787.5. These positions represent point masses in some space. Here's a look at the first three entries of this array:
array([[224.90635586, 720.494766  ,  19.40263367],
       [491.25279546,  41.26026654,   7.35436416],
       [407.70436788, 340.32618713, 328.88192913]])
I want to split this giant cube into a number of smaller cubes. For example, if I wanted to split it on each side into 10 cubes, making 1,000 subcubes total, then each subcube would contain only the points that have positions within that subcube. I have been experimenting with np.meshgrid to create the 3D grid necessary to conditionally apportion the appropriate entries of the positions array to subcubes:
split = np.arange(0.,(787.5+787.5/10.),step=787.5/10.)
xg,yg,zg = np.meshgrid(split,split,split,indexing='ij')
But I'm not sure if this is the way to go about this.
Let me know if this question is too vague or if you need any additional information.
For the sake of the problem I'll work with toy data. I think you're close with the meshgrid. Here's a proposal:
Create the grid, but with points only up to 787.5 (not included), with values as you did in arange.
Then reshape to 1-D arrays, and zip over them to get masks with the cube shape.
Create a list to save all the subcube points.
import numpy as np

data = np.random.randint(0, 787, (10000, 3))

start = 0
end = 787.5
step = (end - start) / 10

split = np.arange(start, end, step)
xg, yg, zg = np.meshgrid(split, split, split, indexing='ij')
xg = xg.reshape(-1)
yg = yg.reshape(-1)
zg = zg.reshape(-1)

subcube_data = []
for x, y, z in zip(xg, yg, zg):
    mask_x = (x <= data[:, 0]) & (data[:, 0] < x + step)  # data_x between start and end for this subcube
    mask_y = (y <= data[:, 1]) & (data[:, 1] < y + step)  # data_y between start and end for this subcube
    mask_z = (z <= data[:, 2]) & (data[:, 2] < z + step)  # data_z between start and end for this subcube
    mask = mask_x & mask_y & mask_z
    subcube_data.append(data[mask])
Now you will have a list with 1,000 elements, where each element is a subcube containing an Nx3 point list. If you want to recover the lower-corner coordinates corresponding to every subcube_data[i], you can just use [xg[i], yg[i], zg[i]].
Finally, you can plot some of the subcubes together with the rest of the data:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# plot data as a 3d scatter, remaining points in black
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# plot subcubes 0 1 2 3 4 in colors
for i in range(5):
    ax.scatter(subcube_data[i][:, 0],
               subcube_data[i][:, 1],
               subcube_data[i][:, 2], marker='o', s=2)
for i in range(5, len(subcube_data)):
    ax.scatter(subcube_data[i][:, 0],
               subcube_data[i][:, 1],
               subcube_data[i][:, 2], marker='o', s=1, color='black')
plt.show()
Related
I'm very new to Python; I usually do my animations with AfterEffects, but it requires a lot of computation time for quite simple things.
• So I would like to create this kind of animation (or at least an image):
[AfterEffects graph] (forget the shadows, I don't really need them at this point)
Those are circles merging together as they collide, one of them being highlighted (the orange one).
• For now I have only managed to do the "merging thing" by computing a "distance map" and plotting a contour line:
[Python + Matplotlib graph] produced with the following code:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

part_size = 0.0002
nb_part = 200
mesh_res = 500  # resolution of grid

x = np.linspace(0, 1.9, mesh_res)
y = np.linspace(0, 1, mesh_res)
Xgrid, Ygrid = np.meshgrid(x, y)

centers = np.random.uniform(0, 1, (nb_part, 2))  # array filled with disk center positions
sizes = part_size * np.ones(nb_part)             # array filled with disk sizes
# sizes = np.random.uniform(0, part_size, nb_part)

dist_map = np.zeros((mesh_res, mesh_res), float)  # array to plot the contour of
for i in range(nb_part):
    # function with (almost) value of 1 when on a circle, so we want the contour of this array
    dist_map += sizes[i] / ((Xgrid - centers[i][0]) ** 2 + (Ygrid - centers[i][1]) ** 2)

fig, ax = plt.subplots()
# to plot only the one-ish values of the contour; note the keyword is `colors`, not `color`
contour_opts = {'levels': np.linspace(0.9, 1., 1), 'colors': 'red', 'linewidths': 4}
ax.contour(x, y, dist_map, **contour_opts)

def update(frame_number):
    for coll in list(ax.collections):  # reset the graph (ax.collections is read-only in recent Matplotlib)
        coll.remove()
    # just to move circles "randomly"
    centers[:] += 0.01 * np.sin(2 * np.pi * frame_number / 100
                                + np.stack((np.arange(nb_part), np.arange(nb_part)), axis=-1))
    dist_map = np.zeros((mesh_res, mesh_res), float)  # updating array of distances
    for i in range(nb_part):
        dist_map += sizes[i] / ((Xgrid - centers[i][0]) ** 2 + (Ygrid - centers[i][1]) ** 2)
    ax.contour(x, y, dist_map, **contour_opts)  # calculate the new contour

ani = FuncAnimation(fig, update, interval=20)
plt.show()
The result is not that bad, but:
• I can't figure out how to highlight just one circle while keeping the merging effect (ideally the colors should merge as well, and I would like to keep the image transparency when exported).
• It still requires some time to compute each frame (it is way faster than AfterEffects though), so I guess I'm still far from using Python, NumPy, and Matplotlib optimally. Maybe there are even libraries able to do that kind of thing? So if there is a better strategy to implement it, I'll take it.
Using the small reproducible example below, I create a mask for which I would then like to programmatically determine the min and max x and y indices of where the mask is False (i.e., where the values are not masked). In this and the larger real-world example, the masked values will always be spatially continuous; there are no 'islands' in the mask. The goal is to use the programmatically determined indices to zoom in on the non-masked values with imshow. I attempt to depict what I'm seeking in the image at the end of the post.
import numpy as np
import matplotlib.pyplot as plt
# Generate a large array
arr1 = np.random.rand(100,100)
# Generate a smaller array that will help
# set the mask used below
arr2 = np.random.rand(20,10) + 1
# Insert the smaller array into the larger
# array for demonstration purposes
arr1[60:80,10:20] = arr2
# boost a few values neighboring the inserted array for demonstration purposes
arr1[59,12] += 2
arr1[70:75,20] += 2
arr1[80,13:16] += 2
arr1[64:72,9] += 2
# For demonstration, plot arr1
fig, ax = plt.subplots(figsize=(20, 15))
im = ax.imshow(arr1)
plt.show()
# Generate a mask with an example condition
mask = arr1 < 1
Using the mask, how does one determine what the values of x_min, x_max, y_min, & y_max in the following line of code should be
im = ax.imshow(arr1[y_min:y_max, x_min:x_max])
such that the imshow would be zoomed in to where the red box is on the following figure? As long as I don't have my wires crossed, I think the answer for this small example would be y_min=59, y_max=80, x_min=9, & x_max=20
The following code should work:
y, x = np.where(~mask) # ~ negates the boolean array
x_min = x.min()
x_max = x.max()
y_min = y.min()
y_max = y.max()
plt.imshow(arr1[y_min:y_max+1, x_min:x_max+1])
I would like to plot in 3D with Pandas / Matplotlib (wireframe or other, I do not care), but in a specific way.
I'm using RFID sensors and I'm trying to record the signal I receive at different distances and different angles. I want to see the correlation between increasing distance and the angle.
So that's why I want to plot in 3D:
X axis -> the distance, Y axis -> the angle, Z axis -> the signal received, which is a float
My CSV file, from which I generate my DataFrame, is organized as a double-entry table:
Distance;0;23;45;90;120;180
0;-53.145;-53.08;-53.1;-53.035;-53.035;-53.035
5;-53.145;-53.145;-53.05;-53.145;-53.145;-53.145
15;-53.145;-53.145;-53.145;-53.145;-53.145;-53.145
25;-53.145;-52.145;-53.145;-53.002;-53.145;-53.145
40;-53.145;-53.002;-51.145;-53.145;-54.255;-53.145
60;-53.145;-53.145;-53.145;-53.145;-53.145;-53.145
80;-53.145;-53.145;-53.145;-53.145;-60;-53.145
100;-53.145;-52;-53.145;-54;-53.145;-53.145
120;-53.145;-53.145;-53.145;-53.145;-53.002;-53.145
140;-51.754;-53.145;-51.845;-53.145;-53.145;-53.145
160;-53.145;-53.145;-49;-53.145;-53.145;-53.145
180;-53.145;-53.145;-53.145;-53.145;-53.145;-53.002
200;-53.145;-53.145;-53.145;-53.145;-53.145;-53.145
On the first label row we have the different angles: 0°, 23°, 45°, ...
And the index of the DataFrame is the distance: 0 cm, 15 cm, ...
And the matrix inside represents the signal, i.e., the values for the Z axis...
But I do not know how to generate a 3D scatter, wireframe, etc., because in every tutorial I see, people use specific columns as axes.
Indeed, in my CSV file the first row holds the labels of all the columns:
Distance;0 ;23 ;45 ;90 ;120;180
And I do not know how to generate a 3D plot from such a double-entry table.
Do you know how to do it? Or how to generate my CSV file in a better way to get the same result in the end?
I would be grateful if you would help me with this!
Thank you!
Maybe a contour plot is enough:
import numpy as np
import matplotlib.pyplot as plt

b = np.array([0,5,15,25,40,60,80,100,120,140,160,180,200])
a = np.array([0,23,45,90,120,180])

x, y = np.meshgrid(a, b)
z = np.random.randint(-50, -40, x.shape)

scm = plt.contourf(x, y, z, cmap='inferno')
plt.colorbar(scm)
plt.xticks(a)
plt.yticks(b)
plt.xlabel('Angle')     # x was built from the angles `a`
plt.ylabel('Distance')  # y was built from the distances `b`
plt.show()
which displays: [filled contour plot of the random example data]
You can get a contour plot with something like this (though for the data shown it is not very interesting, since almost all the values are constant at about -53):
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv', sep=';')  # whatever your CSV file is called
df = df.set_index('Distance')
x = df.index
y = df.columns.astype(int)
z = df.values
X,Y = np.meshgrid(x,y)
Z = z.T
plt.contourf(X,Y,Z,cmap='jet')
plt.colorbar()
plt.show()
Welcome to Stack Overflow! Your question can be split into several steps:
Step 1 - read the data
I have stored your data in a file called data.txt.
I don't know Pandas very well, but this can also be handled with Numpy's nice, simple function loadtxt. Your data is a bit problematic because of the text 'Distance' value in the first column and first row, but don't panic: we load the file as a matrix of strings:
raw_data = np.loadtxt('data.txt', delimiter=';', dtype=str)  # dtype=str; np.string_ was removed in NumPy 2.0
Step 2 - transform the raw data
To extract the wanted data from the raw data we can do the following:
angle = raw_data[0 , 1:].astype(float)
distance = raw_data[1:, 0 ].astype(float)
data = raw_data[1:, 1:].astype(float)
By indexing the raw data we select the parts that we want, and with astype we convert the string values to numbers.
Intermediate step - making the data a bit fancier
Your data was a bit boring, almost only the value -53, so I took the liberty of making it a bit fancier:
data = (50 + angle[np.newaxis,:]) / (10 + np.sqrt(distance[:,np.newaxis]))
Step 4 - make a wireframe plot
The example at matplotlib.org looks easy enough:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_wireframe(X, Y, Z)
plt.show()
But the trick is to get the X, Y, Z parameters right...
Step 3 - make the X and Y data
The Z data is simply our data values:
Z = data
The X and Y should also be 2D arrays, such that plot_wireframe can find the x and y for each value of Z at the same locations in the 2D arrays X and Y. There is a Numpy function to create these 2D arrays:
X, Y = np.meshgrid(angle, distance)
Step 5 - fancying it up a bit
ax.set_xticks(angle)
ax.set_yticks(distance[::2])
ax.set_xlabel('angle')
ax.set_ylabel('distance')
Putting it together
All steps together in the right order:
# necessary includes...
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
import numpy as np
raw_data = np.loadtxt('data.txt', delimiter=';', dtype=str)
angle = raw_data[0 , 1:].astype(float)
distance = raw_data[1:, 0 ].astype(float)
data = raw_data[1:, 1:].astype(float)
# make the example data a bit more interesting...
data = (50 + angle[np.newaxis,:]) / (10 + np.sqrt(distance[:,np.newaxis]))
# setting up the plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# the tricky part: creating the data that plot_wireframe wants
Z = data
X, Y = np.meshgrid(angle, distance)
ax.plot_wireframe(X, Y, Z)
# fancying it up a bit
ax.set_xticks(angle)
ax.set_yticks(distance[::2])
ax.set_xlabel('angle')
ax.set_ylabel('distance')
# and showing the plot ...
plt.show()
Is there any way to increase the number of arrowheads on a matplotlib streamplot? Right now it appears as if there is only one arrowhead per streamline, which is a problem if I want to change the x/y axes limits to zoom in on the data.
Building on @Richard_wth's answer, I wrote a function to provide control over the location of the arrows on a streamplot. One can choose n arrows per streamline, or choose to have the arrows equally spaced along a streamline.
First, you do a normal streamplot until you are happy with the location and number of streamlines, and you keep the returned object sp. For instance:
sp = ax.streamplot(x,y,u,v,arrowstyle='-',density=10)
What's important here is to have arrowstyle='-' so that arrows are not displayed.
Then, you can call the function streamQuiver (provided below) to control the arrows on each streamline. If you want 3 arrows per streamline:
streamQuiver(ax, sp, n=3, ...)
If you want an arrow every 1.5 units of curvilinear length:
streamQuiver(ax, sp, spacing=1.5, ...)
where ... are options that would be passed to quiver.
The function streamQuiver is probably not fully bulletproof and may need some additional handling for particular cases. It relies on 4 subfunctions:
curve_coord to get the curvilinear length along a path
curve_extract to extract equidistant points along a path
seg_to_lines to convert the segments from streamplot into continuous lines. There might be a better way to do that!
lines_to_arrows: this is the main function that extracts arrows on each line
Here's an example where the arrows are at equidistant points on each streamline.
import numpy as np
import matplotlib.pyplot as plt
def streamQuiver(ax, sp, *args, spacing=None, n=5, **kwargs):
    """ Plot arrows from streamplot data
    The number of arrows per streamline is controlled either by `spacing` or by `n`.
    See `lines_to_arrows`.
    """
    def curve_coord(line=None):
        """ return curvilinear coordinate """
        x = line[:, 0]
        y = line[:, 1]
        s = np.zeros(x.shape)
        s[1:] = np.sqrt((x[1:] - x[0:-1])**2 + (y[1:] - y[0:-1])**2)
        s = np.cumsum(s)
        return s

    def curve_extract(line, spacing, offset=None):
        """ Extract points at equidistant space along a curve """
        x = line[:, 0]
        y = line[:, 1]
        if offset is None:
            offset = spacing / 2
        # Computing curvilinear length
        s = curve_coord(line)
        offset = np.mod(offset, s[-1])  # making sure we always get one point
        # New (equidistant) curvilinear coordinate
        sExtract = np.arange(offset, s[-1], spacing)
        # Interpolating based on new curvilinear coordinate
        xx = np.interp(sExtract, s, x)
        yy = np.interp(sExtract, s, y)
        return np.array([xx, yy]).T

    def seg_to_lines(seg):
        """ Convert a list of segments to a list of lines """
        def extract_continuous(i):
            x = []
            y = []
            # Special case, we have only 1 segment remaining:
            if i == len(seg) - 1:
                x.append(seg[i][0, 0])
                y.append(seg[i][0, 1])
                x.append(seg[i][1, 0])
                y.append(seg[i][1, 1])
                return i, x, y
            # Looping on continuous segments
            while i < len(seg) - 1:
                # Adding our start point
                x.append(seg[i][0, 0])
                y.append(seg[i][0, 1])
                # Checking whether next segment continues our line
                Continuous = all(seg[i][1, :] == seg[i+1][0, :])
                if not Continuous:
                    # We add our end point then
                    x.append(seg[i][1, 0])
                    y.append(seg[i][1, 1])
                    break
                elif i == len(seg) - 2:
                    # we add the last segment
                    x.append(seg[i+1][0, 0])
                    y.append(seg[i+1][0, 1])
                    x.append(seg[i+1][1, 0])
                    y.append(seg[i+1][1, 1])
                i = i + 1
            return i, x, y

        lines = []
        i = 0
        while i < len(seg):
            iEnd, x, y = extract_continuous(i)
            lines.append(np.array([x, y]).T)
            i = iEnd + 1
        return lines

    def lines_to_arrows(lines, n=5, spacing=None, normalize=True):
        """ Extract "streamline" arrows from a set of lines
        Either: `n` arrows per line
        or an arrow every `spacing` distance
        If `normalize` is true, the arrows have a unit length
        """
        if spacing is None:
            # if n is provided we estimate the spacing based on each curve length
            spacing = [curve_coord(l)[-1] / n for l in lines]
        try:
            len(spacing)
        except TypeError:
            spacing = [spacing] * len(lines)
        lines_s = [curve_extract(l, spacing=sp, offset=sp/2) for l, sp in zip(lines, spacing)]
        lines_e = [curve_extract(l, spacing=sp, offset=sp/2 + 0.01*sp) for l, sp in zip(lines, spacing)]
        arrow_x = [l[i, 0] for l in lines_s for i in range(len(l))]
        arrow_y = [l[i, 1] for l in lines_s for i in range(len(l))]
        arrow_dx = [le[i, 0] - ls[i, 0] for ls, le in zip(lines_s, lines_e) for i in range(len(ls))]
        arrow_dy = [le[i, 1] - ls[i, 1] for ls, le in zip(lines_s, lines_e) for i in range(len(ls))]
        if normalize:
            dn = [np.sqrt(ddx**2 + ddy**2) for ddx, ddy in zip(arrow_dx, arrow_dy)]
            arrow_dx = [ddx / ddn for ddx, ddn in zip(arrow_dx, dn)]
            arrow_dy = [ddy / ddn for ddy, ddn in zip(arrow_dy, dn)]
        return arrow_x, arrow_y, arrow_dx, arrow_dy

    # --- Main body of streamQuiver
    # Extracting lines
    seg = sp.lines.get_segments()  # list of (2, 2) numpy arrays
    lines = seg_to_lines(seg)      # list of (N, 2) numpy arrays
    # Convert lines to arrows
    ar_x, ar_y, ar_dx, ar_dy = lines_to_arrows(lines, spacing=spacing, n=n, normalize=True)
    # Plot arrows
    qv = ax.quiver(ar_x, ar_y, ar_dx, ar_dy, *args, angles='xy', **kwargs)
    return qv
# --- Example
x = np.linspace(-1,1,100)
y = np.linspace(-1,1,100)
X,Y=np.meshgrid(x,y)
u = -np.sin(np.arctan2(Y,X))
v = np.cos(np.arctan2(Y,X))
xseed=np.linspace(0.1,1,4)
fig=plt.figure()
ax=fig.add_subplot(111)
sp = ax.streamplot(x,y,u,v,color='k',arrowstyle='-',start_points=np.array([xseed,xseed*0]).T,density=30)
qv = streamQuiver(ax,sp,spacing=0.5, scale=60)
plt.show()
I'm not sure about just increasing the number of arrowheads, but you can increase the density of streamlines with the density parameter of the streamplot function; here's the documentation:
*density* : float or 2-tuple
Controls the closeness of streamlines. When `density = 1`, the domain
is divided into a 30x30 grid---*density* linearly scales this grid.
Each cell in the grid can have, at most, one traversing streamline.
For different densities in each direction, use [density_x, density_y].
Here is an example:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(0,20,1)
y = np.arange(0,20,1)
u=np.random.random((x.shape[0], y.shape[0]))
v=np.random.random((x.shape[0], y.shape[0]))
fig, ax = plt.subplots(2,2)
ax[0,0].streamplot(x,y,u,v,density=1)
ax[0,0].set_title('Original')
ax[0,1].streamplot(x,y,u,v,density=4)
ax[0,1].set_xlim(5,10)
ax[0,1].set_ylim(5,10)
ax[0,1].set_title('Zoomed, higher density')
ax[1,1].streamplot(x,y,u,v,density=1)
ax[1,1].set_xlim(5,10)
ax[1,1].set_ylim(5,10)
ax[1,1].set_title('Zoomed, same density')
ax[1,0].streamplot(x,y,u,v,density=4)
ax[1,0].set_title('Original, higher density')
fig.show()
I have found a way to customize the number of arrowheads on a streamplot.
The idea is to plot the streamlines and the arrows separately:
plt.streamplot returns a stream_container with two attributes: lines and arrows. lines contains the line segments, which can be used to reconstruct the streamlines without arrows.
plt.quiver can be used to plot gradient fields. With proper scaling, the length of the arrows is negligible, leaving only the arrowheads.
Thus, we only need to define the positions of the arrows from the line segments and pass them to plt.quiver.
Here is a toy example:
import matplotlib.pyplot as plt
from matplotlib import collections as mc
import numpy as np
# get line segments
# (x, y, u, v, and start_points are assumed to be defined as for a normal streamplot)
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
sp = ax.streamplot(x, y, u, v, start_points=start_points, density=10)
seg = sp.lines.get_segments()  # seg is a list of (2, 2) numpy arrays
lc = mc.LineCollection(seg, ...)
# define arrows
# here I define one arrow every 50 segments
# you could also select segs based on some criterion, e.g. intersect with certain lines
period = 50
arrow_x = np.array([seg[i][0, 0] for i in range(0, len(seg), period)])
arrow_y = np.array([seg[i][0, 1] for i in range(0, len(seg), period)])
arrow_dx = np.array([seg[i][1, 0] - seg[i][0, 0] for i in range(0, len(seg), period)])
arrow_dy = np.array([seg[i][1, 1] - seg[i][0, 1] for i in range(0, len(seg), period)])
# plot the final streamline
fig = plt.figure(figsize=(12.8, 10.8))
ax = fig.add_subplot(1, 1, 1)
ax.add_collection(lc)
ax.autoscale()
ax.quiver(
    arrow_x, arrow_y, arrow_dx, arrow_dy, angles='xy',       # arrow position
    scale=0.2, scale_units='inches', units='y', minshaft=0,  # arrow scaling
    headwidth=6, headlength=10, headaxislength=9)            # arrow style
fig.show()
There is more than one way to scale the arrows so that they appear to have zero length.
I use matplotlib's method hexbin to compute 2d histograms on my data.
But I would like to get the coordinates of the centers of the hexagons in order to further process the results.
I got the values using the get_array() method on the result, but I cannot figure out how to get the bin coordinates.
I tried to compute them from the number of bins and the extent of my data, but I don't know the exact number of bins in each direction. gridsize=(10,2) should do the trick, but it does not seem to work.
Any idea?
I think this works.
import numpy as np
import matplotlib.pyplot as plt

def generate_data(n):
    """Make random, correlated x & y arrays"""
    points = np.random.multivariate_normal(mean=(0, 0),
                                           cov=[[0.4, 9], [9, 10]], size=int(n))
    return points

if __name__ == '__main__':
    color_map = plt.cm.Spectral_r
    n = 1e4
    points = generate_data(n)
    xbnds = np.array([-20.0, 20.0])
    ybnds = np.array([-20.0, 20.0])
    extent = [xbnds[0], xbnds[1], ybnds[0], ybnds[1]]

    fig = plt.figure(figsize=(10, 9))
    ax = fig.add_subplot(111)
    x, y = points.T

    # Set gridsize just to make them visually large
    image = plt.hexbin(x, y, cmap=color_map, gridsize=20, extent=extent, mincnt=1, bins='log')
    # Note that mincnt=1 adds 1 to each count
    counts = image.get_array()
    ncnts = np.count_nonzero(np.power(10, counts))
    verts = image.get_offsets()
    for offc in range(verts.shape[0]):
        binx, biny = verts[offc][0], verts[offc][1]
        if counts[offc]:
            plt.plot(binx, biny, 'k.', zorder=100)

    ax.set_xlim(xbnds)
    ax.set_ylim(ybnds)
    plt.grid(True)
    cb = plt.colorbar(image, spacing='uniform', extend='max')
    plt.show()
I would love to confirm that the code by Hooked using get_offsets() works, but I tried several iterations of the code mentioned above to retrieve the center positions and, as Dave mentioned, get_offsets() remains empty. The workaround that I found is to use the non-empty image.get_paths() option. My code takes the mean to find the centers, which makes it just a smidge longer, but it does work.
The get_paths() option returns a set of embedded x,y coordinates that can be looped over and then averaged to return the center position of each hexagon.
The code that I have is as follows:
counts = image.get_array()   # counts in each hexagon, works great
verts = image.get_offsets()  # empty, don't use this
b = image.get_paths()        # this does work: a list of Path objects that can be plotted
for x in range(len(b)):
    xav = np.mean(b[x].vertices[0:6, 0])  # center in x (RA)
    yav = np.mean(b[x].vertices[0:6, 1])  # center in y (DEC)
    plt.plot(xav, yav, 'k.', zorder=100)
I had this same problem. I think what needs to be developed is a framework to have a HexagonalGrid object which can then be applied to many different data sets (and it would be awesome to do it for N dimensions). This is possible, and it surprises me that neither Scipy nor Numpy has anything for it (furthermore there seems to be nothing else like it, except perhaps binify).
That said, I assume you want to use hexbinning to compare multiple binned data sets. This requires some common base. I got this to work using matplotlib's hexbin the following way:
import numpy as np
import matplotlib.pyplot as plt

def get_data(mean, cov, n=1e3):
    """
    Quick fake data builder
    """
    np.random.seed(101)
    points = np.random.multivariate_normal(mean=mean, cov=cov, size=int(n))
    x, y = points.T
    return x, y

def get_centers(hexbin_output):
    """
    about 40% faster than the previous post, only because you're not
    calculating the min/max every time
    """
    paths = hexbin_output.get_paths()
    v = paths[0].vertices[:-1]  # adds a value [0,0] to the end
    vx, vy = v.T
    idx = [3, 0, 5, 2]  # index for [xmin,xmax,ymin,ymax]
    xmin, xmax, ymin, ymax = vx[idx[0]], vx[idx[1]], vy[idx[2]], vy[idx[3]]
    half_width_x = abs(xmax - xmin) / 2.0
    half_width_y = abs(ymax - ymin) / 2.0
    centers = []
    for i in range(len(paths)):
        cx = paths[i].vertices[idx[0], 0] + half_width_x
        cy = paths[i].vertices[idx[2], 1] + half_width_y
        centers.append((cx, cy))
    return np.asarray(centers)

# important parts ==>
class Hexagonal2DGrid(object):
    """
    Used to fix the gridsize, extent, and bins
    """
    def __init__(self, gridsize, extent, bins=None):
        self.gridsize = gridsize
        self.extent = extent
        self.bins = bins

def hexbin(x, y, hexgrid):
    """
    To hexagonally bin the data in 2 dimensions
    """
    fig = plt.figure()
    ax = fig.add_subplot(111)
    # Note mincnt=0 so that it will return a value for every point in the
    # hexgrid, not just those with count > mincnt
    # Basically you fix the gridsize, extent, and bins to keep them the same,
    # then the resulting count array is the same
    hexbin = plt.hexbin(x, y, mincnt=0,
                        gridsize=hexgrid.gridsize,
                        extent=hexgrid.extent,
                        bins=hexgrid.bins)
    # you could close the figure if you don't want it
    # plt.close(fig.number)
    counts = hexbin.get_array().copy()
    return counts, hexbin

# Example ===>
if __name__ == "__main__":
    hexgrid = Hexagonal2DGrid((21, 5), [-70, 70, -20, 20])
    x_data, y_data = get_data((0, 0), [[-40, 95], [90, 10]])
    x_model, y_model = get_data((0, 10), [[100, 30], [3, 30]])

    counts_data, hexbin_data = hexbin(x_data, y_data, hexgrid)
    counts_model, hexbin_model = hexbin(x_model, y_model, hexgrid)

    # if you want the centers, they will be the same for both
    centers = get_centers(hexbin_data)

    # if you want to ignore the cells with zeros then use the following mask.
    # But if you want zeros for some bins and not others, I'm not sure of an
    # elegant way to do this without using the centers
    nonzero = counts_data != 0

    # now you can compare the two data sets
    variance_data = counts_data[nonzero]
    square_diffs = (counts_data[nonzero] - counts_model[nonzero])**2
    chi2 = np.sum(square_diffs / variance_data)
    print(" chi2={}".format(chi2))