I am creating a charge smear function. I have a matrix where each row is a particle with a charge and a position. I then look at each particle's position in a grid to count how many particles are in each grid cell, but I also need to know which cell each particle is in, so that I can find the average position of all particles in a specific grid cell. My idea for a fix is to create a list where the number of rows is the number of grid cells in my matrix and the columns are the positions in the x, y and z directions, but obviously I can't append more than one number to each index. Maybe some variation will work? Sorry for the open-ended question. Thank you in advance.
import matplotlib.pyplot as plt
import random
import numpy as np
###Initialize particle lists
particle_arrayX=[]
particle_arrayY=[]
###The resolution
N = 10
###Number of particles
M = 1000
col=None
row=None
###Size of box
Box_size=100
###gridsize
Grid_size=Box_size/N
###Initialize particles
for i in range(M):
    particle_arrayX.append(random.random()*Box_size)
    particle_arrayY.append(random.random()*Box_size)
###Initialize matrix (a NumPy array, so the rows are independent of each other)
ParticleMatrix_Inital=np.zeros((N,N),dtype=int)
###Measure density in each cell
for i in range(M):
    col=None
    row=None
    #The x and y components are divided by the grid size
    #Then they are converted to integers and assigned to either a row or a column
    #If the value is a float with decimal 0, e.g. 2.0, then 1 is subtracted before converting to int
    coln=particle_arrayX[i]/Grid_size
    rown=particle_arrayY[i]/Grid_size
    if coln.is_integer():
        col=int(coln)-1
    else:
        col=int(coln)
    if rown.is_integer():
        row=int(rown)-1
    else:
        row=int(rown)
    ParticleMatrix_Inital[row,col]+=1
#Plot matrix
plt.imshow(ParticleMatrix_Inital)
plt.colorbar()
plt.show()
Welcome to SO!
There are many ways to approach the problem of binning empirical data. I'm proposing an object-oriented (OO) solution below, because (in my subjective opinion) it provides clean, tidy and highly readable code. On the other hand, OO solutions might not be the most efficient if you're simulating huge many-particle systems. If the code below doesn't entirely solve your issues, I still hope that parts of it can be of some help to you.
That being said, I propose implementing your grid as a class. To make life easier for ourselves, we may apply the convention that all particles have positive coordinates. That is, x, y and even z (if introduced) stretch from 0 to whatever box_size you define. However, the class Grid does not really care about the actual box_size, only the resolution of the grid!
class Grid:
    def __init__(self, _delta_x, _delta_y):
        self.delta_x = _delta_x
        self.delta_y = _delta_y

    def bounding_cell(self, x, y):
        # Note that y-coordinates correspond to matrix rows,
        # and that x-coordinates correspond to matrix columns
        return (int(y/self.delta_y), int(x/self.delta_x))
Yes, this could have been a simple function. However, as a class it is easily expandable. Also, a function would either have to rely on global variables (yuck!) or be given the grid spacing (delta) in each dimension explicitly every time it determines which matrix cell (or bin) a given coordinate (x,y) belongs to.
Now, how does it work? Imagine the simplest of cases, where your grid resolution is 1. Then, a particle at position (x,y) = (1.2, 4.9) should be placed in the matrix at (row,col) = (4,1). That is, row = int(y/delta_y), and likewise col = int(x/delta_x). The higher the resolution (the smaller the delta), the larger the matrix gets in terms of number of rows and columns.
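To make the index arithmetic concrete, here is a minimal sketch of the same mapping as a standalone function. The name cell_index and the clamping of particles that sit exactly on the upper edge of the box are my own assumptions for illustration; the class above does not clamp.

```python
def cell_index(x, y, delta_x, delta_y, n_rows, n_cols):
    """Return (row, col) of the grid cell containing (x, y).

    Hypothetical helper: y maps to rows and x to columns, and a
    coordinate exactly on the upper edge is clamped into the last cell.
    """
    row = min(int(y / delta_y), n_rows - 1)
    col = min(int(x / delta_x), n_cols - 1)
    return row, col


# Resolution 1: the particle at (1.2, 4.9) lands in row 4, column 1.
print(cell_index(1.2, 4.9, 1.0, 1.0, 10, 10))       # (4, 1)
# Cell size 10 in a 100 x 100 box: x = 100 is clamped into column 9.
print(cell_index(100.0, 37.0, 10.0, 10.0, 10, 10))  # (3, 9)
```

Without the clamp, a particle at exactly x = box_size would produce an index one past the end of the matrix.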
Now that we have a Grid, let us also object-orient the Particle! Rather straightforward:
class Particle:
    def __init__(self, _id, _charge, _pos_x, _pos_y):
        self.id = _id
        self.charge = _charge
        self.pos_x = _pos_x
        self.pos_y = _pos_y

    def __str__(self):
        # implementing the __str__ method lets us 'print(a_particle)'
        return "{%d, %d, %3.1f, %3.1f}" % (self.id, self.charge, self.pos_x, self.pos_y)

    def get_position(self):
        return self.pos_x, self.pos_y

    def get_charge(self):
        return self.charge
This class is more or less just a collection of data, and could easily have been replaced by a dict. However, the class screams its intent clearly, it is clean and tidy, and also easily expanded.
Now, let's create some instances of particles! Here is a function which by list comprehension creates a list of particles with an id, charge and position (x,y):
import random
def create_n_particles(n_particles, max_pos):
    return [Particle(id,                        # unique ID
                     random.randint(-1,1),      # charge [-1, 0, 1]
                     random.random()*max_pos,   # x coord
                     random.random()*max_pos)   # y coord
            for id in range(n_particles)]
And finally, we get to the fun part: putting it all together:
import numpy as np
if __name__ == "__main__":
    n_particles = 1000
    box_size = 100
    grid_resolution = 10
    grid_size = int(box_size / grid_resolution)
    grid = Grid(grid_resolution, grid_resolution)
    particles = create_n_particles(n_particles, box_size)
    charge_matrix = np.zeros((grid_size, grid_size))
    id_matrix = [[ [] for i in range(grid_size)] for j in range(grid_size)]
    for particle in particles:
        x, y = particle.get_position()
        row, col = grid.bounding_cell(x, y)
        charge_matrix[row][col] += particle.get_charge()
        # The ID-matrix is similar to the charge-matrix,
        # but every cell contains a new list of particle IDs
        id_matrix[row][col].append(particle.id)
Notice the initialization of the ID-matrix: This is the list of particle positions for each grid cell that you asked for. It is a matrix, representing the particle container, and each cell contains a list to be filled with particle IDs. You could also populate these lists with entire particle instances (not just their IDs): id_matrix[row][col].append(particle).
The last for loop does the real work, and here the object-oriented strategy shows us how charming it is: the loop is short, and it is very easy to read and understand what is going on. A cell in the charge_matrix contains the total charge within this grid cell/bin. Meanwhile, the id_matrix is filled with the IDs of the particles that are contained within this grid cell/bin.
From the way we've constructed the list of particles, particles, we see that a particle's ID is equivalent to that particle's index in the list. Hence, they may be retrieved like this:
for i,row in enumerate(id_matrix):
    for j,col in enumerate(row):
        print("[%d][%d] : " % (i, j), end="")
        for particle_id in id_matrix[i][j]:
            p = particles[particle_id]
            # do some work with 'p', or just print it:
            print(p, end=", ")
        print("") # print new line
# Output:
# [0][0] : {32, -1, 0.2, 0.4}, ... <-- all data of that particle
# ....
I leave optimization of this retrieval to you as I don't really know what data you need and what you're going to do with it. Maybe it's better to contain all the particles in a dict instead of a list; I don't know(?). You choose!
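Since the original question was about the average position of the particles in each cell, here is one possible sketch of that step. It is independent of the classes above and works on plain (x, y) tuples; the function name mean_position_per_cell and the dictionary-of-lists layout are assumptions chosen for illustration, not the only way to do it.

```python
from collections import defaultdict


def mean_position_per_cell(positions, cell_size):
    """Group (x, y) positions by grid cell and average them.

    Returns a dict mapping (row, col) -> (mean_x, mean_y).
    Cells without particles do not appear in the result.
    """
    cells = defaultdict(list)
    for x, y in positions:
        # y selects the row and x the column, as in bounding_cell
        row, col = int(y / cell_size), int(x / cell_size)
        cells[(row, col)].append((x, y))

    means = {}
    for key, members in cells.items():
        n = len(members)
        means[key] = (sum(p[0] for p in members) / n,
                      sum(p[1] for p in members) / n)
    return means


positions = [(1.0, 1.0), (3.0, 5.0), (12.0, 4.0)]
print(mean_position_per_cell(positions, cell_size=10))
# {(0, 0): (2.0, 3.0), (0, 1): (12.0, 4.0)}
```

A dictionary keyed by (row, col) avoids the "one number per index" limitation from the question, because each cell simply owns a list that can grow to any length.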
At the very end, I'll suggest that you use matshow, which is intended for displaying matrices, as opposed to imshow, which is aimed more at images.
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(charge_matrix)
fig.colorbar(cax)
ax.invert_yaxis() # flip the matrix such that the y-axis points upwards
fig.savefig("charge_matrix.png")
We can also scatter plot the particles and add grid lines corresponding to the grid in the matshow above. We color the scatter points such that negative charges are blue, neutral are gray and positive are red.
def charge_color(charge):
    if charge > 0: return 'red'
    elif charge < 0: return 'blue'
    else: return 'gray'
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_aspect('equal')
ax.set_xticks(np.arange(0, 101, grid_resolution))
ax.set_yticks(np.arange(0, 101, grid_resolution))
ax.grid()
ax.scatter([p.pos_x for p in particles],
[p.pos_y for p in particles],
c=[charge_color(p.get_charge()) for p in particles])
fig.savefig("particle_dist.png")
I have done a lot of searching but have yet to find an answer. I am currently working on some data of a crop field. I have PLY files for multiple fields, which I have successfully read in, filtered, and visualised using Python and VTK. My main goal is to eventually segment and run analysis on individual crop plots.
However, to make that task easier I first want to "normalize" my point cloud so that all plots are essentially "on the same level". From the image I have attached you can see that the point cloud slopes from one corner to its opposite. So what I want is to flatten out the cloud so that the ground points are all on the same plane/level, with the rest of the points adjusted accordingly.
Point Cloud
I've also included my code to show how I got to this point. If anyone has any advice on how I can achieve the normalising to one plane I would be very appreciative. Sadly I cannot include my data as it is work related.
Thanks.
Josh
import vtk
from vtk.util import numpy_support
import numpy as np
filename = 'File.ply'
# Reader
r = vtk.vtkPLYReader()
r.SetFileName(filename)
r.Update()  # execute the reader so that GetOutput() below has valid bounds
# Filters
vgf = vtk.vtkVertexGlyphFilter()
vgf.SetInputConnection(r.GetOutputPort())
# Elevation
pc = r.GetOutput()
bounds = pc.GetBounds()
#print(bounds)
minz = bounds[4]
maxz = bounds[5]
#print(bounds[4], bounds[5])
evgf = vtk.vtkElevationFilter()
evgf.SetInputConnection(vgf.GetOutputPort())
evgf.SetLowPoint(0, 0, minz)
evgf.SetHighPoint(0, 0, maxz)
#pc.GetNumberOfPoints()
# Look up table
lut = vtk.vtkLookupTable()
lut.SetHueRange(0.667, 0)
lut.SetSaturationRange(1, 1)
lut.SetValueRange(1, 1)
lut.Build()
# Renderer
mapper = vtk.vtkPolyDataMapper()
mapper.SetInputConnection(evgf.GetOutputPort())
mapper.SetLookupTable(lut)
actor = vtk.vtkActor()
actor.SetMapper(mapper)
renderer = vtk.vtkRenderer()
renWin = vtk.vtkRenderWindow()
renWin.AddRenderer(renderer)
iren = vtk.vtkRenderWindowInteractor()
iren.SetRenderWindow(renWin)
renderer.AddActor(actor)
renderer.SetBackground(0, 0, 0)
renWin.Render()
iren.Start()
I once solved a similar problem. Find below some code that I used back then. It uses two functions, fitPlane and findTransformFromVectors, that you could replace with your own implementations.
Note that there are many ways to fit a plane through a set of points. This SO post compares scipy.optimize.minimize with scipy.linalg.lstsq. In another SO post, PCA, RANSAC and other methods are suggested. You probably want to use methods provided by sklearn, numpy or other modules. My solution simply (and non-robustly) computes an ordinary least squares regression.
import vtk
import numpy as np
# Convert vtk to numpy arrays
from vtk.util.numpy_support import vtk_to_numpy as vtk2np
# Create a random point cloud.
center = [3.0, 2.0, 1.0]
source = vtk.vtkPointSource()
source.SetCenter(center)
source.SetNumberOfPoints(50)
source.SetRadius(1.)
source.Update()
source = source.GetOutput()
# Extract the points from the point cloud.
points = vtk2np(source.GetPoints().GetData())
points = points.transpose()
# Fit a plane. nRegression contains the normal vector of the
# regression surface.
nRegression = fitPlane(points)
# Compute a transform that maps the source center to the origin and
# plane normal to the z-axis.
trafo = findTransformFromVectors(originFrom=center,
axisFrom=nRegression.transpose(),
originTo=(0,0,0),
axisTo=(0.,0.,1.))
# Apply transform to source.
sourceTransformed = vtk.vtkTransformFilter()
sourceTransformed.SetInputData(source)
sourceTransformed.SetTransform(trafo)
sourceTransformed.Update()
# Visualize output...
Here are my implementations of fitPlane and findTransformFromVectors:
# The following code has been written by normanius under the CC BY-SA 4.0
# license.
# License: https://creativecommons.org/licenses/by-sa/4.0/
# Author: normanius: https://stackoverflow.com/users/3388962/normanius
# Date: October 2018
# Reference: https://stackoverflow.com/questions/52716438
def fitPlane(X, tolerance=1e-10):
    '''
    Estimate the plane normal by means of ordinary least squares.
    Requirement: points X span the full column rank. If the points lie in a
    perfect plane, the regression problem is ill-conditioned!

    Formulas:
        a = (XX^T)^(-1)*X*z
    Surface normal:
        n = [a[0], a[1], -1]
        n = n/norm(n)
    Plane intercept:
        c = a[2]/norm(n)

    NOTE: The condition number for the pseudo-inverse improves if the
          formulation is changed to homogeneous notation.
    Formulas (homogeneous):
        a = (XX^T)^(-1)*[1,1,1]^T
        n = a[:-1]
        n = n/norm(n)
        c = a[-1]/norm(n)

    Arguments:
        X:          A matrix with 3 rows and n columns (one point per column)
        tolerance:  Minimal determinant of XX^T accepted. If the
                    determinant is lower, the function returns None.
    Returns:
        If the computation was successful, a numpy array of shape (3, 1)
        is returned that represents the estimated plane normal. On failure,
        None is returned.
    '''
    X = np.asarray(X)
    d,N = X.shape
    X = np.vstack([X,np.ones([1,N])])
    z = np.ones([d+1,1])
    XXT = np.dot(X, np.transpose(X)) # XXT=X*X^T
    if np.linalg.det(XXT) < tolerance:
        # The test covers the case where n<3
        return None
    n = np.dot(np.linalg.inv(XXT), z)
    intercept = n[-1]
    n = n[:-1]
    scale = np.linalg.norm(n)
    n /= scale
    intercept /= scale
    return n
def findTransformFromVectors(originFrom=None, axisFrom=None,
                             originTo=None, axisTo=None,
                             origin=None,
                             scale=1):
    '''
    Compute a transformation that maps originFrom and axisFrom to originTo
    and axisTo respectively. If scale is set to 'auto', the scale will be
    determined such that the axes will also match in length:
        scale = norm(axisTo)/norm(axisFrom)

    Arguments:  originFrom:     sequence with 3 elements, or None
                axisFrom:       sequence with 3 elements, or None
                originTo:       sequence with 3 elements, or None
                axisTo:         sequence with 3 elements, or None
                origin:         sequence with 3 elements, or None,
                                overrides originFrom and originTo if set
                scale:          - scalar (isotropic scaling)
                                - sequence with 3 elements (anisotropic scaling),
                                - 'auto' (sets scale such that input axes match
                                  in length after transforming axisFrom)
                                - None (no scaling)

    Align two axes alone, assuming that we sit on (0,0,0)
        findTransformFromVectors(axisFrom=a0, axisTo=a1)
    Align two axes in one point (all calls are equivalent):
        findTransformFromVectors(origin=o, axisFrom=a0, axisTo=a1)
        findTransformFromVectors(originFrom=o, axisFrom=a0, axisTo=a1)
        findTransformFromVectors(axisFrom=a0, originTo=o, axisTo=a1)
    Move between two points:
        findTransformFromVectors(originFrom=o0, originTo=o1)
    Move from one position to the other and align axes:
        findTransformFromVectors(originFrom=o0, axisFrom=a0, originTo=o1, axisTo=a1)
    '''
    # Prelude with trickle-down logic.
    # Infer the origins if some information is not set.
    if origin is not None:
        # Check for ambiguous input.
        assert(originFrom is None and originTo is None)
        originFrom = origin
        originTo = origin
    if originFrom is None:
        originFrom = originTo
    if originTo is None:
        originTo = originFrom
    if originTo is None:
        # We arrive here only if no origin information was set.
        originTo = [0.,0.,0.]
        originFrom = [0.,0.,0.]
    originFrom = np.asarray(originFrom)
    originTo = np.asarray(originTo)
    # Check if any rotation will be involved. The None test has to come
    # before the conversion to arrays, otherwise it can never trigger.
    if axisFrom is None or axisTo is None:
        axisFromL2 = axisToL2 = 0.
        rotate = False
    else:
        # Flatten so that row and column vectors are both accepted.
        axisFrom = np.asarray(axisFrom, dtype=float).ravel()
        axisTo = np.asarray(axisTo, dtype=float).ravel()
        axisFromL2 = np.linalg.norm(axisFrom)
        axisToL2 = np.linalg.norm(axisTo)
        rotate = (axisFromL2!=0 and axisToL2!=0
                  and not np.array_equal(axisFrom, axisTo))
    # Scale.
    if scale is None:
        scale = 1.
    if isinstance(scale, str) and scale == 'auto':
        scale = axisToL2/axisFromL2 if axisFromL2!=0. else 1.
    if np.isscalar(scale):
        scale = scale*np.ones(3)
    if rotate:
        rAxis = np.cross(axisFrom, axisTo) # rotation axis
        angle = np.dot(axisFrom, axisTo) / axisFromL2 / axisToL2
        # Clip to guard against round-off pushing the cosine outside [-1, 1].
        angle = np.arccos(np.clip(angle, -1., 1.))
    # Here we finally compute the transform.
    trafo = vtk.vtkTransform()
    trafo.Translate(originTo)
    if rotate:
        trafo.RotateWXYZ(angle / np.pi * 180, rAxis[0], rAxis[1], rAxis[2])
    trafo.Scale(scale[0],scale[1],scale[2])
    trafo.Translate(-originFrom)
    return trafo
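If the goal is only to put the ground on a common level, a simpler alternative to rotating the cloud is to fit z = a*x + b*y + c and subtract the fitted height from every point. The sketch below does this with np.linalg.lstsq. It is not the method used above: the function name flatten_points is hypothetical, and it assumes the cloud is available as an (n, 3) NumPy array that is dominated by ground points (for a real field you would probably fit only on points you consider ground).

```python
import numpy as np


def flatten_points(points):
    """Remove a best-fit planar trend from an (n, 3) point array.

    Fits z = a*x + b*y + c by least squares and returns a copy of the
    points whose z values are heights above that plane, together with
    the coefficients (a, b, c).
    """
    points = np.asarray(points, dtype=float)
    # Design matrix with one row [x, y, 1] per point.
    A = np.column_stack([points[:, 0], points[:, 1], np.ones(len(points))])
    coeffs, *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    flattened = points.copy()
    flattened[:, 2] = points[:, 2] - A @ coeffs
    return flattened, coeffs


# Synthetic sloping ground: z = 0.1*x - 0.2*y + 5
rng = np.random.default_rng(0)
xy = rng.uniform(0, 100, size=(200, 2))
z = 0.1 * xy[:, 0] - 0.2 * xy[:, 1] + 5.0
flat, coeffs = flatten_points(np.column_stack([xy, z]))
print(coeffs)                      # approximately a=0.1, b=-0.2, c=5
print(np.abs(flat[:, 2]).max())    # close to zero for a perfect plane
```

Unlike the rotation, this leaves x and y untouched and only shifts z, which is usually adequate when the slope is small.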
Is there any way to increase the number of arrowheads on a matplotlib streamplot? Right now it appears as if there is only one arrowhead per streamline, which is a problem if I want to change the x/y axis limits to zoom in on the data.
Building on @Richard_wth's answer, I wrote a function to provide control over the location of the arrows on a streamplot. One can choose n arrows per streamline, or choose to have the arrows equally spaced along a streamline.
First, you do a normal streamplot, until you are happy with the location and number of streamlines. You keep the returned argument sp. For instance:
sp = ax.streamplot(x,y,u,v,arrowstyle='-',density=10)
What's important here is to have arrowstyle='-' so that arrows are not displayed.
Then, you can call the function streamQuiver (provided below) to control the arrows on each streamline. If you want 3 arrows per streamline:
streamQuiver(ax, sp, n=3, ...)
If you want an arrow every 1.5 units of curvilinear length:
streamQuiver(ax, sp, spacing=1.5, ...)
where ... are options that would be passed to quiver.
The function streamQuiver is probably not fully bulletproof and may need some additional handling for particular cases. It relies on 4 subfunctions:
curve_coord to get the curvilinear length along a path
curve_extract to extract equidistant points along a path
seg_to_lines to convert the segments from streamplot into continuous lines. There might be a better way to do that!
lines_to_arrows: this is the main function that extracts arrows on each line
Here's an example where the arrows are at equidistant points on each streamline.
import numpy as np
import matplotlib.pyplot as plt
def streamQuiver(ax,sp,*args,spacing=None,n=5,**kwargs):
    """ Plot arrows from streamplot data
    The number of arrows per streamline is controlled either by `spacing` or by `n`.
    See `lines_to_arrows`.
    """
    def curve_coord(line=None):
        """ return curvilinear coordinate """
        x=line[:,0]
        y=line[:,1]
        s = np.zeros(x.shape)
        s[1:] = np.sqrt((x[1:]-x[0:-1])**2+ (y[1:]-y[0:-1])**2)
        s = np.cumsum(s)
        return s

    def curve_extract(line,spacing,offset=None):
        """ Extract points at equidistant space along a curve"""
        x=line[:,0]
        y=line[:,1]
        if offset is None:
            offset=spacing/2
        # Computing curvilinear length
        s = curve_coord(line)
        offset=np.mod(offset,s[-1]) # making sure we always get one point
        # New (equidistant) curvilinear coordinate
        sExtract=np.arange(offset,s[-1],spacing)
        # Interpolating based on new curvilinear coordinate
        xx=np.interp(sExtract,s,x)
        yy=np.interp(sExtract,s,y)
        return np.array([xx,yy]).T

    def seg_to_lines(seg):
        """ Convert a list of segments to a list of lines """
        def extract_continuous(i):
            x=[]
            y=[]
            # Special case, we have only 1 segment remaining:
            if i==len(seg)-1:
                x.append(seg[i][0,0])
                y.append(seg[i][0,1])
                x.append(seg[i][1,0])
                y.append(seg[i][1,1])
                return i,x,y
            # Looping on continuous segment
            while i<len(seg)-1:
                # Adding our start point
                x.append(seg[i][0,0])
                y.append(seg[i][0,1])
                # Checking whether next segment continues our line
                Continuous= all(seg[i][1,:]==seg[i+1][0,:])
                if not Continuous:
                    # We add our end point then
                    x.append(seg[i][1,0])
                    y.append(seg[i][1,1])
                    break
                elif i==len(seg)-2:
                    # we add the last segment
                    x.append(seg[i+1][0,0])
                    y.append(seg[i+1][0,1])
                    x.append(seg[i+1][1,0])
                    y.append(seg[i+1][1,1])
                i=i+1
            return i,x,y
        lines=[]
        i=0
        while i<len(seg):
            iEnd,x,y=extract_continuous(i)
            lines.append(np.array( [x,y] ).T)
            i=iEnd+1
        return lines

    def lines_to_arrows(lines,n=5,spacing=None,normalize=True):
        """ Extract "streamlines" arrows from a set of lines
        Either: `n` arrows per line
        or an arrow every `spacing` distance
        If `normalize` is true, the arrows have a unit length
        """
        if spacing is None:
            # if n is provided we estimate the spacing based on each curve length
            spacing = [ curve_coord(l)[-1]/n for l in lines]
        try:
            len(spacing)
        except TypeError:
            # a scalar spacing was given: use it for every line
            spacing=[spacing]*len(lines)
        lines_s=[curve_extract(l,spacing=sp,offset=sp/2) for l,sp in zip(lines,spacing)]
        lines_e=[curve_extract(l,spacing=sp,offset=sp/2+0.01*sp) for l,sp in zip(lines,spacing)]
        arrow_x = [l[i,0] for l in lines_s for i in range(len(l))]
        arrow_y = [l[i,1] for l in lines_s for i in range(len(l))]
        arrow_dx = [le[i,0]-ls[i,0] for ls,le in zip(lines_s,lines_e) for i in range(len(ls))]
        arrow_dy = [le[i,1]-ls[i,1] for ls,le in zip(lines_s,lines_e) for i in range(len(ls))]
        if normalize:
            dn = [ np.sqrt(ddx**2 + ddy**2) for ddx,ddy in zip(arrow_dx,arrow_dy)]
            arrow_dx = [ddx/ddn for ddx,ddn in zip(arrow_dx,dn)]
            arrow_dy = [ddy/ddn for ddy,ddn in zip(arrow_dy,dn)]
        return arrow_x,arrow_y,arrow_dx,arrow_dy

    # --- Main body of streamQuiver
    # Extracting lines
    seg = sp.lines.get_segments() # list of (2, 2) numpy arrays
    lines = seg_to_lines(seg) # list of (N,2) numpy arrays
    # Convert lines to arrows
    ar_x, ar_y, ar_dx, ar_dy = lines_to_arrows(lines,spacing=spacing,n=n,normalize=True)
    # Plot arrows
    qv=ax.quiver(ar_x, ar_y, ar_dx, ar_dy, *args, angles='xy', **kwargs)
    return qv
# --- Example
x = np.linspace(-1,1,100)
y = np.linspace(-1,1,100)
X,Y=np.meshgrid(x,y)
u = -np.sin(np.arctan2(Y,X))
v = np.cos(np.arctan2(Y,X))
xseed=np.linspace(0.1,1,4)
fig=plt.figure()
ax=fig.add_subplot(111)
sp = ax.streamplot(x,y,u,v,color='k',arrowstyle='-',start_points=np.array([xseed,xseed*0]).T,density=30)
qv = streamQuiver(ax,sp,spacing=0.5, scale=60)
plt.show()
I'm not sure about just increasing the number of arrowheads - but you can increase the density of streamlines with the density parameter in the streamplot function, here's the documentation:
*density* : float or 2-tuple
Controls the closeness of streamlines. When `density = 1`, the domain
is divided into a 30x30 grid---*density* linearly scales this grid.
Each cell in the grid can have, at most, one traversing streamline.
For different densities in each direction, use [density_x, density_y].
Here is an example:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(0,20,1)
y = np.arange(0,20,1)
u=np.random.random((x.shape[0], y.shape[0]))
v=np.random.random((x.shape[0], y.shape[0]))
fig, ax = plt.subplots(2,2)
ax[0,0].streamplot(x,y,u,v,density=1)
ax[0,0].set_title('Original')
ax[0,1].streamplot(x,y,u,v,density=4)
ax[0,1].set_xlim(5,10)
ax[0,1].set_ylim(5,10)
ax[0,1].set_title('Zoomed, higher density')
ax[1,1].streamplot(x,y,u,v,density=1)
ax[1,1].set_xlim(5,10)
ax[1,1].set_ylim(5,10)
ax[1,1].set_title('Zoomed, same density')
ax[1,0].streamplot(x,y,u,v,density=4)
ax[1,0].set_title('Original, higher density')
fig.show()
I have found a way to customize the number of arrowheads on streamline plot.
The idea is to plot streamline and arrows separately:
plt.streamplot returns a stream_container with two attributes: lines and arrows. The lines contain line segments that can be used to reconstruct streamline without arrows.
plt.quiver can be used to plot gradient fields. With the proper scaling, the length of the arrows is negligible, leaving only arrowheads.
Thus, we only need to define the positions of arrows using the line segments and pass them to plt.quiver.
Here is a toy example:
import matplotlib.pyplot as plt
from matplotlib import collections as mc
import numpy as np
# get line segments
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
sp = ax.streamplot(x, y, u, v, start_points=start_points, density=10)
seg = sp.lines.get_segments() # seg is a list of (2, 2) numpy arrays
lc = mc.LineCollection(seg, ...)
# define arrows
# here I define one arrow every 50 segments
# you could also select segs based on some criterion, e.g. intersect with certain lines
period = 50
arrow_x = np.array([seg[i][0, 0] for i in range(0, len(seg), period)])
arrow_y = np.array([seg[i][0, 1] for i in range(0, len(seg), period)])
arrow_dx = np.array([seg[i][1, 0] - seg[i][0, 0] for i in range(0, len(seg), period)])
arrow_dy = np.array([seg[i][1, 1] - seg[i][0, 1] for i in range(0, len(seg), period)])
# plot the final streamline
fig = plt.figure(figsize=(12.8, 10.8))
ax = fig.add_subplot(1, 1, 1)
ax.add_collection(lc)
ax.autoscale()
ax.quiver(
arrow_x, arrow_y, arrow_dx, arrow_dy, angles='xy', # arrow position
scale=0.2, scale_units='inches', units='y', minshaft=0, # arrow scaling
headwidth=6, headlength=10, headaxislength=9) # arrow style
fig.show()
There is more than one way to scale the arrows so that they appear to have zero length.
I am currently working with BUFR files with wind data. When I read this file in Python I get 4 large vectors: a latitude vector, a longitude vector, a wind_direction vector, and a wind_speed vector.
Both wind vectors are masked python arrays because there is non-valid data. This happens because the data comes from a non-geostationary satellite. In fact I successfully generated the following image from this BUFR file to show you the general shape that the data takes.
In this image I have plotted a color field to represent the wind speed, while the arrows obviously represent the wind direction.
Please notice the two bands of actual data. Unfortunately, the way I am plotting the data generates a third band (where the color field is smooth) in between the actual data bands. This is an artefact of the function pcolormesh. If I could superimpose two pcolormesh plots, each one representing one of the bands, this problem would disappear.
Unfortunately, I do not know how I could separate the data "regions". I have thought about clustering techniques but do not know how to cluster along latlon data using ANOTHER array (the wind data) as the clustering rule.
This is my current code:
#!/usr/bin/python
import bufr
import numpy as np
import sys
import matplotlib
matplotlib.use('Agg')
from matplotlib import pyplot as plt
from matplotlib import mlab
WIND_DIR_INDEX = 97
WIND_SPEED_INDEX = 96
bfrfile = sys.argv[1]
print(bfrfile)
bfr = bufr.BUFRFile(bfrfile)
lon = []
lat = []
wind_d = []
wind_s = []
for record in bfr:
    for entry in record:
        if entry.index == WIND_DIR_INDEX:
            wind_d.append(entry.data)
        if entry.index == WIND_SPEED_INDEX:
            wind_s.append(entry.data)
        if entry.name.find("LONGITUDE") == 0:
            lon.append(entry.data)
        if entry.name.find("LATITUDE") == 0:
            lat.append(entry.data)
lons = np.concatenate(lon)
lats = np.concatenate(lat)
winds_d = np.concatenate(wind_d)
winds_s = np.concatenate(wind_s)
winds_d = np.ma.masked_greater(winds_d,1.0e+6)
winds_s = np.ma.masked_greater(winds_s,1.0e+6)
windu = np.cos((winds_d-180)*(np.pi/180))
windv = np.sin((winds_d-180)*(np.pi/180))
# Data interpolation for pcolormesh (needs gridded data)
xi = np.linspace(lons.min(),lons.max(),lons.size/10)
yi = np.linspace(lats.min(),lats.max(),lats.size/10)
Z = mlab.griddata(lons,lats,winds_s,xi,yi)
X,Y = np.meshgrid(xi,yi)
mydpi = 96
fig = plt.figure(frameon=True)
fig.set_size_inches(1600/mydpi,1200/mydpi)
ax = plt.Axes(fig,[0,0,1,1])
#ax.set_axis_off()
fig.add_axes(ax)
plt.hold(True);
plt.quiver(lons[::5],lats[::5],windu[::5],windv[::5],linewidths=0)
for method in (ax.set_xticks,ax.set_xticklabels,ax.set_yticks,ax.set_yticklabels):
    method([])
fig.savefig('/home/cendas/bin/python/bufr_ascat.png',bbox_inches=0,dpi=5*mydpi)
mydpi = 96
fig = plt.figure(frameon=True)
fig.set_size_inches(1600/mydpi,1200/mydpi)
ax = plt.Axes(fig,[0,0,1,1])
#ax.set_axis_off()
fig.add_axes(ax)
plt.hold(True);
try:
    plt.pcolormesh(X,Y,Z,alpha=None)
    plt.clim(0,10)
except ValueError:
    print("Warning: Empty data array.")
for method in (ax.set_xticks,ax.set_xticklabels,ax.set_yticks,ax.set_yticklabels):
    method([])
fig.savefig('/home/cendas/bin/python/bufr_ascat_color.png',bbox_inches=0,dpi=5*mydpi)
I then usually follow this python code with the following terminal commands to combine the images:
convert bufr_ascat.png -transparent white bufr_ascat.png
convert bufr_ascat_color.png -transparent white bufr_ascat_color.png
composite bufr_ascat.png bufr_ascat_color.png bufrascat.png
Don't abuse clustering for this.
What you need is a simple selection / filtering; not a structure discovery process.
Take the mean position of the masked data. All non-masked data to the left of that mean is one band; all non-masked data to the right is the other.
Clustering is the wrong tool for this task.
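A minimal sketch of that selection with NumPy masked arrays follows. It assumes that the two bands are separated along longitude and that the masked (invalid) samples lie in the gap between them; the function name split_bands and the synthetic numbers are only for illustration and are not taken from the BUFR data.

```python
import numpy as np


def split_bands(lons, values):
    """Split valid samples into two bands around the masked gap.

    `values` is a masked array; the mean longitude of its masked
    entries is used as the dividing line. Returns two boolean index
    arrays (left band, right band).
    """
    invalid = np.ma.getmaskarray(values)
    if not invalid.any():
        raise ValueError("no masked samples to split on")
    divider = lons[invalid].mean()
    left = ~invalid & (lons < divider)
    right = ~invalid & (lons >= divider)
    return left, right


# Synthetic example: valid data at 0-3 and 7-10 degrees, a gap in between.
lons = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
speed = np.ma.masked_greater(
    np.array([5.0, 6.0, 5.5, 6.1, 1e10, 1e10, 1e10, 4.0, 4.2, 4.1, 3.9]),
    1.0e6)
left, right = split_bands(lons, speed)
print(lons[left])    # longitudes 0 to 3
print(lons[right])   # longitudes 7 to 10
```

Each band can then be gridded and drawn with its own pcolormesh call, so nothing is interpolated across the gap.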
I use matplotlib's method hexbin to compute 2d histograms on my data.
But I would like to get the coordinates of the centers of the hexagons in order to further process the results.
I got the values using the get_array() method on the result, but I cannot figure out how to get the bin coordinates.
I tried to compute them given the number of bins and the extent of my data, but I don't know the exact number of bins in each direction. gridsize=(10,2) should do the trick, but it does not seem to work.
Any idea?
I think this works.
from __future__ import division
import numpy as np
import math
import matplotlib.pyplot as plt

def generate_data(n):
    """Make random, correlated x & y arrays"""
    points = np.random.multivariate_normal(mean=(0,0),
        cov=[[0.4,9],[9,10]],size=int(n))
    return points

if __name__ =='__main__':
    color_map = plt.cm.Spectral_r
    n = 1e4
    points = generate_data(n)
    xbnds = np.array([-20.0,20.0])
    ybnds = np.array([-20.0,20.0])
    extent = [xbnds[0],xbnds[1],ybnds[0],ybnds[1]]
    fig=plt.figure(figsize=(10,9))
    ax = fig.add_subplot(111)
    x, y = points.T
    # Set gridsize just to make them visually large
    image = plt.hexbin(x,y,cmap=color_map,gridsize=20,extent=extent,mincnt=1,bins='log')
    # Note that mincnt=1 adds 1 to each count
    counts = image.get_array()
    ncnts = np.count_nonzero(np.power(10,counts))
    verts = image.get_offsets()
    # range instead of xrange so that the loop also runs on Python 3
    for offc in range(verts.shape[0]):
        binx,biny = verts[offc][0],verts[offc][1]
        if counts[offc]:
            plt.plot(binx,biny,'k.',zorder=100)
    ax.set_xlim(xbnds)
    ax.set_ylim(ybnds)
    plt.grid(True)
    cb = plt.colorbar(image,spacing='uniform',extend='max')
    plt.show()
I would love to confirm that the code by Hooked using get_offsets() works, but I tried several iterations of the code mentioned above to retrieve the center positions and, as Dave mentioned, get_offsets() remains empty. The workaround that I found is to use the non-empty image.get_paths() option. My code takes the mean of the vertices to find the centers, which means it is just a smidge longer, but it does work.
The get_paths() option returns a set of paths whose x,y vertex coordinates can be looped over and then averaged to return the center position of each hexagon.
The code that I have is as follows:
counts=image.get_array() #counts in each hexagon, works great
verts=image.get_offsets() #empty, don't use this
b=image.get_paths() #this does work, gives Path([[]][]) which can be plotted
for x in range(len(b)):
    xav=np.mean(b[x].vertices[0:6,0]) #center in x (RA)
    yav=np.mean(b[x].vertices[0:6,1]) #center in y (DEC)
    plt.plot(xav,yav,'k.',zorder=100)
I had this same problem. I think what needs to be developed is a framework with a HexagonalGrid object which can then be applied to many different data sets (and it would be awesome to do it for N dimensions). This is possible, and it surprises me that neither SciPy nor NumPy has anything for it (furthermore, there seems to be nothing else like it except perhaps binify).
That said, I assume you want to use hexbinning to compare multiple binned data sets. This requires some common base. I got this to work using matplotlib's hexbin the following way:
import numpy as np
import matplotlib.pyplot as plt

def get_data (mean,cov,n=1e3):
    """
    Quick fake data builder
    """
    np.random.seed(101)
    points = np.random.multivariate_normal(mean=mean,cov=cov,size=int(n))
    x, y = points.T
    return x,y

def get_centers (hexbin_output):
    """
    about 40% faster than previous post only cause you're not calculating the
    min/max every time
    """
    paths = hexbin_output.get_paths()
    v = paths[0].vertices[:-1] # adds a value [0,0] to the end
    vx,vy = v.T
    idx = [3,0,5,2] # index for [xmin,xmax,ymin,ymax]
    xmin,xmax,ymin,ymax = vx[idx[0]],vx[idx[1]],vy[idx[2]],vy[idx[3]]
    half_width_x = abs(xmax-xmin)/2.0
    half_width_y = abs(ymax-ymin)/2.0
    centers = []
    # range instead of xrange so that the loop also runs on Python 3
    for i in range(len(paths)):
        cx = paths[i].vertices[idx[0],0]+half_width_x
        cy = paths[i].vertices[idx[2],1]+half_width_y
        centers.append((cx,cy))
    return np.asarray(centers)

# important parts ==>
class Hexagonal2DGrid (object):
    """
    Used to fix the gridsize, extent, and bins
    """
    def __init__ (self,gridsize,extent,bins=None):
        self.gridsize = gridsize
        self.extent = extent
        self.bins = bins

def hexbin (x,y,hexgrid):
    """
    To hexagonally bin the data in 2 dimensions
    """
    fig = plt.figure()
    ax = fig.add_subplot(111)
    # Note mincnt=0 so that it will return a value for every point in the
    # hexgrid, not just those with count>mincnt
    # Basically you fix the gridsize, extent, and bins to keep them the same
    # then the resulting count array is the same
    hexbin = plt.hexbin(x,y, mincnt=0,
                        gridsize=hexgrid.gridsize,
                        extent=hexgrid.extent,
                        bins=hexgrid.bins)
    # you could close the figure if you don't want it
    # plt.close(fig.number)
    counts = hexbin.get_array().copy()
    return counts, hexbin

# Example ===>
if __name__ == "__main__":
    hexgrid = Hexagonal2DGrid((21,5),[-70,70,-20,20])
    x_data,y_data = get_data((0,0),[[-40,95],[90,10]])
    x_model,y_model = get_data((0,10),[[100,30],[3,30]])
    counts_data, hexbin_data = hexbin(x_data,y_data,hexgrid)
    counts_model, hexbin_model = hexbin(x_model,y_model,hexgrid)
    # if you want the centers, they will be the same for both
    centers = get_centers(hexbin_data)
    # if you want to ignore the cells with zeros then use the following mask.
    # But if you want zeros for some bins and not others, I'm not sure of an
    # elegant way to do this without using the centers
    nonzero = counts_data != 0
    # now you can compare the two data sets
    variance_data = counts_data[nonzero]
    square_diffs = (counts_data[nonzero]-counts_model[nonzero])**2
    chi2 = np.sum(square_diffs/variance_data)
    print(" chi2={}".format(chi2))