I am creating a charge smear function. I have a matrix where each row is a particle with a charge and a position. I then look at each particle's position on a grid to count how many particles are in each grid cell, but I also need to know which cell each particle is in, so that I can find the average of the positions of all particles in a specific grid cell. My idea for a fix is to create a list where the number of rows is the number of grid cells in my matrix, and let the columns be positions in the x, y and z directions, but obviously I can't append more than one number to each index. Maybe some variation will work? Sorry for the open-ended question. Thank you in advance
import matplotlib.pyplot as plt
import random
import numpy as np
### Initialize particle lists
particle_arrayX = []
particle_arrayY = []
### The resolution
N = 10
### Number of particles
M = 1000
col = None
row = None
### Size of box
Box_size = 100
### Grid size
Grid_size = Box_size/N
### Initialize particles
for i in range(M):
    particle_arrayX.append(random.random()*Box_size)
    particle_arrayY.append(random.random()*Box_size)
### Initialize matrix (note: [[0]*N]*N would alias every row to the same list)
ParticleMatrix_Inital = [[0 for i in range(N)] for j in range(N)]
### Measure density in each cell
for i in range(M):
    col = None
    row = None
    # The x and y components are divided by the grid size,
    # then converted to integers and assigned to a column or row.
    # If the value is a float with decimal 0, e.g. 2.0, then 1 is subtracted
    # before converting to int, so a particle on a cell's upper edge
    # still lands inside the grid.
    coln = particle_arrayX[i]/Grid_size
    rown = particle_arrayY[i]/Grid_size
    if coln.is_integer():
        col = int(coln) - 1
    else:
        col = int(coln)
    if rown.is_integer():
        row = int(rown) - 1
    else:
        row = int(rown)
    # count the particle directly; no need to convert to a numpy array
    # and back to a list on every iteration
    ParticleMatrix_Inital[row][col] += 1
#Plot matrix
plt.imshow(ParticleMatrix_Inital)
plt.colorbar()
plt.show()
Welcome to SO!
There are many ways to approach the problem of binning empirical data. I'm proposing an object-oriented (OO) solution below, because (in my subjective opinion) it provides clean, tidy and highly readable code. On the other hand, OO solutions might not be the most efficient if you're simulating huge many-particle systems. If the code below doesn't entirely solve your issues, I still hope that parts of it can be of some help to you.
That being said, I propose implementing your grid as a class. To make life easier for ourselves, we may apply the convention that all particles have positive coordinates. That is, x, y and even z (if introduced) stretch from 0 to whatever box_size you define. However, the class Grid does not really care about the actual box_size, only the resolution of the grid!
class Grid:
    def __init__(self, _delta_x, _delta_y):
        self.delta_x = _delta_x
        self.delta_y = _delta_y

    def bounding_cell(self, x, y):
        # Note that y-coordinates correspond to matrix rows,
        # and x-coordinates correspond to matrix columns
        return (int(y/self.delta_y), int(x/self.delta_x))
Yes, this could have been a simple function. However, as a class it is easily expandable. Also, a function would have to rely on global variables (yuck!) or be explicitly given the grid spacing (delta) in each dimension, every time it determines which matrix cell (or bin) a given coordinate (x, y) belongs to.
Now, how does it work? Imagine the simplest of cases, where your grid resolution is 1. Then, a particle at position (x, y) = (1.2, 4.9) should be placed in the matrix at (row, col) = (4, 1). That is, row = int(y/delta_y), and likewise for x. The higher the resolution (the smaller the delta), the larger the matrix gets in terms of number of rows and columns.
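To make that concrete, here is a quick usage sketch of my own, using the grid spacing of 1 from the example above:
grid = Grid(1.0, 1.0)                # delta of 1 in both directions
print(grid.bounding_cell(1.2, 4.9))  # -> (4, 1), i.e. (row, col)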
Now that we have a Grid, let us also object orient the Particle! Rather straight forward:
class Particle:
    def __init__(self, _id, _charge, _pos_x, _pos_y):
        self.id = _id
        self.charge = _charge
        self.pos_x = _pos_x
        self.pos_y = _pos_y

    def __str__(self):
        # implementing the __str__ method lets us 'print(a_particle)'
        return "{%d, %d, %3.1f, %3.1f}" % (self.id, self.charge, self.pos_x, self.pos_y)

    def get_position(self):
        return self.pos_x, self.pos_y

    def get_charge(self):
        return self.charge
This class is more or less just a collection of data, and could easily have been replaced by a dict. However, the class screams its intent clearly, it is clean and tidy, and also easily expanded.
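For comparison, a dict-based stand-in for one particle might look like this sketch (the key names are my own choice, not something prescribed above):
a_particle = {"id": 0, "charge": -1, "pos_x": 1.2, "pos_y": 4.9}
print(a_particle["pos_x"], a_particle["pos_y"])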
Now, let's create some instances of particles! Here is a function that uses a list comprehension to create a list of particles with an id, charge and position (x, y):
import random
def create_n_particles(n_particles, max_pos):
    return [Particle(id,                       # unique ID
                     random.randint(-1, 1),    # charge in [-1, 0, 1]
                     random.random()*max_pos,  # x coord
                     random.random()*max_pos)  # y coord
            for id in range(n_particles)]
And finally, we get to the fun part: putting it all together:
import numpy as np
if __name__ == "__main__":
    n_particles = 1000
    box_size = 100
    grid_resolution = 10
    grid_size = int(box_size / grid_resolution)

    grid = Grid(grid_resolution, grid_resolution)
    particles = create_n_particles(n_particles, box_size)

    charge_matrix = np.zeros((grid_size, grid_size))
    id_matrix = [[[] for i in range(grid_size)] for j in range(grid_size)]

    for particle in particles:
        x, y = particle.get_position()
        row, col = grid.bounding_cell(x, y)
        charge_matrix[row][col] += particle.get_charge()
        # The ID-matrix is similar to the charge-matrix,
        # but every cell contains a new list of particle IDs
        id_matrix[row][col].append(particle.id)
Notice the initialization of the ID-matrix: this gives you, per grid cell, the list of particles that you asked for. It is a matrix representing the particle container, and each cell contains a list to be filled with particle IDs. You could also populate these lists with entire particle instances (not just their IDs): id_matrix[row][col].append(particle).
The last for loop does the real work, and here the object-oriented strategy shows how charming it is: the loop is short, and it is very easy to read and understand what is going on. A cell in the charge_matrix contains the total charge within that grid cell/bin. Meanwhile, the id_matrix is filled with the IDs of the particles that are contained within that grid cell/bin.
From the way we've constructed the list of particles, particles, we see that a particle's ID is equivalent to that particle's index in the list. Hence, the particles may be retrieved like this:
for i, row in enumerate(id_matrix):
    for j, col in enumerate(row):
        print("[%d][%d] : " % (i, j), end="")
        for particle_id in id_matrix[i][j]:
            p = particles[particle_id]
            # do some work with 'p', or just print it:
            print(p, end=", ")
        print("")  # print new line

# Output:
# [0][0] : {32, -1, 0.2, 0.4}, ...  <-- all data of that particle
# ....
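Coming back to the averaging you originally asked about: with id_matrix in place, the mean position per cell follows the same retrieval pattern. A minimal sketch of my own, assuming the names defined above:
def mean_position(row, col):
    # mean particle position in cell (row, col), or None if the cell is empty
    cell = [particles[pid].get_position() for pid in id_matrix[row][col]]
    if not cell:
        return None
    xs, ys = zip(*cell)
    return sum(xs)/len(cell), sum(ys)/len(cell)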
I leave optimization of this retrieval to you, as I don't really know what data you need and what you're going to do with it. Maybe it's better to store all the particles in a dict instead of a list; I don't know(?). You choose!
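If you do go the dict route, a sketch could be as simple as this (some_id is a placeholder for any valid particle ID):
particle_dict = {p.id: p for p in particles}  # container keyed by particle ID
p = particle_dict[some_id]  # O(1) lookup; works even if IDs are not 0..n-1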
At the very end, I'll suggest that you use matshow, which is intended for displaying matrices, as opposed to imshow, which is aimed more at images.
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(charge_matrix)
fig.colorbar(cax)
ax.invert_yaxis() # flip the matrix such that the y-axis points upwards
fig.savefig("charge_matrix.png")
We can also scatter plot the particles and add grid lines corresponding to the grid in the matshow above. We color the scatter points such that negative charges are blue, neutral are gray and positive are red.
def charge_color(charge):
    if charge > 0: return 'red'
    elif charge < 0: return 'blue'
    else: return 'gray'

fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_aspect('equal')
ax.set_xticks(np.arange(0, box_size + 1, grid_resolution))
ax.set_yticks(np.arange(0, box_size + 1, grid_resolution))
ax.grid()
ax.scatter([p.pos_x for p in particles],
           [p.pos_y for p in particles],
           c=[charge_color(p.get_charge()) for p in particles])
fig.savefig("particle_dist.png")
Related
I'm finally a member of the stackoverflow community, so this is my first post. I'm trying my best to create a good question.
I have a problem with a simple variable assignment in a for-loop. The variable gets assigned in an if-statement and refers to an attribute of a class instance. Basically, what the code does is create a box with points in it, initialise the positions and velocities of the points, edit the velocities, and calculate the new positions according to their new velocities. I now want to save the initial positions of the points (meaning time = 0) outside of the class in a for-loop. I do this by saving the position to a variable if time == 0, but the variable gets updated to the new position in every loop iteration. The actual code is about hydrodynamics and particle interaction, but the basic structure of the code is something like this:
import numpy as np
class box():
    def __init__(self, boxsize, num_points, timestep):
        """
        boxsize is the size of the quadratic box in x and y direction
        num_points is the number of points in the box
        timestep is the time after which the positions should be updated
        """
        self.boxsize = float(boxsize)
        self.num_points = int(num_points)
        self.timestep = float(timestep)
        self.positions = np.zeros((self.num_points, 2)).astype(float)
        self.velocities = np.zeros((self.num_points, 2)).astype(float)

    def initialise(self):
        """initialise the positions and velocities of the points in the box, both with x- and y-components"""
        self.positions[:, :] = np.random.uniform(0., self.boxsize, size=(self.num_points, 2))
        self.velocities[:, :] = np.random.uniform(0., 1., size=(self.num_points, 2))

    def update_positions(self):
        """update positions according to velocities, x- and y-components"""
        self.positions += self.velocities*self.timestep

    def new_velocities(self):
        """create new velocities, x- and y-components"""
        self.velocities[:, :] = np.random.uniform(0., 1., size=(self.num_points, 2))

    def connect_steps(self):
        """update the positions according to their velocities and create new velocities"""
        self.update_positions()
        self.new_velocities()

system = box(10., 1, 0.1)  # box is 10 in x and y; 1 point in the box
system.initialise()  # initialise the positions and velocities of the box
for i in range(10):  # 10 timesteps
    system.connect_steps()
    if i == 0:
        r0 = system.positions
    print(r0)  # here r0 should always be the same array from i = 0 but isn't
    print(r0 == system.positions)  # yields True every iteration, so r0 is always the new position
What I want is for r0 to always be the position at i = 0 (the initial position), but every iteration the variable r0 gets updated to the new position, even though the if-clause, and so the assignment, is only entered once, at i = 0.
It is intended to first update the positions and only afterwards generate new velocities, even though they are first used in the next iteration, because the real algorithm behind this velocity generation needs the structure this way.
Maybe there is just a characteristic or property of classes I don't know.
I hope the question makes sense and anybody can help me out.
Thanks a lot!
You might try np.copy(system.positions) to get a copy that won't continue to mutate as you update. The assignment r0 = system.positions only binds a new name to the same underlying array, so r0 sees every later in-place update; np.copy gives you an independent snapshot.
Reference https://numpy.org/doc/stable/reference/generated/numpy.copy.html
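Applied to the loop above, the one-line change would look something like this sketch:
for i in range(10):
    system.connect_steps()
    if i == 0:
        r0 = np.copy(system.positions)  # independent snapshot, no longer aliases the array
    print(r0)  # now stays at the i = 0 positions
    print(r0 == system.positions)  # False once the positions have moved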
Is there any way to increase the number of arrowheads on a matplotlib streamplot? Right now it appears as if there is only one arrowhead per streamline, which is a problem if I want to change the x/y axis limits to zoom in on the data.
Building on @Richard_wth's answer, I wrote a function to provide control over the location of the arrows on a streamplot. One can choose n arrows per streamline, or choose to have the arrows equally spaced along a streamline.
First, you do a normal streamplot, until you are happy with the location and number of streamlines. You keep the return value sp. For instance:
sp = ax.streamplot(x,y,u,v,arrowstyle='-',density=10)
What's important here is to have arrowstyle='-' so that arrows are not displayed.
Then, you can call the function streamQuiver (provided below) to control the arrows on each streamline. If you want 3 arrows per streamline:
streamQuiver(ax, sp, n=3, ...)
If you want an arrow every 1.5 units of curvilinear length:
streamQuiver(ax, sp, spacing=1.5, ...)
where ... are options that would be passed to quiver.
The function streamQuiver is probably not fully bulletproof and may need some additional handling for particular cases. It relies on 4 subfunctions:
curve_coord to get the curvilinear length along a path
curve_extract to extract equidistant points along a path
seg_to_lines to convert the segments from streamplot into continuous lines. There might be a better way to do that!
lines_to_arrows: this is the main function that extracts arrows on each line
Here's an example where the arrows are at equidistant points on each streamline.
import numpy as np
import matplotlib.pyplot as plt
def streamQuiver(ax, sp, *args, spacing=None, n=5, **kwargs):
    """ Plot arrows from streamplot data
    The number of arrows per streamline is controlled either by `spacing` or by `n`.
    See `lines_to_arrows`.
    """
    def curve_coord(line=None):
        """ Return the curvilinear coordinate along a line """
        x = line[:, 0]
        y = line[:, 1]
        s = np.zeros(x.shape)
        s[1:] = np.sqrt((x[1:]-x[0:-1])**2 + (y[1:]-y[0:-1])**2)
        s = np.cumsum(s)
        return s

    def curve_extract(line, spacing, offset=None):
        """ Extract points at equidistant space along a curve """
        x = line[:, 0]
        y = line[:, 1]
        if offset is None:
            offset = spacing/2
        # Computing curvilinear length
        s = curve_coord(line)
        offset = np.mod(offset, s[-1])  # making sure we always get one point
        # New (equidistant) curvilinear coordinate
        sExtract = np.arange(offset, s[-1], spacing)
        # Interpolating based on new curvilinear coordinate
        xx = np.interp(sExtract, s, x)
        yy = np.interp(sExtract, s, y)
        return np.array([xx, yy]).T

    def seg_to_lines(seg):
        """ Convert a list of segments to a list of lines """
        def extract_continuous(i):
            x = []
            y = []
            # Special case, we have only 1 segment remaining:
            if i == len(seg)-1:
                x.append(seg[i][0, 0])
                y.append(seg[i][0, 1])
                x.append(seg[i][1, 0])
                y.append(seg[i][1, 1])
                return i, x, y
            # Looping on continuous segments
            while i < len(seg)-1:
                # Adding our start point
                x.append(seg[i][0, 0])
                y.append(seg[i][0, 1])
                # Checking whether the next segment continues our line
                Continuous = all(seg[i][1, :] == seg[i+1][0, :])
                if not Continuous:
                    # We add our end point then
                    x.append(seg[i][1, 0])
                    y.append(seg[i][1, 1])
                    break
                elif i == len(seg)-2:
                    # we add the last segment
                    x.append(seg[i+1][0, 0])
                    y.append(seg[i+1][0, 1])
                    x.append(seg[i+1][1, 0])
                    y.append(seg[i+1][1, 1])
                i = i+1
            return i, x, y

        lines = []
        i = 0
        while i < len(seg):
            iEnd, x, y = extract_continuous(i)
            lines.append(np.array([x, y]).T)
            i = iEnd+1
        return lines

    def lines_to_arrows(lines, n=5, spacing=None, normalize=True):
        """ Extract "streamline" arrows from a set of lines
        Either: `n` arrows per line
            or an arrow every `spacing` distance
        If `normalize` is true, the arrows have a unit length
        """
        if spacing is None:
            # if n is provided we estimate the spacing based on each curve length
            spacing = [curve_coord(l)[-1]/n for l in lines]
        try:
            len(spacing)
        except TypeError:
            spacing = [spacing]*len(lines)

        lines_s = [curve_extract(l, spacing=sp, offset=sp/2)         for l, sp in zip(lines, spacing)]
        lines_e = [curve_extract(l, spacing=sp, offset=sp/2+0.01*sp) for l, sp in zip(lines, spacing)]
        arrow_x = [l[i, 0] for l in lines_s for i in range(len(l))]
        arrow_y = [l[i, 1] for l in lines_s for i in range(len(l))]
        arrow_dx = [le[i, 0]-ls[i, 0] for ls, le in zip(lines_s, lines_e) for i in range(len(ls))]
        arrow_dy = [le[i, 1]-ls[i, 1] for ls, le in zip(lines_s, lines_e) for i in range(len(ls))]

        if normalize:
            dn = [np.sqrt(ddx**2 + ddy**2) for ddx, ddy in zip(arrow_dx, arrow_dy)]
            arrow_dx = [ddx/ddn for ddx, ddn in zip(arrow_dx, dn)]
            arrow_dy = [ddy/ddn for ddy, ddn in zip(arrow_dy, dn)]
        return arrow_x, arrow_y, arrow_dx, arrow_dy

    # --- Main body of streamQuiver
    # Extracting lines
    seg = sp.lines.get_segments()  # list of (2, 2) numpy arrays
    lines = seg_to_lines(seg)      # list of (N, 2) numpy arrays

    # Convert lines to arrows
    ar_x, ar_y, ar_dx, ar_dy = lines_to_arrows(lines, spacing=spacing, n=n, normalize=True)

    # Plot arrows
    qv = ax.quiver(ar_x, ar_y, ar_dx, ar_dy, *args, angles='xy', **kwargs)
    return qv

# --- Example
x = np.linspace(-1, 1, 100)
y = np.linspace(-1, 1, 100)
X, Y = np.meshgrid(x, y)
u = -np.sin(np.arctan2(Y, X))
v = np.cos(np.arctan2(Y, X))
xseed = np.linspace(0.1, 1, 4)

fig = plt.figure()
ax = fig.add_subplot(111)
sp = ax.streamplot(x, y, u, v, color='k', arrowstyle='-', start_points=np.array([xseed, xseed*0]).T, density=30)
qv = streamQuiver(ax, sp, spacing=0.5, scale=60)
plt.show()
I'm not sure about just increasing the number of arrowheads, but you can increase the density of streamlines with the density parameter in the streamplot function. Here's the documentation:
*density* : float or 2-tuple
Controls the closeness of streamlines. When `density = 1`, the domain
is divided into a 30x30 grid---*density* linearly scales this grid.
Each cell in the grid can have, at most, one traversing streamline.
For different densities in each direction, use [density_x, density_y].
Here is an example:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(0,20,1)
y = np.arange(0,20,1)
u=np.random.random((x.shape[0], y.shape[0]))
v=np.random.random((x.shape[0], y.shape[0]))
fig, ax = plt.subplots(2,2)
ax[0,0].streamplot(x,y,u,v,density=1)
ax[0,0].set_title('Original')
ax[0,1].streamplot(x,y,u,v,density=4)
ax[0,1].set_xlim(5,10)
ax[0,1].set_ylim(5,10)
ax[0,1].set_title('Zoomed, higher density')
ax[1,1].streamplot(x,y,u,v,density=1)
ax[1,1].set_xlim(5,10)
ax[1,1].set_ylim(5,10)
ax[1,1].set_title('Zoomed, same density')
ax[1,0].streamplot(x,y,u,v,density=4)
ax[1,0].set_title('Original, higher density')
fig.show()
I have found a way to customize the number of arrowheads on a streamline plot.
The idea is to plot the streamlines and the arrows separately:
plt.streamplot returns a stream_container with two attributes: lines and arrows. lines contains the line segments, which can be used to reconstruct the streamlines without arrows.
plt.quiver can be used to plot gradient fields. With proper scaling, the length of the arrows is negligible, leaving only the arrowheads.
Thus, we only need to define the positions of the arrows using the line segments and pass them to plt.quiver.
Here is a toy example:
import matplotlib.pyplot as plt
from matplotlib import collections as mc
import numpy as np

# toy data (the original snippet assumed x, y, u, v and start_points were already defined)
x = np.linspace(-1, 1, 100)
y = np.linspace(-1, 1, 100)
X, Y = np.meshgrid(x, y)
u, v = -Y, X
start_points = np.array([np.linspace(0.1, 1, 4), np.zeros(4)]).T

# get line segments
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
sp = ax.streamplot(x, y, u, v, start_points=start_points, density=10)
seg = sp.lines.get_segments()  # seg is a list of (2, 2) numpy arrays
lc = mc.LineCollection(seg)    # pass any styling kwargs (color, linewidth, ...) here

# define arrows
# here I define one arrow every 50 segments
# you could also select segs based on some criterion, e.g. intersect with certain lines
period = 50
arrow_x = np.array([seg[i][0, 0] for i in range(0, len(seg), period)])
arrow_y = np.array([seg[i][0, 1] for i in range(0, len(seg), period)])
arrow_dx = np.array([seg[i][1, 0] - seg[i][0, 0] for i in range(0, len(seg), period)])
arrow_dy = np.array([seg[i][1, 1] - seg[i][0, 1] for i in range(0, len(seg), period)])

# plot the final streamline
fig = plt.figure(figsize=(12.8, 10.8))
ax = fig.add_subplot(1, 1, 1)
ax.add_collection(lc)
ax.autoscale()
ax.quiver(
    arrow_x, arrow_y, arrow_dx, arrow_dy, angles='xy',       # arrow position
    scale=0.2, scale_units='inches', units='y', minshaft=0,  # arrow scaling
    headwidth=6, headlength=10, headaxislength=9)            # arrow style
fig.show()
There is more than one way to scale the arrows so that they appear to have zero length.
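For instance, one alternative (my own sketch, not from the code above) is to normalize the direction vectors and use a large scale, so the shafts shrink to almost nothing while the heads, which are sized in points, stay visible:
norm = np.hypot(arrow_dx, arrow_dy)  # lengths of the direction vectors
ax.quiver(arrow_x, arrow_y, arrow_dx/norm, arrow_dy/norm,
          angles='xy', scale=200, headwidth=8, headlength=12)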
I use matplotlib's method hexbin to compute 2d histograms on my data.
But I would like to get the coordinates of the centers of the hexagons in order to further process the results.
I got the values using the get_array() method on the result, but I cannot figure out how to get the coordinates of the bins.
I tried to compute them given the number of bins and the extent of my data, but I don't know the exact number of bins in each direction. gridsize=(10,2) should do the trick, but it does not seem to work.
Any idea?
I think this works.
from __future__ import division
import numpy as np
import math
import matplotlib.pyplot as plt

def generate_data(n):
    """Make random, correlated x & y arrays"""
    points = np.random.multivariate_normal(mean=(0, 0),
                                           cov=[[0.4, 9], [9, 10]], size=int(n))
    return points

if __name__ == '__main__':
    color_map = plt.cm.Spectral_r
    n = 1e4
    points = generate_data(n)

    xbnds = np.array([-20.0, 20.0])
    ybnds = np.array([-20.0, 20.0])
    extent = [xbnds[0], xbnds[1], ybnds[0], ybnds[1]]

    fig = plt.figure(figsize=(10, 9))
    ax = fig.add_subplot(111)
    x, y = points.T

    # Set gridsize just to make them visually large
    image = plt.hexbin(x, y, cmap=color_map, gridsize=20, extent=extent, mincnt=1, bins='log')
    # Note that mincnt=1 means only cells containing at least one point are shown
    counts = image.get_array()
    ncnts = np.count_nonzero(np.power(10, counts))
    verts = image.get_offsets()
    for offc in range(verts.shape[0]):
        binx, biny = verts[offc][0], verts[offc][1]
        if counts[offc]:
            plt.plot(binx, biny, 'k.', zorder=100)
    ax.set_xlim(xbnds)
    ax.set_ylim(ybnds)
    plt.grid(True)
    cb = plt.colorbar(image, spacing='uniform', extend='max')
    plt.show()
I would love to confirm that the code by Hooked using get_offsets() works, but I tried several iterations of the code mentioned above to retrieve center positions and, as Dave mentioned, get_offsets() remains empty. The workaround that I found is to use the non-empty image.get_paths() option. My code takes the mean to find the centers, which makes it just a smidge longer, but it does work.
The get_paths() option returns a set of embedded x,y coordinates that can be looped over and then averaged to return the center position of each hexagon.
The code that I have is as follows:
counts = image.get_array()   # counts in each hexagon, works great
verts = image.get_offsets()  # empty, don't use this
b = image.get_paths()        # this does work, gives Path([[]][]) which can be plotted
for x in range(len(b)):
    xav = np.mean(b[x].vertices[0:6, 0])  # center in x (RA)
    yav = np.mean(b[x].vertices[0:6, 1])  # center in y (DEC)
    plt.plot(xav, yav, 'k.', zorder=100)
I had this same problem. I think what needs to be developed is a framework to have a HexagonalGrid object which can then be applied to many different data sets (and it would be awesome to do it for N dimensions). This is possible, and it surprises me that neither Scipy nor Numpy has anything for it (furthermore, there seems to be nothing else like it except perhaps binify).
That said, I assume you want to use hexbinning to compare multiple binned data sets. This requires some common base. I got this to work using matplotlib's hexbin in the following way:
import numpy as np
import matplotlib.pyplot as plt

def get_data(mean, cov, n=1e3):
    """
    Quick fake data builder
    """
    np.random.seed(101)
    points = np.random.multivariate_normal(mean=mean, cov=cov, size=int(n))
    x, y = points.T
    return x, y

def get_centers(hexbin_output):
    """
    About 40% faster than the previous post, only because you're not
    calculating the min/max every time
    """
    paths = hexbin_output.get_paths()
    v = paths[0].vertices[:-1]  # adds a value [0,0] to the end
    vx, vy = v.T
    idx = [3, 0, 5, 2]  # index for [xmin, xmax, ymin, ymax]
    xmin, xmax, ymin, ymax = vx[idx[0]], vx[idx[1]], vy[idx[2]], vy[idx[3]]
    half_width_x = abs(xmax - xmin)/2.0
    half_width_y = abs(ymax - ymin)/2.0
    centers = []
    for i in range(len(paths)):
        cx = paths[i].vertices[idx[0], 0] + half_width_x
        cy = paths[i].vertices[idx[2], 1] + half_width_y
        centers.append((cx, cy))
    return np.asarray(centers)

# important parts ==>
class Hexagonal2DGrid(object):
    """
    Used to fix the gridsize, extent, and bins
    """
    def __init__(self, gridsize, extent, bins=None):
        self.gridsize = gridsize
        self.extent = extent
        self.bins = bins

def hexbin(x, y, hexgrid):
    """
    To hexagonally bin the data in 2 dimensions
    """
    fig = plt.figure()
    ax = fig.add_subplot(111)
    # Note mincnt=0 so that it will return a value for every point in the
    # hexgrid, not just those with count > mincnt
    # Basically you fix the gridsize, extent, and bins to keep them the same,
    # then the resulting count array is the same
    hexbin = plt.hexbin(x, y, mincnt=0,
                        gridsize=hexgrid.gridsize,
                        extent=hexgrid.extent,
                        bins=hexgrid.bins)
    # you could close the figure if you don't want it
    # plt.close(fig.number)
    counts = hexbin.get_array().copy()
    return counts, hexbin

# Example ===>
if __name__ == "__main__":
    hexgrid = Hexagonal2DGrid((21, 5), [-70, 70, -20, 20])
    x_data, y_data = get_data((0, 0), [[-40, 95], [90, 10]])
    x_model, y_model = get_data((0, 10), [[100, 30], [3, 30]])
    counts_data, hexbin_data = hexbin(x_data, y_data, hexgrid)
    counts_model, hexbin_model = hexbin(x_model, y_model, hexgrid)
    # if you want the centers, they will be the same for both
    centers = get_centers(hexbin_data)
    # if you want to ignore the cells with zeros then use the following mask.
    # But if you want zeros for some bins and not others I'm not sure of an
    # elegant way to do this without using the centers
    nonzero = counts_data != 0
    # now you can compare the two data sets
    variance_data = counts_data[nonzero]
    square_diffs = (counts_data[nonzero] - counts_model[nonzero])**2
    chi2 = np.sum(square_diffs/variance_data)
    print(" chi2={}".format(chi2))
The goal here is to take some list of coordinates, like [[1,2],[3,4],[7,1]], then figure out how big the canvas should be if you want to print all these coords. Take the maximal bottom left coordinate and minimal upper right coordinate that will snugly fit a canvas to these points. In the above list, for example, we're looking for [[1,1],[7,4]], which defines the smallest rectangle where all those points will fit.
In the middle of this function, I'm seeing the incoming "board" assigned a new value.
def print_board(board):
    # import pdb; pdb.set_trace()
    dimensions = None
    for i in board:
        if dimensions == None:
            dimensions = [i, i]
        else:
            dimensions[0][0] = min(dimensions[0][0], i[0])
            # 'board' is redefined !!!
            dimensions[0][1] = min(dimensions[0][1], i[1])
            #dimensions[1][0] = max(dimensions[1][0], i[0])
            #dimensions[1][1] = max(dimensions[1][1], i[1])
    # (after we get the canvas size
    # we print the canvas with the points on it
    # but we never make it that far without an error)
As the for loop moves through the coordinates in the incoming board, it seems to be setting board[0] to whatever coordinate it's looking at. So [[1,2],[3,4],[7,1]] will change first to [[3,4],[3,4],[7,1]], then to [[7,1],[3,4],[7,1]].
I wouldn't expect board to change at all.
(Python 3.2.2)
When you do
dimensions = [i, i]
you're setting both items in dimensions to the first point in your board -- not making copies of that point.
Then when you do
dimensions[0][0] = min(dimensions[0][0], i[0])
dimensions[0][1] = min(dimensions[0][1], i[1])
you're updating that same point --- the first point in your board -- to the results of the min functions.
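A tiny sketch of that aliasing, using a stand-in point of my own:
p = [1, 2]
dims = [p, p]    # both entries are the same list object, not copies
dims[0][0] = 99
print(dims)      # [[99, 2], [99, 2]]
print(p)         # [99, 2] -- the original point changed too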
Try something like this, instead:
def print_board(board):
    xs, ys = zip(*board)  # separate out the x and y coordinates
    min_x, max_x = min(xs), max(xs)  # find the mins and maxes
    min_y, max_y = min(ys), max(ys)
    dimensions = [[min_x, min_y], [max_x, max_y]]  # make the dimensions array
    return dimensions
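For instance, with the list from the question (and the return added above):
print(print_board([[1, 2], [3, 4], [7, 1]]))  # -> [[1, 1], [7, 4]]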
As an extension of agf's answer, you can use numpy for even more efficient and succinct code:
import numpy as np
def print_board(board):
    a = np.array(board)
    return [a.min(axis=0).tolist(), a.max(axis=0).tolist()]
If your board is a numpy array already, and you let the function return a tuple of numpy arrays, it shortens even more:
def print_board(board):
    return board.min(axis=0), board.max(axis=0)
I need to compare some theoretical data with real data in python.
The theoretical data comes from resolving an equation.
To improve the comparison, I would like to remove data points that fall far from the theoretical curve. I mean, I want to remove the points below and above the red dashed lines in the figure (made with matplotlib).
Both the theoretical curves and the data points are arrays of different length.
I can try to remove the points in a rough, by-eye way. For example, the first upper point can be detected using:
data2[(data2.redshift < 0.4) & (data2.dmodulus > 1)]
rec.array([('1997o', 0.374, 1.0203223485103787, 0.44354759972859786)], dtype=[('SN_name', '|S10'), ('redshift', '<f8'), ('dmodulus', '<f8'), ('dmodulus_error', '<f8')])
But I would like a less crude approach than eyeballing.
So, can anyone help me finding an easy way of removing the problematic points?
Thank you!
This might be overkill, and it is based on your comment:
Both the theoretical curves and the data points are arrays of
different length.
I would do the following:
1. Truncate the data set so that its x values lie within the max and min values of the theoretical set.
2. Interpolate the theoretical curve using scipy.interpolate.interp1d and the above truncated data x values. The reason for step (1) is to satisfy the constraints of interp1d.
3. Use numpy.where to find data y values that are outside the range of acceptable theory values.
4. DON'T discard these values, as was suggested in comments and other answers. If you want clarity, point them out by plotting the 'inliers' in one color and the 'outliers' in another color.
Here's a script that is close to what you are looking for, I think. It hopefully will help you accomplish what you want:
import numpy as np
import scipy.interpolate as interpolate
import matplotlib.pyplot as plt

# make up data
def makeUpData():
    '''Make many more data points (x,y,yerr) than theory (x,y),
    with theory yerr corresponding to a constant "sigma" in y,
    about the x,y value'''
    NX = 150
    dataX = (np.random.rand(NX)*1.1)**2
    dataY = (1.5*dataX+np.random.rand(NX)**2)*dataX
    dataErr = np.random.rand(NX)*dataX*1.3
    theoryX = np.arange(0, 1, 0.1)
    theoryY = theoryX*theoryX*1.5
    theoryErr = 0.5
    return dataX, dataY, dataErr, theoryX, theoryY, theoryErr

def makeSameXrange(theoryX, dataX, dataY):
    '''
    Truncate the dataX and dataY ranges so that dataX min and max are within
    the max and min of theoryX.
    '''
    minT, maxT = theoryX.min(), theoryX.max()
    goodIdxMax = np.where(dataX < maxT)
    goodIdxMin = np.where(dataX[goodIdxMax] > minT)
    return (dataX[goodIdxMax])[goodIdxMin], (dataY[goodIdxMax])[goodIdxMin]

# take 'theory' and get values at every 'data' x point
def theoryYatDataX(theoryX, theoryY, dataX):
    '''For every dataX point, find the interpolated theoryY value. theoryX is
    needed for the interpolation.'''
    f = interpolate.interp1d(theoryX, theoryY)
    return f(dataX[np.where(dataX < np.max(theoryX))])

# collect valid points
def findInlierSet(dataX, dataY, interpTheoryY, theoryErr):
    '''Find where theoryY-theoryErr < dataY < theoryY+theoryErr and return
    the valid indices.'''
    withinUpper = np.where(dataY < (interpTheoryY+theoryErr))
    withinLower = np.where(dataY[withinUpper]
                           > (interpTheoryY[withinUpper]-theoryErr))
    return (dataX[withinUpper])[withinLower], (dataY[withinUpper])[withinLower]

def findOutlierSet(dataX, dataY, interpTheoryY, theoryErr):
    '''Find where dataY lies outside theoryY-theoryErr < dataY < theoryY+theoryErr
    and return the valid indices.'''
    withinUpper = np.where(dataY > (interpTheoryY+theoryErr))
    withinLower = np.where(dataY < (interpTheoryY-theoryErr))
    return (dataX[withinUpper], dataY[withinUpper],
            dataX[withinLower], dataY[withinLower])

if __name__ == "__main__":
    dataX, dataY, dataErr, theoryX, theoryY, theoryErr = makeUpData()

    TruncDataX, TruncDataY = makeSameXrange(theoryX, dataX, dataY)
    interpTheoryY = theoryYatDataX(theoryX, theoryY, TruncDataX)

    inDataX, inDataY = findInlierSet(TruncDataX, TruncDataY, interpTheoryY,
                                     theoryErr)
    outUpX, outUpY, outDownX, outDownY = findOutlierSet(TruncDataX,
                                                        TruncDataY,
                                                        interpTheoryY,
                                                        theoryErr)
    #print inlierIndex
    fig = plt.figure()
    ax = fig.add_subplot(211)
    ax.errorbar(dataX, dataY, dataErr, fmt='.', color='k')
    ax.plot(theoryX, theoryY, 'r-')
    ax.plot(theoryX, theoryY+theoryErr, 'r--')
    ax.plot(theoryX, theoryY-theoryErr, 'r--')
    ax.set_xlim(0, 1.4)
    ax.set_ylim(-.5, 3)

    ax = fig.add_subplot(212)
    ax.plot(inDataX, inDataY, 'ko')
    ax.plot(outUpX, outUpY, 'bo')
    ax.plot(outDownX, outDownY, 'ro')
    ax.plot(theoryX, theoryY, 'r-')
    ax.plot(theoryX, theoryY+theoryErr, 'r--')
    ax.plot(theoryX, theoryY-theoryErr, 'r--')
    ax.set_xlim(0, 1.4)
    ax.set_ylim(-.5, 3)
    fig.savefig('findInliers.png')
This figure is the result:
In the end I used some of Yann's code:
def theoryYatDataX(theoryX, theoryY, dataX):
    '''For every dataX point, find the interpolated theoryY value. theoryX is
    needed for the interpolation.'''
    f = interpolate.interp1d(theoryX, theoryY)
    return f(dataX[np.where(dataX < np.max(theoryX))])

def findOutlierSet(data, interpTheoryY, theoryErr):
    '''Find where dataY lies outside theoryY-theoryErr < dataY < theoryY+theoryErr
    and return the inlier and outlier sets.'''
    up = np.where(data.dmodulus > (interpTheoryY+theoryErr))
    low = np.where(data.dmodulus < (interpTheoryY-theoryErr))
    # join all the indices together in a flat array
    out = np.hstack([up, low]).ravel()
    index = np.array(np.ones(len(data), dtype=bool))
    index[out] = False
    datain = data[index]
    dataout = data[out]
    return datain, dataout

def selectdata(data, theoryX, theoryY):
    """
    Data selection: z<1 and +-0.5 LFLRW separation
    """
    # Select data with redshift z<1
    data1 = data[data.redshift < 1]
    # From modulus to light distance:
    data1.dmodulus, data1.dmodulus_error = modulus2distance(data1.dmodulus, data1.dmodulus_error)
    # order the data by redshift
    data1.sort(order='redshift')
    # Outliers: distance to the LFLRW curve bigger than +-0.5
    theoryErr = 0.5
    # Interpolate the theory curve to get the same points as the data
    interpy = theoryYatDataX(theoryX, theoryY, data1.redshift)
    datain, dataout = findOutlierSet(data1, interpy, theoryErr)
    return datain, dataout
Using those functions I can finally obtain:
Thank you all for your help.
Just look at the difference between the red curve and the points; if it is bigger than the difference between the red curve and the dashed red curve, remove the point:
diff = np.abs(points - red_curve)
outliers = diff > np.abs(dashed_curve - red_curve)
filtered = points[~outliers]  # keep only the points inside the band
But please take the comment from NickLH seriously. Your data looks pretty good without any filtering; your "outliers" all have a very big error and won't affect the fit much.
You could either use numpy.where() to identify which xy pairs meet your plotting criteria, or perhaps use enumerate to do pretty much the same thing. Example:
x_list = [1, 2, 3, 4, 5, 6]
y_list = ['f', 'o', 'o', 'b', 'a', 'r']
result = [y_list[i] for i, x in enumerate(x_list) if 2 <= x < 5]
print(result)
I'm sure you could change the conditions so that the '2' and '5' in the above example are replaced by functions of your curves.
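For instance, a sketch along those lines, where lower and upper are hypothetical callables for your dashed curves and x_data/y_data stand in for your measurement arrays (none of them are defined above):
kept = [(x, y) for x, y in zip(x_data, y_data)
        if lower(x) <= y <= upper(x)]  # keep only points inside the theory band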