NumPy Nearest Neighbor Line Fitting Across Moving Window - python

I have two two-dimensional arrays loaded into NumPy, both of which are 80i x 80j in size. I'm looking to do a moving window polyfit calculation across these arrays, I've nailed down how to conduct the polyfit but am stuck on the specific moving window approach I'm looking to accomplish. I'm aiming for:
1) At each index of Array 1 (a1), code isolates all the values of the index & its closest 8 neighbors into a separate 1D array, and repeats over the same window at Array 2 (a2)
2) With these two new arrays isolated, perform linear regression line-fitting using NumPy's polyfit in the following approach: model = np.polyfit(a1slice, a2slice, 1)
3) Cast the resulting regression coefficient and intercept (example output doing this manually: array([-0.02114911, 10.02127152])) to the same index of two other arrays, where model[0] would be placed into the first new array and model[1] into the second new array at this index.
4) The code then moves sequentially to the next index and performs steps 1-3 again, or a1(i+1, j+0, etc.)
I've provided a graphical example of what I'm trying achieve for two random index selections across Array 1 and the calculation across the index's eight nearest neighbors, should this make the desired result easier to understand:

I've written a function to get the window, but there is a troublesome edge case when the index is on the edge and one distance away from the corner. You'll probably want to modify the function to get exactly what you're looking for.
def get_neighbors(arr, origin, num_neighbors = 8):
coords = np.array([[i,j] for (i,j),value in np.ndenumerate(arr)]).reshape(arr.shape + (2,))
distances = np.linalg.norm(coords - origin, axis = -1)
neighbor_limit = np.sort(distances.ravel())[num_neighbors]
window = np.where(distances <= neighbor_limit)
exclude_window = np.where(distances > neighbor_limit)
return window, exclude_window, distances
test = np.zeros((5, 5))
plt.close('all')
cases = [[0,0], [0, 1], [0, 2]]
for case in cases:
window, exclude, distances = get_neighbors(test, case)
distances[exclude] = 0
plt.figure()
plt.imshow(distances)
Image Outputs:

Related

Transition between two coordinates set with minimum distance periodically (Python)

I have a set of sphere coordinates in 3D that evolves.
They represent a stack of spheres which are continuously removed from a box from the bottom of the geometry, and reinserted at the top at a random location. Since this kind of simulation is really periodic, I would like to simulate the drainage of the box a few times (say, 5 times, so t=1 takes positions 1 -> t=5 takes positions 5), and then come back to the first state to simulate the next steps (t=6 takes position 1, t=10 takes positions 5, same for t=11->15, etc.)
The problem is that at the coordinates of a given sphere (say, sphere 1) can be very different from the first state to the last simulated one. However, it is very important, for the sake of the simulation, to have a simulation as smooth as possible. If I had to quantify it, I would say that I need the distance between state 5 and state 6 for each pebble to be as low as possible.
It seems to me like an assignment problem. Is there any known solution and method for this kind of problems?
Here is an example of what I would like to have (I mostly use Python):
import numpy as np
# Mockup of the simulation positions
Nspheres = 100
Nsteps = 5 # number of simulated steps
coordinates = np.random.uniform(0,100, (Nsteps, Nspheres, 3)) # mockup x,y,z for each step
initial_positions = coordinates[0]
final_positions = coordinates[Nsteps-1]
**indices_adjust_initial_positions = adjust_initial_positions(initial_positions, final_positions) # to do**
adjusted_initial_positions = initial_positions[indices_adjust_initial_positions]
# Quantification of error made
mean_error = np.mean(np.abs(final_positions-adjusted_initial_positions))
max_error = np.max(np.abs(final_positions-adjusted_initial_positions))
print(mean_error, max_error)
# Assign it for each "cycle"
Ncycles = 5 # Number of times the simulation is repeated
simulation_coordinates = np.empty((Nsteps*Ncycles, Nspheres, 3))
simulation_coordinates[:Nsteps] = np.array(coordinates)
for n in range(1, Ncycles):
new_cycle_coordinates = simulation_coordinates[Nsteps*(n-1):Nsteps*(n):, indices_adjust_initial_positions, :]
simulation_coordinates[Nsteps*n:Nsteps*(n+1)] = new_cycle_coordinates
# Print result
print(simulation_coordinates)
The adjust_initial_positions would therefore take the initial and final states, and determine what would be the ideal set of indices to apply to the initial state to look the most like the final state. Please note that if that makes the problem any simpler, I do not really care if the very top spheres are not really matching between the two states, however it is important to be as close as possible at more towards the bottom.
Would you have any suggestion?
After some research, it seems that scipy.optimize has some nice features able to do something like it. If list1 is my first step, list2 is my last simulated step, we can do something like:
cost = np.linalg.norm(list2[:, np.newaxis, :] - list1, axis=2)
_, indexes = scipy.optimize.linear_sum_assignment(cost)
list3 = list1[indexes]
Therefore, list3 will be as close as list2 as possible thanks to the index sorting, while taking the positions of list1.

Efficient way for maximum medians from triangle polygons

Objective
I have a soup of triangle polygons. I want to retrieve the largest median as vector for each triangle.
State of work
Starting point:
Array of points (n,3) , e.g. [x,y,z]
Array of triangle point indices (n, 3) referencing the array of points above, e.g. [[0,1,2],[2,3,4]...]
I combine both two one single matrix containing the real 3D point coordinates. Then I calculate the median vectors and their lengths.
/Edit : I updated the code to my current version of it
def calcMedians(polygon):
# C -> AB = C-(A + 0.5(B-A))
# B -> AC = B - (A + 0.5(C-A))
# A -> BC = A - (B
dim = np.shape(polygon)
medians = np.zeros((dim[0],3,2,dim[1]))
medians[:,0,0] = polygon[:,2]
medians[:,0,1] = polygon[:,0] + 0.5*(polygon[:,1]-polygon[:,0])
medians[:,1,0] = polygon[:,1]
medians[:,1,1] = polygon[:,0] + 0.5*(polygon[:,2]-polygon[:,0])
medians[:,2,0] = polygon[:,0]
medians[:,2,1] = polygon[:,1] + 0.5*(polygon[:,2]-polygon[:,1])
m1 = np.linalg.norm(medians[:,0,0]-medians[:,0,1],axis=1)
m2 = np.linalg.norm(medians[:,1,0]-medians[:,1,1],axis=1)
m3 = np.linalg.norm(medians[:,2,0]-medians[:,2,1],axis=1)
medianlengths = np.vstack((m1,m2,m3)).T
maxlengths = np.argmax(medianlengths,axis=1)
final = np.zeros((dim[0],2,dim[1]))
dim = np.shape(medians)
for i in range(0,dim[0]):
idx = maxlengths[i]
final[i] = medians[i,idx]
return final
Now I am creating the final median vector matrix using an empty matrix first. The lengths are calculated using np.linalg.norm and collected in a matrix. For this matrix, the argmax method is used to identify to target median vector.
Problem
Old:However, I am somehow confused by the dimensionality and currently not able to get this to work or to understand if the result is correct.
Does somebody know how to do this correctly and/or if this approach is efficient?
My target would be a construct of the 3 medians in form of [n_polygons, 3( due to 3 medians), 2 (start and end point), 3 (xyz)]
Using the max lengths information, i would like to reduce it to [n_polygons, 2 (start and end point), 3 (xyz)]
Using this improvised for loop in the function, I can create the output. But there has to be a more "clean" matrix method to it. Using medians[:,maxlengths,:,:] leads to a shape of [4,n_polygons,2,3] instead of [n_polygons,2,3] and I do not understand why.
Example image for medians of two triangles:
Unfortunately, I don't have a large exemplary data set but I guess that this can be generated quite quickly. The example data set from the picture shown above is:
polygons = np.array([[0,1,2],[0,3,2]])
points = np.array([[0,0],
[1,0],
[1,1],
[0,1]])
polygons3d = points[polygons[:,:]]
The longest median is for the shortest triangle side. Look here and rewrite median length formula as
M[i] = Sqrt(2(a^2+b^2+c^2)-3*side[i]^2) / 2
So you can simplify calculations a bit using only side lengths (perhaps you already have them)
Concerning 3D coordinates - just use projection on any coordinate plane not perpendicular to your point plane - ignore one dimension (choose dimension with the lowest value range)

How can I determine which curve is closest to a given set of points?

I have several dataframes which each contain two columns of x and y values, so each row represents a point on a curve. The different dataframes then represent contours on a map. I have another series of data points (fewer in number), and I'd like to see which contour they are closest to on average.
I would like to establish the distance from each datapoint to each point on the curve, with sqrt(x^2+y^2) - sqrt(x_1^2 + y_1^2), add them up for each point on the curve. The trouble is that there are several thousand points on the curve, and there are only a few dozen datapoints to assess, so I can't simply put these in columns next to each other.
I think I need to cycle through the datapoints, checking the sqdistance between them and each point in the curve.
I don't know whether there is an easy function or module that can do this.
Thanks in advance!
Edit: Thanks for the comments. #Alexander: I've tried the vectorize function, as follows, with a sample dataset. I'm actually using contours which comprise several thousand datapoints, and the dataset to compare against are 100+, so I'd like to be able to automate as much as possible. I'm currently able to create a distance measurement from the first datapoint against my contour, but I would ideally like to cycle through j as well. When I try it, it comes up with an error:
import numpy as np
from numpy import vectorize
import pandas as pd
from pandas import DataFrame
df1 = {'X1':['1', '2', '2', '3'], 'Y1':['2', '5', '7', '9']}
df1 = DataFrame(df1, columns=['X1', 'Y1'])
df2 = {'X2':['3', '5', '6'], 'Y2':['10', '15', '16']}
df2 = DataFrame(df2, columns=['X2', 'Y2'])
df1=df1.astype(float)
df2=df2.astype(float)
Distance=pd.DataFrame()
i = range(0, len(df1))
j = range(0, len(df2))
def myfunc(x1, y1, x2, y2):
return np.sqrt((x2-x1)**2+np.sqrt(y2-y1)**2)
vfunc=np.vectorize(myfunc)
Distance['Distance of Datapoint j to Contour']=vfunc(df1.iloc[i] ['X1'], df1.iloc[i]['Y1'], df2.iloc[0]['X2'], df2.iloc[0]['Y2'])
Distance['Distance of Datapoint j to Contour']=vfunc(df1.iloc[i] ['X1'], df1.iloc[i]['Y1'], df2.iloc[1]['X2'], df2.iloc[1]['Y2'])
Distance
General idea
The "curve" is actually a polygon with a lot's of points. There definetly some libraries to calculate the distance between the polygon and the point. But generally it will be something like:
Calculate "approximate distance" to whole polygon, e.g. to the bounding box of a polygon (from point to 4 line segments), or to the center of bounding box
calculate distances to the lines of a polygon. If you have too many points then as an extra step "resolution" of a polygon might be reduced.
Smallest found distance is the distance from point to the polygon.
repeat for each point and each polygon
Existing solutions
Some libraries already can do that:
shapely question, shapely Geo-Python docs
Using shapely in geopandas to calculate distance
scipy.spatial.distance: scipy can be used to calculate distance between arbitrary number of points
numpy.linalg.norm(point1-point2): some answers propose different ways to calculate distance using numpy. Some even show performance benchmarks
sklearn.neighbors: not really about curves and distances to them, but can be used if you want to check "to which area point is most likely related"
And you can always calculate distances yourself using D(x1, y1, x2, y2) = sqrt((x₂-x₁)² + (y₂-y₁)²) and search for best combination of points that gives minimal distance
Example:
# get distance from points of 1 dataset to all the points of another dataset
from scipy.spatial import distance
d = distance.cdist(df1.to_numpy(), df2.to_numpy(), 'euclidean')
print(d)
# Results will be a matrix of all possible distances:
# [[ D(Point_df1_0, Point_df2_0), D(Point_df1_0, Point_df2_1), D(Point_df1_0, Point_df2_2)]
# [ D(Point_df1_1, Point_df2_0), D(Point_df1_1, Point_df2_1), D(Point_df1_1, Point_df2_2)]
# [ D(Point_df1_3, Point_df2_0), D(Point_df1_2, Point_df2_1), D(Point_df1_2, Point_df2_2)]
# [ D(Point_df1_3, Point_df2_0), D(Point_df1_3, Point_df2_1), D(Point_df1_3, Point_df2_2)]]
[[ 8.24621125 13.60147051 14.86606875]
[ 5.09901951 10.44030651 11.70469991]
[ 3.16227766 8.54400375 9.8488578 ]
[ 1. 6.32455532 7.61577311]]
What to do next is up to you. For example as a metric of "general distance between curves" you can:
Pick smallest values in each row and each column (if you skip some columns/rows, then you might end up with candidate that "matches only a part of contour), and calculate their median: np.median(np.hstack([np.amin(d, axis) for axis in range(len(d.shape))])).
Or you can calculate mean value of:
all the distances: np.median(d)
of "smallest 2/3 of distances": np.median(d[d<np.percentile(d, 66, interpolation='higher')])
of "smallest distances that cover at least each rows and each columns":
for min_value in np.sort(d, None):
chosen_indices = d<=min_value
if np.all(np.hstack([np.amax(chosen_indices, axis) for axis in range(len(chosen_indices.shape))])):
break
similarity = np.median(d[chosen_indices])
Or maybe you can use different type of distance from the begining (e.g. "correlation distance" looks promising to your task)
Maybe use "Procrustes analysis, a similarity test for two data sets" together with distances.
Maybe you can use minkowski distance as a similarity metric.
Alternative approach
Alternative approach would be to use some "geometry" library to compare areas of concave hulls:
Build concave hulls for contours and for "candidate datapoints" (not easy, but possible: using shapely , using concaveman). But if you are sure that your contours are already ordered and without overlapping segments, then you can directly build polygons from those points without need for concave hull.
Use "intersection area" minus "non-common area" as a metric of similarity (shapely can be used for that):
Non-common area is: union - intersection or simply "symmetric difference"
Final metric: intersection.area - symmetric_difference.area (intersection, area)
This approach might be better than processing distances in some situations, for example:
You want to prefer "fewer points covering whole area" over "huge amount of very close points that cover only half of the area"
It's more obvious way to compare candidates with different number of points
But it has it's disadvantages too (just draw some examples on paper and experiment to find them)
Other ideas:
instead of using polygons or concave hull you can:
build a linear ring from your points and then use contour.buffer(some_distance). This way you ignore "internal area" of the contour and only compare contour itself (with tolerance of some_distance). Distance between centroids (or double of that) may be used as value for some_distance
You can build polygons/lines from segments using ops.polygonize
instead of using intersection.area - symmetric_difference.area you can:
Snap one object to another, and then compare snapped object to original
Before comparing real objects you can compare "simpler" versions of the objects to filter out obvious mismatches:
For example you can check if boundaries of objects intersect
Or you can simplify geometries before comparing them
For the distance, you need to change your formula to
def getDistance(x, y, x_i, y_i):
return sqrt((x_i -x)^2 + (y_i - y)^2)
with (x,y) being your datapoint and (x_i, y_i) being a point from the curve.
Consider using NumPy for vectorization. Explicitly looping through your data points will most likely be less efficient, depending on your use case, it might however be quick enough. (If you need to run it on a regular basis, I think vectorization will easily outspeed the explicit way) This could look something like this:
import numpy as np # Universal abbreviation for the module
datapoints = np.random.rand(3,2) # Returns a vector with randomized entries of size 3x2 (Imagine it as 3 sets of x- and y-values
contour1 = np.random.rand(1000, 2) # Other than the size (which is 1000x2) no different than datapoints
contour2 = np.random.rand(1000, 2)
contour3 = np.random.rand(1000, 2)
def squareDistanceUnvectorized(datapoint, contour):
retVal = 0.
print("Using datapoint with values x:{}, y:{}".format(datapoint[0], datapoint[1]))
lengthOfContour = np.size(contour, 0) # This gets you the number of lines in the vector
for pointID in range(lengthOfContour):
squaredXDiff = np.square(contour[pointID,0] - datapoint[0])
squaredYDiff = np.square(contour[pointID,1] - datapoint[1])
retVal += np.sqrt(squaredXDiff + squaredYDiff)
retVal = retVal / lengthOfContour # As we want the average, we are dividing the sum by the element count
return retVal
if __name__ == "__main__":
noOfDatapoints = np.size(datapoints,0)
contID = 0
for currentDPID in range(noOfDatapoints):
dist1 = squareDistanceUnvectorized(datapoints[currentDPID,:], contour1)
dist2 = squareDistanceUnvectorized(datapoints[currentDPID,:], contour2)
dist3 = squareDistanceUnvectorized(datapoints[currentDPID,:], contour3)
if dist1 > dist2 and dist1 > dist3:
contID = 1
elif dist2 > dist1 and dist2 > dist3:
contID = 2
elif dist3 > dist1 and dist3 > dist2:
contID = 3
else:
contID = 0
if contID == 0:
print("Datapoint {} is inbetween two contours".format(currentDPID))
else:
print("Datapoint {} is closest to contour {}".format(currentDPID, contID))
Okay, now moving on to vector-land.
I have taken the liberty to adjust this part to what I think is your dataset. Try it and let me know if it works.
import numpy as np
import pandas as pd
# Generate 1000 points (2-dim Vector) with random values between 0 and 1. Make them strings afterwards.
# This is the first contour
random2Ddata1 = np.random.rand(1000,2)
listOfX1 = [str(x) for x in random2Ddata1[:,0]]
listOfY1 = [str(y) for y in random2Ddata1[:,1]]
# Do the same for a second contour, except that we de-center this 255 units into the first dimension
random2Ddata2 = np.random.rand(1000,2)+[255,0]
listOfX2 = [str(x) for x in random2Ddata2[:,0]]
listOfY2 = [str(y) for y in random2Ddata2[:,1]]
# After this step, our 'contours' are basically two blobs of datapoints whose centers are approx. 255 units apart.
# Generate a set of 4 datapoints and make them a Pandas-DataFrame
datapoints = {'X': ['0.5', '0', '255.5', '0'], 'Y': ['0.5', '0', '0.5', '-254.5']}
datapoints = pd.DataFrame(datapoints, columns=['X', 'Y'])
# Do the same for the two contours
contour1 = {'Xf': listOfX1, 'Yf': listOfY1}
contour1 = pd.DataFrame(contour1, columns=['Xf', 'Yf'])
contour2 = {'Xf': listOfX2, 'Yf': listOfY2}
contour2 = pd.DataFrame(contour2, columns=['Xf', 'Yf'])
# We do now have 4 datapoints.
# - The first datapoint is basically where we expect the mean of the first contour to be.
# Contour 1 consists of 1000 points with x, y- values between 0 and 1
# - The second datapoint is at the origin. Its distances should be similar to the once of the first datapoint
# - The third datapoint would be the result of shifting the first datapoint 255 units into the positive first dimension
# - The fourth datapoint would be the result of shifting the first datapoint 255 units into the negative second dimension
# Transformation into numpy array
# First the x and y values of the data points
dpArray = ((datapoints.values).T).astype(np.float)
c1Array = ((contour1.values).T).astype(np.float)
c2Array = ((contour2.values).T).astype(np.float)
# This did the following:
# - Transform the datapoints and contours into numpy arrays
# - Transpose them afterwards so that if we want all x values, we can write var[0,:] instead of var[:,0].
# A personal preference, maybe
# - Convert all the values into floats.
# Now, we iterate through the contours. If you have a lot of them, putting them into a list beforehand would do the job
for contourid, contour in enumerate([c1Array, c2Array]):
# Now for the datapoints
for _index, _value in enumerate(dpArray[0,:]):
# The next two lines do vectorization magic.
# First, we square the difference between one dpArray entry and the contour x values.
# You might notice that contour[0,:] returns an 1x1000 vector while dpArray[0,_index] is an 1x1 float value.
# This works because dpArray[0,_index] is broadcasted to fit the size of contour[0,:].
dx = np.square(dpArray[0,_index] - contour[0,:])
# The same happens for dpArray[1,_index] and contour[1,:]
dy = np.square(dpArray[1,_index] - contour[1,:])
# Now, we take (for one datapoint and one contour) the mean value and print it.
# You could write it into an array or do basically anything with it that you can imagine
distance = np.mean(np.sqrt(dx+dy))
print("Mean distance between contour {} and datapoint {}: {}".format(contourid+1, _index+1, distance))
# But you want to be able to call this... so here we go, generating a function out of it!
def getDistanceFromDatapointsToListOfContoursFindBetterName(datapoints, listOfContourDataFrames):
""" Takes a DataFrame with points and a list of different contours to return the average distance for each combination"""
dpArray = ((datapoints.values).T).astype(np.float)
listOfContours = []
for item in listOfContourDataFrames:
listOfContours.append(((item.values).T).astype(np.float))
retVal = np.zeros((np.size(dpArray,1), len(listOfContours)))
for contourid, contour in enumerate(listOfContours):
for _index, _value in enumerate(dpArray[0,:]):
dx = np.square(dpArray[0,_index] - contour[0,:])
dy = np.square(dpArray[1,_index] - contour[1,:])
distance = np.mean(np.sqrt(dx+dy))
print("Mean distance between contour {} and datapoint {}: {}".format(contourid+1, _index+1, distance))
retVal[_index, contourid] = distance
return retVal
# And just to see that it is, indeed, returning the same results, run it once
getDistanceFromDatapointsToListOfContoursFindBetterName(datapoints, [contour1, contour2])

Proximity Graph in python

I'm relatively new to Python coding (I'm switching from R mostly due to running time speed) and I'm trying to figure out how to code a proximity graph.
That is suppose i have an array of evenly-spaced points in d-dimensional Euclidean space, these will be my nodes. I want to make these into an undirected graph by connecting two points if and only if they lie within e apart. How can I encode this functionally with parameters:
n: spacing between two points on the same axis
d: dimension of R^d
e: maximum distance allowed for an edge to exist.
The graph-tool library has much of the functionality you need. So you could do something like this, assuming you have numpy and graph-tool:
coords = numpy.meshgrid(*(numpy.linspace(0, (n-1)*delta, n) for i in range(d)))
# coords is a Python list of numpy arrays
coords = [c.flatten() for c in coords]
# now coords is a Python list of 1-d numpy arrays
coords = numpy.array(coords).transpose()
# now coords is a numpy array, one row per point
g = graph_tool.generation.geometric_graph(coords, e*(1+1e-9))
The silly e*(1+1e-9) thing is because your criterion is "distance <= e" and geometric_graph's criterion is "distance < e".
There's a parameter called delta that you didn't mention because I think your description of parameter n is doing duty for two params (spacing between points, and number of points).
This bit of code should work, although it certainly isn't the most efficient. It will go through each node and check its distance to all the other nodes (that haven't already compared to it). If that distance is less than your value e then the corresponding value in the connected matrix is set to one. Zero indicates two nodes are not connected.
In this code I'm assuming that your nodeList is a list of cartesian coordinates of the form nodeList = [[x1,y1,...],[x2,y2,...],...[xN,yN,...]]. I also assume you have some function called calcDistance which returns the euclidean distance between two cartesian coordinates. This is basic enough to implement that I haven't written the code for that, and in any case using a function allows for future generalizing and modability.
numNodes = len(nodeList)
connected = np.zeros([numNodes,numNodes])
for i, n1 in enumerate(nodeList):
for j, n2 in enumerate(nodeList[i:]):
dist = calcDistance(n1, n2)
if dist < e:
connected[i,j] = 1
connected[j,i] = 1

Avoid for-loops in assignment of data values

So this is a little follow up question to my earlier question: Generate coordinates inside Polygon and my answer https://stackoverflow.com/a/15243767/1740928
In fact, I want to bin polygon data to a regular grid. Therefore, I calculate a couple of coordinates within the polygon and translate their lat/lon combination to their respective column/row combo of the grid.
Currently, the row/column information is stored in a numpy array with its number of rows corresponding to the number of data polygons and its number of columns corresponding to the coordinates in the polygon.
The whole code takes less then a second, but this code is the bottleneck at the moment (with ~7sec):
for ii in np.arange(len(data)):
for cc in np.arange(data_lats.shape[1]):
final_grid[ row[ii,cc], col[ii,cc] ] += data[ii]
final_grid_counts[ row[ii,cc], col[ii,cc] ] += 1
The array "data" simply contains the data values for each polygon (80000,). The arrays "row" and "col" contain the row and column number of a coordinate in the polygon (shape: (80000,16)).
As you can see, I am summing up all data values within each grid cell and count the number of matches. Thus, I know the average for each grid cell in case different polygons intersect it.
Still, how can these two for loops take around 7 seconds? Can you think of a faster way?
I think numpy should add an nd-bincount function, I had one lying around from a project I was working on some time ago.
import numpy as np
def two_d_bincount(row, col, weights=None, shape=None):
if shape is None:
shape = (row.max() + 1, col.max() + 1)
row = np.asarray(row, 'int')
col = np.asarray(col, 'int')
x = np.ravel_multi_index([row, col], shape)
out = np.bincount(x, weights, minlength=np.prod(shape))
return out.reshape(shape)
weights = np.column_stack([data] * row.shape[1])
final_grid = two_d_bincount(row.ravel(), col.ravel(), weights.ravel())
final_grid_counts = two_d_bincount(row.ravel(), col.ravel())
I hope this helps.
I might not fully understand the shapes of your different grids, but you can maybe eliminate the cc loop using something like this:
final_grid = np.empty((nrows,ncols))
for ii in xrange(len(data)):
final_grid[row[ii,:],col[ii,:]] = data[ii]
This of course assumes that final_grid is starting with no other info (that the count you're incrementing starts at zero). And I'm not sure how to test if it works not understanding how your row and col arrays work.

Categories

Resources