Find irregular region in 4D numpy array of gridded data (lat/lon) - python

I have a large 4-dimensional dataset of Temperatures [time,pressure,lat,lon].
I need to find all grid points within a region defined by lat/lon indices and calculate an average over the region to leave me with a 2-dimensional array.
I know how to do this if my region is a rectangle (or square) but how can this be done with an irregular polygon?
Below is an image showing the regions I need to average together and the lat/lon grid to which the data in the array is gridded.

I believe this should solve your problem.
The code below generates all cells in a polygon defined by a list of vertices.
It "scans" the polygon row by row, keeping track of the transition columns where you (re)enter or exit the polygon.
def row(x, transitions):
    """Generator spitting all cells in a row, given a list of transition (in/out) columns."""
    i = 1
    in_poly = True
    y = transitions[0]
    while i < len(transitions):
        if in_poly:
            while y < transitions[i]:
                yield (x, y)
                y += 1
            in_poly = False
        else:
            in_poly = True
            y = transitions[i]
        i += 1

def get_same_row_vert(i, vertices):
    """Find all vertex columns in the same row as vertices[i], and return the next vertex index as well."""
    vert = []
    x = vertices[i][0]
    while i < len(vertices) and vertices[i][0] == x:
        vert.append(vertices[i][1])
        i += 1
    return vert, i

def update_transitions(old, new):
    """Update old transition columns for a row given new vertices.
    That is: merge both lists and remove duplicate values (two transitions at the same column cancel each other)."""
    if old == []:
        return new
    if new == []:
        return old
    o0 = old[0]
    n0 = new[0]
    if o0 == n0:
        return update_transitions(old[1:], new[1:])
    if o0 < n0:
        return [o0] + update_transitions(old[1:], new)
    return [n0] + update_transitions(old, new[1:])

def polygon(vertices):
    """Generator spitting all cells in the polygon defined by the given vertices."""
    vertices.sort()
    x = vertices[0][0]
    transitions, i = get_same_row_vert(0, vertices)
    while i < len(vertices):
        while x < vertices[i][0]:
            for cell in row(x, transitions):
                yield cell
            x += 1
        vert, i = get_same_row_vert(i, vertices)
        transitions = update_transitions(transitions, vert)
# define a "strange" polygon (hook shaped)
vertices = [(0, 0), (0, 3), (4, 3), (4, 0), (3, 0), (3, 2), (1, 2), (1, 1), (2, 1), (2, 0)]
for cell in polygon(vertices):
    print(cell)
    # or do whatever you need to do
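To get back to the original question: once the generator yields the (lat, lon) index pairs inside the region, averaging the 4-D array over them is a one-liner with fancy indexing. A minimal sketch, assuming the vertices are (lat_index, lon_index) pairs and using a small hypothetical temps array:

import numpy as np

temps = np.random.rand(12, 17, 10, 10)  # hypothetical [time, pressure, lat, lon] data
cells = list(polygon(vertices))
lat_idx = [c[0] for c in cells]
lon_idx = [c[1] for c in cells]
# Fancy indexing picks out one [time, pressure] column per region cell;
# averaging over the last axis leaves a 2-D [time, pressure] array
region_mean = temps[:, :, lat_idx, lon_idx].mean(axis=-1)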

The general class of problems is called "Point in Polygon", where the (fairly) standard algorithm is based on drawing a test line through the point under consideration and counting the number of times it crosses the polygon boundary (it's really cool/weird that it works so simply, I think). This is a really good overview which includes implementation information.
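For illustration, here is a minimal sketch of that crossing-number test (my own, not taken from the overview), assuming the polygon is given as a list of (x, y) vertices:

def point_in_polygon(x, y, poly):
    # Cast a horizontal ray from (x, y) to the right and flip `inside`
    # every time the ray crosses a polygon edge
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of the point
                inside = not inside
    return inside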
For your problem in particular, since each of your regions is defined by a small number of square cells, I think a more brute-force approach might be better. Perhaps something like:

1. For each region, form a list of all of the (lat/lon) squares which define it. Depending on how your regions are defined, this may be trivial, or annoying...
2. For each point you are examining, figure out which square it lives in. Since the squares are so well behaved, you can do this manually using opposite corners of each square, or using a method like numpy.digitize.
3. Test whether the square the point lives in is in one of the regions (see the sketch below).
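A minimal sketch of steps 2 and 3, where the grid edges and the region definition are hypothetical placeholders:

import numpy as np

lat_edges = np.linspace(-90, 90, 19)     # hypothetical: 18 latitude cells
lon_edges = np.linspace(-180, 180, 37)   # hypothetical: 36 longitude cells
region_cells = {(4, 7), (4, 8), (5, 7)}  # hypothetical (lat, lon) index pairs from step 1

def in_region(lat, lon):
    i = np.digitize(lat, lat_edges) - 1  # step 2: which latitude band
    j = np.digitize(lon, lon_edges) - 1  # step 2: which longitude band
    return (i, j) in region_cells        # step 3: is that square in a region?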
If you're still having trouble, please provide some more details about your problem (specifically, how your regions are defined) --- that will make it easier to offer advice.

Related

Shared pixels on shared edges

I am trying to determine how to return coordinates of shared pixels from shared edges in an image (example image above). Essentially, three things would be returned:

1. An image that draws over the shared edges in red (or another colour, it doesn't matter).
2. A dataframe that lists the shared pixel coordinates.
3. What I call a complexity index, which essentially lists which polygon has the most shared edges. So, using the image provided, polygon 1 has one shared edge, polygons 2 and 3 have three shared edges, polygon 4 has two shared edges, and polygon 5 has one shared edge.
I've had no real luck with this. Below is some code that I've attempted. I started by doing contour selection and creating a dataframe (contour_df) with image contour data that lists all the pixel information from my image. A big problem I'm running into off the bat is that itertools.permutations generates every pixel coordinate combination, which takes forever, and I can't get past it to try anything else.
Ultimately, I believe I'm approaching this problem incorrectly and may need to start from scratch. Thanks for any and all suggestions.
import itertools

import numpy as np

def complexity_estimator(contour_df):
    adjacency_list = []
    for i in range(0, contour_df.shape[0]):
        if contour_df.iloc[i]["parent_index"] == -1:
            adjacency_list.append(0)
        else:
            # coordinates of the contour we are interested in
            contour_coordinate = contour_df.iloc[i]["contour"]
            # coordinates of each of its siblings (a list of lists)
            contour_coordinate_siblings = contour_df[contour_df["parent_index"] == contour_df.iloc[i]["parent_index"]]['contour'].values
            count = 0
            for sibling_contour in contour_coordinate_siblings:
                # compare contour_coordinate with sibling_contour
                adjacent = complexity_measure(contour_coordinate, sibling_contour)
                if adjacent:
                    count = count + 1
            adjacency_list.append(count)
    contour_df['complexity'] = adjacency_list
    return contour_df

def complexity_measure(contour_coordinates1, contour_coordinates2):
    if np.array_equal(contour_coordinates1, contour_coordinates2):
        return False
    else:
        new_pairs = [list(zip(x, contour_coordinates2)) for x in itertools.permutations(contour_coordinates2, len(contour_coordinates2))]
        print(new_pairs)
        # dist = math.hypot(x2 - x1, y2 - y1)
        return True
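One way around the permutations blow-up, as a hedged sketch (my own suggestion, assuming each contour is a numpy array of pixel coordinates such as OpenCV's (N, 1, 2) contours): compare contours as sets of pixel tuples, which makes the adjacency test roughly linear in contour length instead of factorial:

def complexity_measure_fast(contour_coordinates1, contour_coordinates2):
    if np.array_equal(contour_coordinates1, contour_coordinates2):
        return False
    # Set intersection finds exactly coincident pixels; if shared edges are
    # adjacent rather than pixel-identical, dilate one set of pixels first
    set1 = {tuple(p) for p in contour_coordinates1.reshape(-1, 2)}
    set2 = {tuple(p) for p in contour_coordinates2.reshape(-1, 2)}
    return bool(set1 & set2)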

Identify the grid particles belong to

A square box of size 10,000 x 10,000 has 1,000,000 particles distributed uniformly. The box is divided into grid cells, each of size 100 x 100, so there are 10,000 cells in total. At every time step (for a total of 2016 steps), I would like to identify the grid cell to which each particle belongs. Is there an efficient way to implement this in Python? My implementation is below and currently takes approximately 83 s for one run.
import numpy as np
import time

start = time.time()
# Size of the layout
Layout = np.array([0, 10000])
# Total number of particles
Population = 1000000
# Array to hold the cell number
cell_number = np.zeros(Population, dtype=np.int32)
# Limits of each cell
boundaries = np.arange(0, 10100, step=100)
cell_boundaries = np.dstack((boundaries[0:100], boundaries[1:101]))
# Position of particles
points = np.random.uniform(0, Layout[1], size=(Population, 2))
# Generating a list with the x,y boundaries of each cell in the grid
x = []
limit_list = cell_boundaries
for i in range(0, Layout[1] // 100):
    for j in range(0, Layout[1] // 100):
        x.append([limit_list[0][i, 0], limit_list[0][i, 1], limit_list[0][j, 0], limit_list[0][j, 1]])
# Identifying the cell to which the particles belong
i = 0
for y in x:
    cell_number[(points[:, 1] > y[0]) & (points[:, 1] < y[1]) & (points[:, 0] > y[2]) & (points[:, 0] < y[3])] = i
    i += 1
print(time.time() - start)
I am not sure about your code: you seem to be setting the i variable globally, while it should be accumulated on a per-cell basis, correct? Something like cell_number[???] += 1, maybe?
Anyhow, the way I see it is from a different perspective: you could start by assigning each point a cell id, then invert the resulting array with a kind of counter function. I have implemented the following in PyTorch; you will most likely find equivalent utilities in NumPy.
The conversion from 2-D point coordinates to cell ids corresponds to floor-dividing the coordinates by the cell size (100), then unfolding them according to the grid's width of 100 cells:
>>> import torch
>>> p = torch.from_numpy(points).div(100).floor()
>>> p_unfold = p[:, 0]*100 + p[:, 1]
Then you can "invert" the statistics, i.e. find out how many particles there are in each respective cell based on the cell ids. This can be done using PyTorch's histogram counter torch.histc, with one bin per cell:
>>> torch.histc(p_unfold, bins=10000, min=0, max=10000)
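For reference, a minimal NumPy sketch of the same idea (my own equivalent, not part of the original answer):

import numpy as np

points = np.random.uniform(0, 10000, size=(1000000, 2))
# Integer-divide by the cell size (100) to get row/column indices,
# then flatten to one id per particle (100 columns of cells per row)
cell_ids = (points[:, 0] // 100).astype(np.int64) * 100 + (points[:, 1] // 100).astype(np.int64)
counts = np.bincount(cell_ids, minlength=10000)  # particles per cell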

Visualize a sparse matrix using Python Turtle graphics

I'm stuck on this problem.
def showMatrix(turtle_object, sparse_matrix):
The showMatrix() function will visualize the matrix contents using a grid of dots. Each grid location will correspond to a single matrix location (row, column). The presence of a "dot" indicates a non-zero entry.
First, you need to set the display coordinates to match the matrix extent using the
screen.setworldcoordinates() method. In other words, the lower left corner of the display will
become coordinate (0,0) and the upper right corner will be (rows-1, columns-1). Changing the screen
coordinates in this way simplifies the mapping of matrix indices to screen coordinates by matching the "grid"
and matrix coordinates.
Using the turtle .goto and .dot methods, plot a red dot for each non-zero matrix entry.
This is the work I've done so far:
def matrix(n, init):
    matrix = []
    for i in range(n):
        row = []
        for j in range(n):
            row.append(init)
        matrix.append(row)
    return matrix

def sparse_matrix(matrix, n, value):
    import random
    ctr = 0
    while ctr < n:
        row = random.randint(0, order(m) - 1)
        col = random.randint(0, order(m) - 1)
        if matrix[row][col] != value:
            matrix[row][col] = value
            ctr += 1
    return matrix

def showMatrix(turtle_object, sparse_matrix):
    for i in len(m):
        for j in len(m):
            if sparse_matrix[i][j] != 0:
                sparse_matrix[i][j] = turtle_object
    return sparse_matrix
What does the problem mean by (rows-1, columns-1)?
This is tied up with your mysterious m variable and the order() function you left undefined. Let's proceed anyway. We can see from the matrix() function we're dealing with a square matrix but let's not even assume that. Within the sparse_matrix() function, we can figure out rows and columns by doing:
rows = len(sparse_matrix)
columns = len(sparse_matrix[0])
Along with checking that rows isn't zero.
How do I show the sparse matrix on turtle?
Your showMatrix() function isn't using turtle_object appropriately -- we don't want to store it in the matrix, we want to ask it to draw things. And this function probably shouldn't return anything. I'm guessing it should look something like:
def showMatrix(turtle_object, sparse_matrix):
    rows = len(sparse_matrix)
    if rows == 0:
        return
    columns = len(sparse_matrix[0])
    turtle_object.penup()
    for r in range(rows):
        for c in range(columns):
            if sparse_matrix[r][c] != 0:
                turtle_object.goto(c, r)
                turtle_object.dot(dot_size, "red")
Where dot_size is 1 for now. Wrapping this in some turtle code:
from turtle import Screen, Turtle

# ...

m = 6

screen = Screen()
dot_size = 1

yertle = Turtle(visible=False)

mat = matrix(order(m), 0)
sparse_matrix(mat, order(m / 2), 1)
showMatrix(yertle, mat)

screen.mainloop()
We get an unsatisfactory graph, as everything is too small and needs to be scaled up.
I'm not sure how to use screen.setworldcoordinates()
Rather than add a scaling factor directly to our graphing code, we can use turtle's own setworldcoordinates() to bend the window to our graph limits:
screen.setworldcoordinates(0, 0, order(m), order(m))
dot_size = screen.window_width() / order(m)
This gives us something a little more visually satisfying.
I hope this rough sketch gets you moving in the right direction.

Finding n-dimensional neighbors

I am trying to get the neighbors of a cell in an n-dimensional space, something like 8-connected or 26-connected cells, but at any dimensionality provided an n-tuple.
Neighbors that are directly adjacent are easy enough: just +1/-1 in any dimension. The part I am having difficulty with is the diagonals, where any number of coordinates can differ by 1.
I wrote a function that recurs for each sub-dimension, and generates all +/- combinations:
def point_neighbors_recursive(point):
    neighbors = []
    # 1-dimensional base case
    if len(point) == 1:
        neighbors.append([point[0] - 1])  # left
        neighbors.append([point[0]])      # current
        neighbors.append([point[0] + 1])  # right
        return neighbors
    # n-dimensional: recurse on the remaining dimensions
    for sub_dimension in point_neighbors_recursive(point[1:]):
        neighbors.append([point[0] - 1] + sub_dimension)  # left
        neighbors.append([point[0]] + sub_dimension)      # center
        neighbors.append([point[0] + 1] + sub_dimension)  # right
    return neighbors
However this returns a lot of redundant neighbors.
Are there any better solutions?
I'll bet that all you need is in the itertools package, especially the product function. What you're looking for is the Cartesian product of your current location with each coordinate perturbed by 1 in each direction. Thus, you'll have a list of triples derived from your current point:
diag_coord = [(x-1, x, x+1) for x in point]
Now, you take the product of all those triples, recombine each set, and you have your diagonals.
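For instance, a minimal sketch of that idea (my own illustration):

from itertools import product

def neighbors(point):
    # one (x - 1, x, x + 1) triple per coordinate
    diag_coord = [(x - 1, x, x + 1) for x in point]
    for candidate in product(*diag_coord):
        if candidate != tuple(point):  # skip the point itself
            yield candidate

For a 2-tuple, list(neighbors((1, 1))) yields the eight 8-connected neighbours of (1, 1); the same function gives 26 neighbours for a 3-tuple, and so on for any dimensionality.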
Is that what you needed?

Creating a spatial index for QGIS 2 spatial join (PyQGIS)

I've written a bit of code to do a simple spatial join in QGIS 2 and 2.2 (points that lie within a buffer to take attribute of the buffer). However, I'd like to employ a QgsSpatialIndex in order to speed things up a bit. Where can I go from here:
pointProvider = self.pointLayer.dataProvider()
rotateProvider = self.rotateBUFF.dataProvider()

all_point = pointProvider.getFeatures()
point_spIndex = QgsSpatialIndex()
for feat in all_point:
    point_spIndex.insertFeature(feat)

all_line = rotateProvider.getFeatures()
line_spIndex = QgsSpatialIndex()
for feat in all_line:
    line_spIndex.insertFeature(feat)

rotate_IDX = self.rotateBUFF.fieldNameIndex('bearing')
point_IDX = self.pointLayer.fieldNameIndex('bearing')

self.pointLayer.startEditing()
for rotatefeat in self.rotateBUFF.getFeatures():
    for pointfeat in self.pointLayer.getFeatures():
        if pointfeat.geometry().intersects(rotatefeat.geometry()):
            pointID = pointfeat.id()
            bearing = rotatefeat.attributes()[rotate_IDX]
            self.pointLayer.changeAttributeValue(pointID, point_IDX, bearing)
self.pointLayer.commitChanges()
To do this kind of spatial join, you can use the QgsSpatialIndex (http://www.qgis.org/api/classQgsSpatialIndex.html) intersects(QgsRectangle) function to get a list of candidate feature IDs, or the nearestNeighbor(QgsPoint, n) function to get the n nearest neighbours as feature IDs.
Since you only want the points that lie within the buffer, the intersects function seems most suitable. I have not tested whether a degenerate bounding box (a point) can be used; if not, just make a very small bounding box around your point.
The intersects function returns all features whose bounding box intersects the given rectangle, so you will then have to test these candidate features for a true intersection.
Your outer loop should be over the points (you want to add attribute values to each point from its containing buffer).
# If degenerate rectangles are allowed, delta could be 0;
# if not, choose a suitably small value
delta = 0.1
# Loop through the points
for point in all_point:
    # Create a search rectangle,
    # assuming that all_point consists of QgsPoint
    searchRectangle = QgsRectangle(point.x() - delta, point.y() - delta, point.x() + delta, point.y() + delta)
    # Use the search rectangle to get candidate buffers from the buffer index
    candidateIDs = line_spIndex.intersects(searchRectangle)
    # Loop through the candidate buffers to find the first one that contains the point
    for candidateID in candidateIDs:
        candFeature = rotateProvider.getFeatures(QgsFeatureRequest(candidateID)).next()
        if candFeature.geometry().contains(point):
            # Do something useful with the point - buffer pair
            # No need to look further, so break
            break
