I have some polygons (Canadian provinces), read in with GeoPandas, and want to use these to create a mask to apply to gridded data on a 2-d latitude-longitude grid (read from a netcdf file using iris). An end goal would be to only have data for a given province remaining, with the rest of the data masked out. So the mask would be 1's for grid boxes within the province, and 0's or NaN's for grid boxes outside the province.
The polygons can be obtained from the shapefile here:
https://www.dropbox.com/s/o5elu01fetwnobx/CAN_adm1.shp?dl=0
The netcdf file I am using can be downloaded here:
https://www.dropbox.com/s/kxb2v2rq17m7lp7/t2m.20090815.nc?dl=0
I imagine there are two approaches here but I am struggling with both:
1) Use the polygon to create a mask on the latitude-longitude grid so that this can be applied to lots of datafiles outside of python (preferred)
2) Use the polygon to mask the data that have been read in and extract only the data inside the province of interest, to work with interactively.
My code so far:
import numpy as np
import iris
import geopandas as gpd
#read the shapefile and extract the polygon for a single province
#(province names stored as variable 'NAME_1')
Canada=gpd.read_file('CAN_adm1.shp')
BritishColumbia=Canada[Canada['NAME_1'] == 'British Columbia']
#get the latitude-longitude grid from netcdf file
cubelist=iris.load('t2m.20090815.nc')
cube=cubelist[0]
lats=cube.coord('latitude').points
lons=cube.coord('longitude').points
#create 2d grid from lats and lons (may not be necessary?)
lon2d, lat2d = np.meshgrid(lons, lats)
#HELP!
Thanks very much for any help or advice.
UPDATE: Following the great solution from @DPeterK below, my original data can be masked successfully.
It looks like you have started well! Geometries loaded from shapefiles expose various geospatial comparison methods, and in this case you need the contains method. You can use this to test each point in your cube's horizontal grid for being contained within your British Columbia geometry. (Note that this is not a fast operation!) You can use this comparison to build up a 2D mask array, which could be applied to your cube's data or used in other ways.
I've written a Python function to do the above – it takes a cube and a geometry and produces a mask for the (specified) horizontal coordinates of the cube, and applies the mask to the cube's data. The function is below:
import numpy as np
import iris.util
from shapely.geometry import Point

def geom_to_masked_cube(cube, geometry, x_coord, y_coord,
                        mask_excludes=False):
    """
    Convert a shapefile geometry into a mask for a cube's data.

    Args:

    * cube:
        The cube to mask.
    * geometry:
        A geometry from a shapefile to define a mask.
    * x_coord: (str or coord)
        A reference to a coord describing the cube's x-axis.
    * y_coord: (str or coord)
        A reference to a coord describing the cube's y-axis.

    Kwargs:

    * mask_excludes: (bool, default False)
        If False, the mask will exclude the area of the geometry from the
        cube's data. If True, the mask will include *only* the area of the
        geometry in the cube's data.

    .. note::
        This function does *not* preserve lazy cube data.

    """
    # Get horizontal coords for masking purposes.
    lats = cube.coord(y_coord).points
    lons = cube.coord(x_coord).points
    lon2d, lat2d = np.meshgrid(lons, lats)

    # Reshape to 1D for easier iteration.
    lon2 = lon2d.reshape(-1)
    lat2 = lat2d.reshape(-1)

    mask = []
    # Iterate through all horizontal points in the cube, and
    # check for containment within the specified geometry.
    for lat, lon in zip(lat2, lon2):
        this_point = Point(lon, lat)
        res = geometry.contains(this_point)
        mask.append(res.values[0])

    mask = np.array(mask).reshape(lon2d.shape)
    if mask_excludes:
        # Invert the mask if we want to include the geometry's area.
        mask = ~mask

    # Make sure the mask is the same shape as the cube.
    dim_map = (cube.coord_dims(y_coord)[0],
               cube.coord_dims(x_coord)[0])
    cube_mask = iris.util.broadcast_to_shape(mask, cube.shape, dim_map)

    # Apply the mask to the cube's data.
    data = cube.data
    masked_data = np.ma.masked_array(data, cube_mask)
    cube.data = masked_data
    return cube
If you just need the 2D mask you could return that before the above function applies it to the cube.
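If you'd rather build the mask once and reuse it outside Python (your preferred approach 1), you could save that returned mask to disk. Here is a minimal sketch, assuming the function has been modified to return the 2D boolean mask as mask_2d (the file names are illustrative):
# Save as a plain numpy array for reuse in later Python sessions...
np.save('british_columbia_mask.npy', mask_2d)
# ...or wrap it in an iris cube on the cube's own grid and write netCDF,
# so the mask can be applied with external tools (CDO, NCO, etc.).
mask_cube = iris.cube.Cube(mask_2d.astype('int8'),
                           long_name='british_columbia_mask',
                           dim_coords_and_dims=[(cube.coord('latitude'), 0),
                                                (cube.coord('longitude'), 1)])
iris.save(mask_cube, 'british_columbia_mask.nc')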
To use this function in your original code, add the following at the end of your code:
geometry = BritishColumbia.geometry
masked_cube = geom_to_masked_cube(cube, geometry,
                                  'longitude', 'latitude',
                                  mask_excludes=True)
If this doesn't mask anything it might well mean that your cube and geometry are defined on different extents. For example, if your cube's longitude coordinate runs from 0° to 360° while the geometry's longitude values run from -180° to 180°, the containment test will never return True. You can fix this by changing the extents of your cube with the following:
cube = cube.intersection(longitude=(-180, 180))
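A quick way to check whether the two extents actually match is to compare the cube's longitude range with the geometry's bounding box (a small sketch; total_bounds returns (minx, miny, maxx, maxy)):
print(cube.coord('longitude').points.min(),
      cube.coord('longitude').points.max())
print(BritishColumbia.total_bounds)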
I found an alternative solution to the excellent one posted by @DPeterK above, which yields the same result. It uses matplotlib.path to test whether points are contained within the exterior coordinates described by the geometries loaded from a shapefile. I am posting this because this method is ~10 times faster than that given by @DPeterK (2:23 minutes vs 25:56 minutes). I'm not sure what is preferable: an elegant solution, or a speedy, brute force solution. Perhaps one can have both?!
One complication with this method is that some geometries are MultiPolygons - i.e. the shape consists of several smaller polygons (in this case, the province of British Columbia includes islands off the west coast, which can't be described by the coordinates of the mainland British Columbia Polygon). The MultiPolygon has no exterior coordinates but the individual polygons do, so each needs to be treated individually. I found that the neatest solution was to use a function copied from GitHub (https://gist.github.com/mhweber/cf36bb4e09df9deee5eb54dc6be74d26), which 'explodes' MultiPolygons into a list of individual polygons that can then be treated separately.
The working code is outlined below, with my documentation. Apologies that it is not the most elegant code - I am relatively new to Python and I'm sure there are lots of unnecessary loops/neater ways to do things!
import numpy as np
import iris
import geopandas as gpd
import matplotlib.path as mpltPath
from shapely.geometry.polygon import Polygon
from shapely.geometry.multipolygon import MultiPolygon
#-----
#FIRST, read in the target data and latitude-longitude grid from netcdf file
cubelist=iris.load('t2m.20090815.minus180_180.nc')
cube=cubelist[0]
lats=cube.coord('latitude').points
lons=cube.coord('longitude').points
#create 2d grid from lats and lons
lon2d, lat2d = np.meshgrid(lons, lats)
#create a list of coordinates of all points within grid
points = []
for latit in range(lat2d.shape[0]):
    for lonit in range(lat2d.shape[1]):
        point = (lon2d[latit, lonit], lat2d[latit, lonit])
        points.append(point)
#turn into np array for later
points = np.array(points)
#get the cube data - useful for later
fld=np.squeeze(cube.data)
#create a mask array of zeros, same shape as fld, to be modified by
#the code below
mask=np.zeros_like(fld)
#NOW, read the shapefile and extract the polygon for a single province
#(province names stored as variable 'NAME_1')
Canada=gpd.read_file('/Users/ianashpole/Computing/getting_province_outlines/CAN_adm_shp/CAN_adm1.shp')
BritishColumbia=Canada[Canada['NAME_1'] == 'British Columbia']
#BritishColumbia.geometry.type reveals this to be a 'MultiPolygon'
#i.e. several (in this case, thousands...) of individual polygons.
#I ultimately want to get the exterior coordinates of the BritishColumbia
#polygon, but a MultiPolygon is a list of polygons and therefore has no
#exterior coordinates. There are probably many ways to progress from here,
#but the method I have stumbled upon is to 'explode' the MultiPolygon into
#its individual polygons and treat each individually. The function below
#to 'explode' the MultiPolygon was found here:
#https://gist.github.com/mhweber/cf36bb4e09df9deee5eb54dc6be74d26
#---define function to explode MultiPolygons
def explode_polygon(indata):
    indf = indata
    outdf = gpd.GeoDataFrame(columns=indf.columns)
    for idx, row in indf.iterrows():
        if type(row.geometry) == Polygon:
            #note: now redundant, but function originally worked on
            #a shapefile which could have combinations of individual polygons
            #and MultiPolygons
            outdf = outdf.append(row, ignore_index=True)
        if type(row.geometry) == MultiPolygon:
            multdf = gpd.GeoDataFrame(columns=indf.columns)
            recs = len(row.geometry)
            multdf = multdf.append([row]*recs, ignore_index=True)
            for geom in range(recs):
                multdf.loc[geom, 'geometry'] = row.geometry[geom]
            outdf = outdf.append(multdf, ignore_index=True)
    return outdf
#-------
#Explode the BritishColumbia MultiPolygon into its constituents
EBritishColumbia=explode_polygon(BritishColumbia)
#Loop over each individual polygon and get its exterior coordinates
for index, row in EBritishColumbia.iterrows():
    print('working on polygon', index)
    mypolygon = []
    for pt in list(row['geometry'].exterior.coords):
        mypolygon.append(pt)
    #See if any of the original grid points read from the netcdf file earlier
    #lie within the exterior coordinates of this polygon.
    #path.contains_points returns a boolean array (True/False) in the
    #shape of 'points'
    path = mpltPath.Path(mypolygon)
    inside = path.contains_points(points)
    #reshape the result so that it matches the mask & original data,
    #then flag the grid points that fell inside this polygon
    inside = np.array(inside).reshape(lon2d.shape)
    i = np.where(inside == True)
    mask[i] = 1
print('finished checking for points inside all polygons')
#mask now contains 0's for points that are not within British Columbia, and
#1's for points that are. FINALLY, use this to mask the original data
#(stored as 'fld')
i = np.where(mask == 0)
fld[i] = np.nan
#Done.
Related
I have a large dataset (~20000 entries) of past storms over 40 years, each with a list of central points at 3-hour intervals. I'm trying to overlay a mesh grid onto a large area and count the number of times each storm has passed over any given grid cell. However, my current implementation only tracks the position at those three-hour intervals, leading to some instances where the track jumps over a grid cell that should also have been counted.
I am trying to address this problem using geopandas instead: create a LineString for each storm track, and then perform an intersection against the mesh grid. However, I cannot find any functional implementations that allow me to do so.
To create the grid in geopandas, I am using the following solution from a previous question:
import numpy as np
import geopandas as gpd
from shapely.geometry import LineString, MultiLineString
from shapely.ops import polygonize
# plotExtent holds [lon_min, lon_max, lat_min, lat_max];
# gridResolution is the number of cells per degree.
lonCount = int(((plotExtent[1]+360) - (plotExtent[0]+360)) * gridResolution)
latCount = int((plotExtent[3] - plotExtent[2]) * gridResolution)
lons = np.linspace(plotExtent[0], plotExtent[1], lonCount)
lats = np.linspace(plotExtent[2], plotExtent[3], latCount)
# Store the meshgrid in polygon format
xlines = [((x1, yi), (x2, yi)) for x1, x2 in zip(lons[:-1], lons[1:]) for yi in lats]
ylines = [((xi, y1), (xi, y2)) for y1, y2 in zip(lats[:-1], lats[1:]) for xi in lons]
# Save as a Shapely object, then store in geopandas
grids = list(polygonize(MultiLineString(xlines + ylines)))
polyFrame = gpd.GeoDataFrame(grids)
This creates a GeoDataFrame of ~5600 polygon objects. I then loop through each of my storm objects to strip out the lat/lon list pairs and convert them into a shapely LineString object, which is then read into geopandas as such:
polyLine = LineString(list(zip(storm_lons, storm_lats)))
coord_tests = gpd.GeoSeries(polyLine)
My goal from here is to simply do something like this:
I = coord_tests.intersects(polyFrame)
to collect the set of polygons that the LineString intersects. However, this prompts the following error:
AttributeError: No geometry data set yet (expected in column 'geometry'.)
I'm wondering if I have something formatted incorrectly here, am passing the call incorrectly to this function, or if there is a more efficient way to accomplish what I am trying to do here.
Any assistance would be greatly appreciated.
Thanks!
You just need to tell GeoPandas which column holds the geometry when constructing the grid frame:
polyFrame = gpd.GeoDataFrame(geometry=grids)
:-)
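For completeness, a minimal sketch of the intersection once the geometry column is set, reusing the names from the question:
# With a proper geometry column, intersects() against a single shapely
# geometry returns a boolean Series, one entry per grid cell.
polyFrame = gpd.GeoDataFrame(geometry=grids)
polyLine = LineString(list(zip(storm_lons, storm_lats)))
hits = polyFrame.intersects(polyLine)
# Keep only the grid cells this storm track passes through.
crossed_cells = polyFrame[hits]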
I have two data frames. One has polygons of buildings (around 70K) and the other has points that may or not be inside the polygons (around 100K). I need to identify if a point is inside a polygon or not.
When I plot both dataframes (example below), the plot shows that some points are inside the polygons and others are not. However, when I use .within(), the outcome says none of the points are inside polygons.
I recreated the example creating one polygon and one point "by hand" rather than importing the data and in this case .within() does recognize that the point is in the polygon. Therefore, I assume I'm making a mistake but I don't know where.
Example: (I'll just post the part that corresponds to one point and one polygon for simplicity. In this case, each data frame contains either a single point or a single polygon)
1) Using the imported data. The data frame dmR has the points and the data frame dmf has the polygon
import pandas as pd
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
from shapely import wkt
from shapely.geometry import Point, Polygon
plt.style.use("seaborn")
# I'm skipping the data manipulation stage and
# going to the point where the data are used.
print(dmR)
geometry
35 POINT (-95.75207 29.76047)
print(dmf)
geometry
41964 POLYGON ((-95.75233 29.76061, -95.75194 29.760...
# Plot
fig, ax = plt.subplots(figsize=(5,5))
minx, miny, maxx, maxy = ([-95.7525, 29.7603, -95.7515, 29.761])
ax.set_xlim(minx, maxx)
ax.set_ylim(miny, maxy)
dmR.plot(ax=ax, c='Red')
dmf.plot(ax=ax, alpha=0.5)
plt.savefig('imported_data.png')
The outcome
shows that the point is inside the polygon. However,
print(dmR.within(dmf))
35 False
41964 False
dtype: bool
2) If I try to recreate this by hand, it would be as follows (there may be a better way to do this but I couldn't figure it out):
# Get the vertices of the polygon to create it by hand
poly1 = dmf['geometry']
g = [i for i in poly1]
x,y = g[0].exterior.coords.xy
x,y
(array('d', [-95.752332508564, -95.75193554162979, -95.75193151831627, -95.75232848525047, -95.752332508564]),
array('d', [29.760606530637265, 29.760607694859385, 29.76044470363038, 29.76044237518235, 29.760606530637265]))
# Create the polygon by hand using the corresponding vertices
coords = [(-95.752332508564, 29.760606530637265),
(-95.75193554162979, 29.760607694859385),
(-95.75193151831627, 29.7604447036303),
(-95.75232848525047, 29.76044237518235),
(-95.752332508564, 29.760606530637265)]
poly = Polygon(coords)
# Create point by hand (just copy the point from 1) above
p1 = Point(-95.75207, 29.76047)
# Create the GeoPandas data frames from the point and polygon
ex = gpd.GeoDataFrame()
ex['geometry']=[poly]
ex = ex.set_geometry('geometry')
ex_p = gpd.GeoDataFrame()
ex_p['geometry'] = [p1]
ex_p = ex_p.set_geometry('geometry')
# Plot and print
fig, ax = plt.subplots(figsize=(5,5))
ax.set_xlim(minx, maxx)
ax.set_ylim(miny, maxy)
ex_p.plot(ax=ax, c='Red')
ex.plot(ax = ax, alpha=0.5)
plt.savefig('by_hand.png')
In this case, the outcome also shows the point in the polygon. However,
ex_p.within(ex)
0 True
dtype: bool
which recognizes that the point is in the polygon. All suggestions on what to do are appreciated! Thanks.
I don't know if this is the most efficient way to do it, but I was able to do what I needed within Python using GeoPandas.
Instead of using the point.within(polygon) approach, I did a spatial join (geopandas.sjoin(df_1, df_2, how='inner', op='contains')). This results in a new data frame that contains the points that are within polygons and excludes the ones that are not. More information on how to do this can be found here.
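Here is a minimal sketch of that join, using the frame names from the question (dmf holds the building polygons and dmR the points; note that newer GeoPandas versions rename the op keyword to predicate):
# Polygons on the left with op='contains': keep every polygon/point
# pair where the building polygon contains the point.
joined = gpd.sjoin(dmf, dmR, how='inner', op='contains')
# Each row is a polygon paired with the index of a point inside it
# (stored in the 'index_right' column).
print(joined.head())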
I assume something is fishy about your coordinate reference system (CRS). I cannot tell about dmR as it is not provided, but ex_p is a naive geometry since you generated it from points without specifying the CRS. You can check the CRS using:
dmR.crs
Let's assume it's in EPSG:4326; then it will return:
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
In this case you would need to set a CRS for ex_p first using:
ex_p = ex_p.set_crs(epsg=4326)
If you want to inherit the CRS of dmR dynamically, you can also use:
ex_p = ex_p.set_crs(dmR.crs)
After you set a crs, you can re-project from one crs to another using:
ex_p = ex_p.to_crs(epsg=3395)
More on that topic:
https://geopandas.org/projections.html
I work with satellite data organised on an irregular two-dimensional grid whose dimensions are scanline (along track dimension) and ground pixel (across track dimension). Latitude and longitude information for each ground pixel are stored in auxiliary coordinate variables.
Given a (lat, lon) point, I would like to identify the closest ground pixel on my set of data.
Let's build a 10x10 toy data set:
import numpy as np
import xarray as xr
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
%matplotlib inline
lon, lat = np.meshgrid(np.linspace(-20, 20, 10),
                       np.linspace(30, 60, 10))
lon += lat/10
lat += lon/10

da = xr.DataArray(data=np.random.normal(0, 1, 100).reshape(10, 10),
                  dims=['scanline', 'ground_pixel'],
                  coords={'lat': (('scanline', 'ground_pixel'), lat),
                          'lon': (('scanline', 'ground_pixel'), lon)})

ax = plt.subplot(projection=ccrs.PlateCarree())
da.plot.pcolormesh('lon', 'lat', ax=ax, cmap=plt.cm.get_cmap('Blues'),
                   infer_intervals=True)
ax.scatter(lon, lat, transform=ccrs.PlateCarree())
ax.coastlines()
ax.gridlines(draw_labels=True)
plt.tight_layout()
Note that the lat/lon coordinates identify the pixel centres, and the pixel boundaries are automatically inferred by xarray.
Now, say I want to identify the closest ground pixel to Rome.
The best way I have come up with so far is to use SciPy's KDTree on a stacked, flattened lat/lon array:
from scipy import spatial
pixel_center_points = np.stack((da.lat.values.flatten(), da.lon.values.flatten()), axis=-1)
tree = spatial.KDTree(pixel_center_points)
rome = (41.9028, 12.4964)
distance, index = tree.query(rome)
print(index)
# 36
I then have to apply unravel_index to get my scanline/ground_pixel indexes:
pixel_coords = np.unravel_index(index, da.shape)
print(pixel_coords)
# (3, 6)
Which gives me the scanline/ground_pixel coordinates of the (supposedly) closest ground pixel to Rome:
ax = plt.subplot(projection=ccrs.PlateCarree())
da.plot.pcolormesh('lon', 'lat', ax=ax, cmap=plt.cm.get_cmap('Blues'),
                   infer_intervals=True)
ax.scatter(da.lon[pixel_coords], da.lat[pixel_coords],
           marker='x', color='r', transform=ccrs.PlateCarree())
ax.coastlines()
ax.gridlines(draw_labels=True)
plt.tight_layout()
I'm convinced there must be a much more elegant way to approach this problem. In particular, I would like to get rid of the flattening/unravelling steps (all my attempts to build a KDTree on a two-dimensional array failed miserably), and make use of my xarray dataset's variables as much as possible (adding a new centre_pixel dimension for example, and using it as input to KDTree).
I am going to answer my own question, as I believe I came up with a decent solution, which is discussed at much greater length in my blog post on this subject.
Geographical distance
First of all, defining the distance between two points on the earth's surface as simply the euclidean distance between the two lat/lon pairs could lead to inaccurate results, depending on how far apart the points are. It is thus better to transform the coordinates to ECEF coordinates first and build the KD-Tree on the transformed coordinates. Assuming points on the surface of the planet (h=0), the coordinate transformation is done as such:
def transform_coordinates(coords):
    """ Transform coordinates from geodetic to cartesian

    Keyword arguments:
    coords - a set of lat/lon coordinates (e.g. a tuple or
             an array of tuples)
    """
    # WGS 84 reference coordinate system parameters
    A = 6378.137  # major axis [km]
    E2 = 6.69437999014e-3  # eccentricity squared

    coords = np.asarray(coords).astype(float)

    # is coords a tuple? Convert it to a one-element array of tuples
    if coords.ndim == 1:
        coords = np.array([coords])

    # convert to radians
    lat_rad = np.radians(coords[:, 0])
    lon_rad = np.radians(coords[:, 1])

    # convert to cartesian coordinates
    r_n = A / (np.sqrt(1 - E2 * (np.sin(lat_rad) ** 2)))
    x = r_n * np.cos(lat_rad) * np.cos(lon_rad)
    y = r_n * np.cos(lat_rad) * np.sin(lon_rad)
    z = r_n * (1 - E2) * np.sin(lat_rad)

    return np.column_stack((x, y, z))
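As a quick sanity check, a single lat/lon tuple comes back as a (1, 3) array of ECEF x, y, z coordinates in km:
rome_ecef = transform_coordinates((41.9028, 12.4964))
print(rome_ecef.shape)  # (1, 3)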
Building the KD-Tree
We could then build the KD-Tree by transforming our dataset coordinates, taking care of flattening the 2D grid to a one-dimensional sequence of lat/lon tuples. This is because the KD-Tree input data needs to be (N, K), where N is the number of points and K is the dimensionality: the stacked lat/lon array is (N, 2), and after the ECEF transform the tree is built on (N, 3) points.
# reshape and stack coordinates
coords = np.column_stack((da.lat.values.ravel(),
                          da.lon.values.ravel()))

# construct KD-tree
ground_pixel_tree = spatial.cKDTree(transform_coordinates(coords))
Querying the tree and indexing the xarray dataset
Querying the tree is now as simple as transforming our point's lat/lon coordinates to ECEF and passing those to the tree's query method:
rome = (41.9028, 12.4964)
distance, index = ground_pixel_tree.query(transform_coordinates(rome))
In doing so though, we need to unravel our index on the original dataset's shape, to get the scanline/ground_pixel indexes:
index = np.unravel_index(index, da.shape)
We could now use the two components to index our original xarray dataset, but we could also build two indexers to use with xarray's pointwise indexing feature:
index = xr.DataArray(index[0], dims='pixel'), \
        xr.DataArray(index[1], dims='pixel')
Getting the closest pixel is now easy and elegant at the same time:
da[index]
Note that we could also query more than one point at once, and by building the indexers as above, we could still index the dataset with a single call:
da[index]
Which would then return a subset of the dataset containing the closest ground pixels to our query points.
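For example, querying Rome and Paris at once might look like this (a sketch following the same pattern; the city coordinates are illustrative):
# Two query points as (lat, lon) pairs.
cities = [(41.9028, 12.4964), (48.8566, 2.3522)]
_, flat_index = ground_pixel_tree.query(transform_coordinates(cities))
idx = np.unravel_index(flat_index, da.shape)
# Pointwise indexers: one call returns both closest pixels.
index = (xr.DataArray(idx[0], dims='pixel'),
         xr.DataArray(idx[1], dims='pixel'))
closest_pixels = da[index]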
Further readings
Using the euclidean norm on the lat/lon tuples could be accurate enough for smaller distances (think of it as approximating the earth as flat; it works on a small scale). More details on geographic distances here.
Using a KD-Tree to find the nearest neighbour is not the only way to address this problem, see this very comprehensive article.
An implementation of KD-Tree directly in xarray is in the pipeline.
My blog post on the subject.
Here is my question:
the 2-D numpy array data represents some property of each grid cell;
the shapefile gives the administrative divisions of the study area (like a city).
For example:
http://i4.tietuku.com/84ea2afa5841517a.png
The whole area is a 40x40 grid network, and I want to extract the data inside the purple area. In other words, I want to mask the data outside the administrative boundary to np.nan.
My early attempt
I labelled the grid cells and manually set the rows outside the boundary to np.nan:
http://i4.tietuku.com/523df4783bea00e2.png
value[0,:] = np.nan
value[1,:] = np.nan
...
Can someone show me an easier method to achieve this?
Add
Found an answer here which can clip the plotted raster data to the shapefile, but the data itself doesn't change.
Update - 2016-01-16
I have already solved this problem, inspired by some of the answers.
Anyone interested in this topic can check these two posts I have asked:
1. Testing point with in/out of a vector shapefile
2. How to use set clipped path for Basemap polygon
The key step was to test each point within/outside of the shapefile, which I had already transformed into a shapely Polygon.
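For reference, a minimal sketch of that shapely containment test (the square polygon here is just a stand-in for the real outline loaded from the shapefile):
from shapely.geometry import Point, Polygon

poly = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
print(poly.contains(Point(2, 2)))  # True: keep this grid point
print(poly.contains(Point(9, 9)))  # False: set it to np.nan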
Step 1. Rasterize shapefile
Create a function that can determine whether a point at coordinates (x, y) is or is not in the area. See here for more details on how to rasterize your shapefile into an array of the same dimensions as your target mask.
def point_is_in_mask(mask, point):
    # this is just pseudocode
    return mask.contains(point)
Step 2. Create your mask
mask = np.zeros((height, width))
value = np.zeros((height, width))
for y in range(height):
    for x in range(width):
        if not point_is_in_mask(mask, (x, y)):
            value[y][x] = np.nan
Best is to use matplotlib:
def outline_to_mask(line, x, y):
    """Create mask from outline contour

    Parameters
    ----------
    line: array-like (N, 2)
    x, y: 1-D grid coordinates (input for meshgrid)

    Returns
    -------
    mask : 2-D boolean array (True inside)
    """
    import matplotlib.path as mplp
    mpath = mplp.Path(line)
    X, Y = np.meshgrid(x, y)
    points = np.array((X.flatten(), Y.flatten())).T
    mask = mpath.contains_points(points).reshape(X.shape)
    return mask
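For example, combined with the 'exploded' British Columbia polygons from the first question above, usage might look like this (a sketch; it assumes the 1-D lons/lats grid vectors and the EBritishColumbia frame defined earlier):
# OR together the masks from all constituent polygons: a grid point is
# inside British Columbia if it falls inside any one of them.
mask = np.zeros((len(lats), len(lons)), dtype=bool)
for _, row in EBritishColumbia.iterrows():
    mask |= outline_to_mask(np.array(row.geometry.exterior.coords),
                            lons, lats)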
Alternatively, you may use shapely's contains method as suggested in the answer above. You may speed up the calculation by recursively sub-dividing the space, as indicated in this gist (but the matplotlib solution was 1.5 times faster in my tests):
https://gist.github.com/perrette/a78f99b76aed54b6babf3597e0b331f8
import netCDF4
import numpy as np
nc_data = netCDF4.Dataset(out_nc, 'w', format='NETCDF4')
nc_data.description = 'Test'
# dimensions
nc_data.createDimension('lat', 720)
nc_data.createDimension('lon', 1440)
fl_res = 0.25
latitudes = np.arange(90.0 - fl_res/2.0, -90.0, -fl_res)
longitudes = np.arange(-180.0 + fl_res/2.0, 180.0, fl_res)
I am creating a 0.25 degree resolution netCDF file. Do the latitudes and longitudes that I am creating represent the corner of each grid cell, or the center? Is there any way I can choose what they represent?
Coordinates (should) represent the centers of each grid cell.
There are some special cases where a variable, oftentimes wind velocity, is solved on the edges of the grid cells. In that case, the variable is attached to a set of lat/lon pairs that has one extra pair in either the x- or y-dimension. In post-processing, though, these edge-based variables are usually interpolated to the centers of each grid cell to follow the set of centered coordinates, like the ones you've defined above.
EDIT: sources
CF conventions - cell boundaries essentially stating that lat/lon are in the center of the grid cell, bounded by the vertices of the grid cell. For most products, only one set of lat/lon coordinates are provided and that can be assumed to be for the centers of the grid cells (CH.4: "If bounds are not provided, an application might reasonably assume the gridpoints to be at the centers of the cells, but we do not require that in this standard.")
NOAA ioSSTv2 - an example of a NOAA product stating that "The latitude/longitude values in the netCDF coordinate variables are the centers of the grid cells."
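If you want the file itself to make the centers explicit, the CF conventions let you attach cell boundaries to each coordinate variable. A minimal sketch continuing the question's code (shown for latitude only; longitude follows the same pattern):
# CF convention: a 'bounds' attribute on the coordinate variable points
# to an (n, 2) variable holding each cell's edges, which makes it
# unambiguous that the coordinate values are the cell centers.
lat_var = nc_data.createVariable('lat', 'f8', ('lat',))
lat_var[:] = latitudes
lat_var.bounds = 'lat_bnds'
nc_data.createDimension('nv', 2)  # two vertices per 1-D cell
lat_bnds = nc_data.createVariable('lat_bnds', 'f8', ('lat', 'nv'))
lat_bnds[:, 0] = latitudes + fl_res/2.0  # northern edge
lat_bnds[:, 1] = latitudes - fl_res/2.0  # southern edge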