Lat and Lon representation in python netCDF4 - python

import netCDF4
import numpy as np
nc_data = netCDF4.Dataset(out_nc, 'w', format='NETCDF4')
nc_data.description = 'Test'
# dimensions
nc_data.createDimension('lat', 720)
nc_data.createDimension('lon', 1440)
fl_res = 0.25
latitudes = np.arange(90.0 - fl_res/2.0, -90.0, -fl_res)
longitudes = np.arange(-180.0 + fl_res/2.0, 180.0, fl_res)
I am creating a 0.25 degree resolution netCDF file. In the latitudes and longitudes that I am creating, do they represent the corner of each grid-cell or the center? Is there any way I can choose what they represent?

Coordinates (should) represent the centers of each grid cell.
There are some special cases where a variable, oftentimes wind velocity, is solved on the edges of the grid cell. In that case, the variable is attached to a set of lat/lon pairs that has one extra pair either in the x- or y-dimension. Though in post-processing, these edge-based variables are usually interpolated to the centers of each grid cell to follow the set of centered-coordinates, like the ones you've defined above.
EDIT: sources
CF conventions, cell boundaries: these essentially state that lat/lon values sit at the center of the grid cell, which is bounded by the vertices of the grid cell. For most products, only one set of lat/lon coordinates is provided, and it can be assumed to be for the centers of the grid cells (CH.4: "If bounds are not provided, an application might reasonably assume the gridpoints to be at the centers of the cells, but we do not require that in this standard.")
NOAA ioSSTv2 - an example of a NOAA product stating that "The latitude/longitude values in the netCDF coordinate variables are the centers of the grid cells."
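To make the choice explicit rather than implied, the cell edges can be computed next to the centers. The following is a minimal sketch using only numpy and the 0.25 degree grid from the question; `cell_bounds` is a hypothetical helper name, not part of netCDF4.

```python
import numpy as np

fl_res = 0.25  # grid resolution in degrees, as in the question

# Cell centers, running north to south and west to east.
lat_centers = np.arange(90.0 - fl_res / 2.0, -90.0, -fl_res)
lon_centers = np.arange(-180.0 + fl_res / 2.0, 180.0, fl_res)


def cell_bounds(centers, res):
    """Return an (n, 2) array holding the two edges of every cell."""
    half = abs(res) / 2.0
    return np.column_stack((centers - half, centers + half))


lat_bnds = cell_bounds(lat_centers, fl_res)
lon_bnds = cell_bounds(lon_centers, fl_res)
```

In a CF-style file these arrays would typically be written to variables such as `lat_bnds` and `lon_bnds` (with a second dimension of size 2), and the coordinate variables would point to them through their `bounds` attribute.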


Python: using polygons to create a mask on a given 2d grid

I have some polygons (Canadian provinces), read in with GeoPandas, and want to use these to create a mask to apply to gridded data on a 2-d latitude-longitude grid (read from a netcdf file using iris). An end goal would be to only have data for a given province remaining, with the rest of the data masked out. So the mask would be 1's for grid boxes within the province, and 0's or NaN's for grid boxes outside the province.
The polygons can be obtained from the shapefile here:
The netcdf file I am using can be downloaded here:
I imagine there are two approaches here but I am struggling with both:
1) Use the polygon to create a mask on the latitude-longitude grid so that this can be applied to lots of datafiles outside of python (preferred)
2) Use the polygon to mask the data that have been read in and extract only the data inside the province of interest, to work with interactively.
My code so far:
import iris
import geopandas as gpd
#read the shapefile and extract the polygon for a single province
#(province names stored as variable 'NAME_1')
BritishColumbia=Canada[Canada['NAME_1'] == 'British Columbia']
#get the latitude-longitude grid from netcdf file
#create 2d grid from lats and lons (may not be necessary?)
Thanks very much for any help or advice.
UPDATE: Following the great solution from @DPeterK below, my original data can be masked, giving the following:
It looks like you have started well! Geometries loaded from shapefiles expose various geospatial comparison methods, and in this case you need the contains method. You can use this to test each point in your cube's horizontal grid for being contained within your British Columbia geometry. (Note that this is not a fast operation!) You can use this comparison to build up a 2D mask array, which could be applied to your cube's data or used in other ways.
I've written a Python function to do the above – it takes a cube and a geometry and produces a mask for the (specified) horizontal coordinates of the cube, and applies the mask to the cube's data. The function is below:
import numpy as np
from shapely.geometry import Point

def geom_to_masked_cube(cube, geometry, x_coord, y_coord,
                        mask_excludes=False):
    """
    Convert a shapefile geometry into a mask for a cube's data.

    * cube:
        The cube to mask.
    * geometry:
        A geometry from a shapefile to define a mask.
    * x_coord: (str or coord)
        A reference to a coord describing the cube's x-axis.
    * y_coord: (str or coord)
        A reference to a coord describing the cube's y-axis.
    * mask_excludes: (bool, default False)
        If False, the mask will exclude the area of the geometry from the
        cube's data. If True, the mask will include *only* the area of the
        geometry in the cube's data.

    .. note::
        This function does *not* preserve lazy cube data.

    """
    # Get horizontal coords for masking purposes.
    lats = cube.coord(y_coord).points
    lons = cube.coord(x_coord).points
    lon2d, lat2d = np.meshgrid(lons, lats)
    # Reshape to 1D for easier iteration.
    lon2 = lon2d.reshape(-1)
    lat2 = lat2d.reshape(-1)
    mask = []
    # Iterate through all horizontal points in cube, and
    # check for containment within the specified geometry.
    for lat, lon in zip(lat2, lon2):
        this_point = Point(lon, lat)
        res = geometry.contains(this_point)
        # geometry is a one-element GeoSeries, so keep its single result.
        mask.append(res.values[0])
    mask = np.array(mask).reshape(lon2d.shape)
    if mask_excludes:
        # Invert the mask if we want to include the geometry's area.
        mask = ~mask
    # Make sure the mask is the same shape as the cube.
    dim_map = (cube.coord_dims(y_coord)[0],
               cube.coord_dims(x_coord)[0])
    cube_mask = iris.util.broadcast_to_shape(mask, cube.shape, dim_map)
    # Apply the mask to the cube's data.
    data = cube.data
    masked_data = np.ma.masked_array(data, cube_mask)
    cube.data = masked_data
    return cube
If you just need the 2D mask you could return that before the above function applies it to the cube.
To use this function in your original code, add the following at the end of your code:
geometry = BritishColumbia.geometry
masked_cube = geom_to_masked_cube(cube, geometry,
                                  'longitude', 'latitude',
                                  mask_excludes=True)
If this doesn't mask anything it might well mean that your cube and geometry are defined on different extents. For example, if your cube's longitude coordinate runs from 0° to 360° and the geometry's longitude values run from -180° to 180°, then the containment test will never return True for a region in the western hemisphere such as British Columbia. You can fix this by changing the extents of your cube with the following:
cube = cube.intersection(longitude=(-180, 180))
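For the first approach in the question (a mask usable outside iris), the same longitude shift can be done on plain arrays. This is a minimal numpy sketch with made-up sample longitudes; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical longitudes on a 0..360 axis, with one data value per longitude.
lons_0_360 = np.array([0.0, 90.0, 180.0, 270.0, 359.75])
data = np.arange(lons_0_360.size)

# Map 0..360 onto -180..180 (180 itself becomes -180).
lons_pm180 = ((lons_0_360 + 180.0) % 360.0) - 180.0

# Re-sort so the longitude axis is increasing again, and reorder the data to match.
order = np.argsort(lons_pm180)
lons_sorted = lons_pm180[order]
data_sorted = data[order]
```

For a 2D or 3D field the same `order` index would be applied along the longitude axis, for example `field[..., order]` when longitude is the last dimension.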
I found an alternative solution to the excellent one posted by @DPeterK above, which yields the same result. It uses matplotlib.path to test if points are contained within the exterior coordinates described by the geometries loaded from a shape file. I am posting this because this method is ~10 times faster than that given by @DPeterK (2:23 minutes vs 25:56 minutes). I'm not sure what is preferable: an elegant solution, or a speedy, brute force solution. Perhaps one can have both?!
One complication with this method is that some geometries are MultiPolygons - i.e. the shape consists of several smaller polygons (in this case, the province of British Columbia includes islands off of the west coast, which can't be described by the coordinates of the mainland British Columbia Polygon). The MultiPolygon has no exterior coordinates but the individual polygons do, so these each need to be treated individually. I found that the neatest solution to this was to use a function copied from GitHub, which 'explodes' MultiPolygons into a list of individual polygons that can then be treated separately.
The working code is outlined below, with my documentation. Apologies that it is not the most elegant code - I am relatively new to Python and I'm sure there are lots of unnecessary loops/neater ways to do things!
import numpy as np
import iris
import geopandas as gpd
from shapely.geometry import Point
import matplotlib.path as mpltPath
from shapely.geometry.polygon import Polygon
from shapely.geometry.multipolygon import MultiPolygon
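#FIRST, read in the target data and latitude-longitude grid from netcdf file
#create 2d grid from lats and lons
lon2d, lat2d = np.meshgrid(lons, lats)
#create a list of coordinates of all points within grid
points = []
for latit in range(0,241):
    for lonit in range(0,480):
        points.append((lon2d[latit,lonit], lat2d[latit,lonit]))
#turn into np array for later
points = np.array(points)
#get the cube data - useful for later
fld = np.squeeze(cube.data)
#create a mask array of zeros, same shape as fld, to be modified by
#the code below
mask = np.zeros_like(fld)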
#NOW, read the shapefile and extract the polygon for a single province
#(province names stored as variable 'NAME_1')
BritishColumbia=Canada[Canada['NAME_1'] == 'British Columbia']
#BritishColumbia.geometry.type reveals this to be a 'MultiPolygon'
#i.e. several (in this case, thousands...) of individual polygons.
#I ultimately want to get the exterior coordinates of the BritishColumbia
#polygon, but a MultiPolygon is a list of polygons and therefore has no
#exterior coordinates. There are probably many ways to progress from here,
#but the method I have stumbled upon is to 'explode' the multipolygon into
#its individual polygons and treat each individually. The function below
#to 'explode' the MultiPolygon was found here:
#---define function to explode MultiPolygons
def explode_polygon(indata):
    indf = indata
    outdf = gpd.GeoDataFrame(columns=indf.columns)
    for idx, row in indf.iterrows():
        if type(row.geometry) == Polygon:
            #note: now redundant, but function originally worked on
            #a shapefile which could have combinations of individual polygons
            #and MultiPolygons
            outdf = outdf.append(row,ignore_index=True)
        if type(row.geometry) == MultiPolygon:
            multdf = gpd.GeoDataFrame(columns=indf.columns)
            recs = len(row.geometry)
            multdf = multdf.append([row]*recs,ignore_index=True)
            for geom in range(recs):
                multdf.loc[geom,'geometry'] = row.geometry[geom]
            outdf = outdf.append(multdf,ignore_index=True)
    return outdf
#Explode the BritishColumbia MultiPolygon into its constituents
EBritishColumbia = explode_polygon(BritishColumbia)
#Loop over each individual polygon and get external coordinates
for index,row in EBritishColumbia.iterrows():
    print('working on polygon', index)
    mypolygon = []
    for pt in list(row['geometry'].exterior.coords):
        print(index, ', ', pt)
        mypolygon.append(pt)
    #See if any of the original grid points read from the netcdf file earlier
    #lie within the exterior coordinates of this polygon
    #pth.contains_points returns a boolean array (true/false), in the
    #shape of 'points'
    pth = mpltPath.Path(mypolygon)
    inside = pth.contains_points(points)
    #find the results in the array that were inside the polygon ('True')
    #and flag them in the mask. First, must reshape the result of the search
    #('inside') so that it matches the mask & original data
    #reshape the result to the main grid array
    inside = np.array(inside).reshape(lon2d.shape)
    i=np.where(inside == True)
    mask[i] = 1
print('finished checking for points inside all polygons')
#mask now contains 0's for points that are not within British Columbia, and
#1's for points that are. FINALLY, use this to mask the original data
#(stored as 'fld')
i=np.where(mask == 0)
fld[i] = np.nan
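The core of the matplotlib.path approach, stripped of the file handling, can be sketched on a toy grid. The grid and the triangular polygon below are made up for illustration; only numpy and matplotlib.path are assumed.

```python
import numpy as np
import matplotlib.path as mpltPath

# Hypothetical 1-degree grid and a single triangular polygon given as (lon, lat).
lons = np.arange(0.0, 10.0, 1.0)
lats = np.arange(0.0, 5.0, 1.0)
polygon = [(1.5, 0.5), (7.5, 0.5), (4.5, 4.5)]

# Flatten the grid into an (n_points, 2) array of (lon, lat) pairs.
lon2d, lat2d = np.meshgrid(lons, lats)
points = np.column_stack((lon2d.ravel(), lat2d.ravel()))

# One vectorised call tests every grid point against the polygon.
inside = mpltPath.Path(polygon).contains_points(points)

# Reshape back to the grid: 1 inside the polygon, 0 outside.
mask = inside.reshape(lon2d.shape).astype(int)
```

For a MultiPolygon, one such `inside` array per constituent polygon could be combined with a logical OR before building the final mask.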

Healpy coordinate error after interpolation: appearance of bisector

I have a coarse skymap made up of 128 points, from which I would like to make a smooth healpix map (see attached Figure, LHS).
I load my data, then make new longitude and latitude arrays of the appropriate pixel length for the final map (with e.g. nside=32).
My input data are:
lats = pi/2 + ths # theta from 0, pi, size 8
lons = phs # phi from 0, 2pi, size 16
data = sky_data[0] # shape (8,16)
New lon/lat array size based on number of pixels from nside:
nside = 32
pixIdx = hp.nside2npix(nside) # number of pixels I can get from this nside
pixIdx = np.arange(pixIdx) # pixel index numbers
I then find the new data values for those pixels by interpolation, and then convert back from angles to pixels.
# new lon/lat
new_lats = hp.pix2ang(nside, pixIdx)[0] # thetas I need to populate with interpolated theta values
new_lons = hp.pix2ang(nside, pixIdx)[1] # phis, same
# interpolation
lut = RectSphereBivariateSpline(lats, lons, data, pole_values=4e-14)
data_interp = lut.ev(new_lats.ravel(), new_lons.ravel()) #interpolate the data
pix = hp.ang2pix(nside, new_lats, new_lons) # convert latitudes and longitudes back to pixels
Then, I construct a healpy map with the interpolated values:
healpix_map = np.zeros(hp.nside2npix(nside), dtype=np.double) # create empty map
healpix_map[pix] = data_interp # assign pixels to new interpolated values
testmap = hp.mollview(healpix_map)
The result of the map is the upper RHS of the attached Figure.
(Forgive the use of jet -- viridis doesn't have a "white" zero, so using that colormap adds a blue background.)
The map doesn't look right: you can see from the coarse map in the Figure that there should be a "hotspot" on the lower RHS, but here it appears in the upper left.
As a sanity check, I used matplotlib to make a scatter plot of the interpolated points in a mollview projection, Figure 2, where I removed the edges of the markers to make it look like a map ;)
ax = plt.subplot(111, projection='astro mollweide')
colors = data_interp
sky=plt.scatter(new_lons, new_lats-pi/2, c = colors, edgecolors='none', cmap ='jet')
plt.colorbar(sky, orientation = 'horizontal')
You can see that this map, lower RHS of attached Figure, produces exactly what I expect! So the coordinates are ok, and I am completely confused.
Has anyone encountered this before? What can I do? I'd like to use the healpy functions on this and future maps, so just using matplotlib isn't an option.
I figured it out -- I had to add pi/2 to my thetas for the interpolation to work, so in the end need to apply the following transformation for the image to render correctly:
newnew_lats = pi - new_lats
newnew_lons = pi + new_lons
There still seems to be a bit of an issue with the interpolation, although the seam is not so visible now. I may try a different one to compare.
I'm no expert in healpix (actually I've never used it before - I'm a particle physicist), but as far as I can tell it's just a matter of conventions: in a Mollweide projection, healpy places the north pole (positive latitude) at the bottom of the map, for some reason. I'm not sure why it would do that, or whether this is intentional behavior, but it seems pretty clear that's what is happening. If I mask out everything below the equator, i.e. keep only the positive-latitude points
mask = new_lats - pi/2 > 0
pix = hp.ang2pix(nside, new_lats[mask], new_lons[mask])
healpix_map = np.zeros(hp.nside2npix(nside), dtype=np.double)
healpix_map[pix] = data_interp[mask]
testmap = hp.mollview(healpix_map)
it comes up with a plot with no data above the center line:
At least it's easy enough to fix. mollview admits a rot parameter that will effectively rotate the sphere around the viewing axis before projecting it, and a flip parameter which can be set to 'astro' (default) or 'geo' to set whether east is shown at the left or right. A little experimentation shows that you get the coordinate system you want with
hp.mollview(healpix_map, rot=(180, 0, 180), flip='geo')
In the tuple, the first two elements are longitude and latitude of the point to set in the center of the plot, and the third element is the rotation. All are in degrees. With no mask it gives this:
which I believe is just what you're looking for.
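A possible reading of the flip: by default, healpy's pix2ang and ang2pix work with the colatitude theta, which is 0 at the north pole and pi at the south pole, so new_lats - pi/2 has the opposite sign to the usual latitude. The conversion can be sketched with numpy alone; `colat_to_lat` and `lat_to_colat` are hypothetical helper names, not healpy functions.

```python
import numpy as np


def colat_to_lat(theta):
    """Colatitude (0 at the north pole) to latitude (+pi/2 at the north pole)."""
    return np.pi / 2.0 - theta


def lat_to_colat(lat):
    """Latitude (+pi/2 at the north pole) to colatitude (0 at the north pole)."""
    return np.pi / 2.0 - lat


theta = np.array([0.0, np.pi / 2.0, np.pi])  # north pole, equator, south pole
lat = colat_to_lat(theta)
```

Under this convention, data indexed by latitude would be converted with `lat_to_colat` before being passed to `hp.ang2pix`, instead of adding pi/2.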

Why are Basemap south polar stereographic map projection coordinates not agreeing with those of data sets in the same projection?

Some satellite based earth observation products provide latitude/longitude information while others provide the X/Y coordinates within a given grid projection (and there are also some having both, see example).
My approach in the second case is to set up a Basemap map which has the same parameters (projection, ellipsoid, origin of map) as given by the data provider in a way that the given X/Y values equal the Basemap coordinates. However if I do so the geolocation does not agree with other data sets including the Basemap coastline.
I have experienced this with three different data sets from different trustworthy sources. For the minimal example I use Landsat data provided by the U.S. Geological Survey which includes both, X/Y coordinates of a South Polar Stereographic grid and the corresponding lat/lon coordinates for all four corners of the image.
From a Landsat metafile we get (ID: LC82171052016079LGN00):
By using Basemap with the right map projection we should be able to derive the corner lat/lon values from the X/Y values:
import numpy as np
from mpl_toolkits.basemap import Basemap
m=Basemap(resolution='h',projection='spstere', ellps='WGS84', boundinglat=-60,lon_0=180, lat_ts=-71)
x_crn=np.array([-2259300,-1981500,-2259300,-1981500])# upper left, upper right, lower left, lower right
y_crn=np.array([1236000, 1236000, 958500, 958500])# upper left, upper right, lower left, lower right
x0, y0= m(0, -90)
#Basemap coordinates at the south pole
#note that (0,0) of the Basemap is in a corner of the map,
#while other data sets use the south pole.
#This is easy to take into account:
lon_crn, lat_crn = m(x0-x_crn, y0-y_crn, inverse=True)
print('lon_crn: '+str(lon_crn))
print('lat_crn: '+str(lat_crn))
Which returns:
lon_crn: [-61.31816102 -58.04532791 -67.01108782 -64.1858106 ]
lat_crn: [-67.23548626 -69.3099076 -68.28071626 -70.47651326]
As you can see, the longitudes agree to the given precision with those from the metafile, but the latitudes are too low.
I can approximate the latitudes by:
But this is really not satisfying.
This is how the image is located if using the X/Y corner coordinates from the metafile (in red the Basemap drawcoastlines()):
and this is how it looks using the corner lat/lon:
In this case I can simply use the lat/lon coordinates, but as mentioned before there are datasets (like this) which are provided with X/Y coordinates only, which makes it very important to be able to rely on the Basemap projection. I know that there are other modules to re-project the data as a potential workaround, but it should work without other modules, and a re-projection could introduce errors itself.
As this problem appears with different data sets I like to believe that it is a bug in the Basemap module, but I might also make the same mistake again and again or have wrong expectations.
I did some experimentation and it seems like changing lat_ts has no effect with projection='spstere'. In fact, it seems as if the projection latitude is implicitly assumed to be lat_ts=-90. regardless of what value you assign.
I had more success using projection='stere' instead, so that you would construct the Basemap in your example as follows:
m=Basemap(width=5400000., height=5400000., projection='stere',
ellps='WGS84', lon_0=180., lat_0=-90., lat_ts=-71.)
You may prefer to set the latitude and longitude of the corners instead of the width and height of the plot for your application.
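The effect of ignoring lat_ts can be estimated without Basemap. The following is a minimal sketch, assuming the ellipsoidal polar stereographic formulas from Snyder's "Map Projections: A Working Manual" and the WGS84 constants; `polar_radius` and `_t` are hypothetical helper names, not Basemap API. It compares the map distance from the pole for a true-scale latitude of 71° with the distance for true scale at the pole (90°), using absolute latitudes so the same code serves either hemisphere.

```python
import numpy as np

A_WGS84 = 6378137.0            # semi-major axis in metres
F_WGS84 = 1.0 / 298.257223563  # flattening
E = np.sqrt(2.0 * F_WGS84 - F_WGS84 ** 2)  # first eccentricity


def _t(phi):
    """Snyder's auxiliary function t for a latitude phi in radians."""
    s = E * np.sin(phi)
    return np.tan(np.pi / 4.0 - phi / 2.0) / ((1.0 - s) / (1.0 + s)) ** (E / 2.0)


def polar_radius(abs_lat_deg, lat_ts_deg):
    """Map distance (m) from the pole for points at the given |latitude|."""
    phi = np.radians(abs_lat_deg)
    if lat_ts_deg >= 90.0:
        # True scale at the pole itself.
        k = np.sqrt((1.0 + E) ** (1.0 + E) * (1.0 - E) ** (1.0 - E))
        return 2.0 * A_WGS84 * _t(phi) / k
    # True scale along the parallel lat_ts_deg.
    phi_c = np.radians(lat_ts_deg)
    m_c = np.cos(phi_c) / np.sqrt(1.0 - (E * np.sin(phi_c)) ** 2)
    return A_WGS84 * m_c * _t(phi) / _t(phi_c)


lats = np.array([60.0, 67.0, 70.0, 80.0])
ratio = polar_radius(lats, 71.0) / polar_radius(lats, 90.0)
```

The ratio is the same at every latitude (roughly 0.973), so X/Y values produced with lat_ts=-71 but interpreted with an implied lat_ts=-90 would land uniformly closer to the pole. That would be consistent with the longitudes agreeing while the latitudes come out too far south.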

Interpolating Scattered Data from a Volume that has Empty Space

I have 3d data produced from mesh points. The structure that was meshed is complex enough that interpolation using griddata is lacking. Specifically, there are regions without data points which are being given values by griddata that are not the fill_value. I need these hollow regions to have the value of 0.0, which I set fill_value to.
A simplified version of this is illustrated below:
The area occupied by the cylinder has no data points but the rest of the cube volume does. There will be data points from interpolation inside the cylinder but I need them to be zero.
Below is a slice parallel to the xy plane of the actual interpolated data, with a black oval that approximates the edge of the 'cylinder'. The red and blue 'bleed' into the void after interpolation. The fill value of 0.0 can be seen in the upper left corner:
Any ideas on how I can achieve the goal of setting those values to 0.0? Note that the 'cylinder' is not of constant shape.
I thought about going z layer by z layer and finding a polygon that gives the cylinder shape and then setting points inside the polygon to zero.
I also thought about partitioning the volume so a portion of the cylinder ends up in the corners of the partition (for each z layer) and hoping that the interpolator would not try to extrapolate into the void region.
The first option seems better, but I would like to know if Python provides some sort of functionality which would work better.
EDIT: Here are some actual points from the data set:
The z scale is much smaller than x or y. You can see that the regions I'm interested in are pretty well defined. But, again, how do I identify them for the purposes of setting grid points to 0.0?

Interpolation over an irregular grid

So, I have three numpy arrays which store latitude, longitude, and some property value on a grid -- that is, I have LAT(y,x), LON(y,x), and, say temperature T(y,x), for some limits of x and y. The grid isn't necessarily regular -- in fact, it's tripolar.
I then want to interpolate these property (temperature) values onto a bunch of different lat/lon points (stored as lat1(t), lon1(t), for about 10,000 t...) which do not fall on the actual grid points. I've tried matplotlib.mlab.griddata, but that takes far too long (it's not really designed for what I'm doing, after all). I've also tried scipy.interpolate.interp2d, but I get a MemoryError (my grids are about 400x400).
Is there any sort of slick, preferably fast way of doing this? I can't help but think the answer is something obvious... Thanks!!
Try inverse-distance weighting combined with a nearest-neighbour search, as described in another SO answer. The combination works nicely in 2d, 3d ...; inverse-distance weighting is smooth and local, and k, the number of nearest neighbours, can be varied to trade off speed against accuracy.
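The idea can be sketched as follows. This is a minimal example, assuming scipy.spatial.cKDTree for the neighbour search; `idw_interpolate` is a hypothetical function name, and the random points stand in for the flattened LON(y,x), LAT(y,x) and T(y,x) arrays.

```python
import numpy as np
from scipy.spatial import cKDTree


def idw_interpolate(src_xy, src_values, dst_xy, k=8, power=2.0, eps=1e-12):
    """Inverse-distance-weighted mean of the k nearest source points (k >= 2)."""
    tree = cKDTree(src_xy)
    dist, idx = tree.query(dst_xy, k=k)  # both have shape (n_dst, k)
    # Clamp tiny distances so a coincident point dominates instead of dividing by zero.
    weights = 1.0 / np.maximum(dist, eps) ** power
    weights /= weights.sum(axis=1, keepdims=True)
    return np.sum(weights * src_values[idx], axis=1)


# Hypothetical scattered source points with a smooth field defined on them.
rng = np.random.default_rng(0)
src_xy = rng.uniform(0.0, 10.0, size=(2000, 2))
src_values = np.sin(src_xy[:, 0]) + np.cos(src_xy[:, 1])

# Hypothetical target points that do not fall on the source points.
dst_xy = rng.uniform(1.0, 9.0, size=(100, 2))
result = idw_interpolate(src_xy, src_values, dst_xy)
```

With real lat/lon data, plain Euclidean distance in degrees becomes unreliable near the poles; converting both point sets to 3D Cartesian coordinates on the unit sphere before building the tree is one way around that.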
There is a nice inverse distance example by Roger Veciana i Rovira along with some code using GDAL to write to geotiff if you're into that.
This is of course interpolation to a regular grid, so it assumes you first project the data to a pixel grid with pyproj or something similar, all the while being careful about which projection is used for your data.
A copy of his algorithm and example script:
from math import pow
from math import sqrt
import numpy as np
import matplotlib.pyplot as plt

def pointValue(x,y,power,smoothing,xv,yv,values):
    nominator = 0
    denominator = 0
    for i in range(0,len(values)):
        dist = sqrt((x-xv[i])*(x-xv[i])+(y-yv[i])*(y-yv[i])+smoothing*smoothing)
        #If the point is really close to one of the data points, return the data point value to avoid singularities
        if dist < 0.0000000001:
            return values[i]
        nominator = nominator + (values[i]/pow(dist,power))
        denominator = denominator + (1/pow(dist,power))
    #Return NODATA if the denominator is zero
    if denominator > 0:
        value = nominator/denominator
    else:
        value = -9999
    return value

def invDist(xv,yv,values,xsize=100,ysize=100,power=2,smoothing=0):
    valuesGrid = np.zeros((ysize,xsize))
    for x in range(0,xsize):
        for y in range(0,ysize):
            valuesGrid[y][x] = pointValue(x,y,power,smoothing,xv,yv,values)
    return valuesGrid

if __name__ == "__main__":
    #Interpolation parameters, set here to the defaults of invDist
    power = 2
    smoothing = 0
    #Creating some data, with each coordinate and the values stored in separated lists
    xv = [10,60,40,70,10,50,20,70,30,60]
    yv = [10,20,30,30,40,50,60,70,80,90]
    values = [1,2,2,3,4,6,7,7,8,10]
    #Creating the output grid (100x100, in the example)
    ti = np.linspace(0, 100, 100)
    XI, YI = np.meshgrid(ti, ti)
    #Creating the interpolation function and populating the output matrix value
    ZI = invDist(xv,yv,values,100,100,power,smoothing)
    # Plotting the result
    plt.subplot(1, 1, 1)
    plt.pcolor(XI, YI, ZI)
    plt.scatter(xv, yv, 100, values)
    plt.title('Inv dist interpolation - power: ' + str(power) + ' smoothing: ' + str(smoothing))
    plt.xlim(0, 100)
    plt.ylim(0, 100)
    plt.show()
There's a bunch of options here, which one is best will depend on your data...
However I don't know of an out-of-the-box solution for you
You say your input data is from tripolar data. There are three main cases for how this data could be structured.
1) Sampled from a 3d grid in tripolar space, projected back to 2d LAT, LON data.
2) Sampled from a 2d grid in tripolar space, projected into 2d LAT LON data.
3) Unstructured data in tripolar space projected into 2d LAT LON data.
The easiest of these is 2. Instead of interpolating in LAT LON space, "just" transform your point back into the source space and interpolate there.
Another option that works for 1 and 2 is to search for the cells that map from tripolar space to cover your sample point. (You can use a BSP or grid type structure to speed up this search.) Pick one of the cells, and interpolate inside it.
Finally there's a heap of unstructured interpolation options .. but they tend to be slow.
A personal favourite of mine is to use a linear interpolation of the nearest N points; finding those N points can again be done with gridding or a BSP. Another good option is to Delaunay triangulate the unstructured points and interpolate on the resulting triangular mesh.
Personally if my mesh was case 1, I'd use an unstructured strategy as I'd be worried about having to handle searching through cells with overlapping projections. Choosing the "right" cell would be difficult.
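The Delaunay option mentioned above can be sketched with scipy.interpolate.LinearNDInterpolator, which triangulates the scattered points and interpolates linearly inside each triangle. This is a minimal example on made-up points, with a planar test field standing in for the temperature data.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# Hypothetical scattered source points with a planar field defined on them.
rng = np.random.default_rng(1)
src_xy = rng.uniform(0.0, 1.0, size=(500, 2))
src_t = 2.0 * src_xy[:, 0] - 3.0 * src_xy[:, 1] + 1.0

# The triangulation is built once and can be reused for any number of targets.
interp = LinearNDInterpolator(src_xy, src_t)

# Hypothetical target points, kept well inside the convex hull of the sources.
dst_xy = rng.uniform(0.3, 0.7, size=(50, 2))
dst_t = interp(dst_xy)
```

Targets outside the convex hull of the source points come back as NaN by default, so they need separate handling.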
I suggest taking a look at the interpolation features of GRASS (an open source GIS package). It's not in Python, but you can reimplement it or interface with the C code.
Am I right in thinking your data grids look something like this (red is the old data, blue is the new interpolated data)?
This might be a slightly brute-force-ish approach, but what about rendering your existing data as a bitmap (opengl will do simple interpolation of colours for you with the right options configured and you could render the data as triangles which should be fairly fast). You could then sample pixels at the locations of the new points.
Alternatively, you could sort your first set of points spatially and then find the closest old points surrounding your new point and interpolate based on the distances to those points.
There is a FORTRAN library called BIVAR, which is very suitable for this problem. With a few modifications you can make it usable in python using f2py.
From the description:
BIVAR is a FORTRAN90 library which interpolates scattered bivariate data, by Hiroshi Akima.
BIVAR accepts a set of (X,Y) data points scattered in 2D, with associated Z data values, and is able to construct a smooth interpolation function Z(X,Y), which agrees with the given data, and can be evaluated at other points in the plane.

