Temperature and Salinty from CMEMS netcdf file for specific geographical location - python

I want to get the temperature ['thetao'] and salinity ['so'] of the sea surface (just the top layer) for specific geographical location.
I found guidance for how to do this on this website.
import netCDF4 as nc
import numpy as np
fn = "\\...\...\Downloads\global-analysis-forecast-phy-001-024_1647367066622.nc" # path to netcdf file
ds = nc.Dataset(fn) # read as netcdf dataset
print(ds)
print(ds.variables.keys()) # get all variable names
temp = ds.variables['thetao']
sal = ds.variables['so']
lat,lon = ds.variables['latitude'], ds.variables['longitude']
# extract lat/lon values (in degrees) to numpy arrays
latvals = lat[:]; lonvals = lon[:]
# a function to find the index of the point closest pt
# (in squared distance) to give lat/lon value.
def getclosest_ij(lats,lons,latpt,lonpt):
# find squared distance of every point on grid
dist_sq = (lats-latpt)**2 + (lons-lonpt)**2
# 1D index of minimum dist_sq element
minindex_flattened = dist_sq.argmin()
# Get 2D index for latvals and lonvals arrays from 1D index
return np.unravel_index(minindex_flattened, lats.shape)
iy_min, ix_min = getclosest_ij(latvals, lonvals, 50., -140)
print(iy_min)
print(ix_min)
# Read values out of the netCDF file for temperature and salinity
print('%7.4f %s' % (temp[0,0,iy_min,ix_min], temp.units))
print('%7.4f %s' % (sal[0,0,iy_min,ix_min], sal.units))
Some details on the nc-file I am using:
dimensions(sizes): time(1), depth(1), latitude(2041), longitude(4320)
variables(dimensions): float32 depth(depth), float32 latitude(latitude), int16 thetao(time, depth, latitude, longitude), float32 time(time), int16 so(time, depth, latitude, longitude), float32 longitude(longitude)
groups:
dict_keys(['depth', 'latitude', 'thetao', 'time', 'so', 'longitude'])
I am getting this error:
dist_sq = (lats-latpt)**2 + (lons-lonpt)**2
ValueError: operands could not be broadcast together with shapes (2041,) (4320,)
I suspect there is an issue with the shapes/arrays. In the example of the website (link above) the Lat and Lon have a (x,y), however this NC file only has for Latitude (2041,) and for Longitude (4320,).
How can I solve this?

It's because the lats and lons are vectors with different size...
I usually do this if using WGS84 or degrees as unit:
lonm,latm = np.meshgrid(lons,lats)
dmat = (np.cos(latm*np.pi/180.0)*(lonm-lonpt)*60.*1852)**2+((latm-latpt)*60.*1852)**2
Now you can find the closest point:
kkd = np.where(dmat==np.nanmin(dmat))

Related

get elevation from lat/long of geotiff data in gdal

I have a mosaic tif file (gdalinfo below) I made (with some additional info on the tiles here) and have looked extensively for a function that simply returns the elevation (the z value of this mosaic) for a given lat/long. The functions I've seen want me to input the coordinates in the coordinates of the mosaic, but I want to use lat/long, is there something about GetGeoTransform() that I'm missing to achieve this?
This example for instance here shown below:
from osgeo import gdal
import affine
import numpy as np
def retrieve_pixel_value(geo_coord, data_source):
"""Return floating-point value that corresponds to given point."""
x, y = geo_coord[0], geo_coord[1]
forward_transform = \
affine.Affine.from_gdal(*data_source.GetGeoTransform())
reverse_transform = ~forward_transform
px, py = reverse_transform * (x, y)
px, py = int(px + 0.5), int(py + 0.5)
pixel_coord = px, py
data_array = np.array(data_source.GetRasterBand(1).ReadAsArray())
return data_array[pixel_coord[0]][pixel_coord[1]]
This gives me an out of bounds error as it's likely expecting x/y coordinates (e.g. retrieve_pixel_value([153.023499,-27.468968],dataset). I've also tried the following from here:
import rasterio
dat = rasterio.open(fname)
z = dat.read()[0]
def getval(lon, lat):
idx = dat.index(lon, lat, precision=1E-6)
return dat.xy(*idx), z[idx]
Is there a simple adjustment I can make so my function can query the mosaic in lat/long coords?
Much appreciated.
Driver: GTiff/GeoTIFF
Files: mosaic.tif
Size is 25000, 29460
Coordinate System is:
PROJCRS["GDA94 / MGA zone 56",
BASEGEOGCRS["GDA94",
DATUM["Geocentric Datum of Australia 1994",
ELLIPSOID["GRS 1980",6378137,298.257222101004,
LENGTHUNIT["metre",1]],
ID["EPSG",6283]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433,
ID["EPSG",9122]]]],
CONVERSION["UTM zone 56S",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",0,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",153,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.9996,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",500000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",10000000,
LENGTHUNIT["metre",1],
ID["EPSG",8807]],
ID["EPSG",17056]],
CS[Cartesian,2],
AXIS["easting",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["northing",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
Data axis to CRS axis mapping: 1,2
Origin = (491000.000000000000000,6977000.000000000000000)
Pixel Size = (1.000000000000000,-1.000000000000000)
Metadata:
AREA_OR_POINT=Area
Image Structure Metadata:
INTERLEAVE=BAND
Corner Coordinates:
Upper Left ( 491000.000, 6977000.000) (152d54'32.48"E, 27d19'48.33"S)
Lower Left ( 491000.000, 6947540.000) (152d54'31.69"E, 27d35'45.80"S)
Upper Right ( 516000.000, 6977000.000) (153d 9'42.27"E, 27d19'48.10"S)
Lower Right ( 516000.000, 6947540.000) (153d 9'43.66"E, 27d35'45.57"S)
Center ( 503500.000, 6962270.000) (153d 2' 7.52"E, 27d27'47.16"S)
Band 1 Block=25000x1 Type=Float32, ColorInterp=Gray
NoData Value=-999
Update 1 - I tried the following:
tif = r"mosaic.tif"
dataset = rio.open(tif)
d = dataset.read()[0]
def get_xy_coords(latlng):
transformer = Transformer.from_crs("epsg:4326", dataset.crs)
coords = [transformer.transform(x, y) for x,y in latlng][0]
#idx = dataset.index(coords[1], coords[0])
return coords #.xy(*idx), z[idx]
longx,laty = 153.023499,-27.468968
coords = get_elevation([(laty,longx)])
print(coords[0],coords[1])
print(dataset.width,dataset.height)
(502321.11181384244, 6961618.891167777)
25000 29460
So something is still not right. Maybe I need to subtract the coordinates from the bottom left/right of image e.g.
coords[0]-dataset.bounds.left,coords[1]-dataset.bounds.bottom
where
In [78]: dataset.bounds
Out[78]: BoundingBox(left=491000.0, bottom=6947540.0, right=516000.0, top=6977000.0)
Update 2 - Indeed, subtracting the corners of my box seems to get closer.. though I'm sure there is a much nice way just using the tif metadata to get what I want.
longx,laty = 152.94646, -27.463175
coords = get_xy_coords([(laty,longx)])
elevation = d[int(coords[1]-dataset.bounds.bottom),int(coords[0]-dataset.bounds.left)]
fig,ax = plt.subplots(figsize=(12,12))
ax.imshow(d,vmin=0,vmax=400,cmap='terrain',extent=[dataset.bounds.left,dataset.bounds.right,dataset.bounds.bottom,dataset.bounds.top])
ax.plot(coords[0],coords[1],'ko')
plt.show()
You basically have two distinct steps:
Convert lon/lat coordinates to map coordinates, this is only necessary if your input raster is not already in lon/lat. Map coordinates are the coordinates in the projection that the raster itself uses
Convert the map coordinates to pixel coordinates.
There are all kinds of tool you might use, perhaps to make things simpler (like pyproj, rasterio etc). But for such a simple case it's probably nice to start with doing it all in GDAL, that probably also enhances your understanding of what steps are needed.
Inputs
from osgeo import gdal, osr
raster_file = r'D:\somefile.tif'
lon = 153.023499
lat = -27.468968
lon/lat to map coordinates
# fetch metadata required for transformation
ds = gdal.OpenEx(raster_file)
raster_proj = ds.GetProjection()
gt = ds.GetGeoTransform()
ds = None # close file, could also keep it open till after reading
# coordinate transformation (lon/lat to map)
# define source projection
# this definition ensures the order is always lon/lat compared
# to EPSG:4326 for which it depends on the GDAL version (2 vs 3)
source_srs = osr.SpatialReference()
source_srs.ImportFromWkt(osr.GetUserInputAsWKT("urn:ogc:def:crs:OGC:1.3:CRS84"))
# define target projection based on the file
target_srs = osr.SpatialReference()
target_srs.ImportFromWkt(raster_proj)
# convert
ct = osr.CoordinateTransformation(source_srs, target_srs)
mapx, mapy, *_ = ct.TransformPoint(lon, lat)
You could verify this intermediate result by for example adding it as Point WKT in something like QGIS (using the QuickWKT plugin, making sure the viewer has the same projection as the raster).
map coordinates to pixel
# apply affine transformation to get pixel coordinates
gt_inv = gdal.InvGeoTransform(gt) # invert for map -> pixel
px, py = gdal.ApplyGeoTransform(gt_inv, mapx, mapy)
# it wil return fractional pixel coordinates, so convert to int
# before using them to read. Round to nearest with +0.5
py = int(py + 0.5)
px = int(px + 0.5)
# read pixel data
ds = gdal.OpenEx(raster_file) # open file again
elevation_value = ds.ReadAsArray(px, py, 1, 1)
ds = None
The elevation_value variable should be the value you're after. I would definitelly verify the result independently, try a few points in QGIS or the gdallocationinfo utility:
gdallocationinfo -l_srs "urn:ogc:def:crs:OGC:1.3:CRS84" filename.tif 153.023499 -27.468968
# Report:
# Location: (4228P,4840L)
# Band 1:
# Value: 1804.51879882812
If you're reading a lot of points, there will be some threshold at which it would be faster to read a large chunk and extract the values from that array, compared to reading every point individually.
edit:
For applying the same workflow on multiple points at once a few things change.
So for example having the inputs:
lats = np.array([-27.468968, -27.468968, -27.468968])
lons = np.array([153.023499, 153.023499, 153.023499])
The coordinate transformation needs to use ct.TransformPoints instead of ct.TransformPoint which also requires the coordinates to be stacked in a single array of shape [n_points, 2]:
coords = np.stack([lons.ravel(), lats.ravel()], axis=1)
mapx, mapy, *_ = np.asarray(ct.TransformPoints(coords)).T
# reshape in case of non-1D inputs
mapx = mapx.reshape(lons.shape)
mapy = mapy.reshape(lons.shape)
Converting from map to pixel coordinates changes because the GDAL method for this only takes single point. But manually doing this on the arrays would be:
px = gt_inv[0] + mapx * gt_inv[1] + mapy * gt_inv[2]
py = gt_inv[3] + mapx * gt_inv[4] + mapy * gt_inv[5]
And rounding the arrays to integer changes to:
px = (px + 0.5).astype(np.int32)
py = (py + 0.5).astype(np.int32)
If the raster (easily) fits in memory, reading all points would become:
ds = gdal.OpenEx(raster_file)
all_elevation_data = ds.ReadAsArray()
ds = None
elevation_values = all_elevation_data[py, px]
That last step could be optimized by checking highest/lowest pixel coordinates in both dimensions and only read that subset for example, but it would require normalizing the coordinates again to be valid for that subset.
The py and px arrays might also need to be clipped (eg np.clip) if the input coordinates fall outside the raster. In that case the pixel coordinates will be < 0 or >= xsize/ysize.

Spatial resampling of netCDF file in python

I am working with Sentinel 3 SLSTR data which comes in netCDF file format. The file contains 11 bands:
S1-S6 (500 m resolution) and S7-S9 and F1 & F2 (1000 m resolution). S1-S6 contains radiance values and S7-S9 contains brightness temperature values. Right now, I want to resample my S7-S9 band to 500 m resolution to match the resolution of S1-S6 bands.
I am using xarray to read the netCDF files. There is a function xarray.Dataset.resample() but the documentation says that it resample to a new temporal resolution.
I also tried to resample using gdal but couldn't get any result.
import gdal
import xarray as xr
import matplotlib.pyplot as plt
data = xr.open_dataset('S7_BT_in.nc') # one of the files in 1000 m resolution
geo = xr.open_dataset(path+'geodetic_an.nc') # file containing the geodetic values
ds = data['S7_BT_in'] # fetching variable I need to work on
lat = geo['latitude_an'] # fetching latitude values
lon = geo['longitude_an'] # fetching longitude values
#assigning latitude and longitude values to the coordinates of ds
ds = ds.assign_coords(coords = {'Latitude': lat, 'Longitude': lon})
x = gdal.Open('ds') # Opening the netCDF file using gdal
# resampling the data to 500 m resolution
xreproj = gdal.Warp('resampled.nc', x, xRes = 500, yRes = 500)
This is the error I am getting:
SystemError: <built-in function wrapper_GDALWarpDestName> returned NULL without setting an error.
I also tried opening the file directly using gdal but still getting the same error.

Interpolating a 3-dim (time, lat, lon) field with _FillValue in Python. How to avoid for loops?

I am interpolating data from the oceanic component of CMIP6 models to a 2x2 grid. The field has a dim of (time, nav_lat, nav_lon) and nan values in continent. Here, nav_lon and nav_lat are two-dimensional curvilinear grid. I can do the interpolation using griddata from scipy, but I have to use a loop over time. The loop makes it pretty slow if the data has thousands of time records. My question is how to vectorize the interpolation over time.
The following is my code:
import xarray as xr
import numpy as np
from scipy.interpolate import griddata
import matplotlib.pyplot as plt
source = xr.open_dataset('data/zos_2850.nc',decode_times=False)
# obtain old lon and lat (and put lon in 0-360 range)
# nav_lon is from -180 to 180, not in 0-360 range
loni, lati = source.nav_lon.values%360, source.nav_lat.values
# flatten the source coordinates
loni_flat, lati_flat = loni.flatten(), lati.flatten()
# define a 2x2 lon-lat grid
lon, lat = np.linspace(0,360,181), np.linspace(-90,90,91)
# create mesh
X, Y = np.meshgrid(lon,lat)
# loop over time
ntime = len(source.time)
tmp = []
for t in range(ntime):
print(t)
var_s = source.zos[t].values
var_s_flat = var_s.flatten()
# index indicates where they are valid values
index = np.where(~np.isnan(var_s_flat))
# remap the valid values to the new grid
var_t = griddata((loni_flat[index],lati_flat[index]),var_s_flat[index], (X,Y),
method='cubic')
# interpolate mask using nearest
maskinterp = griddata((loni_flat,lati_flat),var_s_flat, (X,Y), method='nearest')
# re-mask interpolated data
var_t[np.isnan(maskinterp)] = np.nan
tmp.append(var_t)
# convert to DataArray
da = xr.DataArray(data=tmp,
dims=["time","lat","lon"],
coords=dict(lon=(["lon"], lon),lat=(["lat"], lat),time=source['time']))

Coordinate system used for healpy.query_disc()

The healpy.query_disc() function takes an argument vec which is a three-component unit vector defining the center of the disc. What coordinate system is being used here - why is there a third dimension for a 2-d projection? What point is the "tail" of this vector llocated at?
Very good you found the solution yourself, for later reference here is a full working code example:
import healpy as hp
import numpy as np
# `lonlat=True` switches `ang2vec` from requiring colatitude $\theta$ and longitude $\phi$ in radians to longitude and latitude in degrees (notice that also the order changes)
# in degrees
lon = 60
lat = 30
vec = hp.ang2vec(lon, lat, lonlat=True)
nside = 256
large_disc = hp.query_disc(nside, vec, radius=np.radians(20))
small_disc = hp.query_disc(nside, vec, radius=np.radians(8))
tiny_disc = hp.query_disc(nside, vec, radius=np.radians(2))
# `query_disc` returns a list of pixels, by default in RING ordering, let's check their length:
list(map(len, [large_disc, small_disc, tiny_disc]))
# ## Create a map and plot it in Mollweide projection
m = np.zeros(hp.nside2npix(nside))
m[large_disc] = 1
m[small_disc] = 2
m[tiny_disc] = 3
hp.mollview(m)
hp.graticule()
See the notebook with plots here: https://zonca.dev/2020/10/example-healpy-query_disc.html
Solved. Used the output of ang2vec() to give the vector.

Why latitudes and longitudes are two dimensional arrays in netcdf file?

I have netCDF file, which contains temperature data over some location. Data shape is 1450x900.
I am creating search functionality in my app, to locate temperature data with lat, lon values.
So I extracted lat and lon coordinates data from netCDf file, but I was expecting that they would be 1D arrays and instead got 2D arrays with 1450x900 shape for both coordinates.
So my question: why they are 2d arrays, instead of 1450 latitude values and 900 longitude values? Doesn't 1450 lat values and 900 lon values describe whole grid?
Lets say we have 4x5 square, indices for locating rightmost and bottom-most point of the grid will be [4, 5]. So my indices for x will be[1, 2, 3, 4] and for y: [1, 2, 3, 4, 5]. 9 indices in total are enough to locate any point on that grid (consisting of 20 cells). So why do lat (x) and lon (y) coordinates in netcdf file contain 20 indices separately (40 in total), instead of 4 and 5 indices respectively (9 in total)? Hope you get what confuses me.
Is it possible to somehow map those 2D arrays and "downgrade" to 1450 latitude values and 900 longitude values? OR it is ok as it is right now? How can I use those values for my intention? Do I need to zip lat lon arrays?
here are the shapes:
>>> DS = xarray.open_dataset('file.nc')
>>> DS.tasmin.shape
(31, 1450, 900)
>>> DS.projection_x_coordinate.shape
(900,)
>>> DS.projection_y_coordinate.shape
(1450,)
>>> DS.latitude.shape
(1450, 900)
>>> DS.longitude.shape
(1450, 900)
consider that projection_x_coordinate and projection_y_coordinate are easting/northing values not lat/longs
here is the metadata of file if needed:
Dimensions: (bnds: 2, projection_x_coordinate: 900, projection_y_coordinate: 1450, time: 31)
Coordinates:
* time (time) datetime64[ns] 2018-12-01T12:00:00 ....
* projection_y_coordinate (projection_y_coordinate) float64 -1.995e+0...
* projection_x_coordinate (projection_x_coordinate) float64 -1.995e+0...
latitude (projection_y_coordinate, projection_x_coordinate) float64 ...
longitude (projection_y_coordinate, projection_x_coordinate) float64 ...
Dimensions without coordinates: bnds
Data variables:
tasmin (time, projection_y_coordinate, projection_x_coordinate) float64 ...
transverse_mercator int32 ...
time_bnds (time, bnds) datetime64[ns] ...
projection_y_coordinate_bnds (projection_y_coordinate, bnds) float64 ...
projection_x_coordinate_bnds (projection_x_coordinate, bnds) float64 ...
Attributes:
comment: Daily resolution gridded climate observations
creation_date: 2019-08-21T21:26:02
frequency: day
institution: Met Office
references: doi: 10.1002/joc.1161
short_name: daily_mintemp
source: HadUK-Grid_v1.0.1.0
title: Gridded surface climate observations data for the UK
version: v20190808
Conventions: CF-1.5
Your data adheres to version 1.5 of the Climate and Forecast conventions.
The document describing this version of the conventions is here, although the relevant section is essentially unchanged across many versions of the conventions.
See section 5.2:
5.2. Two-Dimensional Latitude, Longitude, Coordinate Variables
The latitude and longitude coordinates of a horizontal grid that was
not defined as a Cartesian product of latitude and longitude axes, can
sometimes be represented using two-dimensional coordinate variables.
These variables are identified as coordinates by use of the coordinates attribute.
It looks like you are using the HadOBS 1km resolution gridded daily minimum temperature, and this file in particular:
http://dap.ceda.ac.uk/thredds/fileServer/badc/ukmo-hadobs/data/insitu/MOHC/HadOBS/HadUK-Grid/v1.0.1.0/1km/tasmin/day/v20190808/tasmin_hadukgrid_uk_1km_day_20181201-20181231.nc (warning: >300MB download)
As it states, the data is on a transverse mercator grid.
If you look at output from ncdump -h <filename> you will also see the following description of the grid expressed by means of attributes of the transverse_mercator dummy variable:
int transverse_mercator ;
transverse_mercator:grid_mapping_name = "transverse_mercator" ;
transverse_mercator:longitude_of_prime_meridian = 0. ;
transverse_mercator:semi_major_axis = 6377563.396 ;
transverse_mercator:semi_minor_axis = 6356256.909 ;
transverse_mercator:longitude_of_central_meridian = -2. ;
transverse_mercator:latitude_of_projection_origin = 49. ;
transverse_mercator:false_easting = 400000. ;
transverse_mercator:false_northing = -100000. ;
transverse_mercator:scale_factor_at_central_meridian = 0.9996012717 ;
and you will also see that the coordinate variables projection_x_coordinate and projection_y_coordinate have units of metres.
The grid in question is the Ordnance Survey UK grid using numeric grid references.
See for example this description of the OS grid (from Wikipedia).
If you wish to express the data on a regular longitude-latitude grid then you will need to do some type of interpolation. I see that you are using xarray. You can combine this with pyresample to do the interpolation. Here is an example:
import xarray as xr
import numpy as np
from pyresample.geometry import SwathDefinition
from pyresample.kd_tree import resample_nearest, resample_gauss
ds = xr.open_dataset("tasmin_hadukgrid_uk_1km_day_20181201-20181231.nc")
# Define a target grid. For sake of example, here is one with just
# 3 longitudes and 4 latitudes.
lons = np.array([-2.1, -2., -1.9])
lats = np.array([51.7, 51.8, 51.9, 52.0])
# The target grid is regular (1-d lon, lat coordinates) but we will need
# a 2d version (similar to the input grid), so use numpy.meshgrid to produce this.
lon2d, lat2d = np.meshgrid(lons, lats)
origin_grid = SwathDefinition(lons=ds.longitude, lats=ds.latitude)
target_grid = SwathDefinition(lons=lon2d, lats=lat2d)
# get a numpy array for the first timestep
data = ds.tasmin[0].to_masked_array()
# nearest neighbour interpolation example
# Note that radius_of_influence has units metres
interpolated = resample_nearest(origin_grid, data, target_grid, radius_of_influence=1000)
# GIVES:
# array([[5.12490065, 5.02715332, 5.36414835],
# [5.08337723, 4.96372838, 5.00862833],
# [6.47538931, 5.53855722, 5.11511239],
# [6.46571817, 6.17949381, 5.87357538]])
# gaussian weighted interpolation example
# Note that radius_of_influence and sigmas both have units metres
interpolated = resample_gauss(origin_grid, data, target_grid, radius_of_influence=1000, sigmas=1000)
# GIVES:
# array([[5.20432465, 5.07436805, 5.39693221],
# [5.09069187, 4.8565934 , 5.08191639],
# [6.4505963 , 5.44018209, 5.13774416],
# [6.47345359, 6.2386732 , 5.62121948]])
I figured an answer by myself.
As appeared 2D lat long arrays are used to define the "grid" of some location.
In other words, if we zip lat long values and project on the map, we will get "curved grid" (earth curvature is considered in other words) over some location, which then are used to create grid reference of location.
Hope its clear for anyone interested.

Categories

Resources