I have an array that has 14000 columns and 7000 rows of terrain data for the US that are equally spaced 500m apart. I also have the lower-left latitude and longitude:
ncols = 14000
nrows = 7000
xllcorner = -130
yllcorner = 20
cellsize = 0.05
I also have another dataset (polar --> Cartesian radar data) that is already in a projected coordinate system:
# radial data being converted to Cartesian
x = rangee * np.sin(np.deg2rad(az))[:,None]
y = rangee * np.cos(np.deg2rad(az))[:,None]
latitude = 35.9339
longitude = -80.0212
dataproj = Proj(f"+proj=stere +lat_0={latitude} +lat_ts={latitude} +lon_0={longitude} +ellps=WGS84 +units=m")
lons,lats = dataproj(x,y,inverse=True)
Note that the terrain data spans the whole US, whereas the radar data covers only North Carolina. I therefore have two separate gridded datasets, and I would like to match the terrain data to the radar data as closely as possible. In other words, whether through interpolation or some other method, there should be one terrain value for each [x,y] location of the radar data.
How could one achieve this?
I would recommend using rioxarray. It has a method called reproject_match to make the two grids align:
https://corteva.github.io/rioxarray/stable/examples/reproject_match.html
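If pulling in rioxarray is not an option, the same "one terrain value per radar location" lookup can be sketched with plain NumPy. This is only a nearest-cell sketch, not what reproject_match does internally. It assumes the header describes an ESRI-ASCII-style grid (row 0 is the northernmost row, xllcorner/yllcorner is the outer corner of the lower-left cell, cellsize is in degrees). The function name sample_terrain and the tiny 3x4 terrain array are made up for illustration.

```python
import numpy as np

def sample_terrain(terrain, lons, lats, xllcorner, yllcorner, cellsize):
    """Return the terrain value of the cell containing each (lon, lat) pair.

    Assumption: row 0 of `terrain` is the northernmost row and
    (xllcorner, yllcorner) is the outer corner of the lower-left cell.
    Points outside the grid get NaN.
    """
    nrows, ncols = terrain.shape
    col = np.floor((np.asarray(lons) - xllcorner) / cellsize).astype(int)
    row_from_south = np.floor((np.asarray(lats) - yllcorner) / cellsize).astype(int)
    row = nrows - 1 - row_from_south          # flip: array rows count from the north
    inside = (col >= 0) & (col < ncols) & (row >= 0) & (row < nrows)
    out = np.full(col.shape, np.nan)
    out[inside] = terrain[row[inside], col[inside]]
    return out

# Hypothetical toy grid: 3 rows x 4 columns, 1-degree cells
terrain = np.arange(12, dtype=float).reshape(3, 4)
xll, yll, cs = -130.0, 20.0, 1.0

# Stand-ins for the radar `lons`/`lats` arrays (any shape works)
lons = np.array([[-129.5, -126.5], [-128.2, -140.0]])
lats = np.array([[20.5, 22.5], [21.1, 21.0]])

terrain_on_radar = sample_terrain(terrain, lons, lats, xll, yll, cs)
print(terrain_on_radar)
```

The result has the same shape as the radar lons/lats arrays, so it can be plotted or masked alongside the radar data directly. For smoother results a bilinear interpolator could replace the nearest-cell lookup.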
Related
I am trying to reproject polar data into Cartesian data that matches along latitude / longitude lines. The code that I have thus far is as follows:
latitude = 35.6655197143554
longitude = -78.48975372314453
# Convert to Cartesian
x = ranges * np.sin(np.deg2rad(azimuths))[:,None]
y = ranges * np.cos(np.deg2rad(azimuths))[:,None]
# Setup a projection
dataproj = Proj(f"+proj=stere +lat_0={latitude} +lat_ts={latitude} +lon_0={longitude} +ellps=WGS84 +units=m")
lons,lats = dataproj(x,y,inverse=True)
...
...
# Plot
im = ax.pcolormesh(lons,lats,data,cmap=cmap_data,norm=norm_cmap)
where the data is a [720,1832] array. The output from plotting looks like below:
Notice how the individual colored pixels move across the latitude and longitude lines. How might I add and/or change the code I have thus far to make the data aligned along lat/lons?
I have a raster of Land Cover data (specifically this one /eodata/auxdata/S2GLC/2017/S2GLC_T32TMS_2017 in https://finder.creodias.eu) that uses 'epsg:32632' as CRS. I want to reproject this raster on 'epsg:21781'. This is what the raster looks like when I open it with xarray.
fn = 'data/S2GLC_T32TMS_2017/S2GLC_T32TMS_2017.tif'
da = xr.open_rasterio(fn).sel(band=1, drop=True)
da
<xarray.DataArray (y: 10980, x: 10980)>
[120560400 values with dtype=uint8]
Coordinates:
* y (y) float64 5.2e+06 5.2e+06 5.2e+06 ... 5.09e+06 5.09e+06 5.09e+06
* x (x) float64 4e+05 4e+05 4e+05 ... 5.097e+05 5.097e+05 5.098e+05
Attributes:
transform: (10.0, 0.0, 399960.0, 0.0, -10.0, 5200020.0)
crs: +init=epsg:32632
res: (10.0, 10.0)
is_tiled: 0
nodatavals: (nan,)
scales: (1.0,)
offsets: (0.0,)
AREA_OR_POINT: Area
INTERLEAVE: BAND
My usual workflow was to transform all the point coordinates, create my destination grid and interpolate using nearest neighbors. Something that looks like this:
import numpy as np
import xarray as xr
import pyproj
from scipy.interpolate import griddata
y = da.y.values
x = da.x.values
xx, yy = np.meshgrid(x,y)
# (n,2) point coordinates in the original CRS
src_coords = np.column_stack([xx.flatten(), yy.flatten()])
transformer = pyproj.transformer.Transformer.from_crs('epsg:32632', 'epsg:21781')
xx, yy = transformer.transform(src_coords[:,0], src_coords[:,1])
# (n,2) point coordinates in the destination CRS, which are not on a regular grid
dst_coords = np.column_stack([xx.flatten(), yy.flatten()])
# I define my destination **regular** grid coordinates
x = np.linspace(620005,719995,10)
y = np.linspace(199995,100005,10)
xx, yy = np.meshgrid(x,y)
dst_grid = np.column_stack([xx.flatten(), yy.flatten()])
# I interpolate from the reprojected (irregular) points onto the regular grid
dst_shape = xx.shape
reprojected_array = griddata(
    dst_coords, da.values.flatten(), dst_grid, method='nearest'
).reshape(dst_shape)
Although this method is fairly transparent and (apparently) error-free, it can take very long when dealing with billions of points. Recently, I discovered rasterio's reproject function, and I was blown away by how fast it is. This is how I implemented it:
from rasterio.warp import reproject, Resampling
source = da.values
destination = np.zeros(dst_shape, np.int16)
res, aff = reproject(
    source,
    destination,
    src_transform=src_transform, # affine transformation from original data
    src_crs=src_crs,
    dst_transform=dst_transform, # affine transformation that corresponds to the grid defined in the other approach
    dst_crs=dst_crs,
    resampling=Resampling.nearest) # using nearest neighbors just like with scipy's griddata
Naturally I wanted to compare the results expecting them to be the same, but they were not, as you can see in the figure.
The resolution is 10 meters so the differences are not large, but after careful comparison with precise satellite data in the 'epsg:21781' coordinates, it looks like the old approach yields better results.
So my questions are:
Why do these results differ?
Is one approach better than the other? Are there specific conditions where one should prefer one or the other?
griddata finds the nearest points by Euclidean distance,
on whatever map projection you give it.
Thus the nearest neighbors from a pipeline like
    4326 data points --> reproject --> nearest-Euclidean griddata
        query points
depend on the "reproject" step. Could you try +proj=sinu +lon_0=<middle longitude>
for both the data and the query points?
What one really wants is a nearest-neighbor engine with great-circle distance,
not Euclidean distance.
The difference may be insignificant for small grids, or near the equator,
but less so in Finland -- cos 61° / cos 60° is ~ 97 %.
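To make the point concrete, here is a small sketch in which the Euclidean nearest neighbor in raw lat/lon degrees differs from the great-circle one. The coordinates are invented for illustration, and haversine_km is a hypothetical helper that assumes a spherical Earth with a mean radius of 6371 km.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance on a sphere (mean Earth radius is an assumption)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))

# Made-up query point at 60 N and two candidate data points, as (lat, lon)
query = np.array([60.0, 25.0])
candidates = np.array([[60.0, 26.0],    # 1.0 degree to the east
                       [60.8, 25.0]])   # 0.8 degree to the north

# Euclidean distance in degrees treats a degree of longitude like a degree of latitude
euclid_deg = np.hypot(*(candidates - query).T)
great_circle_km = haversine_km(query[0], query[1], candidates[:, 0], candidates[:, 1])

nearest_euclid = int(np.argmin(euclid_deg))
nearest_great_circle = int(np.argmin(great_circle_km))
print(nearest_euclid, nearest_great_circle)
```

At 60° latitude a degree of longitude is only about half as long as a degree of latitude, so the eastern candidate is closer on the ground even though it is farther away in degrees.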
TL;DR
Is pyproj.transformer.Transformer.from_crs('epsg:32632', 'epsg:21781')
"correct"? I don't know.
I see no test suite, and a couple of issues:
"warp.reproject() generates the wrong result"
"roundtrip test"
"Nearest neighbor" is ill-defined / sensitive halfway between data points,
e.g. along the lines x or y = int + 0.5 on an int grid.
This is easy to test with KDTree.
xarray makes regular (Cartesian) grids easy, but afaik does not do
curvilinear (2d) grids.
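The half-way sensitivity of nearest-neighbor lookups mentioned above can be checked with a few lines of SciPy. This is a minimal sketch on an invented 4x4 integer grid. The offset eps is an arbitrary tiny number chosen only to show that the winner flips.

```python
import numpy as np
from scipy.spatial import cKDTree

# Integer grid 0..3 x 0..3, flattened to (N, 2) points as cKDTree expects
gx, gy = np.meshgrid(np.arange(4.0), np.arange(4.0))
points = np.column_stack((gx.ravel(), gy.ravel()))
tree = cKDTree(points)

eps = 1e-9  # far below any realistic coordinate accuracy
queries = np.array([[1.5 - eps, 2.0],    # just left of the half-way line x = 1.5
                    [1.5 + eps, 2.0]])   # just right of it
_, idx = tree.query(queries)
winners = points[idx]
print(winners)
```

Two query points that are practically identical end up with different neighbors, so two reprojection pipelines that differ only by rounding can legitimately disagree along such lines.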
I work with satellite data organised on an irregular two-dimensional grid whose dimensions are scanline (along track dimension) and ground pixel (across track dimension). Latitude and longitude information for each ground pixel are stored in auxiliary coordinate variables.
Given a (lat, lon) point, I would like to identify the closest ground pixel on my set of data.
Let's build a 10x10 toy data set:
import numpy as np
import xarray as xr
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
%matplotlib inline
lon, lat = np.meshgrid(np.linspace(-20, 20, 10),
np.linspace(30, 60, 10))
lon += lat/10
lat += lon/10
da = xr.DataArray(data = np.random.normal(0,1,100).reshape(10,10),
dims=['scanline', 'ground_pixel'],
coords = {'lat': (('scanline', 'ground_pixel'), lat),
'lon': (('scanline', 'ground_pixel'), lon)})
ax = plt.subplot(projection=ccrs.PlateCarree())
da.plot.pcolormesh('lon', 'lat', ax=ax, cmap=plt.cm.get_cmap('Blues'),
infer_intervals=True);
ax.scatter(lon, lat, transform=ccrs.PlateCarree())
ax.coastlines()
ax.gridlines(draw_labels=True)
plt.tight_layout()
Note that the lat/lon coordinates identify the centre pixel and the pixel boundaries are automatically inferred by xarray.
Now, say I want to identify the closest ground pixel to Rome.
The best way I have come up with so far is to use scipy's KDTree on a stacked, flattened lat/lon array:
from scipy import spatial
pixel_center_points = np.stack((da.lat.values.flatten(), da.lon.values.flatten()), axis=-1)
tree = spatial.KDTree(pixel_center_points)
rome = (41.9028, 12.4964)
distance, index = tree.query(rome)
print(index)
# 36
I have then to apply unravel_index to get my scanline/ground_pixel indexes:
pixel_coords = np.unravel_index(index, da.shape)
print(pixel_coords)
# (3, 6)
Which gives me the scanline/ground_pixel coordinates of the (supposedly) closest ground pixel to Rome:
ax = plt.subplot(projection=ccrs.PlateCarree())
da.plot.pcolormesh('lon', 'lat', ax=ax, cmap=plt.cm.get_cmap('Blues'),
infer_intervals=True);
ax.scatter(da.lon[pixel_coords], da.lat[pixel_coords],
marker='x', color='r', transform=ccrs.PlateCarree())
ax.coastlines()
ax.gridlines(draw_labels=True)
plt.tight_layout()
I'm convinced there must be a much more elegant way to approach this problem. In particular, I would like to get rid of the flattening/unraveling steps (all my attempts to build a kdtree on a two-dimensional array failed miserably), and make use of my xarray dataset's variables as much as possible (adding a new centre_pixel dimension for example, and using it as input to KDTree).
I am going to answer my own question as I believe I came up with a decent solution, which is discussed at much greater length on my blog post on this subject.
Geographical distance
First of all, defining the distance between two points on the earth's surface as simply the Euclidean distance between the two lat/lon pairs could lead to inaccurate results, depending on how far apart the points are. It is thus better to transform the coordinates to ECEF coordinates first and build a KD-Tree on the transformed coordinates. Assuming points on the surface of the planet (h=0), the coordinate transformation is done as follows:
def transform_coordinates(coords):
    """ Transform coordinates from geodetic to Cartesian (ECEF)
    Keyword arguments:
    coords - a set of lat/lon coordinates (e.g. a tuple or
             an array of tuples)
    """
    # WGS 84 reference coordinate system parameters
    A = 6378.137 # semi-major axis [km]
    E2 = 6.69437999014e-3 # eccentricity squared
    coords = np.asarray(coords).astype(float) # np.float was removed in NumPy 1.24
    # is coords a tuple? Convert it to a one-element array of tuples
    if coords.ndim == 1:
        coords = np.array([coords])
    # convert to radians
    lat_rad = np.radians(coords[:,0])
    lon_rad = np.radians(coords[:,1])
    # convert to Cartesian coordinates
    r_n = A / (np.sqrt(1 - E2 * (np.sin(lat_rad) ** 2)))
    x = r_n * np.cos(lat_rad) * np.cos(lon_rad)
    y = r_n * np.cos(lat_rad) * np.sin(lon_rad)
    z = r_n * (1 - E2) * np.sin(lat_rad)
    return np.column_stack((x, y, z))
Building the KD-Tree
We could then build the KD-Tree by transforming our dataset coordinates, taking care to flatten the 2D grid into a one-dimensional sequence of lat/lon tuples. This is because the KD-Tree input data needs to have shape (N,K), where N is the number of points and K is the dimensionality (K=3 here, since the ECEF coordinates are x, y, z even though we assume no height component).
# reshape and stack coordinates
coords = np.column_stack((da.lat.values.ravel(),
da.lon.values.ravel()))
# construct KD-tree
ground_pixel_tree = spatial.cKDTree(transform_coordinates(coords))
Querying the tree and indexing the xarray dataset
Querying the tree is now as simple as transforming our point's lat/lon coordinates to ECEF and passing those to the tree's query method:
rome = (41.9028, 12.4964)
distance, index = ground_pixel_tree.query(transform_coordinates(rome))
In doing so though, we need to unravel our index on the original dataset's shape, to get the scanline/ground_pixel indexes:
index = np.unravel_index(index, da.shape)
We could now use the two components to index our original xarray dataset, but we could also build two indexers to use with xarray pointwise indexing feature:
index = xr.DataArray(index[0], dims='pixel'), \
xr.DataArray(index[1], dims='pixel')
Getting the closest pixel is now easy and elegant at the same time:
da[index]
Note that we could also query more than one point at once, and by building the indexers as above, we could still index the dataset with a single call:
da[index]
Which would then return a subset of the dataset containing the closest ground pixels to our query points.
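The whole workflow, including a query for several points at once, can be sketched end to end with NumPy and SciPy only (no xarray). The helper name to_ecef and the second query point (Paris) are illustrative additions. The toy grid is the one from the question.

```python
import numpy as np
from scipy.spatial import cKDTree

A = 6378.137            # WGS 84 semi-major axis [km]
E2 = 6.69437999014e-3   # WGS 84 eccentricity squared

def to_ecef(lat_deg, lon_deg):
    """Geodetic lat/lon (degrees, h=0) to ECEF x, y, z [km], shape (N, 3)."""
    lat = np.radians(np.asarray(lat_deg, dtype=float)).ravel()
    lon = np.radians(np.asarray(lon_deg, dtype=float)).ravel()
    r_n = A / np.sqrt(1.0 - E2 * np.sin(lat) ** 2)
    return np.column_stack((r_n * np.cos(lat) * np.cos(lon),
                            r_n * np.cos(lat) * np.sin(lon),
                            r_n * (1.0 - E2) * np.sin(lat)))

# Toy 10x10 curvilinear grid from the question
lon, lat = np.meshgrid(np.linspace(-20, 20, 10), np.linspace(30, 60, 10))
lon += lat / 10
lat += lon / 10

tree = cKDTree(to_ecef(lat, lon))

# Query points as (lat, lon): Rome from the text, Paris as a second example
queries = np.array([[41.9028, 12.4964],
                    [48.8566, 2.3522]])
chord_km, flat_idx = tree.query(to_ecef(queries[:, 0], queries[:, 1]))

# Back to (scanline, ground_pixel) indexes, one pair per query point
scanline, ground_pixel = np.unravel_index(flat_idx, lat.shape)
print(scanline, ground_pixel)
```

Note that the distances returned by the tree are straight-line chord lengths through the earth, in km, not distances along the surface. For nearest-neighbor ranking that makes no difference.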
Further readings
Using the Euclidean norm on the lat/lon tuples could be accurate enough for smaller distances (think of it as approximating the earth as flat, which works on a small scale). More details on geographic distances here.
Using a KD-Tree to find the nearest neighbour is not the only way to address this problem, see this very comprehensive article.
An implementation of KD-Tree directly into xarray is in the pipeline.
My blog post on the subject.
I am trying to do a map in 2D coordinates with color defined by a third variable. I already defined the grids by the following command:
b_step = np.linspace(-75,90,12)
l_step = np.linspace(0,360,25)
grid = [(x,y) for x in b_step for y in l_step]
There are three variables in my data set: b and l are the coordinates, and s is the actual measured value. There are about 7 million data points. I first want to distribute the data over those grid points, then take the average within each grid cell. Finally, I will use the average s to draw the map. Does anyone have ideas on how to distribute the data over the grid points and take the average efficiently?
I know ROOT TH2F (which is a powerful software for High Energy community) can handle it, but I want to write it more pythonic. Thanks.
ROOT TH2F is the best way to handle it efficiently. If you create two TH2F histograms, one that accumulates the data and one that counts the entries contributing to each bin, you can calculate the mean value in each grid cell. The Python code for this is below:
import numpy as np
from ROOT import TH2F, gStyle, TCanvas
##### if you want equally distributed grid points.
#h1 = TH2F('h1','h1',l_num,0.0,360.0,b_num,-90.0,90.0)
#h2 = TH2F('h2','h2',l_num,0.0,360.0,b_num,-90.0,90.0)
##### if you want non-equally distributed grid points.
xBins = 37
yBins = 17
xEdges = np.linspace(-185,185,38)
yEdges = np.array([-105.0,-75.0,-60.0,-45.0,-30.0,-15.0,15.0,35.0,40.0,45.0,50.0,55.0,60.0,65.0,70.0,75.0,80.0,100.0])
h1 = TH2F('h1','h1',xBins,xEdges,yBins,yEdges)
h2 = TH2F('h2','h2',xBins,xEdges,yBins,yEdges)
# x, y and signal are the input arrays (coordinates and measured value)
for i in range(data_size):
    h1.Fill(x[i],y[i],signal[i]) # accumulate the summed signal per bin
    h2.Fill(x[i],y[i],1)         # count the entries per bin
for ii in range(1,h1.GetNbinsX()+1):
    for jj in range(1,h1.GetNbinsY()+1):
        ss = h1.GetBinContent(ii,jj)
        nn = h2.GetBinContent(ii,jj)
        xx = h1.GetXaxis().GetBinCenter(ii)
        yy = h1.GetYaxis().GetBinCenter(jj)
        mean = ss/nn if nn > 0 else float('nan') # empty bins have no mean
Now you have the grid coordinates xx and yy and, for each bin, the summed signal ss and its mean, so you can make color plots.
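For a ROOT-free version of the same two-histogram idea, NumPy's histogram2d can do the summing and counting in one vectorised call each. The sketch below uses random stand-in data instead of the real 7 million points, and it treats the linspace values from the question as bin edges, which is an assumption about how the grid is meant.

```python
import numpy as np

# Stand-in data; replace with the real l, b, s arrays
rng = np.random.default_rng(0)
n = 10_000
l = rng.uniform(0.0, 360.0, n)
b = rng.uniform(-75.0, 90.0, n)
s = rng.normal(size=n)

l_edges = np.linspace(0.0, 360.0, 25)    # 24 bins in l
b_edges = np.linspace(-75.0, 90.0, 12)   # 11 bins in b

# Sum of s per bin (weights=s) and number of points per bin
sums, _, _ = np.histogram2d(l, b, bins=[l_edges, b_edges], weights=s)
counts, _, _ = np.histogram2d(l, b, bins=[l_edges, b_edges])

# Mean per bin; bins without data stay NaN instead of dividing by zero
mean_s = np.full_like(sums, np.nan)
np.divide(sums, counts, out=mean_s, where=counts > 0)
print(mean_s.shape)
```

mean_s has one row per l bin and one column per b bin, so something like plt.pcolormesh(l_edges, b_edges, mean_s.T) should draw the map.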
Some satellite based earth observation products provide latitude/longitude information while others provide the X/Y coordinates within a given grid projection (and there are also some having both, see example).
My approach in the second case is to set up a Basemap map which has the same parameters (projection, ellipsoid, origin of map) as given by the data provider in a way that the given X/Y values equal the Basemap coordinates. However if I do so the geolocation does not agree with other data sets including the Basemap coastline.
I have experienced this with three different data sets from different trustworthy sources. For the minimal example I use Landsat data provided by the U.S. Geological Survey which includes both, X/Y coordinates of a South Polar Stereographic grid and the corresponding lat/lon coordinates for all four corners of the image.
From a Landsat metafile we get (ID: LC82171052016079LGN00):
CORNER_UL_LAT_PRODUCT = -66.61490 CORNER_UL_LON_PRODUCT = -61.31816
CORNER_UR_LAT_PRODUCT = -68.74325 CORNER_UR_LON_PRODUCT = -58.04533
CORNER_LL_LAT_PRODUCT = -67.68721 CORNER_LL_LON_PRODUCT = -67.01109
CORNER_LR_LAT_PRODUCT = -69.94052 CORNER_LR_LON_PRODUCT = -64.18581
CORNER_UL_PROJECTION_X_PRODUCT = -2259300.000
CORNER_UL_PROJECTION_Y_PRODUCT = 1236000.000
CORNER_UR_PROJECTION_X_PRODUCT = -1981500.000
CORNER_UR_PROJECTION_Y_PRODUCT = 1236000.000
CORNER_LL_PROJECTION_X_PRODUCT = -2259300.000
CORNER_LL_PROJECTION_Y_PRODUCT = 958500.000
CORNER_LR_PROJECTION_X_PRODUCT = -1981500.000
CORNER_LR_PROJECTION_Y_PRODUCT = 958500.000
...
GROUP = PROJECTION_PARAMETERS
MAP_PROJECTION = "PS"
DATUM = "WGS84"
ELLIPSOID = "WGS84"
VERTICAL_LON_FROM_POLE = 0.00000
TRUE_SCALE_LAT = -71.00000
FALSE_EASTING = 0
FALSE_NORTHING = 0
GRID_CELL_SIZE_PANCHROMATIC = 15.00
GRID_CELL_SIZE_REFLECTIVE = 30.00
GRID_CELL_SIZE_THERMAL = 30.00
ORIENTATION = "NORTH_UP"
RESAMPLING_OPTION = "CUBIC_CONVOLUTION"
END_GROUP = PROJECTION_PARAMETERS
By using Basemap with the right map projection we should be able to derive the corner lat/lon values from the X/Y values:
import numpy as np
from mpl_toolkits.basemap import Basemap
m=Basemap(resolution='h',projection='spstere', ellps='WGS84', boundinglat=-60,lon_0=180, lat_ts=-71)
x_crn=np.array([-2259300,-1981500,-2259300,-1981500])# upper left, upper right, lower left, lower right
y_crn=np.array([1236000, 1236000, 958500, 958500])# upper left, upper right, lower left, lower right
x0, y0= m(0, -90)
#Basemap coordinates at the south pole
#note that (0,0) of the Basemap is in a corner of the map,
#while other data sets use the south pole.
#This is easy to take into account:
lon_crn, lat_crn = m(x0-x_crn, y0-y_crn, inverse=True)
print('lon_crn: '+str(lon_crn))
print('lat_crn: '+str(lat_crn))
Which returns:
lon_crn: [-61.31816102 -58.04532791 -67.01108782 -64.1858106 ]
lat_crn: [-67.23548626 -69.3099076 -68.28071626 -70.47651326]
As you can see, the longitudes agree to the given precision with those from the metafile, but the latitudes are too low.
I can approximate the latitudes by:
lat_crn=(lat_crn+90.)*1.0275-90.
But this is really not satisfying.
This is how the image is located if using the X/Y corner coordinates from the metafile (in red the Basemap drawcoastlines()):
and this is how it looks like using the corner lat/lon:
In this case I can simply use the lat/lon coordinates, but as mentioned before there are datasets (like this) which are provided with X/Y coordinates only, which makes it very important to be able to rely on the Basemap projection. I know that there are other modules to re-project the data as a potential workaround, but it should work without other modules, and a re-projection could introduce errors itself.
As this problem appears with different data sets I like to believe that it is a bug in the Basemap module, but I might also make the same mistake again and again or have wrong expectations.
I did some experimentation and it seems like changing lat_ts has no effect with projection='spstere'. In fact, it seems as if the projection latitude is implicitly assumed to be lat_ts=-90. regardless of what value you assign.
I had more success using projection='stere' instead, so that you would construct the Basemap in your example as follows:
m=Basemap(width=5400000., height=5400000., projection='stere',
ellps='WGS84', lon_0=180., lat_0=-90., lat_ts=-71.)
You may prefer to set the latitude and longitude of the corners instead of the width and height of the plot for your application.
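As an independent cross-check that lat_ts=-71 is what matters here, the corner latitudes can be recomputed without Basemap. The sketch below is my own NumPy transcription of the standard ellipsoidal polar stereographic inverse (the formulas given in Snyder's "Map Projections: A Working Manual"), not code from Basemap or PROJ. The function names are made up, and lon_0=0 is taken from VERTICAL_LON_FROM_POLE in the metafile.

```python
import numpy as np

A = 6378137.0            # WGS 84 semi-major axis [m]
E2 = 6.69437999014e-3    # WGS 84 eccentricity squared
E = np.sqrt(E2)

def _t(phi):
    """Snyder's auxiliary function t for a positive latitude in radians."""
    es = E * np.sin(phi)
    return np.tan(np.pi / 4 - phi / 2) / ((1 - es) / (1 + es)) ** (E / 2)

def south_polar_stereo_inverse(x, y, lat_ts=-71.0, lon_0=0.0):
    """Projected x/y [m] to lon/lat [deg] for a south polar stereographic grid."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    phi_c = np.radians(abs(lat_ts))                      # latitude of true scale
    m_c = np.cos(phi_c) / np.sqrt(1 - E2 * np.sin(phi_c) ** 2)
    rho = np.hypot(x, y)                                 # distance from the pole
    t = rho * _t(phi_c) / (A * m_c)
    phi = np.pi / 2 - 2 * np.arctan(t)                   # spherical first guess
    for _ in range(10):                                  # fixed-point iteration
        es = E * np.sin(phi)
        phi = np.pi / 2 - 2 * np.arctan(t * ((1 - es) / (1 + es)) ** (E / 2))
    lon = lon_0 + np.degrees(np.arctan2(x, y))
    return lon, -np.degrees(phi)                         # southern hemisphere

# Corner coordinates from the metafile: UL, UR, LL, LR
x_crn = np.array([-2259300.0, -1981500.0, -2259300.0, -1981500.0])
y_crn = np.array([1236000.0, 1236000.0, 958500.0, 958500.0])

lon_crn, lat_crn = south_polar_stereo_inverse(x_crn, y_crn)
print(lon_crn)
print(lat_crn)
```

With lat_ts=-71 the latitudes should land on the CORNER_*_LAT_PRODUCT values from the metafile, which supports the observation that spstere ignores lat_ts.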