Spatial resampling of a netCDF file in Python

I am working with Sentinel 3 SLSTR data, which comes in netCDF format. The file contains 11 bands:
S1-S6 (500 m resolution), plus S7-S9 and F1 & F2 (1000 m resolution). S1-S6 contain radiance values, while S7-S9 contain brightness temperature values. I want to resample the S7-S9 bands to 500 m resolution to match the S1-S6 bands.
I am using xarray to read the netCDF files. There is a function xarray.Dataset.resample(), but the documentation says it resamples to a new temporal resolution, not a spatial one.
I also tried to resample using gdal, but couldn't get any result:
import gdal
import xarray as xr
import matplotlib.pyplot as plt
data = xr.open_dataset('S7_BT_in.nc') # one of the files in 1000 m resolution
geo = xr.open_dataset(path+'geodetic_an.nc') # file containing the geodetic values
ds = data['S7_BT_in'] # fetching variable I need to work on
lat = geo['latitude_an'] # fetching latitude values
lon = geo['longitude_an'] # fetching longitude values
#assigning latitude and longitude values to the coordinates of ds
ds = ds.assign_coords(coords = {'Latitude': lat, 'Longitude': lon})
x = gdal.Open('ds') # Opening the netCDF file using gdal
# resampling the data to 500 m resolution
xreproj = gdal.Warp('resampled.nc', x, xRes = 500, yRes = 500)
This is the error I am getting:
SystemError: <built-in function wrapper_GDALWarpDestName> returned NULL without setting an error.
I also tried opening the file directly using gdal but still getting the same error.
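One hedged observation about the snippet above: gdal.Open('ds') tries to open the literal string 'ds' as a filename rather than the xarray object, so it returns None, and gdal.Warp then fails with the NULL error. A minimal sketch of the intended call, assuming the variable inside the file is named S7_BT_in and that the grid carries georeferencing in metres (SLSTR products that ship coordinates in separate geodetic files may instead require geolocation arrays):
from osgeo import gdal
gdal.UseExceptions()  # raise Python exceptions instead of silently returning NULL
# open the variable (subdataset) inside the netCDF file, not a Python object's name
src = gdal.Open('NETCDF:"S7_BT_in.nc":S7_BT_in')
warped = gdal.Warp('resampled.nc', src, xRes=500, yRes=500)
warped = None  # flush and close the output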

Related

Issues resampling raster to the resolution of another raster

I am trying to take a population raster and resample+reproject it to match the shape and resolution of a precipitation raster.
Data Links:
Population Data: https://figshare.com/ndownloader/files/10257111
Precipitation Data: https://www.ncei.noaa.gov/data/nclimgrid-monthly/access/nclimgrid_prcp.nc
The Population Data is a series of rasters per decade from 5 different population models covering the continental US. If you simply select one of the rasters, I can work out the rest (I have combined them into a multiband raster anyway); the pop_m4_2010 raster, for example, would help. The resolution is 1x1 km and the projection is Albers Equal Area Conic NAD 83, ESRI:102003.
The Precipitation Data is a netCDF file of monthly precipitation for the continental US. The resolution is 5x5 km and the projection is WGS84, EPSG:4326.
I converted the netCDF to a TIFF using the following code:
import xarray as xr
import rioxarray as rio
prcp_file = xr.open_dataset('nclimgrid_prcp.nc')
prp = prcp_file['prcp']
prp = prp.rio.set_spatial_dims(x_dim='lon', y_dim='lat')
prp.rio.write_crs("epsg:4326", inplace=True)
prp.rio.to_raster('prp_raster.tiff')
I also used QGIS to open the population files (add raster layer, navigate into the downloaded folder for pop_m4_2010 and select the "w001001.adf" file). When I do this in a WGS84 project, QGIS appears to reproject it on the fly, but I am new to this, so I am unsure whether that is correct.
From this point I have tried several things to resample the population raster to match the 5x5 resolution of the precipitation raster.
In QGIS Processing Toolbox GRASS r.resample
In QGIS Processing Toolbox Raster Layer Zonal Statistics
In Python, honestly I have lost track of all of the different forum posts and tutorials I have followed on GDAL.Warp, Rasterio.Warp, affine transformations, rio.reproject_match, etc. Below are a few examples of the code attempts.
Many of these appear to work (particularly rio.reproject_match, which seemed simple and effective). However, none of them work as intended: when I check the resulting population raster by computing zonal statistics over a county vector shapefile, the population sum for the area is either 0 or wildly inaccurate.
What am I doing wrong?
Reproject_Match:
import rioxarray # for the extension to load
import xarray
import matplotlib.pyplot as plt
%matplotlib inline
def print_raster(raster):
    print(
        f"shape: {raster.rio.shape}\n"
        f"resolution: {raster.rio.resolution()}\n"
        f"bounds: {raster.rio.bounds()}\n"
        f"sum: {raster.sum().item()}\n"
        f"CRS: {raster.rio.crs}\n"
    )
xds = rioxarray.open_rasterio('pop_m4_2010.tif')
xds_match = rioxarray.open_rasterio('prp_raster.tiff')
fig, axes = plt.subplots(ncols=2, figsize=(12,4))
xds.plot(ax=axes[0])
xds_match.plot(ax=axes[1])
plt.draw()
print("Original Raster:\n----------------\n")
print_raster(xds)
print("Raster to Match:\n----------------\n")
print_raster(xds_match)
xds_repr_match = xds.rio.reproject_match(xds_match)
print("Reprojected Raster:\n-------------------\n")
print_raster(xds_repr_match)
print("Raster to Match:\n----------------\n")
print_raster(xds_match)
xds_repr_match.rio.to_raster("reproj_pop.tif")
Another way with Rasterio.Warp:
import rasterio
from rasterio.warp import calculate_default_transform, reproject, Resampling
#open source raster
srcRst =rasterio.open('pop_m4_2010.tif')
print("source raster crs:")
print(srcRst.crs)
dstCrs = {'init': 'EPSG:4326'}
print("destination raster crs:")
print(dstCrs)
#calculate transform array and shape of reprojected raster
transform, width, height = calculate_default_transform(
    srcRst.crs, dstCrs, srcRst.width, srcRst.height, *srcRst.bounds)
print("transform array of source raster")
print(srcRst.transform)
print("transform array of destination raster")
print(transform)
#working of the meta for the destination raster
kwargs = srcRst.meta.copy()
kwargs.update({
    'crs': dstCrs,
    'transform': transform,
    'width': width,
    'height': height
})
#open destination raster
dstRst = rasterio.open('pop_m4_2010_reproj4326.tif', 'w', **kwargs)
#reproject and save raster band data
for i in range(1, srcRst.count + 1):
    reproject(
        source=rasterio.band(srcRst, i),
        destination=rasterio.band(dstRst, i),
        #src_transform=srcRst.transform,
        src_crs=srcRst.crs,
        #dst_transform=transform,
        dst_crs=dstCrs,
        resampling=Resampling.bilinear)
    print(i)
#close destination raster
dstRst.close()
And here is a second attempt with Rasterio.Warp:
import rasterio
from rasterio.warp import calculate_default_transform, reproject, Resampling
prcp = rasterio.open('prp_raster.tiff', mode = 'r')
with rasterio.open('pop_m4_2010.tif') as dataset:
    # resample data to target shape
    data = dataset.read(out_shape=(dataset.count, prcp.height, prcp.width), resampling=Resampling.bilinear)
    # scale image transform
    transform = dataset.transform * dataset.transform.scale((dataset.width / data.shape[-1]),
                                                            (dataset.height / data.shape[-2]))
    # Register GDAL format drivers and configuration options with a
    # context manager.
    with rasterio.Env():
        profile = dataset.profile
        profile.update(
            dtype=rasterio.float32,
            count=1,
            compress='lzw')
        with rasterio.open('pop_m4_2010_resampledtoprcp.tif', 'w', **profile) as dst:
            dst.write(data.astype(rasterio.float32))
This is how you can do that with R.
library(terra)
pop <- rast("USA_HistoricalPopulationDataset/pop_m5_2010")
wth <- rast("nclimgrid_prcp.nc")
wpop <- project(pop, wth, "sum")
Inspect the results.
wpop
#class : SpatRaster
#dimensions : 596, 1385, 1 (nrow, ncol, nlyr)
#resolution : 0.04166666, 0.04166667 (x, y)
#extent : -124.7083, -67, 24.5417, 49.37503 (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84
#source(s) : memory
#name : pop_m5_2010
#min value : 0.0
#max value : 423506.7
global(pop, "sum", na.rm=TRUE)
# sum
#pop_m5_2010 306620886
global(wpop, "sum", na.rm=TRUE)
# sum
#pop_m5_2010 306620761
You can save the results to a file with something like this:
writeRaster(wpop, "pop.tif")
And you could do this in one step for all population data like this:
ff <- list.files(pattern="0$", "USA_HistoricalPopulationDataset", full=TRUE)
apop <- rast(ff)
wapop <- project(apop, wth, "sum")
The population numbers you are getting are probably wrong because you are using bilinear interpolation when projecting (warping). That is not appropriate for (population) count data. You could first transform it to population density, warp, and transform back. I do that below, getting a result that is similar to what you get with the more direct approach that I have shown above.
csp <- cellSize(pop)
csw <- cellSize(wth[[1]])
popdens <- pop / csp
popdens <- project(popdens, wth, "bilinear")
popcount <- popdens * csw
popcount
#class : SpatRaster
#dimensions : 596, 1385, 1 (nrow, ncol, nlyr)
#resolution : 0.04166666, 0.04166667 (x, y)
#extent : -124.7083, -67, 24.5417, 49.37503 (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84
#source(s) : memory
#name : pop_m5_2010
#min value : 0.0
#max value : 393982.5
global(popcount, "sum", na.rm=TRUE)
# sum
#pop_m5_2010 304906042
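The same count-preserving idea can be expressed back in Python with rioxarray. This is a minimal sketch, untested against these exact files, and it assumes rasterio >= 1.2 with GDAL >= 3.1, where Resampling.sum is available:
import rioxarray
from rasterio.enums import Resampling

pop = rioxarray.open_rasterio('pop_m4_2010.tif')
prcp = rioxarray.open_rasterio('prp_raster.tiff')

# sum-aggregate the fine population cells into each coarse precipitation cell,
# which preserves total counts, unlike bilinear interpolation
pop_on_prcp = pop.rio.reproject_match(prcp, resampling=Resampling.sum)
pop_on_prcp.rio.to_raster('reproj_pop_sum.tif')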

How can I open several .nc files as one in Python with xarray?

I have 30 .nc files with dimensions time, lat, lon, pres; coordinates time, lat, lon, pres; and 18 different variables. I would like to open them in one dataset.
If I use xarray.open_mfdataset(), it does not work, as the pres dimensions are all different.
Is there any way to open them as one anyway?
This is what I tried, and which Error I got:
import xarray as xr
from glob import glob

path = 'Data/*.nc'
ds = xr.open_mfdataset(paths=sorted(glob(path)),
                       chunks={"time_counter": 1, "y": 100, "x": 100},
                       coords='all', combine='by_coords')
ValueError: Resulting object does not have monotonic global indexes along dimension PRES
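One possible workaround, sketched under the assumption that the files should share a common PRES axis (the grid values below are hypothetical and need adjusting to the data): interpolate each file onto one pressure grid before concatenating, so the combined index is monotonic.
import numpy as np
import xarray as xr
from glob import glob

# hypothetical common pressure grid; adjust range and spacing to the data
common_pres = np.linspace(0, 2000, 200)

datasets = []
for f in sorted(glob('Data/*.nc')):
    with xr.open_dataset(f) as single:
        # put every file on the same PRES axis so the indexes align
        datasets.append(single.interp(PRES=common_pres).load())

combined = xr.concat(datasets, dim='time')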

How to remove latitude and longitude (they're constants) from variables in a netCDF dataset using xarray?

I have a NetCDF dataset and am trying to remove the latitude and longitude data from my data variables so they can be indexed properly. The shape of each data variable is always (x, 1, 1), with x being the number of data points and the 1's representing the static longitude and latitude. I have tried xarray.Dataset.squeeze, xarray.Dataset.drop, and xarray.DataArray.drop_vars targeting latitude and longitude. I also tried it with a pandas data frame, but the latitudes and longitudes stay glued to my variables and prevent proper indexing. If you run my code below, you will see the variable 'wave_height' has a shape of (3411, 1, 1). I want a function that makes this shape just (3411,).
url = 'https://dods.ndbc.noaa.gov/thredds/fileServer/data/stdmet/46077/46077h2021.nc'
reqSpectra = urllib.request.Request(url)
with urllib.request.urlopen(reqSpectra) as respS:
    ds_s = xr.open_dataset(io.BytesIO(respS.read()))
wh = ds_s.variables['wave_height']
wh
I would use NumPy's squeeze to remove extra dimensions.
import urllib
import xarray as xr
import numpy as np
import io
# --------------------------------------------------------------------------
url = 'https://dods.ndbc.noaa.gov/thredds/fileServer/data/stdmet/46077/46077h2021.nc'
reqSpectra = urllib.request.Request(url)
with urllib.request.urlopen(reqSpectra) as respS:
    ds_s = xr.open_dataset(io.BytesIO(respS.read()))
# --------------------------------------------------------------------------
wh = ds_s.variables['wave_height']
wh = np.squeeze(wh)
wh
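An alternative that stays within xarray, as a small hedged sketch: DataArray.squeeze with drop=True removes the size-1 dimensions and discards the scalar latitude/longitude coordinates in one call.
wh = ds_s['wave_height'].squeeze(drop=True)
wh.shape  # (3411,)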

Sentinel3 OLCI (chl) Average of netcdf files on Python

I'm having trouble trying to get a monthly average from Sentinel 3 images on... everything, really. Python, Matlab: we are two people stuck on this problem.
The main reason is that the images' information is not in a single netCDF file, neatly arranged with coordinates and products. Instead, it is spread across separate .nc files inside a one-day folder, each holding different information about a single satellite image. As I understand it, SNAP uses an xfdumanifest.xml file to work with all of these separate .nc files.
Now, I thought it would be a good idea to try to merge and edit the .nc files so as to create a new daily .nc file that includes the chlorophyll, the coordinates and, might as well add it, the time. Later on, I would merge these new files to be able to compute a monthly mean with xarray. At least that was my idea, but I can't get past the first part. It might have an obvious solution; in any case, here is what I tried, using the xarray module:
import os
import numpy as np
import xarray as xr
import netCDF4
from netCDF4 import Dataset
nc_folder = df_try.iloc[0] #folder where the image files are
#open dataset in xarray
nc_chl = xr.open_dataset(str(nc_folder['path']) + '/' + 'chl_nn.nc') #path to chlorophyll file
nc_chl
n_coord =xr.open_dataset(str(nc_folder['path'])+ '/'+ 'geo_coordinates.nc') #path to coordinates file
n_time = xr.open_dataset(str(nc_folder['path'])+ '/' + 'time_coordinates.nc') #path to time file
ds_grid = [[nc_chl], [n_coord], [n_time]]
combined = xr.combine_nested(ds_grid, concat_dim=[None, None])
combined #dataset with all but not recognizing coordinates
ds = combined.rename({'latitude': 'lat', 'longitude': 'lon', 'time_stamp' : 'time'}).set_coords(['lon', 'lat', 'time']) #dataset recognizing coordinates as coordinates
ds
which gives a dataset with dimensions columns: 4865, rows: 4091, three coordinates (lat, lon and time), and the chl variable.
Now, it doesn't save to netCDF4 (I tried, but there was an error), and I was also wondering if anyone knew of another way to compute an average. I have images from three years (2017 through 2019) that I would need to average in different ways (monthly, seasonally, ...). My main problem at the moment is that the chlorophyll values are separate from the geographical coordinates, so working directly with only the chlorophyll files would not work and would just make a mess.
Any suggestions?
Two options here:
Using xarray
In xarray you can add them as coordinates. It is a bit tricky as the coordinates in the geo_coordinates.nc file are multidimensional as well.
A possible solution is the following:
import netCDF4
import xarray as xr
import matplotlib.pyplot as plt
# paths
root = r'C:<your_path>\S3B_OL_2_WFR____20201015.SEN3\chl_nn.nc' #set path to chl file
coor = r'C:<your_path>\S3B_OL_2_WFR____20201015.SEN3\geo_coordinates.nc' #set path to the coordinates file
# loading xarray datasets
ds = xr.open_dataset(root)
olci_geo_coords = xr.open_dataset(coor)
# extracting coordinates
lat = olci_geo_coords.latitude.data
lon = olci_geo_coords.longitude.data
# assign coordinates to the chl dataset (needs to refer to both the dimensions of our dataset)
ds = ds.assign_coords({"lon":(["rows","columns"], lon), "lat":(["rows","columns"], lat)})
# clip the image (add your own coordinates)
area_of_interest = ds.where((10 < ds.lon) & (ds.lon < 12) & (58 < ds.lat) & (ds.lat < 59), drop=True)
# simple plot with coordinates as axis
plt.figure(figsize=(15,15))
area_of_interest["CHL_NN"].plot(x="lon",y="lat")
Even simpler is to add them as variables in a new dataset:
# path to the folder
root = r'C:<your_path>\S3B_OL_2_WFR____20201015.SEN3\*.nc' #pattern matching all the .nc files
# create a dataset by combining nc files (coordinates will become variables)
ds = xr.open_mfdataset(root,combine = 'by_coords')
But in this case when you plot the image or clip it you cannot use the coordinates directly.
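Once the daily chlorophyll fields sit on a common grid with a one-dimensional time coordinate (which these swath products do not have out of the box, so this is an assumption), the monthly averaging itself is short in xarray; a sketch, with 'regridded/*.nc' as a hypothetical folder of regridded daily files:
import xarray as xr

ds = xr.open_mfdataset('regridded/*.nc', combine='by_coords')
# one mean per calendar month across all years (a climatology)
monthly_clim = ds['CHL_NN'].groupby('time.month').mean('time')
# or one mean per individual month of each year
monthly_mean = ds['CHL_NN'].resample(time='1MS').mean()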
Using snappy
In Python the snappy package is available; it is based on the SNAP toolbox (which is implemented in Java). Check: https://senbox.atlassian.net/wiki/spaces/SNAP/pages/19300362/How+to+use+the+SNAP+API+from+Python
Once installed (unfortunately snappy supports only Python 2.7, 3.3 or 3.4), you can use the available SNAP functions directly from Python to aggregate your satellite images and create weekly/monthly averages. You then do not need to merge the lon/lat netCDF files, as you will work on the xfdumanifest.xml and SNAP will take care of that.
This is an example. It performs aggregation as well (mean calculated on two chl nc files):
import snappy  # the code below calls snappy.jpy and snappy.GPF on the top-level module
from snappy import ProductIO, WKTReader
from snappy import jpy
from snappy import GPF
from snappy import HashMap
# setting the aggregator method
aggregator_average_config = snappy.jpy.get_type('org.esa.snap.binning.aggregators.AggregatorAverage$Config')
agg_avg_chl = aggregator_average_config('CHL_NN')
# creating the hashmap to store the parameters
HashMap = snappy.jpy.get_type('java.util.HashMap')
parameters = HashMap()
#creating the aggregator array
aggregators = snappy.jpy.array('org.esa.snap.binning.aggregators.AggregatorAverage$Config', 1)
#adding my aggregators in the list
aggregators[0] = agg_avg_chl
# set parameters
# output directory
dir_out = 'level-3_py_dynamic.dim'
parameters.put('outputFile', dir_out)
# number of rows (directly linked with resolution)
parameters.put('numRows', 66792) # to have about 300 meters spatial resolution
# aggregators list
parameters.put('aggregators', aggregators)
# Region to clip the aggregation on
wkt="POLYGON ((8.923302175377243 59.55648108694149, 13.488748662344074 59.11388968719029,12.480488185001589 56.690625338725155, 8.212366327767503 57.12425256476263,8.923302175377243 59.55648108694149))"
geom = WKTReader().read(wkt)
parameters.put('region', geom)
# Source product path
path_15 = r"C:<your_path>\S3B_OL_2_WFR____20201015.SEN3\xfdumanifest.xml"
path_16 = r"C:\<your_path>\S3B_OL_2_WFR____20201016.SEN3\xfdumanifest.xml"
path = path_15 + "," + path_16
parameters.put('sourceProductPaths', path)
#result = snappy.GPF.createProduct('Binning', parameters, (source_p1, source_p2))
# create results
result = snappy.GPF.createProduct('Binning', parameters) #to be used with product paths specified in the parameters hashmap
print("results stored in: {0}".format(dir_out) )
I am quite new and interested in the topic and would be happy to hear your/other solutions!

Extracting specific netcdf info and converting to GeoTIFF in python

I am attempting to extract a specific set of data from a netCDF file and then convert said data to a GeoTIFF.
So far I have managed to extract the data I want using netCDF4: all the data in the file are stored as 1D arrays (lat, lon, and the data I want), which I assign to a 2D array. The netCDF file I am working with was subsetted to a specific region. From here, however, I am at a loss.
I have a slight understanding of how geotiff conversion works via what I have read at these links:
https://borealperspectives.wordpress.com/2014/01/16/data-type-mapping-when-using-pythongdal-to-write-numpy-arrays-to-geotiff/
http://adventuresindevelopment.blogspot.co.uk/2008/12/create-geotiff-with-python-and-gdal.html
And here is what I have currently:
import netCDF4
import numpy as np
from osgeo import gdal
from osgeo import osr
#Reading in data from files and extracting said data
ncfile = netCDF4.Dataset("data.nc", 'r')
dataw = ncfile.variables["dataw"][:]
lat = ncfile.variables["Latitude"][:]
long = ncfile.variables["Longitude"][:]
n = len(dataw)
x = np.zeros((n,3), float)
x[:,0] = long[:]
x[:,1] = lat[:]
x[:,2] = dataw[:]
nx = len(long)
ny = len(lat)
xmin, ymin, xmax, ymax = [long.min(), lat.min(), long.max(), lat.max()]
xres = (xmax - xmin) / float(nx)
yres = (ymax - ymin) / float(ny)
geotransform = (xmin, xres, 0, ymax, 0, -yres)
#Creates 1 raster band file
dst_ds = gdal.GetDriverByName('GTiff').Create('myGeoTIFF.tif', ny, nx, 1, gdal.GDT_Float32)
dst_ds.SetGeoTransform(geotransform) # specify coords
srs = osr.SpatialReference() # establish encoding
srs.ImportFromEPSG(3857) # Web Mercator (EPSG:4326 would be WGS84 lat/long)
dst_ds.SetProjection(srs.ExportToWkt()) # export coords to file
dst_ds.GetRasterBand(1).WriteArray(x) # write r-band to the raster
dst_ds.FlushCache() # write to disk
dst_ds = None # save, close
The process of geotiff creation above I have largely sourced from here:
How do I write/create a GeoTIFF RGB image file in python?
I attempted this on the understanding that the array I want to write to the raster is a 2D array with 3 columns: 2 of coordinates and 1 of data. The result, which I check in SNAP, is a black page with a white line along the left-hand side.
So my question is as follows:
How can I extract the necessary geotransform data from my netcdf file, adjust the geotransform parameters appropriately and subsequently use the extracted lat long + dataw arrays to write to a geotiff file?
Try using the gdal_translate command line tool to convert to a GeoTIFF file:
gdal_translate NETCDF:"<filename.nc>":<variable name> <required_file>.tif
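The same conversion can also be done from Python through the GDAL bindings; a minimal sketch, with 'dataw' standing in for the variable name from the question:
from osgeo import gdal

# translate one netCDF variable (subdataset) to a GeoTIFF
out = gdal.Translate('dataw.tif', 'NETCDF:"data.nc":dataw')
out = None  # flush to disk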
