How to convert a netCDF4 file to a geoTiff - python

I'm currently trying to get Tropomi data into GeoTIFF format. I downloaded some data in netCDF4 format, from which I obtain three numpy arrays: one with latitude coordinates, one with longitude coordinates, and one with carbon monoxide values.
So I have a matrix of values for my raster, and for each value I know its longitude and latitude.
With this information, how can I construct a georeferenced raster?
I read in the data as follows:
import netCDF4
from netCDF4 import Dataset
import numpy as np
file = '/home/daniel/Downloads/S5P_NRTI_L2__CO_____20190430T171319_20190430T171819_08006_01_010301_20190430T175151.nc'
rootgrp = Dataset(file, "r",format="NETCDF4")
lat = rootgrp.groups['PRODUCT']['latitude'][:]
lon = rootgrp.groups['PRODUCT']['longitude'][:]
carbon = rootgrp.groups['PRODUCT']['carbonmonoxide_total_column'][:]
This gives me three matrices, each with shape (1, 290, 215).
Now I would like to convert this to a Mercator-projected GeoTIFF, but I do not know how to go about it.

The gdal_translate option seems to work, but here is an alternative, more explicit way I did it.
#importing packages
import numpy as np
from scipy import interpolate
from netCDF4 import Dataset
from shapely.geometry import Point
import geopandas as gpd
from geopy.distance import geodesic
import rasterio
import matplotlib.pyplot as plt
#load data
file = '/home/daniel/Ellipsis/db/downloaded/rawtropomi/S5P_NRTI_L2__CO_____20190430T171319_20190430T171819_08006_01_010301_20190430T175151.nc'
rootgrp = Dataset(file, "r",format="NETCDF4")
lat = rootgrp.groups['PRODUCT']['latitude'][:]
lon = rootgrp.groups['PRODUCT']['longitude'][:]
carbon = rootgrp.groups['PRODUCT']['carbonmonoxide_total_column'][:]
carbon = carbon.filled(0)
lat = lat.filled(-1000)
lon = lon.filled(-1000)
carbon = carbon.flatten()
lat = lat.flatten()
lon = lon.flatten()
#calculate the real distance between the corners and get the width and height in pixels, assuming you want a pixel resolution of at least 7 by 7 kilometers
w = max(geodesic((min(lat),max(lon)), (min(lat),min(lon))).meters/7000 , geodesic((max(lat),max(lon)), (max(lat),min(lon))).meters/14000)
h = geodesic((min(lat),max(lon)), (max(lat),max(lon))).meters/14000
# create a GeoDataFrame with the latitude, longitude and measurement values as its rows, and transform it to the Web Mercator projection (or a projection of your choosing)
points = [Point(xy) for xy in zip(lon, lat)]
crs = 'EPSG:4326'
data = gpd.GeoDataFrame({'value': carbon}, crs=crs, geometry=points)
data = data.to_crs('EPSG:3395')
data['lon'] = data.bounds['maxx'].values
data['lat'] = data.bounds['maxy'].values
#make a grid of coordinates. You need to calculate the coordinate of each pixel in the desired raster
minlon = min(data['lon'])
maxlon = max(data['lon'])
minlat = min(data['lat'])
maxlat = max(data['lat'])
lon_list = np.arange(minlon, maxlon, (maxlon-minlon)/w )
lat_list = np.arange(minlat, maxlat, (maxlat-minlat)/h)
lon_2d, lat_2d = np.meshgrid(lon_list, lat_list)
#use the values in the GeoDataFrame to interpolate values onto the coordinate grid
r = interpolate.griddata(points = (data['lon'].values,data['lat'].values), values = data['value'].values, xi = (lon_2d, lat_2d))
r = np.flip(r, axis = 0)
#check result
plt.imshow(r)
#save raster
transform = rasterio.transform.from_bounds(west=minlon, south=minlat, east=maxlon, north=maxlat,
                                           width=r.shape[1], height=r.shape[0])
file_out = 'test.tiff'
new_dataset = rasterio.open(file_out, 'w', driver='GTiff', compress='lzw',
                            height=r.shape[0], width=r.shape[1],
                            count=1, dtype=str(r.dtype),
                            crs=data.crs,
                            transform=transform)
new_dataset.write(r, 1)
new_dataset.close()
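As a quick sanity check, you can reopen the file you just wrote and inspect its metadata (a minimal sketch, assuming the output was written to 'test.tiff' as above):

import rasterio

with rasterio.open('test.tiff') as src:
    print(src.crs)      # should report EPSG:3395, the CRS passed at write time
    print(src.bounds)   # bounding box in projected coordinates
    band = src.read(1)  # the interpolated carbon monoxide grid
    print(band.shape)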

I would suggest looking at the answer here, which uses gdal_translate:
Convert NetCDF (.nc) to GEOTIFF
gdal_translate -of GTiff file.nc test.tiff
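If you prefer to stay in Python, the same conversion can be scripted with GDAL's Translate function. Note that for netCDF files organised in groups, such as the S5P product above, you usually have to point GDAL at a specific subdataset rather than the whole file; the subdataset path below is an assumption, so list the real ones first:

from osgeo import gdal

# list the subdatasets contained in the netCDF file
src = gdal.Open('file.nc')
for name, description in src.GetSubDatasets():
    print(name, description)

# translate one subdataset to GeoTIFF (Python equivalent of gdal_translate);
# the /PRODUCT/carbonmonoxide_total_column path is assumed, use a name printed above
gdal.Translate('test.tiff', 'NETCDF:"file.nc":/PRODUCT/carbonmonoxide_total_column', format='GTiff')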

Related

Interpolation and increasing spatial resolution of netCDF data

How can I increase the resolution of netCDF data for feeding it to a CNN in Python?
Is there any function in xarray to do the same?
Thanks in advance.
You can use xarray interpolation to achieve what you want. You have to define the new latitudes and longitudes you'd like to interpolate to and pass them to xarray's interp function. See the example below.
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
# Create sample data
dx = 0.25
lon = np.arange(0, 360, dx)
lat = np.arange(-90, 90+dx, dx)
data = 10 * np.random.rand(len(lat), len(lon))
data_set = xr.Dataset({"temp": (["lat", "lon"], data)},
                      coords={"lon": lon, "lat": lat})
# Just checking the datasets are not empty
print(data_set)
# Create new lat and lon
dx_new = 0.125
newlon = np.arange(0, 360, dx_new)
newlat = np.arange(-90, 90+dx_new, dx_new)
# Interpolate
data_set_interp = data_set.interp(lat=newlat, lon=newlon)
# Check output
print(data_set_interp)
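interp uses linear interpolation by default; you can pass method="nearest" (or other methods supported via scipy) if that suits your data better, and the result can be written straight back to netCDF. A short follow-up sketch, with the output filename being an arbitrary choice:

# nearest-neighbour instead of the default linear interpolation
data_set_nearest = data_set.interp(lat=newlat, lon=newlon, method="nearest")

# save the upsampled data for later use (e.g. feeding a CNN)
data_set_interp.to_netcdf("temp_interpolated.nc")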

Aligning a shapefile to a raster, assigning values to the overlay, then returning an array?

My goal is to align a shapefile to a raster basemap, and assign 1 to the cells that overlap and 0 to the ones that don't, eventually returning an array that contains lat, lon, time, and the binary variable (1/0).
Here's the plan: 1) create raster of region from array, 2) rasterize polygon shapefiles, 3) align rasterized shapefiles with base raster, 4) pixels that overlap will be assigned 1 and those that don't will be 0, 5) convert rasters to array.
I've been able to do steps 1 & 2 (see code below), but I've been stuck on step 3 for a long time. How do I align the two rasters?
You can find the files here:
https://www.dropbox.com/sh/pecptfepac18s2y/AADbxFkKWlLqMdiHh-ICt4UYa?dl=0
Here's the code I used to create a flat grid of BC as a basemap:
import gdal, osr
import numpy as np

#define parameters
#units = km
grid_size = 5
BC_width = 700
BC_length = 1800

def array2raster(newRasterfn, rasterOrigin, pixelWidth, pixelHeight, array):
    cols = array.shape[1]
    rows = array.shape[0]
    originX = rasterOrigin[0]
    originY = rasterOrigin[1]
    driver = gdal.GetDriverByName('GTiff')
    outRaster = driver.Create(newRasterfn, cols, rows, 1, gdal.GDT_Byte)
    outRaster.SetGeoTransform((originX, pixelWidth, 0, originY, 0, pixelHeight))
    outband = outRaster.GetRasterBand(1)
    outband.WriteArray(array)
    outRasterSRS = osr.SpatialReference()
    outRasterSRS.ImportFromEPSG(4326)
    outRaster.SetProjection(outRasterSRS.ExportToWkt())
    outband.FlushCache()

def main(newRasterfn, rasterOrigin, pixelWidth, pixelHeight, array):
    reversed_arr = array[::-1]  # reverse the array so the tif looks like the array
    array2raster(newRasterfn, rasterOrigin, pixelWidth, pixelHeight, reversed_arr)  # convert array to raster

if __name__ == "__main__":
    array = np.zeros([int(BC_length/grid_size), int(BC_width/grid_size)])  # 360 rows x 140 cols
    for i in range(1, 100):
        array[i] = 100
    rasterOrigin = (-139.72938, 47.655534)  # lower left corner of raster
    newRasterfn = '/temp/test.tif'
    cols = array.shape[1]  # shape of an array (number of elements in each dimension)
    rows = array.shape[0]
    originX = rasterOrigin[0]
    originY = rasterOrigin[1]
    pixelWidth = 5
    pixelHeight = 5
    main(newRasterfn, rasterOrigin, pixelWidth, pixelHeight, array)  # call that actually writes the base raster
Here's the code I used to rasterize polygon shapefiles
import ogr, gdal, osr

output_raster = '/testdata/poly.tif'
shapefile = "/testdata/20180808.shp"

def main(shapefile):
    #open the shapefile as an object
    input_shp = ogr.Open(shapefile)
    #get the layer information of the shapefile
    shp_layer = input_shp.GetLayer()
    #pixel_size determines the size of the new raster
    #pixel_size is proportional to the size of the shapefile
    pixel_size = 0.1
    #get extent values to set the size of the output raster
    x_min, x_max, y_min, y_max = shp_layer.GetExtent()
    #calculate the size/resolution of the raster
    x_res = int((x_max - x_min) / pixel_size)
    y_res = int((y_max - y_min) / pixel_size)
    #get the GeoTiff driver
    image_type = 'GTiff'
    driver = gdal.GetDriverByName(image_type)
    #pass the filename, x and y resolution, and number of bands to create the new raster
    new_raster = driver.Create(output_raster, x_res, y_res, 1, gdal.GDT_Byte)
    #transform between pixel raster space and projection coordinate space
    new_raster.SetGeoTransform((x_min, pixel_size, 0, y_min, 0, pixel_size))
    #get the required raster band
    band = new_raster.GetRasterBand(1)
    #assign a no-data value to empty cells
    no_data_value = -9999
    band.SetNoDataValue(no_data_value)
    band.FlushCache()
    #main conversion method
    gdal.RasterizeLayer(new_raster, [1], shp_layer, burn_values=[255])
    #add a spatial reference
    new_rasterSRS = osr.SpatialReference()
    new_rasterSRS.ImportFromEPSG(4326)
    new_raster.SetProjection(new_rasterSRS.ExportToWkt())
    return output_raster
I'm doing everything in Python as I don't have access to or funding for paid GIS software. I'm totally new to geospatial data processing, so I'm not sure if I'm taking the right approach. Any help would be amazing.
Check out rasterio.mask.mask from the rasterio library. I think it will help.
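For step 3 specifically, one way to guarantee alignment is to skip the second, independent raster entirely and burn the polygons directly onto the grid of the base raster, using its transform and shape. A minimal sketch with rasterio and geopandas, reusing the file names from the question (treat them as placeholders for your own paths):

import geopandas as gpd
import rasterio
from rasterio import features

# open the base raster created in step 1 and reuse its grid definition
with rasterio.open('/temp/test.tif') as base:
    base_transform = base.transform
    base_shape = (base.height, base.width)
    base_crs = base.crs

# read the polygons and make sure they are in the same CRS as the base raster
polys = gpd.read_file('/testdata/20180808.shp').to_crs(base_crs)

# burn 1 into cells touched by a polygon and 0 everywhere else,
# on the same grid as the base raster, so the result is already aligned
binary = features.rasterize(
    ((geom, 1) for geom in polys.geometry),
    out_shape=base_shape,
    transform=base_transform,
    fill=0,
    dtype='uint8',
)
print(binary.shape)  # same rows/cols as the base raster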

Python: passing coordinates from list to function

I am using some code from a workshop to extract data from netCDF files at the coordinates closest to my specified coordinates. When using just one set of coordinates, I am able to extract the values I need without trouble, as below:
import numpy as np
import netCDF4
from math import pi
from numpy import cos, sin

def tunnel_fast(latvar, lonvar, lat0, lon0):
    '''
    Find closest point in a set of (lat,lon) points to specified point
    latvar - 2D latitude variable from an open netCDF dataset
    lonvar - 2D longitude variable from an open netCDF dataset
    lat0,lon0 - query point
    Returns iy,ix such that the square of the tunnel distance
    between (latval[iy,ix],lonval[iy,ix]) and (lat0,lon0)
    is minimum.
    '''
    rad_factor = pi/180.0  # for trigonometry, need angles in radians
    # Read latitude and longitude from file into numpy arrays
    latvals = latvar[:] * rad_factor
    lonvals = lonvar[:] * rad_factor
    ny, nx = latvals.shape
    lat0_rad = lat0 * rad_factor
    lon0_rad = lon0 * rad_factor
    # Compute numpy arrays for all values, no loops
    clat, clon = cos(latvals), cos(lonvals)
    slat, slon = sin(latvals), sin(lonvals)
    delX = cos(lat0_rad)*cos(lon0_rad) - clat*clon
    delY = cos(lat0_rad)*sin(lon0_rad) - clat*slon
    delZ = sin(lat0_rad) - slat
    dist_sq = delX**2 + delY**2 + delZ**2
    minindex_1d = dist_sq.argmin()  # 1D index of minimum element
    iy_min, ix_min = np.unravel_index(minindex_1d, latvals.shape)
    return iy_min, ix_min

ncfile = netCDF4.Dataset('E:\wind_level2_1.nc', 'r')
latvar = ncfile.variables['latitude']
lonvar = ncfile.variables['longitude']

#_________GG turbine_________GAD10 Latitude 51.735516, GAD10 Longitude 1.942656
iy, ix = tunnel_fast(latvar, lonvar, 51.735516, 1.942656)
print('Closest lat lon:', latvar[iy,ix], lonvar[iy,ix])
refLAT = latvar[iy,ix]
refLON = lonvar[iy,ix]
#try to find the data for this location
SARwind = ncfile.variables['sar_wind'][:,:]
ModelWind = ncfile.variables['model_speed'][:,:]
print 'iy,ix'  #appears to be the index of the value of lat,lon
print SARwind[iy,ix]
ncfile.close()
Now I am trying to loop through a text file of coordinates, coord_list, to extract each set of coordinates, find the data, then move on to the next set of coordinates in the list. This code works on its own, as below:
import csv
from decimal import Decimal

with open('Turbine_locs_no_header.csv','rb') as f:
    reader = csv.reader(f)
    coord_list = list(reader)

end_row = len(coord_list)
lon_ind = 1
lat_ind = 2
for row in range(0, end_row-1):  # end_row - 1 due to the 0 index
    turbine_lat = coord_list[row][lat_ind]
    turbine_lon = coord_list[row][lon_ind]
    turbine_lat = [Decimal(turbine_lat)]
    print 'lat', turbine_lat, 'lon', turbine_lon, row
However, I want to pass coordinates from the text file to this part of the original code, iy,ix = tunnel_fast(latvar, lonvar, 51.94341, 1.922094888), replacing the numbers with variables: iy, ix = tunnel_fast(latvar, lonvar, turbine_lat, turbine_lon). When I try to combine the two codes by creating a function get_coordinates, I get the following errors:
File "C:/Users/mm/test_nc_bycoords_GG_turbines_AGW.py", line 65, in <module>
get_coordinates(coord_list, latvar, lonvar)
File "C:/Users/mm/test_nc_bycoords_GG_turbines_AGW.py", line 51, in get_coordinates
iy, ix = tunnel_fast(latvar, lonvar, turbine_lat, turbine_lon)
File "C:/Users/mm/test_nc_bycoords_GG_turbines_AGW.py", line 27, in tunnel_fast
lat0_rad = lat0 * rad_factor
TypeError: can't multiply sequence by non-int of type 'float'
I thought this was because turbine_lat and turbine_lon are list items and so cannot be used, but that doesn't seem to be connected to the errors. I know this code needs more work anyway, but if anyone could help me spot where I am going wrong, that would be very helpful. My attempt to combine the two codes is below.
import numpy as np
import netCDF4
from math import pi
from numpy import cos, sin
import csv

# edited from https://github.com/Unidata/unidata-python-workshop/blob/a56daa50d7b343c7debe93968683613642d6b9f7/notebooks/netcdf-by-coordinates.ipynb

def tunnel_fast(latvar, lonvar, lat0, lon0):
    '''
    Find closest point in a set of (lat,lon) points to specified point
    latvar - 2D latitude variable from an open netCDF dataset
    lonvar - 2D longitude variable from an open netCDF dataset
    lat0,lon0 - query point
    Returns iy,ix such that the square of the tunnel distance
    between (latval[iy,ix],lonval[iy,ix]) and (lat0,lon0)
    is minimum.
    '''
    rad_factor = pi/180.0  # for trigonometry, need angles in radians
    # Read latitude and longitude from file into numpy arrays
    latvals = latvar[:] * rad_factor
    lonvals = lonvar[:] * rad_factor
    ny, nx = latvals.shape
    lat0_rad = lat0 * rad_factor
    lon0_rad = lon0 * rad_factor
    # Compute numpy arrays for all values, no loops
    clat, clon = cos(latvals), cos(lonvals)
    slat, slon = sin(latvals), sin(lonvals)
    delX = cos(lat0_rad)*cos(lon0_rad) - clat*clon
    delY = cos(lat0_rad)*sin(lon0_rad) - clat*slon
    delZ = sin(lat0_rad) - slat
    dist_sq = delX**2 + delY**2 + delZ**2
    minindex_1d = dist_sq.argmin()  # 1D index of minimum element
    iy_min, ix_min = np.unravel_index(minindex_1d, latvals.shape)
    return iy_min, ix_min

#________________my edits___________________________________________________
def get_coordinates(coord_list, latvar, lonvar):
    "this takes coordinates from a .csv and assigns them to variables"
    end_row = len(coord_list)
    lon_ind = 1
    lat_ind = 2
    for row in range(0, end_row-1):  # end_row - 1 due to the 0 index
        turbine_lat = coord_list[row][lat_ind]
        turbine_lon = coord_list[row][lon_ind]
        iy, ix = tunnel_fast(latvar, lonvar, turbine_lat, turbine_lon)
        print('Closest lat lon:', latvar[iy, ix], lonvar[iy, ix])
#________________________________________________________________________________________________________________________

ncfile = netCDF4.Dataset('NOGAPS_wind_level2_1.nc', 'r')
latvar = ncfile.variables['latitude']
lonvar = ncfile.variables['longitude']

#____added in to pass to the get_coordinates function
with open('Turbine_locs_no_header.csv','rb') as f:
    reader = csv.reader(f)
    coord_list = list(reader)

#_________take latitude from the get_coordinates function
get_coordinates(coord_list, latvar, lonvar)
#iy,ix = tunnel_fast(latvar, lonvar, turbine_lat, turbine_lon)  #get these from assign_coordinates_fromlist.py
#print('Closest lat lon:', latvar[iy,ix], lonvar[iy,ix])

SARwind = ncfile.variables['sar_wind'][:,:]
ModelWind = ncfile.variables['model_speed'][:,:]
print 'iy,ix'  #appears to be the index of the value of lat,lon
print SARwind[iy,ix]
ncfile.close()
You can unpack an argument list using *args (see the docs). In your case you could do tunnel_fast(latvar, lonvar, *coord_list[row]). You need to make sure that the order of arguments in coord_list[row] is correct and if coord_list[row] contains more than the two values then you need to slice it appropriately.
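As a concrete illustration of the unpacking (the column order and the float conversion here are assumptions based on the lon_ind/lat_ind indices used in the question):

row_values = coord_list[row]                            # e.g. ['GAD10', '1.942656', '51.735516']
lat_lon = [float(row_values[2]), float(row_values[1])]  # reorder to (lat, lon) and convert from str
iy, ix = tunnel_fast(latvar, lonvar, *lat_lon)          # same as tunnel_fast(latvar, lonvar, lat, lon)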
Thanks to help from a_guest, it turned out to be a simple problem: lat0 and lon0 were being passed to tunnel_fast as <type 'str'> when it requires <type 'float'>. This comes from building coord_list with csv.reader, which returns every field as a string.
with open('Turbine_locs_no_header.csv','rb') as f:
    reader = csv.reader(f)
    coord_list = list(reader)
The workaround I used was to convert lat0 and lon0 to floats at the beginning of tunnel_fast
lat0 = float(lat0)
lon0 = float(lon0)
I am sure there is a more elegant way to do this, but it works.
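One arguably cleaner alternative (a sketch, not part of the original workaround) is to convert the fields to floats while reading the CSV, so tunnel_fast never sees strings. This assumes the first column holds a label and columns 1 and 2 hold longitude and latitude, as in the question:

with open('Turbine_locs_no_header.csv', 'rb') as f:
    reader = csv.reader(f)
    # keep the first column (e.g. a turbine name) as text, convert the lon/lat columns to float
    coord_list = [[row[0], float(row[1]), float(row[2])] for row in reader]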

Covariance/heat flux in Python

I'm looking to compute poleward heat fluxes at a level in the atmosphere, i.e. the mean of (u't'). I'm aware of the covariance function in NumPy, but cannot seem to implement it. Here is my code below.
from netCDF4 import Dataset
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
myfile = '/home/ubuntu/Fluxes_Test/out.nc'
Import = Dataset(myfile, mode='r')
lon = Import.variables['lon'][:] # Longitude
lat = Import.variables['lat'][:] # Latitude
time = Import.variables['time'][:] # Time
lev = Import.variables['lev'][:] # Level
wind = Import.variables['ua'][:]
temp = Import.variables['ta'][:]
lon = lon-180 # to shift co-ordinates to -180 to 180.
variable1 = np.squeeze(wind,temp, axis=0)
variable2 = np.cov(variable1)
m = Basemap(resolution='l')
lons, lats = np.meshgrid(lon,lat)
X, Y = m(lons, lats)
cs = m.pcolor(X,Y, variable2)
plt.show()
The variables wind and temp, whose flux (the covariance) I am trying to compute, both have shape (3960, 64, 128), i.e. 3960 pieces of data on a 64x128 grid (with co-ordinates).
I tried squeezing both variables to produce an array of shape (3960, 3960, 64, 128) so that cov could work on the first two series of data (the two 3960s) of wind and temp, but this didn't work.
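For reference, a common way to get this kind of eddy flux as a 2-D map without np.cov is to compute the time covariance point by point: subtract the time mean from each field and average the product of the anomalies over the time axis. A minimal sketch, assuming axis 0 of wind and temp is time:

# anomalies: deviations from the time mean at every grid point
u_prime = wind - wind.mean(axis=0)
t_prime = temp - temp.mean(axis=0)

# time mean of u't' at every grid point -> shape (64, 128)
heat_flux = np.mean(u_prime * t_prime, axis=0)

# this 2-D field can then be plotted with m.pcolor(X, Y, heat_flux)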

Python: Interpolation from an irregular 2d grid to a regular one

I'd like to map a distribution of values from an irregular grid onto a regular one.
I'm trying the different interpolators, but it seems I'm not able to do it.
Here is the code I've written:
import numpy as np
from scipy import interpolate
import matplotlib.pyplot as plt
N = 100
M = 10
lat = ((np.random.rand(M,N))*2)+0.2
lon = ((np.random.rand(M,N))*3)+0.2
theta = ((np.random.rand(M,N))*180)
lat_min = np.min(lat)
lat_max = np.max(lat)
lon_min = np.min(lon)
lon_max = np.max(lon)
dlat = 0.1 # regular step for the lat[rad]
dlon = 0.1 # regular step for the lon[rad]
# Grid dimensions
Nlat = int(np.abs(lat_max-lat_min)/dlat)+1
Nlon = int(np.abs(lon_max-lon_min)/dlon)+1
# Lat-Lon vector
reg_lat = np.linspace(lat_min, lat_max, Nlat) # regularly spaced latitude vector
reg_lon = np.linspace(lon_min, lon_max, Nlon) # regularly spaced longitude vector
# Lat-Lon regular Grid
reg_lon_mesh, reg_lat_mesh = np.meshgrid(reg_lon, reg_lat)
I've used:
theta2 = interpolate.griddata((lon.ravel(), lat.ravel()), theta.ravel(), (reg_lon_mesh, reg_lat_mesh), method='cubic')
but the interpolation seems wrong,
and
f = interpolate.interp2d(lon.ravel(), lat.ravel(), theta, kind='cubic')
and it raises the warning:
A theoretically impossible result when finding a smoothing spline
with fp = s. Probably causes: s too small or badly chosen eps.
(abs(fp-s)/s>0.001)
kx,ky=3,3 nx,ny=36,34 m=1000 fp=14832451.907306 s=0.000000
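For scattered points like these, one approach that tends to behave better than a cubic fit (a sketch, reusing the variable names from the code above): interpolate with method='linear', then fill the cells outside the convex hull of the samples, which linear and cubic leave as NaN, with nearest-neighbour values:

theta_lin = interpolate.griddata(
    (lon.ravel(), lat.ravel()), theta.ravel(),
    (reg_lon_mesh, reg_lat_mesh), method='linear')

# fill the NaNs outside the convex hull with nearest-neighbour values
theta_near = interpolate.griddata(
    (lon.ravel(), lat.ravel()), theta.ravel(),
    (reg_lon_mesh, reg_lat_mesh), method='nearest')
theta_reg = np.where(np.isnan(theta_lin), theta_near, theta_lin)

plt.pcolormesh(reg_lon_mesh, reg_lat_mesh, theta_reg)
plt.show()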
