Cut NetCDF files by shapefile - python

I have a large dataset of global .nc files and I am trying to clip them to a smaller area. I have this area stored as a .shp file.
I have tried using GDAL from QGIS, but it requires converting each variable separately, and I must select each variable and the same shape for every file one by one; with 400 files, going through each variable seems like a bad idea. Also, this returns separate .tiff files, not the single .nc file I am aiming for.
I had this little script, but it's not doing what I need:
import glob
import subprocess
import os
ImageList = sorted(glob.glob('*.nc'))
print('number of images to process: ', len(ImageList))
Shapefile = 'NHAF-250m.shp'
# Create output directory
OutDir = './Clipped_Rasters/'
if not os.path.exists(OutDir):
    os.makedirs(OutDir)

for Image in ImageList:
    print('Processing ' + Image)
    OutImage = OutDir + Image.replace('.nc', '_BurnedArea_Clipped.tif') # Defines Output Image
    # Clip image
    subprocess.call('gdalwarp -q -cutline /Users/path/to/file/NHAF-250-vector/ -tr 0.25 0.25 -of GTiff NETCDF:'+Image+":burned_area "+OutImage, shell=True)
    print('Done.' + '\n')

print('All images processed.')
Thank you in advance

I recommend using xarray to handle the NetCDF data and geopandas + rasterio to handle your shapefile.
import geopandas
import xarray
import rasterio.features
import rasterio.transform
import glob

shapefile = 'NHAF-250m.shp'
sf = geopandas.read_file(shapefile)

file_list = sorted(glob.glob('*.nc'))

# Build the mask once, using the first file as the spatial reference.
# This assumes all files share the same north-up grid with 1D "x"/"y"
# coordinates (adjust the names if your data uses e.g. lon/lat), and it
# approximates the raster bounds by the coordinate extremes.
reference = xarray.open_dataset(file_list[0])
transform = rasterio.transform.from_bounds(
    float(reference.x.min()), float(reference.y.min()),
    float(reference.x.max()), float(reference.y.max()),
    len(reference.x), len(reference.y))

shape_mask = rasterio.features.geometry_mask(sf.geometry,
                                             out_shape=(len(reference.y), len(reference.x)),
                                             transform=transform,
                                             invert=True)
shape_mask = xarray.DataArray(shape_mask, dims=("y", "x"))

for file in file_list:
    nc_file = xarray.open_dataset(file)
    # Then apply the mask
    masked_netcdf_file = nc_file.where(shape_mask, drop=True)
    # store again as netcdf or do whatever you want with the masked array
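For the save step inside the loop, a minimal sketch (the '_clipped' suffix is just an example naming scheme):

    out_name = file.replace('.nc', '_clipped.nc')  # hypothetical output name
    masked_netcdf_file.to_netcdf(out_name)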

Related

Sentinel3 OLCI (chl) Average of netcdf files on Python

I'm having some trouble trying to get a monthly average with Sentinel 3 images in... everything, really: Python, Matlab; we are two people getting stuck on this problem.
The main reason is that the information for these images is not in a single NetCDF file, neatly organized with coordinates and products. Instead, it is spread across separate .nc files inside a one-day folder, each file holding different information about a single satellite image. As I understand it, SNAP uses an xfdumanifest.xml file to work with all of these separate .nc files.
Now, I thought it would be a good idea to merge and edit the .nc files to create a new daily .nc that includes the chlorophyll, the coordinates and, might as well add it, time. Later on, I would merge these new files to be able to compute a monthly mean with xarray. At least that was my idea, but I can't get the first part to work. It might have an obvious solution; here's what I tried, using the xarray module:
import os
import numpy as np
import xarray as xr
import netCDF4
from netCDF4 import Dataset
nc_folder = df_try.iloc[0] #folder where the image files are
#open dataset in xarray
nc_chl = xr.open_dataset(str(nc_folder['path']) + '/' + 'chl_nn.nc') #path to chlorophyll file
nc_chl
n_coord =xr.open_dataset(str(nc_folder['path'])+ '/'+ 'geo_coordinates.nc') #path to coordinates file
n_time = xr.open_dataset(str(nc_folder['path'])+ '/' + 'time_coordinates.nc') #path to time file
ds_grid = [[nc_chl], [n_coord], [n_time]]
combined = xr.combine_nested(ds_grid, concat_dim=[None, None])
combined #dataset with all but not recognizing coordinates
ds = combined.rename({'latitude': 'lat', 'longitude': 'lon', 'time_stamp' : 'time'}).set_coords(['lon', 'lat', 'time']) #dataset recognizing coordinates as coordinates
ds
which gives a dataset with
Dimensions: columns: 4865, rows: 4091
3 coordinates (lat, lon and time) and the chl variable.
Now, it doesn't save to netCDF4 (I tried, but there was an error), and I was also wondering if anyone knows another way to make an average. I have images from three years (2017 through 2019) that I need to average in different ways (monthly, seasonally...). My main problem right now is that the chlorophyll values are separate from the geographical coordinates, so working directly with the chlorophyll files alone would not work and would just make a mess.
Any suggestions?
Two options here:
Using xarray
In xarray you can add them as coordinates. It is a bit tricky as the coordinates in the geo_coordinates.nc file are multidimensional as well.
A possible solution is the following:
import netCDF4
import xarray as xr
import matplotlib.pyplot as plt
# paths
root = r'C:<your_path>\S3B_OL_2_WFR____20201015.SEN3\chl_nn.nc' #set path to chl file
coor = r'C:<your_path>\S3B_OL_2_WFR____20201015.SEN3\geo_coordinates.nc' #set path to the coordinates file
# loading xarray datasets
ds = xr.open_dataset(root)
olci_geo_coords = xr.open_dataset(coor)
# extracting coordinates
lat = olci_geo_coords.latitude.data
lon = olci_geo_coords.longitude.data
# assign coordinates to the chl dataset (needs to refer to both the dimensions of our dataset)
ds = ds.assign_coords({"lon":(["rows","columns"], lon), "lat":(["rows","columns"], lat)})
# clip the image (add your own coordinates)
area_of_interest = ds.where((10 < ds.lon) & (ds.lon < 12) & (58 < ds.lat) & (ds.lat < 59), drop=True)
# simple plot with coordinates as axis
plt.figure(figsize=(15,15))
area_of_interest["CHL_NN"].plot(x="lon",y="lat")
Even simpler is to add them as variables in a new dataset:
# path to the folder with the nc files
root = r'C:<your_path>\S3B_OL_2_WFR____20201015.SEN3\*.nc' #glob pattern matching all nc files in the product folder
# create a dataset by combining nc files (coordinates will become variables)
ds = xr.open_mfdataset(root, combine='by_coords')
But in this case, when you plot or clip the image, you cannot use the coordinates directly.
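As a sketch of the averaging step itself (not from the original answer, and only valid if all your daily scenes end up on the same pixel grid; for swath data on shifting grids, the binning approach below is the better fit), the monthly mean becomes a standard xarray groupby:

import xarray as xr
# daily_datasets is a hypothetical list of per-day datasets,
# each carrying a scalar 'time' coordinate
stacked = xr.concat(daily_datasets, dim='time')
monthly_mean = stacked.groupby('time.month').mean('time')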
Using snappy
In Python the snappy package is available, based on the SNAP toolbox (which is implemented in Java). Check: https://senbox.atlassian.net/wiki/spaces/SNAP/pages/19300362/How+to+use+the+SNAP+API+from+Python
Once installed (unfortunately snappy supports only Python 2.7, 3.3 or 3.4), you can use the available SNAP functions directly from Python to aggregate your satellite images and create weekly/monthly averages. You then do not need to merge the lon/lat NetCDF files, as you will work on the xfdumanifest.xml and SNAP will take care of that.
Here is an example; it performs aggregation as well (a mean calculated over two chl .nc files):
from snappy import WKTReader
from snappy import jpy
from snappy import GPF
# setting the aggregator method
aggregator_average_config = jpy.get_type('org.esa.snap.binning.aggregators.AggregatorAverage$Config')
agg_avg_chl = aggregator_average_config('CHL_NN')
# creating the hashmap to store the parameters
HashMap = jpy.get_type('java.util.HashMap')
parameters = HashMap()
# creating the aggregator array
aggregators = jpy.array('org.esa.snap.binning.aggregators.AggregatorAverage$Config', 1)
# adding my aggregators to the list
aggregators[0] = agg_avg_chl
# set parameters
# output directory
dir_out = 'level-3_py_dynamic.dim'
parameters.put('outputFile', dir_out)
# number of rows (directly linked with resolution)
parameters.put('numRows', 66792) # to have about 300 meters spatial resolution
# aggregators list
parameters.put('aggregators', aggregators)
# Region to clip the aggregation on
wkt="POLYGON ((8.923302175377243 59.55648108694149, 13.488748662344074 59.11388968719029,12.480488185001589 56.690625338725155, 8.212366327767503 57.12425256476263,8.923302175377243 59.55648108694149))"
geom = WKTReader().read(wkt)
parameters.put('region', geom)
# Source product path
path_15 = r"C:<your_path>\S3B_OL_2_WFR____20201015.SEN3\xfdumanifest.xml"
path_16 = r"C:\<your_path>\S3B_OL_2_WFR____20201016.SEN3\xfdumanifest.xml"
path = path_15 + "," + path_16
parameters.put('sourceProductPaths', path)
#result = GPF.createProduct('Binning', parameters, (source_p1, source_p2))
# create results
result = GPF.createProduct('Binning', parameters) # to be used with product paths specified in the parameters hashmap
print("results stored in: {0}".format(dir_out) )
I am quite new and interested in the topic and would be happy to hear your/other solutions!

Resample image rasterio/gdal, Python

How can I resample a single band GeoTIFF using Bilinear interpolation?
import os
import rasterio
from rasterio.enums import Resampling
from rasterio.plot import show, show_hist
import numpy as np

if __name__ == "__main__":
    input_Dir = 'sample.tif'
    #src = rasterio.open(input_Dir)
    #show(src, cmap="magma")
    upscale_factor = 2
    with rasterio.open(input_Dir) as dataset:
        # resample data to target shape
        data = dataset.read(
            out_shape=(
                dataset.count,
                int(dataset.height * upscale_factor),
                int(dataset.width * upscale_factor)
            ),
            resampling=Resampling.bilinear
        )
        # scale image transform
        transform = dataset.transform * dataset.transform.scale(
            (dataset.width / data.shape[-1]),
            (dataset.height / data.shape[-2])
        )
        show(dataset, cmap="magma", transform=transform)
I have tried the code above; my current output and the output I am trying to achieve were attached as images (not included here).
One option would be to use the GDAL Python bindings. Then you can perform the resampling in memory (or save the image if you want). Assuming the old raster resolution was 0.25x0.25 and you're resampling to 0.10x0.10:
from osgeo import gdal

input_Dir = 'sample.tif'
# note: GDAL's Python bindings use camelCase keywords here (xRes/yRes)
ds = gdal.Translate('', input_Dir, xRes=0.1, yRes=0.1, resampleAlg='bilinear', format='VRT')
If you want to save the image, put an output filepath instead of the empty string for the first argument and change the format to 'GTiff'!
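For instance, a minimal sketch of the on-disk variant (the output filename is just an example):

ds = gdal.Translate('resampled.tif', input_Dir, xRes=0.1, yRes=0.1, resampleAlg='bilinear', format='GTiff')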

How to georeference an unreferenced aerial image using ground control points in python

I have a series of unreferenced aerial images that I would like to georeference using python. The images are identical spatially (they are actually frames extracted from a video), and I obtained ground control points for them by manually georeferencing one frame in ArcMap. I would like to apply the ground control points I obtained to all the subsequent images, and as a result obtain a geo-tiff or a jpeg file with a corresponding world file (.jgw) for each processed image. I know this is possible to do using arcpy, but I do not have access to arcpy, and would really like to use a free open source module if possible.
My coordinate system is NZGD2000 (EPSG 2193), and here is the table of control points I wish to apply to my images (columns: image x, image y, map easting, map northing):
176.412984, -310.977264, 1681255.524654, 6120217.357425
160.386905, -141.487145, 1681158.424227, 6120406.821253
433.204947, -310.547238, 1681556.948690, 6120335.658359
Here is an example image: https://imgur.com/a/9ThHtOz
I've read a lot of information on GDAL and rasterio, but I don't have any experience with them, and am failing to adapt bits of code I found to my particular situation.
Rasterio attempt:
import cv2
from rasterio.warp import reproject
from rasterio.control import GroundControlPoint
from fiona.crs import from_epsg
img = cv2.imread("Example_image.jpg")
# Creating ground control points (not sure if I got the order of variables right):
points = [GroundControlPoint(176.412984, -310.977264, 1681255.524654, 6120217.357425),
          GroundControlPoint(160.386905, -141.487145, 1681158.424227, 6120406.821253),
          GroundControlPoint(433.204947, -310.547238, 1681556.948690, 6120335.658359)]
# The function requires a parameter "destination", but I'm not sure what to put there.
# I'm guessing this may not be the right function to use
reproject(img, destination, src_transform=None, gcps=points, src_crs=from_epsg(2193),
          src_nodata=None, dst_transform=None, dst_crs=from_epsg(2193), dst_nodata=None,
          src_alpha=0, dst_alpha=0, init_dest_nodata=True, warp_mem_limit=0)
GDAL attempt:
from osgeo import gdal, osr
inputImage = "Example_image.jpg"
outputImage = "image_gdal.jpg"
dataset = gdal.Open(inputImage)
I = dataset.ReadAsArray(0,0,dataset.RasterXSize,dataset.RasterYSize)
outdataset = gdal.GetDriverByName('GTiff')
output_SRS = osr.SpatialReference()
output_SRS.ImportFromEPSG(2193)
outdataset = outdataset.Create(outputImage,dataset.RasterXSize,dataset.RasterYSize,I.shape[0])
for nb_band in range(I.shape[0]):
    outdataset.GetRasterBand(nb_band+1).WriteArray(I[nb_band,:,:])
# Creating ground control points (not sure if I got the order of variables right):
gcp_list = []
gcp_list.append(gdal.GCP(176.412984, -310.977264, 1681255.524654, 6120217.357425))
gcp_list.append(gdal.GCP(160.386905, -141.487145, 1681158.424227, 6120406.821253))
gcp_list.append(gdal.GCP(433.204947, -310.547238, 1681556.948690, 6120335.658359))
outdataset.SetProjection(output_SRS.ExportToWkt())
wkt = outdataset.GetProjection()
outdataset.SetGCPs(gcp_list,wkt)
outdataset = None
I don't quite know how to make the above code work, and I would really appreciate any help with this.
I ended up reading a book "Geoprocessing with Python" and finally found a solution that worked for me. Here is the code I adapted to my problem:
import shutil
from osgeo import gdal, osr
orig_fn = 'image.tif'
output_fn = 'output.tif'
# Create a copy of the original file and save it as the output filename:
shutil.copy(orig_fn, output_fn)
# Open the output file for writing:
ds = gdal.Open(output_fn, gdal.GA_Update)
# Set spatial reference:
sr = osr.SpatialReference()
sr.ImportFromEPSG(2193) #2193 refers to the NZTM2000, but can use any desired projection
# Enter the GCPs
# Format: [map x-coordinate(longitude)], [map y-coordinate (latitude)], [elevation],
# [image column index(x)], [image row index (y)]
gcps = [gdal.GCP(1681255.524654, 6120217.357425, 0, 176.412984, 310.977264),
        gdal.GCP(1681158.424227, 6120406.821253, 0, 160.386905, 141.487145),
        gdal.GCP(1681556.948690, 6120335.658359, 0, 433.204947, 310.547238)]
# Apply the GCPs to the open output file:
ds.SetGCPs(gcps, sr.ExportToWkt())
# Close the output file in order to be able to work with it in other programs:
ds = None
For your gdal method, just using gdal.Warp with the outdataset should work, e.g.
outdataset.SetProjection(output_SRS.ExportToWkt())
wkt = outdataset.GetProjection()
outdataset.SetGCPs(gcp_list, wkt)
gdal.Warp("output_name.tif", outdataset, dstSRS='EPSG:2193', format='GTiff')
This will create a new file, output_name.tif.
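As a side note (these are standard gdal.Warp options, not part of the original answer): the way the GCPs are interpolated can be controlled explicitly, which may help if the default fit looks distorted:

# tps=True warps with a thin plate spline through the GCPs;
# alternatively, polynomialOrder=1 forces a first-order polynomial fit
gdal.Warp("output_name.tif", outdataset, dstSRS='EPSG:2193', format='GTiff', tps=True)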
As an addition to @Kat's answer, to avoid quality loss of the original image file and to set the nodata value to 0, the following can be used.
#Load the original file
src_ds = gdal.Open(orig_fn)
#Create tmp dataset saved in memory
driver = gdal.GetDriverByName('MEM')
tmp_ds = driver.CreateCopy('', src_ds, strict=0)
#
# ... setting GCP....
#
# Setting nodata for all bands
for i in range(1, tmp_ds.RasterCount + 1):
    tmp_ds.GetRasterBand(i).SetNoDataValue(0)
# Saving as file
driver = gdal.GetDriverByName('GTiff')
ds = driver.CreateCopy(output_fn, tmp_ds, strict=0)

csv to raster python

I have a 3-column CSV (Lon, Lat, Ref) with 63000 rows, and I would like to convert the "Ref" column to a raster. Currently only the points (x, y) are being plotted. I want to plot the "Ref" column, add contours, and color-fill it. Thanks
Data:
Lon,Lat, Ref
-115.0377,51.9147,0
-115.0679,51.9237,0
-115.0528,51.9237,0
-115.0377,51.9237,0
-115.1134,51.9416,0
-115.0982,51.9416,0
-115.0831,51.9416,0
-115.1437,51.9596,6
-115.1285,51.9596,6
-115.1588,51.9686,6
-115.1437,51.9686,10.5
-115.1285,51.9686,10.5
-115.1134,51.9686,8
-115.1891,51.9776,7.5
-115.174,51.9776,7.5
-115.1588,51.9776,7.5
-115.1437,51.9776,8
-115.1285,51.9776,8
-115.1134,51.9776,8
-115.1891,51.9866,7
-115.174,51.9866,7
-115.1588,51.9866,7
-115.1437,51.9866,0
-115.1285,51.9866,0
-115.1134,51.9866,0
-115.1891,51.9956,7
-113.1143,52.2385,3.5
-113.0992,52.2475,3.5
-113.084,52.2475,3.5
-113.0689,52.2475,5.5
-113.0537,52.2475,5.5
Code:
import pandas as pd
import geopandas
from shapely.geometry import Point
import fiona
import matplotlib.pyplot as plt
df=pd.read_csv('name.csv')
df1=df.interpolate()
geometry=[Point(xyz) for xyz in zip(df1.ix[:,0], df1.ix[:,1], df1.ix[:,2])]
df3=geopandas.GeoDataFrame(df1, geometry=geometry)
df3.plot()
plt.savefig('raster.tiff')
Wanted result: (image not included)
If you want to plot points from GeoPandas based on the "Ref" column, you don't need it as a z coordinate.
import pandas as pd
import geopandas
from shapely.geometry import Point
import matplotlib.pyplot as plt
df = pd.read_csv('name.csv')
geometry = [Point(xy) for xy in zip(df.iloc[:, 0], df.iloc[:, 1])]
gdf = geopandas.GeoDataFrame(df, geometry=geometry)
gdf.plot(column=' Ref')
plt.savefig('raster.tiff')
You don't even need interpolate().
However, if you want to convert your vector point dataset to a raster GeoTIFF, plot() is not the right way to do it. I would go for gdal.Grid() as explained here: [Python - gdal.Grid() correct use][1]
EDIT
Using gdal.Grid() like this I am able to generate tif based on the sample of data you provided.
import os
from osgeo import gdal

dir_with_csvs = r"/home/panda"
os.chdir(dir_with_csvs)

def find_csv_filenames(path_to_dir, suffix=".csv"):
    filenames = os.listdir(path_to_dir)
    return [filename for filename in filenames if filename.endswith(suffix)]

csvfiles = find_csv_filenames(dir_with_csvs)
for fn in csvfiles:
    vrt_fn = fn.replace(".csv", ".vrt")
    lyr_name = fn.replace('.csv', '')
    out_tif = fn.replace('.csv', '.tiff')
    with open(vrt_fn, 'w') as fn_vrt:
        fn_vrt.write('<OGRVRTDataSource>\n')
        fn_vrt.write('\t<OGRVRTLayer name="%s">\n' % lyr_name)
        fn_vrt.write('\t\t<SrcDataSource>%s</SrcDataSource>\n' % fn)
        fn_vrt.write('\t\t<GeometryType>wkbPoint</GeometryType>\n')
        fn_vrt.write('\t\t<GeometryField encoding="PointFromColumns" x="Lon" y="Lat" z="Ref"/>\n')
        fn_vrt.write('\t</OGRVRTLayer>\n')
        fn_vrt.write('</OGRVRTDataSource>\n')
    output = gdal.Grid(out_tif, vrt_fn)
    # below using your settings - I don't have a sample large enough to properly
    # test it, but it is generating a file as well
    output2 = gdal.Grid('outcome2.tif', vrt_fn, algorithm='invdist:power=2.0:smoothing=1.0')
Do you have any particular reason to use gdal via shell?
[1]: https://gis.stackexchange.com/questions/254330/python-gdal-grid-correct-use
@ctvtkar, I am attaching code here using gdal. When I run it, the .vrt file gets created, but not the .tif file. The error I get is: gdal_grid: not found. GDAL is installed.
Code:
import subprocess
import os

dir_with_csvs = r"/home/panda"
os.chdir(dir_with_csvs)

def find_csv_filenames(path_to_dir, suffix=".csv"):
    filenames = os.listdir(path_to_dir)
    return [filename for filename in filenames if filename.endswith(suffix)]

csvfiles = find_csv_filenames(dir_with_csvs)
for fn in csvfiles:
    vrt_fn = fn.replace(".csv", ".vrt")
    lyr_name = fn.replace('.csv', '')
    out_tif = fn.replace('.csv', '.tiff')
    with open(vrt_fn, 'w') as fn_vrt:
        fn_vrt.write('<OGRVRTDataSource>\n')
        fn_vrt.write('\t<OGRVRTLayer name="%s">\n' % lyr_name)
        fn_vrt.write('\t\t<SrcDataSource>%s</SrcDataSource>\n' % fn)
        fn_vrt.write('\t\t<GeometryType>wkbPoint</GeometryType>\n')
        fn_vrt.write('\t\t<GeometryField encoding="PointFromColumns" x="Lon" y="Lat" z="Ref"/>\n')
        fn_vrt.write('\t</OGRVRTLayer>\n')
        fn_vrt.write('</OGRVRTDataSource>\n')
    gdal_cmd = 'gdal_grid -a invdist:power=2.0:smoothing=1.0 -zfield "Ref" -of GTiff -ot Float64 -l %s %s %s' % (lyr_name, vrt_fn, out_tif)
    subprocess.call(gdal_cmd, shell=True)
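If the gdal_grid executable is not on your PATH (which is what "gdal_grid: not found" means), the equivalent call can be made through the Python bindings instead of subprocess; a minimal sketch reusing the loop variables above:

    # requires: from osgeo import gdal
    output = gdal.Grid(out_tif, vrt_fn,
                       algorithm='invdist:power=2.0:smoothing=1.0',
                       zfield='Ref')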

How to convert pixel to wavelength in spectra from FITS files in Python?

I've been using matplotlib.pyplot to plot a spectrum from FITS files in Python, getting intensity versus pixel, but what I actually need is to convert the pixels to wavelength. I've seen similar questions that got me on the right path (e.g. similar question, RGB example), but I still feel lost in the process.
I have FITS files with wavelengths between roughly 3500 and 6000 (Å), in float32 format and with dimensions (53165,).
So as I understand it, I need to calibrate the pixel positions to wavelength. I have my rest wavelength header (RESW) and my "step" wavelength header (STW), and I would need to compute:
x = RESW + (pixel index * STW)
and then plot it. Here is what I've got in my code so far.
import os
from glob import glob
from astropy.io import fits
import matplotlib.pyplot as plt

# add the directory you have your files in
dir = ''

# OPEN ALL FILES AND STORE THEM INTO A LIST
files = glob(dir + '*.fits')
for fi in files:
    print(fi)
    name = fi[:-len('.fits')]  # Remove '.fits' from the file name
    with fits.open(fi) as hdu:  # glob already returns the full path
        hdu.info()
        data = hdu[0].data
        hdr = hdu[0].header  # added to try2
        step = hdr['CDELT1']  # added to try2
        restw = hdr['CRVAL1']  # added to try2
        #step = fits.getheader('STW')  # added to try
        #restw = fits.getheader('RESW')  # added to try
        spectra = restw + (data * step)  # added to try
    plt.clf()
    plt.plot(spectra)
    plt.savefig(name + '.pdf')
I've tried using fits.getheader(''), but I don't know where or how to use it, and the way above is not working right.
Could someone please help? Thanks in advance!
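For reference, a minimal sketch of the calibration described above (assuming CRVAL1/CDELT1 define a linear wavelength axis with reference pixel CRPIX1 = 1; the key point is that the wavelength array is built from the pixel indices, not from the flux values):

import numpy as np
# data holds the flux; build the wavelength axis from the pixel indices
wavelengths = restw + np.arange(data.size) * step
plt.plot(wavelengths, data)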
