I'm having trouble computing a monthly average from Sentinel 3 images in... everything, really: Python, MATLAB, you name it. We are two people stuck on this problem.
The main difficulty is that the information for these images is not in a single netCDF file, neatly organised with coordinates and products. Instead, each single satellite image comes as a one-day folder containing several separate .nc files, each holding different information. As I understand it, SNAP uses an xfdumanifest.xml file to work with all of these separate .nc files together.
Now, I thought it would be a good idea to merge and edit the .nc files so as to create a new daily .nc that includes the chlorophyll, the coordinates and, might as well add it, the time. Later on I would merge these daily files so that I could compute a monthly mean with xarray. At least that was my idea, but I can't get the first part to work. The solution might be obvious, but here is what I tried, using the xarray module:
import os
import numpy as np
import xarray as xr
import netCDF4
from netCDF4 import Dataset
nc_folder = df_try.iloc[0] #folder where the image files are
#open dataset in xarray
nc_chl = xr.open_dataset(str(nc_folder['path']) + '/' + 'chl_nn.nc') #path to chlorophyll file
nc_chl
n_coord = xr.open_dataset(str(nc_folder['path']) + '/' + 'geo_coordinates.nc') #path to coordinates file
n_time = xr.open_dataset(str(nc_folder['path']) + '/' + 'time_coordinates.nc') #path to time file
ds_grid = [[nc_chl], [n_coord], [n_time]]
combined = xr.combine_nested(ds_grid, concat_dim=[None, None])
combined #dataset with all but not recognizing coordinates
ds = combined.rename({'latitude': 'lat', 'longitude': 'lon', 'time_stamp' : 'time'}).set_coords(['lon', 'lat', 'time']) #dataset recognizing coordinates as coordinates
ds
which gives a dataset with
Dimensions: columns: 4865, rows: 4091,
three coordinates (lat, lon and time), and the chl variable.
Now, it does not save to netCDF4 (I tried, but got an error), so I was also wondering whether anyone knows another way to compute an average. I have images spanning three years (2017 through 2019) that I need to average in different ways (monthly, seasonally, ...). My main problem right now is that the chlorophyll values are stored separately from the geographical coordinates, so working directly with only the chlorophyll files should not work and would just make a mess.
Any suggestions?
Two options here:
Using xarray
In xarray you can add them as coordinates. It is a bit tricky as the coordinates in the geo_coordinates.nc file are multidimensional as well.
A possible solution is the following:
import netCDF4
import xarray as xr
import matplotlib.pyplot as plt
# paths
root = r'C:<your_path>\S3B_OL_2_WFR____20201015.SEN3\chl_nn.nc' #set path to chl file
coor = r'C:<your_path>\S3B_OL_2_WFR____20201015.SEN3\geo_coordinates.nc' #set path to the coordinates file
# loading xarray datasets
ds = xr.open_dataset(root)
olci_geo_coords = xr.open_dataset(coor)
# extracting coordinates
lat = olci_geo_coords.latitude.data
lon = olci_geo_coords.longitude.data
# assign coordinates to the chl dataset (needs to refer to both the dimensions of our dataset)
ds = ds.assign_coords({"lon":(["rows","columns"], lon), "lat":(["rows","columns"], lat)})
# clip the image (add your own coordinates)
area_of_interest = ds.where((10 < ds.lon) & (ds.lon < 12) & (58 < ds.lat) & (ds.lat < 59), drop=True)
# simple plot with coordinates as axis
plt.figure(figsize=(15,15))
area_of_interest["CHL_NN"].plot(x="lon",y="lat")
Even simpler is to add them as variables in a new dataset:
# path to the folder
root = r'C:<your_path>\S3B_OL_2_WFR____20201015.SEN3\*.nc' #set path to all the .nc files in the product folder
# create a dataset by combining nc files (coordinates will become variables)
ds = xr.open_mfdataset(root, combine='by_coords')
But in this case, when you plot or clip the image, you cannot use the coordinates directly.
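Either way, once each daily scene carries a time stamp and the scenes share a common grid, the monthly and seasonal statistics the question asks for reduce to a resample/groupby in xarray. A minimal sketch (daily_datasets, daily_times and the variable name CHL_NN are assumptions standing in for the per-day datasets built as above):
import xarray as xr
import pandas as pd
# hypothetical: daily_datasets is a list of per-day datasets with lon/lat assigned,
# daily_times holds one timestamp per dataset
daily = xr.concat(daily_datasets, dim='time')
daily = daily.assign_coords(time=pd.to_datetime(daily_times))
# monthly mean of the chlorophyll variable
monthly_mean = daily['CHL_NN'].resample(time='1M').mean()
# seasonal (DJF/MAM/JJA/SON) mean over the whole 2017-2019 period
seasonal_mean = daily['CHL_NN'].groupby('time.season').mean('time')
Note that OLCI Level-2 scenes are generally not on a common rows/columns grid from one day to the next, so in practice they would need to be regridded (or binned, as in the snappy option below) before this step.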
Using snappy
In Python the snappy package is available and is based on the SNAP toolbox (which is implemented in Java). Check: https://senbox.atlassian.net/wiki/spaces/SNAP/pages/19300362/How+to+use+the+SNAP+API+from+Python
Once installed (unfortunately snappy supports only Python 2.7, 3.3 or 3.4), you can use the available SNAP functions directly from Python to aggregate your satellite images and create weekly/monthly averages. You then do not need to merge the lon/lat netCDF files, as you will work on the xfdumanifest.xml and SNAP will take care of that.
Here is an example. It performs the aggregation as well (a mean calculated over two chl .nc files):
import snappy
from snappy import ProductIO, WKTReader
from snappy import jpy
from snappy import GPF
from snappy import HashMap
# setting the aggregator method
aggregator_average_config = snappy.jpy.get_type('org.esa.snap.binning.aggregators.AggregatorAverage$Config')
agg_avg_chl = aggregator_average_config('CHL_NN')
# creating the hashmap to store the parameters
HashMap = snappy.jpy.get_type('java.util.HashMap')
parameters = HashMap()
#creating the aggregator array
aggregators = snappy.jpy.array('org.esa.snap.binning.aggregators.AggregatorAverage$Config', 1)
#adding my aggregators in the list
aggregators[0] = agg_avg_chl
# set parameters
# output directory
dir_out = 'level-3_py_dynamic.dim'
parameters.put('outputFile', dir_out)
# number of rows (directly linked with resolution)
parameters.put('numRows', 66792) # to have about 300 meters spatial resolution
# aggregators list
parameters.put('aggregators', aggregators)
# Region to clip the aggregation on
wkt="POLYGON ((8.923302175377243 59.55648108694149, 13.488748662344074 59.11388968719029,12.480488185001589 56.690625338725155, 8.212366327767503 57.12425256476263,8.923302175377243 59.55648108694149))"
geom = WKTReader().read(wkt)
parameters.put('region', geom)
# Source product path
path_15 = r"C:<your_path>\S3B_OL_2_WFR____20201015.SEN3\xfdumanifest.xml"
path_16 = r"C:\<your_path>\S3B_OL_2_WFR____20201016.SEN3\xfdumanifest.xml"
path = path_15 + "," + path_16
parameters.put('sourceProductPaths', path)
#result = snappy.GPF.createProduct('Binning', parameters, (source_p1, source_p2))
# create results
result = snappy.GPF.createProduct('Binning', parameters) #to be used with product paths specified in the parameters hashmap
print("results stored in: {0}".format(dir_out) )
I am quite new to and interested in this topic and would be happy to hear your/other solutions!
Related
I'm trying to plot a grid of air pollution data from a netCDF file in Python using xarray. However, I'm facing a couple of roadblocks.
To start off, here is the data that can be used to reproduce my code:
Data
When you try to import this data using xarray.open_dataset, you end up with a file that has zero coordinates or variables, and lots of attributes:
FILE_NAME = "test2.nc". ##I changed the name to make it shorter
xr.open_dataset(FILE_NAME)
So I created variables of the data and tried to import those into xarray:
import netCDF4
ds = netCDF4.Dataset(FILE_NAME)  # opened with netCDF4 so the groups are accessible
prd = 'PRODUCT'
metdata = "METADATA"
lat = ds.groups[prd].variables['latitude']
lon = ds.groups[prd].variables['longitude']
no2 = ds.groups[prd].variables['nitrogendioxide_tropospheric_column']
scanline = ds.groups[prd].variables['scanline']
time = ds.groups[prd].variables['time']
ground_pixel = ds.groups[prd].variables['ground_pixel']
ds = xr.DataArray(no2,
                  dims=["time", "x", "y"],
                  coords={
                      "lon": (["time", "x", "y"], lon)
                  }
                  # coords=[("time", time), ("x", scanline), ("y", ground_pixel)]
                  )
As you can see above, I tried multiple ways of creating the coordinates, but I'm still getting an error. The data in this netCDF file is on an irregular grid, and I just want to be able to plot that accurately and quickly using xarray.
Does someone know how I can do this?
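One detail that usually explains the empty-looking dataset: this kind of product stores its variables inside netCDF groups, and xarray only reads the root group unless told otherwise. A minimal sketch of the group-aware route, assuming the file follows the usual PRODUCT-group layout (as the variable names above suggest):
import xarray as xr
# open the PRODUCT group directly; latitude/longitude then come in as 2-D coordinates
ds = xr.open_dataset(FILE_NAME, group='PRODUCT')
no2 = ds['nitrogendioxide_tropospheric_column']
# quick plot on the irregular grid, using the 2-D coordinates as the plot axes
no2.isel(time=0).plot(x='longitude', y='latitude')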
Problem definition: I want to construct spherical nanoparticles of maghemite (gamma-Fe2O3) with a radius of 40 angstrom (4 nm). I have the LAMMPS data file of a large bulk system (replicated 10 x 10 x 10, 160000 atoms). I am a beginner in Python, but I have managed to write some code: I tried deleting the atoms whose x, y, z coordinates are not within the radius distance of the centre along each of the three axes, but it is not working. Only after looking at the output in VMD did I realise I was doing it wrong, and I don't know how to cut a sphere out of a cube. Please, could someone help me with this? My Python code follows. Thanks in advance.
import pandas as pd
import numpy as np
df = pd.read_csv("data.supercell443.txt",sep='\t',header=None)
optdf = pd.DataFrame([])
IL = 1
xmid = df[4].max()/2
ymid = df[5].max()/2
zmid = df[6].max()/2
xallowed_less = xmid+40
xallowed_more = xmid-40
yallowed_less = ymid+40
yallowed_more = ymid-40
zallowed_more = zmid-40
zallowed_less = zmid+40
for i, j, k, l, x, y, z in df.values:
    if abs(xmid - x) <= 40:
        tdf = pd.DataFrame([IL, j, k, l, x, y, z])
        optdf = optdf.append(tdf.T)
        IL += 1
Input image from data file using VMD software
Output image of code
Is it necessary to do this in Python? This is easily done in LAMMPS itself using the 'group' command or using existing tools such as Atomsk.
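If it does have to be Python, the key point is that testing |x - xmid|, |y - ymid| and |z - zmid| separately against 40 selects a box, not a sphere; a sphere needs the Euclidean distance from the centre. A minimal numpy/pandas sketch along the lines of the code above (the tab-separated file and the column layout, with x, y, z in columns 4-6, are taken from the question; everything else is an assumption):
import numpy as np
import pandas as pd
df = pd.read_csv("data.supercell443.txt", sep='\t', header=None)
center = df[[4, 5, 6]].max() / 2   # approximate box centre
radius = 40.0                      # sphere radius in angstrom
# Euclidean distance of every atom from the centre
dist = np.sqrt(((df[[4, 5, 6]] - center) ** 2).sum(axis=1))
# keep only the atoms inside the sphere and renumber the atom ids
sphere = df[dist <= radius].copy()
sphere[0] = np.arange(1, len(sphere) + 1)
In LAMMPS itself the same selection would be a spherical region plus the group command (and delete_atoms for everything outside), as suggested above, which avoids the round trip through pandas entirely.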
I have some ensemble files in grib format that I would like to lazy-load in Python using dask and xarray. Based on https://climate-cms.org/2018/09/14/dask-era-interim.html, I managed to lazy-load the files as intended, but now I want to slice and select the dimensions to plot the data for some time and level.
UPDATE: I recently came back to this issue and finally figured out that instead of using da.concatenate, I should use da.stack. This simple change solved my problem. The question is updated accordingly, in case anyone needs an example of how to create an ensemble of grib files in Python (with dask arrays for lazy loading) and to load and plot the data in the same fashion as one would with software like GrADS.
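The difference is easy to see on shapes: da.concatenate joins the per-file arrays along an existing axis, while da.stack adds a new leading axis, which is what the time dimension of the ensemble needs here. A tiny illustration (shapes chosen arbitrarily):
import dask.array as da
a = da.zeros((9, 181, 360))   # one file: (lev, lat, lon)
b = da.zeros((9, 181, 360))   # another file
da.concatenate([a, b], axis=0).shape   # (18, 181, 360) -> levels get merged, not what we want
da.stack([a, b], axis=0).shape         # (2, 9, 181, 360) -> new leading axis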
My program looks like:
import dask
import dask.array as da
import xarray as xr
import pandas as pd
import numpy as np
from glob import glob
from datetime import date, datetime, timedelta
import matplotlib.pyplot as plt
bpath = '/some/path/to/my/data'
# pressure levels
levels =['1000', '925', '850', '700', '500', '300', '250', '200', '50']
# ensemble member names
ensm = ['M01', 'M02', 'M03', 'M04', 'M05']
@dask.delayed
def open_file_delayed(file, vname):
    ds = xr.open_dataset(file, decode_cf=False, engine='pynio')
    return ds

def open_file(file, vname, nlevs, nlats, nlons, ftype):
    file_data = open_file_delayed(file, vname)[vname].data
    return da.from_delayed(file_data, (nlevs, nlats, nlons), ftype)
# filename mask: PREFIXMEMYYYYiMMiDDiHHiYYYYfMMfDDfHHf.grb
# MEM: member name (see the ensm list)
# YYYYiMMiDDiHHi: analysis date (passed as an argument to the open_ensemble function)
# YYYYfMMfDDfHHf: forecast date
def open_ensemble(date, member, vname):
    # list of files to open (sorted by date)
    files = sorted(glob(bpath + '/%(dateanl)s/%(mem)s/PREFIX%(mem)s%(dateanl)s*.grb' %
                        {'dateanl': date, 'mem': member}))
    ntime = len(files)
    # open the first file in the list to get dimensions and coordinates
    ds0 = xr.open_dataset(files[0], decode_cf=False, engine='pynio')
    var0 = ds0[vname]
    levs = ds0.lv_ISBL2.data
    lats = ds0.g4_lat_0.data
    lons = ds0.g4_lon_1.data
    nlevs = ds0.lv_ISBL2.size
    nlats = ds0.g4_lat_0.size
    nlons = ds0.g4_lon_1.size
    ftype = var0.dtype
    ds0.close()
    # calculate the date range of the forecasts; in my case ntime = 61
    # (each member has 61 forecast times and every grib file has 9 levels)
    date_fmt = pd.date_range(start=datetime.strptime(date, "%Y%m%d%H"), freq="6H", periods=ntime)
    # call the function 'open_file' for all files contained in the list 'files' and stack them up
    dask_var = da.stack([open_file(file, vname, nlevs, nlats, nlons, ftype) for file in files], axis=0)
    # xda is the data array with all files
    xda = xr.DataArray(dask_var, dims=['time', 'lev', 'lat', 'lon'])
    # set coordinate values
    xda.coords['time'] = ('time', date_fmt)
    xda.coords['lev'] = ('lev', levs)
    xda.coords['lat'] = ('lat', lats)
    xda.coords['lon'] = ('lon', lons)
    return xda
To use this code, I do the following (for a single analysis date, 2020053000, i.e. May 30, 2020 at 00Z, and a variable called ZGEO):
Note: this part is very fast (it takes milliseconds), as we are just creating a map structure to the actual data, similar to a GrADS control file.
lens_zgeo = [open_ensemble('2020053000', ens, 'ZGEO') for ens in ensm]
dens_zgeo = xr.concat(lens_zgeo, dim='ens')
dens_zgeo.coords['ens'] = ('ens', ensm)
dens_zgeo is a data array with the following structure:
data array structure
From this point, I can slice the dimensions of the data array and plot it (which was what I originally intended):
Note: this part takes longer because the data needs to be read from the disk.
dens_zgeo.isel(ens=0,time=0,lev=0).plot()
BOOM, case closed. Thanks!
I've edited the question with the modifications I needed in order to get the result I wanted. For this case, the main point is the use of da.stack instead of da.concatenate. By doing so, I got the resulting data array concatenated along the ensemble dimension I needed.
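For completeness, once the ensemble dimension is in place, the ensemble statistics come straight from the data array and stay lazy until they are actually evaluated or plotted:
# ensemble mean and spread of ZGEO, computed only when plotted or saved
ens_mean = dens_zgeo.mean(dim='ens')
ens_std = dens_zgeo.std(dim='ens')
ens_mean.isel(time=0, lev=0).plot()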
I have a series of unreferenced aerial images that I would like to georeference using python. The images are identical spatially (they are actually frames extracted from a video), and I obtained ground control points for them by manually georeferencing one frame in ArcMap. I would like to apply the ground control points I obtained to all the subsequent images, and as a result obtain a geo-tiff or a jpeg file with a corresponding world file (.jgw) for each processed image. I know this is possible to do using arcpy, but I do not have access to arcpy, and would really like to use a free open source module if possible.
My coordinate system is NZGD2000 (epsg 2193), and here is the table of control points I wish to apply to my images:
176.412984, -310.977264, 1681255.524654, 6120217.357425
160.386905, -141.487145, 1681158.424227, 6120406.821253
433.204947, -310.547238, 1681556.948690, 6120335.658359
Here is an example image: https://imgur.com/a/9ThHtOz
I've read a lot of information on GDAL and rasterio, but I don't have any experience with them, and am failing to adapt bits of code I found to my particular situation.
Rasterio attempt:
import cv2
from rasterio.warp import reproject
from rasterio.control import GroundControlPoint
from fiona.crs import from_epsg
img = cv2.imread("Example_image.jpg")
# Creating ground control points (not sure if I got the order of variables right):
points = [GroundControlPoint(176.412984, -310.977264, 1681255.524654, 6120217.357425),
          GroundControlPoint(160.386905, -141.487145, 1681158.424227, 6120406.821253),
          GroundControlPoint(433.204947, -310.547238, 1681556.948690, 6120335.658359)]
# The function requires a parameter "destination", but I'm not sure what to put there.
# I'm guessing this may not be the right function to use
reproject(img, destination, src_transform=None, gcps=points, src_crs=from_epsg(2193),
          src_nodata=None, dst_transform=None, dst_crs=from_epsg(2193), dst_nodata=None,
          src_alpha=0, dst_alpha=0, init_dest_nodata=True, warp_mem_limit=0)
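As an aside, rasterio can also build an affine transform directly from a set of ground control points via rasterio.transform.from_gcps, which may be all that is needed here since every frame shares the same geometry. A minimal sketch (it assumes the table columns are image column, image row with the sign dropped, then map x, map y, with EPSG:2193 as the CRS; GroundControlPoint takes row, col, x, y):
import rasterio
from rasterio.control import GroundControlPoint
from rasterio.transform import from_gcps
gcps = [GroundControlPoint(310.977264, 176.412984, 1681255.524654, 6120217.357425),
        GroundControlPoint(141.487145, 160.386905, 1681158.424227, 6120406.821253),
        GroundControlPoint(310.547238, 433.204947, 1681556.948690, 6120335.658359)]
transform = from_gcps(gcps)
# rewrite one frame as a GeoTIFF carrying this transform; the same transform can be reused for every frame
with rasterio.open("Example_image.jpg") as src:
    data = src.read()
count, height, width = data.shape
with rasterio.open("Example_image_georef.tif", "w", driver="GTiff",
                   height=height, width=width, count=count,
                   dtype=str(data.dtype), crs="EPSG:2193",
                   transform=transform) as dst:
    dst.write(data)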
GDAL attempt:
from osgeo import gdal, osr
inputImage = "Example_image.jpg"
outputImage = "image_gdal.jpg"
dataset = gdal.Open(inputImage)
I = dataset.ReadAsArray(0,0,dataset.RasterXSize,dataset.RasterYSize)
outdataset = gdal.GetDriverByName('GTiff')
output_SRS = osr.SpatialReference()
output_SRS.ImportFromEPSG(2193)
outdataset = outdataset.Create(outputImage,dataset.RasterXSize,dataset.RasterYSize,I.shape[0])
for nb_band in range(I.shape[0]):
    outdataset.GetRasterBand(nb_band+1).WriteArray(I[nb_band,:,:])
# Creating ground control points (not sure if I got the order of variables right):
gcp_list = []
gcp_list.append(gdal.GCP(176.412984, -310.977264, 1681255.524654, 6120217.357425))
gcp_list.append(gdal.GCP(160.386905, -141.487145, 1681158.424227, 6120406.821253))
gcp_list.append(gdal.GCP(433.204947, -310.547238, 1681556.948690, 6120335.658359))
outdataset.SetProjection(output_SRS.ExportToWkt())
wkt = outdataset.GetProjection()
outdataset.SetGCPs(gcp_list,wkt)
outdataset = None
I don't quite know how to make the above code work, and I would really appreciate any help with this.
I ended up reading a book "Geoprocessing with Python" and finally found a solution that worked for me. Here is the code I adapted to my problem:
import shutil
from osgeo import gdal, osr
orig_fn = 'image.tif'
output_fn = 'output.tif'
# Create a copy of the original file and save it as the output filename:
shutil.copy(orig_fn, output_fn)
# Open the output file for writing:
ds = gdal.Open(output_fn, gdal.GA_Update)
# Set spatial reference:
sr = osr.SpatialReference()
sr.ImportFromEPSG(2193) #2193 refers to the NZTM2000, but can use any desired projection
# Enter the GCPs
# Format: [map x-coordinate(longitude)], [map y-coordinate (latitude)], [elevation],
# [image column index(x)], [image row index (y)]
gcps = [gdal.GCP(1681255.524654, 6120217.357425, 0, 176.412984, 310.977264),
gdal.GCP(1681158.424227, 6120406.821253, 0, 160.386905, 141.487145),
gdal.GCP(1681556.948690, 6120335.658359, 0, 433.204947, 310.547238)]
# Apply the GCPs to the open output file:
ds.SetGCPs(gcps, sr.ExportToWkt())
# Close the output file in order to be able to work with it in other programs:
ds = None
For your gdal method, just using gdal.Warp with the outdataset should work, e.g.
outdataset.SetProjection(output_SRS.ExportToWkt())
wkt = outdataset.GetProjection()
outdataset.SetGCPs(gcp_list,wkt)
gdal.Warp("output_name.tif", outdataset, dstSRS='EPSG:2193', format='gtiff')
This will create a new file, output_name.tif.
As an addition to @Kat's answer, to avoid quality loss of the original image file and to set the nodata value to 0, the following can be used.
#Load the original file
src_ds = gdal.Open(orig_fn)
#Create tmp dataset saved in memory
driver = gdal.GetDriverByName('MEM')
tmp_ds = driver.CreateCopy('', src_ds, strict=0)
#
# ... setting GCP....
#
# Setting no data for all bands
for i in range(1, tmp_ds.RasterCount + 1):
    tmp_ds.GetRasterBand(i).SetNoDataValue(0)
# Saving as file
driver = gdal.GetDriverByName('GTiff')
ds = driver.CreateCopy(output_fn, tmp_ds, strict=0)
I know how to calculate the mean of a variable in one netCDF file. But I have 40 netCDF files. In each file I have 4000 data values for mixing layer height. I want to create a list of the mean mixing layer heights, one value per netCDF file.
In the end the size of my list should be 40.
Can someone help me with Python code to create this list?
Thank you so much.
Here is the code I used to calculate the mean mixing layer height for one layer in a single netcdf file
import numpy as np
import netCDF4
f = netCDF4.Dataset('niv.nc')
#the shape of my data set is (5760,3)
#5760 is the number of lists of time
#In each list I have 3 mixing layer heights for 3 layers.
#I'm going to call all the mixing layer height data for the first layer
a = f.variables['pbl'][:, 0]
print (np.mean(a))
You have to get the list of filenames somehow. Here I'll assume all your files are in one folder, and that there are no other netCDF files in that folder. To do this using netCDF4, computing a separate mean for each file:
import numpy as np
import netCDF4
from glob import glob
# you want to modify this to use your actual data directory
filename_list = glob('/home/user/data_dir/*.nc')
mean_list = []
for filename in filename_list:  # filename_list could also be built with something like os.listdir
    with netCDF4.Dataset(filename) as ds:
        mean_list.append(np.mean(ds.variables['pbl'][:, 0]))
To do the same thing with xarray:
import numpy as np
import xarray as xr
from glob import glob
# you want to modify this to use your actual data directory
filename_list = glob('/home/user/data_dir/*.nc')
mean_list = []
for filename in filename_list:  # filename_list could also be built with something like os.listdir
    with xr.open_dataset(filename) as ds:
        mean_list.append(np.mean(ds['pbl'][:, 0].values))
Suppose that, instead of getting the average for each file, the first dimension is time and you want the average over all the files together. To do that with xarray, you could use open_mfdataset like so:
import numpy as np
import xarray as xr
from glob import glob
# you want to modify this to use your actual data directory
filename_list = glob('/home/user/data_dir/*.nc')
ds = xr.open_mfdataset(filename_list, combine='nested', concat_dim='time')
mean = np.mean(ds['pbl'][:, 0].values)
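Since open_mfdataset returns dask-backed arrays, the same number can also be computed with xarray's own reduction, in which case nothing is read from disk until the result is converted to a plain float:
# lazy equivalent of the np.mean line above
mean = float(ds['pbl'][:, 0].mean())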