Spatial merge/combine of multiple netcdf with xarray or satpy

Spatial merge/combine of multiple netcdf with xarray or satpy - python

I have two spatial dataset in netcdf format. They have same time, dimensions, coordinates, and data variable. But they are for different spatial coordinates. In below I tried to show my two dataset by a polygon:
import glob
import xarray as xr
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
file1 = '20190109T071048.nc'
file2 = '20190109T085117.nc'
ds1 = xr.open_dataset(file1, group='PRODUCT')
ds2 = xr.open_dataset(file2, group='PRODUCT')
PATH_TO_GPK = 'Study_Area.gpkg'
SA = gpd.read_file(PATH_TO_GPK, layer='Study_Area')
First dataset plot:
plt.figure(figsize=(12,8))
ax = plt.axes()
ds1.qa_value.isel(time = 0).plot(ax = ax, x='longitude', y='latitude')
SA.plot(ax = ax, alpha = 0.8, facecolor = 'none')
Second dataset plot:
plt.figure(figsize=(12,8))
ax = plt.axes()
ds2.qa_value.isel(time = 0).plot(ax = ax, x='longitude', y='latitude')
SA.plot(ax = ax, alpha = 0.8, facecolor = 'none')
I want to merge these two netcdf files with xarray.
combined = xr.merge([ds1, ds2], compat='no_conflicts')
Error:
MergeError: conflicting values for variable 'latitude' on objects to be combined. You can skip this check by specifying compat='override'.
tried with:
combined = xr.merge([ds1, ds2], compat='override')
but plot of combined was same as above first plot.
Then tried with:
combined = xr.combine_by_coords([ds1,ds2], compat='no_conflicts')
Error:
Could not find any dimension coordinates to use to order the datasets for concatenation
Then tried with:
combined = xr.combine_nested([ds1,ds2], concat_dim=["time"])
and plot of combined was again same as first plot.
Based on ThomasNicolas suggestion, I used below code:
ds = xr.open_mfdataset([file1, file2], combine='nested')
But it return this error:
AttributeError: 'Dataset' object has no attribute 'qa_value'
There are not any data in result:
print of the first dataset (for example) shows:
print (ds1)
<xarray.Dataset>
Dimensions: (corner: 4, ground_pixel: 450, scanline: 3245, time: 1)
Coordinates:
* scanline (scanline) float64 0.0 1.0 ... 3.244e+03
* ground_pixel (ground_pixel) float64 0.0 1.0 ... 449.0
* time (time) datetime64[ns] 2019-01-03
* corner (corner) float64 0.0 1.0 2.0 3.0
latitude (time, scanline, ground_pixel) float32 ...
longitude (time, scanline, ground_pixel) float32 ...
Data variables:
delta_time (time, scanline) timedelta64[ns] 08:07:0...
time_utc (time, scanline) object '2019-01-03T08:0...
qa_value (time, scanline, ground_pixel) float32 ...
Is there any suggestion for merge or combine of these files?
UPDATED
Base on #dl.meteo advice, I used satpy library for solve my problem, it seems that it can merge two netcdf files but not completely, you can see an incorrect part (red boundary) in joined image.
Can satpy do it correctly?
# Read NetCDF files
from satpy import Scene
import glob
filenames = glob.glob('myfiles*.nc')
scn = Scene(filenames=filenames, reader='tropomi_l2')
scn.load(['qq'])
mask = SA_mask_poly.mask(d, lat_name='latitude', lon_name='longitude')
out_sel = d.compute().where(mask == 0, drop=True)
plt.figure(figsize=(12,8))
ax = plt.axes()
out_sel.plot(ax = ax, x='longitude', y='latitude')
SA.plot(ax = ax, alpha = 0.8, facecolor = 'none', lw = 1)

I've come across this problem just now. xarray can't combine values with different coordinates. As your two passes have their own unique coordinates, you can't directly combine them.
One solution for this is to use the pyresample module to resample both granules from their own coordinates onto a common grid. Open each file as a separate Scene and then apply scn.resample() method. This will put both onto the same grid. From there you can combine them.
Another solution might be the experimental MultiScene object, which is designed for this use case. As per the documentation:
Scene objects in Satpy are meant to represent a single geographic region at a specific single instant in time or range of time. This means they are not suited for handling multiple orbits of polar-orbiting satellite data, multiple time steps of geostationary satellite data, or other special data cases. To handle these cases Satpy provides the MultiScene class.
The reason your current solution has artefacts is your Scene object has two separate orbits stuck together as one array. I think the discontinuity in their coordinates will cause stretch/tear artefacts in your quadmesh plot and further processing, such as convolution filtering, is likely to return unexpected results as it expects neighbouring values in an array to be physically neighbours in the final image and not in another orbit.

Related

Issues with Scipy interpolate griddata

I have a netcdf file with a spatial resolution of 0.05º and I want to regrid it to a spatial resolution of 0.01º like this other netcdf. I tried using scipy.interpolate.griddata, but I am not really getting there, I think there is something that I am missing.
original_dataset = xr.open_dataset('to_regrid.nc')
target_dataset= xr.open_dataset('SSTA_L4_MED_0_1dg_2022-01-18.nc')
According to scipy.interpolate.griddata documentation, I need to construct my interpolation pipeline as following:
grid = griddata(points, values, (grid_x_new, grid_y_new),
method='nearest')
So in my case, I assume it would be as following:
#Saving in variables the old and new grids
grid_x_new = target_dataset['lon']
grid_y_new = target_dataset['lat']
grid_x_old = original_dataset ['lon']
grid_y_old = original_dataset ['lat']
points = (grid_x_old,grid_y_old)
values = original_dataset['analysed_sst'] #My variable in the netcdf is the sea surface temp.
Now, when I run griddata:
from scipy.interpolate import griddata
grid = griddata(points, values, (grid_x_new, grid_y_new),method='nearest')
I am getting the following error:
ValueError: shape mismatch: objects cannot be broadcast to a single
shape
I assume it has something to do with the lat/lon array shapes. I am quite new to netcdf field and don't really know what can be the issue here. Any help would be very appreciated!

In your original code the indices in grid_x_old and grid_y_old should correspond to each unique coordinate in the dataset. To get things working correctly something like the following will work:
import xarray as xr
from scipy.interpolate import griddata
original_dataset = xr.open_dataset('to_regrid.nc')
target_dataset= xr.open_dataset('SSTA_L4_MED_0_1dg_2022-01-18.nc')
#Saving in variables the old and new grids
grid_x_old = original_dataset.to_dataframe().reset_index().loc[:,["lat", "lon"]].lon
grid_y_old = original_dataset.to_dataframe().reset_index().loc[:,["lat", "lon"]].lat
grid_x_new = target_dataset.to_dataframe().reset_index().loc[:,["lat", "lon"]].lon
grid_y_new = target_dataset.to_dataframe().reset_index().loc[:,["lat", "lon"]].lat
values = original_dataset.to_dataframe().reset_index().loc[:,["lat", "lon", "analysed_sst"]].analysed_sst
points = (grid_x_old,grid_y_old)
grid = griddata(points, values, (grid_x_new, grid_y_new),method='nearest')

I recommend using xesm for regridding xarray datasets. The code below will regrid your dataset:
import xarray as xr
import xesmf as xe
original_dataset = xr.open_dataset('to_regrid.nc')
target_dataset= xr.open_dataset('SSTA_L4_MED_0_1dg_2022-01-18.nc')
regridder = xe.Regridder(original_dataset, target_dataset, "bilinear")
dr_out = regridder(original_dataset)

Plotting Pandas Dataframe data using Folium GeoJson

I am trying to plot the amount of times a satellite goes over a certain location using Python and a heatmap. I easily generate the satellite data, but I am having issues with displaying it in a nice manner. I am trying to follow this example, as I can use the style function to lower the opacity. I am having some issues replicating this though as it seems that the GeoJson version they were using no longer accepts the same inputs. This is the dataframe I am using:
print(df.head())
latitude longitude countSp geometry
0 -57.9 151.1 1.0 POLYGON ((151.05 -57.95, 151.15 -57.95, 151.15...
1 -57.9 151.2 2.0 POLYGON ((151.15 -57.95, 151.25 -57.95, 151.25...
2 -57.8 151.2 1.0 POLYGON ((151.15 -57.84999999999999, 151.25 -5...
3 -57.8 151.3 3.0 POLYGON ((151.25 -57.84999999999999, 151.35 -5...
4 -57.8 151.4 2.0 POLYGON ((151.35 -57.84999999999999, 151.45 -5...
I then call folium through:
hmap = folium.Map(location=[42.5, -80], zoom_start=7, )
colormap_dept = branca.colormap.StepColormap(
colors=['#00ae53', '#86dc76', '#daf8aa',
'#ffe6a4', '#ff9a61', '#ee0028'],
vmin=0,
vmax=max_amt,
index=[0, 2, 4, 6, 8, 10, 12])
style_func = lambda x: {
'fillColor': colormap_dept(x['countSp']),
'color': '',
'weight': 0.0001,
'fillOpacity': 0.1
}
folium.GeoJson(
df,
style_function=style_func,
).add_to(hmap)
This is the error I get when I run my code:
ValueError: Cannot render objects with any missing geometries: latitude longitude countSp geometry
I know that I can use the HeatMap plugin from folium in order to get most of this done, but I have found a couple of issues with doing that. First is that I cannot easily generate a legend (though I have been able to work around this). Second is that it is way too opaque, and I am not finding any ways of reducing that. I have tried playing around with the radius, and blur parameters for HeatMap without much change. I think that the fillOpacity of the style_func above is a much better way of making my data translucent.
By the way, I generate the polygon in my df by the following command. So in my dataframe all I need folium to know about is the geometry and countSp (which is the number of times a satellite goes over a certain area - ~10kmx10km square).
df['geometry'] = df.apply(lambda row: Polygon([(row.longitude-0.05, row.latitude-0.05),
(row.longitude+0.05, row.latitude-0.05),
(row.longitude+0.05, row.latitude+0.05),
(row.longitude-0.05, row.latitude+0.05)]), axis=1)
Is there a good way of going about this issue?

Once again, they were looking for a way to express the purpose in a heat map, so I used Plotly's data on airline arrivals and departures to visualize it.
The number of flights to and from the U.S. mainland only was used for the data.
Excluded IATA codes['LIH','HNL','STT','STX','SJU','OGG','KOA']
Draw a straight line on the map from the latitude and longitude of the departure airport to the latitude and longitude of the arrival airport.
Draw a heat map with data on the number of arrivals and departures by airport.
Since we cannot use a discrete colormap, we will create a linear colormap and add it.
Embed the heatmap as a layer named Traffic
import pandas as pd
df_airports = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/2011_february_us_airport_traffic.csv')
df_airports.sort_values('cnt', ascending=False)
df_air = df_airports[['lat','long','cnt']]
df_flight_paths = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/2011_february_aa_flight_paths.csv')
df_flight_paths = df_flight_paths[~df_flight_paths['airport1'].isin(['HNL','STT','SJU','OGG','KOA'])]
df_flight_paths = df_flight_paths[~df_flight_paths['airport2'].isin(['LIH','HNL','STT','STX','SJU'])]
df_flight_paths = df_flight_paths[['start_lat', 'start_lon', 'end_lat', 'end_lon', 'cnt']]
import folium
from folium.plugins import HeatMap
import branca.colormap as cm
from collections import defaultdict
steps=10
colormap = cm.linear.YlGnBu_09.scale(0, 1).to_step(steps)
gradient_map=defaultdict(dict)
for i in range(steps):
gradient_map[1/steps*i] = colormap.rgb_hex_str(1/steps*i)
m = folium.Map(location=[32.500, -97.500], zoom_start=4, tiles="cartodbpositron")
data = []
for idx,row in df_flight_paths.iterrows():
folium.PolyLine([[row.start_lat, row.start_lon], [row.end_lat, row.end_lon]], weight=2, color="red", opacity=0.4
).add_to(m)
HeatMap(
df_air.values,
gradient=gradient_map,
name='Traffic',
mini_opacity=0.1,
radius=15,
blur=5
).add_to(m)
folium.LayerControl().add_to(m)
colormap.add_to(m)
m

Missing axis values in plotting of NetCDF variable in group with xarray

I am constructing a NetCDF file that will be used with xarray. It will consist of many groups that use dimensions that are defined in the root group. In my current example, xarray's plot function is unable to put the proper values on the axes. Tools like panoply or ncview produce plots that do put the proper values of the dimensions at the axes. The script below creates a file which allows me to reproduce the problem. Do I construct the NetCDF file in an incorrect way, or is this a bug in xarray?
import numpy as np
import netCDF4 as nc
import xarray as xr
import matplotlib.pyplot as plt
# Three series, two variables that contain the axis values and the 2D field.
z = np.arange(0., 1000., 50.)
time = np.arange(0., 86400., 3600.)
a = np.random.rand(time.size, z.size)
# The two dimensions are stored in the root group.
nc_file = nc.Dataset("test.nc", mode="w", datamodel="NETCDF4", clobber=False)
nc_file.createDimension("z" , z.size )
nc_file.createDimension("time", time.size)
nc_z = nc_file.createVariable("z" , "f8", ("z") )
nc_time = nc_file.createVariable("time", "f8", ("time"))
nc_z [:] = z [:]
nc_time[:] = time[:]
# The 2D field is created and stored in a group called test_group.
nc_group = nc_file.createGroup("test_group")
nc_a = nc_group.createVariable("a", "f8", ("time", "z"))
nc_a[:,:] = a[:,:]
nc_file.close()
# Opening the file in x-array shows a plot that misses the axes values.
xr_file = xr.open_dataset("test.nc", "test_group")
xr_a = xr_file['a']
xr_a.plot()
plt.show()
The resulting figure, which has just the count rather than the dimension values on the axes, is:

Plotting netCDF data with Python: How to change grid?

I'm an new one in python and plotting data with Matplotlib. I really need help and thank you in advance for the answers.
So, I have a netCDF file with v-component of wind data. Grid coordinates: points=9600 (240x40)
lon : 0 to 358.5 by 1.5 degrees_east circular
lat : 88.5 to 30 by -1.5 degrees_north
My code is:
import numpy as np
import matplotlib
matplotlib.use('Agg')
from netCDF4 import Dataset
from matplotlib.mlab import griddata
from matplotlib import pyplot as plt
from mpl_toolkits.basemap import Basemap
#read data from NETcdf file ".nc"
my_file = '/home/Era-Interim/NH-EraInt-1979.nc'
fh = Dataset(my_file, mode='r')
lons = fh.variables['lon'][:]
lats = fh.variables['lat'][:]
V = fh.variables['V'][:]
V_units = fh.variables['V'].units
fh.close()
# create figure
fig = plt.figure(figsize=(20,20))
# create a map
m = Basemap(projection='nplaea',boundinglat=30,lon_0=10,resolution='l',round=True)
#draw parallels, meridians, coastlines, countries, mapboundary
m.drawcoastlines(linewidth=0.5)
m.drawcountries(linewidth=0.5)
#m.drawmapboundary(linewidth=2)
m.drawparallels(np.arange(30,90,20), labels=[1,1,0,0]) #paral in 10 degree, right, left
m.drawmeridians(np.arange(0,360,30), labels=[1,1,1,1]) #merid in 10 degree, bottom
#Plot the data on top of the map
lon,lat = np.meshgrid(lons,lats)
x,y = m(lon,lat)
cs = m.pcolor(x,y,np.squeeze(V),cmap=plt.cm.RdBu_r)
plt.title("", fontsize=25, verticalalignment='baseline')
plt.savefig("/home/Era-Interim/1.png")
As a result, I received a map (you can find in my dropbox folder) https://www.dropbox.com/sh/nvy8wcodk9jtat0/AAC-omkPP8_7uINSSXbzImeja?dl=0
On the map, there are white pixels between 358.5 and 0 (360) lon, because I have no data between 358.5 and 0 (360) lon.
The question is: how can I change the size of the grid, regrid it, interpolate data, or something else in order to not have this white sector?

I have found a solution. At the beginning of the script, you must add
from mpl_toolkits.basemap import Basemap, addcyclic
and further
datain, lonsin = addcyclic(np.squeeze(Q), lons)
lons, Q = m.shiftdata(lonsin, datain = np.squeeze(Q), lon_0=180.)
print lons
lon, lat = np.meshgrid(lons, lats)
x,y = m(lon, lat)
cs = m.pcolor(x,y,datain,cmap=plt.cm.RdBu_r)
The difference can be seen in the figures (I still can not post images).
https://www.dropbox.com/sh/nvy8wcodk9jtat0/AAC-omkPP8_7uINSSXbzImeja?dl=0

I think in this case some kind of interpolation techniques can be applied.
Check this out. There was similar problem.
Hope it is useful.

The simple answer is 360 degrees is 0 degrees, so you can copy the 0 degrees data and it should look right. I may be interpreting this wrong though, as I believe that the data is representing the pressure levels at each of the points, not between the two points (i.e. at zero degrees, not between zero degrees and 1.5 degrees).
My interpretation means that, yes, you don't have data between 358.5 and 0, but you also don't have data between 357 and 358.5. This seems more likely than just skipping an area. This would mean that the data from 358.5 should be touching the data from 0 as it is just as far away as 0 is from 1.5 which is touching.
Copying the last bit would grant you the ability to change your m.pcolor call to an imshow call (as in Roman Dryndik's link) and use interpolation to smooth out the graph.

How to separate two regions of latlon data on python

I am currently working with BUFR files with wind data. When I read this file on python I get 4 large vectors, latitude vector, longitude vector, wind_direction vector, and wind_speed vector.
Both wind vectors are masked python arrays because there is non-valid data. This happens because the data comes from a non-geostationary satellite. In fact I successfully generated the following image from this BUFR file to show you the general shape that the data takes.
In this image I have plotted a color field to represent the wind speed, while the arrows obviously represent the wind direction.
Please notice the two bands of actual data. Unfortunately the way I am plotting the data, generates a third band (where the color field is smooth), in-between the actual data bands. This is an artefact of the function pcolormesh. If I could superimpose two `pcolormesh plots, each one representing one of the bands, this problem would disappear.
Unfortunately, I do not know how I could separate the data "regions". I have thought about clustering techniques but do not know how to cluster along latlon data using ANOTHER array (the wind data) as the clustering rule.
This is my current code:
#!/usr/bin/python
import bufr
import numpy as np
import sys
import matplotlib
matplotlib.use('Agg')
from matplotlib import pyplot as plt
from matplotlib import mlab
WIND_DIR_INDEX = 97
WIND_SPEED_INDEX = 96
bfrfile = sys.argv[1]
print bfrfile
bfr = bufr.BUFRFile(bfrfile)
lon = []
lat = []
wind_d = []
wind_s = []
for record in bfr:
for entry in record:
if entry.index == WIND_DIR_INDEX:
wind_d.append(entry.data)
if entry.index == WIND_SPEED_INDEX:
wind_s.append(entry.data)
if entry.name.find("LONGITUDE") == 0:
lon.append(entry.data)
if entry.name.find("LATITUDE") == 0:
lat.append(entry.data)
lons = np.concatenate(lon)
lats = np.concatenate(lat)
winds_d = np.concatenate(wind_d)
winds_s = np.concatenate(wind_s)
winds_d = np.ma.masked_greater(winds_d,1.0e+6)
winds_s = np.ma.masked_greater(winds_s,1.0e+6)
windu = np.cos((winds_d-180)*(np.pi/180))
windv = np.sin((winds_d-180)*(np.pi/180))
# Data interpolation for pcolormesh (needs gridded data)
xi = np.linspace(lons.min(),lons.max(),lons.size/10)
yi = np.linspace(lats.min(),lats.max(),lats.size/10)
Z = mlab.griddata(lons,lats,winds_s,xi,yi)
X,Y = np.meshgrid(xi,yi)
mydpi = 96
fig = plt.figure(frameon=True)
fig.set_size_inches(1600/mydpi,1200/mydpi)
ax = plt.Axes(fig,[0,0,1,1])
#ax.set_axis_off()
fig.add_axes(ax)
plt.hold(True);
plt.quiver(lons[::5],lats[::5],windu[::5],windv[::5],linewidths=0)
for method in (ax.set_xticks,ax.set_xticklabels,ax.set_yticks,ax.set_yticklabels):
method([])
fig.savefig('/home/cendas/bin/python/bufr_ascat.png',bbox_inches=0,dpi=5*mydpi)
mydpi = 96
fig = plt.figure(frameon=True)
fig.set_size_inches(1600/mydpi,1200/mydpi)
ax = plt.Axes(fig,[0,0,1,1])
#ax.set_axis_off()
fig.add_axes(ax)
plt.hold(True);
try:
plt.pcolormesh(X,Y,Z,alpha=None)
plt.clim(0,10)
except ValueError:
pass
print "Warning: Empty data array."
for method in (ax.set_xticks,ax.set_xticklabels,ax.set_yticks,ax.set_yticklabels):
method([])
fig.savefig('/home/cendas/bin/python/bufr_ascat_color.png',bbox_inches=0,dpi=5*mydpi)
I then usually follow this python code with the following terminal commands to combine the images:
convert bufr_ascat.png -transparent white bufr_ascat.png
convert bufr_ascat_color.png -transparent white bufr_ascat_color.png
composite bufr_ascat.png bufr_ascat_color.png bufrascat.png

Don't abuse clustering for this.
What you need is a simple selection / filtering; not a structure discovery process.
Choose the mean of the masked data. All non-masked data left of that mean is the left part, all non-masked data on the right is the other?
Clustering is the wrong tool for this task.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Spatial merge/combine of multiple netcdf with xarray or satpy - python

Related

Issues with Scipy interpolate griddata

Plotting Pandas Dataframe data using Folium GeoJson

Missing axis values in plotting of NetCDF variable in group with xarray

Plotting netCDF data with Python: How to change grid?

How to separate two regions of latlon data on python

Categories

Resources