I am a novice in Python, so I hope I am asking this correctly.
I have a huge set of data that I would like to interpolate to one-second intervals, filling in the gaps with the appropriate latitude and longitude:
Lat     Long   Time
-87.10  30.42  16:38:49
...
-87.09  30.40  16:39:22
...
-87.08  30.40  16:39:30
So I would like to generate a new latitude and longitude every second.
I have already plotted the existing latitude and longitude points and would like to fill the gaps between them with the interpolated data, possibly as points.
If linear interpolation is good enough for you, you can use the numpy.interp function. It takes the time array at which you want interpolated values, along with the time and longitude/latitude data points read from the input file (the time values must be increasing, so you may need to sort the data first).
To read the data from the file, you can use the numpy.loadtxt function, adding a converter to transform the time into an increasing number:
import numpy as np
from datetime import datetime

# 'HH:MM:SS' -> seconds since midnight (matplotlib.dates.strpdate2num, used in
# older answers, was removed from recent matplotlib, so use a small converter)
def time2sec(s):
    t = datetime.strptime(s if isinstance(s, str) else s.decode(), '%H:%M:%S')
    return t.hour * 3600 + t.minute * 60 + t.second

lon, lat, time = np.loadtxt('data.txt', skiprows=1,
                            converters={2: time2sec}, unpack=True)
Then you can interpolate the longitude and latitude values using the interp function. The last argument to the linspace function gives the number of points in the interpolated data; since the time values here are in seconds, you can use np.arange(time[0], time[-1] + 1) instead if you want exactly one point per second.
interp_time = np.linspace(time[0], time[-1], 100)
interp_lon = np.interp(interp_time, time, lon)
interp_lat = np.interp(interp_time, time, lat)
For something more sophisticated than linear interpolation, there are several facilities in scipy.interpolate.
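For example, a minimal sketch of cubic interpolation with scipy.interpolate.CubicSpline, reusing the time, lon, lat, and interp_time arrays from above (this assumes the time values are strictly increasing):
from scipy.interpolate import CubicSpline

# build one cubic spline per coordinate, then evaluate on the dense time grid
lon_spline = CubicSpline(time, lon)
lat_spline = CubicSpline(time, lat)
smooth_lon = lon_spline(interp_time)
smooth_lat = lat_spline(interp_time)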
I would like to take a temperature variable from a netCDF file in Python and average over all of the satellite's scans.
The temperature variable is given as:
tdisk = file.variables['tdisk'][:,:,:] # Disk Temp(nscans,nlons,nlats)
The shape of the tdisk array is (68, 52, 46). The satellite makes 68 scans per day. The longitude and latitude variables are given as:
lats = file.variables['latitude'][:,:] # Latitude(nlons,nlats)
lons = file.variables['longitude'][:,:] # Longitude(nlons,nlats)
These have shape (52, 46). I would like to average all the nscans of temperature together to get a daily mean, so the temperature array becomes (52, 46). I've seen ways to stack and concatenate the arrays, but I would like a mean value. Eventually I am looking to make a contour plot with x=longitude, y=latitude, and z=temperature.
Is this possible to do? Thanks for any help.
If you are using Xarray, you can do this using DataArray.mean:
import xarray as xr
# open netcdf file
ds = xr.open_dataset('file.nc')
# take the mean of the tdisk variable
da = ds['tdisk'].mean(dim='nscans')
# make a contour plot
da.plot.contour(x='longitude', y='latitude')
Based on the question, you seem to want a temporal mean over all the scans rather than a daily mean, since you only have one day of data. So the following will probably work:
ds.mean(dim='nscans')
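If you prefer to stay with the plain netCDF4 interface from the question, a minimal sketch of the same averaging with numpy (file and variable names taken from the question) could look like this:
import matplotlib.pyplot as plt
from netCDF4 import Dataset

file = Dataset('file.nc')
tdisk = file.variables['tdisk'][:, :, :]  # Disk Temp (nscans, nlons, nlats)
lats = file.variables['latitude'][:, :]   # Latitude (nlons, nlats)
lons = file.variables['longitude'][:, :]  # Longitude (nlons, nlats)

tmean = tdisk.mean(axis=0)  # average over the 68 scans -> shape (52, 46)

# contour plot with x=longitude, y=latitude, z=mean temperature
plt.contourf(lons, lats, tmean)
plt.colorbar(label='mean disk temperature')
plt.show()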
I have a set of data with each point having 5 parameters ([latitude, longitude, time, wind speed, bearing]), and I want to interpolate this data.
I have implemented scipy's NearestNDInterpolator based on what I read in the documentation, but the values at points other than the provided data points do not seem to be correct.
Implementation
interp = scipy.interpolate.NearestNDInterpolator(Windspeed_Data_Array[:, 0:3], Windspeed_Data_Array[:, 3:5])
Where "Windspeed_Data_Array[:,0:3]" is [latitude, longitude, time] and "Windpseed_Data_Array[:,3:5]" is [windspeed, bearing].
For example, I set the test coordinates to [-37.7276, 144.9066, 1483180200].
The raw data is shown below:
|latitude|longitude|time |windspeed|bearing|
|-37.7276|144.9066 |1483174800|16.6 |193 |
|-37.7276|144.9066 |1483185600|14.8 |184 |
I thought the output at the test coordinates should lie between the two data points shown; however, when I run the code:
test = interp(test_coords)
The output is windspeed = 16.6 and bearing = 193, which seems wrong.
That's the nature of the chosen interpolation method.
Nearest-neighbor interpolation assigns to the dependent variables the value found at the single nearest sample point; this is exactly the behaviour illustrated in the NearestNDInterpolator documentation.
If you want a weighted average of multiple close neighbors, I would suggest you take a look at LinearNDInterpolator.
Note: don't be seduced by the word "Nearest".
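To make the difference concrete, here is a small 1D sketch (time only, using the two windspeed samples from the question) comparing nearest and linear interpolation with scipy.interpolate.interp1d:
import numpy as np
from scipy.interpolate import interp1d

t = np.array([1483174800.0, 1483185600.0])  # the two sample times
speed = np.array([16.6, 14.8])              # the two windspeed values

nearest = interp1d(t, speed, kind='nearest')
linear = interp1d(t, speed, kind='linear')

t_query = 1483180200.0   # exactly halfway between the two samples
print(nearest(t_query))  # 16.6 -- snaps to a single sample, as observed
print(linear(t_query))   # 15.7 -- the weighted average of the two samples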
So I am pretty new to programming, currently doing some research on working with netCDF .nc files in Python. I have the following code, which I am sure will not work. The objective here is to plot a simple line graph of the 10m u-component of wind over time.
The problem, I think, is that the 10m u-component of wind has a 4D shape of (time=840, expver=2, latitude=19, longitude=27), while time is only 1D (time=840).
Any replies or ideas will be much appreciated! The code is as shown:
from netCDF4 import Dataset
import matplotlib.pyplot as plt
import numpy as np
nc = Dataset(r'C:\WAIG\Python\ERA5_py01\Downloaded_nc_file/test_era5.nc','r')
for i in nc.variables:
    print(i)
lat = nc.variables['latitude'][:]
lon = nc.variables['longitude'][:]
time = nc.variables['time'][:]
u = nc.variables['u10'][:]
plt.plot(np.squeeze(u), np.squeeze(time))
plt.show()
Right, you have winds that represent the 10m wind at every location of the model grid (lat, lon), as well as being dimensioned on expver (not sure what that last one is). You need to select a subset of u. For example, let's pick expver index 1, and lat/lon indexes of lat=8, lon=12 (not sure where those are going to be):
expver_index = 1
lat_index = 8
lon_index = 12
# ':' below means "full slice": take everything along that dimension
plt.plot(time, u[:, expver_index, lat_index, lon_index])
plt.title(f'U at latitude {lat[lat_index]}, longitude {lon[lon_index]}')
plt.show()
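If you would rather pick the grid cell closest to a real location than hard-code the indexes, a small sketch with numpy (the target coordinates below are placeholders, not from the question):
import numpy as np

target_lat, target_lon = -37.8, 145.0  # assumed target location
lat_index = int(np.abs(lat - target_lat).argmin())  # index of nearest latitude
lon_index = int(np.abs(lon - target_lon).argmin())  # index of nearest longitude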
Have you tried using xarray?
I think it will be easier for you to read the netCDF file and plot it using matplotlib.
This is straightforward:
import xarray as xr
ds = xr.open_dataset(r'C:\WAIG\Python\ERA5_py01\Downloaded_nc_file/test_era5.nc')
This line will plot the time series of the horizontal mean of u10:
ds['u10'].mean(['longitude','latitude']).plot()
You can also select by value or index in a specific dimension using the sel and isel methods.
This line selects latitude index 10 and longitude index 5 and plots the result; here I am interested in specific indexes for latitude and longitude, not in the real units:
ds['u10'].isel(latitude=10,longitude=5).plot()
This line selects the latitude and longitude values nearest to the given ones and plots the result; here I am interested in latitude and longitude in real units:
ds['u10'].sel(latitude=-15,longitude=40,method='nearest').plot()
See their documentation to learn more about xarray.
I hope this solution works better for your case and also introduces you to this great tool. Please let me know if you still need help with this.
I am a big fan of MetPy and had a look at their interpolation functions (https://unidata.github.io/MetPy/latest/api/generated/metpy.interpolate.html) but could not find what I was looking for.
I am looking for a function to interpolate a gridded 2D (lon and lat) or 3D (lon, lat and vertical levels) climate data field to a specific geographic location (lat/lon).
The function would take five arguments: a 2D/3D data variable and its associated latitude and longitude variables, as well as the two desired latitude and longitude coordinate values. Returned is either a single value (for a 2D field) or a vertical profile (for a 3D field).
I am basically looking for an equivalent to the old Basemap function bm.interp(). Cartopy does not have an equivalent, and the CDO (Climate Data Operators) operator 'remapbil,lon=/lat=' does the same thing but works directly on netCDF files from the command line; I'm looking for a Python solution.
I think such a function would be a useful addition to the MetPy library as it allows for comparing gridded data (e.g., model or satellite data) with point observations such as from weather stations or radiosonde profiles (treated as just a vertical profile here).
Can you point me in the right direction?
I think what you're looking for already exists in scipy.interpolate (scipy is one of MetPy's dependencies). Here we can use interpn to interpolate linearly in n dimensions:
import numpy as np
from scipy.interpolate import interpn
# Array of synthetic grid to interpolate--ordered z,y,x
a = np.arange(24).reshape(2, 3, 4)
# Locations of grid points along each dimension
z = np.array([1.5, 2.5])
y = np.array([-1., 0., 1.])
x = np.array([-3.5, -1, 1, 3.5])
print(interpn((z, y, x), a, (2., 0.5, 2.)))  # -> [14.4]
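Applied to the use case in the question, here is a hedged sketch that pulls a vertical profile out of a synthetic 3D (level, lat, lon) field at one location (all names and values below are made up for illustration):
import numpy as np
from scipy.interpolate import interpn

levels = np.array([700., 850., 1000.])  # hypothetical pressure levels
lats = np.linspace(-10., 10., 5)
lons = np.linspace(100., 120., 6)
field = np.random.rand(3, 5, 6)         # synthetic (level, lat, lon) data

# one query point per level at (2.5 N, 107.3 E) -> a vertical profile
points = np.column_stack([levels, np.full(3, 2.5), np.full(3, 107.3)])
profile = interpn((levels, lats, lons), field, points)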
This can be done easily with my nctoolkit package (https://nctoolkit.readthedocs.io/en/latest/). It uses CDO as a backend and defaults to bilinear interpolation. The following would regrid a .nc file to a single grid point and then convert it to an xarray dataset.
import nctoolkit as nc
import pandas as pd
data = nc.open_data("example.nc")
grid = pd.DataFrame({"lon":[0], "lat":[50]})
data.regrid(grid)
ds = data.to_xarray()
To add one more solution, if you're already using multidimensional netCDF files and want a Python solution: check out xarray's interpolation tools. They support multidimensional, label-based interpolation with usage similar to xarray's indexing interface. They are built on top of the same scipy.interpolate mentioned above, and xarray is also a MetPy dependency.
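A minimal sketch of that xarray route (the file and variable names here are assumptions, not from the question):
import xarray as xr

ds = xr.open_dataset('model.nc')  # hypothetical file with latitude/longitude coords
# a single value from a 2D field, or a vertical profile from a 3D field
point = ds['temperature'].interp(latitude=50.0, longitude=8.5)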
TL;DR: Question: Is there a fast way to interpolate a scattered 2D dataset at specific coordinates?
And if so, could someone provide an example with the provided sample data and the variables used in "Current Solution" (as I'm apparently too stupid to implement it myself)?
Problem:
I need to interpolate (and if possible also extrapolate) a DataFrame (size = (34, 18)) of scattered data at specific coordinate points. The DataFrame always stays the same.
The interpolation needs to be fast, as it is done more than 10,000 times in a loop.
The coordinates at which to interpolate are not known in advance, as they change on every loop iteration.
Current Solution:
def Interpolation(a, b):
    # import external modules
    import pandas as pd
    from scipy import interpolate

    # read .xlsx file into DataFrame
    file = pd.ExcelFile(file_path)
    mr_df = file.parse('Model_References')
    matrix = mr_df.set_index(mr_df.columns[0])

    # interpolation at specific coordinates
    matrix = matrix.stack().reset_index().values
    value = interpolate.griddata(matrix[:, 0:2], matrix[:, 2], (a, b), method='cubic')
    return value
This method is not acceptable for long-term use, as the two lines of code under # interpolation at specific coordinates alone account for more than 95% of the execution time.
My Ideas:
scipy.interpolate.Rbf seems like the best solution if the data needs to be interpolated and extrapolated, but to my understanding it only creates a finer mesh of the existing data and cannot output an interpolated value at specific coordinates
creating a smaller 4x4 matrix of the area around the specific coordinates (a, b) would maybe decrease the execution time per loop, but I struggle with how to use griddata with the smaller matrix. I created a 5x5 matrix with the first row and column being the indexes and the other 4x4 entries being the data, with the specific coordinates in the middle.
But I get a TypeError: list indices must be integers or slices, not tuple, which I do not understand, as I did not change anything else.
Sample Data:
|       |0.0  |0.1  |0.2  |0.3  |
|0.0    |-407 |-351 |-294 |-235 |
|0.0001 |-333 |-285 |-236 |-185 |
|0.0002 |-293 |-251 |-206 |-161 |
|0.00021|-280 |-239 |-196 |-151 |
Thanks to @Jdog's comment I was able to figure it out:
The creation of a spline once before the loop with scipy.interpolate.RectBivariateSpline and the read-out of specific coordinates with scipy.interpolate.RectBivariateSpline.ev decreased the execution time of the interpolation from 255 s to 289 ms.
def Interpolation(mesh, a, b):
    # interpolation at specific coordinates
    value = mesh.ev(a, b)
    return value

#%%
# import external modules
import pandas as pd
from scipy import interpolate

# read .xlsx file into DataFrame
file = pd.ExcelFile(file_path)
mr_df = file.parse('Model_References')
matrix = mr_df.set_index(mr_df.columns[0])

# build the spline once, before the loop
# (a_index and b_index are the row and column coordinate arrays of the matrix)
mesh = interpolate.RectBivariateSpline(a_index, b_index, matrix)

for iteration in loop:
    value = Interpolation(mesh, a, b)
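As a self-contained check, here is a minimal sketch of the same RectBivariateSpline/ev pattern built directly from the 4x4 sample data in the question (the query point is arbitrary):
import numpy as np
from scipy.interpolate import RectBivariateSpline

rows = np.array([0.0, 0.0001, 0.0002, 0.00021])  # row coordinates (first column)
cols = np.array([0.0, 0.1, 0.2, 0.3])            # column coordinates (header row)
values = np.array([[-407., -351., -294., -235.],
                   [-333., -285., -236., -185.],
                   [-293., -251., -206., -161.],
                   [-280., -239., -196., -151.]])

mesh = RectBivariateSpline(rows, cols, values)  # build the spline once
print(mesh.ev(0.00015, 0.25))                   # one interpolated value at (a, b)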