I have a 2D array of satellite data, and two corresponding 2D arrays giving the latitude and longitude of each pixel.
The data array is a masked array.
When I plot it up using pcolormesh, it looks like this:
m.pcolormesh(lon, lat, data)
I am attempting to grid this data on to a 0.25x0.25 deg grid.
lonGrid = arange(0, 360, 0.25)
latGrid = arange(-90, 90, 0.25)
dataGridded = griddata(lon.ravel(),lat.ravel(),data.ravel(),latGrid,lonGrid, interp='linear')
m.pcolormesh(lonGrid, latGrid, dataGridded)
However, the resulting plot comes out as this:
It seems like this error has something to do with pcolormesh filling in the space between masked values. But I am unsure how to fix this.
Thanks
EDIT:
I was able to get this to work with the scipy version of griddata, but it's much slower and the syntax is clunkier. I would still appreciate some help getting the mpl(?) version above to work.
from scipy.interpolate import griddata as griddata2
lonGrid,latGrid = meshgrid(lonGrid,latGrid)
dataGrid = griddata2((lon.ravel(),lat.ravel()),data.ravel(),(lonGrid,latGrid), method = 'linear')
dataGrid = ma.masked_where((dataGrid < 0) | isnan(dataGrid), dataGrid)
m.pcolormesh(lonGrid, latGrid, dataGrid)
Here are a couple initial troubleshooting ideas.
What version of NumPy are you using? In 1.09 or earlier, .ravel() will not return a masked array when given a masked array. See here.
The data array "wind" became "data". Is "data" truly masked? What happened between the two? Some more code would be useful.
dataGridded = griddata(lon.ravel(),lat.ravel(),XXXX.ravel(),latGrid,lonGrid, interp='linear')
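If the mask is being lost somewhere along the way, one thing worth trying is to drop the masked pixels explicitly before interpolating, so that no masked value ever reaches griddata. Below is a minimal sketch using scipy.interpolate.griddata on synthetic data. The lon, lat, and data arrays are made-up stand-ins rather than the original satellite swath, and the grid extent is chosen only to fit that fake data.

```python
import numpy as np
from scipy.interpolate import griddata

# Synthetic stand-ins for the swath (hypothetical values, not the real data).
rng = np.random.default_rng(0)
lon = rng.uniform(10.0, 20.0, size=(30, 40))
lat = rng.uniform(40.0, 50.0, size=(30, 40))
data = np.ma.masked_array(lon + lat, mask=(lon < 12.0))

# Keep only the unmasked pixels so the interpolation never sees masked values.
valid = ~np.ma.getmaskarray(data)
points = np.column_stack((lon[valid], lat[valid]))
values = data.compressed()  # same C-order as lon[valid] / lat[valid]

# Regular 0.25 x 0.25 degree target grid covering the synthetic region.
lonGrid, latGrid = np.meshgrid(np.arange(10.0, 20.0, 0.25),
                               np.arange(40.0, 50.0, 0.25))

# Points outside the convex hull of the valid pixels come back as NaN;
# mask them so pcolormesh leaves them blank.
dataGrid = griddata(points, values, (lonGrid, latGrid), method="linear")
dataGrid = np.ma.masked_invalid(dataGrid)
```

Note that linear interpolation still fills holes that lie inside the convex hull of the valid pixels, so large masked gaps in the middle of a swath may need an extra distance-based mask.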
I have a netCDF file with a spatial resolution of 0.05° and I want to regrid it to a spatial resolution of 0.01°, matching another netCDF file. I tried using scipy.interpolate.griddata, but I am not really getting there; I think I am missing something.
original_dataset = xr.open_dataset('to_regrid.nc')
target_dataset= xr.open_dataset('SSTA_L4_MED_0_1dg_2022-01-18.nc')
According to the scipy.interpolate.griddata documentation, I need to construct my interpolation pipeline as follows:
grid = griddata(points, values, (grid_x_new, grid_y_new),
method='nearest')
So in my case, I assume it would be as follows:
#Saving in variables the old and new grids
grid_x_new = target_dataset['lon']
grid_y_new = target_dataset['lat']
grid_x_old = original_dataset ['lon']
grid_y_old = original_dataset ['lat']
points = (grid_x_old,grid_y_old)
values = original_dataset['analysed_sst'] #My variable in the netcdf is the sea surface temp.
Now, when I run griddata:
from scipy.interpolate import griddata
grid = griddata(points, values, (grid_x_new, grid_y_new),method='nearest')
I am getting the following error:
ValueError: shape mismatch: objects cannot be broadcast to a single shape
I assume it has something to do with the lat/lon array shapes. I am quite new to netCDF and don't really know what the issue could be. Any help would be much appreciated!
In your original code, grid_x_old and grid_y_old need one entry for each coordinate pair in the dataset, not just the 1D axis values. Something like the following should work:
import xarray as xr
from scipy.interpolate import griddata
original_dataset = xr.open_dataset('to_regrid.nc')
target_dataset= xr.open_dataset('SSTA_L4_MED_0_1dg_2022-01-18.nc')
#Saving in variables the old and new grids
grid_x_old = original_dataset.to_dataframe().reset_index().loc[:,["lat", "lon"]].lon
grid_y_old = original_dataset.to_dataframe().reset_index().loc[:,["lat", "lon"]].lat
grid_x_new = target_dataset.to_dataframe().reset_index().loc[:,["lat", "lon"]].lon
grid_y_new = target_dataset.to_dataframe().reset_index().loc[:,["lat", "lon"]].lat
values = original_dataset.to_dataframe().reset_index().loc[:,["lat", "lon", "analysed_sst"]].analysed_sst
points = (grid_x_old,grid_y_old)
grid = griddata(points, values, (grid_x_new, grid_y_new),method='nearest')
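The shape mismatch can be reproduced without any netCDF files: griddata wants one (lon, lat) pair per value, while the 1D coordinate axes have different lengths. Here is a minimal sketch with synthetic arrays that expands the axes with np.meshgrid instead of going through a DataFrame. The axis ranges, the sst field, and the assumption that the variable is 2D with dimensions (lat, lon) are all illustrative, not taken from the original files.

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical coarse grid (0.05 degree spacing) standing in for the original dataset.
lon_old = np.linspace(0.0, 1.0, 21)
lat_old = np.linspace(30.0, 30.5, 11)
lon2d_old, lat2d_old = np.meshgrid(lon_old, lat_old)   # both have shape (11, 21)
sst = 2.0 * lon2d_old + lat2d_old                      # fake field, dims (lat, lon)

# One (lon, lat) row per data value, as griddata expects.
points = np.column_stack((lon2d_old.ravel(), lat2d_old.ravel()))
values = sst.ravel()

# Hypothetical fine grid (0.01 degree spacing) standing in for the target dataset.
lon_new = np.linspace(0.0, 1.0, 101)
lat_new = np.linspace(30.0, 30.5, 51)
lon2d_new, lat2d_new = np.meshgrid(lon_new, lat_new)

regridded = griddata(points, values, (lon2d_new, lat2d_new), method="nearest")
print(regridded.shape)  # (51, 101), i.e. (lat, lon) on the new grid
```

With real data, the 1D axes would come from the datasets' lon and lat coordinates, and a variable with a time dimension would first need a single time step selected.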
I recommend using xesmf for regridding xarray datasets. The code below will regrid your dataset:
import xarray as xr
import xesmf as xe
original_dataset = xr.open_dataset('to_regrid.nc')
target_dataset= xr.open_dataset('SSTA_L4_MED_0_1dg_2022-01-18.nc')
regridder = xe.Regridder(original_dataset, target_dataset, "bilinear")
dr_out = regridder(original_dataset)
I have a 1D array containing data that looks like this (48000 points), spaced by one wavenumber (R = 1 cm-1). The x and y arrays both have shape (48000, 1), and I want to rebin both in the same way.
xarr=[50000,49999,49998,....,2000]
yarr=[0.1,0.02,0.8,0.5....0.1]
I wish to decrease the resolution, let's say to R = 10 cm-1, so I want ten times fewer points (4800) spanning 50000 to 2000, and the same for the y array.
How should I start?
I tried taking the natural log of the wavelength scale, then rebinning onto a new log-wavelength scale generated using np.linspace():
xi=np.log(xarr[0])
xf=np.log(xarr[-1])
xnew=np.linspace(xi, xf, num=4800)
Now I need to recast the y array onto this xnew array. I am thinking of using a 2D rebin, but I am not sure how to use it. Any suggestions?
import numpy as np
arr1=[2,3,65,3,5...,32,2]
series=np.array(arr1)
print(series[:3])
I tried this and it seems to work!
import numpy as np
import scipy.stats as stats
#irregular x and y arrays
yirr= np.random.randint(1,101,10)
xirr=np.arange(10)
nbins=5
bin_means, bin_edges, binnumber = stats.binned_statistic(xirr,yirr, 'mean', bins=nbins)
yreg=bin_means # <== regularized yarr
xi=xirr[0]
xf=xirr[-1]
xreg=np.linspace(xi, xf, num=nbins)
print('yreg',yreg)
print('xreg',xreg) # <== regularized xarr
If anyone can find an improvement or see a problem with this, please post!
I'll try it on my logarithmically scaled data now
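One possible improvement: np.linspace(xi, xf, num=nbins) places the new x values at the ends of the range, whereas each mean actually belongs to the middle of its bin. Below is a sketch that takes the bin centres from bin_edges instead, applied to a synthetic spectrum with the sizes from the question. The x and y values are made up, and the arrays are 1D (an array of shape (48000, 1) would need .ravel() first).

```python
import numpy as np
from scipy import stats

# Synthetic spectrum: 48000 points spaced by 1 cm-1, descending like the question's xarr.
x = np.arange(50000, 2000, -1, dtype=float)
y = np.sin(x / 500.0)                      # made-up intensities

factor = 10                                # R = 1 cm-1 -> roughly R = 10 cm-1
nbins = x.size // factor                   # 4800 bins

# Mean of y in each x bin; results are ordered by ascending x.
bin_means, bin_edges, _ = stats.binned_statistic(x, y, statistic="mean", bins=nbins)

# Each mean belongs at the centre of its bin, not on a linspace over the end points.
x_centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])

# Alternative for strictly regular input: average consecutive blocks of 10 points.
# This keeps the original (descending) order.
x_block = x.reshape(-1, factor).mean(axis=1)
y_block = y.reshape(-1, factor).mean(axis=1)
```

The block-average version only works when the number of points is an exact multiple of the factor; binned_statistic is the more general choice, for example on a logarithmic x scale.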
I have a high-dimensional dataset in the form of a numpy matrix. I want to plot my data using parallel axes. I also want to be able to add some highlighted data to the plot later, create a legend, and name the axes manually. Example code would be great since I am a Python beginner.
I have already tried using pandas, but that didn't work the way I wanted since I couldn't find an easy way to work with a numpy matrix.
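For reference, a parallel-axes plot can be drawn directly from a numpy matrix with plain matplotlib, without pandas. The following is only a sketch under assumptions: the data matrix X, the choice of the first three rows as "highlighted", and the axis names are all invented for illustration. Each column is rescaled to [0, 1] so the axes share one vertical scale.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import numpy as np

# Hypothetical data: 50 samples with 5 dimensions, plus 3 rows to highlight.
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 5))
highlight = X[:3]
axis_names = ["dim A", "dim B", "dim C", "dim D", "dim E"]  # manual axis names

# Rescale every column to [0, 1] using the range of the full dataset.
col_min, col_max = X.min(axis=0), X.max(axis=0)

def scale(M):
    return (M - col_min) / (col_max - col_min)

fig, ax = plt.subplots(figsize=(8, 4))
positions = np.arange(X.shape[1])

# One polyline per sample: x = axis position, y = scaled value on that axis.
ax.plot(positions, scale(X).T, color="0.7", linewidth=0.8)
ax.plot(positions, scale(highlight).T, color="crimson", linewidth=2.0)

# Draw the vertical parallel axes and label them manually.
for p in positions:
    ax.axvline(p, color="black", linewidth=0.5)
ax.set_xticks(positions)
ax.set_xticklabels(axis_names)
ax.set_yticks([])

# Proxy handles give one legend entry per group instead of one per line.
ax.legend(handles=[Line2D([], [], color="0.7", label="all data"),
                   Line2D([], [], color="crimson", label="highlighted")])
```

Calling plt.show() (with an interactive backend) or fig.savefig(...) then displays or stores the figure.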
tsne = TSNE(n_components=2, perplexity=60, learning_rate=200)
X_embedding = tsne.fit_transform(x)
If x is a sparse matrix, you need to pass it as X_embedding = tsne.fit_transform(x.toarray()); .toarray() converts the sparse matrix into a dense matrix.
for_tsne = np.hstack((X_embedding, y.values.reshape(-1,1)))
for_tsne_df = pd.DataFrame(data=for_tsne, columns=['Dimension_x','Dimension_y','Score'])
colors = {0:'red', 1:'green'}
plt.scatter(for_tsne_df['Dimension_x'], for_tsne_df['Dimension_y'], c=for_tsne_df['Score'].apply(lambda x: colors[x]))
sns.FacetGrid(for_tsne_df, hue='Score',height=6).map(plt.scatter,"Dimension_x","Dimension_y").add_legend()
plt.title('TSNE with TFIDF encoding of project_title feature with perplexity=60')
plt.show()
I need to regrid data on an irregular grid (Lambert conical) to a regular grid. I think pyresample is my best bet. In fact, my original lat, lon arrays are not 1D (which seems to be needed to use basemap.interp or scipy.interpolate.griddata).
I found this SO answer helpful. However, I get empty interpolated data. I think it has to do with my choice of radius of influence and with the fact that my data are wrapped (??).
This is my code:
import numpy as np
from matplotlib import pyplot as plt
import netCDF4
%matplotlib inline
url = "http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR/Dailies/monolevel/hlcy.2009.nc"
SRHtemp = netCDF4.Dataset(url).variables['hlcy'][0,::]
Y_n = netCDF4.Dataset(url).variables['y'][:]
X_n = netCDF4.Dataset(url).variables['x'][:]
T_n = netCDF4.Dataset(url).variables['time'][:]
lat_n = netCDF4.Dataset(url).variables['lat'][:]
lon_n = netCDF4.Dataset(url).variables['lon'][:]
lat_n and lon_n are irregular; they are the latitude and longitude corresponding to the projected coordinates x, y.
Because of the way lon_n is, I added:
lon_n[lon_n<0] = lon_n[lon_n<0]+360
so that now if I plot them they look nice and ok:
Then I create my new set of regular coordinates:
XI = np.arange(148,360)
YI = np.arange(0,87)
XI, YI = np.meshgrid(XI,YI)
Following the answer above I wrote the following code:
from pyresample.geometry import SwathDefinition
from pyresample.kd_tree import resample_nearest
def_a = SwathDefinition(lons=XI, lats=YI)
def_b = SwathDefinition(lons=lon_n, lats=lat_n)
interp_dat = resample_nearest(def_b,SRHtemp,def_a,radius_of_influence = 70000,fill_value = -9.96921e+36)
The resolution of the data is about 30 km, so I set the radius of influence to 70 km. The fill_value is the one from the data, but of course I could just use zero or NaN.
However, I get an empty array.
What am I doing wrong? Also, if there is another way of doing it, I would be interested to know. The pyresample documentation is a bit thin, and I need a bit more help.
I did find this answer suggesting to use another griddata function:
import matplotlib.mlab as ml
resampled_data = ml.griddata(lon_n.ravel(), lat_n.ravel(),SRHtemp.ravel(),XI,YI,interp = "linear")
and it seems to be ok:
But I would like to understand more about pyresample, since it seems so powerful.
The problem is that XI and YI are integers, not floats. You can fix this by simply doing
XI = np.arange(148,360.)
YI = np.arange(0,87.)
XI, YI = np.meshgrid(XI,YI)
The inability to handle integer datatypes is an undocumented, unintuitive, and possibly buggy behavior from pyresample.
A few more notes on your coding style:
It's not necessary to overwrite the XI and YI variables; you don't gain much by this.
You should load the netCDF dataset just once and then access the variables via that object.
I have gridded data over the contiguous United States and I'm trying to select a chunk of it over a specific area.
import numpy as np
from netCDF4 import Dataset
import matplotlib.pyplot as plt
filename = '/Users/me/myfile.nc'
full_data = Dataset(filename,'r')
latitudes = full_data.variables['latitude'][0,:,:]
longitudes = full_data.variables['longitude'][0,:,:]
temperature = full_data.variables['temperature'][0,:,:]
All three variables are 2-dimensional matrices of shape (337,451). I'm trying to do the following to get a sub-selection of the data over a specific region.
index = (latitudes>=44.0)&(latitudes<=45.0)&(longitudes>=-91.0)&(longitudes<=-89.0)
temp_subset = temperature[index]
lat_subset = latitudes[index]
lon_subset = longitudes[index]
I would expect all three of these variables to be 2-dimensional, but instead they all return a flattened array with a shape of (102,). I've tried another approach:
index2 = np.where((latitudes>=44.0)&(latitudes<=45.0)&(longitudes>=-91.0)&(longitudes<=-89.0))
temp = temperature[index2[0],:]
temp2 = temp[:,index2[1]]
plt.imshow(temp2,origin='lower')
plt.colorbar()
But my data looks quite incorrect. Is there a better way to get a 2D subset grid from a larger grid?
Edub,
I suggest looking at NumPy's indexing documentation, specifically http://docs.scipy.org/doc/numpy-1.10.1/user/basics.indexing.html#other-indexing-options . Currently, you are providing two dimensions for indexing but no slicing information, which is why you only receive one-dimensional results. I hope this proves useful!
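To make the slicing idea concrete, here is a minimal sketch on a synthetic grid. The latitudes, longitudes, and temperature arrays are invented stand-ins for the netCDF variables. The boolean mask is reduced to the smallest row and column range that contains the region, and that rectangle is then cut out with ordinary slices, which keeps the result 2D.

```python
import numpy as np

# Synthetic 2D coordinate and data arrays (hypothetical stand-ins for the netCDF variables).
lat_1d = np.linspace(40.0, 50.0, 81)      # 0.125 degree spacing
lon_1d = np.linspace(-95.0, -85.0, 81)
longitudes, latitudes = np.meshgrid(lon_1d, lat_1d)
temperature = latitudes + longitudes      # fake field

# Boolean mask of the region of interest (same condition as in the question).
inside = ((latitudes >= 44.0) & (latitudes <= 45.0) &
          (longitudes >= -91.0) & (longitudes <= -89.0))

# Smallest row/column ranges that contain every selected point.
rows = np.where(inside.any(axis=1))[0]
cols = np.where(inside.any(axis=0))[0]
box = (slice(rows[0], rows[-1] + 1), slice(cols[0], cols[-1] + 1))

# Slicing keeps the 2D structure, unlike boolean indexing.
lat_subset = latitudes[box]
lon_subset = longitudes[box]

# On a curvilinear grid the box can include points outside the region; mask those.
temp_subset = np.ma.masked_where(~inside[box], temperature[box])
print(temp_subset.shape)
```

On this regular synthetic grid nothing ends up masked; on a projected grid such as the one in the question, the corners of the box would typically be masked.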