creating a vector from netCDF data into an array - python

I'm fairly new to Python and have found Stack Overflow to be one of the best resources out there; now I'm hoping someone can help me with what I believe is a fairly basic question.
I'm looking to create a land mask from a list of lats and lons and rainfall data extracted from a netCDF file. I need to get the data from the netCDF file to line up so I can remove rows which have a rainfall value of -9999. (indicating no data because it's over the ocean). I can access the file and I can create a mesh grid, but when it comes to inserting the rainfall data for the final check I'm getting odd shapes and no luck with the logical test. Can someone have a look at this code and let me know what you think?
from netCDF4 import Dataset
import numpy as np
f=Dataset('/Testing/Ensemble_grid/1970_2012_eMAST_ANUClimate_mon_evap_v1m0_197001.nc')
lat = f.variables['latitude'][:]
lon = f.variables['longitude'][:]
rainfall = np.array(f.variables['lwe_thickness_of_precipitation_amount'])
lons, lats = np.meshgrid(lon,lat)
full_ary = np.array((lats,lons))
full_lats_lons = np.swapaxes(full_ary,0,2)
rain_data = np.squeeze(rainfall,axis=(0,))
grid = np.array((full_lats_lons,rain_data))
full_grid = np.expand_dims(grid,axis=1)
full_grid_col = np.swapaxes(full_grid,0,1)
land_grid = np.logical_not(full_grid_col[:,1]==-9999.)

Here is an alternative method that simply creates a new 2D variable, landmask, where each grid cell is either 0 (ocean) or 1 (land). (I like to use 1 and 0 landmasks because you can transform it into a boolean numpy array and do quick land-averages this way.)
import netCDF4
import numpy as np
ncfile = netCDF4.Dataset('/path/to/your/ncfile.nc', 'r')
lat = ncfile.variables['lat'][:]
lon = ncfile.variables['lon'][:]
# Presuming here that rainfall is 2D, if not, just read in the first time step, i.e. [0,:,:]
rain = ncfile.variables['lwe_thickness_of_precipitation_amount'][:,:]
ncfile.close()
nlat, nlon = len(lat), len(lon)
# Populate a 2D landmask array, where 1=land and 0=ocean
landmask = np.zeros([nlat, nlon], dtype='int')
for y in range(nlat):
    for x in range(nlon):
        if rain[y,x] != -9999: # We're at a land point
            landmask[y,x] = 1
# Now you can write out the landmask into a new netCDF file
filename_out = './landmask.nc'
ncfile_out = netCDF4.Dataset(filename_out, 'w')
ncfile_out.createDimension('lat', nlat)
ncfile_out.createDimension('lon', nlon)
lat_out = ncfile_out.createVariable('lat', 'f4', ('lat',))
lon_out = ncfile_out.createVariable('lon', 'f4', ('lon',))
landmask_out = ncfile_out.createVariable('landmask', 'i', ('lat', 'lon',))
setattr(lat_out, 'units', 'degrees_north')
setattr(lon_out, 'units', 'degrees_east')
setattr(landmask_out, 'description', '1=land 0=ocean')
lat_out[:] = lat
lon_out[:] = lon
landmask_out[:,:] = landmask[:,:]
ncfile_out.close()
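As a side note, the nested loop above can be collapsed into a single vectorized comparison, which also gives you the quick land-averages mentioned earlier. A minimal sketch, assuming rain is the 2D array read above with -9999 as its ocean fill value:
import numpy as np
# vectorized equivalent of the loop: True over land, False over ocean
land = (rain != -9999)
landmask = land.astype(int)
# quick land-average of rainfall via boolean indexing
land_mean = rain[land].mean()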

Ian, you need to put a reproducible example up here...
I suspect what you need is something like this;
x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
x.flat

Related

How to use apply_ufunc depending on data dims?

I want to interpolate 3D array air (time, lat, lon) with one 2D array newlat (time, lon) which depends on time and lon.
For loop method
import xarray as xr
import numpy as np
from scipy.interpolate import interp1d
air = (
    xr.tutorial.load_dataset("air_temperature")
    .air.sortby("lat")  # np.interp needs coordinate in ascending order
    .isel(time=slice(4), lon=slice(3))
)  # choose a small subset for convenience
newlat = xr.DataArray(np.random.rand(air.sizes['time'], air.sizes['lon'])*75,
                      dims=['time', 'lon'],
                      coords={'time': air.time, 'lon': air.lon}
                      )
# create empty array to save result
result = np.empty((4, 3))
# loop each dim
for t in range(air.sizes['time']):
    for lon in range(air.sizes['lon']):
        # interpolation relying on time and lon
        f = interp1d(air.lat, air.isel(time=t, lon=lon), kind='linear', fill_value='extrapolate')
        result[t, lon] = f(newlat.isel(time=t, lon=lon))
apply_ufunc method
def interp1d_np(data, x, xi):
    f = interp1d(x, data, kind='linear', fill_value='extrapolate')
    return f(xi)
t_index = 0
lon_index = 0
xr.apply_ufunc(
    interp1d_np,
    air.isel(time=t_index, lon=lon_index),
    air.lat,
    newlat.isel(time=t_index, lon=lon_index),
    input_core_dims=[["lat"], ["lat"], []],
    exclude_dims=set(("lat",)),
    vectorize=True,
)
Note that t_index and lon_index are the same for the inputs air and newlat.
The code above only works for one specific part of air. How to apply it to the whole air DataArray?
Temporary Solution
We can just use the embedded function like this:
air.interp(lat=newlat, kwargs={"fill_value": None})
But I'm still curious about how to use apply_ufunc in this situation, because users may have their own functions instead of this simple interpolation.
You're basically there. Delete the .isel calls and it will work! vectorize=True will automatically loop over the "non core dimensions" i.e. time and lon in this case.
xr.apply_ufunc(
    interp1d_np,
    air,
    air.lat,
    newlat,
    input_core_dims=[["lat"], ["lat"], []],
    exclude_dims=set(("lat",)),
    vectorize=True,
)
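As a quick sanity check (a sketch reusing result from the loop above), the vectorized call should reproduce the loop's values; the output keeps the broadcast non-core dims time and lon:
out = xr.apply_ufunc(interp1d_np, air, air.lat, newlat,
                     input_core_dims=[["lat"], ["lat"], []],
                     exclude_dims=set(("lat",)), vectorize=True)
# transpose in case the broadcast dim order differs from (time, lon)
np.testing.assert_allclose(out.transpose("time", "lon").values, result)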

Is there a simple way of getting an xyz array from xarray dataset?

Is there a simple way of getting an array of xyz values (i.e. an array of 3 cols and nrows = number of pixels) from an xarray dataset? Something like what we get from the rasterToPoints function in R.
I'm opening a netcdf file with values for a certain variable (chl). I'm not able to add images here directly, but here is a screenshot of the output:
[Screenshot: xarray dataset structure]
I need to end up with an array that has this structure:
[[lon1, lat1, val],
[lon1, lat2, val]]
And so on, getting the combination of lon/lat for each point. I'm sorry if I'm missing something really obvious, but I'm new to Python.
The natural format you are probably looking for here is a pandas dataframe, where lon, lat and chl are columns. This can be easily created using xarray's to_dataframe method, as follows.
import xarray as xr
ds = xr.open_dataset("infile.nc")
df = (
    ds
    .to_dataframe()
    .reset_index()
)
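If you specifically want the n-rows-by-3-columns numpy array described in the question rather than a dataframe, you can then select the relevant columns and convert; a small sketch, assuming the coordinate and variable names are lon, lat and chl:
xyz = df[['lon', 'lat', 'chl']].to_numpy()  # shape (n_pixels, 3)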
I can suggest some small pseudo-code:
import numpy as np
lons = ds.variables['lon'].load()
lats = ds.variables['lat'].load()
chl = ds.variables['chl'].load()
xm, ym = np.meshgrid(lons, lats)
dataout = np.concatenate((xm.flatten()[np.newaxis, :],
                          ym.flatten()[np.newaxis, :],
                          chl.flatten()[np.newaxis, :]), axis=0)
It might not work out of the box, but at least one solution could be similar to this.

Connecting points in a specific order

The full code (excluding the path finding algorithm) I am about to describe can be found on Code Review.
I am reading in 10 co-ordinates from a text file in Python. I then pass the latitude and longitude co-ordinates to a function which plots the points as follows.
import csv
import matplotlib.pyplot as plt

def read_two_column_file(file_name):
    with open(file_name, 'r') as f_input:
        csv_input = csv.reader(f_input, delimiter=' ', skipinitialspace=True)
        long = []
        lat = []
        for col in csv_input:
            x = float(col[0].rstrip(','))  # converting to float (strip the trailing comma in the sample input)
            y = float(col[1])
            long.append(x)
            lat.append(y)
    return long, lat

def display_points(long, lat):
    plt.figure()
    plt.gca().set_aspect('equal', adjustable='box')
    plt.ylabel('latitude')
    plt.xlabel('longitude')
    plt.title('longitude vs latitude')
    plt.scatter(lat, long)
    plt.grid(True)
    plt.show()
Sample Input:
35.905333, 14.471970
35.896389, 14.477780
35.901281, 14.518173
35.860491, 14.572245
35.807607, 14.535320
35.832267, 14.455894
35.882414, 14.373217
35.983794, 14.336096
35.974463, 14.351006
35.930951, 14.401137
Plot:
This plots points on a map, and the idea is to find the shortest possible route from a starting point to an end point. Forgetting about the algorithm which does so, let us say I get an output representing the route as:
[2, 1, 0, 9, 8, 7, 6, 5, 4, 3, 2]
How can I translate these nodes back to the co-ordinates they are representing in order to connect them on Matplotlib?
Transform your latitude and longitude into numpy arrays:
long = np.array(long)
lat = np.array(lat)
I would advise doing it in read_two_column_file directly.
Then if the path is in the variable path, you can do directly:
plt.plot(long[path], lat[path])
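Putting it together, a short end-to-end sketch (the file name is a placeholder, and this follows the answer's long/lat argument order):
import numpy as np
import matplotlib.pyplot as plt

long, lat = read_two_column_file('points.txt')  # placeholder file name
long, lat = np.array(long), np.array(lat)
path = [2, 1, 0, 9, 8, 7, 6, 5, 4, 3, 2]  # route from the question
plt.scatter(long, lat)           # the points
plt.plot(long[path], lat[path])  # the route, connecting points in visiting order
plt.show()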

Using Python to take 32x32 matrices, append many of them to a single array, then add a timestamp index to each matrix

I am very new to coding in Python and I am working with a .CSV file that gives me a 32x32 matrix in a 1024-column row with a time stamp. I reshaped the data to give me 32x32 arrays and looped through each row, appending the matrices to a numpy array.
i = 0
while i < len(df_array):
    if i == 0:
        spec = np.reshape(df_array[i][np.arange(1, 1025)], (32, 32))
        spectrum_matrix = spec
    else:
        spec = np.reshape(df_array[i][np.arange(1, 1025)], (32, 32))
        spectrum_matrix = np.concatenate((spectrum_matrix, spec), axis=0)
    i = i + 1
print("job done")
What I would like to do is add the time stamps from the original data file to each of the matrices, thus allowing me to resample the data over a 5 minute average. I also would like to plot the bins to get a plot similar to this drop size distribution.
As a reference, I am reading the .CSV in with pandas; here is an example of a portion of the raw data: 01.06.2017;18:22:20;0.122;0.00;51;7.401;10375;18745;57;27;0.00;23.6;0.110;0;
<SPECTRUM>;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
The ;'s after <SPECTRUM> are the 32x32 matrix.
Thanks in advance for any help!
Python and associated packages can do many things without loops
From my understanding of your data you have a (8640 x 32 x 32) Data Structure (time x size x velocity).
Pandas works very well with 2D data structures, however for higher dimensional data I would recommend you get familiar with xarray. With this package along with pandas you can create and manipulate your data without having to resort to loops.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import xarray as xr
import seaborn as sns
%matplotlib inline
#create random data
data = (np.random.binomial(n =5, p =0.2, size =(8640,32,32))*1000).astype(int)
#create labels for data
sizes= np.linspace(1,5,32)
velocities = np.linspace(1,1000, num = 32)
#make time range of 24 hours with 10sec intervals
ind = pd.date_range(start='2014-01-01', periods=8640, freq='10s')
#convert data to xarray 3D data structure
df = xr.DataArray(data, coords=[ind, sizes, velocities],
                  dims=['time', 'size', 'speed'])
#make a 5 min average of the data (modern xarray resample syntax)
min_average = df.resample(time='300s').mean()
#plot sample of data and 5 min average
my1d = min_average.isel(size = 5, speed= 10)
my1d.plot(label = '5 min avg')
plt.gca()
df.isel(size = 5, speed =10).plot(alpha = 0.3, c = 'r', label = 'raw_data')
plt.legend()
As for making a distribution plot like you linked things become a bit trickier but is possible:
#transform your data to only have mean speed for each time and size
#and convert to pandas dataframe
mean_speed = min_average.mean(dim=['speed'])
#for some reason xarray makes you name the new column when you convert
#to a pandas dataframe. I then get rid of the extra empty variable with
#a list comprehension
df = mean_speed.to_dataframe('').unstack().T
df.index = np.array([np.array(i)[1].astype(float) for i in df.index])
#make a contourplot of your new data
plt.contourf(df.columns, df.index, df.values, cmap ='PuBu_r')
plt.title('mean speed')
plt.ylabel('size')
plt.xlabel('time')
plt.colorbar()

How to create a grid from LiDAR points (X,Y,Z) with GDAL python?

I'm really new to Python programming, and I was just wondering if you can create a regular grid of 0.5 by 0.5 m resolution using LiDAR points.
My data are in LAS format (read with from liblas import file as lasfile) and have the following format: X,Y,Z, where X and Y are coordinates.
The points are randomly positioned, some pixels are empty (NaN value), and some pixels contain more than one point. Where there is more than one point, I wish to obtain a mean value. In the end I need to save the data in TIF or ASCII format.
I am studying the osgeo module and GDAL, but I honestly don't know if the osgeo module is the best solution.
I would be really glad for some code that I can study and implement.
Thanks in advance for the help, I really need it.
I don't know the best way to get a grid with these parameters.
It's a bit late but maybe this answer will be useful for others, if not for you...
I have done this with Numpy and Pandas, and it's pretty fast. I was using TLS data and could do this with several million data points without any trouble on a decent 2009-vintage laptop. The key is 'binning' by rounding the data, and then using Pandas' GroupBy methods to do the aggregating and calculate the means.
If you need to round to a power of 10 you can use np.round, otherwise you can round to an arbitrary value by making a function to do so, which I have done by modifying this SO answer.
import numpy as np
import pandas as pd
# make rounding function:
def round_to_val(a, round_val):
    return np.round(np.array(a, dtype=float) / round_val) * round_val
# load data: an array of shape (n_points, 3) with columns x, y, z
data = np.load('your_data.npy')  # placeholder file name
n_d = data.shape[0]
# round the data
d_round = np.empty( [n_d, 5] )
d_round[:,0] = data[:,0]
d_round[:,1] = data[:,1]
d_round[:,2] = data[:,2]
del data # free up some RAM
d_round[:,3] = round_to_val( d_round[:,0], 0.5)
d_round[:,4] = round_to_val( d_round[:,1], 0.5)
# sorting data
ind = np.lexsort( (d_round[:,4], d_round[:,3]) )
d_sort = d_round[ind]
# making dataframes and grouping stuff
df_cols = ['x', 'y', 'z', 'x_round', 'y_round']
df = pd.DataFrame( d_sort)
df.columns = df_cols
df_round = df[['x_round', 'y_round', 'z']]
group_xy = df_round.groupby(['x_round', 'y_round'])
# calculating the mean, write to csv, which saves the file with:
# [x_round, y_round, z_mean] columns. You can exit Python and then start up
# later to clear memory if that's an issue.
group_mean = group_xy.mean()
group_mean.to_csv('your_binned_data.csv')
# Restarting...
import numpy as np
from scipy.interpolate import griddata
binned_data = np.loadtxt('your_binned_data.csv', skiprows=1, delimiter=',')
x_bins = binned_data[:,0]
y_bins = binned_data[:,1]
z_vals = binned_data[:,2]
pts = np.array( [x_bins, y_bins])
pts = pts.T
# make grid (with borders rounded to 0.5...)
xmax, xmin = 640000.5, 637000
ymax, ymin = 6070000.5, 6067000
grid_x, grid_y = np.mgrid[xmin:xmax:0.5, ymin:ymax:0.5]
# interpolate onto grid
data_grid = griddata(pts, z_vals, (grid_x, grid_y), method='cubic')
# save to ascii
np.savetxt('data_grid.txt', data_grid)
When I've done this, I have saved the output as a .npy and converted to a tiff with the Image library, and then georeferenced in ArcMap. There is probably a way to do that with osgeo but I haven't used it.
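For reference, since the question asks about GDAL: writing the gridded array straight to a GeoTIFF with osgeo looks roughly like the sketch below. The EPSG code is a placeholder for your data's actual CRS, and depending on how the grid was built you may need to transpose or flip data_grid so rows run north to south:
from osgeo import gdal, osr

raster = data_grid.T[::-1]  # rows as y (north to south), cols as x; adjust if needed
nrows, ncols = raster.shape
driver = gdal.GetDriverByName('GTiff')
out = driver.Create('data_grid.tif', ncols, nrows, 1, gdal.GDT_Float32)
# geotransform: (top-left x, pixel width, 0, top-left y, 0, -pixel height)
out.SetGeoTransform((xmin, 0.5, 0, ymax, 0, -0.5))
srs = osr.SpatialReference()
srs.ImportFromEPSG(28355)  # placeholder EPSG code, replace with your CRS
out.SetProjection(srs.ExportToWkt())
out.GetRasterBand(1).WriteArray(raster)
out.FlushCache()
out = None  # close and flush to disk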
Hope this helps someone at least...
You can use the histogram function in Numpy to do the binning, for instance:
import numpy as np
# random positions and the values measured at them
points = np.random.random(1000)
values = np.random.random(1000)
# create bin edges from 0 to 1
bins = np.linspace(0, 1, 10)
# mean value per bin: sum of values in each bin divided by the count
means = (np.histogram(points, bins, weights=values)[0] /
         np.histogram(points, bins)[0])
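Since the question is about a 2D grid, the same idea extends to two dimensions with np.histogram2d. A minimal sketch with hypothetical example data, assuming x, y, z are 1D arrays of point coordinates and values, gridded at 0.5 m:
import numpy as np

# hypothetical example data
x = np.random.uniform(0, 10, 1000)
y = np.random.uniform(0, 10, 1000)
z = np.random.random(1000)

edges_x = np.arange(0, 10.5, 0.5)
edges_y = np.arange(0, 10.5, 0.5)
counts, _, _ = np.histogram2d(x, y, bins=[edges_x, edges_y])
sums, _, _ = np.histogram2d(x, y, bins=[edges_x, edges_y], weights=z)
# mean z per cell; empty cells become NaN instead of raising a divide warning
mean_z = np.divide(sums, counts, out=np.full_like(sums, np.nan), where=counts > 0)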
Try LAStools, particularly lasgrid or las2dem.
