Importing variables from NetCDF into Python

I am very new to Python. I have managed to read some variables from a NetCDF file into Python and plot them, but the size of the variables isn't correct.
My dataset is 144 x 90 (lon x lat) but when I call in the variables, it seems to miss a large section of data.
Do I need to specify the size of the dataset I'm reading in? Is that what I'm doing wrong here?
Here is the code I am using:
import netCDF4
from netCDF4 import Dataset
from pylab import *
ncfile = Dataset('DEC3499.aijE03Ccek11p5A.nc','r')
temp = ncfile.variables['tsurf']
prec = ncfile.variables['prec']
subplot(2,1,1)
pcolor(temp)
subplot(2,1,2)
pcolor(prec)
savefig('DEC3499.png',optimize=True,quality=85)
quit()
Just to clarify, here is an image showing the output. There should be data right to the far right hand side of the box.
(http://img163.imageshack.us/img163/6900/screenshot20130520at112.png)

I figured it out.
For those interested, I just needed to amend the following lines to pull in the variables properly:
temp = ncfile.variables['tsurf'][:,:]
prec = ncfile.variables['prec'][:,:]
Thanks!
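For anyone wondering why the slice matters: `ncfile.variables['tsurf']` is a netCDF4 Variable object (a handle to the data in the file), and indexing it with `[:]` or `[:,:]` reads the values into a NumPy (possibly masked) array that pcolor can plot. Below is a minimal sketch of that step. The helper name `read_field` is hypothetical and not part of netCDF4, and the squeeze step assumes that any extra axes (such as a single time step) have length 1.

```python
import numpy as np

def read_field(variables, name):
    """Return variable `name` as a plain float array with length-1 axes removed.

    `variables` is any mapping of name -> sliceable object, for example
    `ncfile.variables` from netCDF4. Hypothetical helper, not part of netCDF4.
    """
    data = variables[name][:]                      # slicing loads the values into memory
    data = np.ma.asarray(data, dtype=float)        # works for plain and masked arrays
    data = np.ma.filled(data, np.nan)              # masked (missing) cells become NaN
    return np.squeeze(data)                        # drop length-1 axes, e.g. one time step
```

With the file from the question this would be used as `temp = read_field(ncfile.variables, 'tsurf')` followed by `pcolor(temp)`.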

Related

Python: Masking ERA5 data (NetCDF) from shapefile (polygon/multipolygon)

I want to select grid cells from ERA5 gridded data (surface level only) that are inside geographical masks for North- and South-Switzerland (plus the radar buffer), to calculate regional means.
The 4 masks (masks) are given as polygons/multipolygons (polygons) in a shapefile and so far for 2 of the masks I was able to use salem roi to get what I want:
radar_north = salem.read_shapefile('radar_north140.shp')
file_radar_north = file.salem.roi(shape=radar_north)
file_radar_north.cape.mean(dim='time').salem.quick_map()
However, for the radar_south and alpensuedseite shapefiles the code didn't work at the beginning (wrong selection or no data shown), and now nothing works anymore. I don't know why, as I have not changed anything between the first and the second attempt.
If someone sees the issue or knows a different (maybe quicker) way to mask the ERA5 data, I would be grateful! (I was unsuccessful with the answers to similar questions here.)
Best
Lena
This could work if you are working with NetCDF files:
import geopandas as gpd
import xarray as xr
import rioxarray
from shapely.geometry import mapping
# load shapefile with geopandas
radar_north = gpd.read_file('radar_north140.shp')
# load ERA5 netcdf with xarray
era = xr.open_dataset('ERA5.nc')
# add projection system to nc
era = era.rio.write_crs("EPSG:4326", inplace=True)
# mask ERA5 data with shapefile
era_radar_north = era.rio.clip(radar_north.geometry.apply(mapping), radar_north.crs)

How can I cut a portion of a satellite image based on coordinates? (gdal)

I have a satellite image of 7-channels (Basically I have seven .tif files, one for each band). And I have a .csv file with coordinates of points-of-interest that are in the region shot by the satellite. I want to cut small portions of the image in the surroundings of each coordinate point. How could I do that?
As I don't have fully working code yet, the size of those small portions doesn't really matter. For the sake of this question, let's say I want them to be 15x15 pixels. So for the moment, my final objective is to obtain many 15x15x7 arrays, one for every coordinate point in the .csv file (the "7" is because the image has 7 channels). That is what I am stuck on.
Just to give some background in case it's relevant: I will use those vectors later to train a CNN model in keras.
This is what I did so far: (I am using jupyter notebook, anaconda environment)
Imported gdal, numpy, matplotlib and geopandas, among other libraries.
Opened the .tif files using gdal and converted them into arrays.
Opened the .csv file using pandas.
Created a numpy array called "imagen" of shape (7931, 7901, 7) that will hold the 7 bands of the satellite image as numbers. At this point I just need to know which rows and columns of the array "imagen" correspond to each coordinate point. In other words, I need to convert every coordinate point into a pair of numbers (row, column). That is what I am stuck on.
After that, I think that the "cutting part" will be easy.
#I import libraries
from osgeo import gdal, gdal_array
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas
from geopandas import GeoDataFrame
from shapely.geometry import Point
#I access the satellite images (I just show one here to make it short)
b1 = r"E:\Imágenes Satelitales\2017\226_86\1\LC08_L1TP_226086_20170116_20170311_01_T1_sr_band1.tif"
band1 = gdal.Open(b1, gdal.GA_ReadOnly)
#I open the .csv file
file_svc = r"C:\Users\Administrador\Desktop\DeepLearningInternship\Crop Yield Prediction\Crop Type Classification model - CNN\First\T28_Pringles4.csv"
df = pd.read_csv(file_svc)
print(df.head())
That prints something like this:
Lat1       Long1      CropingState
-37.75737  -61.14537  Barbecho
-37.78152  -61.15872  Verdeo invierno
-37.78248  -61.17755  Barbecho
-37.78018  -61.17357  Campo natural
-37.78850  -61.18501  Campo natural
#I create the array "imagen" (I only show one channel here to make it short)
imagen = np.zeros((7931, 7901, 7), dtype=np.float32)
imagen[:,:,0] = band1.ReadAsArray().astype(np.float32)
#And then I can plot it:
plt.imshow(imagen[:,:,0], cmap = 'hot')
plt.show()
Which plots something like this:
(https://github.com/jamesluc007/DeepLearningInternship/blob/master/Crop%20Yield%20Prediction/Crop%20Type%20Classification%20model%20-%20CNN/First/red_band.png)
I want to transform those (-37,-61) into something like (2230,1750). But I haven't figured out how yet. Any clues?
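One possible approach, sketched under stated assumptions: GDAL exposes the raster's affine geotransform through `band1.GetGeoTransform()`, and for a north-up image (rotation terms equal to zero) the row and column follow from simple arithmetic. The function names `world_to_pixel` and `cut_patch` are hypothetical. The coordinates passed in must be in the raster's own coordinate system; Landsat scenes are usually delivered in a projected system such as UTM rather than lat/lon, so the points from the .csv would probably need to be reprojected first (check `band1.GetProjection()`).

```python
import math
import numpy as np

def world_to_pixel(geotransform, x, y):
    """Map coordinates (x, y) in the raster's CRS to (row, col).

    `geotransform` is the 6-tuple from GDAL's GetGeoTransform():
    (x_origin, pixel_width, row_rotation, y_origin, col_rotation, pixel_height).
    Assumes a north-up raster, i.e. both rotation terms are zero.
    """
    x_origin, pixel_width, _, y_origin, _, pixel_height = geotransform
    col = math.floor((x - x_origin) / pixel_width)
    row = math.floor((y - y_origin) / pixel_height)  # pixel_height is negative for north-up
    return row, col

def cut_patch(image, row, col, size=15):
    """Return a size x size x bands window centred on (row, col), or None near the edge."""
    half = size // 2
    r0, c0 = row - half, col - half
    if r0 < 0 or c0 < 0 or r0 + size > image.shape[0] or c0 + size > image.shape[1]:
        return None
    return image[r0:r0 + size, c0:c0 + size, :]
```

With the arrays from the question, `row, col = world_to_pixel(band1.GetGeoTransform(), x, y)` followed by `cut_patch(imagen, row, col)` would give one 15x15x7 array per point, with points too close to the border skipped.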

Substitute dataset coordinates in xarray (Python)

I have a dataset stored in NetCDF4 format that consists of Intensity values with 3 dimensions: Loop, Delay and Wavelength. I named my coordinates the same as the dimensions (I don't know if it's good or bad...)
I'm using xarray (formerly xray) in Python to load the dataset:
import xarray as xr
ds = xr.open_dataset('test_data.netcdf4')
Now I want to manipulate the data while keeping track of the original data. For instance, I would:
1. Apply an offset to the Delay coordinates and keep the original Delay dataarray untouched. This seems to be done with:
ds_ = ds.assign_coords(Delay_corr=ds.Delay.copy(deep=True) + 25)
2. Substitute the coordinates Delay for Delay_corr for all relevant dataarrays in the dataset. However, I have no clue how to do this and I didn't find anything in the documentation.
Would anybody know how to perform item #2?
To download the NetCDF4 file with test data:
http://1drv.ms/1QHQTRy
The method you're looking for is Dataset.swap_dims():
ds.coords['Delay_corr'] = ds.Delay + 25 # could also use assign_coords
ds2 = ds.swap_dims({'Delay': 'Delay_corr'})
See this section of the xarray docs for a full example.
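To make the effect of swap_dims concrete, here is a small self-contained sketch. The dataset is synthetic: the sizes and values are made up to stand in for test_data.netcdf4, and only the dimension names Loop, Delay and Wavelength come from the question.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the real file (sizes and values are made up).
ds = xr.Dataset(
    {"Intensity": (("Loop", "Delay", "Wavelength"), np.arange(24.0).reshape(2, 3, 4))},
    coords={
        "Loop": [0, 1],
        "Delay": [0.0, 10.0, 20.0],
        "Wavelength": [400.0, 500.0, 600.0, 700.0],
    },
)

# Add the shifted coordinate next to the original one, then make it the dimension.
# Both calls return new objects, so `ds` itself is left unchanged.
ds_corr = ds.assign_coords(Delay_corr=ds["Delay"] + 25).swap_dims({"Delay": "Delay_corr"})

print(ds_corr["Intensity"].dims)     # Intensity is now indexed by Delay_corr
print(ds_corr["Delay"].values)       # the original Delay values are kept as a coordinate
```

After the swap, every data variable that used the Delay dimension uses Delay_corr instead, while the original Delay values stay available as a non-dimension coordinate.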
I think it's much simpler than that.
If you don't want to change the existing data, you create a copy. Note that changing ds won't change the netcdf4 file, but assuming you still don't want to change ds:
ds_ = ds.copy(deep=True)
Then just set the Delay coord as a modified version of the old one
ds_.coords['Delay'] = ds_['Delay'] + 25

How can I write a velocity field to a VTI image with anaconda Python?

I am trying to write a VTK Image Data file (.vti) with python. For my python coding I am using the Anaconda distribution. I am using the evtk package, which has the ability to write a vtk file.
The data I need to write is a velocity field, for which I have the 3D X, Y, Z and U, V, W arrays. I have found some sample code which uses the evtk package to write a .vti file (http://www.vtk.org/Wiki/VTK/Writing_VTK_files_using_python).
The problem is that the sample code and built-in functions only take scalar point or cell data. So I am able to write a file with scalars, but I need it to have the data as vectors.
I am digging through the actual package files and trying to find a solution or the tools to code one. I would greatly appreciate any suggestions or solutions.
I enclose the test code I have written from the info on the wiki, just in case I am missing a way of passing vectors to the function, but I fear I am going to need to start from scratch.
Thanks in advance
(removed the code since the one below is more recent)
I managed to write an unstructured grid file (.vtu), but I would really like to be able to write an Image Data file. (I found the following link helpful during the process: http://www.aero.iitb.ac.in/~prabhu/tmp/python_cep07/course_handouts/viz3d_handout.pdf)
Thanks again in advance
I attach the code to see if anybody has any suggestions.
from tvtk.api import tvtk, write_data
import numpy as N
##Generation of data
#array of x,y,z coordinates
[Z,Y,X] = N.mgrid[-2.:2+1, -2.:2+1, -2.:2+1]
#array of zeros to add the u,v,w components
[W,V,U] = N.zeros_like([Z,Y,X],dtype=float)
#loop through data to have correct format
points = N.array([N.zeros(3) for i in range(len(Z)*len(Z[0])*len(Z[0][0]))])
velF = N.zeros_like(points)
c=0
for k in range(len(Z)):
    for j in range(len(Z[0])):
        for i in range(len(Z[0][0])):
            #coordinates of point
            x = X[k][j][i]
            y = Y[k][j][i]
            z = Z[k][j][i]
            points[c] = N.array([x,y,z])
            #test velocity field
            u = k -2.
            v = 0.
            w = 0.
            velF[c] = N.array([u,v,w])
            #update counter
            c = c+1
##Generate and write the vtk file
Ugrid = tvtk.UnstructuredGrid()
Ugrid.points = points
Ugrid.point_data.vectors = velF
Ugrid.point_data.vectors.name = 'velocity'
write_data(Ugrid, 'vtktest.vtu')
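As a side note, the triple loop that builds `points` and `velF` can be replaced by a few NumPy calls. This is only a sketch of the array construction for the same 5 x 5 x 5 test grid; it does not change the file format that is written, and the resulting arrays are meant to be passed to the grid object exactly as above.

```python
import numpy as np

# Same 5 x 5 x 5 grid as in the question.
Z, Y, X = np.mgrid[-2.:2 + 1, -2.:2 + 1, -2.:2 + 1]

# One (x, y, z) row per grid point. ravel() walks the last index fastest,
# which matches the k, j, i loop order of the original code.
points = np.column_stack([X.ravel(), Y.ravel(), Z.ravel()])

# Test velocity field: u = k - 2 equals the z coordinate on this particular grid,
# and v = w = 0.
velF = np.zeros_like(points)
velF[:, 0] = Z.ravel()
```

For real data, the U, V and W arrays would be flattened the same way, for example with `np.column_stack([U.ravel(), V.ravel(), W.ravel()])`.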
If you want to write the unstructured grids using evtk, here you can find a full demo with point and cell data (both vector and scalar fields): https://gist.github.com/dromanov/0fb8bacff5342a56a690.
Description of the technique and explanations are here: http://spikard.blogspot.ru/2015/07/visualization-of-unstructured-grid-data.html.
Good luck!

Read/Open a modis aqua .hdf file and display/plot the output in gdal and matplotlib

I have searched for a way to read a MODIS Aqua .hdf file and plot it using gdal and matplotlib, but I still can't find one. Any help is much appreciated. I am using Python 2.7.5 on Windows 7. The filename is A2014037040000.L2_LAC.SeAHABS.hdf. Among the geophysical data in the HDF file, I will only be using chlor_a.
Update:
Here is the link of the sample file.
A2014037040500.L2_LAC.SeAHABS.hdf
The trick with HDF files is that most of the time you need a specific subdataset. If you use GDAL you need to open the HDF pointing directly to that subdataset:
from osgeo import gdal
import matplotlib.pyplot as plt
ds = gdal.Open('HDF4_SDS:UNKNOWN:"MOD021KM.A2013048.0750.hdf":6')
data = ds.ReadAsArray()
ds = None
fig, ax = plt.subplots(figsize=(6,6))
ax.imshow(data[0,:,:], cmap=plt.cm.Greys, vmin=1000, vmax=6000)
You can also open the 'main' HDF file and inspect the subdatasets, and go from there:
# open the main HDF
ds = gdal.Open('MOD021KM.A2013048.0750.hdf')
# get the path for a specific subdataset
subds = [sd for sd, descr in ds.GetSubDatasets() if descr.endswith('EV_250_Aggr1km_RefSB (16-bit unsigned integer)')][0]
# open and read it like normal
dssub = gdal.Open(subds)
data = dssub.ReadAsArray()
dssub = None
ds = None
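The subdataset lookup above can be wrapped in a small helper so that the same code works for a name such as chlor_a. This is a sketch only: `pick_subdataset` is a hypothetical function, not part of GDAL, and the exact description strings in a given file have to be checked by printing `ds.GetSubDatasets()`.

```python
def pick_subdataset(subdatasets, keyword):
    """Return the path of the first subdataset whose path or description contains `keyword`.

    `subdatasets` is a list of (path, description) tuples, which is the shape
    returned by GDAL's Dataset.GetSubDatasets(). Hypothetical helper, not part of GDAL.
    """
    for path, description in subdatasets:
        if keyword in path or keyword in description:
            return path
    raise KeyError("no subdataset matching %r" % keyword)
```

Typical use would be `gdal.Open(pick_subdataset(ds.GetSubDatasets(), 'chlor_a'))`, followed by `ReadAsArray()` as in the snippets above.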
You should try setting the datatype for the MODIS dataset. I guess it is 16-bit unsigned:
import numpy
ds = gdal.Open(hdfpath)
data = ds.GetRasterBand(N).ReadAsArray().astype(numpy.uint16)
N is the band number of your data of interest. You can try opening the file with QGIS or ENVI to see the structure of the HDF file.
Remember that band numbering starts at 1, not 0: the first band is 1.
Hope it helps
