Weird plot for netCDF data with matplotlib - python

I've been searching for a pathetically long time for this, so I would appreciate any help or hint I can get.
I'm trying to plot some sea ice freeboard data (netCDF, gridded total freeboard) for the Antarctic seas, but the data that should wrap nicely around Antarctica ends up in a band at the bottom of my image. netCDF and matplotlib are fairly new to me, so the error could be in, for example, how I handle the dimensions or the projection.
from scipy.io.netcdf import netcdf_file as Dataset
from mpl_toolkits.basemap import Basemap
import numpy as np
import matplotlib.pyplot as plt

FB = Dataset('./datasets/fb-0217-0320.nc', 'r')
f = FB.variables['f'][:,:]
lat = FB.variables['lat'][:,0]
lon = FB.variables['lon'][0,:]

masked_fb = np.ma.masked_where(np.isnan(f), f)
mtx_lon, mtx_lat = np.meshgrid(lon, lat)

plt.figure()
m = Basemap(projection='spstere', boundinglat=-50, lon_0=180., resolution='l')
m.bluemarble()
m.pcolormesh(mtx_lon, mtx_lat, masked_fb, latlon=True)
plt.show()
ncdump gives:
dimensions:
    x = 79 ;
    y = 83 ;
variables:
    float lat(y, x) ;
        lat:standard_name = "latitude" ;
        lat:long_name = "latitude coordinate" ;
        lat:units = "degrees_north" ;
    float lon(y, x) ;
        lon:standard_name = "longitude" ;
        lon:long_name = "longitude coordinate" ;
        lon:units = "degrees_east" ;
    float f(y, x) ;
        f:long_name = "total_freeboard" ;
        f:units = "mm" ;
        f:coordinates = "lat lon" ;
One weird thing I noticed is that the minimum lat value is -5156.6201, which is clearly not a valid latitude, but I didn't know how to count how many values like that there are...
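(A quick sketch of one way to count such suspicious values, reusing the FB handle from the code above and treating anything outside ±90 degrees as a fill value:)
import numpy as np
lat_all = FB.variables['lat'][:,:]
print(np.sum(np.abs(lat_all) > 90), 'latitude values outside the valid range')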
Edit: Formatted the code the common way, as Neil advised.

Okay, I got help from the matplotlib community and thought I should share it here in case someone else runs into a similar problem. The problem was the meshgrid: since the latitudes and longitudes in the netCDF file are already 2D, the meshgrid was unnecessary. The solution that worked for me was:
from scipy.io.netcdf import netcdf_file as Dataset
from mpl_toolkits.basemap import Basemap
import numpy as np
import matplotlib.pyplot as plt

FB = Dataset('./datasets/fb-0217-0320.nc', 'r')
f = FB.variables['f'][:,:]
lat = FB.variables['lat'][:,:]
lon = FB.variables['lon'][:,:]

masked_fb = np.ma.masked_where(np.isnan(f), f)

plt.figure()
m = Basemap(projection='spstere', boundinglat=-50, lon_0=180., resolution='l')
m.bluemarble()
m.pcolormesh(lon, lat, masked_fb, latlon=True)
plt.show()

First, it's common practice to read in the netcdf module as
from scipy.io.netcdf import netcdf_file as Dataset
You can then read in the file and access variables as
FB = Dataset('./datasets/fb-0217-0320.nc', 'r')
f = FB.variables['f'][:,:]
lat = FB.variables['lat'][:,:]
lon = FB.variables['lon'][:,:]
Are you sure that lat[:,0] and lon[0,:] are reading in the grid coordinates correctly? ncdump indicates they are 2D variables, and I suspect the issue is creating a meshgrid from lat[:,0] and lon[0,:].
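If you are not sure whether a file stores 1D coordinate axes or full 2D coordinate fields, a check along these lines (just a sketch, assuming the same file and variable names as above) tells you whether a meshgrid is needed at all:
from scipy.io.netcdf import netcdf_file as Dataset
import numpy as np

FB = Dataset('./datasets/fb-0217-0320.nc', 'r')
lat = FB.variables['lat'][:]
lon = FB.variables['lon'][:]
print(lat.shape, lon.shape)                # here: (83, 79) and (83, 79), i.e. already 2D

if lat.ndim == 1 and lon.ndim == 1:
    lon2d, lat2d = np.meshgrid(lon, lat)   # 1D axes: build the grid
else:
    lon2d, lat2d = lon, lat                # 2D fields: use them as they are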

Related

Python's Basemap doesn't align with correct coordinates?

I have a .dat file containing a list of coordinates (~100k) and a temperature at each coordinate. It has a structure like this:
-59.083 -26.583 0.2
-58.417 -26.250 0.6
-58.412 -26.417 0.4
...
To visually display the temperature ranges, I created a numpy array and plotted the datasets using the Basemap module for Python. The code I wrote is the following:
from matplotlib import pyplot as plt
from mpl_toolkits.basemap import Basemap
import numpy as np

m = Basemap(projection='mill', llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180, resolution='c')
m.drawcoastlines(linewidth=0.15)

data = np.loadtxt('gridly.dat')
xcoordlist = []
ycoordlist = []
tempvallist = []
for i in data:
    xcoord = i[0]
    ycoord = i[1]
    tempval = i[2]
    xcoord2 = xcoord*111139  # <-- multiplying converts each coordinate's degrees to meters
    ycoord2 = ycoord*111139
    xcoordlist.append(xcoord2)
    ycoordlist.append(ycoord2)
    tempvallist.append(tempval)
xco = np.array(xcoordlist)
yco = np.array(ycoordlist)
tval = np.array(tempvallist)

gridsize = 100
m.hexbin(yco, xco, C=tval, gridsize=gridsize)
cb = m.colorbar()
plt.show()
When I plot the data, I get almost exactly what I want; however, the hexagonal heatmap is offset for some reason, as the resulting chart shows.
I've been searching online for what might be wrong but unfortunately couldn't find an answer or troubleshoot it. Does anyone know how I can fix this issue?
After hours of digging around, I finally figured it out! What was wrong with my code was that I was trying to manually convert the geographic coordinates into point coordinates for the displaying chart (by multiplying by 111139).
While the logic for doing this makes sense, I believe the process broke down when I began to plot the data onto different kinds of charts (e.g. orthographic, Miller projection, etc.), because different projections/charts have different point coordinates (kind of like how the pixel locations on one computer screen may not align with the pixel locations on a different screen).
Instead, the Basemap module has a built-in function that will convert real-world coordinates into coordinates that can be plotted on the chart, for you: m(x, y).
So, the improved and correct script would be:
from matplotlib import pyplot as plt
from mpl_toolkits.basemap import Basemap
import numpy as np

m = Basemap(projection='mill', llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180, resolution='c')
m.drawcoastlines(linewidth=0.15)

data = np.loadtxt('gridly.dat')
xcoordlist = []
ycoordlist = []
tempvallist = []
for i in data:
    lat = i[0]
    lon = i[1]
    tempval = i[2]
    xpt, ypt = m(lon, lat)   # convert lon/lat to map projection coordinates
    xcoordlist.append(xpt)
    ycoordlist.append(ypt)
    tempvallist.append(tempval)
xco = np.array(xcoordlist)
yco = np.array(ycoordlist)
tval = np.array(tempvallist)

gridsize = 100
m.hexbin(xco, yco, C=tval, gridsize=gridsize)
cb = m.colorbar()
plt.show()
As you can see where it says xpt, ypt = m(lon, lat), the function converts the real-world longitudes (lon) and latitudes (lat) from the .dat file into plottable points. Hope this helps anyone else who may have this problem in the future!
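As a side note, m() also accepts whole NumPy arrays, so the per-row loop can be collapsed into a single vectorized call; a minimal sketch, assuming the same three-column gridly.dat:
import numpy as np
from mpl_toolkits.basemap import Basemap
from matplotlib import pyplot as plt

m = Basemap(projection='mill', llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180, resolution='c')
m.drawcoastlines(linewidth=0.15)

data = np.loadtxt('gridly.dat')          # columns: lat, lon, temperature
lats, lons, tvals = data[:, 0], data[:, 1], data[:, 2]
xco, yco = m(lons, lats)                 # convert whole arrays at once
m.hexbin(xco, yco, C=tvals, gridsize=100)
m.colorbar()
plt.show()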

Plotting netCDF data with Python: How to change grid?

I'm new to Python and to plotting data with matplotlib. I really need help, and thank you in advance for any answers.
So, I have a netCDF file with the v-component of the wind. The grid has 9600 points (240 x 40):
lon : 0 to 358.5 by 1.5 degrees_east, circular
lat : 88.5 to 30 by -1.5 degrees_north
My code is:
import numpy as np
import matplotlib
matplotlib.use('Agg')
from netCDF4 import Dataset
from matplotlib import pyplot as plt
from mpl_toolkits.basemap import Basemap

# read data from the netCDF (".nc") file
my_file = '/home/Era-Interim/NH-EraInt-1979.nc'
fh = Dataset(my_file, mode='r')
lons = fh.variables['lon'][:]
lats = fh.variables['lat'][:]
V = fh.variables['V'][:]
V_units = fh.variables['V'].units
fh.close()

# create figure
fig = plt.figure(figsize=(20, 20))
# create a map
m = Basemap(projection='nplaea', boundinglat=30, lon_0=10, resolution='l', round=True)
# draw parallels, meridians, coastlines, countries, map boundary
m.drawcoastlines(linewidth=0.5)
m.drawcountries(linewidth=0.5)
#m.drawmapboundary(linewidth=2)
m.drawparallels(np.arange(30, 90, 20), labels=[1,1,0,0])   # parallels every 20 degrees, labelled left and right
m.drawmeridians(np.arange(0, 360, 30), labels=[1,1,1,1])   # meridians every 30 degrees, labelled on all sides
# plot the data on top of the map
lon, lat = np.meshgrid(lons, lats)
x, y = m(lon, lat)
cs = m.pcolor(x, y, np.squeeze(V), cmap=plt.cm.RdBu_r)
plt.title("", fontsize=25, verticalalignment='baseline')
plt.savefig("/home/Era-Interim/1.png")
As a result, I get a map (you can find it in my Dropbox folder): https://www.dropbox.com/sh/nvy8wcodk9jtat0/AAC-omkPP8_7uINSSXbzImeja?dl=0
On the map there are white pixels between 358.5 and 0 (360) degrees lon, because I have no data between 358.5 and 0 (360) degrees lon.
The question is: how can I change the size of the grid, regrid it, interpolate data, or something else in order to not have this white sector?
I have found a solution. At the beginning of the script you must add
from mpl_toolkits.basemap import Basemap, addcyclic
and further on, before the meshgrid:
datain, lonsin = addcyclic(np.squeeze(V), lons)   # append the cyclic (360-degree) column
lonsout, datain = m.shiftdata(lonsin, datain=datain, lon_0=180.)
print(lonsout)
lon, lat = np.meshgrid(lonsout, lats)
x, y = m(lon, lat)
cs = m.pcolor(x, y, datain, cmap=plt.cm.RdBu_r)
The difference can be seen in the figures (I still cannot post images):
https://www.dropbox.com/sh/nvy8wcodk9jtat0/AAC-omkPP8_7uINSSXbzImeja?dl=0
I think some kind of interpolation technique could also be applied in this case.
Check this out; there was a similar problem.
Hope it is useful.
The simple answer is that 360 degrees is 0 degrees, so you can copy the 0-degree data to 360 degrees and it should look right. I may be interpreting this wrong, though: I believe the data represents the value at each grid point, not between two points (i.e. at zero degrees, not between zero degrees and 1.5 degrees).
My interpretation means that, yes, you don't have data between 358.5 and 0, but you also don't have data between 357 and 358.5, which seems more likely than the grid simply skipping an area. It also means the data at 358.5 should touch the data at 0, since they are just as far apart as 0 and 1.5, which do touch.
Copying the last bit would grant you the ability to change your m.pcolor call to an imshow call (as in Roman Dryndik's link) and use interpolation to smooth out the graph.
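In code, the copy that this answer describes would look roughly like the following (a sketch reusing V and lons from the question; addcyclic does essentially the same thing internally):
import numpy as np

V2d = np.squeeze(V)                                 # shape (nlat, nlon)
V_cyc = np.concatenate([V2d, V2d[:, :1]], axis=1)   # repeat the 0-degree column...
lons_cyc = np.append(lons, lons[0] + 360.0)         # ...and label it as 360 degrees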

Use python to extract and plot data from netCDF

I am new to using python for scientific data so apologies in advance if anything is unclear. I have a netCDF4 file with multiple variables including latitude, longitude and density. I am trying to plot the variable density on a matplotlib basemap using only density values from coordinates between 35-40 N and 100-110 W.
import numpy as np
import netCDF4 as nc
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
in: f = nc.Dataset('C:\\Users\\mdc\\data\\density.nc', 'r')
in: f.variables['latitude'].shape
out:(120000,)
(the variables longitude and density have the same shape)
I am stuck trying to find a way to extract only the latitude and longitude coordinate pairs (and their associated density values) that fit the criteria of [35 < lat < 40 & -110 < lon < -100]. Any advice on how to do this would be appreciated.
I have tried extracting each of the relevant variables and compiling them into a 2d-array but I have not figured out how to select only the data I need.
lats = f.variables['latitude'][:]
lons = f.variables['longitude'][:]
dens = f.variables['density'][:]
combined = np.vstack((lats,lons,dens))
in: combined
out: array([[ -4.14770737e+01, -3.89834557e+01, -3.86000137e+01, ...,
4.34283943e+01, 4.37634315e+01, 4.40338402e+01],
[ 1.75510895e+02, 1.74857147e+02, 1.74742798e+02, ...,
7.83558655e+01, 7.81687775e+01, 7.80410919e+01],
[ 7.79418945e-02, 7.38342285e-01, 9.94934082e-01, ...,
5.60119629e-01, -1.60522461e-02, 5.52429199e-01]], dtype=float32)
As for plotting, I am trying to color the coordinate pairs according to their density value, rather than size them by it.
m = Basemap(projection='robin', resolution='i', lat_0=37, lon_0=-105)
m.drawcoastlines()
for la, lo, de in zip(lats, lons, dens):
    x, y = m(lo, la)          # Basemap expects (lon, lat) order
    size = de*3
    m.plot(x, y, 'ro', markersize=size)
plt.show()
The data selection, using pandas (can't install netCDF here, sorry, and pandas is satisfactory):
import pandas as pd
tinyd = pd.DataFrame(np.array(
    [[ -4.14770737e+01, -3.89834557e+01, -3.86000137e+01,
        4.34283943e+01,  4.37634315e+01,  4.40338402e+01],
     [  1.75510895e+02,  1.74857147e+02,  1.74742798e+02,
        7.83558655e+01,  7.81687775e+01,  7.80410919e+01],
     [  7.79418945e-02,  7.38342285e-01,  9.94934082e-01,
        5.60119629e-01, -1.60522461e-02,  5.52429199e-01]]).T,
    columns=['lat', 'lon', 'den'])

mask = (tinyd.lat > -39) & (tinyd.lat < 44) & \
       (tinyd.lon > 80) & (tinyd.lon < 175)
toplot = tinyd[mask]
print(toplot)
         lat         lon       den
1 -38.983456  174.857147  0.738342
2 -38.600014  174.742798  0.994934

plt.scatter(toplot.lat, toplot.lon, s=90, c=toplot.den)
plt.colorbar()
Plotting on top of Basemap is the same, and you can specify a different colormap, etc.
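The same selection also works directly on the 1D NumPy arrays read from the netCDF file, without going through pandas; a sketch using the bounds from the question (lats, lons, dens as read above):
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

# keep only points with 35 < lat < 40 and -110 < lon < -100
mask = (lats > 35) & (lats < 40) & (lons > -110) & (lons < -100)
sel_lats, sel_lons, sel_dens = lats[mask], lons[mask], dens[mask]

m = Basemap(projection='robin', resolution='i', lat_0=37, lon_0=-105)
m.drawcoastlines()
x, y = m(sel_lons, sel_lats)                       # Basemap expects (lon, lat)
sc = m.scatter(x, y, c=sel_dens, s=20, zorder=3)   # color by density
plt.colorbar(sc)
plt.show()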

creating vector using netcdf into array

I'm fairly new to Python and have found Stack Overflow to be one of the best resources out there, so I'm hoping someone can help me with what I believe is a fairly basic question.
I'm looking to create a land mask from a list of lats and lons and rainfall data extracted from a netCDF file. I need the data from the netCDF file to line up so I can remove rows that have a rainfall value of -9999. (indicating no data because it's over the ocean). I can access the file and I can create a mesh grid, but when it comes to inserting the rainfall data for the final check I get odd shapes and have no luck with the logical test. Can someone have a look at this code and let me know what you think?
from netCDF4 import Dataset
import numpy as np
f=Dataset('/Testing/Ensemble_grid/1970_2012_eMAST_ANUClimate_mon_evap_v1m0_197001.nc')
lat = f.variables['latitude'][:]
lon = f.variables['longitude'][:]
rainfall = np.array(f.variables['lwe_thickness_of_precipitation_amount'])
lons, lats = np.meshgrid(lon,lat)
full_ary = np.array((lats,lons))
full_lats_lons = np.swapaxes(full_ary,0,2)
rain_data = np.squeeze(rainfall,axis=(0,))
grid = np.array((full_lats_lons,rain_data))
full_grid = np.expand_dims(grid,axis=1)
full_grid_col = np.swapaxes(full_grid,0,1)
land_grid = np.logical_not(full_grid_col[:,1]==-9999.)
Here is an alternative method that simply creates a new 2D variable, landmask, where each grid cell is either 0 (ocean) or 1 (land). (I like to use 1 and 0 landmasks because you can transform it into a boolean numpy array and do quick land-averages this way.)
import netCDF4
import numpy as np

ncfile = netCDF4.Dataset('/path/to/your/ncfile.nc', 'r')
lat = ncfile.variables['lat'][:]
lon = ncfile.variables['lon'][:]
# Presuming here that rainfall is 2D; if not, just read in the first time step, i.e. [0,:,:]
rain = ncfile.variables['lwe_thickness_of_precipitation_amount'][:,:]
ncfile.close()

nlat, nlon = len(lat), len(lon)

# Populate a 2D landmask array, where 1=land and 0=ocean
landmask = np.zeros([nlat, nlon], dtype='int')
for y in range(nlat):
    for x in range(nlon):
        if rain[y,x] != -9999:  # we're at a land point
            landmask[y,x] = 1

# Now you can write out the landmask into a new netCDF file
filename_out = './landmask.nc'
ncfile_out = netCDF4.Dataset(filename_out, 'w')
ncfile_out.createDimension('lat', nlat)
ncfile_out.createDimension('lon', nlon)
lat_out = ncfile_out.createVariable('lat', 'f4', ('lat',))
lon_out = ncfile_out.createVariable('lon', 'f4', ('lon',))
landmask_out = ncfile_out.createVariable('landmask', 'i', ('lat', 'lon',))
setattr(lat_out, 'units', 'degrees_north')
setattr(lon_out, 'units', 'degrees_east')
setattr(landmask_out, 'description', '1=land 0=ocean')
lat_out[:] = lat
lon_out[:] = lon
landmask_out[:,:] = landmask[:,:]
ncfile_out.close()
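If the double loop feels slow on large grids, the same landmask can be built in a single vectorized comparison (a sketch using the rain array read above):
import numpy as np
# rain is the 2D array read above; -9999 marks ocean points
landmask = (rain != -9999).astype(int)   # 1 = land, 0 = ocean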
Ian, you need to put a reproducible example up here...
I suspect what you need is something like this:
x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
x.flat

How to create a grid from LiDAR points (X,Y,Z) with GDAL python?

I'm really new to Python programming, and I was wondering whether you can create a regular grid with 0.5 by 0.5 m resolution from LiDAR points.
My data are in LAS format (read with from liblas import file as lasfile) and have the following format: X, Y, Z, where X and Y are coordinates.
The points are randomly positioned, some pixels are empty (NaN value), and some pixels contain more than one point. Where there is more than one point, I wish to obtain the mean value. In the end I need to save the data in TIF or ASCII format.
I am studying the osgeo module and GDAL, but honestly I don't know whether the osgeo module is the best solution.
I would be really glad for some code I can study and implement; I don't know the best way to get a grid with these parameters. Thanks in advance for the help.
It's a bit late but maybe this answer will be useful for others, if not for you...
I have done this with Numpy and Pandas, and it's pretty fast. I was using TLS data and could do this with several million data points without any trouble on a decent 2009-vintage laptop. The key is 'binning' by rounding the data, and then using Pandas' GroupBy methods to do the aggregating and calculate the means.
If you need to round to a power of 10 you can use np.round, otherwise you can round to an arbitrary value by making a function to do so, which I have done by modifying this SO answer.
import numpy as np
import pandas as pd
# make rounding function:
def round_to_val(a, round_val):
    return np.round(np.array(a, dtype=float) / round_val) * round_val
# load data: placeholder path, an (n_data, 3) array with columns x, y, z
data = np.load('your_data.npy')
n_d = data.shape[0]
# round the data
d_round = np.empty( [n_d, 5] )
d_round[:,0] = data[:,0]
d_round[:,1] = data[:,1]
d_round[:,2] = data[:,2]
del data # free up some RAM
d_round[:,3] = round_to_val( d_round[:,0], 0.5)
d_round[:,4] = round_to_val( d_round[:,1], 0.5)
# sorting data
ind = np.lexsort( (d_round[:,4], d_round[:,3]) )
d_sort = d_round[ind]
# making dataframes and grouping stuff
df_cols = ['x', 'y', 'z', 'x_round', 'y_round']
df = pd.DataFrame( d_sort)
df.columns = df_cols
df_round = df[['x_round', 'y_round', 'z']]
group_xy = df_round.groupby(['x_round', 'y_round'])
# calculating the mean, write to csv, which saves the file with:
# [x_round, y_round, z_mean] columns. You can exit Python and then start up
# later to clear memory if that's an issue.
group_mean = group_xy.mean()
group_mean.to_csv('your_binned_data.csv')
# Restarting...
import numpy as np
from scipy.interpolate import griddata
binned_data = np.loadtxt('your_binned_data.csv', skiprows=1, delimiter=',')
x_bins = binned_data[:,0]
y_bins = binned_data[:,1]
z_vals = binned_data[:,2]
pts = np.array( [x_bins, y_bins])
pts = pts.T
# make grid (with borders rounded to 0.5...)
xmax, xmin = 640000.5, 637000
ymax, ymin = 6070000.5, 6067000
grid_x, grid_y = np.mgrid[xmin:xmax:0.5, ymin:ymax:0.5]
# interpolate onto grid
data_grid = griddata(pts, z_vals, (grid_x, grid_y), method='cubic')
# save to ascii
np.savetxt('data_grid.txt', data_grid)
When I've done this, I have saved the output as a .npy and converted to a tiff with the Image library, and then georeferenced in ArcMap. There is probably a way to do that with osgeo but I haven't used it.
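For the GeoTIFF step, here is a rough, untested sketch of what the osgeo/GDAL route might look like; the EPSG code, the no-data value, and the array orientation are assumptions you would need to adapt to your own data:
import numpy as np
from osgeo import gdal, osr

# data_grid, xmin, ymax come from the interpolation step above
grid = data_grid.T[::-1, :]                     # assumption: rows = y (north up), cols = x
grid = np.where(np.isnan(grid), -9999.0, grid)  # griddata leaves NaN outside the convex hull
ny, nx = grid.shape

driver = gdal.GetDriverByName('GTiff')
ds = driver.Create('data_grid.tif', nx, ny, 1, gdal.GDT_Float32)
ds.SetGeoTransform((xmin, 0.5, 0.0, ymax, 0.0, -0.5))   # 0.5 m pixels, origin at upper left
srs = osr.SpatialReference()
srs.ImportFromEPSG(32755)                       # hypothetical CRS; use whatever your LiDAR is in
ds.SetProjection(srs.ExportToWkt())

band = ds.GetRasterBand(1)
band.WriteArray(grid.astype(np.float32))
band.SetNoDataValue(-9999.0)
ds.FlushCache()
ds = None                                       # closing the dataset writes the file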
Hope this helps someone at least...
You can use the histogram function in NumPy to do binning; for instance:
import numpy as np

points = np.random.random(1000)   # coordinates of the points
values = np.random.random(1000)   # value (e.g. z) at each point
# 10 bin edges from 0 to 1 (9 bins)
bins = np.linspace(0, 1, 10)
# per-bin mean = sum of values in bin / count of points in bin
means = (np.histogram(points, bins, weights=values)[0] /
         np.histogram(points, bins)[0])
Try LAStools, particularly lasgrid or las2dem.
