Use Python to extract and plot data from netCDF

I am new to using python for scientific data so apologies in advance if anything is unclear. I have a netCDF4 file with multiple variables including latitude, longitude and density. I am trying to plot the variable density on a matplotlib basemap using only density values from coordinates between 35-40 N and 100-110 W.
import numpy as np
import netCDF4 as nc
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
in: f = nc.Dataset('C:\\Users\\mdc\\data\\density.nc', 'r')
in: f.variables['latitude'].shape
out:(120000,)
(the variables longitude and density have the same shape)
I am stuck trying to find a way to extract only the latitude and longitude coordinate pairs (and their associated density values) that fit the criteria of [35 < lat < 40 & -110 < lon < -100]. Any advice on how to do this would be appreciated.
I have tried extracting each of the relevant variables and compiling them into a 2d-array but I have not figured out how to select only the data I need.
lats = f.variables['latitude'][:]
lons = f.variables['longitude'][:]
dens = f.variables['density'][:]
combined = np.vstack((lats,lons,dens))
in: combined
out: array([[ -4.14770737e+01, -3.89834557e+01, -3.86000137e+01, ...,
4.34283943e+01, 4.37634315e+01, 4.40338402e+01],
[ 1.75510895e+02, 1.74857147e+02, 1.74742798e+02, ...,
7.83558655e+01, 7.81687775e+01, 7.80410919e+01],
[ 7.79418945e-02, 7.38342285e-01, 9.94934082e-01, ...,
5.60119629e-01, -1.60522461e-02, 5.52429199e-01]], dtype=float32)
As for plotting I am trying to plot the coordinate pairs by different colors, rather than sizes, according to their density value.
m = Basemap(projection='robin', resolution='i', lat_0 = 37, lon_0 = -105)
m.drawcoastlines()
for lats,lons,dens in zip(lats,lons,dens):
    x,y = m(lats,lons)
    size = dens*3
    m.plot(x,y, 'r', markersize=size)
plt.show()

The data selection, using pandas (can't install netCDF here, sorry, and pandas is satisfactory):
import pandas as pd
tinyd = pd.DataFrame(np.array(
[[ -4.14770737e+01, -3.89834557e+01, -3.86000137e+01,
4.34283943e+01, 4.37634315e+01, 4.40338402e+01],
[ 1.75510895e+02, 1.74857147e+02, 1.74742798e+02,
7.83558655e+01, 7.81687775e+01, 7.80410919e+01],
[ 7.79418945e-02, 7.38342285e-01, 9.94934082e-01,
5.60119629e-01, -1.60522461e-02, 5.52429199e-01]]).T,
columns=['lat','lon','den'])
mask = (tinyd.lat > -39) & (tinyd.lat < 44) & \
(tinyd.lon > 80) & (tinyd.lon < 175)
toplot = tinyd[mask]
print(toplot)
lat lon den
1 -38.983456 174.857147 0.738342
2 -38.600014 174.742798 0.994934
plt.scatter(toplot.lat, toplot.lon, s=90, c=toplot.den)
plt.colorbar()
Plotting on top of a Basemap works the same way once the coordinates are converted with m(lon, lat), and you can specify a different colormap, etc.
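The same boolean-mask idea also works directly on the NumPy arrays read from the netCDF file, without pandas. The sketch below uses made-up values in place of the real `latitude`, `longitude` and `density` variables, and it assumes the longitudes use the -180 to 180 convention (if the file stores 0 to 360, the box would be 250 to 260 instead).

```python
import numpy as np

# Toy stand-ins for the three 1-D netCDF variables (values are invented).
lats = np.array([36.2, 41.0, 38.5, 35.5, -38.9], dtype=np.float32)
lons = np.array([-104.1, -105.0, -99.0, -109.5, 174.8], dtype=np.float32)
dens = np.array([0.08, 0.74, 0.99, 0.56, -0.02], dtype=np.float32)

# True where a point lies inside 35-40 N and 100-110 W.
in_box = (lats > 35) & (lats < 40) & (lons > -110) & (lons < -100)

# Boolean indexing keeps the three arrays aligned.
sel_lats, sel_lons, sel_dens = lats[in_box], lons[in_box], dens[in_box]
print(sel_lats, sel_lons, sel_dens)

# With a Basemap instance m, the selection could then be coloured by density:
#   m.scatter(sel_lons, sel_lats, c=sel_dens, latlon=True)
#   plt.colorbar()
```

Passing `c=` colours each point by its density value, which is what the question asks for instead of varying the marker size.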

Related

Python's Basemap doesn't align with correct coordinates?

I have a .dat file containing a list of coordinates (~100k) and a temperature at each coordinate. It has a structure like this:
-59.083 -26.583 0.2
-58.417 -26.250 0.6
-58.412 -26.417 0.4
...
To visually display the temperature ranges, I created a numpy array and plotted the datasets using the Basemap module for Python. The code I wrote is the following:
from matplotlib import pyplot as plt
from mpl_toolkits.basemap import Basemap
import numpy as np
m = Basemap(projection='mill',llcrnrlat=-90,urcrnrlat=90,\
llcrnrlon=-180,urcrnrlon=180,resolution='c')
m.drawcoastlines(linewidth=0.15)
data = np.loadtxt('gridly.dat')
xcoordlist = []
ycoordlist = []
tempvallist = []
for i in data:
    xcoord = i[0]
    ycoord = i[1]
    tempval = i[2]
    xcoord2 = xcoord*111139 #<--- Multiplying converts each coordinate's degrees to meters
    ycoord2 = ycoord*111139
    xcoordlist.append(xcoord2)
    ycoordlist.append(ycoord2)
    tempvallist.append(tempval)
xco = np.array(xcoordlist)
yco = np.array(ycoordlist)
tval = np.array(tempvallist)
gridsize = 100
m.hexbin(yco, xco, C=tval, gridsize=gridsize)
cb = m.colorbar()
plt.show()
When I plot the data, I'm getting almost exactly what I want, however, the hexagonal heatmap is offset for some reason, giving me the following chart:
I've been searching online for what might be wrong but unfortunately couldn't find answers or troubleshoot. Does anyone know how I can fix this issue?
After hours of digging around, I finally figured it out! What was wrong with my code was that I was trying to manually convert the geographic coordinates into point coordinates for the displaying chart (by multiplying by 111139).
While the logic for doing this makes sense, I believe this process broke down when I began to plot the data onto different kinds of charts (e.g. orthographic, Miller projection, etc.) because different projections have different point coordinates (kind of like how the pixel locations on your computer screen may not align with the pixel locations on a different computer screen).
Instead, the Basemap module has a built-in function that will convert real-world coordinates into coordinates that can be plotted on the chart, for you: m(x, y).
So, the improved and correct script would be:
from matplotlib import pyplot as plt
from mpl_toolkits.basemap import Basemap
import numpy as np
m = Basemap(projection='mill',llcrnrlat=-90,urcrnrlat=90,\
llcrnrlon=-180,urcrnrlon=180,resolution='c')
m.drawcoastlines(linewidth=0.15)
data = np.loadtxt('gridly.dat')
xcoordlist = []
ycoordlist = []
tempvallist = []
for i in data:
    lat = i[0]
    lon = i[1]
    tempval = i[2]
    xpt, ypt = m(lon, lat)
    xcoordlist.append(xpt)
    ycoordlist.append(ypt)
    tempvallist.append(tempval)
xco = np.array(xcoordlist)
yco = np.array(ycoordlist)
tval = np.array(tempvallist)
gridsize = 100
m.hexbin(xco, yco, C=tval, gridsize=gridsize)
cb = m.colorbar()
plt.show()
As you can see where it says xpt, ypt = m(lon, lat), the function converts the real-world longitudes (lon) and latitudes (lat) from the .dat file into plottable points. Hope this helps anyone else who may have this problem in the future!
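To see why a constant degrees-to-metres factor cannot reproduce what m(lon, lat) returns, here is a rough NumPy sketch of the Miller cylindrical formulas. It is not Basemap's code: the sphere radius `R` and the choice to put the origin at the lower-left map corner are assumptions made for illustration, and `miller_xy` is a hypothetical helper name.

```python
import numpy as np

R = 6370997.0  # assumed sphere radius in metres (illustrative value)

def _miller_raw(lon_deg, lat_deg):
    """Miller cylindrical projection centred on (0, 0)."""
    lam = np.radians(lon_deg)
    phi = np.radians(lat_deg)
    x = R * lam
    y = R * 1.25 * np.log(np.tan(np.pi / 4 + 0.4 * phi))
    return x, y

def miller_xy(lon_deg, lat_deg, lon_min=-180.0, lat_min=-90.0):
    """Map coordinates measured from the lower-left corner of the map."""
    x, y = _miller_raw(lon_deg, lat_deg)
    x0, y0 = _miller_raw(lon_min, lat_min)
    return x - x0, y - y0

# First record of the .dat file, read as (lat, lon) like the corrected script.
lat, lon = -59.083, -26.583

naive = (lon * 111139, lat * 111139)   # negative values, centred on (0, 0)
projected = miller_xy(lon, lat)        # positive values, origin at map corner

print("naive    :", naive)
print("projected:", projected)
```

Under these assumptions the naive values are negative while the projected ones are positive, and the y value is not a linear function of latitude, so no single multiplier can line the two up.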

interpolate.griddata shifts data northwards, is it a bug?

I observe unexpected results from scipy.interpolate.griddata. I am trying to visualize a set of irregularly spaced points using matplotlib.basemap and scipy.interpolate.griddata.
The data is given as three lists: latitudes, longitudes and values. To get them on the map I interpolate the data onto a regular grid and visualize it using Basemap's imshow function.
I observe that the interpolated data is shifted northwards from true positions.
Here is an example. Here I want to highlight a cell formed by two meridians and two parallels. I expect to get something like this:
However what I get is something like this:
You can see that the red rectangle is visibly shifted northwards.
I have tried to vary the grid resolution and the number of points, however this does not seem to have any effect on this observed shift.
Here is an IPython notebook that illustrates the issue.
Also below is the complete code:
import numpy as np
from numpy import random
from scipy import interpolate
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
# defining the region of interest
r = {'lon':[83.0, 95.5], 'lat':[48.5,55.5]}
# initializing Basemap
m = Basemap(projection='merc',
llcrnrlon=r['lon'][0],
llcrnrlat=r['lat'][0],
urcrnrlon=r['lon'][1],
urcrnrlat=r['lat'][1],
lon_0=r['lon'][0],
ellps='WGS84',
fix_aspect=True,
resolution='h')
# defining the highlighted block
block = {'lon':[89,91],'lat':[50.5,52.5]}
# generating the data
npixels = 100000
lat_range = r['lat'][1] - r['lat'][0]
lats = lat_range * random.random(npixels) + r['lat'][0]
lon_range = r['lon'][1] - r['lon'][0]
lons = lon_range * random.random(npixels) + r['lon'][0]
values = np.zeros(npixels)
for p in range(npixels):
    if block['lat'][0] < lats[p] < block['lat'][1] \
       and block['lon'][0] < lons[p] < block['lon'][1]:
        values[p] = 1.0
# plotting the original data without interpolation
plt.figure(figsize=(5, 5))
m.drawparallels(np.arange(r['lat'][0], r['lat'][1] + 0.25, 2.0),
labels=[True,False,True,False])
m.drawmeridians(np.arange(r['lon'][0], r['lon'][1] + 0.25, 2.0),
labels=[True,True,False,True])
m.scatter(lons,lats,c=values,latlon=True,edgecolors='none')
# interpolating on the regular grid
nx = ny = 500
mapx = np.linspace(r['lon'][0],r['lon'][1],nx)
mapy = np.linspace(r['lat'][0],r['lat'][1],ny)
mapgridx,mapgridy = np.meshgrid(mapx,mapy)
mapdata = interpolate.griddata(list(zip(lons,lats)),values,
(mapgridx,mapgridy),method='nearest')
# plotting the interpolated data
plt.figure(figsize=(5, 5))
m.drawparallels(np.arange(r['lat'][0], r['lat'][1] + 0.25, 2.0),
labels=[True,False,True,False])
m.drawmeridians(np.arange(r['lon'][0], r['lon'][1] + 0.25, 2.0),
labels=[True,True,False,True])
m.imshow(mapdata)
I am seeing this with SciPy 0.17.0
Pauli Virtanen answered the question on the SciPy bug tracker.
The issue goes away if one replaces basemap.imshow() with matplotlib.pyplot.pcolormesh().
Replacing the line above,
m.imshow(mapdata)
with
meshx,meshy = m(mapx,mapy)
plt.pcolormesh(meshx,meshy,mapdata)
produces a correctly aligned image.
It is not clear what I am doing wrong with basemap.imshow, but that is probably another question.
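One plausible explanation, which is an assumption and not something confirmed in the thread, is that imshow stretches the rows of the array evenly between the bottom and top of the map, while the grid here is evenly spaced in latitude. On a Mercator map the y coordinate is not linear in latitude, so the two disagree. The NumPy sketch below uses the standard spherical Mercator formula to check where the middle latitude of the region lands.

```python
import numpy as np

def mercator_y(lat_deg):
    """Spherical Mercator y coordinate (unit sphere)."""
    phi = np.radians(lat_deg)
    return np.log(np.tan(np.pi / 4 + phi / 2))

lat_south, lat_north = 48.5, 55.5          # region from the question
lat_mid = 0.5 * (lat_south + lat_north)    # 52.0, the middle grid row

y_s, y_m, y_n = mercator_y(np.array([lat_south, lat_mid, lat_north]))

# Fraction of the map height at which latitude 52 really sits.
frac = (y_m - y_s) / (y_n - y_s)
print(frac)
```

The fraction comes out below 0.5, whereas an evenly stretched image would draw the middle row at exactly half height, i.e. slightly too far north. pcolormesh avoids this because it is given the projected coordinates of every grid line.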

Plotting netCDF data with Python: How to change grid?

I'm new to Python and to plotting data with Matplotlib. I really need help, and thank you in advance for the answers.
So, I have a netCDF file with v-component of wind data. Grid coordinates: points=9600 (240x40)
lon : 0 to 358.5 by 1.5 degrees_east circular
lat : 88.5 to 30 by -1.5 degrees_north
My code is:
import numpy as np
import matplotlib
matplotlib.use('Agg')
from netCDF4 import Dataset
from matplotlib.mlab import griddata
from matplotlib import pyplot as plt
from mpl_toolkits.basemap import Basemap
#read data from NETcdf file ".nc"
my_file = '/home/Era-Interim/NH-EraInt-1979.nc'
fh = Dataset(my_file, mode='r')
lons = fh.variables['lon'][:]
lats = fh.variables['lat'][:]
V = fh.variables['V'][:]
V_units = fh.variables['V'].units
fh.close()
# create figure
fig = plt.figure(figsize=(20,20))
# create a map
m = Basemap(projection='nplaea',boundinglat=30,lon_0=10,resolution='l',round=True)
#draw parallels, meridians, coastlines, countries, mapboundary
m.drawcoastlines(linewidth=0.5)
m.drawcountries(linewidth=0.5)
#m.drawmapboundary(linewidth=2)
m.drawparallels(np.arange(30,90,20), labels=[1,1,0,0]) #parallels every 20 degrees, labelled left and right
m.drawmeridians(np.arange(0,360,30), labels=[1,1,1,1]) #meridians every 30 degrees, labelled on all sides
#Plot the data on top of the map
lon,lat = np.meshgrid(lons,lats)
x,y = m(lon,lat)
cs = m.pcolor(x,y,np.squeeze(V),cmap=plt.cm.RdBu_r)
plt.title("", fontsize=25, verticalalignment='baseline')
plt.savefig("/home/Era-Interim/1.png")
As a result, I received a map (you can find in my dropbox folder) https://www.dropbox.com/sh/nvy8wcodk9jtat0/AAC-omkPP8_7uINSSXbzImeja?dl=0
On the map, there are white pixels between 358.5 and 0 (360) lon, because I have no data between 358.5 and 0 (360) lon.
The question is: how can I change the size of the grid, regrid it, interpolate data, or something else in order to not have this white sector?
I have found a solution. At the beginning of the script, you must add
from mpl_toolkits.basemap import Basemap, addcyclic
and further
# Q is the data field read from the file (presumably V in the script above)
datain, lonsin = addcyclic(np.squeeze(Q), lons)
lons, datain = m.shiftdata(lonsin, datain=datain, lon_0=180.)
print(lons)
lon, lat = np.meshgrid(lons, lats)
x,y = m(lon, lat)
cs = m.pcolor(x,y,datain,cmap=plt.cm.RdBu_r)
The difference can be seen in the figures (I still can not post images).
https://www.dropbox.com/sh/nvy8wcodk9jtat0/AAC-omkPP8_7uINSSXbzImeja?dl=0
I think some kind of interpolation technique can be applied in this case.
Check this out; there was a similar problem.
Hope it is useful.
The simple answer is 360 degrees is 0 degrees, so you can copy the 0 degrees data and it should look right. I may be interpreting this wrong though, as I believe that the data is representing the pressure levels at each of the points, not between the two points (i.e. at zero degrees, not between zero degrees and 1.5 degrees).
My interpretation means that, yes, you don't have data between 358.5 and 0, but you also don't have data between 357 and 358.5. This seems more likely than just skipping an area. This would mean that the data from 358.5 should be touching the data from 0 as it is just as far away as 0 is from 1.5 which is touching.
Copying the last bit would grant you the ability to change your m.pcolor call to an imshow call (as in Roman Dryndik's link) and use interpolation to smooth out the graph.
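The "copy the 0 degrees data" idea can be sketched with plain NumPy. The grid sizes below match the question (240 longitudes, 40 latitudes), but the field itself is random stand-in data, not the ERA-Interim values.

```python
import numpy as np

lons = np.arange(240) * 1.5              # 0, 1.5, ..., 358.5
lats = np.linspace(88.5, 30.0, 40)       # 88.5, 87, ..., 30
field = np.random.default_rng(0).normal(size=(lats.size, lons.size))

# Append the first longitude column again at 360 degrees to close the gap.
field_cyc = np.concatenate([field, field[:, :1]], axis=1)
lons_cyc = np.append(lons, lons[0] + 360.0)

print(field_cyc.shape, lons_cyc[-1])
```

The extended arrays can then go through np.meshgrid and m(lon, lat) as before; the last column now coincides with the first one on the map, so no unfilled sector remains.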

Weird plot for netCDF data with matplotlib

I've been searching for a pathetically long time for this, so I would appreciate any help or hint I can get.
I'm trying to plot some sea ice freeboard data (netCDF, Gridded total freeboard) on the Antarctic sea, but the data that should plot nicely around Antarctica lies at the bottom of my image. NetCDF and matplotlib are fairly new to me so maybe the error could be e.g. with handling the dimensions or the projection.
from scipy.io.netcdf import netcdf_file as Dataset
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
FB = Dataset('./datasets/fb-0217-0320.nc', 'r')
f = FB.variables['f'][:,:]
lat = FB.variables['lat'][:,0]
lon = FB.variables['lon'][0,:]
masked_fb = np.ma.masked_where(np.isnan(f), f)
mtx_lon, mtx_lat = np.meshgrid(lon, lat)
m = Basemap(projection='spstere',boundinglat=-50, lon_0=180., resolution='l')
m.bluemarble()
plt.figure()
m.pcolormesh(mtx_lon, mtx_lat, masked_fb, latlon=True)
plt.show()
ncdump gives:
dimensions:
    x = 79 ;
    y = 83 ;
variables:
    float lat(y, x) ;
        lat:standard_name = "latitude" ;
        lat:long_name = "latitude coordinate" ;
        lat:units = "degrees_north" ;
    float lon(y, x) ;
        lon:standard_name = "longitude" ;
        lon:long_name = "longitude coordinate" ;
        lon:units = "degrees_east" ;
    float f(y, x) ;
        f:long_name = "total_freeboard" ;
        f:units = "mm" ;
        f:coordinates = "lat lon" ;
One weird thing I noticed is that min lat is -5156.6201 but I didn't know how to count how many of them there are...
Edit: Formatted the code to fit the common way, like Neil advised.
Okay, I got help from the matplotlib community and thought I should share it here in case someone else runs into a similar problem. The problem was with meshgrid. Since the latitudes and longitudes in the netCDF file were already 2D, the meshgrid was unnecessary. The solution that worked for me was:
from scipy.io.netcdf import netcdf_file as Dataset
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
FB = Dataset('./datasets/fb-0217-0320.nc', 'r')
f = FB.variables['f'][:,:]
lat = FB.variables['lat'][:,:]
lon = FB.variables['lon'][:,:]
masked_fb = np.ma.masked_where(np.isnan(f), f)
m = Basemap(projection='spstere',boundinglat=-50, lon_0=180., resolution='l')
m.bluemarble()
plt.figure()
m.pcolormesh(lon, lat, masked_fb, latlon=True)
plt.show()
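Why the meshgrid step breaks things can be shown on a small synthetic grid. The grid below is invented for illustration (a regular x/y grid around the pole with a crude kilometre-to-degree conversion); it only mimics the situation where lat and lon are stored as 2D arrays that vary along both dimensions.

```python
import numpy as np

# Hypothetical regular grid in kilometres, centred on the pole.
x = np.linspace(-2000.0, 2000.0, 5)
y = np.linspace(-2000.0, 2000.0, 4)
xx, yy = np.meshgrid(x, y)

# True 2D coordinates: latitude depends on distance from the pole,
# longitude on the angle, so both change along x AND y.
lat2d = -90.0 + np.hypot(xx, yy) / 111.0   # rough km per degree, illustration only
lon2d = np.degrees(np.arctan2(xx, yy))

# What the question's code did: rebuild 2D arrays from one column and one row.
mtx_lon, mtx_lat = np.meshgrid(lon2d[0, :], lat2d[:, 0])

print(np.allclose(mtx_lat, lat2d), np.allclose(mtx_lon, lon2d))
```

Both comparisons come out False for this grid: the rebuilt arrays have the right shape but the wrong values, which is consistent with the data being drawn in the wrong place.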
First, it's common practice to read in the netcdf module as
from scipy.io.netcdf import netcdf_file as Dataset
You can then read in the file and access variables as
FB = Dataset('./datasets/fb-0217-0320.nc', 'r')
f = FB.variables['f'][:,:]
lat = FB.variables['lat'][:,:]
lon = FB.variables['lon'][:,:]
Are you sure that lat[:,0] and lon[0,:] are reading in the grid coordinates correctly? ncdump indicates they are 2D variables, and I suspect that the issue is creating a meshgrid from lat[:,0] and lon[0,:].
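On the out-of-range minimum latitude mentioned in the question: counting such values is a one-liner with a boolean mask. The array below is a made-up example, and treating -5156.6201 as a fill value for cells without coordinates is an assumption, not something the file metadata shown above confirms.

```python
import numpy as np

# Invented 2D latitude array containing two suspicious entries.
lat = np.array([[-60.0, -61.5, -5156.6201],
                [-62.0, -5156.6201, -63.5]], dtype=np.float32)

# Anything outside [-90, 90] cannot be a real latitude.
bad = (lat < -90.0) | (lat > 90.0)
n_bad = int(np.count_nonzero(bad))

# Mask them so they are ignored in statistics and plots.
lat_masked = np.ma.masked_where(bad, lat)

print(n_bad, lat_masked.min(), lat_masked.max())
```

The same `bad` mask could also be combined with the NaN mask on f before plotting, so that cells without valid coordinates are skipped.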

How to separate two regions of latlon data on python

I am currently working with BUFR files with wind data. When I read this file on python I get 4 large vectors, latitude vector, longitude vector, wind_direction vector, and wind_speed vector.
Both wind vectors are masked python arrays because there is non-valid data. This happens because the data comes from a non-geostationary satellite. In fact I successfully generated the following image from this BUFR file to show you the general shape that the data takes.
In this image I have plotted a color field to represent the wind speed, while the arrows obviously represent the wind direction.
Please notice the two bands of actual data. Unfortunately the way I am plotting the data generates a third band (where the color field is smooth) in between the actual data bands. This is an artefact of the function pcolormesh. If I could superimpose two pcolormesh plots, each one representing one of the bands, this problem would disappear.
Unfortunately, I do not know how I could separate the data "regions". I have thought about clustering techniques but do not know how to cluster along latlon data using ANOTHER array (the wind data) as the clustering rule.
This is my current code:
#!/usr/bin/python
import bufr
import numpy as np
import sys
import matplotlib
matplotlib.use('Agg')
from matplotlib import pyplot as plt
from matplotlib import mlab
WIND_DIR_INDEX = 97
WIND_SPEED_INDEX = 96
bfrfile = sys.argv[1]
print(bfrfile)
bfr = bufr.BUFRFile(bfrfile)
lon = []
lat = []
wind_d = []
wind_s = []
for record in bfr:
    for entry in record:
        if entry.index == WIND_DIR_INDEX:
            wind_d.append(entry.data)
        if entry.index == WIND_SPEED_INDEX:
            wind_s.append(entry.data)
        if entry.name.find("LONGITUDE") == 0:
            lon.append(entry.data)
        if entry.name.find("LATITUDE") == 0:
            lat.append(entry.data)
lons = np.concatenate(lon)
lats = np.concatenate(lat)
winds_d = np.concatenate(wind_d)
winds_s = np.concatenate(wind_s)
winds_d = np.ma.masked_greater(winds_d,1.0e+6)
winds_s = np.ma.masked_greater(winds_s,1.0e+6)
windu = np.cos((winds_d-180)*(np.pi/180))
windv = np.sin((winds_d-180)*(np.pi/180))
# Data interpolation for pcolormesh (needs gridded data)
xi = np.linspace(lons.min(),lons.max(),lons.size//10)  # integer division: the sample count must be an int
yi = np.linspace(lats.min(),lats.max(),lats.size//10)
Z = mlab.griddata(lons,lats,winds_s,xi,yi)
X,Y = np.meshgrid(xi,yi)
mydpi = 96
fig = plt.figure(frameon=True)
fig.set_size_inches(1600/mydpi,1200/mydpi)
ax = plt.Axes(fig,[0,0,1,1])
#ax.set_axis_off()
fig.add_axes(ax)
plt.hold(True);
plt.quiver(lons[::5],lats[::5],windu[::5],windv[::5],linewidths=0)
for method in (ax.set_xticks,ax.set_xticklabels,ax.set_yticks,ax.set_yticklabels):
    method([])
fig.savefig('/home/cendas/bin/python/bufr_ascat.png',bbox_inches=0,dpi=5*mydpi)
mydpi = 96
fig = plt.figure(frameon=True)
fig.set_size_inches(1600/mydpi,1200/mydpi)
ax = plt.Axes(fig,[0,0,1,1])
#ax.set_axis_off()
fig.add_axes(ax)
plt.hold(True);
try:
    plt.pcolormesh(X,Y,Z,alpha=None)
    plt.clim(0,10)
except ValueError:
    print("Warning: Empty data array.")
for method in (ax.set_xticks,ax.set_xticklabels,ax.set_yticks,ax.set_yticklabels):
    method([])
fig.savefig('/home/cendas/bin/python/bufr_ascat_color.png',bbox_inches=0,dpi=5*mydpi)
I then usually follow this python code with the following terminal commands to combine the images:
convert bufr_ascat.png -transparent white bufr_ascat.png
convert bufr_ascat_color.png -transparent white bufr_ascat_color.png
composite bufr_ascat.png bufr_ascat_color.png bufrascat.png
Don't abuse clustering for this.
What you need is a simple selection / filtering; not a structure discovery process.
Choose the mean of the masked data. All non-masked data left of that mean is the left part, all non-masked data on the right is the other?
Clustering is the wrong tool for this task.
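A minimal sketch of that selection, under two assumptions that the thread does not confirm: the two bands are separated roughly along longitude, and the masked samples sit mostly in the gap between them, so their mean longitude falls inside the gap. `split_swaths` is a hypothetical helper name and the data are synthetic.

```python
import numpy as np

def split_swaths(lons, values):
    """Split valid samples into two groups on either side of the
    mean longitude of the masked (invalid) samples."""
    invalid = np.ma.getmaskarray(values)
    split_lon = lons[invalid].mean()
    left = ~invalid & (lons < split_lon)
    right = ~invalid & (lons >= split_lon)
    return split_lon, left, right

# Synthetic example: two valid bands with a masked gap in between.
lons = np.array([10., 11., 12., 14., 15., 16., 18., 19., 20.])
raw = np.array([3., 4., 5., 1e10, 1e10, 1e10, 6., 7., 8.])
speed = np.ma.masked_greater(raw, 1.0e6)   # same masking rule as the question

split_lon, left, right = split_swaths(lons, speed)
print(split_lon, lons[left], lons[right])
```

Each boolean mask can then select the points of one band, which can be gridded and passed to its own pcolormesh call so that nothing is interpolated across the gap. If the bands are tilted relative to the meridians, a single longitude threshold will not separate them and a different splitting rule would be needed.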
