I am representing some geographical data in a Jupyter notebook: temperature, ocean wave height, etc. I have numpy arrays containing the latitude, longitude, and value for those variables. I would like to display these variables over a geographical map, preferably using ipyleaflet (because that is what I am already using). I am trying to get a result similar to a heatmap.
I tried to use the ipyleaflet Heatmap, but it seems to be designed to represent aggregations of points rather than uniform scalar arrays, because I can't get it to show the results properly. I think ipyleaflet may lack a function to represent this kind of data, but that seems odd, since it has a very nice Velocity function for representing vector variables.
The only way I can think of would be to generate an image with matplotlib and then add it to the map as an image layer, but I feel like that is not the proper way to do it.
For representing a heatmap I would recommend using Cartopy in combination with Matplotlib.
Here is a ready-to-use script I made for a world projection with a coastline:
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from cartopy.util import add_cyclic_point

# Set x, y, z variables (placeholders for your own arrays)
x = longitude_data
y = latitude_data
z = heat_map_data

# Set up figure and projection
z, x = add_cyclic_point(z, coord=x)  # avoid a gap at the dateline
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection=ccrs.PlateCarree())

# Set data range and colourmap (zmin, zmax, zstep are placeholders for your data range and contour step)
levels = np.arange(zmin, zmax, zstep)
cf = ax.contourf(x, y, z, levels=levels, transform=ccrs.PlateCarree(), cmap="rainbow")

# Set axes, extent (world) and labels
ax.set_xticks(np.linspace(-180, 180, num=7), crs=ccrs.PlateCarree())
ax.set_yticks(np.linspace(-60, 60, num=5), crs=ccrs.PlateCarree())
ax.add_feature(cfeature.COASTLINE)  # add coastline
ax.set_global()
ax.set_title('Heatmap')
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')

# Add colorbar
fig.colorbar(cf, ax=ax, shrink=0.7, orientation="vertical")
plt.show()
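If you just want to test-drive the script, here is a minimal sketch of synthetic inputs matching the placeholder names above (purely hypothetical data; swap in your real arrays):

# hypothetical test data for the script above
longitude_data = np.linspace(-180, 175, 72)   # 5-degree global grid
latitude_data = np.linspace(-90, 90, 37)
heat_map_data = np.random.rand(37, 72)        # shape (lat, lon)
zmin, zmax, zstep = 0.0, 1.05, 0.05           # contour range used for levels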
With the Cartopy and Matplotlib documentation you should now be able to create some maps.
As you say, you can generate an image and overlay it onto the map. This is the suggestion I was given when I asked about this on GitHub.
There's an example notebook here.
Not quite as easy as matplotlib, but you get all the nice interactivity of ipyleaflet!
Here's the current version of the notebook in case the link changes (converted to Markdown via jupyter-nbconvert --to markdown Numpy.ipynb):
From NumPy to Leaflet
This notebook shows how to display some raster geographic data in IPyLeaflet. The data is a NumPy array, which means that you have all the power of the Python scientific stack at your disposal to process it.
The following libraries are needed:
* requests
* tqdm
* rasterio
* numpy
* scipy
* pillow
* matplotlib
* ipyleaflet
The recommended way is to try to conda install them first, and to pip install any that are not found.
import requests
import os
from tqdm import tqdm
import zipfile
import rasterio
from affine import Affine
import numpy as np
import scipy.ndimage
from rasterio.warp import reproject, Resampling
import PIL
import matplotlib.pyplot as plt
from base64 import b64encode
try:
    from StringIO import StringIO  # Python 2
    py3 = False
except ImportError:
    from io import StringIO, BytesIO  # Python 3
    py3 = True
from ipyleaflet import Map, ImageOverlay, basemap_to_tiles, basemaps
Download a raster file representing the flow accumulation for South America. This gives an idea of the river network.
url = 'https://edcintl.cr.usgs.gov/downloads/sciweb1/shared/hydrosheds/sa_30s_zip_grid/sa_acc_30s_grid.zip'
filename = os.path.basename(url)
name = filename[:filename.find('_grid')]
adffile = name + '/' + name + '/w001001.adf'
if not os.path.exists(adffile):
    r = requests.get(url, stream=True)
    with open(filename, 'wb') as f:
        total_length = int(r.headers.get('content-length'))
        for chunk in tqdm(r.iter_content(chunk_size=1024), total=(total_length / 1024) + 1):
            if chunk:
                f.write(chunk)
                f.flush()
    zip = zipfile.ZipFile(filename)
    zip.extractall('.')
We transform the data a bit so that rivers appear thicker.
dataset = rasterio.open(adffile)
acc_orig = dataset.read()[0]
acc = np.where(acc_orig<0, 0, acc_orig)
shrink = 1 # if you are out of RAM try increasing this number (should be a power of 2)
radius = 5 # you can play with this number to change the width of the rivers
circle = np.zeros((2*radius+1, 2*radius+1)).astype('uint8')
y, x = np.ogrid[-radius:radius+1,-radius:radius+1]
index = x**2 + y**2 <= radius**2
circle[index] = 1
acc = np.sqrt(acc)
acc = scipy.ndimage.maximum_filter(acc, footprint=circle)
acc[acc_orig<0] = np.nan
acc = acc[::shrink, ::shrink]
The original data is in the WGS 84 projection, but Leaflet uses Web Mercator, so we need to reproject.
# At this point if GDAL complains about not being able to open EPSG support file gcs.csv, try in the terminal:
# export GDAL_DATA=`gdal-config --datadir`
with rasterio.Env():
    rows, cols = acc.shape
    src_transform = list(dataset.transform)
    src_transform[0] *= shrink
    src_transform[4] *= shrink
    src_transform = Affine(*src_transform[:6])
    src_crs = {'init': 'EPSG:4326'}
    source = acc
    dst_crs = {'init': 'EPSG:3857'}
    dst_transform, width, height = rasterio.warp.calculate_default_transform(
        src_crs, dst_crs, cols, rows, *dataset.bounds)
    dst_shape = height, width
    destination = np.zeros(dst_shape)
    reproject(
        source,
        destination,
        src_transform=src_transform,
        src_crs=src_crs,
        dst_transform=dst_transform,
        dst_crs=dst_crs,
        resampling=Resampling.nearest)
    acc_web = destination
Let's convert our NumPy array to an image. For that we must specify a colormap (here plt.cm.jet).
acc_norm = acc_web - np.nanmin(acc_web)
acc_norm = acc_norm / np.nanmax(acc_norm)
acc_norm = np.where(np.isfinite(acc_web), acc_norm, 0)
acc_im = PIL.Image.fromarray(np.uint8(plt.cm.jet(acc_norm)*255))
acc_mask = np.where(np.isfinite(acc_web), 255, 0)
mask = PIL.Image.fromarray(np.uint8(acc_mask), mode='L')
im = PIL.Image.new('RGBA', acc_norm.shape[::-1], color=None)
im.paste(acc_im, mask=mask)
The image is embedded in the URL as a PNG file, so that it can be sent to the browser.
if py3:
    f = BytesIO()
else:
    f = StringIO()
im.save(f, 'png')
data = b64encode(f.getvalue())
if py3:
    data = data.decode('ascii')
imgurl = 'data:image/png;base64,' + data
Finally we can overlay our image and if everything went fine it should be exactly over South America.
b = dataset.bounds
bounds = [(b.bottom, b.left), (b.top, b.right)]
io = ImageOverlay(url=imgurl, bounds=bounds)
center = [-10, -60]
zoom = 2
m = Map(center=center, zoom=zoom, interpolation='nearest')
m
tile = basemap_to_tiles(basemaps.Esri.WorldStreetMap)
m.add_layer(tile)
You can play with the opacity slider and check that rivers from our data file match the rivers on OpenStreetMap.
m.add_layer(io)
io.interact(opacity=(0.0,1.0,0.01))
I am facing a problem while using shading for pcolormesh in a contour-fill plot. As soon as I set shading='gouraud' I get this error: TypeError: Dimensions of C (73, 144) are incompatible with X (145) and/or Y (74); see help(pcolormesh). If anyone can help me in this regard it will be much appreciated. I am also posting the code I am using.
import os
os.environ["PROJ_LIB"] = "C:\\Utilities\\Python\\Anaconda\\Library\\share"  # fix PROJ path
import numpy as np
import xarray as xr
import proplot as plot
import matplotlib.pyplot as plt
import pandas as pd

# --- read netcdf file
dset = xr.open_dataset(r'E:\DATA_SETS\OLR_NCEP_REANALYSIS\olr.daily.1974.2020.nc')

# --- select an area and time (optional)
#dset = dset.sel(lat=slice(15, -60), lon=slice(270, 330))

plot.rc.reso = 'lo'

# --- plotting
f, ax = plot.subplots(ncols=1, figsize=[6.4, 5.0], tight=True,
                      proj='cyl', proj_kw={'lon_0': 0})
# format options
ax.format(land=True, landcolor='mushroom', coast=True, innerborders=False, borders=False,
          labels=True,
          latlim=(0, 30), lonlim=(50, 100), linewidth=1,
          gridlinewidth=0, latlines=5, lonlines=10,
          abc=True, abcloc='ll', abcstyle='(a)')
levels = list(np.arange(120, 260, 20))
map = ax.pcolormesh(dset['lon'], dset['lat'], dset['olr'][16202, :, :], shading='gouraud',
                    cmap='Blues_r', levels=levels, vmin=np.inf, vmax=np.inf, extend='neither')
f.colorbar(map, length=0.6, loc='b', extendrect=True)
It seems like the dimensions of your coordinates and the 2D variable do not match: with shading='gouraud', C must have the same shape as X and Y, but your coordinates have one extra point in each direction (145 and 74 versus 144 and 73).
The simplest solution is to use the central coordinates of x and y:
x = dset['lon'].values
y = dset['lat'].values
xnew = 0.5 * (x[:-1] + x[1:])
ynew = 0.5 * (y[:-1] + y[1:])
pcolormesh(xnew, ynew, ...)
I have a .dat file containing a list of coordinates (~100k) and a temperature at each coordinate. It has a structure like this:
-59.083 -26.583 0.2
-58.417 -26.250 0.6
-58.412 -26.417 0.4
...
To visually display the temperature ranges, I created a numpy array and plotted the datasets using the Basemap module for Python. The code I wrote is the following:
from matplotlib import pyplot as plt
from mpl_toolkits.basemap import Basemap
import numpy as np
m = Basemap(projection='mill', llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180, resolution='c')
m.drawcoastlines(linewidth=0.15)
data = np.loadtxt('gridly.dat')
xcoordlist = []
ycoordlist = []
tempvallist = []
for i in data:
    xcoord = i[0]
    ycoord = i[1]
    tempval = i[2]
    xcoord2 = xcoord * 111139  # multiplying converts each coordinate's degrees to metres
    ycoord2 = ycoord * 111139
    xcoordlist.append(xcoord2)
    ycoordlist.append(ycoord2)
    tempvallist.append(tempval)
xco = np.array(xcoordlist)
yco = np.array(ycoordlist)
tval = np.array(tempvallist)
gridsize = 100
m.hexbin(yco, xco, C=tval, gridsize=gridsize)
cb = m.colorbar()
plt.show()
When I plot the data, I'm getting almost exactly what I want, however, the hexagonal heatmap is offset for some reason, giving me the following chart:
I've been searching online for what might be wrong but unfortunately couldn't find answers or troubleshoot. Does anyone know how I can fix this issue?
After hours of digging around, I finally figured it out! What was wrong with my code was that I was trying to manually convert the geographic coordinates into point coordinates for the displayed chart (by multiplying by 111139).
While the logic for doing this makes sense, I believe this process broke down when I began to plot the data onto different kinds of charts (e.g. orthographic, Miller projection, etc.), because different projections/charts have different point coordinates (kind of like how the pixel locations on your computer screen may not align with the pixel locations on a different computer screen).
Instead, the Basemap module has a built-in function that will convert real-world coordinates into coordinates that can be plotted on the chart for you: m(x, y).
So the improved and correct script would be:
from matplotlib import pyplot as plt
from mpl_toolkits.basemap import Basemap
import numpy as np
m = Basemap(projection='mill', llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180, resolution='c')
m.drawcoastlines(linewidth=0.15)
data = np.loadtxt('gridly.dat')
xcoordlist = []
ycoordlist = []
tempvallist = []
for i in data:
    lat = i[0]
    lon = i[1]
    tempval = i[2]
    xpt, ypt = m(lon, lat)  # project (lon, lat) into map coordinates
    xcoordlist.append(xpt)
    ycoordlist.append(ypt)
    tempvallist.append(tempval)
xco = np.array(xcoordlist)
yco = np.array(ycoordlist)
tval = np.array(tempvallist)
gridsize = 100
m.hexbin(xco, yco, C=tval, gridsize=gridsize)
cb = m.colorbar()
plt.show()
As you can see where it says xpt, ypt = m(lon, lat), the function converts the real-world longitudes (lon) and latitudes (lat) from the .dat file into plottable points. Hope this helps anyone else that may have this problem in the future!
I've been using matplotlib.pyplot to plot a spectrum from FITS files in Python, getting intensity versus pixel, but what I actually need is to convert the pixels to wavelength. I've seen similar questions that got me on the right path (e.g. similar question, RGB example) but I still feel lost in the process.
I have FITS files with wavelengths between about 3500 and 6000 Å, in float32 format and with dimensions (53165,).
So as I understand it, I need to calibrate the pixel positions to wavelength. I have my rest wavelength header (RESW) and my "step" wavelength header (STW), and I would need to get:
x = RESW + (number of pixels * STW)
and then plot it. Here is what I have in my code so far.
import os, glob
from glob import glob
from pylab import *
from astropy.io import ascii
import scipy.constants as constants
import matplotlib.pylab as plt
from astropy.io import fits

# add the directory you have your files in
dir = ''

# open all files and store them in a list
files = glob(dir + '*.fits')
for fi in files:
    print(fi)
    name = fi[:-len('.fits')]  # remove '.fits' from the file name
    with fits.open(dir + fi) as hdu:
        hdu.info()
        data = hdu[0].data
        hdr = hdu[0].header        # added to try2
        step = hdr['CDELT1']       # added to try2
        restw = hdr['CRVAL1']      # added to try2
        #step = fits.getheader('STW')    # added to try
        #restw = fits.getheader('RESW')  # added to try
        spectra = restw + (data * step)  # added to try
    plt.clf()
    plt.plot(spectra)
    plt.savefig(name + '.pdf')
I've tried using fits.getheader('') but I don't know where or how to use it, because this approach is not working right.
Could someone please help? Thanks in advance!
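For what it's worth, a minimal sketch of the calibration described above, assuming the standard CRVAL1/CDELT1 keywords, a hypothetical file name, and that the reference pixel is the first one; note that it is the pixel index (np.arange), not the flux value, that multiplies the step:

import numpy as np
import matplotlib.pyplot as plt
from astropy.io import fits

with fits.open('spectrum.fits') as hdu:   # hypothetical file name
    flux = hdu[0].data                    # e.g. shape (53165,)
    hdr = hdu[0].header
# wavelength axis: reference wavelength + pixel index * step
wavelength = hdr['CRVAL1'] + np.arange(flux.size) * hdr['CDELT1']
plt.plot(wavelength, flux)
plt.xlabel('Wavelength (Å)')
plt.ylabel('Intensity')
plt.show()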
I am currently working with BUFR files with wind data. When I read this file on python I get 4 large vectors, latitude vector, longitude vector, wind_direction vector, and wind_speed vector.
Both wind vectors are masked python arrays because there is non-valid data. This happens because the data comes from a non-geostationary satellite. In fact I successfully generated the following image from this BUFR file to show you the general shape that the data takes.
In this image I have plotted a color field to represent the wind speed, while the arrows obviously represent the wind direction.
Please notice the two bands of actual data. Unfortunately, the way I am plotting the data generates a third band (where the colour field is smooth) in between the actual data bands. This is an artefact of the pcolormesh function. If I could superimpose two pcolormesh plots, each one representing one of the bands, this problem would disappear.
Unfortunately, I do not know how I could separate the data "regions". I have thought about clustering techniques, but I do not know how to cluster the lat/lon data using ANOTHER array (the wind data) as the clustering rule.
This is my current code:
#!/usr/bin/python

import bufr
import numpy as np
import sys
import matplotlib
matplotlib.use('Agg')
from matplotlib import pyplot as plt
from matplotlib import mlab

WIND_DIR_INDEX = 97
WIND_SPEED_INDEX = 96

bfrfile = sys.argv[1]
print bfrfile
bfr = bufr.BUFRFile(bfrfile)

lon = []
lat = []
wind_d = []
wind_s = []
for record in bfr:
    for entry in record:
        if entry.index == WIND_DIR_INDEX:
            wind_d.append(entry.data)
        if entry.index == WIND_SPEED_INDEX:
            wind_s.append(entry.data)
        if entry.name.find("LONGITUDE") == 0:
            lon.append(entry.data)
        if entry.name.find("LATITUDE") == 0:
            lat.append(entry.data)
lons = np.concatenate(lon)
lats = np.concatenate(lat)
winds_d = np.concatenate(wind_d)
winds_s = np.concatenate(wind_s)
winds_d = np.ma.masked_greater(winds_d, 1.0e+6)
winds_s = np.ma.masked_greater(winds_s, 1.0e+6)
windu = np.cos((winds_d - 180) * (np.pi / 180))
windv = np.sin((winds_d - 180) * (np.pi / 180))

# Data interpolation for pcolormesh (needs gridded data)
xi = np.linspace(lons.min(), lons.max(), lons.size / 10)
yi = np.linspace(lats.min(), lats.max(), lats.size / 10)
Z = mlab.griddata(lons, lats, winds_s, xi, yi)
X, Y = np.meshgrid(xi, yi)

mydpi = 96
fig = plt.figure(frameon=True)
fig.set_size_inches(1600 / mydpi, 1200 / mydpi)
ax = plt.Axes(fig, [0, 0, 1, 1])
#ax.set_axis_off()
fig.add_axes(ax)
plt.hold(True)
plt.quiver(lons[::5], lats[::5], windu[::5], windv[::5], linewidths=0)
for method in (ax.set_xticks, ax.set_xticklabels, ax.set_yticks, ax.set_yticklabels):
    method([])
fig.savefig('/home/cendas/bin/python/bufr_ascat.png', bbox_inches=0, dpi=5*mydpi)

mydpi = 96
fig = plt.figure(frameon=True)
fig.set_size_inches(1600 / mydpi, 1200 / mydpi)
ax = plt.Axes(fig, [0, 0, 1, 1])
#ax.set_axis_off()
fig.add_axes(ax)
plt.hold(True)
try:
    plt.pcolormesh(X, Y, Z, alpha=None)
    plt.clim(0, 10)
except ValueError:
    print "Warning: Empty data array."
for method in (ax.set_xticks, ax.set_xticklabels, ax.set_yticks, ax.set_yticklabels):
    method([])
fig.savefig('/home/cendas/bin/python/bufr_ascat_color.png', bbox_inches=0, dpi=5*mydpi)
I then usually follow this Python code with the following terminal commands to combine the images:
convert bufr_ascat.png -transparent white bufr_ascat.png
convert bufr_ascat_color.png -transparent white bufr_ascat_color.png
composite bufr_ascat.png bufr_ascat_color.png bufrascat.png
Don't abuse clustering for this.
What you need is a simple selection/filter, not a structure-discovery process.
Take the mean coordinate of the valid (non-masked) data: all non-masked data on one side of that mean belongs to one band, and all non-masked data on the other side belongs to the other.
Clustering is the wrong tool for this task.
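A minimal sketch of that selection, reusing the question's variables and assuming the two bands separate along latitude (if they separate left/right, split on lons instead):

# split the valid data into two bands around the mean latitude
valid = ~winds_s.mask              # True where wind speed is not masked
mean_lat = lats[valid].mean()
band_a = valid & (lats >= mean_lat)
band_b = valid & (lats < mean_lat)
# grid and draw each band with its own pcolormesh, e.g.
# Za = mlab.griddata(lons[band_a], lats[band_a], winds_s[band_a], xi, yi)
# Zb = mlab.griddata(lons[band_b], lats[band_b], winds_s[band_b], xi, yi)
# plt.pcolormesh(X, Y, Za); plt.pcolormesh(X, Y, Zb)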
I'm really new to Python programming, and I was just wondering if you can create a regular grid of 0.5 by 0.5 m resolution using LiDAR points.
My data are in LAS format (read with from liblas import file as lasfile) and they have the following format: X, Y, Z, where X and Y are coordinates.
The points are randomly positioned; some pixels are empty (NaN value) and some pixels contain more than one point. Where there is more than one point, I wish to obtain a mean value. In the end I need to save the data in TIF or ASCII format.
I am studying the osgeo module and GDAL, but I honestly don't know whether the osgeo module is the best solution.
I would be really glad for help with some code that I can study and implement. Thanks in advance; I really need the help.
I don't know the best way to get a grid with these parameters.
It's a bit late but maybe this answer will be useful for others, if not for you...
I have done this with Numpy and Pandas, and it's pretty fast. I was using TLS data and could do this with several million data points without any trouble on a decent 2009-vintage laptop. The key is 'binning' by rounding the data, and then using Pandas' GroupBy methods to do the aggregating and calculate the means.
If you need to round to a power of 10 you can use np.round, otherwise you can round to an arbitrary value by making a function to do so, which I have done by modifying this SO answer.
import numpy as np
import pandas as pd

# make rounding function:
def round_to_val(a, round_val):
    return np.round(np.array(a, dtype=float) / round_val) * round_val

# load data (placeholder path; the array is expected to have shape (n_data, 3))
data = np.load('your_point_data.npy')
n_d = data.shape[0]

# round the data
d_round = np.empty([n_d, 5])
d_round[:, 0] = data[:, 0]
d_round[:, 1] = data[:, 1]
d_round[:, 2] = data[:, 2]
del data  # free up some RAM
d_round[:, 3] = round_to_val(d_round[:, 0], 0.5)
d_round[:, 4] = round_to_val(d_round[:, 1], 0.5)

# sorting data
ind = np.lexsort((d_round[:, 4], d_round[:, 3]))
d_sort = d_round[ind]

# making dataframes and grouping stuff
df_cols = ['x', 'y', 'z', 'x_round', 'y_round']
df = pd.DataFrame(d_sort)
df.columns = df_cols
df_round = df[['x_round', 'y_round', 'z']]
group_xy = df_round.groupby(['x_round', 'y_round'])

# calculating the mean, write to csv, which saves the file with
# [x_round, y_round, z_mean] columns. You can exit Python and then start up
# later to clear memory if that's an issue.
group_mean = group_xy.mean()
group_mean.to_csv('your_binned_data.csv')
# Restarting...
import numpy as np
from scipy.interpolate import griddata

binned_data = np.loadtxt('your_binned_data.csv', skiprows=1, delimiter=',')
x_bins = binned_data[:, 0]
y_bins = binned_data[:, 1]
z_vals = binned_data[:, 2]
pts = np.array([x_bins, y_bins]).T

# make grid (with borders rounded to 0.5...)
xmin, xmax = 637000, 640000.5
ymin, ymax = 6067000, 6070000.5
grid_x, grid_y = np.mgrid[xmin:xmax:0.5, ymin:ymax:0.5]

# interpolate onto grid
data_grid = griddata(pts, z_vals, (grid_x, grid_y), method='cubic')

# save to ascii
np.savetxt('data_grid.txt', data_grid)
When I've done this, I have saved the output as a .npy file and converted it to a TIFF with the Image library, then georeferenced it in ArcMap. There is probably a way to do that with osgeo, but I haven't used it.
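As one possible alternative to the ArcMap step, a hedged sketch of writing a georeferenced TIFF with rasterio (the bounds match the grid above, but the CRS is an assumption you must replace, and you may need to transpose or flip the array so rows run north to south):

import numpy as np
import rasterio
from rasterio.transform import from_origin

data_grid = np.load('data_grid.npy')  # hypothetical: the gridded output above
# from_origin(west, north, x_resolution, y_resolution)
transform = from_origin(637000, 6070000.5, 0.5, 0.5)
with rasterio.open('data_grid.tif', 'w', driver='GTiff',
                   height=data_grid.shape[0], width=data_grid.shape[1],
                   count=1, dtype=str(data_grid.dtype),
                   crs='EPSG:32755',  # assumption: replace with your CRS
                   transform=transform) as dst:
    dst.write(data_grid, 1)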
Hope this helps someone at least...
You can use the histogram function in NumPy to do the binning; dividing a value-weighted histogram by a plain count histogram gives the mean value per bin. For instance:
import numpy as np
points = np.random.random(1000)   # coordinates
values = np.random.random(1000)   # value at each coordinate
# create 10 bins from 0 to 1 (11 edges)
bins = np.linspace(0, 1, 11)
means = (np.histogram(points, bins, weights=values)[0] /
         np.histogram(points, bins)[0])
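For the 2-D case in the question, the same trick works with np.histogram2d; a minimal sketch, with hypothetical x, y, z arrays standing in for the LAS points:

import numpy as np
# hypothetical point data (replace with the LAS X, Y, Z arrays)
x = np.random.uniform(0, 100, 10000)
y = np.random.uniform(0, 100, 10000)
z = np.random.uniform(0, 10, 10000)
res = 0.5  # 0.5 m cells
xedges = np.arange(x.min(), x.max() + res, res)
yedges = np.arange(y.min(), y.max() + res, res)
sums, _, _ = np.histogram2d(x, y, bins=[xedges, yedges], weights=z)
counts, _, _ = np.histogram2d(x, y, bins=[xedges, yedges])
means = sums / counts  # mean z per cell; empty cells give NaN (0/0)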
Try LAStools, particularly lasgrid or las2dem.