I am trying to take a population raster and resample+reproject it to match the shape and resolution of a precipitation raster.
Data Links:
Population Data: https://figshare.com/ndownloader/files/10257111
Precipitation Data: https://www.ncei.noaa.gov/data/nclimgrid-monthly/access/nclimgrid_prcp.nc
The Population Data is a series of rasters per decade of 5 different population models covering the continental US. If you simply select one of the rasters I can work out the rest (I have combined into a multiband raster anyways). For example if you use the pop_m4_2010 raster that would help. The resolution is 1x1km, and the projection is Albers Equal Area Conic NAD 83 ESRI:102003.
The Precipitation Data is a netcdf file covering monthly precipitation data for the continental US. The resolution is 5x5km and the projection is WGS84 EPSG:4326.
I converted the netcdf to tiff using the following code:
import xarray as xr
import rioxarray as rio
prcp_file = xr.open_dataset('nclimgrid_prcp.nc')
prp = prcp_file['prcp']
prp = prp.rio.set_spatial_dims(x_dim='lon', y_dim='lat')
prp.rio.write_crs("epsg:4326", inplace=True)
prp.rio.to_raster('prp_raster.tiff')
I also used QGIS to open the population files (add raster layer, navigate into the downloaded folder for pop_m4_2010 and select the "w001001.adf" file). When I do this in a WGS84 project QGIS automatically appears to force reprojection but I am new to this so I am unsure if it is correct.
From this point I have tried several things to resample the population raster to match the 5x5 resolution of the precipitation raster.
In QGIS Processing Toolbox GRASS r.resample
In QGIS Processing Toolbox Raster Layer Zonal Statistics
In Python, honestly I have lost track of all of the different forum posts and tutorials I have followed on GDAL.Warp, Rasterio.Warp, affine transformations, rio.reproject_match, etc. Below are a few examples of the code attempts.
Many of these run without errors (rio.reproject_match in particular seemed simple and effective), but none of them produces the intended result. When I check the resulting population raster by running zonal statistics against a county vector shapefile, the summed population for an area is either 0 or wildly inaccurate.
What am I doing wrong?
Reproject_Match:
import rioxarray # for the extension to load
import xarray
import matplotlib.pyplot as plt
%matplotlib inline
def print_raster(raster):
    print(
        f"shape: {raster.rio.shape}\n"
        f"resolution: {raster.rio.resolution()}\n"
        f"bounds: {raster.rio.bounds()}\n"
        f"sum: {raster.sum().item()}\n"
        f"CRS: {raster.rio.crs}\n"
    )
xds = rioxarray.open_rasterio('pop_m4_2010.tif')
xds_match = rioxarray.open_rasterio('prp_raster.tiff')
fig, axes = plt.subplots(ncols=2, figsize=(12,4))
xds.plot(ax=axes[0])
xds_match.plot(ax=axes[1])
plt.draw()
print("Original Raster:\n----------------\n")
print_raster(xds)
print("Raster to Match:\n----------------\n")
print_raster(xds_match)
xds_repr_match = xds.rio.reproject_match(xds_match)
print("Reprojected Raster:\n-------------------\n")
print_raster(xds_repr_match)
print("Raster to Match:\n----------------\n")
print_raster(xds_match)
xds_repr_match.rio.to_raster("reproj_pop.tif")
Another way with Rasterio.Warp:
import rasterio
from rasterio.warp import calculate_default_transform, reproject, Resampling
#open source raster
srcRst =rasterio.open('pop_m4_2010.tif')
print("source raster crs:")
print(srcRst.crs)
dstCrs = {'init': 'EPSG:4326'}
print("destination raster crs:")
print(dstCrs)
#calculate transform array and shape of reprojected raster
transform, width, height = calculate_default_transform(
    srcRst.crs, dstCrs, srcRst.width, srcRst.height, *srcRst.bounds)
print("transform array of source raster")
print(srcRst.transform)
print("transform array of destination raster")
print(transform)
#working of the meta for the destination raster
kwargs = srcRst.meta.copy()
kwargs.update({
    'crs': dstCrs,
    'transform': transform,
    'width': width,
    'height': height
})
#open destination raster
dstRst = rasterio.open('pop_m4_2010_reproj4326.tif', 'w', **kwargs)
#reproject and save raster band data
for i in range(1, srcRst.count + 1):
    reproject(
        source=rasterio.band(srcRst, i),
        destination=rasterio.band(dstRst, i),
        #src_transform=srcRst.transform,
        src_crs=srcRst.crs,
        #dst_transform=transform,
        dst_crs=dstCrs,
        resampling=Resampling.bilinear)
    print(i)
#close destination raster
dstRst.close()
And here is a second attempt with Rasterio.Warp:
import rasterio
from rasterio.warp import calculate_default_transform, reproject, Resampling
prcp = rasterio.open('prp_raster.tiff', mode = 'r')
with rasterio.open('pop_m4_2010.tif') as dataset:
    # resample data to target shape
    data = dataset.read(out_shape=(dataset.count,prcp.height,prcp.width), resampling=Resampling.bilinear)
    # scale image transform
    transform = dataset.transform * dataset.transform.scale((dataset.width / data.shape[-1]),
                                                            (dataset.height / data.shape[-2]))
    # Register GDAL format drivers and configuration options with a
    # context manager.
    with rasterio.Env():
        # originally `src.profile`, but no `src` is defined in this snippet
        profile = dataset.profile
        profile.update(
            dtype=rasterio.float32,
            count=1,
            compress='lzw')
        with rasterio.open('pop_m4_2010_resampledtoprcp.tif', 'w', **profile) as dst:
            dst.write(data.astype(rasterio.float32))
This is how you can do that with R.
library(terra)
pop <- rast("USA_HistoricalPopulationDataset/pop_m5_2010")
wth <- rast("nclimgrid_prcp.nc")
wpop <- project(pop, wth, "sum")
Inspect the results.
wpop
#class : SpatRaster
#dimensions : 596, 1385, 1 (nrow, ncol, nlyr)
#resolution : 0.04166666, 0.04166667 (x, y)
#extent : -124.7083, -67, 24.5417, 49.37503 (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84
#source(s) : memory
#name : pop_m5_2010
#min value : 0.0
#max value : 423506.7
global(pop, "sum", na.rm=TRUE)
# sum
#pop_m5_2010 306620886
global(wpop, "sum", na.rm=TRUE)
# sum
#pop_m5_2010 306620761
You can save the results to file with something like this
writeRaster(wpop, "pop.tif")
And you could do this in one step for all population data like this:
ff <- list.files(pattern="0$", "USA_HistoricalPopulationDataset", full=TRUE)
apop <- rast(ff)
wapop <- project(apop, wth, "sum")
The population numbers you are getting are probably wrong because you are using bilinear interpolation when projecting (warping). That is not appropriate for (population) count data. You could first transform it to population density, warp, and transform back. I do that below, getting a result that is similar to what you get with the more direct approach that I have shown above.
csp <- cellSize(pop)
csw <- cellSize(wth[[1]])
popdens <- pop / csp
popdens <- project(popdens, wth, "bilinear")
popcount <- popdens * csw
popcount
#class : SpatRaster
#dimensions : 596, 1385, 1 (nrow, ncol, nlyr)
#resolution : 0.04166666, 0.04166667 (x, y)
#extent : -124.7083, -67, 24.5417, 49.37503 (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84
#source(s) : memory
#name : pop_m5_2010
#min value : 0.0
#max value : 393982.5
global(popcount, "sum", na.rm=TRUE)
# sum
#pop_m5_2010 304906042
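For readers following along in Python, the difference between summing and averaging can be seen on a toy grid. This is a minimal sketch in plain Python with made-up numbers, not the rasters from the question; `aggregate`, `total` and `mean` are hypothetical helper names. It collapses 2x2 blocks once by summing and once by averaging, and then rescales the averaged (density-like) values by the number of fine cells per coarse cell.

```python
# Toy 4x4 grid of population counts (hypothetical values, people per cell).
fine = [
    [10, 0, 3, 1],
    [2, 8, 0, 0],
    [0, 0, 5, 5],
    [4, 4, 1, 9],
]


def aggregate(grid, factor, reducer):
    """Collapse factor x factor blocks of `grid` into one cell using `reducer`."""
    out = []
    for r in range(0, len(grid), factor):
        out_row = []
        for c in range(0, len(grid[0]), factor):
            block = [grid[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(reducer(block))
        out.append(out_row)
    return out


def total(grid):
    """Sum of all cells in a 2-D list."""
    return sum(sum(row) for row in grid)


def mean(values):
    return sum(values) / len(values)


coarse_sum = aggregate(fine, 2, sum)    # counts: add up the contributing cells
coarse_mean = aggregate(fine, 2, mean)  # roughly what averaging/interpolating does
# Treat the averaged values as a density and multiply by cells per block (2 * 2).
density_back = [[value * 4 for value in row] for row in coarse_mean]

print(total(fine), total(coarse_sum), total(coarse_mean), total(density_back))
```

Only the summed grid and the density-rescaled grid keep the original total; the averaged grid loses it by the block-size factor. In rasterio the counterpart of "sum" appears to be `Resampling.sum` (which needs GDAL 3.1 or later), and `reproject_match` accepts a `resampling` argument, so that is probably the first thing to try on the Python side.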
I have a mosaic tif file (gdalinfo below) I made (with some additional info on the tiles here) and have looked extensively for a function that simply returns the elevation (the z value of this mosaic) for a given lat/long. The functions I've seen want me to input the coordinates in the coordinates of the mosaic, but I want to use lat/long, is there something about GetGeoTransform() that I'm missing to achieve this?
For instance, the example shown below:
from osgeo import gdal
import affine
import numpy as np
def retrieve_pixel_value(geo_coord, data_source):
    """Return floating-point value that corresponds to given point."""
    x, y = geo_coord[0], geo_coord[1]
    forward_transform = \
        affine.Affine.from_gdal(*data_source.GetGeoTransform())
    reverse_transform = ~forward_transform
    px, py = reverse_transform * (x, y)
    px, py = int(px + 0.5), int(py + 0.5)
    pixel_coord = px, py
    data_array = np.array(data_source.GetRasterBand(1).ReadAsArray())
    return data_array[pixel_coord[0]][pixel_coord[1]]
This gives me an out-of-bounds error, as it is likely expecting x/y coordinates (e.g. retrieve_pixel_value([153.023499,-27.468968],dataset)). I've also tried the following from here:
import rasterio
dat = rasterio.open(fname)
z = dat.read()[0]
def getval(lon, lat):
    idx = dat.index(lon, lat, precision=1E-6)
    return dat.xy(*idx), z[idx]
Is there a simple adjustment I can make so my function can query the mosaic in lat/long coords?
Much appreciated.
Driver: GTiff/GeoTIFF
Files: mosaic.tif
Size is 25000, 29460
Coordinate System is:
PROJCRS["GDA94 / MGA zone 56",
BASEGEOGCRS["GDA94",
DATUM["Geocentric Datum of Australia 1994",
ELLIPSOID["GRS 1980",6378137,298.257222101004,
LENGTHUNIT["metre",1]],
ID["EPSG",6283]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433,
ID["EPSG",9122]]]],
CONVERSION["UTM zone 56S",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",0,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",153,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.9996,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",500000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",10000000,
LENGTHUNIT["metre",1],
ID["EPSG",8807]],
ID["EPSG",17056]],
CS[Cartesian,2],
AXIS["easting",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["northing",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
Data axis to CRS axis mapping: 1,2
Origin = (491000.000000000000000,6977000.000000000000000)
Pixel Size = (1.000000000000000,-1.000000000000000)
Metadata:
AREA_OR_POINT=Area
Image Structure Metadata:
INTERLEAVE=BAND
Corner Coordinates:
Upper Left ( 491000.000, 6977000.000) (152d54'32.48"E, 27d19'48.33"S)
Lower Left ( 491000.000, 6947540.000) (152d54'31.69"E, 27d35'45.80"S)
Upper Right ( 516000.000, 6977000.000) (153d 9'42.27"E, 27d19'48.10"S)
Lower Right ( 516000.000, 6947540.000) (153d 9'43.66"E, 27d35'45.57"S)
Center ( 503500.000, 6962270.000) (153d 2' 7.52"E, 27d27'47.16"S)
Band 1 Block=25000x1 Type=Float32, ColorInterp=Gray
NoData Value=-999
Update 1 - I tried the following:
tif = r"mosaic.tif"
dataset = rio.open(tif)
d = dataset.read()[0]
def get_xy_coords(latlng):
    transformer = Transformer.from_crs("epsg:4326", dataset.crs)
    coords = [transformer.transform(x, y) for x,y in latlng][0]
    #idx = dataset.index(coords[1], coords[0])
    return coords #.xy(*idx), z[idx]
longx,laty = 153.023499,-27.468968
coords = get_xy_coords([(laty,longx)])
print(coords[0],coords[1])
print(dataset.width,dataset.height)
(502321.11181384244, 6961618.891167777)
25000 29460
So something is still not right. Maybe I need to subtract the coordinates from the bottom left/right of image e.g.
coords[0]-dataset.bounds.left,coords[1]-dataset.bounds.bottom
where
In [78]: dataset.bounds
Out[78]: BoundingBox(left=491000.0, bottom=6947540.0, right=516000.0, top=6977000.0)
Update 2 - Indeed, subtracting the corners of my box seems to get closer, though I'm sure there is a much nicer way to get what I want using just the tif metadata.
longx,laty = 152.94646, -27.463175
coords = get_xy_coords([(laty,longx)])
elevation = d[int(coords[1]-dataset.bounds.bottom),int(coords[0]-dataset.bounds.left)]
fig,ax = plt.subplots(figsize=(12,12))
ax.imshow(d,vmin=0,vmax=400,cmap='terrain',extent=[dataset.bounds.left,dataset.bounds.right,dataset.bounds.bottom,dataset.bounds.top])
ax.plot(coords[0],coords[1],'ko')
plt.show()
You basically have two distinct steps:
Convert lon/lat coordinates to map coordinates. This is only necessary if your input raster is not already in lon/lat. Map coordinates are the coordinates in the projection that the raster itself uses.
Convert the map coordinates to pixel coordinates.
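The second step is plain arithmetic once the geotransform is known. As a minimal sketch in plain Python (no GDAL needed), the block below inverts a GDAL-style geotransform using the origin and pixel size from the gdalinfo output in the question and the map coordinates the question obtained in Update 1. `map_to_pixel` is a hypothetical helper name. It takes the floor of the fractional result, on the understanding that the geotransform origin is the outer corner of the top-left pixel, so pixel i covers the interval from i up to i + 1.

```python
import math


def map_to_pixel(gt, x, y):
    """Invert a GDAL-style geotransform: map (x, y) -> (col, row) indices."""
    det = gt[1] * gt[5] - gt[2] * gt[4]
    if det == 0:
        raise ValueError("geotransform is not invertible")
    dx, dy = x - gt[0], y - gt[3]
    col = (dx * gt[5] - dy * gt[2]) / det
    row = (dy * gt[1] - dx * gt[4]) / det
    # floor, because pixel i covers the half-open interval [i, i + 1)
    return math.floor(col), math.floor(row)


# Origin and pixel size as reported by gdalinfo in the question.
gt = (491000.0, 1.0, 0.0, 6977000.0, 0.0, -1.0)
# Map coordinates (MGA zone 56) from Update 1 for lon/lat 153.023499, -27.468968.
col, row = map_to_pixel(gt, 502321.11181384244, 6961618.891167777)
print(col, row)
```

Rows count downward from the top edge of the raster, so the row offset is measured from the top bound (6977000), not from the bottom bound as in Update 2 of the question. The array is then indexed as `d[row, col]`.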
There are all kinds of tools you could use to make this simpler (pyproj, rasterio, etc.), but for such a simple case it is probably nice to start by doing it all in GDAL, which should also improve your understanding of the steps involved.
Inputs
from osgeo import gdal, osr
raster_file = r'D:\somefile.tif'
lon = 153.023499
lat = -27.468968
lon/lat to map coordinates
# fetch metadata required for transformation
ds = gdal.OpenEx(raster_file)
raster_proj = ds.GetProjection()
gt = ds.GetGeoTransform()
ds = None # close file, could also keep it open till after reading
# coordinate transformation (lon/lat to map)
# define source projection
# this definition ensures the order is always lon/lat compared
# to EPSG:4326 for which it depends on the GDAL version (2 vs 3)
source_srs = osr.SpatialReference()
source_srs.ImportFromWkt(osr.GetUserInputAsWKT("urn:ogc:def:crs:OGC:1.3:CRS84"))
# define target projection based on the file
target_srs = osr.SpatialReference()
target_srs.ImportFromWkt(raster_proj)
# convert
ct = osr.CoordinateTransformation(source_srs, target_srs)
mapx, mapy, *_ = ct.TransformPoint(lon, lat)
You could verify this intermediate result by for example adding it as Point WKT in something like QGIS (using the QuickWKT plugin, making sure the viewer has the same projection as the raster).
map coordinates to pixel
# apply affine transformation to get pixel coordinates
gt_inv = gdal.InvGeoTransform(gt) # invert for map -> pixel
px, py = gdal.ApplyGeoTransform(gt_inv, mapx, mapy)
# it will return fractional pixel coordinates, so convert to int
# before using them to read. Pixel i covers [i, i + 1), so take the floor
import math
py = math.floor(py)
px = math.floor(px)
# read pixel data
ds = gdal.OpenEx(raster_file) # open file again
elevation_value = ds.ReadAsArray(px, py, 1, 1)
ds = None
The elevation_value variable should hold the value you're after. I would definitely verify the result independently: try a few points in QGIS or with the gdallocationinfo utility:
gdallocationinfo -l_srs "urn:ogc:def:crs:OGC:1.3:CRS84" filename.tif 153.023499 -27.468968
# Report:
# Location: (4228P,4840L)
# Band 1:
# Value: 1804.51879882812
If you're reading a lot of points, there will be some threshold at which it would be faster to read a large chunk and extract the values from that array, compared to reading every point individually.
edit:
For applying the same workflow on multiple points at once a few things change.
So for example having the inputs:
lats = np.array([-27.468968, -27.468968, -27.468968])
lons = np.array([153.023499, 153.023499, 153.023499])
The coordinate transformation needs to use ct.TransformPoints instead of ct.TransformPoint which also requires the coordinates to be stacked in a single array of shape [n_points, 2]:
coords = np.stack([lons.ravel(), lats.ravel()], axis=1)
mapx, mapy, *_ = np.asarray(ct.TransformPoints(coords)).T
# reshape in case of non-1D inputs
mapx = mapx.reshape(lons.shape)
mapy = mapy.reshape(lons.shape)
Converting from map to pixel coordinates changes because the GDAL method for this only takes a single point. Doing it manually on the arrays would be:
px = gt_inv[0] + mapx * gt_inv[1] + mapy * gt_inv[2]
py = gt_inv[3] + mapx * gt_inv[4] + mapy * gt_inv[5]
And converting the arrays to integer pixel indices (taking the floor, since pixel i covers [i, i + 1)) changes to:
px = np.floor(px).astype(np.int32)
py = np.floor(py).astype(np.int32)
If the raster (easily) fits in memory, reading all points would become:
ds = gdal.OpenEx(raster_file)
all_elevation_data = ds.ReadAsArray()
ds = None
elevation_values = all_elevation_data[py, px]
That last step could be optimized by checking highest/lowest pixel coordinates in both dimensions and only read that subset for example, but it would require normalizing the coordinates again to be valid for that subset.
The py and px arrays might also need to be clipped (eg np.clip) if the input coordinates fall outside the raster. In that case the pixel coordinates will be < 0 or >= xsize/ysize.
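The last two points can be combined: work out which points fall inside the raster, read only the window that covers them, and shift the indices so they are valid for that window. Below is a minimal sketch in plain Python lists rather than NumPy arrays. `plan_window` is a hypothetical helper, the raster size is taken from the gdalinfo output in the question, and the pixel indices are made up. It flags out-of-range points instead of clipping them, since clipping would quietly return the value of an edge pixel.

```python
def plan_window(px, py, xsize, ysize):
    """Return the read window, window-relative indices and a validity mask."""
    valid = [0 <= x < xsize and 0 <= y < ysize for x, y in zip(px, py)]
    xs = [x for x, ok in zip(px, valid) if ok]
    ys = [y for y, ok in zip(py, valid) if ok]
    if not xs:
        return None, [], valid
    xoff, yoff = min(xs), min(ys)
    # (xoff, yoff, xsize, ysize) of the smallest window covering all valid points
    window = (xoff, yoff, max(xs) - xoff + 1, max(ys) - yoff + 1)
    # indices relative to the window; None marks points outside the raster
    local = [(x - xoff, y - yoff) if ok else None
             for x, y, ok in zip(px, py, valid)]
    return window, local, valid


# Raster size from the gdalinfo output in the question; pixel indices are made up.
px = [11321, 11400, -5, 11350]
py = [15381, 15390, 200, 30000]
window, local, valid = plan_window(px, py, xsize=25000, ysize=29460)
print(window)
print(local)
```

The `window` tuple is in the same (xoff, yoff, xsize, ysize) order that `ds.ReadAsArray(px, py, 1, 1)` uses above, and each `(lx, ly)` pair in `local` would index the returned subset as `subset[ly, lx]`.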
I am working with Sentinel 3 SLSTR data which comes in netCDF file format. The file contains 11 bands:
S1-S6 (500 m resolution) and S7-S9 and F1 & F2 (1000 m resolution). S1-S6 contains radiance values and S7-S9 contains brightness temperature values. Right now, I want to resample my S7-S9 band to 500 m resolution to match the resolution of S1-S6 bands.
I am using xarray to read the netCDF files. There is a function xarray.Dataset.resample(), but the documentation says that it resamples the data to a new temporal resolution.
I also tried to resample using gdal but couldn't get any result.
import gdal
import xarray as xr
import matplotlib.pyplot as plt
data = xr.open_dataset('S7_BT_in.nc') # one of the files in 1000 m resolution
geo = xr.open_dataset(path+'geodetic_an.nc') # file containing the geodetic values
ds = data['S7_BT_in'] # fetching variable I need to work on
lat = geo['latitude_an'] # fetching latitude values
lon = geo['longitude_an'] # fetching longitude values
#assigning latitude and longitude values to the coordinates of ds
ds = ds.assign_coords(coords = {'Latitude': lat, 'Longitude': lon})
x = gdal.Open('ds') # Opening the netCDF file using gdal
# resampling the data to 500 m resolution
xreproj = gdal.Warp('resampled.nc', x, xRes = 500, yRes = 500)
This is the error I am getting:
SystemError: <built-in function wrapper_GDALWarpDestName> returned NULL without setting an error.
I also tried opening the file directly using gdal but still getting the same error.
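One thing that stands out in the snippet is that `gdal.Open('ds')` receives the literal string 'ds' rather than a path to a file, so GDAL has nothing to open. Independently of that, if the 500 m grid is exactly twice as dense as the 1000 m grid and the two are aligned (an assumption that would need checking against the product's geolocation files), resampling can be as simple as repeating every pixel twice along both axes. A minimal sketch in plain Python with made-up values; `upsample_nearest` is a hypothetical helper, not an xarray or GDAL function:

```python
def upsample_nearest(grid, factor=2):
    """Repeat every cell `factor` times along both axes (nearest neighbour)."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


# Made-up 2x3 patch of brightness temperatures in kelvin on the 1000 m grid.
bt_1km = [
    [280.1, 281.4, 279.9],
    [282.0, 283.3, 280.7],
]
bt_500m = upsample_nearest(bt_1km, factor=2)
print(len(bt_500m), len(bt_500m[0]))
```

Nearest-neighbour replication keeps the original values untouched, which seems a reasonable default for brightness temperatures; a smoother interpolation would be a different design choice.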
I used pixel_to_world to find the RA and Dec of five stars, and I know their xy values in another image. So I feel like fit_wcs_from_points is the correct method for getting a WCS onto my image.
I have the following code:
import numpy as np
from astropy.wcs.utils import fit_wcs_from_points
from astropy.coordinates import SkyCoord
stars = np.array([[1246.63, 1372.83, 1455.12, 1611.95, 1644.85],
[1588.42, 1502.92, 1677.24, 1610.39, 1325.12]])
known_coords = SkyCoord([(125.66419083, -42.96809252), (125.67730695, -42.98209958),
(125.65082259, -42.9914015), (125.6611325, -43.01438513), (125.70471982, -43.01228167)],
frame="icrs", unit="deg")
fit_wcs_from_points(xy = stars, world_coords = known_coords, projection='TAN')
I get this output:
WCS Keywords
Number of WCS axes: 2
CTYPE : 'RA---TAN' 'DEC--TAN'
CRVAL : 125.67776105611648 -42.99124198597975
CRPIX : 1451.3389930261899 1501.555748491856
CD1_1 CD1_2 : 6.547003719386885e-07 -0.00011159088984525699
CD2_1 CD2_2 : -0.00012238492146305314 -1.1457490550730293e-05
NAXIS : 1645 1678
My known image (which is rotated 180 degrees relative to this image) has the following header values, so at least the CRVALs seem plausible (I'm still rather new to this, so I'm unsure roughly what values these keywords should hold):
CTYPE1 = 'RA---TAN' / Pixel coordinate system
CTYPE2 = 'DEC--TAN' / Pixel coordinate system
CRPIX1 = -6457.18566867618 / Ref. pixel of center of rotation
CRPIX2 = -4673.96071610528 / Ref. pixel of center of rotation
CRVAL1 = 125.116590442276 / [deg] [deg] 08:24:31.5 Value of ref pixel
CRVAL2 = -43.3619270216124 / [deg] [deg] -42:34:50.2 Value of ref pixel
CD1_1 = 5.92477204932042E-5 / WCS transform matrix element
CD2_1 = 3.73014240770773E-8 / WCS transform matrix element
CD1_2 = -3.4695999305039E-8 / WCS transform matrix element
CD2_2 = 5.92440992599392E-5 / WCS transform matrix element
I then updated my header of the image:
hdulist = fits.open('Swirl06p_1.fits')
header = hdulist[0].header
header.keys
header.set('CTYPE1', 'RA---TAN')
header.set('CTYPE2', 'DEC--TAN')
header.set('CRVAL1', 125.67776105611648)
header.set('CRVAL2', -42.99124198597975)
header.set('CRPIX1', 1451.3389930261899)
header.set('CRPIX2', 1501.555748491856)
header.set('CD1_1', 6.547003719386885e-07)
header.set('CD1_2', -0.00011159088984525699)
header.set('CD2_1', -0.00012238492146305314)
header.set('CD2_2', -1.1457490550730293e-05)
header.set('NAXIS1', 1645, after=3)
header.set('NAXIS2', 1678, after=4)
hdulist.writeto('Swirl06p_1WCS.fits')
hdulist.close()
However, the resulting image's WCS is incorrect: it has RAs around 180 and Decs around -3, as opposed to the correct 126 and -43. Also, the NAXISn values differ from what I set them to, but this makes sense since they are stored as the actual lengths of the axes.
I'm using Anaconda 4.10.3, and my astropy version is 4.3.1, my numpy version is 1.20.3, I'm on Ubuntu 20.04.3 LTS, if that helps.
Thanks in advance
EDIT:
Got it to work in one cell, thanks to help. This is my final code (in case anyone else experiences this later, haha)
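import numpy as np
from astropy.io import fits
from astropy.coordinates import SkyCoord
from astropy.wcs.utils import fit_wcs_from_points
hdulist = fits.open('Swirl06p_1.fits')
header = hdulist[0].header
# pixel coordinates of the five stars: first row x, second row y
stars = np.array([[1246.63, 1372.83, 1455.12, 1611.95, 1644.85],
                  [1588.42, 1502.92, 1677.24, 1610.39, 1325.12]])
# matching sky coordinates (RA, Dec) in degrees
known_coords = SkyCoord([(125.66419083, -42.96809252), (125.67730695, -42.98209958),
                         (125.65082259, -42.9914015), (125.6611325, -43.01438513), (125.70471982, -43.01228167)],
                        frame="icrs", unit="deg")
w = fit_wcs_from_points(xy = stars, world_coords = known_coords, projection='TAN')
# write the fitted WCS keywords into the header instead of setting them by hand
hdulist[0].header.update(w.to_header())
hdulist.writeto('Swirl06p_1WCS.fits')
hdulist.close()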
I am using python's gdal module to analyse satellite images - RADARSAT-2 and TerraSAR-X - saved as .tif files. I need to fetch pixel values at coordinates read from a shapefile. While the code works fine for the RS2 images, I'm having trouble with the TSX images.
The geotransform read by gdal is off for the TSX products, which yields negative pixel indices for the location of shapefile features on the image. The same piece of code works fine for RS2 products.
Any idea what's going on and how to fix it?
Example of a correct geotransform from a RS2 product:
(-74.98992283355103, 7.186522272956171e-05, 0.0, 62.273587708987776, 0.0, -7.186522272956171e-05)
Example of the geotransforms I get for TSX products :
(506998.75, 2.5, 0.0, 6919001.25, 0.0, -2.5)
Code snippet :
from osgeo import gdal, ogr
gdal.UseExceptions()
# Read image, band, geotransform
dataset = gdal.Open(paths['TSXtiff'])
band = dataset.GetRasterBand(band_index)
gt = dataset.GetGeoTransform()
# Read shapefile
shapefile = ogr.Open(paths["Shapefile"])
layer = shapefile.GetLayer()
# Add pixel_value for each feature to associated list
pixels_at_shp = []
for feature in layer:
    geometry = feature.GetGeometryRef()
    # Coordinates in map units
    # In GDAL, mx = px*gt[1] + gt[0], my = py*gt[5] + gt[3]
    mx,my = geometry.GetX(), geometry.GetY()
    # Convert to pixel coordinates
    px = int((mx-gt[0])/gt[1])
    py = int((my-gt[3])/gt[5])
    band_values = band.ReadAsArray(px,py,1,1)
    pixels_at_shp.append(band_values)
shapefile = None
return pixels_at_shp
The TSX product was projected (its geotransform is in metres), while the RS2 product was simply georeferenced in degrees. Solution: go back to lat/long degree coordinates by reprojecting the GeoTIFF files to WGS84.
I used the gdal command line tool to do the reprojection, like so :
for f in *.tif
do
gdalwarp "$f" "${f%%.*}_reproj.tif" -t_srs "+proj=longlat +ellps=WGS84"
done
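To illustrate why the indices came out negative, here is the pixel arithmetic from the question applied to both geotransforms quoted above. The point coordinates are hypothetical, picked near the RS2 origin, and `to_pixel` is just a helper name for this sketch:

```python
def to_pixel(gt, mx, my):
    """Same arithmetic as the snippet in the question (north-up geotransform)."""
    px = int((mx - gt[0]) / gt[1])
    py = int((my - gt[3]) / gt[5])
    return px, py


# Geotransforms as quoted in the question.
rs2_gt = (-74.98992283355103, 7.186522272956171e-05, 0.0,
          62.273587708987776, 0.0, -7.186522272956171e-05)
tsx_gt = (506998.75, 2.5, 0.0, 6919001.25, 0.0, -2.5)

# Hypothetical shapefile point in lon/lat degrees, chosen near the RS2 origin.
lon, lat = -74.95, 62.25

rs2_px, rs2_py = to_pixel(rs2_gt, lon, lat)
tsx_px, tsx_py = to_pixel(tsx_gt, lon, lat)
print(rs2_px, rs2_py)  # small non-negative indices
print(tsx_px, tsx_py)  # degrees read as metres: the column index is negative
```

Reprojecting the raster resamples the image data. Transforming the shapefile coordinates into the raster's projection instead (for example with osr.CoordinateTransformation) would likely avoid that, at the cost of one extra step per point.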
I am currently working with BUFR files containing wind data. When I read such a file in Python I get four large vectors: a latitude vector, a longitude vector, a wind_direction vector, and a wind_speed vector.
Both wind vectors are NumPy masked arrays because some of the data is invalid. This happens because the data comes from a non-geostationary satellite. In fact I successfully generated the following image from this BUFR file to show you the general shape that the data takes.
In this image I have plotted a color field to represent the wind speed, while the arrows obviously represent the wind direction.
Please notice the two bands of actual data. Unfortunately, the way I am plotting the data generates a third band (where the color field is smooth) in between the actual data bands. This is an artefact of the function pcolormesh. If I could superimpose two pcolormesh plots, each one representing one of the bands, this problem would disappear.
Unfortunately, I do not know how I could separate the data "regions". I have thought about clustering techniques but do not know how to cluster along latlon data using ANOTHER array (the wind data) as the clustering rule.
This is my current code:
#!/usr/bin/python
import bufr
import numpy as np
import sys
import matplotlib
matplotlib.use('Agg')
from matplotlib import pyplot as plt
from matplotlib import mlab
WIND_DIR_INDEX = 97
WIND_SPEED_INDEX = 96
bfrfile = sys.argv[1]
print bfrfile
bfr = bufr.BUFRFile(bfrfile)
lon = []
lat = []
wind_d = []
wind_s = []
for record in bfr:
    for entry in record:
        if entry.index == WIND_DIR_INDEX:
            wind_d.append(entry.data)
        if entry.index == WIND_SPEED_INDEX:
            wind_s.append(entry.data)
        if entry.name.find("LONGITUDE") == 0:
            lon.append(entry.data)
        if entry.name.find("LATITUDE") == 0:
            lat.append(entry.data)
lons = np.concatenate(lon)
lats = np.concatenate(lat)
winds_d = np.concatenate(wind_d)
winds_s = np.concatenate(wind_s)
winds_d = np.ma.masked_greater(winds_d,1.0e+6)
winds_s = np.ma.masked_greater(winds_s,1.0e+6)
windu = np.cos((winds_d-180)*(np.pi/180))
windv = np.sin((winds_d-180)*(np.pi/180))
# Data interpolation for pcolormesh (needs gridded data)
xi = np.linspace(lons.min(),lons.max(),lons.size/10)
yi = np.linspace(lats.min(),lats.max(),lats.size/10)
Z = mlab.griddata(lons,lats,winds_s,xi,yi)
X,Y = np.meshgrid(xi,yi)
mydpi = 96
fig = plt.figure(frameon=True)
fig.set_size_inches(1600/mydpi,1200/mydpi)
ax = plt.Axes(fig,[0,0,1,1])
#ax.set_axis_off()
fig.add_axes(ax)
plt.hold(True);
plt.quiver(lons[::5],lats[::5],windu[::5],windv[::5],linewidths=0)
for method in (ax.set_xticks,ax.set_xticklabels,ax.set_yticks,ax.set_yticklabels):
    method([])
fig.savefig('/home/cendas/bin/python/bufr_ascat.png',bbox_inches=0,dpi=5*mydpi)
mydpi = 96
fig = plt.figure(frameon=True)
fig.set_size_inches(1600/mydpi,1200/mydpi)
ax = plt.Axes(fig,[0,0,1,1])
#ax.set_axis_off()
fig.add_axes(ax)
plt.hold(True);
try:
    plt.pcolormesh(X,Y,Z,alpha=None)
    plt.clim(0,10)
except ValueError:
    pass
    print "Warning: Empty data array."
for method in (ax.set_xticks,ax.set_xticklabels,ax.set_yticks,ax.set_yticklabels):
    method([])
fig.savefig('/home/cendas/bin/python/bufr_ascat_color.png',bbox_inches=0,dpi=5*mydpi)
I then usually follow this python code with the following terminal commands to combine the images:
convert bufr_ascat.png -transparent white bufr_ascat.png
convert bufr_ascat_color.png -transparent white bufr_ascat_color.png
composite bufr_ascat.png bufr_ascat_color.png bufrascat.png
Don't abuse clustering for this.
What you need is a simple selection / filtering; not a structure discovery process.
Take the mean of the masked data as the dividing line: all non-masked data to the left of that mean is one band, and all non-masked data to the right is the other.
Clustering is the wrong tool for this task.
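A minimal sketch of that selection in plain Python, under two assumptions that are mine and not stated in the question: that "the mean of the masked data" can be read as the mean longitude of the masked samples, and that the satellite track runs roughly north-south in the plotted area, so longitude alone separates the two bands. `split_swaths` is a hypothetical helper and the numbers are made up.

```python
def split_swaths(lons, values, is_masked):
    """Split valid samples into two bands around the mean longitude of the gap."""
    gap_lons = [lon for lon, bad in zip(lons, is_masked) if bad]
    if not gap_lons:
        raise ValueError("no masked samples to locate the gap")
    centre = sum(gap_lons) / len(gap_lons)
    left, right = [], []
    for lon, val, bad in zip(lons, values, is_masked):
        if bad:
            continue  # skip invalid samples entirely
        (left if lon < centre else right).append((lon, val))
    return centre, left, right


# Made-up longitudes and wind speeds; True marks a masked (invalid) sample.
lons = [10.0, 10.5, 11.0, 12.0, 12.5, 13.0, 14.0, 14.5, 15.0]
speeds = [4.2, 5.0, 4.8, 0.0, 0.0, 0.0, 6.1, 5.9, 6.3]
masked = [False, False, False, True, True, True, False, False, False]

centre, left, right = split_swaths(lons, speeds, masked)
print(centre, len(left), len(right))
```

Each band can then be gridded and passed to pcolormesh on its own, so nothing is interpolated across the gap. For a track that is strongly inclined, the split would probably have to be done per scan line rather than with one global longitude.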