I am plotting weather model precipitation data on a map. The map is a contour fill, and over it I would like to plot text at specific grid points showing the precipitation value at that point in time, but I am struggling to do this. I have a list of lat/lon points but cannot figure out how to map them properly to the data. The data itself comes with its own set of lat/lon points that uniquely map to the values. Here's example code that I have:
import pygrib
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

fig, ax = plt.subplots()
m = Basemap(projection='lcc', lon_0=-95., llcrnrlon=-125.,
            urcrnrlon=-55., llcrnrlat=20., urcrnrlat=50.,
            lat_1=25., lat_2=46., resolution='l', area_thresh=10000, ax=ax)

grib = 'gfs.t00z.pgrb2.0p25.f084'
grbs = pygrib.open(grib)
grb = grbs.select(name='Total Precipitation', typeOfLevel='surface')[0]
precip = grb.values * 0.039370  # convert mm to inches
lat, lon = grb.latlons()
x, y = m(lon, lat)  # transform the model grid's lat/lon to map coordinates

uniquepoints = []
with open('mydata') as f:
    for line in f:
        myline = line.replace("\n", "").split(",")
        uniquepoints.append(myline)  # contains specific lat, lon points

intervals = [0.0, 0.01, 0.1, 0.25, 0.5, 0.75, 1.00, 1.25, 1.50, 1.75, 2.0, 2.50, 3.00]
obsobj = m.contourf(x, y, precip, intervals, cmap=plt.cm.jet)
I know you are supposed to use plt.text, but I can't configure it correctly so that the unique points map correctly to the precip data.
It's easy to make simple mistakes when placing points on a Basemap, because you have to convert between lat/lon and the map's projected x/y coordinate system. However, if your data is already plotting at the correct points on the map, you can reuse those positions to plot your labels at the exact same points, with some specified offset.
A pythonic example of how to do this:
lats = [x['LLHPosition'][0] for x in unit_data]
lons = [x['LLHPosition'][1] for x in unit_data]
x,y = m(lons,lats)
label_text = [x['UnitName'] for x in unit_data]
x_offsets = [2000] * len(unit_data)
y_offsets = [2000] * len(unit_data)
for label, xpt, ypt, x_offset, y_offset in zip(label_text, x, y, x_offsets, y_offsets):
    plt.text(xpt + x_offset, ypt + y_offset, label)
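Applied to your precipitation grid, a minimal sketch (assuming uniquepoints holds [lat, lon] string pairs, and reusing the m, lat, lon, and precip arrays from the question) would find the nearest model grid point for each label location and draw the value there:
import numpy as np

for pt in uniquepoints:
    plat, plon = float(pt[0]), float(pt[1])
    # nearest-neighbor search on the model's own 2D lat/lon grid
    dist = (lat - plat)**2 + (lon - plon)**2
    j, i = np.unravel_index(np.argmin(dist), dist.shape)
    # same transform that placed the contourf data
    xpt, ypt = m(lon[j, i], lat[j, i])
    plt.text(xpt, ypt, '%.2f' % precip[j, i], ha='center', fontsize=8)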
I have a mosaic tif file I made (gdalinfo output below, with some additional info on the tiles here) and have looked extensively for a function that simply returns the elevation (the z value of this mosaic) for a given lat/lon. The functions I've seen want the coordinates expressed in the mosaic's own coordinate system, but I want to query using lat/lon. Is there something about GetGeoTransform() that I'm missing to achieve this?
For instance, this example shown below:
from osgeo import gdal
import affine
import numpy as np

def retrieve_pixel_value(geo_coord, data_source):
    """Return floating-point value that corresponds to given point."""
    x, y = geo_coord[0], geo_coord[1]
    forward_transform = \
        affine.Affine.from_gdal(*data_source.GetGeoTransform())
    reverse_transform = ~forward_transform
    px, py = reverse_transform * (x, y)
    px, py = int(px + 0.5), int(py + 0.5)
    pixel_coord = px, py
    data_array = np.array(data_source.GetRasterBand(1).ReadAsArray())
    return data_array[pixel_coord[0]][pixel_coord[1]]
This gives me an out-of-bounds error, as it's likely expecting x/y coordinates (e.g. retrieve_pixel_value([153.023499, -27.468968], dataset)). I've also tried the following from here:
import rasterio

dat = rasterio.open(fname)
z = dat.read()[0]

def getval(lon, lat):
    idx = dat.index(lon, lat, precision=1E-6)
    return dat.xy(*idx), z[idx]
Is there a simple adjustment I can make so my function can query the mosaic in lat/long coords?
Much appreciated.
Driver: GTiff/GeoTIFF
Files: mosaic.tif
Size is 25000, 29460
Coordinate System is:
PROJCRS["GDA94 / MGA zone 56",
BASEGEOGCRS["GDA94",
DATUM["Geocentric Datum of Australia 1994",
ELLIPSOID["GRS 1980",6378137,298.257222101004,
LENGTHUNIT["metre",1]],
ID["EPSG",6283]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433,
ID["EPSG",9122]]]],
CONVERSION["UTM zone 56S",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",0,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",153,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.9996,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",500000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",10000000,
LENGTHUNIT["metre",1],
ID["EPSG",8807]],
ID["EPSG",17056]],
CS[Cartesian,2],
AXIS["easting",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["northing",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
Data axis to CRS axis mapping: 1,2
Origin = (491000.000000000000000,6977000.000000000000000)
Pixel Size = (1.000000000000000,-1.000000000000000)
Metadata:
AREA_OR_POINT=Area
Image Structure Metadata:
INTERLEAVE=BAND
Corner Coordinates:
Upper Left ( 491000.000, 6977000.000) (152d54'32.48"E, 27d19'48.33"S)
Lower Left ( 491000.000, 6947540.000) (152d54'31.69"E, 27d35'45.80"S)
Upper Right ( 516000.000, 6977000.000) (153d 9'42.27"E, 27d19'48.10"S)
Lower Right ( 516000.000, 6947540.000) (153d 9'43.66"E, 27d35'45.57"S)
Center ( 503500.000, 6962270.000) (153d 2' 7.52"E, 27d27'47.16"S)
Band 1 Block=25000x1 Type=Float32, ColorInterp=Gray
NoData Value=-999
Update 1 - I tried the following:
tif = r"mosaic.tif"
dataset = rio.open(tif)
d = dataset.read()[0]
def get_xy_coords(latlng):
transformer = Transformer.from_crs("epsg:4326", dataset.crs)
coords = [transformer.transform(x, y) for x,y in latlng][0]
#idx = dataset.index(coords[1], coords[0])
return coords #.xy(*idx), z[idx]
longx,laty = 153.023499,-27.468968
coords = get_elevation([(laty,longx)])
print(coords[0],coords[1])
print(dataset.width,dataset.height)
(502321.11181384244, 6961618.891167777)
25000 29460
So something is still not right. Maybe I need to subtract the coordinates of the bottom-left corner of the image, e.g.
coords[0]-dataset.bounds.left,coords[1]-dataset.bounds.bottom
where
In [78]: dataset.bounds
Out[78]: BoundingBox(left=491000.0, bottom=6947540.0, right=516000.0, top=6977000.0)
Update 2 - Indeed, subtracting the corners of my box seems to get closer... though I'm sure there is a much nicer way of getting what I want just using the tif metadata.
longx,laty = 152.94646, -27.463175
coords = get_xy_coords([(laty,longx)])
elevation = d[int(coords[1]-dataset.bounds.bottom),int(coords[0]-dataset.bounds.left)]
fig,ax = plt.subplots(figsize=(12,12))
ax.imshow(d,vmin=0,vmax=400,cmap='terrain',extent=[dataset.bounds.left,dataset.bounds.right,dataset.bounds.bottom,dataset.bounds.top])
ax.plot(coords[0],coords[1],'ko')
plt.show()
You basically have two distinct steps:
Convert lon/lat coordinates to map coordinates; this is only necessary if your input raster is not already in lon/lat. Map coordinates are the coordinates in the projection that the raster itself uses.
Convert the map coordinates to pixel coordinates.
There are all kinds of tools you might use to make things simpler (like pyproj, rasterio, etc.). But for such a simple case it's probably instructive to start by doing it all in GDAL, which also clarifies what steps are needed.
Inputs
from osgeo import gdal, osr
raster_file = r'D:\somefile.tif'
lon = 153.023499
lat = -27.468968
lon/lat to map coordinates
# fetch metadata required for transformation
ds = gdal.OpenEx(raster_file)
raster_proj = ds.GetProjection()
gt = ds.GetGeoTransform()
ds = None # close file, could also keep it open till after reading
# coordinate transformation (lon/lat to map)
# define source projection
# this definition ensures the order is always lon/lat compared
# to EPSG:4326 for which it depends on the GDAL version (2 vs 3)
source_srs = osr.SpatialReference()
source_srs.ImportFromWkt(osr.GetUserInputAsWKT("urn:ogc:def:crs:OGC:1.3:CRS84"))
# define target projection based on the file
target_srs = osr.SpatialReference()
target_srs.ImportFromWkt(raster_proj)
# convert
ct = osr.CoordinateTransformation(source_srs, target_srs)
mapx, mapy, *_ = ct.TransformPoint(lon, lat)
You could verify this intermediate result, for example by adding it as point WKT in something like QGIS (using the QuickWKT plugin, and making sure the viewer uses the same projection as the raster).
map coordinates to pixel coordinates
# apply affine transformation to get pixel coordinates
gt_inv = gdal.InvGeoTransform(gt) # invert for map -> pixel
px, py = gdal.ApplyGeoTransform(gt_inv, mapx, mapy)
# it will return fractional pixel coordinates, so convert to int
# before using them to read; round to nearest with +0.5
py = int(py + 0.5)
px = int(px + 0.5)
# read pixel data
ds = gdal.OpenEx(raster_file) # open file again
elevation_value = ds.ReadAsArray(px, py, 1, 1)
ds = None
The elevation_value variable should be the value you're after. I would definitely verify the result independently; try a few points in QGIS or with the gdallocationinfo utility:
gdallocationinfo -l_srs "urn:ogc:def:crs:OGC:1.3:CRS84" filename.tif 153.023499 -27.468968
# Report:
# Location: (4228P,4840L)
# Band 1:
# Value: 1804.51879882812
If you're reading a lot of points, there will be some threshold at which it would be faster to read a large chunk and extract the values from that array, compared to reading every point individually.
edit:
For applying the same workflow on multiple points at once a few things change.
So for example having the inputs:
lats = np.array([-27.468968, -27.468968, -27.468968])
lons = np.array([153.023499, 153.023499, 153.023499])
The coordinate transformation needs to use ct.TransformPoints instead of ct.TransformPoint which also requires the coordinates to be stacked in a single array of shape [n_points, 2]:
coords = np.stack([lons.ravel(), lats.ravel()], axis=1)
mapx, mapy, *_ = np.asarray(ct.TransformPoints(coords)).T
# reshape in case of non-1D inputs
mapx = mapx.reshape(lons.shape)
mapy = mapy.reshape(lons.shape)
Converting from map to pixel coordinates changes because the GDAL method for this only takes a single point. But doing it manually on the arrays would be:
px = gt_inv[0] + mapx * gt_inv[1] + mapy * gt_inv[2]
py = gt_inv[3] + mapx * gt_inv[4] + mapy * gt_inv[5]
And rounding the arrays to integer changes to:
px = (px + 0.5).astype(np.int32)
py = (py + 0.5).astype(np.int32)
If the raster (easily) fits in memory, reading all points would become:
ds = gdal.OpenEx(raster_file)
all_elevation_data = ds.ReadAsArray()
ds = None
elevation_values = all_elevation_data[py, px]
That last step could be optimized by checking highest/lowest pixel coordinates in both dimensions and only read that subset for example, but it would require normalizing the coordinates again to be valid for that subset.
The py and px arrays might also need to be clipped (e.g. with np.clip) if the input coordinates fall outside the raster. In that case the pixel coordinates will be < 0 or >= xsize/ysize.
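A minimal sketch of that guard, assuming the px, py, and all_elevation_data arrays from above, where xsize/ysize stand in for the raster dimensions (e.g. ds.RasterXSize and ds.RasterYSize, fetched while the file was still open):
# flag points whose pixel coordinates actually fall inside the raster
valid = (px >= 0) & (px < xsize) & (py >= 0) & (py < ysize)
# clip so the fancy indexing itself never goes out of bounds
px_c = np.clip(px, 0, xsize - 1)
py_c = np.clip(py, 0, ysize - 1)
elevation_values = all_elevation_data[py_c, px_c]
# mask out values that were sampled at clipped (out-of-raster) positions
elevation_values = np.where(valid, elevation_values, np.nan)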
Two sections of my code are giving me trouble. I am trying to get the basemap created in this first section here:
#Basemap
epsg = 6060; width = 2000.e3; height = 2000.e3 #epsg 3413. 6062
m=Basemap(epsg=epsg,resolution='l',width=width,height=height) #lat_ts=(90.+35.)/2.
m.drawcoastlines(color='white')
m.drawmapboundary(fill_color='#99ffff')
m.fillcontinents(color='#cc9966',lake_color='#99ffff')
m.drawparallels(np.arange(10,70,20),labels=[1,1,0,0])
m.drawmeridians(np.arange(-100,0,20),labels=[0,0,0,1])
plt.title('ICESAT2 Tracks in Greenland')
plt.figure(figsize=(20,10))
Then my next section is meant to plot the data it's getting from a file and draw these tracks on top of the Basemap. Instead, it creates a new plot entirely. I have tried changing the second plt.scatter call to go through Basemap, such as m.scatter, but then it only returns "RuntimeError: Can not put single artist in more than one figure".
Any ideas on how to get this next section of code onto the basemap? Here is the next section; focus on the end to see where it is plotting.
icesat2_data[track] = dict()  # creates a sub-dictionary, track
icesat2_data[track][year+month+day] = dict()  # and one layer more for the date under the whole icesat2_data dictionary
icesat2_data[track][year+month+day] = dict.fromkeys(lasers)
for laser in lasers:  # for loop, access all the gt1l, 2l, 3l
    if laser in f:
        lat = f[laser]["land_ice_segments"]["latitude"][:]  # data for a particular laser's latitude
        lon = f[laser]["land_ice_segments"]["longitude"][:]  # data for a laser's longitude
        height = f[laser]["land_ice_segments"]["h_li"][:]  # data for a laser's height
        quality = f[laser]["land_ice_segments"]["atl06_quality_summary"][:].astype('int')
        # Quality filter
        idx1 = quality == 0
        #print('idx1', idx1)
        # Spatial filter
        idx2 = np.logical_and(np.logical_and(lat >= lat_min, lat <= lat_max), np.logical_and(lon >= lon_min, lon <= lon_max))
        # combine the quality and spatial filters; if empty, all data failed a test (low quality or outside box)
        idx = np.where(np.logical_and(idx1, idx2))
        # store data: creates an empty dictionary with the keys 'lat', 'lon', 'height'
        icesat2_data[track][year+month+day][laser] = dict.fromkeys(['lat', 'lon', 'height'])
        icesat2_data[track][year+month+day][laser]['lat'] = lat[idx]  # keep only good-quality points inside the bounding box
        icesat2_data[track][year+month+day][laser]['lon'] = lon[idx]
        icesat2_data[track][year+month+day][laser]['height'] = height[idx]
        if lat[idx].any() and lon[idx].any():
            x, y = transformer.transform(icesat2_data[track][year+month+day][laser]['lon'],
                                         icesat2_data[track][year+month+day][laser]['lat'])
            plt.scatter(x, y, marker='o', color='#000000')
Currently, they output as two separate figures, the basemap and the scatter plot.
Not sure if you're still working on this, but here's a quick example I put together that you might be able to adapt (obviously I don't have the data you're working with). A couple of things that might not be self-explanatory: I used m() to transform the coordinates to map coordinates. This is Basemap's built-in transformation method, so you don't have to use pyproj. Also, setting a zorder in the scatter call ensures that your points are plotted above the continents layer and don't get hidden underneath.
#Basemap
epsg = 6060; width = 2000.e3; height = 2000.e3 #epsg 3413. 6062
plt.figure(figsize=(20,10))
m=Basemap(epsg=epsg,resolution='l',width=width,height=height) #lat_ts=(90.+35.)/2.
m.drawcoastlines(color='white')
m.drawmapboundary(fill_color='#99ffff')
m.fillcontinents(color='#cc9966',lake_color='#99ffff')
m.drawparallels(np.arange(10,70,20),labels=[1,1,0,0])
m.drawmeridians(np.arange(-100,0,20),labels=[0,0,0,1])
plt.title('ICESAT2 Tracks in Greenland')
for coord in [[68, -39], [70, -39]]:
    lat = coord[0]
    lon = coord[1]
    x, y = m(lon, lat)
    m.scatter(x, y, color='red', s=100, zorder=10)
plt.show()
I think you might need:
plt.figure(figsize=(20,10))
before creating the basemap, not after. As it stands, your code creates the map and then opens a new figure after it, which is why you're getting two figures.
Then your plotting line should be m.scatter() as you mentioned you tried before.
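A minimal sketch of that reordering, reusing the question's own setup (x and y here are assumed to be the transformed track coordinates from the question's loop):
plt.figure(figsize=(20,10))  # create the figure first
m = Basemap(epsg=6060, resolution='l', width=2000.e3, height=2000.e3)
m.drawcoastlines(color='white')
# ... remaining map setup (boundary, continents, parallels, meridians) ...
m.scatter(x, y, marker='o', color='#000000', zorder=10)  # draws onto the same figure
plt.show()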
I have a .dat file containing a list of coordinates (~100k) and a temperature at each coordinate. It has a structure like this:
-59.083 -26.583 0.2
-58.417 -26.250 0.6
-58.412 -26.417 0.4
...
To visually display the temperature ranges, I created a numpy array and plotted the datasets using the Basemap module for Python. The code I wrote is the following:
from matplotlib import pyplot as plt
from mpl_toolkits.basemap import Basemap
import numpy as np
m = Basemap(projection='mill',llcrnrlat=-90,urcrnrlat=90,\
llcrnrlon=-180,urcrnrlon=180,resolution='c')
m.drawcoastlines(linewidth=0.15)
data = np.loadtxt('gridly.dat')
xcoordlist = []
ycoordlist = []
tempvallist = []
for i in data:
    xcoord = i[0]
    ycoord = i[1]
    tempval = i[2]
    xcoord2 = xcoord * 111139  # multiplying converts each coordinate's degrees to meters
    ycoord2 = ycoord * 111139
    xcoordlist.append(xcoord2)
    ycoordlist.append(ycoord2)
    tempvallist.append(tempval)
xco = np.array(xcoordlist)
yco = np.array(ycoordlist)
tval = np.array(tempvallist)
gridsize = 100
m.hexbin(yco, xco, C=tval, gridsize=gridsize)
cb = m.colorbar()
plt.show()
When I plot the data, I get almost exactly what I want; however, the hexagonal heatmap is offset for some reason.
I've been searching online for what might be wrong but unfortunately couldn't find answers or troubleshoot. Does anyone know how I can fix this issue?
After hours of digging around, I finally figured it out! What was wrong with my code was that I was trying to manually convert the geographic coordinates into plot coordinates (by multiplying by 111139).
While the logic for doing this makes sense, I believe it broke down when I began to plot the data onto different kinds of charts (orthographic, Miller projection, etc.), because different projections place the same geographic point at different plot coordinates (kind of like how the pixel locations on one computer screen may not align with the pixel locations on a different screen).
Instead, the Basemap instance has a built-in function that converts real-world coordinates into plottable map coordinates for you: m(lon, lat).
So, the improved and correct script would be:
from matplotlib import pyplot as plt
from mpl_toolkits.basemap import Basemap
import numpy as np
m = Basemap(projection='mill',llcrnrlat=-90,urcrnrlat=90,\
llcrnrlon=-180,urcrnrlon=180,resolution='c')
m.drawcoastlines(linewidth=0.15)
data = np.loadtxt('gridly.dat')
xcoordlist = []
ycoordlist = []
tempvallist = []
for i in data:
    lat = i[0]
    lon = i[1]
    tempval = i[2]
    xpt, ypt = m(lon, lat)
    xcoordlist.append(xpt)
    ycoordlist.append(ypt)
    tempvallist.append(tempval)
xco = np.array(xcoordlist)
yco = np.array(ycoordlist)
tval = np.array(tempvallist)
gridsize = 100
m.hexbin(xco, yco, C=tval, gridsize=gridsize)
cb = m.colorbar()
plt.show()
As you can see in the line xpt, ypt = m(lon, lat), the function converts the real-world longitudes (lon) and latitudes (lat) from the .dat file into plottable points. Hope this helps anyone else who may run into this problem in the future!
I would like to plot in 3D with Pandas / Matplotlib (wireframe or anything else, I don't mind), but in a specific way.
I'm using RFID sensors and I'm trying to record the signal I receive at different distances and different angles, and I want to see how the signal varies with both the distance and the angle.
So that's why I want to plot in 3D:
X axis -> the distance, Y axis -> the angle, Z axis -> the received signal (a float)
My CSV file, from which I generate my DataFrame, is organized as a two-way table like this:
Distance;0;23;45;90;120;180
0;-53.145;-53.08;-53.1;-53.035;-53.035;-53.035
5;-53.145;-53.145;-53.05;-53.145;-53.145;-53.145
15;-53.145;-53.145;-53.145;-53.145;-53.145;-53.145
25;-53.145;-52.145;-53.145;-53.002;-53.145;-53.145
40;-53.145;-53.002;-51.145;-53.145;-54.255;-53.145
60;-53.145;-53.145;-53.145;-53.145;-53.145;-53.145
80;-53.145;-53.145;-53.145;-53.145;-60;-53.145
100;-53.145;-52;-53.145;-54;-53.145;-53.145
120;-53.145;-53.145;-53.145;-53.145;-53.002;-53.145
140;-51.754;-53.145;-51.845;-53.145;-53.145;-53.145
160;-53.145;-53.145;-49;-53.145;-53.145;-53.145
180;-53.145;-53.145;-53.145;-53.145;-53.145;-53.002
200;-53.145;-53.145;-53.145;-53.145;-53.145;-53.145
On the first header row are the different angles: 0°, 23°, 45°, ...
And the index of the DataFrame is the distance: 0 cm, 15 cm, ...
The matrix inside holds the signal values, i.e. the Z axis.
But I do not know how to generate a 3D scatter, wireframe, etc., because in every tutorial I see, people use specific columns as axes.
Indeed, in my CSV file the first row holds the labels of all the columns:
Distance;0 ;23 ;45 ;90 ;120;180
And I do not know how to generate a 3D plot from such a two-way table.
Do you know how to do it? Or how to generate my CSV file in a better way, to get the same result in the end?
I would be grateful if you could help me with this!
Thank you!
Maybe a contour plot is enough:
import numpy as np
import matplotlib.pyplot as plt

b = np.array([0, 5, 15, 25, 40, 60, 80, 100, 120, 140, 160, 180, 200])  # distances
a = np.array([0, 23, 45, 90, 120, 180])  # angles
x, y = np.meshgrid(b, a)  # X axis = distance, Y axis = angle
z = np.random.randint(-50, -40, x.shape)  # stand-in signal values
scm = plt.contourf(x, y, z, cmap='inferno')
plt.colorbar(scm)
plt.xticks(b)
plt.yticks(a)
plt.xlabel('Distance')
plt.ylabel('Angle')
plt.show()
This displays a filled contour plot of the (random) signal over distance and angle.
You can get a contour plot with something like this (though for the data shown it is not very interesting, since nearly all the values are constant at about -53):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# path to a CSV containing the data from the question (assumed filename)
df = pd.read_csv('data.csv', sep=';')
df = df.set_index('Distance')
x = df.index
y = df.columns.astype(int)
z = df.values
X, Y = np.meshgrid(x, y)
Z = z.T
plt.contourf(X, Y, Z, cmap='jet')
plt.colorbar()
plt.show()
Welcome to Stack Overflow! Your question can be split into several steps:
Step 1 - read the data
I have stored your data in a file called data.txt.
I don't know Pandas very well, but this can also be handled with NumPy's nice and simple loadtxt function. Your data is a bit problematic because of the text 'Distance' in the first row and first column, but don't panic: we load the file as a matrix of strings:
raw_data = np.loadtxt('data.txt', delimiter=';', dtype=str)
Step 2 - transform the raw data
To extract the wanted data from the raw data we can do the following:
angle = raw_data[0 , 1:].astype(float)
distance = raw_data[1:, 0 ].astype(float)
data = raw_data[1:, 1:].astype(float)
By indexing the raw data we select the parts we want, and with astype we convert the string values to numbers.
Intermediate step - making the data a bit fancier
Your data was a bit boring, nearly the same value everywhere, so I took the liberty of making it a bit fancier:
data = (50 + angle[np.newaxis,:]) / (10 + np.sqrt(distance[:,np.newaxis]))
Step 3 - make a wireframe plot
The example at matplotlib.org looks easy enough:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_wireframe(X, Y, Z)
plt.show()
But the trick is to get the X, Y, Z parameters right...
Step 4 - make the X and Y data
The Z data is simply our data values:
Z = data
X and Y should also be 2D arrays, such that plot_wireframe can find the x and y for each value of Z at the same locations in the 2D arrays X and Y. There is a NumPy function to create these 2D arrays:
X, Y = np.meshgrid(angle, distance)
Step 5 - fancying it up a bit
ax.set_xticks(angle)
ax.set_yticks(distance[::2])
ax.set_xlabel('angle')
ax.set_ylabel('distance')
Putting it together
All steps together in the right order:
# necessary imports...
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
import numpy as np
raw_data = np.loadtxt('data.txt', delimiter=';', dtype=str)
angle = raw_data[0 , 1:].astype(float)
distance = raw_data[1:, 0 ].astype(float)
data = raw_data[1:, 1:].astype(float)
# make the example data a bit more interesting...
data = (50 + angle[np.newaxis,:]) / (10 + np.sqrt(distance[:,np.newaxis]))
# setting up the plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# the tricky part: creating the data that plot_wireframe wants
Z = data
X, Y = np.meshgrid(angle, distance)
ax.plot_wireframe(X, Y, Z)
# fancying it up a bit
ax.set_xticks(angle)
ax.set_yticks(distance[::2])
ax.set_xlabel('angle')
ax.set_ylabel('distance')
# and showing the plot ...
plt.show()
I am currently working with BUFR files with wind data. When I read such a file in Python I get 4 large vectors: a latitude vector, a longitude vector, a wind_direction vector, and a wind_speed vector.
Both wind vectors are masked Python arrays because there is non-valid data. This happens because the data comes from a non-geostationary satellite. In fact, I successfully generated an image from this BUFR file to show you the general shape that the data takes.
In this image I have plotted a color field to represent the wind speed, while the arrows obviously represent the wind direction.
Please notice the two bands of actual data. Unfortunately, the way I am plotting the data generates a third band (where the color field is smooth) in between the actual data bands. This is an artefact of the function pcolormesh. If I could superimpose two pcolormesh plots, each one representing one of the bands, this problem would disappear.
Unfortunately, I do not know how I could separate the data "regions". I have thought about clustering techniques, but I don't know how to cluster the lat/lon data using ANOTHER array (the wind data) as the clustering rule.
This is my current code:
#!/usr/bin/python
import bufr
import numpy as np
import sys
import matplotlib
matplotlib.use('Agg')
from matplotlib import pyplot as plt
from matplotlib import mlab
WIND_DIR_INDEX = 97
WIND_SPEED_INDEX = 96
bfrfile = sys.argv[1]
print bfrfile
bfr = bufr.BUFRFile(bfrfile)
lon = []
lat = []
wind_d = []
wind_s = []
for record in bfr:
    for entry in record:
        if entry.index == WIND_DIR_INDEX:
            wind_d.append(entry.data)
        if entry.index == WIND_SPEED_INDEX:
            wind_s.append(entry.data)
        if entry.name.find("LONGITUDE") == 0:
            lon.append(entry.data)
        if entry.name.find("LATITUDE") == 0:
            lat.append(entry.data)
lons = np.concatenate(lon)
lats = np.concatenate(lat)
winds_d = np.concatenate(wind_d)
winds_s = np.concatenate(wind_s)
winds_d = np.ma.masked_greater(winds_d,1.0e+6)
winds_s = np.ma.masked_greater(winds_s,1.0e+6)
windu = np.cos((winds_d-180)*(np.pi/180))
windv = np.sin((winds_d-180)*(np.pi/180))
# Data interpolation for pcolormesh (needs gridded data)
xi = np.linspace(lons.min(),lons.max(),lons.size/10)
yi = np.linspace(lats.min(),lats.max(),lats.size/10)
Z = mlab.griddata(lons,lats,winds_s,xi,yi)
X,Y = np.meshgrid(xi,yi)
mydpi = 96
fig = plt.figure(frameon=True)
fig.set_size_inches(1600/mydpi,1200/mydpi)
ax = plt.Axes(fig,[0,0,1,1])
#ax.set_axis_off()
fig.add_axes(ax)
plt.hold(True);
plt.quiver(lons[::5],lats[::5],windu[::5],windv[::5],linewidths=0)
for method in (ax.set_xticks, ax.set_xticklabels, ax.set_yticks, ax.set_yticklabels):
    method([])
fig.savefig('/home/cendas/bin/python/bufr_ascat.png',bbox_inches=0,dpi=5*mydpi)
mydpi = 96
fig = plt.figure(frameon=True)
fig.set_size_inches(1600/mydpi,1200/mydpi)
ax = plt.Axes(fig,[0,0,1,1])
#ax.set_axis_off()
fig.add_axes(ax)
plt.hold(True);
try:
    plt.pcolormesh(X,Y,Z,alpha=None)
    plt.clim(0,10)
except ValueError:
    print "Warning: Empty data array."
for method in (ax.set_xticks, ax.set_xticklabels, ax.set_yticks, ax.set_yticklabels):
    method([])
fig.savefig('/home/cendas/bin/python/bufr_ascat_color.png',bbox_inches=0,dpi=5*mydpi)
I then usually follow this python code with the following terminal commands to combine the images:
convert bufr_ascat.png -transparent white bufr_ascat.png
convert bufr_ascat_color.png -transparent white bufr_ascat_color.png
composite bufr_ascat.png bufr_ascat_color.png bufrascat.png
Don't abuse clustering for this.
What you need is a simple selection / filtering; not a structure discovery process.
Take the mean position of the masked (invalid) data. All non-masked data on one side of that mean belongs to one band, and all non-masked data on the other side belongs to the other.
Clustering is the wrong tool for this task.
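A minimal sketch of that split, assuming the lons, lats, and masked winds_s arrays from the question, and that the two bands are separated in longitude (use latitude instead if your bands run the other way); each band then gets its own gridding and pcolormesh, so nothing is interpolated across the gap:
# mean longitude of the masked (invalid) samples marks the gap between bands
gap_lon = lons[winds_s.mask].mean()
valid = ~winds_s.mask
left_band = valid & (lons < gap_lon)
right_band = valid & (lons >= gap_lon)
for band in (left_band, right_band):
    # grid and draw each band separately, reusing mlab.griddata from the question
    xi = np.linspace(lons[band].min(), lons[band].max(), lons[band].size // 10)
    yi = np.linspace(lats[band].min(), lats[band].max(), lats[band].size // 10)
    Z = mlab.griddata(lons[band], lats[band], winds_s[band], xi, yi)
    X, Y = np.meshgrid(xi, yi)
    plt.pcolormesh(X, Y, Z, alpha=None)
    plt.clim(0, 10)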