I'm new to Python and to plotting data with Matplotlib. I would really appreciate some help, and thank you in advance for any answers.
So, I have a netCDF file with v-component of wind data. Grid coordinates: points=9600 (240x40)
lon : 0 to 358.5 by 1.5 degrees_east circular
lat : 88.5 to 30 by -1.5 degrees_north
My code is:
import numpy as np
import matplotlib
matplotlib.use('Agg')
from netCDF4 import Dataset
from matplotlib import pyplot as plt
from mpl_toolkits.basemap import Basemap
#read data from NETcdf file ".nc"
my_file = '/home/Era-Interim/NH-EraInt-1979.nc'
fh = Dataset(my_file, mode='r')
lons = fh.variables['lon'][:]
lats = fh.variables['lat'][:]
V = fh.variables['V'][:]
V_units = fh.variables['V'].units
fh.close()
# create figure
fig = plt.figure(figsize=(20,20))
# create a map
m = Basemap(projection='nplaea',boundinglat=30,lon_0=10,resolution='l',round=True)
#draw parallels, meridians, coastlines, countries, mapboundary
m.drawcoastlines(linewidth=0.5)
m.drawcountries(linewidth=0.5)
#m.drawmapboundary(linewidth=2)
m.drawparallels(np.arange(30,90,20), labels=[1,1,0,0]) # parallels every 20 degrees, labelled left and right
m.drawmeridians(np.arange(0,360,30), labels=[1,1,1,1]) # meridians every 30 degrees, labelled on all four sides
#Plot the data on top of the map
lon,lat = np.meshgrid(lons,lats)
x,y = m(lon,lat)
cs = m.pcolor(x,y,np.squeeze(V),cmap=plt.cm.RdBu_r)
plt.title("", fontsize=25, verticalalignment='baseline')
plt.savefig("/home/Era-Interim/1.png")
As a result, I received a map (you can find in my dropbox folder) https://www.dropbox.com/sh/nvy8wcodk9jtat0/AAC-omkPP8_7uINSSXbzImeja?dl=0
On the map, there are white pixels between 358.5 and 0 (360) lon, because I have no data between 358.5 and 0 (360) lon.
The question is: how can I change the size of the grid, regrid it, interpolate data, or something else in order to not have this white sector?
I have found a solution. At the beginning of the script, you must add
from mpl_toolkits.basemap import Basemap, addcyclic
and then, further down (Q is my data array here; it plays the role of V in the script above):
datain, lonsin = addcyclic(np.squeeze(Q), lons)
lons, Q = m.shiftdata(lonsin, datain = np.squeeze(Q), lon_0=180.)
print(lons)
lon, lat = np.meshgrid(lons, lats)
x,y = m(lon, lat)
cs = m.pcolor(x,y,datain,cmap=plt.cm.RdBu_r)
The difference can be seen in the figures (I still can not post images).
https://www.dropbox.com/sh/nvy8wcodk9jtat0/AAC-omkPP8_7uINSSXbzImeja?dl=0
I think some kind of interpolation technique can be applied in this case.
Check this out. There was a similar problem.
Hope it is useful.
The simple answer is that 360 degrees is the same as 0 degrees, so you can copy the 0 degrees data to 360 and it should look right. I may be interpreting this wrong, though: I believe each value represents the field at a grid point, not between two points (i.e. at zero degrees, not between zero degrees and 1.5 degrees).
Under my interpretation, yes, you don't have data between 358.5 and 0, but you also don't have data between 357 and 358.5. This seems more likely than the file just skipping an area. It would mean that the data at 358.5 should touch the data at 0, since the two are exactly as far apart as 0 and 1.5, which do touch.
Copying the 0 degrees column would also let you change your m.pcolor call to an imshow call (as in Roman Dryndik's link) and use interpolation to smooth out the plot.
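For illustration, here is a minimal NumPy-only sketch of the "copy the 0 degrees data" idea. It assumes the field is a plain 2-D array shaped (nlat, nlon) with longitudes running from 0 to 358.5. The helper name add_cyclic_column and the random test field are made up for this example and are not part of Basemap.

```python
import numpy as np

def add_cyclic_column(field, lons):
    """Append the first longitude column again at lon + 360 (hypothetical helper)."""
    field = np.asarray(field)
    lons = np.asarray(lons, dtype=float)
    # Repeat the column belonging to the first longitude as a new last column.
    wrapped_field = np.concatenate([field, field[:, :1]], axis=1)
    # The matching longitude is the first one shifted by a full circle.
    wrapped_lons = np.append(lons, lons[0] + 360.0)
    return wrapped_field, wrapped_lons

lons = np.arange(0.0, 360.0, 1.5)     # 240 points, 0 .. 358.5
lats = np.arange(88.5, 29.9, -1.5)    # 40 points, 88.5 .. 30
field = np.random.rand(lats.size, lons.size)  # stand-in for the real data

wrapped_field, wrapped_lons = add_cyclic_column(field, lons)
print(wrapped_field.shape, wrapped_lons[-1])
```

The wrapped arrays can then be passed through np.meshgrid and m(lon, lat) as before. Note that np.asarray drops the mask of a masked array, so for masked netCDF data np.ma.concatenate is probably the safer call.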
I have a .dat file containing a list of coordinates (~100k) and a temperature at each coordinate. It has a structure like this:
-59.083 -26.583 0.2
-58.417 -26.250 0.6
-58.412 -26.417 0.4
...
To visually display the temperature ranges, I created a numpy array and plotted the datasets using the Basemap module for Python. The code I wrote is the following:
from matplotlib import pyplot as plt
from mpl_toolkits.basemap import Basemap
import numpy as np
m = Basemap(projection='mill',llcrnrlat=-90,urcrnrlat=90,\
llcrnrlon=-180,urcrnrlon=180,resolution='c')
m.drawcoastlines(linewidth=0.15)
data = np.loadtxt('gridly.dat')
xcoordlist = []
ycoordlist = []
tempvallist = []
for i in data:
    xcoord = i[0]
    ycoord = i[1]
    tempval = i[2]
    xcoord2 = xcoord*111139 #<--- Multiplying converts each coordinate's degrees to meters
    ycoord2 = ycoord*111139
    xcoordlist.append(xcoord2)
    ycoordlist.append(ycoord2)
    tempvallist.append(tempval)
xco = np.array(xcoordlist)
yco = np.array(ycoordlist)
tval = np.array(tempvallist)
gridsize = 100
m.hexbin(yco, xco, C=tval, gridsize=gridsize)
cb = m.colorbar()
plt.show()
When I plot the data, I'm getting almost exactly what I want, however, the hexagonal heatmap is offset for some reason, giving me the following chart:
I've been searching online for what might be wrong but unfortunately couldn't find answers or troubleshoot. Does anyone know how I can fix this issue?
After hours of digging around, I finally figured it out! What was wrong with my code was that I was trying to manually convert the geographic coordinates into point coordinates for the displaying chart (by multiplying by 111139).
While the logic for doing this makes sense, I believe the process broke down when I began to plot the data onto different kinds of charts (e.g. orthographic, Miller projection, etc.), because different projections/charts have different point coordinates (much as the pixel locations on your computer screen may not align with the pixel locations on a different screen).
Instead, the Basemap module has a built-in function that will convert real-world coordinates into coordinates that can be plotted on the chart, for you: m(x, y).
So, the improved and correct script would be:
from matplotlib import pyplot as plt
from mpl_toolkits.basemap import Basemap
import numpy as np
m = Basemap(projection='mill',llcrnrlat=-90,urcrnrlat=90,\
llcrnrlon=-180,urcrnrlon=180,resolution='c')
m.drawcoastlines(linewidth=0.15)
data = np.loadtxt('gridly.dat')
xcoordlist = []
ycoordlist = []
tempvallist = []
for i in data:
    lat = i[0]
    lon = i[1]
    tempval = i[2]
    xpt, ypt = m(lon, lat)
    xcoordlist.append(xpt)
    ycoordlist.append(ypt)
    tempvallist.append(tempval)
xco = np.array(xcoordlist)
yco = np.array(ycoordlist)
tval = np.array(tempvallist)
gridsize = 100
m.hexbin(xco, yco, C=tval, gridsize=gridsize)
cb = m.colorbar()
plt.show()
As you can see where it says xpt, ypt = m(lon, lat), the function converts the real-world longitudes (lon) and latitudes (lat) from the .dat file into plottable points. Hope this helps anyone else who may have this problem in the future!
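To make the offset a bit more concrete, here is a rough NumPy-only sketch of a Miller cylindrical projection whose origin sits at the lower-left corner of the map, which is how I understand Basemap's map coordinates to be laid out. It is not Basemap's actual implementation: the function miller_xy and the sphere radius R are assumptions made for this illustration.

```python
import numpy as np

R = 6370997.0  # sphere radius in metres (assumed value for this sketch)

def miller_xy(lon, lat, llcrnrlon=-180.0, llcrnrlat=-90.0):
    """Miller cylindrical x, y in metres, measured from the lower-left corner."""
    lam = np.radians(np.asarray(lon, dtype=float))
    phi = np.radians(np.asarray(lat, dtype=float))

    def raw_y(p):
        # Standard Miller formula: y = R * 5/4 * ln(tan(pi/4 + 2*phi/5))
        return R * 1.25 * np.log(np.tan(np.pi / 4 + 0.4 * p))

    x = R * (lam - np.radians(llcrnrlon))
    y = raw_y(phi) - raw_y(np.radians(llcrnrlat))
    return x, y

lat, lon = -59.083, -26.583                    # first row of the sample data
naive_x, naive_y = lon * 111139, lat * 111139  # the manual conversion
x, y = miller_xy(lon, lat)

print(naive_x, naive_y)  # negative: left of and below the map origin
print(x, y)              # positive: inside the map
```

The manual conversion produces negative coordinates for southern and western points, while the projected coordinates are always measured from the map corner, which would explain why the hexbin layer ended up shifted relative to the coastlines.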
I am new to using python for scientific data so apologies in advance if anything is unclear. I have a netCDF4 file with multiple variables including latitude, longitude and density. I am trying to plot the variable density on a matplotlib basemap using only density values from coordinates between 35-40 N and 100-110 W.
import numpy as np
import netCDF4 as nc
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
in: f = nc.Dataset('C:\\Users\\mdc\\data\\density.nc', 'r')
in: f.variables['latitude'].shape
out:(120000,)
(the variables longitude and density have the same shape)
I am stuck trying to find a way to extract only the latitude and longitude coordinate pairs (and their associated density values) that fit the criteria of [35 < lat < 40 & -110 < lon < -100]. Any advice on how to do this would be appreciated.
I have tried extracting each of the relevant variables and compiling them into a 2d-array but I have not figured out how to select only the data I need.
lats = f.variables['latitude'][:]
lons = f.variables['longitude'][:]
dens = f.variables['density'][:]
combined = np.vstack((lats,lons,dens))
in: combined
out: array([[ -4.14770737e+01, -3.89834557e+01, -3.86000137e+01, ...,
4.34283943e+01, 4.37634315e+01, 4.40338402e+01],
[ 1.75510895e+02, 1.74857147e+02, 1.74742798e+02, ...,
7.83558655e+01, 7.81687775e+01, 7.80410919e+01],
[ 7.79418945e-02, 7.38342285e-01, 9.94934082e-01, ...,
5.60119629e-01, -1.60522461e-02, 5.52429199e-01]], dtype=float32)
As for plotting I am trying to plot the coordinate pairs by different colors, rather than sizes, according to their density value.
m = Basemap(projection='robin', resolution='i', lat_0 = 37, lon_0 = -105)
m.drawcoastlines()
for lats,lons,dens in zip(lats,lons,dens):
    x,y = m(lats,lons)
    size = dens*3
    m.plot(x,y, 'r', markersize=size)
plt.show()
The data selection, using pandas (can't install netCDF here, sorry, and pandas is satisfactory):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
tinyd = pd.DataFrame(np.array(
[[ -4.14770737e+01, -3.89834557e+01, -3.86000137e+01,
4.34283943e+01, 4.37634315e+01, 4.40338402e+01],
[ 1.75510895e+02, 1.74857147e+02, 1.74742798e+02,
7.83558655e+01, 7.81687775e+01, 7.80410919e+01],
[ 7.79418945e-02, 7.38342285e-01, 9.94934082e-01,
5.60119629e-01, -1.60522461e-02, 5.52429199e-01]]).T,
columns=['lat','lon','den'])
mask = (tinyd.lat > -39) & (tinyd.lat < 44) & \
(tinyd.lon > 80) & (tinyd.lon < 175)
toplot = tinyd[mask]
print(toplot)
lat lon den
1 -38.983456 174.857147 0.738342
2 -38.600014 174.742798 0.994934
plt.scatter(toplot.lon, toplot.lat, s=90, c=toplot.den) # longitude on the x-axis, colour by density
plt.colorbar()
Plotting on top of Basemap works the same way, and you can specify a different colormap, etc.
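The same selection can be sketched without pandas, using a NumPy boolean mask with the criteria from the question (35 < lat < 40 and -110 < lon < -100). The small arrays below are made-up stand-ins for f.variables['latitude'][:] and friends.

```python
import numpy as np

# Stand-in arrays; in the real script these would be read from the netCDF file.
lats = np.array([36.2, 41.0, 38.5, 35.0, 39.9])
lons = np.array([-105.3, -104.0, -99.0, -108.0, -101.2])
dens = np.array([0.10, 0.20, 0.30, 0.40, 0.50])

# Element-wise comparisons combined with & give one boolean per point.
mask = (lats > 35) & (lats < 40) & (lons > -110) & (lons < -100)

# Index all three arrays with the same mask to keep the triples aligned.
sel_lats, sel_lons, sel_dens = lats[mask], lons[mask], dens[mask]
print(sel_lats, sel_lons, sel_dens)
```

This assumes the longitudes in the file use the -180 to 180 convention. If they run from 0 to 360 instead, 100-110 W corresponds to 250 < lon < 260.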
I want to draw around 7000 points in a map of Germany. I am interested in the points in Germany, the other points are not so interesting. How can this be made better so that you can see more?
The best thing would be a fullscreen plot (plot horizontal instead of vertical), and the points need to be smaller. Also it would be nice if there were the substates of Germany. But I don't know how this works.
Here's an image of what it looks like right now.
This is the code. There are only some sample points in it, the real points are retrieved from a file. But this shows the basic code.
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np
plt.figure(1)
map = Basemap(projection='merc',
resolution='l',
llcrnrlat=44.0,
llcrnrlon=5.0,
urcrnrlat=57.0,
urcrnrlon=17)
map.drawcoastlines()
map.drawcountries()
map.fillcontinents(color='lightgray')
map.drawmapboundary()
long1 = np.array([ 13.404954, 11.581981, 9.993682, 8.682127, 6.960279,
6.773456, 9.182932, 12.373075, 13.737262, 11.07675 ,
7.465298, 7.011555, 12.099147, 9.73201 , 7.628279,
8.801694, 10.52677 , 8.466039, 8.239761, 10.89779 ,
8.403653, 8.532471, 7.098207, 7.216236, 9.987608,
7.626135, 11.627624, 6.852038, 10.686559, 8.047179,
8.247253, 6.083887, 7.588996, 9.953355, 10.122765])
lat1 = np.array([ 52.520007, 48.135125, 53.551085, 50.110922, 50.937531,
51.227741, 48.775846, 51.339695, 51.050409, 49.45203 ,
51.513587, 51.455643, 54.092441, 52.375892, 51.36591 ,
53.079296, 52.268874, 49.487459, 50.078218, 48.370545,
49.00689 , 52.030228, 50.73743 , 51.481845, 48.401082,
51.960665, 52.120533, 51.47512 , 53.865467, 52.279911,
49.992862, 50.775346, 50.356943, 49.791304, 54.323293])
x, y = map(long1, lat1)
map.plot(x,y,'o')
plt.show()
Those are several questions and it would be better to split them up.
In the meantime, Evan Mosseri has already answered the question about the markersize. An alternative is simply to use the dot marker, as I'll show. He also showed how to maximize the figure; I'll use an alternative in which the size of the figure is predefined.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
fig = plt.figure(figsize=(20,10)) # predefined figure size, change to your liking.
# But doesn't matter if you save to any vector graphics format though (e.g. pdf)
ax = fig.add_axes([0.05,0.05,0.9,0.85])
# These coordinates form the bounding box of Germany
left, right, bot, top = 5.87, 15.04, 47.26, 55.06 # lon min, lon max, lat min, lat max; just to zoom in to only Germany
map = Basemap(projection='merc', resolution='l',
    llcrnrlat=bot,
    llcrnrlon=left,
    urcrnrlat=top,
    urcrnrlon=right)
map.readshapefile('./DEU_adm/DEU_adm1', 'adm_1', drawbounds=True) # plots the state boundaries, read explanation below code
map.drawcoastlines()
map.fillcontinents(color='lightgray')
long1 = np.array([ 13.404954, 11.581981, 9.993682, 8.682127, 6.960279,
6.773456, 9.182932, 12.373075, 13.737262, 11.07675 ,
7.465298, 7.011555, 12.099147, 9.73201 , 7.628279,
8.801694, 10.52677 , 8.466039, 8.239761, 10.89779 ,
8.403653, 8.532471, 7.098207, 7.216236, 9.987608,
7.626135, 11.627624, 6.852038, 10.686559, 8.047179,
8.247253, 6.083887, 7.588996, 9.953355, 10.122765])
lat1 = np.array([ 52.520007, 48.135125, 53.551085, 50.110922, 50.937531,
51.227741, 48.775846, 51.339695, 51.050409, 49.45203 ,
51.513587, 51.455643, 54.092441, 52.375892, 51.36591 ,
53.079296, 52.268874, 49.487459, 50.078218, 48.370545,
49.00689 , 52.030228, 50.73743 , 51.481845, 48.401082,
51.960665, 52.120533, 51.47512 , 53.865467, 52.279911,
49.992862, 50.775346, 50.356943, 49.791304, 54.323293])
x, y = map(long1, lat1)
map.plot(x,y,'.') # Use the dot-marker or use a different marker, but specify the `markersize`.
The state boundaries come from a shapefile. These can be obtained from e.g. Global Administrative Areas (the ones from this website can be used for non-commercial purposes only).
That'll result in:
As for the final question: if you have coordinates in the arrays lat and long that are not within Germany, you'll have to filter them out. One way in which you could do this is to use the geocoder module, pass in the (lat, lon) and check if the returned result contains the dictionary key-value pair "country": "Germany".
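As a cheap first pass before any geocoding, a plain bounding-box test would already drop points that are far outside Germany. This is a different and coarser technique than the geocoder lookup: the box also covers parts of neighbouring countries, so it only narrows the candidates down. The helper in_bbox and the two non-German sample points are made up for illustration.

```python
import numpy as np

# Bounding box of Germany in degrees, same values as in the code above.
LON_MIN, LON_MAX, LAT_MIN, LAT_MAX = 5.87, 15.04, 47.26, 55.06

def in_bbox(lon, lat):
    """Return a boolean array: True where (lon, lat) lies inside the box."""
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    return (lon >= LON_MIN) & (lon <= LON_MAX) & (lat >= LAT_MIN) & (lat <= LAT_MAX)

# Berlin, Paris, Munich, Warsaw (Paris and Warsaw are added test points)
long1 = np.array([13.404954, 2.352222, 11.581981, 21.012229])
lat1 = np.array([52.520007, 48.856613, 48.135125, 52.229676])

keep = in_bbox(long1, lat1)
long_de, lat_de = long1[keep], lat1[keep]
print(keep)
```

Only the points that survive this filter would then need the slower per-point country check.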
If I understand correctly what you are trying to do, it should be as simple as: map.plot(x,y,'o',markersize=2)
or whatever markersize you want.
Also add this before plt.show() to maximize the window:
mng = plt.get_current_fig_manager()
mng.frame.Maximize(True) # this call is specific to the wx backend
I am currently working with BUFR files with wind data. When I read this file on python I get 4 large vectors, latitude vector, longitude vector, wind_direction vector, and wind_speed vector.
Both wind vectors are masked python arrays because there is non-valid data. This happens because the data comes from a non-geostationary satellite. In fact I successfully generated the following image from this BUFR file to show you the general shape that the data takes.
In this image I have plotted a color field to represent the wind speed, while the arrows obviously represent the wind direction.
Please notice the two bands of actual data. Unfortunately, the way I am plotting the data generates a third band (where the color field is smooth) in between the actual data bands. This is an artefact of the function pcolormesh. If I could superimpose two pcolormesh plots, each one representing one of the bands, this problem would disappear.
Unfortunately, I do not know how I could separate the data "regions". I have thought about clustering techniques but do not know how to cluster along latlon data using ANOTHER array (the wind data) as the clustering rule.
This is my current code:
#!/usr/bin/python
import bufr
import numpy as np
import sys
import matplotlib
matplotlib.use('Agg')
from matplotlib import pyplot as plt
from matplotlib import mlab
WIND_DIR_INDEX = 97
WIND_SPEED_INDEX = 96
bfrfile = sys.argv[1]
print(bfrfile)
bfr = bufr.BUFRFile(bfrfile)
lon = []
lat = []
wind_d = []
wind_s = []
for record in bfr:
    for entry in record:
        if entry.index == WIND_DIR_INDEX:
            wind_d.append(entry.data)
        if entry.index == WIND_SPEED_INDEX:
            wind_s.append(entry.data)
        if entry.name.find("LONGITUDE") == 0:
            lon.append(entry.data)
        if entry.name.find("LATITUDE") == 0:
            lat.append(entry.data)
lons = np.concatenate(lon)
lats = np.concatenate(lat)
winds_d = np.concatenate(wind_d)
winds_s = np.concatenate(wind_s)
winds_d = np.ma.masked_greater(winds_d,1.0e+6)
winds_s = np.ma.masked_greater(winds_s,1.0e+6)
windu = np.cos((winds_d-180)*(np.pi/180))
windv = np.sin((winds_d-180)*(np.pi/180))
# Data interpolation for pcolormesh (needs gridded data)
xi = np.linspace(lons.min(),lons.max(),lons.size//10) # integer division: linspace needs an integer count
yi = np.linspace(lats.min(),lats.max(),lats.size//10)
Z = mlab.griddata(lons,lats,winds_s,xi,yi)
X,Y = np.meshgrid(xi,yi)
mydpi = 96
fig = plt.figure(frameon=True)
fig.set_size_inches(1600/mydpi,1200/mydpi)
ax = plt.Axes(fig,[0,0,1,1])
#ax.set_axis_off()
fig.add_axes(ax)
plt.hold(True);
plt.quiver(lons[::5],lats[::5],windu[::5],windv[::5],linewidths=0)
for method in (ax.set_xticks,ax.set_xticklabels,ax.set_yticks,ax.set_yticklabels):
    method([])
fig.savefig('/home/cendas/bin/python/bufr_ascat.png',bbox_inches=0,dpi=5*mydpi)
mydpi = 96
fig = plt.figure(frameon=True)
fig.set_size_inches(1600/mydpi,1200/mydpi)
ax = plt.Axes(fig,[0,0,1,1])
#ax.set_axis_off()
fig.add_axes(ax)
plt.hold(True);
try:
    plt.pcolormesh(X,Y,Z,alpha=None)
    plt.clim(0,10)
except ValueError:
    print("Warning: Empty data array.")
for method in (ax.set_xticks,ax.set_xticklabels,ax.set_yticks,ax.set_yticklabels):
    method([])
fig.savefig('/home/cendas/bin/python/bufr_ascat_color.png',bbox_inches=0,dpi=5*mydpi)
I then usually follow this python code with the following terminal commands to combine the images:
convert bufr_ascat.png -transparent white bufr_ascat.png
convert bufr_ascat_color.png -transparent white bufr_ascat_color.png
composite bufr_ascat.png bufr_ascat_color.png bufrascat.png
Don't abuse clustering for this.
What you need is a simple selection / filtering; not a structure discovery process.
Take the mean position of the masked data. All non-masked data to the left of that mean is the left band; all non-masked data to the right of it is the other band.
Clustering is the wrong tool for this task.
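A minimal sketch of that selection, assuming the gap between the bands runs roughly north-south so that a single longitude can separate them. The tiny arrays and the 1.7e38 fill value are made up for the example.

```python
import numpy as np

lons = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0])
raw_speed = np.array([3.0, 4.0, 5.0, 1.7e38, 1.7e38, 6.0, 7.0, 8.0])
speed = np.ma.masked_greater(raw_speed, 1.0e6)  # same masking as in the question

invalid = np.ma.getmaskarray(speed)   # True where the data is masked
split_lon = lons[invalid].mean()      # centre of the masked gap

# Valid points on either side of the split form the two bands.
left = ~invalid & (lons < split_lon)
right = ~invalid & (lons >= split_lon)

print(split_lon, lons[left], lons[right])
```

Each band could then be gridded and passed to pcolormesh separately. For a tilted swath, or when there are also masked points outside the two bands, a single mean longitude will not separate them cleanly, and the split would have to be computed per row of the swath instead.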
I am trying to make an "all-sky" density plot which is complete in RA (i.e. 0 to 360 deg) but incomplete in DEC (let's say from -45 to 90 deg). If I plot this without any projection it is OK, but when I plot it using the 'mollweide' projection I do not recover the input. With a small change in the code I do recover the expected behavior; however, I don't have a coherent explanation for this change, as you'll see in the example.
Let's see a self-contained example with its outputs to be clearer:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.backends.backend_agg
from math import pi
#array between 0 and 360 deg
RA = np.random.random(10000)*360
#array between -45 and 90 degrees. By construction!
DEC= np.random.random(10000)*135-45
fig = plt.Figure((10, 4.5))
ax = fig.add_subplot(111,projection='mollweide')
ax.grid(True)
ax.set_xlabel('RA')
ax.set_ylabel('DEC')
ax.set_xticklabels(np.arange(30,331,30))
hist,xedges,yedges = np.histogram2d(DEC,RA,bins=[90,180],range=[[-90,90],[0,360]])
#TO RECOVER THE EXPECTED BEHAVIOUR, I HAVE TO CHANGE -90 FOR -80 IN THE PREVIOUS LINE:
#hist,xedges,yedges = np.histogram2d(DEC,RA,bins=[90,180],range=[[-80,90],[0,360]])
#I DO NOT KNOW WHY!
extent = (-pi,pi,-pi/2.,pi/2.)
image = ax.imshow(hist,extent=extent,clip_on=False,aspect=0.5,origin='lower')
cb = fig.colorbar(image, orientation='horizontal')
canvas = matplotlib.backends.backend_agg.FigureCanvasAgg(fig)
fig.canvas.print_figure("image1.png")
And the output image is:
[As I am new here I am not allowed to post images, so I will post a link, if it does not work, please write me an email and I can share a Dropbox folder with the images ;)]
Output Image that I am getting
Where you can see clearly that the RA is OK so it ranges between 0 and 360, BUT the DEC ranges from -35 to 90 instead of -45 to 90. So far I do not understand why I am missing 10 deg.
However, if I do a little change in the code, replacing the line
hist,xedges,yedges = np.histogram2d(DEC,RA,bins=[90,180],range=[[-90,90],[0,360]])
for
hist,xedges,yedges = np.histogram2d(DEC,RA,bins=[90,180],range=[[-80,90],[0,360]])
I get what I think I should get, which is this plot:
Output Image 2
[Again, if the link does not work, let me know and I can share a Dropbox folder with you]
where DEC now ranges from -45 to 90 as expected because I created DEC in that way.
However the change of -90 for -80 doesn't make sense (I think).
So probably I am doing something wrong that I can't notice now, or I am misunderstanding something in the code or there is a curious bug in matplotlib??
Any help/hint/correction would be greatly appreciated.
Eduardo
If you don't mind depending on an external package, you could do this with healpy, which provides a Mollweide projection for the Healpix sky pixelization:
https://github.com/healpy/healpy
See an example similar to your script here:
https://gist.github.com/1215159
More info about healpix:
http://healpix.jpl.nasa.gov/html/intro.htm
Output image:
If this is useful for someone else, this is the "corrected version" of my code, which gives as output this image. The main change is to use pcolormesh instead of imshow (as @Joe suggested):
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.backends.backend_agg
#array between 0 and 360 deg
#CAVEAT: it seems that an array from -180 to 180 is needed, so this is just a
#shift in the coordinates
RA = np.random.random(10000)*360-180
#array between -45 and 90 degrees
DEC= np.random.random(10000)*135-45
fig = plt.Figure((10, 5))
ax = fig.add_subplot(111,projection='mollweide')
ax.set_xlabel('RA')
ax.set_ylabel('DEC')
ax.set_xticklabels(np.arange(30,331,30))
hist,xedges,yedges = np.histogram2d(DEC,RA,bins=[60,40],range=[[-90,90],[-180,180]])
X,Y = np.meshgrid(np.radians(yedges),np.radians(xedges))
image = ax.pcolormesh(X,Y,hist)
ax.grid(True)
cb = fig.colorbar(image, orientation='horizontal')
canvas = matplotlib.backends.backend_agg.FigureCanvasAgg(fig)
fig.canvas.print_figure("image4.png")
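For real data, where RA comes in as 0 to 360 rather than being generated directly in -180 to 180, the shift can be sketched as below. The helper wrap_ra and the five sample points are made up for illustration; the binning and the conversion of bin edges to radians follow the corrected script above.

```python
import numpy as np

def wrap_ra(ra_deg):
    """Map right ascension from [0, 360) to [-180, 180) degrees (hypothetical helper)."""
    return (np.asarray(ra_deg, dtype=float) + 180.0) % 360.0 - 180.0

ra = np.array([0.0, 90.0, 180.0, 270.0, 359.0])
dec = np.array([-45.0, 0.0, 30.0, 60.0, 89.0])
ra_wrapped = wrap_ra(ra)

# Same binning as in the corrected script: DEC first, then RA.
hist, dec_edges, ra_edges = np.histogram2d(
    dec, ra_wrapped, bins=[60, 40], range=[[-90, 90], [-180, 180]])

# pcolormesh on a mollweide axes expects bin edges in radians.
X, Y = np.meshgrid(np.radians(ra_edges), np.radians(dec_edges))
print(ra_wrapped, hist.shape, X.shape)
```

With this wrap, RA 0 lands in the centre of the map and values above 180 move to the left half, so the tick labels would need to be adjusted to match.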