The full code (excluding the path finding algorithm) I am about to describe can be found on Code Review.
I am reading 10 co-ordinates from a text file in Python. I then pass the latitude and longitude values to a function which plots the points, as follows.
import csv

def read_two_column_file(file_name):
    with open(file_name, 'r') as f_input:
        # columns in the sample input are separated by a comma and a space
        csv_input = csv.reader(f_input, delimiter=',', skipinitialspace=True)
        long = []
        lat = []
        for col in csv_input:
            x = float(col[0])  # converting to float
            y = float(col[1])
            long.append(x)
            lat.append(y)
    return long, lat
import matplotlib.pyplot as plt

def display_points(long, lat):
    plt.figure()
    plt.gca().set_aspect('equal', adjustable='box')
    plt.ylabel('latitude')
    plt.xlabel('longitude')
    plt.title('longitude vs latitude')
    plt.scatter(lat, long)
    plt.orientation = u'vertical'
    plt.grid(True)
    plt.show()
Sample Input:
35.905333, 14.471970
35.896389, 14.477780
35.901281, 14.518173
35.860491, 14.572245
35.807607, 14.535320
35.832267, 14.455894
35.882414, 14.373217
35.983794, 14.336096
35.974463, 14.351006
35.930951, 14.401137
Plot:
This plots points on a map, and the idea is to find the shortest possible route from a starting point to an end point. Forgetting about the algorithm which does so, let us say I get an output representing the route as:
[2, 1, 0, 9, 8, 7, 6, 5, 4, 3, 2]
How can I translate these nodes back to the co-ordinates they represent in order to connect them in Matplotlib?
Transform your latitude and longitude into numpy arrays:
long = np.array(long)
lat = np.array(lat)
I would advise doing it in read_two_column_file directly.
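Then, if the route is stored in the variable path, you can index the arrays with it directly. Pass the arrays in the same order as in your scatter call (there it is lat, long) so that the line lands on the plotted points:
plt.plot(lat[path], long[path])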
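To make the indexing step concrete, here is a minimal sketch using the first three points of the sample input and a made-up short route. The names xs, ys and route are placeholders for illustration, not names from the question. Indexing a NumPy array with a list of node numbers returns the co-ordinates in the order the route visits them:

```python
import numpy as np

# First three sample points; xs holds the 14.x column, ys the 35.x column.
xs = np.array([14.471970, 14.477780, 14.518173])
ys = np.array([35.905333, 35.896389, 35.901281])

# Hypothetical route over node indices, returning to its start.
route = [2, 1, 0, 2]

# Fancy indexing picks the co-ordinates in the order the route visits them.
route_x = xs[route]
route_y = ys[route]

print(np.column_stack((route_x, route_y)))
```

Passing route_x and route_y to plt.plot after the scatter call draws the connecting line over the points.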
Related
long =np.array(data.Longitude)
lat = np.array(data.Latitude)
coordinates = np.array(385)
for i in range(385):
    coordinates[i] = np.array([lat[i], long[i]])
#x, y = kmeans2(whiten(coordinates), 3, iter = 20)
#plt.scatter(coordinates[:,0], coordinates[:,1], c=y);
#plt.show()
I have a dataset with two columns, and I wish to merge the latitude and longitude term by term so that I can apply k-means clustering afterwards. Please help with the array part.
coordinates = np.array([lat, long])
or am I missing something here?
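One detail worth checking is the shape: np.array([lat, long]) has shape (2, N), one row per variable, whereas the commented-out plotting code indexes coordinates[:, 0], which expects one row per point. A small sketch with made-up values (the three points below are illustrative only) shows both layouts:

```python
import numpy as np

lat = np.array([35.90, 35.89, 35.86])    # illustrative values
long = np.array([14.47, 14.48, 14.57])

stacked = np.array([lat, long])              # shape (2, 3): one row per variable
coordinates = np.column_stack((lat, long))   # shape (3, 2): one row per point

print(stacked.shape, coordinates.shape)
print(coordinates[0])                        # first point as [lat, long]
```

The second layout is the transpose of the first, so stacked.T would give the same result.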
I am plotting weather model precipitation data on a map. The map is a filled contour plot. Over that data I would like to plot text at specific grid points that gives the precipitation value at that point in time, but I am struggling to do this. I have a list of lat/lon points but cannot figure out how to map it properly to the data I have. The data itself comes with its own set of lat/lon points that uniquely map to the data. Here's example code that I have:
import pygrib
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

grib = 'gfs.t00z.pgrb2.0p25.f084'
grbs = pygrib.open(grib)
grb = grbs.select(name='Total Precipitation', typeOfLevel='surface')[0]
precip = grb.values * 0.039370  # 0.03937 inches per millimetre
lat, lon = grb.latlons()
uniquepoints = []  # will contain the specific lat, lon points
with open('mydata') as f:
    for line in f:
        myline = line.replace("\n", "")
        myline = myline.split(",")
        uniquepoints.append(myline)
intervals = [0.0, 0.01, 0.1, 0.25, 0.5, 0.75, 1.00, 1.25, 1.50, 1.75, 2.0, 2.50, 3.00]
fig, ax = plt.subplots()  # ax is not defined in the snippet; a plain Axes is assumed here
# the map has to exist before it can project lon/lat to x/y
m = Basemap(projection='lcc', lon_0=-95., llcrnrlon=-125.,
            urcrnrlon=-55., llcrnrlat=20., urcrnrlat=50., lat_1=25., lat_2=46.,
            resolution='l', area_thresh=10000, ax=ax)
x, y = m(lon, lat)
obsobj = m.contourf(x, y, precip, intervals, cmap=plt.cm.jet)
I know you are supposed to use plt.text, but I can't configure it so that the unique points map correctly to the precip data.
It is easy to make simple mistakes when placing points on a Basemap, because you have to convert between lat, long, height and the window coordinate system in x, y. However, if your data is plotting at the correct points on the map, you can use those positions to plot your labels at those exact points with some specified offset.
A pythonic example of how to do this:
lats = [x['LLHPosition'][0] for x in unit_data]
lons = [x['LLHPosition'][1] for x in unit_data]
x,y = m(lons,lats)
label_text = [x['UnitName'] for x in unit_data]
x_offsets = [2000] * len(unit_data)
y_offsets = [2000] * len(unit_data)
for label, xpt, ypt, x_offset, y_offset in zip(label_text, x, y, x_offsets, y_offsets):
    plt.text(xpt+x_offset, ypt+y_offset, label)
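The labelling loop above still needs a value to print at each point. One way to get it, sketched here under the assumption that picking the nearest grid cell is good enough, is to search the 2D lat/lon arrays that come with the data for the cell closest to each requested point. The grid and the helper name nearest_value are made up for illustration; with the question's data, lat, lon and precip would take the place of grid_lat, grid_lon and values.

```python
import numpy as np

def nearest_value(grid_lat, grid_lon, values, point_lat, point_lon):
    """Return the value at the grid cell closest to (point_lat, point_lon).

    Distance is measured in plain degrees, a simplification that ignores
    the convergence of meridians away from the equator.
    """
    dist2 = (grid_lat - point_lat) ** 2 + (grid_lon - point_lon) ** 2
    row, col = np.unravel_index(np.argmin(dist2), dist2.shape)
    return values[row, col], row, col

# Small synthetic 0.25-degree grid standing in for the model output.
lat_1d = np.arange(30.0, 31.0, 0.25)      # 4 latitudes
lon_1d = np.arange(-100.0, -99.0, 0.25)   # 4 longitudes
grid_lon, grid_lat = np.meshgrid(lon_1d, lat_1d)
values = np.arange(16, dtype=float).reshape(4, 4)

value, row, col = nearest_value(grid_lat, grid_lon, values, 30.3, -99.6)
print(value, row, col)
```

The returned value can then be formatted and passed as the label to plt.text at the projected position m(point_lon, point_lat). If the data's longitudes run from 0 to 360 while the points use negative west longitudes, bring both to the same convention first.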
I'm fairly new to Python and have found Stack Overflow one of the best resources out there. Now I'm hoping someone can help me with what I believe is a fairly basic question.
I'm looking to create a land mask from a list of lats and lons and rainfall data extracted from a netCDF file. I need to get the data from the netCDF file to line up so that I can remove rows which have a rainfall value of '-9999.' (indicating no data because it's over the ocean). I can access the file and I can create a mesh grid, but when it comes to inserting the rainfall data for the final check I'm getting odd shapes and no luck with the logical test. Can someone have a look at this code and let me know what you think?
from netCDF4 import Dataset
import numpy as np
f=Dataset('/Testing/Ensemble_grid/1970_2012_eMAST_ANUClimate_mon_evap_v1m0_197001.nc')
lat = f.variables['latitude'][:]
lon = f.variables['longitude'][:]
rainfall = np.array(f.variables['lwe_thickness_of_precipitation_amount'])
lons, lats = np.meshgrid(lon,lat)
full_ary = np.array((lats,lons))
full_lats_lons = np.swapaxes(full_ary,0,2)
rain_data = np.squeeze(rainfall,axis=(0,))
grid = np.array((full_lats_lons,rain_data))
full_grid = np.expand_dims(grid,axis=1)
full_grid_col = np.swapaxes(full_grid,0,1)
land_grid = np.logical_not(full_grid_col[:,1]==-9999.)
Here is an alternative method that simply creates a new 2D variable, landmask, where each grid cell is either 0 (ocean) or 1 (land). (I like to use 1 and 0 landmasks because you can transform it into a boolean numpy array and do quick land-averages this way.)
import netCDF4
import numpy as np
ncfile = netCDF4.Dataset('/path/to/your/ncfile.nc', 'r')
lat = ncfile.variables['lat'][:]
lon = ncfile.variables['lon'][:]
# Presuming here that rainfall is 2D, if not, just read in the first time step, i.e. [0,:,:]
rain = ncfile.variables['lwe_thickness_of_precipitation_amount'][:,:]
ncfile.close()
nlat, nlon = len(lat), len(lon)
# Populate a 2D landmask array, where 1=land and 0=ocean
landmask = np.zeros([nlat, nlon], dtype='int')
for y in range(nlat):
    for x in range(nlon):
        if rain[y,x]!=-9999: # We're at a land point
            landmask[y,x] = 1
# Now you can write out the landmask into a new netCDF file
filename_out = './landmask.nc'
ncfile_out = netCDF4.Dataset(filename_out, 'w')
ncfile_out.createDimension('lat', nlat)
ncfile_out.createDimension('lon', nlon)
lat_out = ncfile_out.createVariable('lat', 'f4', ('lat',))
lon_out = ncfile_out.createVariable('lon', 'f4', ('lon',))
landmask_out = ncfile_out.createVariable('landmask', 'i', ('lat', 'lon',))
setattr(lat_out, 'units', 'degrees_north')
setattr(lon_out, 'units', 'degrees_east')
setattr(landmask_out, 'description', '1=land 0=ocean')
lat_out[:] = lat
lon_out[:] = lon
landmask_out[:,:] = landmask[:,:]
ncfile_out.close()
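The nested loop can also be written as a single array comparison. The sketch below uses a small made-up rain array with the same -9999 fill value to build the mask, and then uses the mask for a land-only average of the kind mentioned above.

```python
import numpy as np

FILL = -9999.0

# Made-up 2x3 field: two ocean cells carry the fill value.
rain = np.array([[1.0, FILL, 3.0],
                 [FILL, 5.0, 7.0]])

# The comparison yields a boolean array; cast to int for a 1/0 landmask.
landmask = (rain != FILL).astype(int)

# A boolean view of the mask selects land cells only.
land_mean = rain[landmask.astype(bool)].mean()

print(landmask)
print(land_mean)
```

If netCDF4 returns a masked array because the file declares a fill value, np.ma.getmaskarray(rain) gives the ocean cells directly instead.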
Ian, you need to put a repeatable example up here...
I suspect what you need is something like this;
x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
x.flat
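If the goal is one row per grid cell (lat, lon, rainfall) with the no-data rows removed, flattening each 2D array and stacking the results as columns is one way to get there. This is a sketch on a made-up 2x3 grid, not the poster's file:

```python
import numpy as np

lat = np.array([-10.0, -11.0])
lon = np.array([130.0, 131.0, 132.0])
rain = np.array([[4.0, -9999.0, 2.5],
                 [-9999.0, 1.0, 0.0]])   # shape (len(lat), len(lon))

lons, lats = np.meshgrid(lon, lat)       # both shaped like rain

# ravel() flattens all three arrays in the same row-major order,
# so each row of the table describes a single grid cell.
table = np.column_stack((lats.ravel(), lons.ravel(), rain.ravel()))

# Keep only the rows whose rainfall is not the fill value.
land_rows = table[table[:, 2] != -9999.0]
print(land_rows)
```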
I have the following script, which reads an ASCII file of two columns and generates a 1D plot. The graph has several peaks. What I want is to give every peak a number: 1 for the first peak, 2 for the second, and so on. The peaks appear at equidistant positions on the X axis. Can someone tell me how to do that in Python? The code:
from pylab import*
# Read the file.
f2 = open('d012_SAXS-recomb.txt', 'r')
# read the whole file into a single variable, which is a list of every row of the file.
lines = f2.readlines()[2:-100]
f2.close()
# initialize some variable to be lists:
x1 = []
y1 = []
# scan the rows of the file stored in lines, and put the values into some variables:
for line in lines:
    p = line.split()
    x1.append(float(p[0]))
    y1.append(float(p[1]))
x = np.array(x1)
y = np.array(y1)
xlim(0.0,4.0)
# now, plot the data:
#subplot(211)
plt.plot(x, y, color='orange',linewidth=2.0, linestyle='-', label='Arabic - LPP''\nRoman - SPP''\nAsterisk - CHOL')
legend(loc='upper right')
xlabel('q')
ylabel('Intensity')
plt.show()
Here's some example code that finds the first (highest) peak. (BTW, I'm using pylab here, so the plot and numpy modules are already imported).
x = linspace(0,10,501)
y = exp(-0.2*x)*sin(x)
k = y.argmax()
plot(x,y)
text(x[k],y[k],'Peak1')
Try that to get started.
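To number every peak rather than only the highest one, a point can be treated as a peak when it is larger than both of its neighbours. The sketch below applies that simple rule to the same example curve. It is an illustration only; noisy measured data would usually need smoothing or a height threshold first.

```python
import numpy as np

x = np.linspace(0, 10, 501)
y = np.exp(-0.2 * x) * np.sin(x)

# Interior points that are strictly higher than both neighbours.
is_peak = (y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])
peak_idx = np.where(is_peak)[0] + 1   # +1 because the slice starts at index 1

# Pair each peak with its running number, starting at 1.
labels = [(n, x[k], y[k]) for n, k in enumerate(peak_idx, start=1)]
for n, xk, yk in labels:
    print(n, xk, yk)
```

With pylab, calling text(xk, yk, str(n)) inside the loop writes the number next to each peak.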
I need to compare some theoretical data with real data in Python.
The theoretical data comes from solving an equation.
To improve the comparison I would like to remove data points that fall far from the theoretical curve. That is, I want to remove the points below and above the red dashed lines in the figure (made with matplotlib).
The theoretical curves and the data points are arrays of different lengths.
I can try to remove the points roughly by eye. For example, the first upper point can be detected using:
data2[(data2.redshift<0.4)&(data2.dmodulus>1)]
rec.array([('1997o', 0.374, 1.0203223485103787, 0.44354759972859786)], dtype=[('SN_name', '|S10'), ('redshift', '<f8'), ('dmodulus', '<f8'), ('dmodulus_error', '<f8')])
But I would like to use a less rough method than judging by eye.
So, can anyone help me find an easy way of removing the problematic points?
Thank you!
This might be overkill, and is based on your comment:
"Both the theoretical curves and the data points are arrays of different length."
I would do the following:
1. Truncate the data set so that its x values lie within the max and min values of the theoretical set.
2. Interpolate the theoretical curve using scipy.interpolate.interp1d and the above truncated data x values. The reason for step (1) is to satisfy the constraints of interp1d.
3. Use numpy.where to find data y values that are outside the range of acceptable theory values.
4. DON'T discard these values, as was suggested in comments and other answers. If you want, for clarity, point them out by plotting the 'inliers' in one color and the 'outliers' in another color.
Here's a script that is close to what you are looking for, I think. It hopefully will help you accomplish what you want:
import numpy as np
import scipy.interpolate as interpolate
import matplotlib.pyplot as plt
# make up data
def makeUpData():
    '''Make many more data points (x,y,yerr) than theory (x,y),
    with theory yerr corresponding to a constant "sigma" in y,
    about x,y value'''
    NX = 150
    dataX = (np.random.rand(NX)*1.1)**2
    dataY = (1.5*dataX+np.random.rand(NX)**2)*dataX
    dataErr = np.random.rand(NX)*dataX*1.3
    theoryX = np.arange(0,1,0.1)
    theoryY = theoryX*theoryX*1.5
    theoryErr = 0.5
    return dataX,dataY,dataErr,theoryX,theoryY,theoryErr
def makeSameXrange(theoryX,dataX,dataY):
    '''
    Truncate the dataX and dataY ranges so that dataX min and max are within
    the max and min of theoryX.
    '''
    minT,maxT = theoryX.min(),theoryX.max()
    goodIdxMax = np.where(dataX<maxT)
    goodIdxMin = np.where(dataX[goodIdxMax]>minT)
    return (dataX[goodIdxMax])[goodIdxMin],(dataY[goodIdxMax])[goodIdxMin]
# take 'theory' and get values at every 'data' x point
def theoryYatDataX(theoryX,theoryY,dataX):
    '''For every dataX point, find interpolated theoryY value. theoryX needed
    for interpolation.'''
    f = interpolate.interp1d(theoryX,theoryY)
    return f(dataX[np.where(dataX<np.max(theoryX))])
# collect valid points
def findInlierSet(dataX,dataY,interpTheoryY,theoryErr):
    '''Find where theoryY-theoryErr < dataY < theoryY+theoryErr and return
    the x and y values of those points.'''
    withinUpper = np.where(dataY<(interpTheoryY+theoryErr))
    withinLower = np.where(dataY[withinUpper]
                           >(interpTheoryY[withinUpper]-theoryErr))
    return (dataX[withinUpper])[withinLower],(dataY[withinUpper])[withinLower]
def findOutlierSet(dataX,dataY,interpTheoryY,theoryErr):
    '''Find where dataY > theoryY+theoryErr or dataY < theoryY-theoryErr and
    return the x and y values of the upper and lower outliers.'''
    aboveUpper = np.where(dataY>(interpTheoryY+theoryErr))
    belowLower = np.where(dataY<(interpTheoryY-theoryErr))
    return (dataX[aboveUpper],dataY[aboveUpper],
            dataX[belowLower],dataY[belowLower])
if __name__ == "__main__":
    dataX,dataY,dataErr,theoryX,theoryY,theoryErr = makeUpData()
    TruncDataX,TruncDataY = makeSameXrange(theoryX,dataX,dataY)
    interpTheoryY = theoryYatDataX(theoryX,theoryY,TruncDataX)
    inDataX,inDataY = findInlierSet(TruncDataX,TruncDataY,interpTheoryY,
                                    theoryErr)
    outUpX,outUpY,outDownX,outDownY = findOutlierSet(TruncDataX,
                                                     TruncDataY,
                                                     interpTheoryY,
                                                     theoryErr)
    #print inlierIndex
    fig = plt.figure()
    ax = fig.add_subplot(211)
    ax.errorbar(dataX,dataY,dataErr,fmt='.',color='k')
    ax.plot(theoryX,theoryY,'r-')
    ax.plot(theoryX,theoryY+theoryErr,'r--')
    ax.plot(theoryX,theoryY-theoryErr,'r--')
    ax.set_xlim(0,1.4)
    ax.set_ylim(-.5,3)
    ax = fig.add_subplot(212)
    ax.plot(inDataX,inDataY,'ko')
    ax.plot(outUpX,outUpY,'bo')
    ax.plot(outDownX,outDownY,'ro')
    ax.plot(theoryX,theoryY,'r-')
    ax.plot(theoryX,theoryY+theoryErr,'r--')
    ax.plot(theoryX,theoryY-theoryErr,'r--')
    ax.set_xlim(0,1.4)
    ax.set_ylim(-.5,3)
    fig.savefig('findInliers.png')
This figure is the result:
In the end I used some of Yann's code:
def theoryYatDataX(theoryX,theoryY,dataX):
    '''For every dataX point, find interpolated theoryY value. theoryX needed
    for interpolation.'''
    f = interpolate.interp1d(theoryX,theoryY)
    return f(dataX[np.where(dataX<np.max(theoryX))])
def findOutlierSet(data,interpTheoryY,theoryErr):
    '''Split data into the points inside the band theoryY +/- theoryErr
    and the points outside it.'''
    up = np.where(data.dmodulus > (interpTheoryY+theoryErr))
    low = np.where(data.dmodulus < (interpTheoryY-theoryErr))
    # join all the index together in a flat array
    out = np.hstack([up,low]).ravel()
    index = np.array(np.ones(len(data),dtype=bool))
    index[out]=False
    datain = data[index]
    dataout = data[out]
    return datain, dataout
def selectdata(data,theoryX,theoryY):
    """
    Data selection: z<1 and +-0.5 LFLRW separation
    """
    # Select data with redshift z<1
    data1 = data[data.redshift < 1]
    # From modulus to light distance:
    data1.dmodulus, data1.dmodulus_error = modulus2distance(data1.dmodulus,data1.dmodulus_error)
    # redshift data order
    data1.sort(order='redshift')
    # Outliers: distance to LFLRW curve bigger than +-0.5
    theoryErr = 0.5
    # Theory curve Interpolation to get the same points as data
    interpy = theoryYatDataX(theoryX,theoryY,data1.redshift)
    datain, dataout = findOutlierSet(data1,interpy,theoryErr)
    return datain, dataout
Using those functions I can finally obtain:
Thank you all for your help.
Just look at the difference between the red curve and the points; if it is bigger than the difference between the red curve and the dashed red curve, remove the point.
diff = np.abs(points - red_curve)
index = diff <= (dashed_curve - red_curve)  # True for points inside the dashed band (upper dashed curve)
filtered = points[index]
But please take the comment from NickLH seriously. Your data looks pretty good without any filtering; your "outliers" all have a very big error and won't affect the fit much.
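When the curve and the data are sampled at different x values, the curve first has to be evaluated at the data's x positions. Here is a minimal sketch, assuming that linear interpolation with np.interp is acceptable and that the band is a constant tolerance around the curve. All arrays and the value of tol are made up:

```python
import numpy as np

# Theory curve on a coarse grid.
theory_x = np.array([0.0, 0.5, 1.0])
theory_y = np.array([0.0, 1.0, 2.0])

# Data points at other x positions.
data_x = np.array([0.1, 0.25, 0.6, 0.9])
data_y = np.array([0.25, 1.5, 1.1, 0.9])

tol = 0.5   # half-width of the accepted band around the curve

# Evaluate the curve at each data x, then compare with the tolerance.
curve_at_data = np.interp(data_x, theory_x, theory_y)
keep = np.abs(data_y - curve_at_data) <= tol

print(data_x[keep], data_y[keep])      # points inside the band
print(data_x[~keep], data_y[~keep])    # points flagged as outliers
```

Keeping the flagged points in a separate array, rather than throwing them away, makes it easy to plot them in a different colour.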
You could either use numpy.where() to identify which x, y pairs meet your plotting criteria, or use enumerate to do much the same thing. Example:
x_list = [ 1, 2, 3, 4, 5, 6 ]
y_list = ['f','o','o','b','a','r']
result = [y_list[i] for i, x in enumerate(x_list) if 2 <= x < 5]
print(result)
I'm sure you could change the conditions so that '2' and '5' in the above example are replaced by the functions of your curves.