I'm using NumPy's linspace to fill in data between points.
lats = (-66.44421,-66.57947,-64.81464,-64.69528)
lons = (-73.03290,-72.73904,-64.71657,-65.03036)
NO3 = (33.48,24.01,17.20,20.03)
xi = np.linspace(min(lats),max(lats),360)
yi = np.linspace(min(lons),max(lons),360)
# grid the data.
zi = griddata((lats, lons), NO3, (xi[None,:], yi[:,None]), method='cubic')
# contour the gridded data.
plt.contourf(xi,yi,zi,15,cmap=cMap)
plt.colorbar()
# plot data points.
plt.scatter(lats,lons,facecolors='none', edgecolors='k',s=26)
plt.show()
I want to retrieve values (missing samples) from the gridded data zi based on coordinate pairs generated from linspace, but the coordinates aren't exact for a dict lookup:
# record index and value of linspace coordinates as key and value
xi_coords = {value: index for index, value in enumerate(xi)}
yi_coords = {value: index for index, value in enumerate(yi)}
# how to retrieve a value in between at, say, (-65.11018, -67.08512)
zi[xi_coords[-65.11018], yi_coords[-67.08512]]
This raises a KeyError.
Is there a smarter workaround for this problem?
If I'm not mistaken, the point you are trying to retrieve is not in your linspace at all; it is not just a numerical precision problem. If you want to find the closest grid point to any given point, you should define functions rather than using dicts:
latmin = min(lats)
latmax = max(lats)
npoints = 360
def get_lat_index(lat):
    return int(round((npoints-1)*(lat-latmin)/(latmax-latmin)))
and similar for longitudes.
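One option is rounding, for example to two decimals. Note that this only works when the grid spacing is no coarser than the rounding step; otherwise some rounded keys will be missing from the dict: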
xi_coords = {round(value, 2): index for index, value in enumerate(xi)}
yi_coords = {round(value, 2): index for index, value in enumerate(yi)}
zi[xi_coords[-65.11], yi_coords[-67.08]]
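As a sketch of the "closest grid point" idea without dicts or rounding, the nearest index on each axis can be found with `np.abs(...).argmin()`. The helper name `nearest_index` is made up for this illustration. Because `griddata` was called with `(xi[None,:], yi[:,None])`, the result `zi` has shape `(len(yi), len(xi))`, so the lookup would be `zi[iy, ix]`.

```python
import numpy as np

lats = (-66.44421, -66.57947, -64.81464, -64.69528)
lons = (-73.03290, -72.73904, -64.71657, -65.03036)
xi = np.linspace(min(lats), max(lats), 360)
yi = np.linspace(min(lons), max(lons), 360)

def nearest_index(axis_values, value):
    """Index of the grid coordinate closest to `value` (hypothetical helper)."""
    return int(np.abs(axis_values - value).argmin())

ix = nearest_index(xi, -65.11018)  # index along xi (built from lats)
iy = nearest_index(yi, -67.08512)  # index along yi (built from lons)

# With zi computed as in the question, the nearest gridded value is zi[iy, ix],
# because the rows of zi follow yi and the columns follow xi.
print(ix, iy, xi[ix], yi[iy])
```

This does not check that the requested point lies inside the grid; a point outside simply snaps to the nearest edge.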
I would like to plot in 3D with Pandas / Matplotlib (wireframe or other, I do not mind which), but in a specific way.
I'm using RFID sensors and I'm trying to record the signal I receive at different distances and different angles. I want to see how the signal correlates with increasing distance and angle.
That's why I want to plot in 3D:
X axis -> the distance, Y axis -> the angle, Z axis -> the signal received (a float)
The CSV file from which I generate my DataFrame is organized as a double-entry table:
Distance;0;23;45;90;120;180
0;-53.145;-53.08;-53.1;-53.035;-53.035;-53.035
5;-53.145;-53.145;-53.05;-53.145;-53.145;-53.145
15;-53.145;-53.145;-53.145;-53.145;-53.145;-53.145
25;-53.145;-52.145;-53.145;-53.002;-53.145;-53.145
40;-53.145;-53.002;-51.145;-53.145;-54.255;-53.145
60;-53.145;-53.145;-53.145;-53.145;-53.145;-53.145
80;-53.145;-53.145;-53.145;-53.145;-60;-53.145
100;-53.145;-52;-53.145;-54;-53.145;-53.145
120;-53.145;-53.145;-53.145;-53.145;-53.002;-53.145
140;-51.754;-53.145;-51.845;-53.145;-53.145;-53.145
160;-53.145;-53.145;-49;-53.145;-53.145;-53.145
180;-53.145;-53.145;-53.145;-53.145;-53.145;-53.002
200;-53.145;-53.145;-53.145;-53.145;-53.145;-53.145
The header row holds the different angles: 0°, 23°, 45°, ...
The index of the DataFrame is the distance: 0 cm, 15 cm...
The matrix inside holds the signal, i.e. the values for the Z axis.
But I do not know how to generate a 3D scatter or wireframe plot, because every tutorial I have seen uses specific columns as axes.
In my CSV file, however, the first row holds the labels of all the columns:
Distance;0 ;23 ;45 ;90 ;120;180
I do not know how to generate a 3D plot from a double-entry table.
Do you know how to do it? Or how to organize my CSV file in a better way to get the same result in the end?
I would be grateful for any help with this!
Thank you !
maybe contour is enough
b = np.array([0,5,15,25,40,60,80,100,120,140,160,180,200])
a = np.array([0,23,45,90,120,180])
x, y = np.meshgrid(a, b)
z = np.random.randint(-50,-40, (x.shape))
scm = plt.contourf(x, y, z, cmap='inferno')
plt.colorbar(scm)
plt.xticks(a)
plt.yticks(b)
plt.xlabel('Angle')
plt.ylabel('Distance')
plt.show()
You can get a contour plot with something like this (but for the data shown it is not very interesting since all the values are constant at -45):
df = pd.read_csv('data.csv', sep=';')  # 'data.csv' is a placeholder; use the path to your own file
df = df.set_index('Distance')
x = df.index
y = df.columns.astype(int)
z = df.values
X,Y = np.meshgrid(x,y)
Z = z.T
plt.contourf(X,Y,Z,cmap='jet')
plt.colorbar()
plt.show()
Welcome to Stack Overflow. Your question can be split into several steps:
Step 1 - read the data
I have stored your data in a file called data.txt.
I don't know Pandas very well, but this can also be handled with NumPy's simple loadtxt function. Your data is a bit awkward because of the text 'Distance' in the first row and first column, but don't panic: we load the file as a matrix of strings (np.string_ was removed in NumPy 2.0, so dtype=str is used here):
raw_data = np.loadtxt('data.txt', delimiter=';', dtype=str)
Step 2 - transform the raw data
To extract the wanted data from the raw data we can do the following:
angle = raw_data[0 , 1:].astype(float)
distance = raw_data[1:, 0 ].astype(float)
data = raw_data[1:, 1:].astype(float)
With indexing the raw data we select the data that we want and with astype we change the string values to numbers.
Intermediate step - making the data a bit fancier
Your data was a bit boring, containing only the value -45, so I took the liberty of making it a bit fancier:
data = (50 + angle[np.newaxis,:]) / (10 + np.sqrt(distance[:,np.newaxis]))
Step 4 - make a wireframe plot
The example at matplotlib.org looks easy enough:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_wireframe(X, Y, Z)
plt.show()
But the trick is to get the X, Y, Z parameters right...
Step 3 - make the X and Y data
The Z data is simply our data values:
Z = data
X and Y should also be 2D arrays, so that plot_wireframe can find the x and y for each value of Z at the same array locations in X and Y. There is a NumPy function to create these 2D arrays:
X, Y = np.meshgrid(angle, distance)
Step 5 - fancying it up a bit
ax.set_xticks(angle)
ax.set_yticks(distance[::2])
ax.set_xlabel('angle')
ax.set_ylabel('distance')
Putting it together
All steps together in the right order:
# necessary includes...
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
import numpy as np
raw_data = np.loadtxt('data.txt', delimiter=';', dtype=str)
angle = raw_data[0 , 1:].astype(float)
distance = raw_data[1:, 0 ].astype(float)
data = raw_data[1:, 1:].astype(float)
# make the example data a bit more interesting...
data = (50 + angle[np.newaxis,:]) / (10 + np.sqrt(distance[:,np.newaxis]))
# setting up the plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# the tricky part: creating the data that plot_wireframe wants
Z = data
X, Y = np.meshgrid(angle, distance)
ax.plot_wireframe(X, Y, Z)
# fancying it up a bit
ax.set_xticks(angle)
ax.set_yticks(distance[::2])
ax.set_xlabel('angle')
ax.set_ylabel('distance')
# and showing the plot ...
plt.show()
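To make the shapes concrete, here is a small sketch that parses the first three rows of the table from the question (inlined with `io.StringIO` so it runs without a file) and checks that `X`, `Y` and `Z` line up. Only NumPy is used; the plotting calls are left out.

```python
import io
import numpy as np

# First rows of the double-entry table from the question, inlined for the sketch.
text = """Distance;0;23;45;90;120;180
0;-53.145;-53.08;-53.1;-53.035;-53.035;-53.035
5;-53.145;-53.145;-53.05;-53.145;-53.145;-53.145
15;-53.145;-53.145;-53.145;-53.145;-53.145;-53.145
"""

raw = np.loadtxt(io.StringIO(text), delimiter=";", dtype=str)
angle = raw[0, 1:].astype(float)     # header row, without the 'Distance' label
distance = raw[1:, 0].astype(float)  # first column, without the header cell
Z = raw[1:, 1:].astype(float)        # signal values, one row per distance

# meshgrid repeats angle along rows and distance along columns,
# so X[i, j], Y[i, j] are the coordinates of Z[i, j].
X, Y = np.meshgrid(angle, distance)
print(X.shape, Y.shape, Z.shape)
```

All three arrays end up with shape `(len(distance), len(angle))`, which is what `plot_wireframe(X, Y, Z)` expects.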
I want to interpolate data (120*120) in order to get output data (1200*1200).
To do this I'm using scipy.interpolate.interp2d.
Below is my input data, where 255 corresponds to fill values, I mask these values before the interpolation.
I'm using the code below:
tck = interp2d(np.linspace(0, 1200, data.shape[1]),
               np.linspace(0, 1200, data.shape[0]),
               data,
               fill_value=255)
data = tck(range(1200), range(1200))
data = np.ma.MaskedArray(data, data == 255)
I get the following result:
The fill values have been interpolated along with the data.
How can I interpolate my data without interpolating the fill values?
I found a solution with scipy.interpolate.griddata but I'm not sure that's the best one.
I interpolate the data with method='nearest', which returns the value at the data point closest to the point of interpolation.
# build (N, 2) coordinate arrays; np.array(zip(...)) does not work on Python 3
gx, gy = np.meshgrid(np.linspace(0, 1200, data.shape[1]),
                     np.linspace(0, 1200, data.shape[0]))
points = np.column_stack((gx.ravel(), gy.ravel()))
xx, yy = np.meshgrid(np.arange(1200), np.arange(1200))
xi = np.column_stack((xx.ravel(), yy.ravel()))
tck = griddata(points, data.flatten(), xi, method='nearest')
data = tck.reshape((1200, 1200))
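The reason nearest-neighbour lookup leaves fill values intact is that every output cell is a copy of one input cell, so 255 is never blended with real data. The sketch below illustrates this on a tiny array using plain index arithmetic instead of `griddata`; that substitution assumes a regular grid, and `upsample_nearest` is a made-up helper name.

```python
import numpy as np

def upsample_nearest(data, out_shape):
    """Nearest-neighbour resampling on a regular grid (hypothetical helper)."""
    # For each output row/column, pick the index of the closest input row/column.
    rows = np.linspace(0, data.shape[0] - 1, out_shape[0]).round().astype(int)
    cols = np.linspace(0, data.shape[1] - 1, out_shape[1]).round().astype(int)
    return data[np.ix_(rows, cols)]

data = np.array([[1, 2, 255],
                 [3, 4, 255],
                 [5, 6, 7]])

big = upsample_nearest(data, (6, 6))
masked = np.ma.masked_equal(big, 255)  # fill values can be masked after resampling
print(masked)
```

The output contains only values that were present in the input, which is not true of linear or cubic interpolation across a fill value.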
I am plotting weather model precipitation data on a map as a filled contour. Over that data I would like to plot text at specific grid points showing the precipitation value at that point in time, but I am struggling to do this. I have a list of lat/lon points but cannot figure out how to map it properly to the data I have. The data itself comes with its own set of lat/lon points that uniquely map to the data. Here's the example code that I have:
grib = 'gfs.t00z.pgrb2.0p25.f084'
grbs = pygrib.open(grib)
grb = grbs.select(name='Total Precipitation',typeOfLevel='surface')[0]
precip = grb.values *.039370
lat,lon = grb.latlons()
x, y = m(lon,lat)
with open('mydata') as f:
    for line in f:
        myline = line.replace("\n", "")
        myline = myline.split(",")
        uniquepoints.append(myline) # contains specific lat, lon points
intervals = [0.0,0.01,0.1,0.25,0.5,0.75,1.00,1.25,1.50,1.75,2.0,2.50,3.00]
m=Basemap(projection='lcc',lon_0 = -95.,llcrnrlon=-125.,
urcrnrlon=-55.,llcrnrlat=20.,urcrnrlat=50., lat_1=25.,lat_2=46., resolution='l',area_thresh=10000,ax=ax)
obsobj = m.contourf(x,y,precip,intervals,cmap=plt.cm.jet)
I know you are supposed to use plt.text, but I can't configure it so that the unique points map correctly to the precip data.
It is easy to make simple mistakes when placing points on a Basemap, as you have to convert between latitude/longitude/height and the window coordinate system in x, y. However, if your data is plotting at the correct points on the map, you can use those positions to plot your labels at those exact points with some specified offset.
A pythonic example of how to do this:
lats = [x['LLHPosition'][0] for x in unit_data]
lons = [x['LLHPosition'][1] for x in unit_data]
x,y = m(lons,lats)
label_text = [x['UnitName'] for x in unit_data]
x_offsets = [2000] * len(unit_data)
y_offsets = [2000] * len(unit_data)
for label, xpt, ypt, x_offset, y_offset in zip(label_text, x, y, x_offsets, y_offsets):
    plt.text(xpt+x_offset, ypt+y_offset, label)
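To get the precipitation value that belongs to each of your own lat/lon points, one possible approach is to pick the nearest model grid cell. The sketch below uses synthetic 2D arrays as stand-ins for `grb.latlons()` and `precip`; `nearest_grid_value` is a made-up helper, and it assumes your points and the grid use the same longitude convention (both -180..180 or both 0..360). Distances are measured in degrees, which is only an approximation.

```python
import numpy as np

def nearest_grid_value(lat2d, lon2d, values, pt_lat, pt_lon):
    """Value and (row, col) of the grid cell closest to a point (hypothetical helper)."""
    dist2 = (lat2d - pt_lat) ** 2 + (lon2d - pt_lon) ** 2  # squared distance in degrees
    i, j = np.unravel_index(np.argmin(dist2), dist2.shape)
    return values[i, j], (i, j)

# Synthetic 0.25-degree grid standing in for grb.latlons() and the precip field.
lon2d, lat2d = np.meshgrid(np.arange(-125.0, -54.75, 0.25),
                           np.arange(20.0, 50.25, 0.25))
precip = lat2d + lon2d / 1000.0  # made-up field, only for illustration

value, (i, j) = nearest_grid_value(lat2d, lon2d, precip, 35.1, -97.4)
print(value, lat2d[i, j], lon2d[i, j])
```

The label can then be drawn at the projected position of that point, for example with `plt.text` at the `x, y` returned by `m(pt_lon, pt_lat)`.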
I have got this code to generate a surface plot. But it gives a zero division error. I am not able to figure out what is wrong. Thank you.
import pylab, csv
import numpy
from mayavi.mlab import *
def getData(fileName):
    try:
        data = csv.reader(open(fileName,'rb'))
    except:
        print 'File not found'
    else:
        data = [[float(row[0]), float(row[1]),float(row[2])] for row in data]
        x = [row[0] for row in data]
        y = [row[1] for row in data]
        z = [row[2] for row in data]
        return (x, y, z)
def plotData(fileName):
    xVals, yVals, zVals = getData(fileName)
    xVals = pylab.array(xVals)
    yVals = pylab.array(yVals)
    zVals = (pylab.array(zVals)*10**3)
    x, y = numpy.mgrid[-0.5:0.5:0.001, -0.5:0.5:0.001]
    s = surf(x, y, zVals)
    return s
plotData('data')
If I have understood the code correctly, there is a problem with zVals in mayavi.mlab.surf.
According to the documentation of the function, s is the elevation matrix, a 2D array, where indices along the first array axis represent x locations, and indices along the second array axis represent y locations. Your file reader seems to return a 1D vector instead of an array.
However, this may not be the most difficult problem. Your file seems to contain triplets of x, y, and z coordinates. You can use mayavi.mlab.surf only if your x and y coordinates in the file form a regular square grid. If this is the case, then you just have to recover that grid and form nice 2D arrays of all three parts. If the points are in the file in a known order, it is easy, otherwise it is rather tricky.
Maybe you would want to start with mayavi.mlab.points3d(xVals, yVals, zVals). That will give you an overall impression of your data. (Or, if you already know more about your data, you might give us a hint by editing your question and adding more information!)
Just to give you an idea of a slightly more Pythonic style of writing this, your code is rewritten (and surf replaced) in the following:
import mayavi.mlab as ml
import numpy
def plot_data(filename):
    data = numpy.loadtxt(filename)
    xvals = data[:,0]
    yvals = data[:,1]
    zvals = data[:,2] * 1000.
    return ml.points3d(xvals, yvals, zvals)
plot_data('data')
(Essential changes: the use of numpy.loadtxt, get rid of pylab namespace here, no import *, no CamelCase variable or function names. For more information, see PEP 8.)
If you only need to see the shape of the surface, and the data in the file is ordered row-by-row and with the same number of data points in each row (i.e. fixed number of columns), then you may use:
import mayavi.mlab as ml
import numpy
import matplotlib.pyplot as plt
# whatever you have as the number of points per row
columns = 13
data = numpy.loadtxt(filename)
# draw the data points into an XY plane to check that they really form a rectangular grid:
plt.plot(data[:,0], data[:,1])
# draw the surface
zvals = data[:,2].reshape(-1,columns)
ml.surf(zvals, warp_scale='auto')
As you can see, this code allows you to check that your values really are in the right kind of grid. It does not check that they are in the correct order, but at least you can see they form a nice grid. Also, you have to input the number of columns manually. The keyword warp_scale takes care of the surface scaling so that it should look reasonable.
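If the triplets are not stored in a known order, one way to recover the grid is to sort them by x and then y and reshape. This is only a sketch under the assumption that the x and y values form a complete rectangular grid with exactly repeated coordinate values; `triplets_to_grid` is a made-up helper. The result has x along the first axis and y along the second, the layout described above for `surf`.

```python
import numpy as np

def triplets_to_grid(x, y, z):
    """Rebuild a 2D elevation array from unordered (x, y, z) triplets (hypothetical helper)."""
    xs, ys = np.unique(x), np.unique(y)
    if xs.size * ys.size != z.size:
        raise ValueError("points do not fill a rectangular grid")
    order = np.lexsort((y, x))  # primary key x, secondary key y
    shape = (xs.size, ys.size)
    xg, yg = x[order].reshape(shape), y[order].reshape(shape)
    # Every row must share one x and every column one y, otherwise points are missing or duplicated.
    if not (np.all(xg == xs[:, None]) and np.all(yg == ys[None, :])):
        raise ValueError("points do not fill a rectangular grid")
    return xs, ys, z[order].reshape(shape)

# Shuffled triplets from a 3 x 4 grid, just to exercise the function.
rng = np.random.default_rng(0)
gx, gy = np.meshgrid(np.arange(3.0), np.arange(4.0), indexing="ij")
gz = gx * 10 + gy
perm = rng.permutation(gx.size)
xs, ys, zgrid = triplets_to_grid(gx.ravel()[perm], gy.ravel()[perm], gz.ravel()[perm])
print(zgrid)
```

Coordinates read from a text file may differ in the last digits, in which case they would need rounding before `np.unique` groups them correctly.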
I need to compare some theoretical data with real data in python.
The theoretical data comes from solving an equation.
To improve the comparison I would like to remove data points that fall far from the theoretical curve, i.e. the points below and above the red dashed lines in the figure (made with matplotlib).
Both the theoretical curves and the data points are arrays of different length.
I could try to remove the points by eye. For example, the first upper point can be found using:
data2[(data2.redshift<0.4)&(data2.dmodulus>1)]
rec.array([('1997o', 0.374, 1.0203223485103787, 0.44354759972859786)], dtype=[('SN_name', '|S10'), ('redshift', '<f8'), ('dmodulus', '<f8'), ('dmodulus_error', '<f8')])
But I would like a less ad hoc approach.
So, can anyone help me find an easy way of removing the problematic points?
Thank you!
This might be overkill, and is based on your comment:
"Both the theoretical curves and the data points are arrays of different length."
I would do the following:
1. Truncate the data set so that its x values lie within the max and min values of the theoretical set.
2. Interpolate the theoretical curve using scipy.interpolate.interp1d and the above truncated data x values. The reason for step (1) is to satisfy the constraints of interp1d.
3. Use numpy.where to find data y values that are outside the range of acceptable theory values.
4. DON'T discard these values, as was suggested in comments and other answers. If you want clarity, point them out by plotting the 'inliers' in one color and the 'outliers' in another.
Here's a script that is close to what you are looking for, I think. It hopefully will help you accomplish what you want:
import numpy as np
import scipy.interpolate as interpolate
import matplotlib.pyplot as plt
# make up data
def makeUpData():
    '''Make many more data points (x,y,yerr) than theory (x,y),
    with theory yerr corresponding to a constant "sigma" in y,
    about x,y value'''
    NX = 150
    dataX = (np.random.rand(NX)*1.1)**2
    dataY = (1.5*dataX+np.random.rand(NX)**2)*dataX
    dataErr = np.random.rand(NX)*dataX*1.3
    theoryX = np.arange(0,1,0.1)
    theoryY = theoryX*theoryX*1.5
    theoryErr = 0.5
    return dataX,dataY,dataErr,theoryX,theoryY,theoryErr

def makeSameXrange(theoryX,dataX,dataY):
    '''
    Truncate the dataX and dataY ranges so that dataX min and max are within
    the max and min of theoryX.
    '''
    minT,maxT = theoryX.min(),theoryX.max()
    goodIdxMax = np.where(dataX<maxT)
    goodIdxMin = np.where(dataX[goodIdxMax]>minT)
    return (dataX[goodIdxMax])[goodIdxMin],(dataY[goodIdxMax])[goodIdxMin]

# take 'theory' and get values at every 'data' x point
def theoryYatDataX(theoryX,theoryY,dataX):
    '''For every dataX point, find interpolated theoryY value. theoryX needed
    for interpolation.'''
    f = interpolate.interp1d(theoryX,theoryY)
    return f(dataX[np.where(dataX<np.max(theoryX))])

# collect valid points
def findInlierSet(dataX,dataY,interpTheoryY,theoryErr):
    '''Find where theoryY-theoryErr < dataY < theoryY+theoryErr and return
    the valid points.'''
    withinUpper = np.where(dataY<(interpTheoryY+theoryErr))
    withinLower = np.where(dataY[withinUpper]
                           >(interpTheoryY[withinUpper]-theoryErr))
    return (dataX[withinUpper])[withinLower],(dataY[withinUpper])[withinLower]

def findOutlierSet(dataX,dataY,interpTheoryY,theoryErr):
    '''Find where dataY lies above theoryY+theoryErr or below
    theoryY-theoryErr and return those points.'''
    withinUpper = np.where(dataY>(interpTheoryY+theoryErr))
    withinLower = np.where(dataY<(interpTheoryY-theoryErr))
    return (dataX[withinUpper],dataY[withinUpper],
            dataX[withinLower],dataY[withinLower])

if __name__ == "__main__":
    dataX,dataY,dataErr,theoryX,theoryY,theoryErr = makeUpData()
    TruncDataX,TruncDataY = makeSameXrange(theoryX,dataX,dataY)
    interpTheoryY = theoryYatDataX(theoryX,theoryY,TruncDataX)
    inDataX,inDataY = findInlierSet(TruncDataX,TruncDataY,interpTheoryY,
                                    theoryErr)
    outUpX,outUpY,outDownX,outDownY = findOutlierSet(TruncDataX,
                                                     TruncDataY,
                                                     interpTheoryY,
                                                     theoryErr)
    # top panel: all data with error bars and the theory band
    fig = plt.figure()
    ax = fig.add_subplot(211)
    ax.errorbar(dataX,dataY,dataErr,fmt='.',color='k')
    ax.plot(theoryX,theoryY,'r-')
    ax.plot(theoryX,theoryY+theoryErr,'r--')
    ax.plot(theoryX,theoryY-theoryErr,'r--')
    ax.set_xlim(0,1.4)
    ax.set_ylim(-.5,3)
    # bottom panel: inliers in black, upper outliers in blue, lower in red
    ax = fig.add_subplot(212)
    ax.plot(inDataX,inDataY,'ko')
    ax.plot(outUpX,outUpY,'bo')
    ax.plot(outDownX,outDownY,'ro')
    ax.plot(theoryX,theoryY,'r-')
    ax.plot(theoryX,theoryY+theoryErr,'r--')
    ax.plot(theoryX,theoryY-theoryErr,'r--')
    ax.set_xlim(0,1.4)
    ax.set_ylim(-.5,3)
    fig.savefig('findInliers.png')
This figure is the result:
In the end I used some of Yann's code:
def theoryYatDataX(theoryX,theoryY,dataX):
    '''For every dataX point, find interpolated theoryY value. theoryX needed
    for interpolation.'''
    f = interpolate.interp1d(theoryX,theoryY)
    return f(dataX[np.where(dataX<np.max(theoryX))])

def findOutlierSet(data,interpTheoryY,theoryErr):
    '''Split data into points inside and outside the band
    theoryY-theoryErr < dataY < theoryY+theoryErr.'''
    up = np.where(data.dmodulus > (interpTheoryY+theoryErr))
    low = np.where(data.dmodulus < (interpTheoryY-theoryErr))
    # join all the indices together in a flat array
    out = np.hstack([up,low]).ravel()
    index = np.array(np.ones(len(data),dtype=bool))
    index[out]=False
    datain = data[index]
    dataout = data[out]
    return datain, dataout

def selectdata(data,theoryX,theoryY):
    """
    Data selection: z<1 and +-0.5 LFLRW separation
    """
    # Select data with redshift z<1
    data1 = data[data.redshift < 1]
    # From modulus to light distance:
    data1.dmodulus, data1.dmodulus_error = modulus2distance(data1.dmodulus,data1.dmodulus_error)
    # redshift data order
    data1.sort(order='redshift')
    # Outliers: distance to LFLRW curve bigger than +-0.5
    theoryErr = 0.5
    # Theory curve interpolation to get the same points as data
    interpy = theoryYatDataX(theoryX,theoryY,data1.redshift)
    datain, dataout = findOutlierSet(data1,interpy,theoryErr)
    return datain, dataout
Using those functions I can finally obtain:
Thank you all for your help.
Just look at the difference between the red curve and the points; if it is bigger than the difference between the red curve and the dashed red curve, remove the point.
diff = np.abs(points - red_curve)
index = diff <= (dashed_curve - red_curve)  # True for the points to keep (dashed_curve is the upper one)
filtered = points[index]
But please take the comment from NickLH seriously. Your data looks pretty good without any filtering; your "outliers" all have a very big error and won't affect the fit much.
You could either use numpy.where() to identify which x, y pairs meet your plotting criteria, or use enumerate to do much the same thing. Example:
x_list = [ 1, 2, 3, 4, 5, 6 ]
y_list = ['f','o','o','b','a','r']
result = [y_list[i] for i, x in enumerate(x_list) if 2 <= x < 5]
print(result)
I'm sure you could change the conditions so that '2' and '5' in the above example are the functions of your curves.
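As a sketch of that idea with NumPy, the fixed bounds can be replaced by a theory curve evaluated at each data x with `np.interp`, plus a tolerance. The sample values, `tol`, and the variable names below are made up for illustration; only the band half-width of 0.5 is taken from the thread.

```python
import numpy as np

# Theory curve sampled on a coarse grid (same form as the made-up data above).
theory_x = np.arange(0.0, 1.0, 0.1)
theory_y = 1.5 * theory_x**2
tol = 0.5  # half-width of the accepted band around the curve

# A handful of invented data points.
data_x = np.array([0.05, 0.30, 0.50, 0.80, 1.20])
data_y = np.array([0.10, 1.00, 0.40, 0.20, 2.00])

# Only points inside the theory x range can be compared with the curve.
in_range = (data_x >= theory_x.min()) & (data_x <= theory_x.max())
expected = np.interp(data_x, theory_x, theory_y)  # theory value at each data x
inlier = in_range & (np.abs(data_y - expected) <= tol)

print(data_x[inlier], data_y[inlier])    # points within the band
print(data_x[~inlier], data_y[~inlier])  # points flagged for inspection
```

Keeping the mask rather than deleting rows makes it easy to plot the flagged points in a different color instead of discarding them.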