Inaccurate values of x-axis in plot - python

I am trying to plot a specific course (acceleration over time) using matplotlib. The plot works so far and is shown (see image). j equals 35 and represents the derivative of acceleration with respect to time, which in this case is a constant.
import numpy as np
import matplotlib.pyplot as plt

j = 35  # constant derivative of acceleration, as described above

def limits_acc_course():
    limits_acc_course.t1 = 0.14285714285714285
    limits_acc_course.t2 = 0.14285714285714285 + 0.10714285714285715
    limits_acc_course.t3 = 2*0.14285714285714285 + 0.10714285714285715
    limits_acc_course.t4 = 2*0.14285714285714285 + 0.10714285714285715 + 0.5*0.24714285714285716

limits_acc_course()
t_end = 2*limits_acc_course.t4

t_1 = np.linspace(0, limits_acc_course.t1)
t_2 = np.linspace(limits_acc_course.t1, limits_acc_course.t2)
t_3 = np.linspace(limits_acc_course.t2, limits_acc_course.t3)
t_4 = np.linspace(limits_acc_course.t3, limits_acc_course.t4)

tk1 = np.array([])
tk2 = np.array([])
tk3 = np.array([])
tk4 = np.array([])

for value1 in t_1:
    tk1 = np.append(tk1, value1*j)
for value2 in t_2:
    tk2 = np.append(tk2, limits_acc_course.t1*j)
for value3 in t_3:
    tk3 = np.append(tk3, (limits_acc_course.t3-value3)*j)
for value4 in t_4:
    tk4 = np.append(tk4, value4*0)
    if value4 == (2*limits_acc_course.t4-limits_acc_course.t3)*j:
        break

t = np.concatenate((tk1, tk2, tk3, tk4), axis=0)
t_neg = (-1)*np.concatenate((tk1, tk2, tk3), axis=0)
t_final = np.concatenate((t, t_neg), axis=0)
t_range = np.linspace(0, t_end, t_final.size)

fig, t = plt.subplots()
t.plot(t_range, t_final)
t.get_xaxis().get_major_formatter().set_useOffset(False)
plt.show()
The problem is that the x-coordinates in the plot do not match the calculated values.
The x-values in the plot (see image) should be:
0.142857142857 and 0.25
(or at least to a similar accuracy, e.g. 0.1429).
The x-values actually shown in the plot are:
0.144777 and 0.295348
I have tried turning off the offset, I have varied the number of samples per segment from 100 to 2500, and I have tried rounding the values, but none of that worked. I have also tried endpoint=False when creating the ranges t_1 to t_4.
By now I have run out of ideas.

The plot is created in an axes that extends over roughly 500 pixels on screen. The x-axis spans 1.1 units, so you have 1.1/500 = 0.0022 units per pixel. The mouse cursor cannot know its position more accurately than one pixel, so the coordinate shown by the cursor readout is only accurate to about ±0.0022 units.
The observed coordinate (0.144777) deviates from the actual coordinate (0.142857142857) by 0.0019 units, which is well within the accuracy of the cursor.
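To double-check that the underlying data is exact and only the cursor readout is coarse, one can mark the computed break points directly on the axes. A minimal sketch (break-point values taken from the question, y-data is just a stand-in):

import numpy as np
import matplotlib.pyplot as plt

t1, t2 = 0.14285714285714285, 0.25   # expected break points from the question
fig, ax = plt.subplots()
ax.plot([0, t1, t2], [0, 5, 5])      # stand-in data
for x in (t1, t2):
    ax.axvline(x, color="gray", ls="--")                 # marker at each break point
    ax.annotate("{:.4f}".format(x), (x, 0),
                textcoords="offset points", xytext=(2, 2))
plt.show()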

Related

How to split data into two graphs with matplotlib

I would be so thankful if someone would be able to help me with this. I am creating a graph in matplotlib, but I would like to split the 14 lines created by the while loop into the x and y values of P, so instead of plt.plot(t, P) it would be plt.plot(t, P[1][0]) and
plt.plot(t, P[1][1]). I would love if someone could help me quickly; it should be easy, but I am just getting errors with the arrays.
#Altering Alpha in Tumor Cells vs PACCs
#What is alpha? α = Rate of conversion of cancer cells to PACCs
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
from google.colab import files

value = -6
counter = -1
array = []
pac = []

while value <= 0:
    def modelP(x, t):
        P, C = x
        λc = 0.0601
        K = 2000
        α = 1 * (10**value)
        ν = 1 * (10**-6)
        λp = 0.1
        γ = 2
        #returning odes
        dPdt = ((λp))*P*(1-(C+(γ*P))/K) + (α*C)
        dCdt = ((λc)*C)*(1-(C+(γ*P))/K) - (α*C) + (ν*P)
        return dPdt, dCdt
    #initial conditions
    C0 = 256
    P0 = 0
    Pinit = [P0, C0]
    #time points
    t = np.linspace(0, 730)
    #solve odes
    P = odeint(modelP, Pinit, t)
    plt.plot(t, P)
    value += 1

#plot results
plt.xlabel('Time [days]')
plt.ylabel('Number of PACCs')
plt.show()
You can use subplots() to create two subplots and then plot each individual line into the subplot you need. To do this, first add the subplots at the start (before the while loop) by adding this line...
fig, ax = plt.subplots(2, 1)   ## Plot with 2 rows, 1 column... change if required
Then, within the while loop, replace the plotting line...
plt.plot(t,P)
with the following (take care of the indentation so that these lines stay within the while loop):
if value < -3:   ## I am using value = -3 as the point of split, change as needed
    ax[0].plot(t, P)   ## Add to the first subplot
else:
    ax[1].plot(t, P)   ## Add to the second subplot
This will give a plot like this.
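Put together, a self-contained version of the loop with the two subplots might look like the sketch below (model and parameter values taken from the question; the split at value = -3 is arbitrary, as noted above):

import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt

fig, ax = plt.subplots(2, 1, sharex=True)   # 2 rows, 1 column

value = -6
while value <= 0:
    def modelP(x, t):
        P, C = x
        lam_c, K, lam_p, gamma = 0.0601, 2000, 0.1, 2
        alpha, nu = 10.0**value, 1e-6
        dPdt = lam_p*P*(1 - (C + gamma*P)/K) + alpha*C
        dCdt = lam_c*C*(1 - (C + gamma*P)/K) - alpha*C + nu*P
        return dPdt, dCdt

    t = np.linspace(0, 730)
    P = odeint(modelP, [0, 256], t)   # initial conditions P0=0, C0=256
    if value < -3:                    # arbitrary split point
        ax[0].plot(t, P)
    else:
        ax[1].plot(t, P)
    value += 1

ax[1].set_xlabel('Time [days]')
ax[0].set_ylabel('Number of PACCs')
plt.show()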

In matplotlib, how can I plot a multi-colored line, like a rainbow

I would like to plot parallel lines with different colors. E.g. rather than a single red line of thickness 6, I would like to have two parallel lines of thickness 3, with one red and one blue.
Any thoughts would be appreciated.
Thanks
Even with the smart offsetting (see below), there is still an issue in a view that has sharp angles between consecutive points.
Zoomed view of smart offsetting:
Overlaying lines of varying thickness:
Plotting parallel lines is not an easy task. Using a simple uniform offset will of course not show the desired result. This is shown in the left picture below.
Such a simple offset can be produced in matplotlib as shown in the transformation tutorial.
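For reference, a minimal sketch of such a uniform offset via matplotlib.transforms.offset_copy (this is the naive approach illustrated in the left picture, not the method developed below):

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.transforms as mtransforms

x = np.linspace(0, 9)
y = np.sin(x) * x / 3.

fig, ax = plt.subplots()
ax.plot(x, y, lw=3, color="red")
# shift the second line 3 points upward in display space
trans = mtransforms.offset_copy(ax.transData, fig=fig, x=0, y=3, units="points")
ax.plot(x, y, lw=3, color="blue", transform=trans)
plt.show()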
Method 1
A better solution may be to use the idea sketched on the right side. To calculate the offset of the nth point we can use the normal vector to the line between the (n-1)st and the (n+1)st point and use the same distance along this normal vector to calculate the offset point.
The advantage of this method is that we have the same number of points in the original line as in the offset line. The disadvantage is that it is not completely accurate, as can be seen in the picture.
This method is implemented in the function offset in the code below.
In order to make this useful for a matplotlib plot, we need to consider that the linewidth should be independent of the data units. Linewidth is usually given in units of points, and the offset would best be given in the same unit, such that e.g. the requirement from the question ("two parallel lines of width 3") can be met.
The idea is therefore to transform the coordinates from data to display coordinates, using ax.transData.transform. Also the offset in points o can be transformed to the same units: Using the dpi and the standard of ppi=72, the offset in display coordinates is o*dpi/ppi. After the offset in display coordinates has been applied, the inverse transform (ax.transData.inverted().transform) allows a backtransformation.
Now there is another dimension to the problem: how to ensure that the offset remains the same independent of the zoom and size of the figure?
This last point can be addressed by recalculating the offset each time a zooming or resizing event has taken place.
Here is what a rainbow curve produced by this method looks like.
And here is the code to produce the image.
import numpy as np
import matplotlib.pyplot as plt

dpi = 100

def offset(x, y, o):
    """ Offset coordinates given by arrays x, y by o """
    X = np.c_[x, y].T
    m = np.array([[0, -1], [1, 0]])
    R = np.zeros_like(X)
    S = X[:, 2:] - X[:, :-2]
    R[:, 1:-1] = np.dot(m, S)
    R[:, 0] = np.dot(m, X[:, 1] - X[:, 0])
    R[:, -1] = np.dot(m, X[:, -1] - X[:, -2])
    On = R/np.sqrt(R[0, :]**2 + R[1, :]**2)*o
    Out = On + X
    return Out[0, :], Out[1, :]

def offset_curve(ax, x, y, o):
    """ Offset array x, y in data coordinates
        by o in points """
    trans = ax.transData.transform
    inv = ax.transData.inverted().transform
    X = np.c_[x, y]
    Xt = trans(X)
    xto, yto = offset(Xt[:, 0], Xt[:, 1], o*dpi/72.)
    Xto = np.c_[xto, yto]
    Xo = inv(Xto)
    return Xo[:, 0], Xo[:, 1]

# some single points
y = np.array([1, 2, 2, 3, 3, 0])
x = np.arange(len(y))
# or try a sine
x = np.linspace(0, 9)
y = np.sin(x)*x/3.

fig, ax = plt.subplots(figsize=(4, 2.5), dpi=dpi)
cols = ["#fff40b", "#00e103", "#ff9921", "#3a00ef", "#ff2121", "#af00e7"]
lw = 2.
lines = []
for i in range(len(cols)):
    l, = plt.plot(x, y, lw=lw, color=cols[i])
    lines.append(l)

def plot_rainbow(event=None):
    # use lists (not range objects) so the items can be assigned to
    xr = [None]*6; yr = [None]*6
    xr[0], yr[0] = offset_curve(ax, x, y, lw/2.)
    xr[1], yr[1] = offset_curve(ax, x, y, -lw/2.)
    xr[2], yr[2] = offset_curve(ax, xr[0], yr[0], lw)
    xr[3], yr[3] = offset_curve(ax, xr[1], yr[1], -lw)
    xr[4], yr[4] = offset_curve(ax, xr[2], yr[2], lw)
    xr[5], yr[5] = offset_curve(ax, xr[3], yr[3], -lw)
    for i in range(6):
        lines[i].set_data(xr[i], yr[i])

plot_rainbow()
fig.canvas.mpl_connect("resize_event", plot_rainbow)
fig.canvas.mpl_connect("button_release_event", plot_rainbow)
plt.savefig(__file__+".png", dpi=dpi)
plt.show()
Method 2
To avoid overlapping lines, one has to use a more complicated solution.
One could first offset every point normal to the two line segments it is part of (green points in the picture below). Then calculate the line through those offset points and find their intersection.
A special case is when the slopes of two subsequent line segments are equal. This has to be taken care of (eps in the code below).
import numpy as np
import matplotlib.pyplot as plt

dpi = 100

def intersect(p1, p2, q1, q2, eps=1.e-10):
    """ given two lines, the first through points pn, the second through qn,
        find the intersection """
    x1 = p1[0]; y1 = p1[1]; x2 = p2[0]; y2 = p2[1]
    x3 = q1[0]; y3 = q1[1]; x4 = q2[0]; y4 = q2[1]
    nomX = ((x1*y2-y1*x2)*(x3-x4) - (x1-x2)*(x3*y4-y3*x4))
    denom = float((x1-x2)*(y3-y4) - (y1-y2)*(x3-x4))
    nomY = (x1*y2-y1*x2)*(y3-y4) - (y1-y2)*(x3*y4-y3*x4)
    if np.abs(denom) < eps:
        # print("intersection undefined", p1)
        return np.array(p1)
    else:
        return np.array([nomX/denom, nomY/denom])

def offset(x, y, o, eps=1.e-10):
    """ Offset coordinates given by arrays x, y by o """
    X = np.c_[x, y].T
    m = np.array([[0, -1], [1, 0]])
    S = X[:, 1:] - X[:, :-1]
    R = np.dot(m, S)
    norm = np.sqrt(R[0, :]**2 + R[1, :]**2) / o
    On = R/norm
    Outa = On + X[:, 1:]
    Outb = On + X[:, :-1]
    G = np.zeros_like(X)
    for i in range(0, len(X[0, :])-2):
        p = intersect(Outa[:, i], Outb[:, i], Outa[:, i+1], Outb[:, i+1], eps=eps)
        G[:, i+1] = p
    G[:, 0] = Outb[:, 0]
    G[:, -1] = Outa[:, -1]
    return G[0, :], G[1, :]

def offset_curve(ax, x, y, o, eps=1.e-10):
    """ Offset array x, y in data coordinates
        by o in points """
    trans = ax.transData.transform
    inv = ax.transData.inverted().transform
    X = np.c_[x, y]
    Xt = trans(X)
    xto, yto = offset(Xt[:, 0], Xt[:, 1], o*dpi/72., eps=eps)
    Xto = np.c_[xto, yto]
    Xo = inv(Xto)
    return Xo[:, 0], Xo[:, 1]

# some single points
y = np.array([1, 1, 2, 0, 3, 2, 1., 4, 3]) * 1.e9
x = np.arange(len(y))
x[3] = x[4]
# or try a sine
# x = np.linspace(0,9)
# y = np.sin(x)*x/3.

fig, ax = plt.subplots(figsize=(4, 2.5), dpi=dpi)
cols = ["r", "b"]
lw = 11.
lines = []
for i in range(len(cols)):
    l, = plt.plot(x, y, lw=lw, color=cols[i], solid_joinstyle="miter")
    lines.append(l)

def plot_rainbow(event=None):
    # use lists (not range objects) so the items can be assigned to
    xr = [None]*2; yr = [None]*2
    xr[0], yr[0] = offset_curve(ax, x, y, lw/2.)
    xr[1], yr[1] = offset_curve(ax, x, y, -lw/2.)
    for i in range(2):
        lines[i].set_data(xr[i], yr[i])

plot_rainbow()
fig.canvas.mpl_connect("resize_event", plot_rainbow)
fig.canvas.mpl_connect("button_release_event", plot_rainbow)
plt.show()
Note that this method should work well as long as the offset between the lines is smaller than the distance between subsequent points on the line. Otherwise method 1 may be better suited.
The best that I can think of is to take your data, generate a series of small offsets, and use fill_between to make bands of whatever color you like.
I wrote a function to do this. I don't know what shape you're trying to plot, so this may or may not work for you. I tested it on a parabola and got decent results. You can also play around with the list of colors.
import numpy as np
import matplotlib.pyplot as plt

def rainbow_plot(x, y, spacing=0.1):
    fig, ax = plt.subplots()
    colors = ['red', 'yellow', 'green', 'cyan', 'blue']
    top = max(y)
    lines = []
    for i in range(len(colors)+1):
        newline_data = y - top*spacing*i
        lines.append(newline_data)
    for i, c in enumerate(colors):
        ax.fill_between(x, lines[i], lines[i+1], facecolor=c)
    return fig, ax

x = np.linspace(0, 1, 51)
y = 1 - (x - 0.5)**2
rainbow_plot(x, y)
plt.show()

Find an easier way to cluster 2-D scatter data into grid array data

I have figured out a method to cluster dispersed point data into a structured 2-D array (like a rasterize function), and I hope there are better ways to achieve that target.
My work
1. Intro
Each of the 1000 data points has three properties (lon, lat, emission), representing one factory located at (x, y) that emits a certain amount of CO2 into the atmosphere.
grid network: a predefined 2-D array with shape 20x20
http://i4.tietuku.com/02fbaf32d2f09fff.png
The code is reproduced here:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

#### define the map area
xc1,xc2,yc1,yc2 = 113.49805889531724,115.5030664238035,37.39995194888143,38.789235929357105
map = Basemap(llcrnrlon=xc1,llcrnrlat=yc1,urcrnrlon=xc2,urcrnrlat=yc2)

#### read the point data and scatter plot by position
df = pd.read_csv("xxxxx.csv")
px,py = map(df.lon, df.lat)
map.scatter(px, py, color = "red", s= 5, zorder =3)

#### predefine the grid networks
lon_grid,lat_grid = np.linspace(xc1,xc2,21), np.linspace(yc1,yc2,21)
lon_x,lat_y = np.meshgrid(lon_grid,lat_grid)
grids = np.zeros(20*20).reshape(20,20)
plt.pcolormesh(lon_x,lat_y,grids,cmap = 'gray', facecolor = 'none',edgecolor = 'k',zorder=3)
2. My target
Find the nearest grid point for each factory
Add the factory's emission data to that grid cell
3. Algorithm realization
3.1 Raster grid
Note: the 20x20 grid points distributed over this area are represented by the blue dots.
http://i4.tietuku.com/8548554587b0cb3a.png
3.2 KD-tree
Find the nearest blue dot for each red point:
sh = (20*20, 2)
grids = np.zeros(20*20*2).reshape(*sh)
sh_emission = (20*20)
grids_em = np.zeros(20*20).reshape(sh_emission)

k = 0
for j in range(0, yy.shape[0], 1):
    for i in range(0, xx.shape[0], 1):
        grids[k] = np.array([lon_grid[i], lat_grid[j]])
        k += 1

T = KDTree(grids)
x_delta = (lon_grid[2] - lon_grid[1])
y_delta = (lat_grid[2] - lat_grid[1])
R = np.sqrt(x_delta**2 + y_delta**2)

for i in range(0, len(df.lon), 1):
    idx = T.query_ball_point([df.lon.iloc[i], df.lat.iloc[i]], r=R)
    # sometimes more than one blue dot is found, so I calculate the distances
    # between the factory (red point) and all listed blue dots
    if len(idx) > 1:
        distance = []
        for k in range(0, len(idx), 1):
            distance.append(np.sqrt((df.lon.iloc[i] - grids[idx[k]][0])**2 + (df.lat.iloc[i] - grids[idx[k]][1])**2))
        pos_index = distance.index(min(distance))
        pos = idx[pos_index]
    # only one point found
    else:
        pos = idx[0]
    grids_em[pos] += df.so2[i]
4. Result
co2 = grids_em.reshape(20,20)
plt.pcolormesh(lon_x,lat_y,co2,cmap =plt.cm.Spectral_r,zorder=3)
http://i4.tietuku.com/6ded65c4ac301294.png
5. My question
Can someone point out drawbacks or errors of this method?
Are there algorithms better aligned with my target?
Thanks a lot!
There are many for-loops in your code; that's not the numpy way.
Make some sample data first:
import numpy as np
import pandas as pd
from scipy.spatial import KDTree
import pylab as pl
xc1, xc2, yc1, yc2 = 113.49805889531724, 115.5030664238035, 37.39995194888143, 38.789235929357105
N = 1000
GSIZE = 20
x, y = np.random.multivariate_normal([(xc1 + xc2)*0.5, (yc1 + yc2)*0.5], [[0.1, 0.02], [0.02, 0.1]], size=N).T
value = np.ones(N)
df_points = pd.DataFrame({"x":x, "y":y, "v":value})
For equally spaced grids you can use hist2d():
pl.hist2d(df_points.x, df_points.y, weights=df_points.v, bins=20, cmap="viridis");
Here is the output:
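If the binned sums are also needed as an array (not only as an image), np.histogram2d with weights gives the same result; a small sketch reusing the sample data above:

# Same binning as hist2d, but returning the GSIZE x GSIZE array of summed values.
H2, xe, ye = np.histogram2d(df_points.x, df_points.y,
                            bins=GSIZE, weights=df_points.v)
pl.pcolormesh(xe, ye, H2.T, cmap="viridis")   # transpose: histogram2d is (x, y) ordered
pl.colorbar()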
Here is the code using KDTree:
X, Y = np.mgrid[x.min():x.max():GSIZE*1j, y.min():y.max():GSIZE*1j]
grid = np.c_[X.ravel(), Y.ravel()]
points = np.c_[df_points.x, df_points.y]
tree = KDTree(grid)
dist, indices = tree.query(points)
grid_values = df_points.groupby(indices).v.sum()
df_grid = pd.DataFrame(grid, columns=["x", "y"])
df_grid["v"] = grid_values
fig, ax = pl.subplots(figsize=(10, 8))
ax.plot(df_points.x, df_points.y, "kx", alpha=0.2)
mapper = ax.scatter(df_grid.x, df_grid.y, c=df_grid.v,
                    cmap="viridis",
                    linewidths=0,
                    s=100, marker="o")
pl.colorbar(mapper, ax=ax);
The output is:
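If you want the same 20x20 array as grids_em in the question, the grouped sums can be reindexed over all grid cells and reshaped; a sketch building on the code above:

# Reindex over all GSIZE*GSIZE cells (empty cells become 0) and reshape to 2-D.
full = grid_values.reindex(range(GSIZE * GSIZE), fill_value=0)
grid_2d = full.values.reshape(GSIZE, GSIZE)   # rows follow the X.ravel()/Y.ravel() order
pl.pcolormesh(X, Y, grid_2d, cmap="viridis", shading="nearest")
pl.colorbar()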

RINEX plotting in Python

I'm trying to plot data and, in order to check my code, I'm comparing the resulting plots with plots that have already been generated with Matlab. I am encountering several issues with this:
Generally, the parsing of the RINEX files works, and the overall pattern of the data looks similar to what the Matlab scripts plot. However, there are small deviations in the data that should become apparent when zooming in, i.e. when using a smaller time series, for example plotting over a particular 2-hour period rather than 24 hours. In Matlab this small discrepancy can be seen and a polynomial fit applied. In the Python plot (the first plot below), however, the curve over this two-hour period appears "smooth" and does not deviate at all, unlike in the Matlab plot (the second plot shows the data as a blue line against the red polyfit line; the blue line shows a slight discrepancy at x = 9.4). The Matlab script is assumed correct, as this deviation is caused by seismic activity that temporarily disrupts the ionosphere. Please refer to the plots below.
The third plot is from Matlab and is simply the polyfit minus the live data.
Therefore it is not clear how this data is being plotted on the axes in the Python script, because the data appears too smooth, nor whether my code (see below) somehow "smooths" out the data:
#Calculating by looping through
for sv in range(32):
    sat = self.obs_data_chunks_dataframe[sv, :]
    #print "sat.index_{0}: {1}".format(sv+1, sat.index)
    phi1 = sat['L1'] * LAMBDA_1   #Change units of L1 to meters
    phi2 = sat['L2'] * LAMBDA_2   #Change units of L2 to meters
    pr1 = sat['P1']
    pr2 = sat['P2']

    #CALCULATION: teqc Calculation
    iono_teqc = COEFF * (pr2 - pr1) / 1000000   #divide to make values smaller (tbc)
    print "iono_teqc_{0}: {1}".format(sv+1, iono_teqc)

    #PLOTTING of the data
    plt.plot(sat.index, iono_teqc, label='teqc')
    plt.xlabel('Time (UTC)')
    plt.ylabel('Ionosphere Delay (meters)')
    plt.title("Ionosphere Delay on {0} for Satellite {1}.".format(self.date, sv+1))
    plt.legend()
    ax = plt.gca()
    ax.ticklabel_format(useOffset=False)
    plt.grid()
    if sys.platform.startswith('win'):
        plt.savefig(winpath + '\Figure_SV{0}'.format(sv+1))
    elif sys.platform.startswith('darwin'):
        plt.savefig(macpath + 'Figure_SV{0}'.format(sv+1))
    plt.close()
Following on from point 1, the polynomial fitting code below does not behave the way I'd like, so I'm overlooking something. I assume it has to do with the data used on the x and y axes, but I can't pinpoint exactly what. Would anyone know where I am going wrong here?
#Zoomed in plots
if sv == 19:
    #Plotting of the data
    plt.plot(sat.index, iono_teqc, label='teqc')   #sat.index to plot time in UTC
    plt.xlim(8, 10)
    plt.xlabel('Time (UTC)')
    plt.ylabel('Ionosphere Delay (meters)')
    plt.title("Ionosphere Delay on {0} for Satellite {1}.".format(self.date, sv+1))
    plt.legend()
    ax = plt.gca()
    ax.ticklabel_format(useOffset=False)
    plt.grid()

    #Polynomial fitting
    coefficients = np.polyfit(sat.index, iono_teqc, 2)
    plt.plot(coefficients)

    if sys.platform.startswith('win'):
        #os.path.join(winpath, 'Figure_SV{0}'.format(sv+1))
        plt.savefig(winpath + '\Zoom_SV{0}'.format(sv+1))
    elif sys.platform.startswith('darwin'):
        plt.savefig(macpath + 'Zoom_SV{0}'.format(sv+1))
    plt.close()
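(For reference, np.polyfit returns only the polynomial coefficients; to draw the fitted curve, the coefficients are typically evaluated at the x-values with np.polyval, roughly as in this sketch, which assumes sat.index and iono_teqc are plain numeric arrays without NaNs:)

# sketch: evaluate the fit before plotting it
coefficients = np.polyfit(sat.index, iono_teqc, 2)
fit = np.polyval(coefficients, sat.index)
plt.plot(sat.index, iono_teqc, label='teqc')
plt.plot(sat.index, fit, 'r', label='poly fit')
plt.plot(sat.index, fit - iono_teqc, label='fit minus data')
plt.legend()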
My RINEX file comprises 32 satellites. However, when trying to generate the plots for all 32, I receive:
IndexError: index 31 is out of bounds for axis 0 with size 31
Changing the range in the code above to 31 solves this partly, but it excludes the 32nd satellite, which I would also like to plot. The functions for parsing and formatting the data are given below:
def read_obs(self, RINEXfile, n_sat, sat_map):
    obs = np.empty((TOTAL_SATS, len(self.obs_types)), dtype=np.float64) * np.NaN
    lli = np.zeros((TOTAL_SATS, len(self.obs_types)), dtype=np.uint8)
    signal_strength = np.zeros((TOTAL_SATS, len(self.obs_types)), dtype=np.uint8)

    for i in range(n_sat):
        # Join together observations for a single satellite if split across lines.
        obs_line = ''.join(padline(RINEXfile.readline()[:-1], 16) for _ in range((len(self.obs_types) + 4) / 5))
        #obs_line = ''.join(padline(RINEXfile.readline()[:-1], 16) for _ in range(2))
        #while obs_line
        for j in range(len(self.obs_types)):
            obs_record = obs_line[16*j:16*(j+1)]
            obs[sat_map[i], j] = floatornan(obs_record[0:14])
            lli[sat_map[i], j] = digitorzero(obs_record[14:15])
            signal_strength[sat_map[i], j] = digitorzero(obs_record[15:16])

    return obs, lli, signal_strength

def read_data_chunk(self, RINEXfile, CHUNK_SIZE = 10000):
    obss = np.empty((CHUNK_SIZE, TOTAL_SATS, len(self.obs_types)), dtype=np.float64) * np.NaN
    llis = np.zeros((CHUNK_SIZE, TOTAL_SATS, len(self.obs_types)), dtype=np.uint8)
    signal_strengths = np.zeros((CHUNK_SIZE, TOTAL_SATS, len(self.obs_types)), dtype=np.uint8)
    epochs = np.zeros(CHUNK_SIZE, dtype='datetime64[us]')
    flags = np.zeros(CHUNK_SIZE, dtype=np.uint8)

    i = 0
    while True:
        hdr = self.read_epoch_header(RINEXfile)
        if hdr is None:
            break
        epoch_time, flags[i], sats = hdr
        #epochs[i] = np.datetime64(epoch_time)
        epochs[i] = epoch_time
        sat_map = np.ones(len(sats)) * -1
        for n, sat in enumerate(sats):
            if sat[0] == 'G':
                sat_map[n] = int(sat[1:]) - 1
        obss[i], llis[i], signal_strengths[i] = self.read_obs(RINEXfile, len(sats), sat_map)
        i += 1
        if i >= CHUNK_SIZE:
            break

    return obss[:i], llis[:i], signal_strengths[:i], epochs[:i], flags[:i]

def read_data(self, RINEXfile):
    obs_data_chunks = []
    while True:
        obss, _, _, epochs, _ = self.read_data_chunk(RINEXfile)
        epochs = epochs.astype(np.int64)
        epochs = np.divide(epochs, float(3600.000))
        if obss.shape[0] == 0:
            break
        obs_data_chunks.append(pd.Panel(
            np.rollaxis(obss, 1, 0),
            items=['G%02d' % d for d in range(1, 33)],
            major_axis=epochs,
            minor_axis=self.obs_types
        ).dropna(axis=0, how='all').dropna(axis=2, how='all'))
    self.obs_data_chunks_dataframe = obs_data_chunks[0]
Any suggestions?
Cheers, pymat.
I managed to solve question 1; it was an overlooked conversion issue in my calculation. The other two points remain open, however...

Referencing Data From a 2D Histogram

I have the following code that reads data from a CSV file and creates a 2D histogram:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
#Read in CSV data
filename = 'Complete_Storms_All_US_Only.csv'
df = pd.read_csv(filename)
min_85 = df.min85
min_37 = df.min37
verification = df.one_min_15
#Numbers
x = min_85
y = min_37
H = verification
#Estimate the 2D histogram
nbins = 33
H, xedges, yedges = np.histogram2d(x,y,bins=nbins)
#Rotate and flip H
H = np.rot90(H)
H = np.flipud(H)
#Mask zeros
Hmasked = np.ma.masked_where(H==0,H)
#Calculate Averages
avgarr = np.zeros((nbins, nbins))
xbins = np.digitize(x, xedges[1:-1])
ybins = np.digitize(y, yedges[1:-1])
for xb, yb, v in zip(xbins, ybins, verification):
    avgarr[yb, xb] += v
divisor = H.copy()
divisor[divisor==0.0] = np.nan
avgarr /= divisor
binavg = np.around((avgarr * 100), decimals=1)
binper = np.ma.array(binavg, mask=np.isnan(binavg))
#Plot 2D histogram using pcolor
fig1 = plt.figure()
plt.pcolormesh(xedges,yedges,binper)
plt.title('1 minute at +/- 0.15 degrees')
plt.xlabel('min 85 GHz PCT (K)')
plt.ylabel('min 37 GHz PCT (K)')
cbar = plt.colorbar()
cbar.ax.set_ylabel('Probability of CG Lightning (%)')
plt.show()
Each pixel in the histogram contains the probability of lightning for a given range of temperatures at two different frequencies (min_85 on the x axis and min_37 on the y axis). I am trying to look up the probability of lightning from the histogram for temperatures that vary from storm to storm: each storm has a min_85 and min_37 that correspond to a probability in the 2D histogram. I know there is a brute-force method using a ridiculous number of if statements, one for each pixel, but that is tedious and inefficient when it has to be repeated over multiple 2D histograms. Is there a more efficient way to reference the probability from the histogram for a given min_85 and min_37? I have a separate file with the min_85 and min_37 data for a large number of storms; I just need to assign the corresponding probability of lightning from the histogram to each one.
It sounds like all you need to do is turn the min_85 and min_37 values into indices. Something like this will work:
# min85data and min37data from your file
dx = xedges[1] - xedges[0]
dy = yedges[1] - yedges[0]
min85inds = np.floor((min85data - xedges[0]) / dx).astype(int)   # x-axis bins
min37inds = np.floor((min37data - yedges[0]) / dy).astype(int)   # y-axis bins
# Pretend you didn't do all that flipping of H, or make a copy of it first
hvals = h_orig[min85inds, min37inds]
But do make sure that the resulting indices are valid before you extract them.
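A variant sketch using np.digitize against the bin edges from the question (same assumption of per-storm arrays min85data and min37data); the clip keeps out-of-range temperatures in the edge bins:

# Map each storm's (min_85, min_37) pair to its histogram bin and look up the
# averaged probability. binavg is indexed [y, x], as built in the question's loop.
xb = np.clip(np.digitize(min85data, xedges) - 1, 0, nbins - 1)
yb = np.clip(np.digitize(min37data, yedges) - 1, 0, nbins - 1)
storm_probs = binavg[yb, xb]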
