Using Polyfit with Lists - python

I am trying to construct a function with np.polyfit() to extrapolate data to my needs. I have some temperature and pressure observations which I have plotted. I need to fit a best-fit curve to the observations so that I can extrapolate them (get the temperature at each pressure level for a different surface temperature, i.e. the temperature at the last pressure level, assuming that the shape of the fit remains constant). This is what I have done so far:
import pandas as pd
import glob
import numpy as np
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
from datetime import datetime
dfs = []
pressure = []
temp = []
#reading the data
for fname in glob.glob('/home/swadhin/project/radiosonde_data/in/pb/*.txt'):
    df = pd.read_csv(fname, skiprows=1, delimiter='\s+',
                     names=['LVLpTYP', 'ETIME', 'PRESSURE', 'GPH', 'TEMP', 'RH', 'DPDP', 'WDIR', 'WSPD'])
    p1 = df['PRESSURE'].to_numpy(dtype=np.float64)
    t1 = df['TEMP'].to_numpy(dtype=np.float64)
    pressure.append(p1)
    temp.append(t1)
    dfs.append(df)
p = []
for i in pressure:
    a = np.ma.masked_equal(i, -9999.)  # masking the fill values
    p.append(a)
p = [i/100 for i in p]  # converting the pressure to hPa
t = []
for j in temp:
    b = np.ma.masked_equal(j, -9999.)
    c = np.ma.masked_equal(b, -8888.)
    t.append(c)
t = [i/10 for i in t]  # converting the temp to the appropriate unit
zipped = zip(p, t)
z_c = [np.polyfit(x,y,2) for x,y in zipped]
p_array = np.linspace(1000,0, num = 101)
for i in range(len(p)):
    x = p[i]
    y = t[i]
    z = z_c[i]
    xp = p_array[i]
    p = np.poly1d(z)
    plt.subplot(1, 2, i+1)
    plt.plot(y, x, '.', p(xp), xp, '-');
    plt.gca().invert_yaxis()
But I am not getting any plots.
Earlier, to plot the pressure and temperature from the observations, I did this and got the following plot:
for i in range(len(p)):
    plt.plot(t[i], p[i])
plt.gca().invert_yaxis()
plt.ylim(bottom = 1010)
plt.ylabel('Pressure(hPa)')
plt.xlabel('Temperature($^\circ$C)')
The pressure and temperature arrays have an inhomogeneous structure.
I am attaching the datafiles for reference:
Pressure data = https://drive.google.com/file/d/13e7u8iBZWvmHAj0yt9MXEtB1eniI9xOR/view?usp=sharing
Temp data = https://drive.google.com/file/d/13dysYQlutg0_a9aJnm2U3_lCecxSObFN/view?usp=sharing
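A likely reason nothing is drawn is that the loop above reassigns the list p to np.poly1d(z) on its first pass, and that xp = p_array[i] picks a single pressure value rather than the whole grid. A minimal sketch of the plotting loop under that assumption (p_grid and poly are new names, not from the original code):
p_grid = np.linspace(1000, 0, num=101)   # evaluation grid in hPa
for i in range(len(p)):
    poly = np.poly1d(z_c[i])             # 2nd-order fit: temperature as a function of pressure
    plt.plot(t[i], p[i], '.')            # observed profile
    plt.plot(poly(p_grid), p_grid, '-')  # fitted curve evaluated on the grid
plt.gca().invert_yaxis()
plt.ylabel('Pressure(hPa)')
plt.xlabel('Temperature($^\circ$C)')
plt.show()
Note that np.polyfit does not honour masks, so the masked fill values may still leak into the fit; np.ma.polyfit, or compressing the masked points before fitting, would avoid that.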

Related

Using scipy.optimize.curve_fit with more than one input data and p0= two variables

I am using scipy.optimize.curve_fit to fit measured (test in the code) data to theoretical (run in the code) data. Attached are two codes. In the first one I have one measured and one theoretical dataset. When I use scipy.optimize.curve_fit I get approximately the correct temperature. The problem comes when I need to extend scipy.optimize.curve_fit to more than one measured and theoretical dataset. The second code is my progress so far. How do I deal with two input datasets, i.e., what do I replace the x- and y-data with? For example, do I need to combine the data in some manner? I have tried a few ways without success. Any help would be appreciated.
import pandas as pd
import numpy as np
import scipy.optimize
from scipy import interpolate

# mr data: wave, counts, temp
run_1 = pd.read_excel("run_1.xlsx")
run_1_temp = np.array(run_1['temp'])
run_1_counts = np.array(run_1['count'])

# test data: wave, counts, temp = 30
test_1 = pd.read_excel("test_1.xlsx")
xdata = test_1['wave']
ydata = test_1['counts']

# Interpolate counts as a function of temperature
inter_run_1 = interpolate.interp1d(run_1_temp, run_1_counts, kind='linear', fill_value='extrapolate')
run_1_temp_new = np.arange(20, 50, 0.1)
run_1_count_new = inter_run_1(run_1_temp_new)

# Curve fit
def f(wave, temp):
    signal = inter_run_1(temp)
    return signal

popt, pcov = scipy.optimize.curve_fit(f, xdata, ydata, p0=[30])
print(popt, pcov)
import pandas as pd
import numpy as np
import scipy.optimize
from scipy import interpolate

# mr data: wave, counts, temp
run_1 = pd.read_excel("run_1.xlsx")
run_2 = pd.read_excel("run_2.xlsx")

# test data 1: wave, counts, temp = 30
test_1 = pd.read_excel("test_1.xlsx")
xdata = test_1['wave']
ydata = test_1['counts']

# test data 2: wave, counts, temp = 40
test_2 = pd.read_excel("test_2.xlsx")
x1data = test_2['wave']
y1data = test_2['counts']

run_1_temp = np.array(run_1['temp'])
run_1_counts = np.array(run_1['count'])
run_2_temp = np.array(run_2['temp'])
run_2_counts = np.array(run_2['count'])

# Interpolate
inter_run_1 = interpolate.interp1d(run_1_temp, run_1_counts, kind='linear', fill_value='extrapolate')
run_1_temp_new = np.arange(20, 50, 0.1)
run_1_count_new = inter_run_1(run_1_temp_new)

inter_run_2 = interpolate.interp1d(run_2_temp, run_2_counts, kind='linear', fill_value='extrapolate')
run_2_temp_new = np.arange(20, 50, 0.1)
run_2_count_new = inter_run_2(run_2_temp_new)

def f(wave, temp1, temp2):
    signal_1 = inter_run_1(temp1)
    signal_2 = inter_run_2(temp2)
    signal = signal_1 + signal_2
    return signal

popt, pcov = scipy.optimize.curve_fit(f, xdata, ydata, p0=[30, 50])
print(popt, pcov)
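One common pattern for fitting two datasets with a single curve_fit call (not shown in the post, so treat it as a sketch) is to concatenate the x- and y-arrays and split them back inside the model function, so that one shared parameter vector describes both datasets. A minimal self-contained example with synthetic linear data standing in for the test/run data:
import numpy as np
import scipy.optimize

# synthetic stand-ins for the two measured datasets (e.g. test_1 and test_2)
x_a = np.linspace(0, 10, 50)
x_b = np.linspace(0, 10, 50)
y_a = 2.0 * x_a + 1.0
y_b = 5.0 * x_b - 3.0

def f_combined(x_all, p1, p2, p3, p4):
    # x_all is both x arrays stacked end to end; split them back so each
    # dataset gets its own model while sharing one parameter vector
    xa = x_all[:x_a.size]
    xb = x_all[x_a.size:]
    return np.concatenate([p1 * xa + p2, p3 * xb + p4])

x_all = np.concatenate([x_a, x_b])
y_all = np.concatenate([y_a, y_b])
popt, pcov = scipy.optimize.curve_fit(f_combined, x_all, y_all, p0=[1, 0, 1, 0])
print(popt)  # recovers approximately [2, 1, 5, -3]
In the question's setup, the per-dataset models would be the inter_run_1/inter_run_2 lookups instead of the linear expressions.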

Having some problem understanding the x_bins in Seaborn's regplot

I used seaborn.regplot to plot data, but I do not quite understand how the error bars in regplot are calculated. I have compared the results with the mean and standard deviation derived from a manual calculation. Here is my testing script.
import numpy as np
import pandas as pd
import seaborn as sns
def get_data_XYE(p):
    x_list = []
    lower_list = []
    upper_list = []
    for line in p.lines:
        x_list.append(line.get_xdata()[0])
        lower_list.append(line.get_ydata()[0])
        upper_list.append(line.get_ydata()[1])
    y = 0.5 * (np.asarray(lower_list) + np.asarray(upper_list))
    y_error = np.asarray(upper_list) - y
    x = np.asarray(x_list)
    return x, y, y_error
x = [37.3448,36.6026,42.7795,34.7072,75.4027,226.2615,192.7984,140.8045,242.9952,458.451,640.6542,726.1024,231.7347,107.5605,200.2254,190.0006,314.1349,146.8131,152.4497,175.9096,284.9926,116.9681,118.2953,312.3787,815.8389,458.0146,409.5797,595.5373,188.9955,15.7716,36.1839,244.8689,57.4579,94.8717,112.2237,87.0687,72.79,22.3457,24.1728,29.505,80.8765,252.7454,280.6002,252.9573,348.246,112.705,98.7545,317.0541,300.9573,402.8411,406.6884,56.1286,30.1385,32.9909,497.556,19.3606,20.8409,95.2324,108.6074,15.7753,54.5511,45.5623,64.564,101.1934,81.8459,88.286,58.2642,56.1225,51.2943,38.0649,63.5882,63.6847,120.495,102.4097,49.3255,111.3309,171.6028,58.9526,28.7698,144.6884,180.0661,116.6028,146.2594,199.8702,128.9378,423.2363,119.8537,124.6508,518.8625,306.3023,79.5213,121.0309,116.9346,170.8863,930.361,48.9983,55.039,47.1092,72.0548,75.4045,103.521,83.4134,142.3253,146.6215,121.4467,101.4252,68.4812,291.4275,143.9475,142.647,78.9826,47.094,204.2196,89.0208,82.792,27.1346,142.4764,83.7874,67.3216,112.9531,138.2549,133.3446,86.2659,45.3464,56.1604,43.5882,54.3623,86.296,115.7272,96.5498,111.8081,36.1756,40.2947,34.2532,89.1452,53.9062,36.458,113.9297,176.9962,77.3125,77.8891,64.807,64.1515,127.7242,119.6876,976.2324,322.8454,434.2883,168.6923,250.0284,234.7329,131.0793,152.335,118.8838,243.1772,24.1776,168.6327,170.7541,167.8444,75.9315,110.1045,113.4417,60.5464,66.8956,79.7606,71.6659,72.5251,77.513,207.8019,21.8592,35.2787,169.7698,146.5012,412.9934,248.0708,318.5489,104.1278,184.7592,108.0581,175.2646,169.7698,340.3732,570.3396,23.9853,69.0405,66.7391,67.9435,294.6085,68.0537,77.6344,433.2713,104.3178,229.4615,187.8587,78.1399,121.4737,122.5451,384.5935,38.5232,117.6835,50.3308,318.2513,103.6695,20.7181,321.9601,510.3248,13.4754,16.1188,44.8082,37.7291,733.4587,446.6241,21.1822,287.9603,327.2367,274.1109,195.4713,158.2114,64.4537,26.9857,172.8503]
y = [37,40,30,29,24,23,27,12,21,20,29,28,27,32,23,29,28,22,28,23,24,29,32,18,22,12,12,14,29,31,34,31,22,40,25,36,27,27,29,35,33,25,25,27,27,19,35,26,18,24,25,37,52,47,34,39,40,48,41,44,35,36,53,46,38,44,23,26,26,28,27,21,25,21,20,27,35,24,46,34,22,30,30,30,31,26,25,28,21,31,24,27,33,21,31,33,29,33,32,21,25,22,39,31,34,26,23,18,20,18,34,25,20,12,23,25,21,21,25,31,17,27,28,29,25,24,25,21,24,27,23,22,23,22,22,26,22,19,26,35,33,35,29,26,26,30,22,32,33,33,28,32,26,29,36,37,37,28,24,30,25,20,29,24,33,35,30,32,31,33,40,35,37,24,34,29,27,24,36,26,26,26,27,27,20,17,28,34,18,20,20,18,19,23,20,22,25,32,44,41,39,41,40,44,36,42,31,32,26,29,23,29,29,28,31,22,29,24,28,28,25]
xbreaks = [13.4754, 27.1346, 43.5882, 58.9526, 72.79, 89.1452, 110.1045, 131.0793, 158.2114, 180.0661, 207.8019, 234.7329, 252.9573, 300.9573, 327.2367, 348.246, 412.9934, 434.2883, 458.451, 518.8625, 595.5373, 640.6542, 733.4587, 815.8389, 930.361, 976.2324]
df = pd.DataFrame([x,y]).T
df.columns = ['x','y']
# Check the bin average and std using agg
bins = pd.cut(df.x,xbreaks,right=False)
t = df[['x','y']].groupby(bins).agg({"x": "mean", "y": ["mean","std"]})
t.reset_index(inplace=True)
t.columns = ['range_cut','x_avg_cut','y_avg_cut','y_std_cut']
t.index.name ='id'
# Get the bin averages and error bars from regplot
seed = 0  # fixed seed for regplot's bootstrap resampling
g = sns.regplot(x='x', y='y', data=df, fit_reg=False, x_bins=xbreaks, seed=seed)
xye = pd.DataFrame(get_data_XYE(g)).T
xye.columns = ['x_regplot','y_regplot','e_regplot']
xye.index.name = 'id'
t2 = xye.merge(t,on='id',how='left')
t2
You can see that the y and e from the two ways are different. I understand that the default x_ci or x_estimator may affect the result of regplot, but I still cannot reproduce these values in Excel by removing some of the lowest and/or highest values in each bin.
In seaborn.regplot, the x_bins values are the centers of the bins, and the original x values are assigned to the nearest bin center, whereas in pandas.cut the breaks define the bin edges.
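To see the binning difference concretely, here is a small sketch (reusing df, xbreaks and the imports from the question) that groups each x by the nearest value in xbreaks, treating them as centers the way x_bins does rather than as edges the way pandas.cut does:
centers = np.asarray(xbreaks)
# index of the nearest center for every x value
nearest_idx = np.abs(df['x'].to_numpy()[:, None] - centers[None, :]).argmin(axis=1)
t_nearest = (df.assign(center=centers[nearest_idx])
               .groupby('center')['y']
               .agg(['mean', 'std']))
print(t_nearest.head())
Even with matching bins, the error bars drawn by regplot are confidence intervals of the bin estimator (controlled by x_ci and x_estimator), not raw standard deviations, so the std column is not expected to match e_regplot exactly.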

Adding a 45 degree line to a time series stock data plot

I guess this is supposed to be simple, but I can't seem to make it work.
I have some stock data
import pandas as pd
import numpy as np
df = pd.DataFrame(index=pd.date_range(start = "06/01/2018", end = "08/01/2018"),
data = np.random.rand(62)*100)
I am doing some analysis on it, which results in my drawing some lines on the graph. And I want to plot a 45 degree line somewhere on the graph as a reference for the lines I drew.
What I have tried is
x = df.tail(len(df)/20).index
x = x.reset_index()
x_first_val = df.loc[x.loc[0].date].adj_close
In order to get some point and then use slope = 1 to calculate y values... but this all sounds wrong.
Any ideas?
Here is a possibility:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.DataFrame(index=pd.date_range(start="06/01/2018", end="08/01/2018"),
                  data=np.random.rand(62)*100,
                  columns=['data'])
# Get values for the time:
index_range = df.index[('2018-06-18' < df.index) & (df.index < '2018-07-21')]
# get the timestamps in nanoseconds (since epoch)
timestamps_ns = index_range.astype(np.int64)
# convert it to a relative number of days (for example, could be seconds)
time_day = (timestamps_ns - timestamps_ns[0]) / 1e9 / 60 / 60 / 24
# Define y-data for a line:
slope = 3 # unit: "something" per day
something = time_day * slope
trendline = pd.Series(something, index=index_range)
# Graph:
df.plot(label='data', alpha=0.8)
trendline.plot(label='some trend')
plt.legend(); plt.ylabel('something');
which gives a plot of the random data with the trend line overlaid.
Edit - first answer, using dayofyear instead of the timestamps:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.DataFrame(index=pd.date_range(start="06/01/2018", end="08/01/2018"),
                  data=np.random.rand(62)*100,
                  columns=['data'])
# Define data for a line:
slope = 3 # unit: "something" per day
index_range = df.index[('2018-06-18' < df.index) & (df.index < '2018-07-21')]
dayofyear = index_range.dayofyear # it will not work around the new year...
dayofyear = dayofyear - dayofyear[0]
something = dayofyear * slope
trendline = pd.Series(something, index=index_range)
# Graph:
df.plot(label='data', alpha=0.8)
trendline.plot(label='some trend')
plt.legend(); plt.ylabel('something');

Converting relative time from CSV file into absolute time

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import Rbf, InterpolatedUnivariateSpline
data = np.genfromtxt('FTIR Data.csv', skip_header=1, delimiter=',', usecols=(1,2,3), names=['Time','Peakat2188cm1', 'water'] )
x=data['Time']
y1=data['Peakat2188cm1']
y2=data['water']
fig=plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twinx()
ius=InterpolatedUnivariateSpline
xs = np.linspace(x.min(), x.max(), 100)
s1=ius(x,y1)
s2=ius(x,y2)
ys1 = s1(xs)
ys2 = s2(xs)
ax2.plot(xs,ys1)
ax2.plot(xs,ys2)
ax1.set_ylabel('Peak at 2188 cm-1')
ax2.set_ylabel('water')
ax1.set_xlabel('RT (mins)')
plt.title('RT Vs Conc')
This is my code for reading data from a CSV file, which is export data from my instrument. In the Excel file, I manually converted the relative time into time in minutes and got the right plot. But I want to convert the relative time directly in matplotlib when reading the relative time column of the CSV file. I have tried different examples but couldn't get through. I am very new to Python, so can anyone please help with editing my code? My actual data is in the following format (this code is used to plot absolute time, i.e. Time, which I already converted in Excel before plotting in matplotlib):
Relative Time,Peak at 2188 cm-1,water
00:00:51,0.572157,0.179023
00:02:51,0.520037,0.171217
00:04:51,0.551843,0.221285
00:06:50,0.566279,0.209182
00:09:26,0.022696,0.0161351
00:10:51,-0.00344509,0.0141303
00:12:51,0.555898,0.21082
00:14:51,0.519753,0.179563
00:16:51,0.503512,0.150133
00:18:51,0.498554,0.154512
00:20:51,0.00128343,-0.0129148
00:22:51,0.349077,0.0414234
00:24:50,0.360565,0.0522027
00:26:51,0.403705,0.0667703
At the moment, the Time column is still a string. You will have to convert this to minutes in some way.
pandas.to_timedelta
import pandas as pd
column_names = ['Time','Peakat2188cm1', 'water']
df_orig = pd.read_csv(filename, sep=',')
df_orig.columns = column_names
time_in_minutes = pd.to_timedelta(df_orig['Time']).dt.total_seconds() / 60
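As a quick sanity check on the first few sample times from the question (this snippet is just an illustration):
import pandas as pd

sample = pd.Series(['00:00:51', '00:02:51', '00:04:51'])
print(pd.to_timedelta(sample).dt.total_seconds() / 60)
# 0    0.85
# 1    2.85
# 2    4.85
# dtype: float64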
semi-manually
time_in_minutes = [sum(int(x) * 60**i for i, x in enumerate(reversed(t.split(':')), -1)) for t in data['Time']]
explanation
This is the same as:
time_in_minutes = []
for t in data['Time']:
    minutes = 0
    # t = '00:00:51'
    h_m_s = t.split(':')
    # h_m_s = ['00', '00', '51']
    s_m_h = list(enumerate(reversed(h_m_s), -1))
    # s_m_h = [(-1, '51'), (0, '00'), (1, '00')]
    for i, x in s_m_h:
        # i = -1
        # x = '51'
        minutes += int(x) * 60 ** i
        # minutes = 0.85
    time_in_minutes.append(minutes)
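And a minimal end-to-end sketch of the pandas route, assuming the file has exactly the three columns shown in the sample data (the spline and twinx parts mirror the question's code):
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import InterpolatedUnivariateSpline

df = pd.read_csv('FTIR Data.csv')
df.columns = ['Time', 'Peakat2188cm1', 'water']
x = (pd.to_timedelta(df['Time']).dt.total_seconds() / 60).to_numpy()  # minutes
xs = np.linspace(x.min(), x.max(), 100)

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.plot(xs, InterpolatedUnivariateSpline(x, df['Peakat2188cm1'])(xs))
ax2.plot(xs, InterpolatedUnivariateSpline(x, df['water'])(xs), color='tab:orange')
ax1.set_xlabel('RT (mins)')
ax1.set_ylabel('Peak at 2188 cm-1')
ax2.set_ylabel('water')
plt.title('RT Vs Conc')
plt.show()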

Covariance/heat flux in Python

I'm looking to compute poleward heat fluxes at a level in the atmosphere, i.e. the mean of (u't'). I'm aware of the covariance function in NumPy, but cannot seem to implement it. Here is my code below.
from netCDF4 import Dataset
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
myfile = '/home/ubuntu/Fluxes_Test/out.nc'
Import = Dataset(myfile, mode='r')
lon = Import.variables['lon'][:] # Longitude
lat = Import.variables['lat'][:] # Latitude
time = Import.variables['time'][:] # Time
lev = Import.variables['lev'][:] # Level
wind = Import.variables['ua'][:]
temp = Import.variables['ta'][:]
lon = lon-180 # to shift co-ordinates to -180 to 180.
variable1 = np.squeeze(wind,temp, axis=0)
variable2 = np.cov(variable1)
m = Basemap(resolution='l')
lons, lats = np.meshgrid(lon,lat)
X, Y = m(lons, lats)
cs = m.pcolor(X,Y, variable2)
plt.show()
The shapes of the variables wind and temp, whose flux (the covariance) I am trying to compute, are both (3960, 64, 128), i.e. 3960 pieces of data on a 64x128 grid (with co-ordinates).
I tried squeezing both variables to produce an array of (3960, 3960, 64, 128) so cov could work on the first two series of data (the two 3960s) of wind and temp, but this didn't work.
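For what it's worth, a sketch of one way to get the time mean of u't' at each grid point without np.cov, assuming wind and temp are (time, lat, lon) arrays of shape (3960, 64, 128) as described, and reusing m, X and Y from the code above:
u_prime = wind - wind.mean(axis=0)            # departures from the time mean
t_prime = temp - temp.mean(axis=0)
heat_flux = (u_prime * t_prime).mean(axis=0)  # shape (64, 128): time-mean u't' per grid point
cs = m.pcolor(X, Y, heat_flux)
plt.show()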
