Using Polyfit with Lists - python
I am trying to build a function around np.polyfit() to extrapolate data. I have plotted some temperature and pressure observations, and I need to fit a best-fit curve to them so that I can extrapolate: that is, get the temperature at each pressure level for a different surface temperature (the temperature at the last pressure level), assuming the shape of the fit stays the same. This is what I have done so far:
import pandas as pd
import glob
import numpy as np
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
from datetime import datetime
dfs = []
pressure = []
temp = []
#reading the data
for fname in glob.glob('/home/swadhin/project/radiosonde_data/in/pb/*.txt'):
    df = pd.read_csv(fname, skiprows=1, delimiter=r'\s+',
                     names=['LVLpTYP', 'ETIME', 'PRESSURE','GPH','TEMP','RH','DPDP','WDIR','WSPD'])
    p1 = df['PRESSURE'].to_numpy(dtype = np.float64)
    t1 = df['TEMP'].to_numpy(dtype = np.float64)
    pressure.append(p1)
    temp.append(t1)
    dfs.append(df)
p =[]
for i in pressure:
    a = np.ma.masked_equal(i,-9999.) #masking the fill_values
    p.append(a)
p = [i/100 for i in p] #converting the pressure to hPa
t =[]
for j in temp:
    b = np.ma.masked_equal(j,-9999.)
    c = np.ma.masked_equal(b,-8888.)
    t.append(c)
t = [i/10 for i in t] #converting the temp to the appropriate unit
zipped = zip(p, t)
z_c = [np.polyfit(x,y,2) for x,y in zipped]
p_array = np.linspace(1000,0, num = 101)
for i in range(len(p)):
    x = p[i]
    y = t[i]
    z = z_c[i]
    xp = p_array[i]
    p = np.poly1d(z)
    plt.subplot(1,2,i+1)
    plt.plot(y, x, '.', p(xp),xp, '-');
    plt.gca().invert_yaxis()
But I am not getting any plots.
Earlier, to plot the pressure and temperature from the observations, I did this and got the following plot:
for i in range(len(p)):
    plt.plot(t[i],p[i])
plt.gca().invert_yaxis()
plt.ylim(bottom = 1010)
plt.ylabel('Pressure(hPa)')
plt.xlabel('Temperature($^\circ$C)')
The pressure and temperature arrays have an inhomogeneous structure.
I am attaching the datafiles for reference:
Pressure data = https://drive.google.com/file/d/13e7u8iBZWvmHAj0yt9MXEtB1eniI9xOR/view?usp=sharing
Temp data = https://drive.google.com/file/d/13dysYQlutg0_a9aJnm2U3_lCecxSObFN/view?usp=sharing
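A few things in the fitting loop above look likely to get in the way, though this is a reading of the posted code rather than a tested diagnosis: np.polyfit does not appear to skip masked values (np.ma.polyfit does), `p = np.poly1d(z)` rebinds the list `p` on the first pass, and `p_array[i]` picks out a single pressure value instead of the whole grid. Below is a minimal sketch of the fit and of the "same shape, different surface temperature" step. The helper names `fit_profile` and `shift_to_surface` and the synthetic sounding values are hypothetical, not taken from the data files.

```python
import numpy as np

def fit_profile(pressure_hpa, temp_c, degree=2):
    """Fit temperature as a polynomial in pressure, skipping masked points."""
    coeffs = np.ma.polyfit(pressure_hpa, temp_c, degree)
    return np.poly1d(coeffs)

def shift_to_surface(profile, p_grid, p_surface, new_surface_temp):
    """Evaluate the fit on p_grid with a constant offset so that the curve
    passes through new_surface_temp at p_surface (shape unchanged)."""
    offset = new_surface_temp - profile(p_surface)
    return profile(p_grid) + offset

# Synthetic sounding (hypothetical values) with one fill value masked out.
pressure = np.ma.masked_equal(
    np.array([1000., 850., 700., 500., -9999., 300.]), -9999.)
temp = np.ma.masked_equal(
    np.array([25., 15., 5., -10., -9999., -35.]), -9999.)

profile = fit_profile(pressure, temp)   # distinct name, so the list p survives
p_grid = np.linspace(1000, 100, 10)     # the whole grid, not one element
shifted = shift_to_surface(profile, p_grid, p_surface=1000.,
                           new_surface_temp=30.)
```

To plot one sounding, something like `plt.plot(temp, pressure, '.', profile(p_grid), p_grid, '-')` followed by `plt.gca().invert_yaxis()` should then draw both the observations and the fitted curve.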
Related
Using scipy.optimize.curve_fit with more than one input data and p0= two variables
I am using scipy.optimize.curve_fit to fit measured data ("test" in the code) to theoretical data ("run" in the code). Two scripts are attached. In the first, I have one measured and one theoretical data set, and scipy.optimize.curve_fit gives approximately the correct temperature. The problem comes when I need to extend scipy.optimize.curve_fit to more than one measured and theoretical data set. The second script is my progress so far. How do I deal with two input data sets, i.e., what do I replace the x- and y-data with? For example, do I need to combine the data in some manner? I have tried a few ways without success. Any help would be appreciated.
First script:
import pandas as pd
import numpy as np
import scipy.optimize
from scipy import interpolate
# mr data: wave, counts, temp
run_1 = pd.read_excel("run_1.xlsx")
run_1_temp = np.array(run_1['temp'])
run_1_counts = np.array(run_1['count'])
# test data: wave, counts, temp = 30
test_1 = pd.read_excel("test_1.xlsx")
xdata = test_1['wave']
ydata = test_1['counts']
# Interpolate
inter_run_1 = interpolate.interp1d(run_1_temp,run_1_counts, kind='linear', fill_value='extrapolate')
run_1_temp_new = np.arange(20,50,0.1)
run_1_count_new = inter_run_1(run_1_temp_new)
# Curve-fit
def f(wave, temp):
    signal = inter_run_1(temp)
    return signal
popt, pcov = scipy.optimize.curve_fit(f,xdata,ydata,p0=[30])
print(popt, pcov)
Second script:
import pandas as pd
import numpy as np
import scipy.optimize
from scipy import interpolate
# mr data: wave, counts, temp
run_1 = pd.read_excel("run_1.xlsx")
run_2 = pd.read_excel("run_2.xlsx")
# test data 1: wave, counts, temp = 30
test_1 = pd.read_excel("test_1.xlsx")
xdata = test_1['wave']
ydata = test_1['counts']
# test data 2: wave, counts, temp = 40
test_2 = pd.read_excel("test_2.xlsx")
x1data = test_2['wave']
y1data = test_2['counts']
run_1_temp = np.array(run_1['temp'])
run_1_counts = np.array(run_1['count'])
run_2_temp = np.array(run_2['temp'])
run_2_counts = np.array(run_2['count'])
# Interpolate
inter_run_1 = interpolate.interp1d(run_1_temp,run_1_counts, kind='linear', fill_value='extrapolate')
run_1_temp_new = np.arange(20,50,0.1)
run_1_count_new = inter_run_1(run_1_temp_new)
inter_run_2 = interpolate.interp1d(run_2_temp,run_2_counts, kind='linear', fill_value='extrapolate')
run_2_temp_new = np.arange(20,50,0.1)
run_2_count_new = inter_run_2(run_2_temp_new)
def f(wave,temp1,temp2):
    signal_1 = inter_run_1(temp1)
    signal_2 = inter_run_2(temp2)
    signal = signal_1 + signal_2
    return signal
popt, pcov = scipy.optimize.curve_fit(f,xdata,ydata,p0=[30,50])
print(popt, pcov)
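One common way to fit several data sets at once with curve_fit is to concatenate the x-data and y-data and have the model return the concatenated predictions, so that every parameter is fitted against all the measurements together. The sketch below illustrates that idea only. `model_1`, `model_2`, and the synthetic data are hypothetical stand-ins for the interpolated run data in the question, and it assumes each data set depends on its own temperature parameter.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical per-dataset models (stand-ins for the interpolated run data).
def model_1(wave, temp1):
    return temp1 * wave

def model_2(wave, temp2):
    return temp2 * wave + 1.0

# Synthetic "measured" data generated with temp1 = 30 and temp2 = 40.
wave_1 = np.linspace(1.0, 10.0, 20)
wave_2 = np.linspace(1.0, 10.0, 30)
counts_1 = model_1(wave_1, 30.0)
counts_2 = model_2(wave_2, 40.0)

n1 = wave_1.size  # where the first data set ends in the stacked arrays

def combined(wave_all, temp1, temp2):
    """Split the stacked x-data, evaluate each model, and re-stack."""
    w1, w2 = wave_all[:n1], wave_all[n1:]
    return np.concatenate([model_1(w1, temp1), model_2(w2, temp2)])

x_all = np.concatenate([wave_1, wave_2])
y_all = np.concatenate([counts_1, counts_2])

popt, pcov = curve_fit(combined, x_all, y_all, p0=[25.0, 45.0])
```

With this layout `popt` holds one fitted value per parameter, in the order they appear in the signature of `combined`.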
Having some problem to understand the x_bin in regplot of Seaborn
I used the seaborn.regplot to plot data, but not quite understand how the error bar in regplot was calculated. I have compared the results with the mean and standard deviation derived from mannual calculation. Here is my testing script. import numpy as np import pandas as pd import seaborn as sn def get_data_XYE(p): x_list = [] lower_list = [] upper_list = [] for line in p.lines: x_list.append(line.get_xdata()[0]) lower_list.append(line.get_ydata()[0]) upper_list.append(line.get_ydata()[1]) y = 0.5 * (np.asarray(lower_list) + np.asarray(upper_list)) y_error = np.asarray(upper_list) - y x = np.asarray(x_list) return x, y, y_error x = [37.3448,36.6026,42.7795,34.7072,75.4027,226.2615,192.7984,140.8045,242.9952,458.451,640.6542,726.1024,231.7347,107.5605,200.2254,190.0006,314.1349,146.8131,152.4497,175.9096,284.9926,116.9681,118.2953,312.3787,815.8389,458.0146,409.5797,595.5373,188.9955,15.7716,36.1839,244.8689,57.4579,94.8717,112.2237,87.0687,72.79,22.3457,24.1728,29.505,80.8765,252.7454,280.6002,252.9573,348.246,112.705,98.7545,317.0541,300.9573,402.8411,406.6884,56.1286,30.1385,32.9909,497.556,19.3606,20.8409,95.2324,108.6074,15.7753,54.5511,45.5623,64.564,101.1934,81.8459,88.286,58.2642,56.1225,51.2943,38.0649,63.5882,63.6847,120.495,102.4097,49.3255,111.3309,171.6028,58.9526,28.7698,144.6884,180.0661,116.6028,146.2594,199.8702,128.9378,423.2363,119.8537,124.6508,518.8625,306.3023,79.5213,121.0309,116.9346,170.8863,930.361,48.9983,55.039,47.1092,72.0548,75.4045,103.521,83.4134,142.3253,146.6215,121.4467,101.4252,68.4812,291.4275,143.9475,142.647,78.9826,47.094,204.2196,89.0208,82.792,27.1346,142.4764,83.7874,67.3216,112.9531,138.2549,133.3446,86.2659,45.3464,56.1604,43.5882,54.3623,86.296,115.7272,96.5498,111.8081,36.1756,40.2947,34.2532,89.1452,53.9062,36.458,113.9297,176.9962,77.3125,77.8891,64.807,64.1515,127.7242,119.6876,976.2324,322.8454,434.2883,168.6923,250.0284,234.7329,131.0793,152.335,118.8838,243.1772,24.1776,168.6327,170.7541,167.8444,75.9315,110.1045,
113.4417,60.5464,66.8956,79.7606,71.6659,72.5251,77.513,207.8019,21.8592,35.2787,169.7698,146.5012,412.9934,248.0708,318.5489,104.1278,184.7592,108.0581,175.2646,169.7698,340.3732,570.3396,23.9853,69.0405,66.7391,67.9435,294.6085,68.0537,77.6344,433.2713,104.3178,229.4615,187.8587,78.1399,121.4737,122.5451,384.5935,38.5232,117.6835,50.3308,318.2513,103.6695,20.7181,321.9601,510.3248,13.4754,16.1188,44.8082,37.7291,733.4587,446.6241,21.1822,287.9603,327.2367,274.1109,195.4713,158.2114,64.4537,26.9857,172.8503] y = [37,40,30,29,24,23,27,12,21,20,29,28,27,32,23,29,28,22,28,23,24,29,32,18,22,12,12,14,29,31,34,31,22,40,25,36,27,27,29,35,33,25,25,27,27,19,35,26,18,24,25,37,52,47,34,39,40,48,41,44,35,36,53,46,38,44,23,26,26,28,27,21,25,21,20,27,35,24,46,34,22,30,30,30,31,26,25,28,21,31,24,27,33,21,31,33,29,33,32,21,25,22,39,31,34,26,23,18,20,18,34,25,20,12,23,25,21,21,25,31,17,27,28,29,25,24,25,21,24,27,23,22,23,22,22,26,22,19,26,35,33,35,29,26,26,30,22,32,33,33,28,32,26,29,36,37,37,28,24,30,25,20,29,24,33,35,30,32,31,33,40,35,37,24,34,29,27,24,36,26,26,26,27,27,20,17,28,34,18,20,20,18,19,23,20,22,25,32,44,41,39,41,40,44,36,42,31,32,26,29,23,29,29,28,31,22,29,24,28,28,25] xbreaks = [13.4754, 27.1346, 43.5882, 58.9526, 72.79, 89.1452, 110.1045, 131.0793, 158.2114, 180.0661, 207.8019, 234.7329, 252.9573, 300.9573, 327.2367, 348.246, 412.9934, 434.2883, 458.451, 518.8625, 595.5373, 640.6542, 733.4587, 815.8389, 930.361, 976.2324] df = pd.DataFrame([x,y]).T df.columns = ['x','y'] # Check the bin average and std using agge bins = pd.cut(df.x,xbreaks,right=False) t = df[['x','y']].groupby(bins).agg({"x": "mean", "y": ["mean","std"]}) t.reset_index(inplace=True) t.columns = ['range_cut','x_avg_cut','y_avg_cut','y_std_cut'] t.index.name ='id' # Get the bin average from g = sns.regplot(x='x',y='y',data=df,fit_reg=False,x_bins=xbreaks,seed=seed) xye = pd.DataFrame(get_data_XYE(g)).T xye.columns = ['x_regplot','y_regplot','e_regplot'] xye.index.name = 'id' t2 = 
xye.merge(t,on='id',how='left')
t2
You can see that y and e from the two approaches differ. I understand that the default x_ci or x_estimator may affect the result of regplot, but I still cannot reproduce these values in Excel, even after removing some of the lowest and/or highest values in each bin.
In seaborn.regplot, x_bins gives the center of each bin, and each original x value is assigned to the nearest bin center, whereas in pandas.cut the breaks define the bin edges.
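The difference between the two binning rules can be seen on a toy example. This is a small sketch of the behaviour described above, written with plain NumPy and pandas; it is not seaborn's own code, and the values in `x` and `breaks` are made up for illustration.

```python
import numpy as np
import pandas as pd

x = np.array([1.0, 2.4, 2.6, 4.9, 8.0])
breaks = np.array([0.0, 5.0, 10.0])

# Rule 1: treat the breaks as bin centers and snap each x to the nearest one.
distances = np.abs(x[:, None] - breaks[None, :])
nearest_center = breaks[distances.argmin(axis=1)]

# Rule 2: treat the breaks as bin edges, as pandas.cut does.
edge_bins = pd.cut(x, breaks, right=False)

print(nearest_center)   # centers assigned to each x
print(edge_bins.codes)  # index of the interval each x falls in
```

Here 2.6 and 4.9 snap to the center 5.0 under the first rule but fall in the interval [0, 5) under the second, so the per-bin means and spreads are computed over different groups of points.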
Adding a 45 degree line to a time series stock data plot
I guess this is supposed to be simple, but I can't seem to make it work. I have some stock data:
import pandas as pd
import numpy as np
df = pd.DataFrame(index=pd.date_range(start = "06/01/2018", end = "08/01/2018"), data = np.random.rand(62)*100)
I am doing some analysis on it, which results in my drawing some lines on the graph. I want to plot a 45-degree line somewhere on the graph as a reference for the lines I drew. What I have tried is:
x = df.tail(len(df)/20).index
x = x.reset_index()
x_first_val = df.loc[x.loc[0].date].adj_close
in order to get some point and then use slope = 1 to calculate the y values, but this sounds all wrong. Any ideas?
Here is a possibility:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(index=pd.date_range(start = "06/01/2018", end = "08/01/2018"), data=np.random.rand(62)*100, columns=['data'])
# Get values for the time:
index_range = df.index[('2018-06-18' < df.index) & (df.index < '2018-07-21')]
# get the timestamps in nanoseconds (since epoch)
timestamps_ns = index_range.astype(np.int64)
# convert it to a relative number of days (for example, could be seconds)
time_day = (timestamps_ns - timestamps_ns[0]) / 1e9 / 60 / 60 / 24
# Define y-data for a line:
slope = 3 # unit: "something" per day
something = time_day * slope
trendline = pd.Series(something, index=index_range)
# Graph:
df.plot(label='data', alpha=0.8)
trendline.plot(label='some trend')
plt.legend(); plt.ylabel('something');
Edit: the first answer, using dayofyear instead of the timestamps:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(index=pd.date_range(start = "06/01/2018", end = "08/01/2018"), data=np.random.rand(62)*100, columns=['data'])
# Define data for a line:
slope = 3 # unit: "something" per day
index_range = df.index[('2018-06-18' < df.index) & (df.index < '2018-07-21')]
dayofyear = index_range.dayofyear # it will not work around the new year...
dayofyear = dayofyear - dayofyear[0]
something = dayofyear * slope
trendline = pd.Series(something, index=index_range)
# Graph:
df.plot(label='data', alpha=0.8)
trendline.plot(label='some trend')
plt.legend(); plt.ylabel('something');
Converting relative time from CSV file into absolute time
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import Rbf, InterpolatedUnivariateSpline
data = np.genfromtxt('FTIR Data.csv', skip_header=1, delimiter=',', usecols=(1,2,3), names=['Time','Peakat2188cm1', 'water'] )
x=data['Time']
y1=data['Peakat2188cm1']
y2=data['water']
fig=plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twinx()
ius=InterpolatedUnivariateSpline
xs = np.linspace(x.min(), x.max(), 100)
s1=ius(x,y1)
s2=ius(x,y2)
ys1 = s1(xs)
ys2 = s2(xs)
ax2.plot(xs,ys1)
ax2.plot(xs,ys2)
ax1.set_ylabel('Peak at 2188 cm-1')
ax2.set_ylabel('water')
ax1.set_xlabel('RT (mins)')
plt.title('RT Vs Conc')
This is my code for reading data from a CSV file exported from my instrument. In Excel, I manually converted the relative time into time in minutes and got the right plot. But I want to convert the relative time directly in Python when reading the relative time column from the CSV file. I have tried different examples but couldn't get it to work. I am very new to Python, so can anyone please help by editing my code? My actual data is in the following format (the code above plots the absolute time, i.e. Time, which I had already converted in Excel before plotting in matplotlib):
Relative Time,Peak at 2188 cm-1,water
00:00:51,0.572157,0.179023
00:02:51,0.520037,0.171217
00:04:51,0.551843,0.221285
00:06:50,0.566279,0.209182
00:09:26,0.022696,0.0161351
00:10:51,-0.00344509,0.0141303
00:12:51,0.555898,0.21082
00:14:51,0.519753,0.179563
00:16:51,0.503512,0.150133
00:18:51,0.498554,0.154512
00:20:51,0.00128343,-0.0129148
00:22:51,0.349077,0.0414234
00:24:50,0.360565,0.0522027
00:26:51,0.403705,0.0667703
At this moment, the Time column is still a string. You will have to convert it to minutes in some way.
pandas.to_timedelta:
import pandas as pd
column_names = ['Time','Peakat2188cm1', 'water']
df_orig = pd.read_csv(filename, sep=',')
df_orig.columns = column_names
time_in_minutes = pd.to_timedelta(df_orig['Time']).dt.total_seconds() / 60
Semi-manually:
time_in_minutes = [sum(int(x) * 60**i for i, x in enumerate(reversed(t.split(':')), -1)) for t in data['Time']]
Explanation: this is the same as:
time_in_minutes = []
for t in data['Time']:
    minutes = 0
    # t = '00:00:51'
    h_m_s = t.split(':')
    # h_m_s = ['00', '00', '51']
    s_m_h = list(enumerate(reversed(h_m_s), -1))
    # s_m_h = [(-1, '51'), (0, '00'), (1, '00')]
    for i, x in s_m_h:
        # i = -1
        # x = '51'
        minutes += int(x) * 60 ** i
        # minutes = 0.85
    time_in_minutes.append(minutes)
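The pandas.to_timedelta route can be tried without the instrument file by reading a few of the rows quoted in the question from an in-memory string. This is a minimal sketch: the `io.StringIO` wrapper and the `minutes` column name are choices made for the example, and only three of the question's rows are included.

```python
import io
import pandas as pd

# Three rows copied from the CSV sample in the question.
csv_text = """Relative Time,Peak at 2188 cm-1,water
00:00:51,0.572157,0.179023
00:02:51,0.520037,0.171217
00:09:26,0.022696,0.0161351
"""

df = pd.read_csv(io.StringIO(csv_text))
df.columns = ['Time', 'Peakat2188cm1', 'water']

# Parse the HH:MM:SS strings as durations, then express them in minutes.
df['minutes'] = pd.to_timedelta(df['Time']).dt.total_seconds() / 60
print(df[['Time', 'minutes']])
```

The resulting `minutes` column can be used as the x-values for the spline and the plot in place of the column converted by hand in Excel.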
Covariance/heat flux in Python
I'm looking to compute poleward heat fluxes at a level in the atmosphere, i.e. the mean of (u't'). I'm aware of the covariance function in NumPy, but cannot seem to implement it. Here is my code:
from netCDF4 import Dataset
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
myfile = '/home/ubuntu/Fluxes_Test/out.nc'
Import = Dataset(myfile, mode='r')
lon = Import.variables['lon'][:] # Longitude
lat = Import.variables['lat'][:] # Latitude
time = Import.variables['time'][:] # Time
lev = Import.variables['lev'][:] # Level
wind = Import.variables['ua'][:]
temp = Import.variables['ta'][:]
lon = lon-180 # to shift co-ordinates to -180 to 180.
variable1 = np.squeeze(wind,temp, axis=0)
variable2 = np.cov(variable1)
m = Basemap(resolution='l')
lons, lats = np.meshgrid(lon,lat)
X, Y = m(lons, lats)
cs = m.pcolor(X,Y, variable2)
plt.show()
The variables wind and temp, whose flux (the covariance) I am trying to compute, both have shape (3960,64,128): 3960 time steps on a 64x128 grid (with co-ordinates). I tried squeezing both variables to produce an array of shape (3960, 3960, 64,128) so that cov could work on the first two axes (the two 3960's) of wind and temp, but this didn't work.
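If the primes in (u't') denote departures from the time mean at each grid point, the flux can be computed directly without np.cov: subtract the time mean from each field, multiply, and average over the time axis. The sketch below assumes exactly that definition and that time is axis 0. The function name `eddy_flux` and the random test arrays are hypothetical, standing in for the (3960, 64, 128) wind and temp fields.

```python
import numpy as np

def eddy_flux(u, t):
    """Time-mean of u' * t' at every grid point, where a prime is the
    departure from the time mean (time is assumed to be axis 0)."""
    u_prime = u - u.mean(axis=0)
    t_prime = t - t.mean(axis=0)
    return (u_prime * t_prime).mean(axis=0)

# Small random stand-ins with shape (time, lat, lon).
rng = np.random.default_rng(0)
wind = rng.normal(size=(50, 4, 8))
temp = rng.normal(size=(50, 4, 8))

flux = eddy_flux(wind, temp)   # shape (4, 8), one value per grid point

# Cross-check one grid point against np.cov (bias=True divides by N).
check = np.cov(wind[:, 0, 0], temp[:, 0, 0], bias=True)[0, 1]
```

The result has the shape of the horizontal grid, so it can be passed to a map-plotting call in place of `variable2`.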