How to subtract the first value from all the remaining values - Python

I have data like this:
timestamp high windSpeed windDir windU windV
04/05/2019 10:02 100 4.39 179.1 -0.14 8.53
150 2.44 164.5 -1.26 4.57
200 4.29 180.9 0.12 8.32
04/05/2019 10:03 100 4.39 179.1 -0.15 8.53
150 2.44 164.5 -1.26 4.57
200 4.29 180.9 0.12 8.32
04/05/2019 10:04 100 4.52 179.1 -0.16 8.79
150 2.15 162.8 -1.24 4
200 3.34 181.9 0.21 6.49
04/05/2019 10:05 100 4.52 179.1 -0.17 8.79
150 2.15 162.8 -1.24 4
200 3.34 181.9 0.21 6.49
and for each timestamp I want to subtract the value at the lower level from the value at the next higher level. This is what I have so far, but it only gives me one value. Can anyone help me, please? Thank you.
for timestamp, group in grouped:
    HeightIndices = group["high"].keys()
    for heightIndex in range(HeightIndices[0], HeightIndices[0] + len(HeightIndices) - 1):
        windMag = sqrt(group["windU"] ** 2 + group["windV"] ** 2)
        diffMag = windMag[heightIndex + 1] - windMag[heightIndex]

I'm not sure I'm addressing exactly what you're asking, but from looking at your code it seems you are trying to get the difference between the i-th and (i+1)-th values of the "high" column and call that variable diffMag. If that's the case, you can probably use one of the two methods below.
Solution 1:
diff_mag = []
for i in range(len(wind['height']) - 1):
    diff_mag.append(wind['height'][i + 1] - wind['height'][i])
Solution 2:
Use numpy's diff:
import numpy as np
np.diff(wind['height'])
I made the assumption you're using pandas here based on what your code block looks like. Hope that helps.
EDIT
Okay, I think I understand what you are saying now.
I think this should work:
import numpy as np

windMag = []
for timestamp, group in grouped:
    # wind-speed magnitude at every height level in this group
    windMag.append(np.sqrt(group["windU"] ** 2 + group["windV"] ** 2).to_numpy())

# differences between consecutive height levels, one array per timestamp
diffMag = [np.diff(mag) for mag in windMag]
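A more compact alternative (just a sketch, assuming the original frame is called df and the timestamp column is filled in on every row) computes the magnitude as a column and lets pandas take the per-timestamp difference:
import numpy as np

# assumed columns: timestamp, high, windU, windV (timestamp repeated on every row)
df["windMag"] = np.sqrt(df["windU"] ** 2 + df["windV"] ** 2)

# difference between consecutive height levels within each timestamp;
# the lowest level of every timestamp gets NaN
df["diffMag"] = df.groupby("timestamp")["windMag"].diff()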

Related

Obtaining 2 or more coefficients from defined equation using regression methods

I'm looking to run code that solves for some number of unknowns (c_10, c_01, c_11, etc.) just from the plotted data.
Some background on the equation:
Mooney-Rivlin model (1940) with P1 = c_10[(2*λ+λ**2)-3]+c_01[(λ**-2+2*λ)-3].
P1 (also known as P) and λ are pre-defined numerical data given in the table below (sheet ExperimentData of experimental_data1.xlsx):
λ P
1.00 0.00
1.01 0.03
1.12 0.14
1.24 0.23
1.39 0.32
1.61 0.41
1.89 0.50
2.17 0.58
2.42 0.67
3.01 0.85
3.58 1.04
4.03 1.21
4.76 1.58
5.36 1.94
5.76 2.29
6.16 2.67
6.40 3.02
6.62 3.39
6.87 3.75
7.05 4.12
7.16 4.47
7.27 4.85
7.43 5.21
7.50 5.57
7.61 6.30
I have tried obtaining the coefficients using linear regression. However, to my knowledge, random forest is not able to return multiple coefficients via
reg.coef_
I also tried SVR with
reg.dual_coef_
but I keep getting this error:
ValueError: not enough values to unpack (expected 2, got 1)
Code below:
import pandas as pd
from sklearn.svm import SVR

data = pd.read_excel('experimental_data.xlsx', sheet_name='ExperimentData')
X_s = [[(2*λ + λ**2) - 3, (λ**-2 + 2*λ) - 3] for λ in data['λ']]
y_s = data['P']
svr = SVR()
svr.fit(X_s, y_s)
c_01, c_10 = svr.dual_coef_
And to future-proof this method: if, say, there are more than 2 coefficients, are there other methods apart from linear regression?
For example, referring to Ishihara model (1951) where
P1 = {2*c_10 + 4*c_20*c_01[(2*λ**-1+λ**2) - 3]*[(λ**-2 + 2*λ) - 3] + c_20 * c_01 * (λ**-1) * [(2*λ**-1 + λ**2) - 3]**2}*{λ - λ**-2}
Any comments are greatly appreciated!
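One possible direction (a sketch, not a definitive answer, reusing the X_s, y_s and data built in the code above): the Mooney-Rivlin expression is linear in c_10 and c_01, so ordinary least squares recovers both at once, while models that are nonlinear in their coefficients, such as the Ishihara form, can be fitted with scipy.optimize.curve_fit:
import numpy as np
from scipy.optimize import curve_fit

# Linear case (Mooney-Rivlin): the model is linear in c_10 and c_01,
# so plain least squares on the two constructed features recovers both.
X = np.asarray(X_s)                      # shape (n_samples, 2)
y = np.asarray(y_s, dtype=float)
(c_10, c_01), *_ = np.linalg.lstsq(X, y, rcond=None)

# Nonlinear case: write the model as a function of λ and its coefficients
# and let curve_fit estimate them (works for any number of coefficients).
def mooney_rivlin(lam, c10, c01):
    return c10 * ((2*lam + lam**2) - 3) + c01 * ((lam**-2 + 2*lam) - 3)

popt, pcov = curve_fit(mooney_rivlin, data['λ'].to_numpy(dtype=float), y)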

DataFrame: split column values, how to solve the error message?

I have a pandas DataFrame with the following columns:
Stock ROC5 ROC20 ROC63 ROCmean
0 IBGL.SW -0.59 3.55 6.57 3.18
0 EHYA.SW 0.98 4.00 6.98 3.99
0 HIGH.SW 0.94 4.22 7.18 4.11
0 IHYG.SW 0.56 2.46 6.16 3.06
0 HYGU.SW 1.12 4.56 7.82 4.50
0 IBCI.SW 0.64 3.57 6.04 3.42
0 IAEX.SW 8.34 18.49 14.95 13.93
0 AGED.SW 9.45 24.74 28.13 20.77
0 ISAG.SW 7.97 21.61 34.34 21.31
0 IAPD.SW 0.51 6.62 19.54 8.89
0 IASP.SW 1.08 2.54 12.18 5.27
0 RBOT.SW 10.35 30.53 39.15 26.68
0 RBOD.SW 11.33 30.50 39.69 27.17
0 BRIC.SW 7.24 11.08 75.60 31.31
0 CNYB.SW 1.14 4.78 8.36 4.76
0 FXC.SW 5.68 13.84 19.29 12.94
0 DJSXE.SW 3.11 9.24 6.44 6.26
0 CSSX5E.SW -0.53 5.29 11.85 5.54
How can I add a new column "Symbol" to the DataFrame containing the stock ticker without ".SW"?
Example: the first row's result should be IBGL (modified value of IBGL.SW).
Example: the last row's result should be CSSX5E (split value of CSSX5E.SW).
If I send the following command:
new_df['Symbol'] = new_df.loc[:, ('Stock')].str.split('.').str[0]
Then I receive an error message:
:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
new_df['Symbol'] = new_df.loc[:, ('Stock')].str.split('.').str[0]
How can I solve this problem?
Thanks a lot for your support.
METHOD 1:
You can do a vectorized operation using str.get(0):
df['SYMBOL'] = df['Stock'].str.split('.').str.get(0)
METHOD 2:
You can do another vectorized operation by using expand=True in str.split() and then getting the first column.
df['SYMBOL'] = df['Stock'].str.split('.', expand = True)[0]
METHOD 3:
Or you can write a custom lambda function with apply (for more complex processes). Note, this is slower but good if you have your own UDF.
df['SYMBOL'] = df['Stock'].apply(lambda x:x.split('.')[0])
This is not an error but a warning; as you have probably noticed, your script still finishes its execution.
Edit: Given your comments, it seems your issue originates earlier in the code, so I suggest you use the following:
new_df = new_df.copy(deep=False)
And then proceed to solve it with:
new_df.loc[:, 'Symbol'] = new_df['Stock'].str.split('.').str[0]
new_df = new_df.copy()
new_df['Symbol'] = new_df.Stock.str.replace('.SW', '', regex=False)
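For context, this warning usually appears when new_df was itself produced by slicing another DataFrame. A minimal sketch (with a hypothetical parent frame df and filter) that reproduces the situation and then avoids it:
import pandas as pd

df = pd.DataFrame({'Stock': ['IBGL.SW', 'CSSX5E.SW'],
                   'ROC5': [-0.59, -0.53]})

# Slicing without .copy() leaves new_df marked as a slice of df, so assigning
# a new column here would typically trigger SettingWithCopyWarning.
new_df = df[df['ROC5'] < 0]

# Making the copy explicit removes the ambiguity and the warning.
new_df = new_df.copy()
new_df['Symbol'] = new_df['Stock'].str.split('.').str[0]
print(new_df)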

Python loop fix [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
price = float(input("Enter the purchase price:"))
print("Month Starting Balance Interest to Pay Principal to Pay Payment Ending Balance")
#math
start = price * 0.10 - price
monthly = start * .05
interest = start * 0.12 / 12
principal = monthly - interest
ending = principal - start
for eachPass in range(1, 24):
    print(eachPass, "%16.2f" % start, "%16.2f" % interest, "%16.2f" % principal, "%13.2f" % monthly, "%16.2f" % ending)
    start = ending
    monthly = start * .05
    interest = start * 0.12 / 12
    principal = monthly - interest
    ending = principal - start
input("press the enter key to exit")
I think it has to do with monthly = start * .05. Should it be something different?
What the output should be
Your problem seems to be that your prices switch signs every iteration. Positive to negative to positive, etc.
And they start negative, which is a problem.
I think the root of your problem is this:
start = price * 0.10 - price
sets start to a negative value, since a given positive price will always be greater than one-tenth of its own value. Maybe change the assignment to
start = price * 0.90
instead?
The same issue is present with
ending = principal - start
in that it should be the other way around,
ending = start - principal
to keep everything positive. This needs to be done both outside the loop and inside the loop.
Making those changes and running the code again produced the following output:
Month Starting Balance Interest to Pay Principal to Pay Payment Ending Balance
1 180.00 1.80 7.20 9.00 172.80
2 172.80 1.73 6.91 8.64 165.89
3 165.89 1.66 6.64 8.29 159.25
4 159.25 1.59 6.37 7.96 152.88
5 152.88 1.53 6.12 7.64 146.77
6 146.77 1.47 5.87 7.34 140.90
7 140.90 1.41 5.64 7.04 135.26
8 135.26 1.35 5.41 6.76 129.85
9 129.85 1.30 5.19 6.49 124.66
10 124.66 1.25 4.99 6.23 119.67
11 119.67 1.20 4.79 5.98 114.88
12 114.88 1.15 4.60 5.74 110.29
13 110.29 1.10 4.41 5.51 105.88
14 105.88 1.06 4.24 5.29 101.64
15 101.64 1.02 4.07 5.08 97.58
16 97.58 0.98 3.90 4.88 93.67
17 93.67 0.94 3.75 4.68 89.93
18 89.93 0.90 3.60 4.50 86.33
19 86.33 0.86 3.45 4.32 82.88
20 82.88 0.83 3.32 4.14 79.56
21 79.56 0.80 3.18 3.98 76.38
22 76.38 0.76 3.06 3.82 73.32
23 73.32 0.73 2.93 3.67 70.39
There's also the issue that your Payment column changes values each time instead of remaining fixed at 9.00, but I'll let you figure that one out on your own (hint: why are you changing monthly inside the loop?).
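For reference, here is a sketch of the full program with just those two sign fixes applied (the Payment issue is deliberately left as the exercise above; entering a purchase price of 200 reproduces the table):
price = float(input("Enter the purchase price:"))
print("Month Starting Balance Interest to Pay Principal to Pay Payment Ending Balance")
# math
start = price * 0.90              # was: price * 0.10 - price, which is negative
monthly = start * .05
interest = start * 0.12 / 12
principal = monthly - interest
ending = start - principal        # was: principal - start, sign flipped
for eachPass in range(1, 24):
    print(eachPass, "%16.2f" % start, "%16.2f" % interest, "%16.2f" % principal,
          "%13.2f" % monthly, "%16.2f" % ending)
    start = ending
    monthly = start * .05
    interest = start * 0.12 / 12
    principal = monthly - interest
    ending = start - principal    # same sign fix inside the loop
input("press the enter key to exit")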

python: grouping or splitting up time series data based on conditions

I work a lot with time series data at my job and I have been trying to use python--specifically pandas--to make some of the work a little faster. I have some code that reads through data in a DataFrame and identifies segments where specified conditions are met. It then separates those segments into individual DataFrames.
I have a sample DataFrame here:
Date Time Pressure Temp Flow Valve Position
0 3/5/2020 12:00:01 5.32 22.12 199 1.00
1 3/5/2020 12:00:02 5.36 22.25 115 0.95
2 3/5/2020 12:00:03 5.33 22.18 109 0.92
3 3/5/2020 12:00:04 5.38 23.51 103 0.90
4 3/5/2020 12:00:05 5.42 24.27 99 0.89
5 3/5/2020 12:00:06 5.49 25.91 92 0.85
6 3/5/2020 12:00:07 5.55 26.78 85 0.82
7 3/5/2020 12:00:08 5.61 29.88 82 0.76
8 3/5/2020 12:00:09 5.69 31.16 87 0.79
9 3/5/2020 12:00:10 5.72 32.01 97 0.87
10 3/5/2020 12:00:11 5.59 29.68 104 0.90
11 3/5/2020 12:00:12 5.53 24.55 111 0.93
12 3/5/2020 12:00:13 5.48 23.54 116 0.96
13 3/5/2020 12:00:14 5.44 23.11 119 1.00
14 3/5/2020 12:00:15 5.41 23.08 121 1.00
The code I have written does what I want, but it is really difficult to follow and I am sure it's offensive to experienced Python users.
Here is what it does though:
I more or less create a mask based on a set of conditions and take the index locations of all the True values in the mask. Then I use NumPy's .diff() function to identify discontinuities in the indices. Inside the for loop I split up the mask at the location of each identified discontinuity. Once that is complete I can use the now-separate sets of indices to slice out the desired segments of data from my original DataFrame. See the code below:
import pandas as pd
import numpy as np

df = pd.read_csv('sample_data.csv')
idx = np.where((df['Temp'] > 23) & (df['Temp'] < 30))[0]
discontinuity = np.where(np.diff(idx) > 1)[0]
intervals = {}
for i in range(len(discontinuity) + 1):
    if i == 0:
        intervals[i] = df.iloc[idx[0]:idx[discontinuity[i]], 1]
        if len(intervals[i].values) < 1:
            del intervals[i]
    elif i == len(discontinuity):
        intervals[i] = df.iloc[idx[discontinuity[i-1]+1]:idx[-1], 1]
        if len(intervals[i].values) < 1:
            del intervals[i]
    else:
        intervals[i] = df.iloc[idx[discontinuity[i-1]+1]:idx[discontinuity[i]], 1]
        if len(intervals[i].values) < 1:
            del intervals[i]
df1 = df.loc[intervals[0].index, :]
df2 = df.loc[intervals[1].index, :]
df1 and df2 contain all the data in the original DataFrame corresponding with the times (rows) that 'Temp' is between 23 and 30.
df1:
Date Time Pressure Temp Flow Valve Position
3 3/5/2020 12:00:04 5.38 23.51 103 0.90
4 3/5/2020 12:00:05 5.42 24.27 99 0.89
5 3/5/2020 12:00:06 5.49 25.91 92 0.85
6 3/5/2020 12:00:07 5.55 26.78 85 0.82
df2:
Date Time Pressure Temp Flow Valve Position
10 3/5/2020 12:00:11 5.59 29.68 104 0.90
11 3/5/2020 12:00:12 5.53 24.55 111 0.93
12 3/5/2020 12:00:13 5.48 23.54 116 0.96
13 3/5/2020 12:00:14 5.44 23.11 119 1.00
I am glad I was able to get this to work, and I can live with the couple of lines that get lost using this method, but I know this is a really pedestrian approach and I can't help but think someone who is not a Python beginner could do the same thing much more cleanly and efficiently.
Could groupby from itertools or pandas work for this? I haven't been able to find a way to make that work.
Welcome to Stack Overflow.
I think your code can be simplified as such:
# Get the subset that fulfills your conditions
df_conditioned = df.query('Temp > 23 and Temp < 30').copy()

# Check for discontinuities by looking at the indices
# I created a new column called 'Group' to keep track of the continuous indices
indices = df_conditioned.index.to_series()
df_conditioned['Group'] = ((indices - indices.shift(1)) != 1).cumsum()

# Store the groups (segments with the same group number) as individual frames in a list
df_list = []
for group in df_conditioned['Group'].unique():
    df_list.append(df_conditioned.query('Group == @group').drop(columns='Group'))
Hope it helps!
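Equivalently, once the 'Group' column exists, pandas' own groupby can collect the segments without the explicit loop (a small sketch of the same idea):
# one sub-DataFrame per continuous run of indices
df_list = [g.drop(columns='Group') for _, g in df_conditioned.groupby('Group')]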

Cannot convert input to Timestamp, bday_range(...) - Pandas/Python

I'm looking to generate the number of business days between the current date and the end of the month for each row of a pandas DataFrame.
E.g. 26/06/2017 - 4, 23/06/2017 - 5
I'm having trouble, as I keep getting a TypeError:
TypeError: Cannot convert input to Timestamp
From line:
result['bdaterange'] = pd.bdate_range(pd.to_datetime(result['dte'], unit='ns').values, pd.to_datetime(result['bdate'], unit='ns').values)
I have a DataFrame result with the column dte in a date format, and I'm trying to create a new column (bdaterange) as a simple integer/float that tells me how far each row is from month end in business days.
Sample data:
bid ask spread dte day bdate
01:49:00 2.17 3.83 1.66 2016-12-20 20.858333 2016-12-30
02:38:00 2.2 3.8 1.60 2016-12-20 20.716667 2016-12-30
22:15:00 2.63 3.12 0.49 2016-12-20 21.166667 2016-12-30
03:16:00 1.63 2.38 0.75 2016-12-21 21.391667 2016-12-30
07:11:00 1.46 2.54 1.08 2016-12-21 21.475000 2016-12-30
I've tried BDay(), and with that, making sure the day cannot be 6 or 7 in the calculation, but I have not got anywhere. I came across bdate_range, which I believe is exactly what I'm looking for, but the closest I've got gives me the error Cannot convert input to Timestamp.
My attempt is:
result['bdate'] = pd.to_datetime(result['dte']) + BMonthEnd(0)
result['bdaterange'] = pd.bdate_range(pd.to_datetime(result['dte'], unit='ns').values, pd.to_datetime(result['bdate'], unit='ns').values)
print(result['bdaterange'])
Not sure how to solve the error though.
I think you need the length of bdate_range for each row, so you need a custom function with apply:
#convert only once to datetime
result['dte'] = pd.to_datetime(result['dte'])
f = lambda x: len(pd.bdate_range(x['dte'], x['dte'] + pd.offsets.BMonthEnd(0)))
result['bdaterange'] = result.apply(f, axis=1)
print (result)
bid ask spread dte day bdaterange
01:49:00 2.17 3.83 1.66 2016-12-20 20.858333 9
02:38:00 2.20 3.80 1.60 2016-12-20 20.716667 9
22:15:00 2.63 3.12 0.49 2016-12-20 21.166667 9
03:16:00 1.63 2.38 0.75 2016-12-21 21.391667 8
07:11:00 1.46 2.54 1.08 2016-12-21 21.475000 8
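If performance matters on a large frame, a vectorized alternative is np.busday_count (a sketch, assuming dte has already been converted to datetime as above):
import numpy as np
import pandas as pd

# business month-end for each row
month_end = result['dte'] + pd.offsets.BMonthEnd(0)

# np.busday_count excludes the end date, so add 1 to match the inclusive
# count that len(pd.bdate_range(...)) gives (BMonthEnd is always a business day)
result['bdaterange'] = np.busday_count(
    result['dte'].values.astype('datetime64[D]'),
    month_end.values.astype('datetime64[D]')) + 1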
