I need the output in the 5_Days_Up column to look like this:
Date price 5_Days_Up
20-May-21 1
21-May-21 2
22-May-21 4
23-May-21 5
24-May-21 6 5
25-May-21 7 6
26-May-21 8 7
27-May-21 9 8
28-May-21 10 9
29-May-21 11 10
30-May-21 12 11
31-May-21 13 12
1-Jun-21 14 13
2-Jun-21 15 14
But I got this output instead:
Date price 5_Days_Up
20-May-21 1
21-May-21 2
22-May-21 4
23-May-21 5
24-May-21 6 6
25-May-21 7 7
26-May-21 8 8
27-May-21 9 9
28-May-21 10 10
29-May-21 11 11
30-May-21 12 12
31-May-21 13 13
1-Jun-21 14 14
2-Jun-21 15 15
Here, in Python pandas, I am using
df['5_Days_Up'] = df['price'].rolling(window=5).max()
Is there a way to get the maximum of the last 5 periods, skipping today's price, using the same rolling() or any other method?
Your data has only 4 (instead of 5) previous entries before the row dated 24-May-21 with price 6 (because there is no price of 3 in the sample), so the first non-NaN value will appear on 25-May-21, where the price is 7.
To include values only up to the previous entry (excluding the current one), you can use the parameter closed='left':
df['5_Days_Up'] = df['price'].rolling(window=5, closed='left').max()
Result:
Date price 5_Days_Up
0 20-May-21 1 NaN
1 21-May-21 2 NaN
2 22-May-21 4 NaN
3 23-May-21 5 NaN
4 24-May-21 6 NaN
5 25-May-21 7 6.0
6 26-May-21 8 7.0
7 27-May-21 9 8.0
8 28-May-21 10 9.0
9 29-May-21 11 10.0
10 30-May-21 12 11.0
11 31-May-21 13 12.0
12 1-Jun-21 14 13.0
13 2-Jun-21 15 14.0
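If closed='left' is not supported for fixed integer windows in your pandas version, an equivalent sketch of the same idea is to shift the series one row before rolling, so that "today" is excluded from each window:
# Shift by one row, then take the 5-period max of the shifted prices.
df['5_Days_Up'] = df['price'].shift(1).rolling(window=5).max()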
I want to calculate the distance that my cars have driven. I have all the coordinates that the cars need to go to. Some cars park earlier than others, and this messes up my calculation.
I have this:
cars= pd.DataFrame({'x': [3,3,3,3,3,3,3,3,3],
'y': [1,2,3,4,5,6,7,8,9],
'x_goal_1': [3,3,3,3,3,3,3,3,3],
'y_goal_1': [10,10,10,10,10,10,10,10,10],
'x_goal_2': [17,24,31,31,17,17,38,38,31],
'y_goal_2': [10,10,10,10,10,10,10,10,10],
'x_goal_3': [17,24,31,31,17,17,38,38,31],
'y_goal_3': [17, 3, 3, 3, 17, 17, 17, 17, 3],
'x_goal_4': [None,27,35,28,14,18,42,43,None],
'y_goal_4': [None, 3, 3, 3, 17, 17, 17, 17, None],
'z': [3,4,5,6,7,8,9,12,22]})
cars['moved_tot'] = (
abs(cars['x']-cars['x_goal_1']) + abs(cars['y']-cars['y_goal_1']) +
abs(cars['x_goal_1']-cars['x_goal_2']) + abs(cars['y_goal_1']-cars['y_goal_2']) +
abs(cars['x_goal_2']-cars['x_goal_3']) + abs(cars['y_goal_2']-cars['y_goal_3']) +
abs(cars['x_goal_3']-cars['x_goal_4']) + abs(cars['y_goal_3']-cars['y_goal_4']) )
I then get:
x y x_goal_1 y_goal_1 ... x_goal_4 y_goal_4 z moved_tot
0 3 1 3 10 ... NaN NaN 3 NaN
1 3 2 3 10 ... 27.0 3.0 4 39.0
2 3 3 3 10 ... 35.0 3.0 5 46.0
3 3 4 3 10 ... 28.0 3.0 6 44.0
4 3 5 3 10 ... 14.0 17.0 7 29.0
5 3 6 3 10 ... 18.0 17.0 8 26.0
6 3 7 3 10 ... 42.0 17.0 9 49.0
7 3 8 3 10 ... 43.0 17.0 12 49.0
8 3 9 3 10 ... NaN NaN 22 NaN
For the first row I want moved_tot to be 30, and for the last row I want 36. I want the calculation to ignore a value if it is None (that is, if that car has parked earlier). How do I do this?
With help from David S (thank you!) I figured out how to do it:
cars['moved_tot'] = (
    abs(cars['x']-cars['x_goal_1']).fillna(0) + abs(cars['y']-cars['y_goal_1']).fillna(0) +
    abs(cars['x_goal_1']-cars['x_goal_2']).fillna(0) + abs(cars['y_goal_1']-cars['y_goal_2']).fillna(0) +
    abs(cars['x_goal_2']-cars['x_goal_3']).fillna(0) + abs(cars['y_goal_2']-cars['y_goal_3']).fillna(0) +
    abs(cars['x_goal_3']-cars['x_goal_4']).fillna(0) + abs(cars['y_goal_3']-cars['y_goal_4']).fillna(0)
)
You could just replace the NaN values with 0 in place to avoid getting NaN in the result column, like:
cars['moved_tot'] = (abs(cars['x']-cars['x_goal_1'].fillna(0)) + abs(cars['y']-cars['y_goal_1'].fillna(0)) +
abs(cars['x_goal_1'].fillna(0)-cars['x_goal_2'].fillna(0)) + abs(cars['y_goal_1'].fillna(0)-cars['y_goal_2'].fillna(0)) +
abs(cars['x_goal_2'].fillna(0)-cars['x_goal_3'].fillna(0)) + abs(cars['y_goal_2'].fillna(0)-cars['y_goal_3'].fillna(0)) +
abs(cars['x_goal_3'].fillna(0)-cars['x_goal_4'].fillna(0)) + abs(cars['y_goal_3'].fillna(0)-cars['y_goal_4'].fillna(0)) )
If you want a leg's contribution to be 0 whenever a NaN is present, just move the .fillna(0) outside the abs().
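For example, a single leg would then look like this (the same pattern used in the self-answer above):
# Zero out this leg's contribution when either waypoint is missing.
abs(cars['x_goal_3'] - cars['x_goal_4']).fillna(0)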
Just use cars = cars.fillna(0) or cars.fillna(0, inplace=True) to avoid repeating .fillna(0) after each abs().
If you don't want to change the original dataframe, use cars_ = cars.fillna(0) and then use cars_ instead of cars when computing the column.
Besides, you can take advantage of the column order to avoid writing out all the repeating column names:
cars_ = cars.fillna(0)
cars_['moved_tot'] = 0
for i in range(len(cars.columns) - 3):
    print(cars_.columns[i], '-', cars_.columns[i+2])
    cars_['moved_tot'] += abs(cars_[cars_.columns[i]] - cars_[cars_.columns[i+2]])
# Output of the loop, followed by print(cars_)
x - x_goal_1
y - y_goal_1
x_goal_1 - x_goal_2
y_goal_1 - y_goal_2
x_goal_2 - x_goal_3
y_goal_2 - y_goal_3
x_goal_3 - x_goal_4
y_goal_3 - y_goal_4
x y x_goal_1 y_goal_1 x_goal_2 y_goal_2 x_goal_3 y_goal_3 x_goal_4 y_goal_4 z moved_tot
0 3 1 3 10 17 10 17 17 0.0 0.0 3 64.0
1 3 2 3 10 24 10 24 3 27.0 3.0 4 39.0
2 3 3 3 10 31 10 31 3 35.0 3.0 5 46.0
3 3 4 3 10 31 10 31 3 28.0 3.0 6 44.0
4 3 5 3 10 17 10 17 17 14.0 17.0 7 29.0
5 3 6 3 10 17 10 17 17 18.0 17.0 8 26.0
6 3 7 3 10 38 10 38 17 42.0 17.0 9 49.0
7 3 8 3 10 38 10 38 17 43.0 17.0 12 49.0
8 3 9 3 10 31 10 31 3 0.0 0.0 22 70.0
You could even do the sum in one line
cars_['moved_tot'] = pd.concat([abs(cars_[cars_.columns[i]] - cars_[cars_.columns[i+2]]) for i in range(len(cars.columns) - 3)], axis=1).sum(1)
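As a further sketch (not from the answers above, and assuming the waypoint columns stay in the order x, y, x_goal_1, y_goal_1, ..., y_goal_4), the same per-leg distances can also be computed by reshaping the waypoints into an (n_cars, 5, 2) array:
import numpy as np

# Fill missing waypoints with 0 (same convention as cars_ above), take the
# 10 waypoint columns, and view each row as 5 (x, y) points.
pts = cars.fillna(0).iloc[:, :10].to_numpy().reshape(len(cars), 5, 2)
# |dx| + |dy| for each consecutive pair of waypoints, summed per car.
cars['moved_tot'] = np.abs(np.diff(pts, axis=1)).sum(axis=(1, 2))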
I have a pandas dataframe of the form df :
timeCol dataCol
2 5
9.135 8
11 4
12 6
I want to do a rolling mean over a 3 second interval over dataCol such that it returns a dataframe of the form, new_df :
startTime endTime meanCol
0 3 5.0
1 4 5.0
2 5 5.0
3 6 0.0
4 7 0.0
5 8 0.0
6 9 0.0
7 10 8.0
8 11 6.0
9 12 6.0
10 13 5.0
11 14 5.0
12 15 6.0
Notice that in new_df, for example, the time ranges (8-11) and (9-12) both return 6.0 (because mean(8,4)=6.0 and mean(8,4,6)=6.0 respectively). All columns are float type. timeCol will always be ordered. What is an efficient, pythonic way of achieving this?
I am using numpy broadcasting:
import numpy as np
import pandas as pd

# ori is the original dataframe holding timeCol and dataCol
df = pd.DataFrame({'startTime': np.arange(13), 'endTime': np.arange(13) + 3})
s = ori.timeCol.values[:, None]
s1 = (df.startTime.values - s <= 0) & (df.endTime.values - s >= 0)
df['New'] = ori.dataCol.dot(s1) / s1.sum(axis=0)
df
startTime endTime New
0 0 3 5.0
1 1 4 5.0
2 2 5 5.0
3 3 6 NaN
4 4 7 NaN
5 5 8 NaN
6 6 9 NaN
7 7 10 8.0
8 8 11 6.0
9 9 12 6.0
10 10 13 5.0
11 11 14 5.0
12 12 15 6.0
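To match the desired new_df, where intervals with no samples show 0.0 rather than NaN, you could fill the result afterwards (a small addition, not part of the code above):
df['New'] = df['New'].fillna(0)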
Here's one way to do it:
import pandas as pd
# Source data
data = {
'timeCol': [2, 9.135, 11, 12],
'dataCol': [5, 8, 4, 6]
}
df = pd.DataFrame(data=data)
# Build list of rows based on time series
rows = []
for startTime in range(13):
    endTime = startTime + 3
    print(startTime, ' to ', endTime)
    # Get only rows from source data that match current time interval
    filtered = df.loc[(df['timeCol'] >= startTime) &
                      (df['timeCol'] <= endTime)]
    # Append current row, including mean of matching source rows
    rows.append([startTime, endTime, filtered['dataCol'].mean()])
# Create final dataframe, replacing any missing values with 0
res = pd.DataFrame(data=rows, columns=['startTime', 'endTime', 'meanCol']).fillna(0)
print(res)
You could also build the result frame first, then loop through it and calculate the average for each row, as in the sketch below.
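A sketch of that variant, assuming the same df built from the question's data: construct the start/end frame first, then compute each row's mean with apply.
res = pd.DataFrame({'startTime': range(13)})
res['endTime'] = res['startTime'] + 3
# Mean of dataCol for the samples whose timeCol falls inside each interval.
res['meanCol'] = res.apply(
    lambda r: df.loc[df['timeCol'].between(r['startTime'], r['endTime']), 'dataCol'].mean(),
    axis=1
).fillna(0)
print(res)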
I have a DataFrame, say df, which looks like this:
id property_type1 property_type pro
1 Condominium 2 2
2 Farm 14 14
3 House 7 7
4 Lots/Land 15 15
5 Mobile/Manufactured Home 13 13
6 Multi-Family 8 8
7 Townhouse 11 11
8 Single Family 10 10
9 Apt/Condo 1 1
10 Home 7 7
11 NaN 29 NaN
Now, I need the pro column to have the same value as the property_type column, whenever the property_type1 column has a NaN value. This is how it should be:
id property_type1 property_type pro
1 Condominium 2 2
2 Farm 14 14
3 House 7 7
4 Lots/Land 15 15
5 Mobile/Manufactured Home 13 13
6 Multi-Family 8 8
7 Townhouse 11 11
8 Single Family 10 10
9 Apt/Condo 1 1
10 Home 7 7
11 NaN 29 29
That is, in line 11, where property_type1 is NaN, the value of the pro column becomes 29, which is the value of property_type. How can I do this?
.ix is deprecated (and removed in recent pandas versions); don't use it.
Option 1
I'd do this with np.where -
import numpy as np

df = df.assign(pro=np.where(df.pro.isnull(), df.property_type, df.pro))
df
id property_type1 property_type pro
0 1 Condominium 2 2.0
1 2 Farm 14 14.0
2 3 House 7 7.0
3 4 Lots/Land 15 15.0
4 5 Mobile/Manufactured Home 13 13.0
5 6 Multi-Family 8 8.0
6 7 Townhouse 11 11.0
7 8 Single Family 10 10.0
8 9 Apt/Condo 1 1.0
9 10 Home 7 7.0
10 11 NaN 29 29.0
Option 2
If you want to perform in-place assignment, use loc -
m = df.pro.isnull()
df.loc[m, 'pro'] = df.loc[m, 'property_type']
df
id property_type1 property_type pro
0 1 Condominium 2 2.0
1 2 Farm 14 14.0
2 3 House 7 7.0
3 4 Lots/Land 15 15.0
4 5 Mobile/Manufactured Home 13 13.0
5 6 Multi-Family 8 8.0
6 7 Townhouse 11 11.0
7 8 Single Family 10 10.0
8 9 Apt/Condo 1 1.0
9 10 Home 7 7.0
10 11 NaN 29 29.0
Compute the mask just once, and use it to index multiple times, which should be more efficient than computing it twice.
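A related idiom, as a sketch (not what either option above uses): Series.fillna accepts another Series aligned on the index, so the missing pro values can be pulled straight from property_type.
# Fill NaN in pro with the corresponding property_type value.
df['pro'] = df['pro'].fillna(df['property_type'])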
Find the rows where the property_type1 column is NaN, and for those rows assign the property_type values to the pro column.
df.ix[df.property_type1.isnull(), 'pro'] = df.ix[df.property_type1.isnull(), 'property_type']
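.ix has since been removed from pandas; the equivalent with .loc (same logic, modern indexer) would be:
mask = df['property_type1'].isnull()
df.loc[mask, 'pro'] = df.loc[mask, 'property_type']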
I'm working with a Pandas DataFrame that looks like this:
0 Data
1
2
3
4 5
5
6
7
8 21
9
10 2
11
12
13
14
15
I'm trying to fill the blanks with the next valid value using df.fillna(method='backfill'). This works, but then I need to add each previous valid value to the next valid value, working from the bottom up, like this:
0 Data
1 28
2 28
3 28
4 28
5 23
6 23
7 23
8 23
9 2
10 2
11
12
13
14
15
I can get this to work by looping over it, but is there a method within pandas that can do this?
Thanks a lot!
You could reverse the df, then fillna(0) and then cumsum and reverse again:
In [12]:
df = df[::-1].fillna(0).cumsum()[::-1]
df
Out[12]:
Data
0 28.0
1 28.0
2 28.0
3 28.0
4 23.0
5 23.0
6 23.0
7 23.0
8 2.0
9 2.0
10 0.0
11 0.0
12 0.0
13 0.0
14 0.0
Here we use slicing notation to reverse the df, replace all NaN with 0, perform the cumsum, and then reverse back.
Another way to do it without reversing: df.sum() - df.fillna(0).cumsum() + df.fillna(0) (the grand total minus the running total of the rows above each row).