How to compare with the previous line after reassignment [closed] - python

Compare each row of column A with the previous row:
If it is greater, reassign it to the previous row's value.
If it is less than or equal, leave the value unchanged.
The problem is that each comparison is currently made against the original value.
What I want is to compare against the previous row after reassignment.
import pandas as pd
import numpy as np
d={'A':[16,19,18,15,13,16]}
df = pd.DataFrame(d)
df['A_changed']=np.where(df.A>df.A.shift(),df.A.shift(),df.A)
df
A A_changed
0 16 16.0
1 19 16.0
2 18 18.0
3 15 15.0
4 13 13.0
5 16 13.0
expected output
A A_changed
0 16 16.0
1 19 16.0
2 18 16.0
3 15 15.0
4 13 13.0
5 16 13.0

Are you trying to do cummin?
df['compare_min'] = df['A'].cummin()
Output (shown here on the answerer's own sample data, where A is [5, 14, 12, 15, 13, 16] and compare is the shift-based column from the question's approach):
A compare compare_min
0 5 5.0 5
1 14 5.0 5
2 12 12.0 5
3 15 12.0 5
4 13 13.0 5
5 16 13.0 5
df['b'] = [10, 11, 12, 5, 8, 2]
df['compare_min_b'] = df['b'].cummin()
Output:
A compare compare_min b compare_min_b
0 5 5.0 5 10 10
1 14 5.0 5 11 10
2 12 12.0 5 12 10
3 15 12.0 5 5 5
4 13 13.0 5 8 5
5 16 13.0 5 2 2
Update: using your example, this is exactly what cummin does:
d={'A':[16,19,18,15,13,16]}
df = pd.DataFrame(d)
df['A_change'] = df['A'].cummin()
df
Output:
A A_changed A_change
0 16 16.0 16
1 19 16.0 16
2 18 18.0 16
3 15 15.0 15
4 13 13.0 13
5 16 13.0 13
Here is why your code will not work:
d={'A':[16,19,18,15,13,16]}
df = pd.DataFrame(d)
df['A_shift'] = df['A'].shift()
df
Output:
A A_shift
0 16 NaN
1 19 16.0
2 18 19.0
3 15 18.0
4 13 15.0
5 16 13.0
Look at the output of the shifted column: what you want is to keep the cumulative min, not just compare A to shifted A. That is why index 2 is not giving you what you expected.
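For comparison, the sequential reassignment the question describes can be written as an explicit loop, and it produces exactly the running minimum — a minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'A': [16, 19, 18, 15, 13, 16]})

# Walk the column top to bottom, carrying the last accepted value:
# if the current value exceeds it, reuse the carried value instead.
changed = []
prev = None
for a in df['A']:
    cur = a if prev is None or a <= prev else prev
    changed.append(cur)
    prev = cur
df['A_changed'] = changed

# The loop is equivalent to the running minimum:
assert df['A_changed'].tolist() == df['A'].cummin().tolist()
```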

Related

Is it possible to use pandas.DataFrame.rolling with a window of 5 while skipping today's value?

I need to get output in the 5_Days_Up column like this:
Date price 5_Days_Up
20-May-21 1
21-May-21 2
22-May-21 4
23-May-21 5
24-May-21 6 5
25-May-21 7 6
26-May-21 8 7
27-May-21 9 8
28-May-21 10 9
29-May-21 11 10
30-May-21 12 11
31-May-21 13 12
1-Jun-21 14 13
2-Jun-21 15 14
But I got the output like this:
Date price 5_Days_Up
20-May-21 1
21-May-21 2
22-May-21 4
23-May-21 5
24-May-21 6 6
25-May-21 7 7
26-May-21 8 8
27-May-21 9 9
28-May-21 10 10
29-May-21 11 11
30-May-21 12 12
31-May-21 13 13
1-Jun-21 14 14
2-Jun-21 15 15
Here, in pandas, I am using
df['5_Days_Up'] = df['price'].rolling(window=5).max()
Is there a way to get the maximum value of the last 5 periods, skipping today's price, using the same rolling() or any other method?
Your data has only 4 (instead of 5) previous entries before the 24-May-21 row with price 6 (because there is no row with price 3 in the sample). Therefore, the first non-NaN value appears on 25-May-21, where price is 7.
To include up to the previous entry (and exclude the current one), you can use the parameter closed='left':
df['5_Days_Up'] = df['price'].rolling(window=5, closed='left').max()
Result:
Date price 5_Days_Up
0 20-May-21 1 NaN
1 21-May-21 2 NaN
2 22-May-21 4 NaN
3 23-May-21 5 NaN
4 24-May-21 6 NaN
5 25-May-21 7 6.0
6 26-May-21 8 7.0
7 27-May-21 9 8.0
8 28-May-21 10 9.0
9 29-May-21 11 10.0
10 30-May-21 12 11.0
11 31-May-21 13 12.0
12 1-Jun-21 14 13.0
13 2-Jun-21 15 14.0
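If closed='left' is not supported for integer windows (it requires a reasonably recent pandas), shifting the series one position before rolling gives the same effect — a sketch on prices like the question's:

```python
import pandas as pd

prices = pd.Series([1, 2, 4, 5, 6, 7, 8, 9, 10])

# Shifting first excludes today's value, so a plain 5-period
# rolling max covers the 5 rows ending at yesterday.
prev_max = prices.shift().rolling(window=5).max()
```

The leading NaN introduced by shift() keeps the first five results NaN, matching the closed='left' output above.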

Sum values in specific columns in DataFrame and ignore None

I want to calculate the distance that my cars have driven. I have all the coordinates that the cars need to go to. Some cars park earlier than others, and this messes up my calculation.
I have this:
cars= pd.DataFrame({'x': [3,3,3,3,3,3,3,3,3],
'y': [1,2,3,4,5,6,7,8,9],
'x_goal_1': [3,3,3,3,3,3,3,3,3],
'y_goal_1': [10,10,10,10,10,10,10,10,10],
'x_goal_2': [17,24,31,31,17,17,38,38,31],
'y_goal_2': [10,10,10,10,10,10,10,10,10],
'x_goal_3': [17,24,31,31,17,17,38,38,31],
'y_goal_3': [17, 3, 3, 3, 17, 17, 17, 17, 3],
'x_goal_4': [None,27,35,28,14,18,42,43,None],
'y_goal_4': [None, 3, 3, 3, 17, 17, 17, 17, None],
'z': [3,4,5,6,7,8,9,12,22]})
cars['moved_tot'] = (
abs(cars['x']-cars['x_goal_1']) + abs(cars['y']-cars['y_goal_1']) +
abs(cars['x_goal_1']-cars['x_goal_2']) + abs(cars['y_goal_1']-cars['y_goal_2']) +
abs(cars['x_goal_2']-cars['x_goal_3']) + abs(cars['y_goal_2']-cars['y_goal_3']) +
abs(cars['x_goal_3']-cars['x_goal_4']) + abs(cars['y_goal_3']-cars['y_goal_4']) )
I then get:
x y x_goal_1 y_goal_1 ... x_goal_4 y_goal_4 z moved_tot
0 3 1 3 10 ... NaN NaN 3 NaN
1 3 2 3 10 ... 27.0 3.0 4 39.0
2 3 3 3 10 ... 35.0 3.0 5 46.0
3 3 4 3 10 ... 28.0 3.0 6 44.0
4 3 5 3 10 ... 14.0 17.0 7 29.0
5 3 6 3 10 ... 18.0 17.0 8 26.0
6 3 7 3 10 ... 42.0 17.0 9 49.0
7 3 8 3 10 ... 43.0 17.0 12 49.0
8 3 9 3 10 ... NaN NaN 22 NaN
For the first row I want moved_tot to be 30, and for the last I want 36. I want the calculation to ignore a value if it is None (that is, if that car parked earlier). How do I do this?
With help from David S (thank you!) I figured out how to do it.
cars['moved_tot'] = (
abs(cars['x']-cars['x_goal_1']).fillna(0) + abs(cars['y']-cars['y_goal_1']).fillna(0) +
abs(cars['x_goal_1']-cars['x_goal_2']).fillna(0) + abs(cars['y_goal_1']-cars['y_goal_2']).fillna(0) +
abs(cars['x_goal_2']-cars['x_goal_3']).fillna(0) + abs(cars['y_goal_2']-cars['y_goal_3']).fillna(0) +
abs(cars['x_goal_3']-cars['x_goal_4']).fillna(0) + abs(cars['y_goal_3']-cars['y_goal_4']).fillna(0)
)
You could just replace the NaN inplace with 0 to avoid getting NaN in the result column, like:
cars['moved_tot'] = (abs(cars['x']-cars['x_goal_1'].fillna(0)) + abs(cars['y']-cars['y_goal_1'].fillna(0)) +
abs(cars['x_goal_1'].fillna(0)-cars['x_goal_2'].fillna(0)) + abs(cars['y_goal_1'].fillna(0)-cars['y_goal_2'].fillna(0)) +
abs(cars['x_goal_2'].fillna(0)-cars['x_goal_3'].fillna(0)) + abs(cars['y_goal_2'].fillna(0)-cars['y_goal_3'].fillna(0)) +
abs(cars['x_goal_3'].fillna(0)-cars['x_goal_4'].fillna(0)) + abs(cars['y_goal_3'].fillna(0)-cars['y_goal_4'].fillna(0)) )
If you want a NaN term to contribute 0 to the result, just move the .fillna(0) outside the abs().
Just use cars = cars.fillna(0) or cars.fillna(0, inplace=True) to get rid of repeating .fillna(0) after each abs.
If you don't want to change the original dataframe, use cars_ = cars.fillna(0) then replace cars in cars['moved_tot'] to cars_.
Besides, you could make use of the feature of columns to get rid of writing repeating column names.
cars_ = cars.fillna(0)
cars_['moved_tot'] = 0
for i in range(len(cars.columns) - 3):
    print(cars_.columns[i], '-', cars_.columns[i+2])
    cars_['moved_tot'] += abs(cars_[cars_.columns[i]] - cars_[cars_.columns[i+2]])
# Output and print(cars_)
x - x_goal_1
y - y_goal_1
x_goal_1 - x_goal_2
y_goal_1 - y_goal_2
x_goal_2 - x_goal_3
y_goal_2 - y_goal_3
x_goal_3 - x_goal_4
y_goal_3 - y_goal_4
x y x_goal_1 y_goal_1 x_goal_2 y_goal_2 x_goal_3 y_goal_3 x_goal_4 y_goal_4 z moved_tot
0 3 1 3 10 17 10 17 17 0.0 0.0 3 64.0
1 3 2 3 10 24 10 24 3 27.0 3.0 4 39.0
2 3 3 3 10 31 10 31 3 35.0 3.0 5 46.0
3 3 4 3 10 31 10 31 3 28.0 3.0 6 44.0
4 3 5 3 10 17 10 17 17 14.0 17.0 7 29.0
5 3 6 3 10 17 10 17 17 18.0 17.0 8 26.0
6 3 7 3 10 38 10 38 17 42.0 17.0 9 49.0
7 3 8 3 10 38 10 38 17 43.0 17.0 12 49.0
8 3 9 3 10 31 10 31 3 0.0 0.0 22 70.0
You could even do the sum in one line
cars_['moved_tot'] = pd.concat([abs(cars_[cars_.columns[i]] - cars_[cars_.columns[i+2]]) for i in range(len(cars.columns) - 3)], axis=1).sum(1)
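The concat approach also makes the fillna unnecessary: summing the per-leg differences with the default skipna=True ignores the NaN legs of parked cars. A minimal sketch on three rows of the question's data (this relies on the goal columns being laid out in visiting order):

```python
import pandas as pd

# Three rows taken from the question's data (rows 0, 1, and 8).
cars = pd.DataFrame({'x': [3, 3, 3], 'y': [1, 2, 9],
                     'x_goal_1': [3, 3, 3], 'y_goal_1': [10, 10, 10],
                     'x_goal_2': [17, 24, 31], 'y_goal_2': [10, 10, 10],
                     'x_goal_3': [17, 24, 31], 'y_goal_3': [17, 3, 3],
                     'x_goal_4': [None, 27, None], 'y_goal_4': [None, 3, None]})

# Pair each coordinate column with the one two positions later
# (x with x_goal_1, y with y_goal_1, ...), take absolute differences
# per leg, and let sum() skip the NaN legs of parked cars.
cols = list(cars.columns)
legs = [abs(cars[cols[i]] - cars[cols[i + 2]]) for i in range(len(cols) - 2)]
cars['moved_tot'] = pd.concat(legs, axis=1).sum(axis=1)
```

This yields 30 for the first row and 36 for the last, exactly as the question asked.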

How to perform a rolling average for irregular time intervals in pandas?

I have a pandas dataframe of the form df :
timeCol dataCol
2 5
9.135 8
11 4
12 6
I want to do a rolling mean over a 3 second interval over dataCol such that it returns a dataframe of the form, new_df :
startTime endTime meanCol
0 3 5.0
1 4 5.0
2 5 5.0
3 6 0.0
4 7 0.0
5 8 0.0
6 9 0.0
7 10 8.0
8 11 6.0
9 12 6.0
10 13 5.0
11 14 5.0
12 15 6.0
Notice in new_df, for example, that for the time ranges (8-11) and (9-12) the value 6.0 is returned (because mean(8,4)=6.0 and mean(8,4,6)=6.0 respectively). All columns are float type. timeCol will always be ordered. What is an efficient, pythonic way of achieving this?
I am using numpy broadcasting (here ori is the original dataframe from the question):
df = pd.DataFrame({'startTime': np.arange(13), 'endTime': np.arange(13) + 3})
s = ori.timeCol.values[:, None]
s1 = (df.startTime.values - s <= 0) & (df.endTime.values - s >= 0)
df['New'] = ori.dataCol.dot(s1) / s1.sum(axis=0)
df
startTime endTime New
0 0 3 5.0
1 1 4 5.0
2 2 5 5.0
3 3 6 NaN
4 4 7 NaN
5 5 8 NaN
6 6 9 NaN
7 7 10 8.0
8 8 11 6.0
9 9 12 6.0
10 10 13 5.0
11 11 14 5.0
12 12 15 6.0
Here's one way to do it:
import pandas as pd
# Source data
data = {
'timeCol': [2, 9.135, 11, 12],
'dataCol': [5, 8, 4, 6]
}
df = pd.DataFrame(data=data)
# Build list of rows based on time series
rows = []
for startTime in range(13):
    endTime = startTime + 3
    print(startTime, ' to ', endTime)
    # Get only rows from source data that match current time interval
    filtered = df.loc[(df['timeCol'] >= startTime) &
                      (df['timeCol'] <= endTime)]
    # Append current row, including mean of matching source rows
    rows.append([startTime, endTime, filtered['dataCol'].mean()])
# Create final dataframe, replacing any missing values with 0
res = pd.DataFrame(data=rows, columns=['startTime', 'endTime', 'meanCol']).fillna(0)
print(res)
You could also build the result set first, then loop through it and calculate the average for each row in that.
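Since timeCol is sorted, another option is binary search: np.searchsorted finds, for each window, the slice of timestamps it covers, and the means come from those slices. A sketch on the question's data (variable names are illustrative):

```python
import numpy as np

times = np.array([2.0, 9.135, 11.0, 12.0])  # sorted timeCol
data = np.array([5.0, 8.0, 4.0, 6.0])       # dataCol
starts = np.arange(13)
ends = starts + 3

# For each window [start, end], locate the slice of times inside it
# with binary search, then average the matching slice of data
# (0.0 for empty windows, as in the expected output).
lo = np.searchsorted(times, starts, side='left')
hi = np.searchsorted(times, ends, side='right')
means = [data[a:b].mean() if b > a else 0.0 for a, b in zip(lo, hi)]
```

This avoids materializing the full (rows x windows) boolean matrix that the broadcasting answer builds, which matters when both are large.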

Call a Nan Value and change to a number in python

I have a DataFrame, say df, which looks like this:
id property_type1 property_type pro
1 Condominium 2 2
2 Farm 14 14
3 House 7 7
4 Lots/Land 15 15
5 Mobile/Manufactured Home 13 13
6 Multi-Family 8 8
7 Townhouse 11 11
8 Single Family 10 10
9 Apt/Condo 1 1
10 Home 7 7
11 NaN 29 NaN
Now, I need the pro column to have the same value as the property_type column, whenever the property_type1 column has a NaN value. This is how it should be:
id property_type1 property_type pro
1 Condominium 2 2
2 Farm 14 14
3 House 7 7
4 Lots/Land 15 15
5 Mobile/Manufactured Home 13 13
6 Multi-Family 8 8
7 Townhouse 11 11
8 Single Family 10 10
9 Apt/Condo 1 1
10 Home 7 7
11 NaN 29 29
That is, in line 11, where property_type1 is NaN, the value of the pro column becomes 29, which is the value of property_type. How can I do this?
ix is deprecated, don't use it.
Option 1
I'd do this with np.where -
df = df.assign(pro=np.where(df.pro.isnull(), df.property_type, df.pro))
df
id property_type1 property_type pro
0 1 Condominium 2 2.0
1 2 Farm 14 14.0
2 3 House 7 7.0
3 4 Lots/Land 15 15.0
4 5 Mobile/Manufactured Home 13 13.0
5 6 Multi-Family 8 8.0
6 7 Townhouse 11 11.0
7 8 Single Family 10 10.0
8 9 Apt/Condo 1 1.0
9 10 Home 7 7.0
10 11 NaN 29 29.0
Option 2
If you want to perform in-place assignment, use loc -
m = df.pro.isnull()
df.loc[m, 'pro'] = df.loc[m, 'property_type']
df
id property_type1 property_type pro
0 1 Condominium 2 2.0
1 2 Farm 14 14.0
2 3 House 7 7.0
3 4 Lots/Land 15 15.0
4 5 Mobile/Manufactured Home 13 13.0
5 6 Multi-Family 8 8.0
6 7 Townhouse 11 11.0
7 8 Single Family 10 10.0
8 9 Apt/Condo 1 1.0
9 10 Home 7 7.0
10 11 NaN 29 29.0
Compute the mask just once, and use it to index multiple times, which should be more efficient than computing it twice.
Find the rows where the property_type1 column is NaN, and for those rows assign the property_type values to the pro column:
df.loc[df.property_type1.isnull(), 'pro'] = df.loc[df.property_type1.isnull(), 'property_type']
(The original answer used df.ix, which is deprecated and has since been removed from pandas; .loc is the replacement.)
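In this data pro is NaN exactly on the rows where property_type1 is, so a shorter equivalent is Series.fillna with the other column — a sketch on a trimmed-down frame:

```python
import numpy as np
import pandas as pd

# Trimmed-down frame; the last row mirrors the question's row 11.
df = pd.DataFrame({'property_type1': ['Condominium', 'Farm', np.nan],
                   'property_type': [2, 14, 29],
                   'pro': [2.0, 14.0, np.nan]})

# fillna with a Series fills each NaN in pro from the matching
# index of property_type.
df['pro'] = df['pro'].fillna(df['property_type'])
```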

DataFrame: fillna() with running sum of valid values

I'm working a Pandas Dataframe, that looks like this:
0 Data
1
2
3
4 5
5
6
7
8 21
9
10 2
11
12
13
14
15
I'm trying to fill the blanks with the next valid values via df.fillna(method='backfill') (df.bfill() in recent pandas). This works, but then I need to add the previous valid value to the next valid value, from the bottom up, such as:
0 Data
1 28
2 28
3 28
4 28
5 23
6 23
7 23
8 23
9 2
10 2
11
12
13
14
15
I can get this to work by looping over it, but is there a method within pandas that can do this?
Thanks a lot!
You could reverse the df, then fillna(0) and then cumsum and reverse again:
In [12]:
df = df[::-1].fillna(0).cumsum()[::-1]
df
Out[12]:
Data
0 28.0
1 28.0
2 28.0
3 28.0
4 23.0
5 23.0
6 23.0
7 23.0
8 2.0
9 2.0
10 0.0
11 0.0
12 0.0
13 0.0
14 0.0
Here we use slicing notation to reverse the df, then replace all NaN with 0, perform the cumsum, and reverse back.
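A minimal, self-contained sketch of the reverse-fillna-cumsum trick on a small series:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 5, np.nan, np.nan, 21, np.nan, 2, np.nan])

# Reverse, treat NaN as 0, accumulate, reverse back: each row now
# holds the sum of all valid values at or below it.
filled = s[::-1].fillna(0).cumsum()[::-1]
```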
Another simple way to do that: df.sum() - df.fillna(0).cumsum().shift(fill_value=0) (the shift is needed so each row's own value stays in its total).
