I would like to subtract a fixed row's value from the other rows, with the reference row determined by their values in another column.
My data looks like this:
TRACK TIME POSITION_X
0 1 0 12
1 1 30 13
2 1 60 15
3 1 90 11
4 2 0 10
5 2 20 11
6 2 60 13
7 2 90 17
I would like to subtract the POSITION_X value at TIME=0 from each row, within each TRACK group, and create a new column ("NEW_POSX") with those values. The output should be like this:
TRACK TIME POSITION_X NEW_POSX
0 1 0 12 0
1 1 30 13 1
2 1 60 15 3
3 1 90 11 -1
4 2 0 10 0
5 2 20 11 1
6 2 60 13 3
7 2 90 17 7
I have been using the following code to get this done:
import pandas as pd
data = {'TRACK': [1, 1, 1, 1, 2, 2, 2, 2],
        'TIME': [0, 30, 60, 90, 0, 20, 60, 90],
        'POSITION_X': [12, 13, 15, 11, 10, 11, 13, 17],
        }
df = pd.DataFrame(data, columns=['TRACK', 'TIME', 'POSITION_X'])
df['NEW_POSX'] = df.groupby('TRACK')['POSITION_X'].diff().fillna(0).astype(int)
df.head(8)
... but I don't get the desired output. Instead, I get a new column where each row has the previous row's value subtracted from it (within each "TRACK" group):
TRACK TIME POSITION_X NEW_POSX
0 1 0 12 0
1 1 30 13 1
2 1 60 15 2
3 1 90 11 -4
4 2 0 10 0
5 2 20 11 1
6 2 60 13 2
7 2 90 17 4
Can anyone help me with this?
You can use transform with 'first' to get the value at TIME=0, and then subtract it from the 'POSITION_X' column:
s = df.groupby('TRACK')['POSITION_X'].transform('first')
df['NEW_POSX'] = df['POSITION_X'] - s

# Same as:
# df['NEW_POSX'] = df['POSITION_X'].sub(s)
Output:
df
TRACK TIME POSITION_X NEW_POSX
0 1 0 12 0
1 1 30 13 1
2 1 60 15 3
3 1 90 11 -1
4 2 0 10 0
5 2 20 11 1
6 2 60 13 3
7 2 90 17 7
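Note that transform('first') assumes the TIME=0 row is the first row of each TRACK. If that ordering is not guaranteed, a minimal sketch that picks the baseline explicitly (assuming exactly one TIME=0 row per TRACK) would be:
baseline = df.loc[df['TIME'].eq(0)].set_index('TRACK')['POSITION_X']
df['NEW_POSX'] = df['POSITION_X'] - df['TRACK'].map(baseline)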
I have a dataset consisting of measurements, and my dataframe looks like this:
ID VAL BS ERROR
0 0 0 0
1 1 0 1
2 1 0 1
3 0 0 0
4 11 10 1
5 10 10 0
6 12 10 2
7 11 10 1
8 9 10 -1
9 30 30 0
10 31 30 1
11 29 30 -1
12 10 10 0
13 9 10 -1
14 8 10 -2
15 11 10 1
16 0 0 0
17 1 0 1
18 2 0 2
19 9 10 -1
20 10 10 0
Where VAL is the measured value, BS is the base (rounded to the nearest 10), and ERROR is the difference between the measured value and the base.
What I am trying to do is, in effect, group by the 'BS' column, but only for neighboring rows.
So the resulting dataset will look like this (I also want to calculate the aggregate min and max error for each group, but I guess that will not be a problem).
It is important to keep the order of the groups for this case.
ID BS MIN MAX
0 0 0 1
1 10 -1 2
2 30 -1 1
3 10 -2 1
4 0 0 2
5 10 -1 0
You can find the consecutive groups like this:
df['GROUP'] = (df['BS']!=df['BS'].shift()).cumsum()
Then you group by the GROUP column and aggregate min and max:
df.groupby(['GROUP', 'BS'])['ERROR'].agg(['min', 'max']).reset_index()
The output should be:
GROUP BS min max
0 1 0 0 1
1 2 10 -1 2
2 3 30 -1 1
3 4 10 -2 1
4 5 0 0 2
5 6 10 -1 0
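To match the layout asked for (a sequential ID and no GROUP column), a small cleanup sketch on top of the aggregation above:
out = (df.groupby(['GROUP', 'BS'], sort=False)['ERROR']
         .agg(['min', 'max'])
         .reset_index()
         .drop(columns='GROUP'))
out.insert(0, 'ID', range(len(out)))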
I am having a hard time understanding what the code below does. I initially thought it was counting the unique appearances of the values in (Weight, Age) and (Weight, Height); however, when I ran this example, I found out it was doing something else.
import pandas as pd

data = [[0,33,15,4],[1,44,12,3],[0,44,12,5],[1,33,15,4],[0,77,13,4],[1,33,15,4],[1,99,40,7],[0,58,45,4],[1,11,13,4]]
df = pd.DataFrame(data, columns=["Lbl","Weight","Age","Height"])
print(df)

def group_fea(df, key, target):
    '''
    Adds columns for feature combinations
    '''
    tmp = df.groupby(key, as_index=False)[target].agg({
        key + target + '_nunique': 'nunique',
    }).reset_index()
    del tmp['index']
    print("****{}****".format(target))
    return tmp

# Add feature combinations
feature_key = ['Weight']
feature_target = ['Age', 'Height']

for key in feature_key:
    for target in feature_target:
        tmp = group_fea(df, key, target)
        df = df.merge(tmp, on=key, how='left')

print(df)
Lbl Weight Age Height
0 0 33 15 4
1 1 44 12 3
2 0 44 12 5
3 1 33 15 4
4 0 77 13 4
5 1 33 15 4
6 1 99 40 7
7 0 58 45 4
8 1 11 13 4
****Age****
****Height****
Lbl Weight Age Height WeightAge_nunique WeightHeight_nunique
0 0 33 15 4 1 1
1 1 44 12 3 1 2
2 0 44 12 5 1 2
3 1 33 15 4 1 1
4 0 77 13 4 1 1
5 1 33 15 4 1 1
6 1 99 40 7 1 1
7 0 58 45 4 1 1
8 1 11 13 4 1 1
I want to understand what the values in WeightAge_nunique and WeightHeight_nunique mean.
The value of WeightAge_nunique on a given row is the number of unique Ages that have the same Weight. The corresponding thing is true of WeightHeight_nunique. E.g., for people of Weight=44, there is only 1 unique age (12), hence WeightAge_nunique=1 on those rows, but there are 2 unique Heights (3 and 5), hence WeightHeight_nunique=2 on those same rows.
You can see that this happens because the grouping function groups by the "key" column (Weight), then performs the "nunique" aggregation function on the "target" column (either Age or Height).
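A quick way to verify this on the example frame (a sketch):
print(df.groupby('Weight')[['Age', 'Height']].nunique())
# Weight 44 has 1 unique Age (12) but 2 unique Heights (3 and 5),
# which is exactly what WeightAge_nunique and WeightHeight_nunique report per row.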
Let us try transform
g = df.groupby('Weight').transform('nunique')
df['WeightAge_nunique'] = g['Age']
df['WeightHeight_nunique'] = g['Height']
df
Out[196]:
Lbl Weight Age Height WeightAge_nunique WeightHeight_nunique
0 0 33 15 4 1 1
1 1 44 12 3 1 2
2 0 44 12 5 1 2
3 1 33 15 4 1 1
4 0 77 13 4 1 1
5 1 33 15 4 1 1
6 1 99 40 7 1 1
7 0 58 45 4 1 1
8 1 11 13 4 1 1
I have a large dataframe in the following format. I need to keep only the rows starting from the first row where values == 1 within each ID, through the remaining rows of that ID. This should reset for each ID, so it takes everything from the first occurrence of 1 in a given ID until that ID ends.
d = {'ID': [1,1,1,1,1,2,2,2,2,2,3,3,3,3,4,4,4,4,4,4,4,4,4,5,5,5,5,5],
     'values': [0,0,0,1,0,1,0,1,1,1,0,1,0,0,0,0,0,0,1,1,0,1,0,1,1,1,1,1]}
df = pd.DataFrame(data=d)
df
ND = {'ID': [1,1,2,2,2,2,2,3,3,3,4,4,4,4,4,5,5,5,5,5],
      'values': [1,0,1,0,1,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1]}
df_final = pd.DataFrame(ND)
df_final
IIUC,
df[df.groupby('ID')['values'].transform('cummax')==1]
Output:
ID values
3 1 1
4 1 0
5 2 1
6 2 0
7 2 1
8 2 1
9 2 1
11 3 1
12 3 0
13 3 0
18 4 1
19 4 1
20 4 0
21 4 1
22 4 0
23 5 1
24 5 1
25 5 1
26 5 1
27 5 1
Details: cummax keeps the value at 1 once the first 1 is found within each ID. Comparing the result to 1 creates a boolean series, which is then used for boolean indexing.
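To see the intermediate step on the example data (a sketch):
cummax = df.groupby('ID')['values'].transform('cummax')
print(pd.concat([df, cummax.rename('cummax')], axis=1).head(6))
# cummax stays 0 until the first 1 within an ID, then stays 1 for the rest of that ID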
If your values column contains only 0 and 1, you can use groupby.cummax, which will replace a 0 with 1 if it comes after a 1 within the same ID, then use the result as a boolean mask:
df_ = df[df.groupby('ID')['values'].cummax().astype(bool).to_numpy()]
print(df_)
ID values
3 1 1
4 1 0
5 2 1
6 2 0
7 2 1
8 2 1
9 2 1
11 3 1
12 3 0
13 3 0
18 4 1
19 4 1
20 4 0
21 4 1
22 4 0
23 5 1
24 5 1
25 5 1
26 5 1
27 5 1
I have a time series in pandas that looks like this (ordered by id):
id time value
1 0 2
1 1 4
1 2 5
1 3 10
1 4 15
1 5 16
1 6 18
1 7 20
2 15 3
2 16 5
2 17 8
2 18 10
4 6 5
4 7 6
I want to downsample the time from 1 minute to 3 minutes for each id group.
The value should be the maximum within each (id, 3-minute) group.
The output should be like:
id time value
1 0 5
1 1 16
1 2 20
2 0 8
2 1 10
4 0 6
I tried a loop, but it takes a long time to process.
Any idea how to solve this for a large dataframe?
Thanks!
You can convert your time series to an actual timedelta, then use resample for a vectorized solution:
t = pd.to_timedelta(df.time, unit='T')
s = df.set_index(t).groupby('id').resample('3T').last().reset_index(drop=True)
s.assign(time=s.groupby('id').cumcount())
id time value
0 1 0 5
1 1 1 16
2 1 2 20
3 2 0 8
4 2 1 10
5 4 0 6
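If value is not guaranteed to increase within each 3-minute window, .last() and the stated "maximum of the group" can differ; a sketch aggregating with max instead:
t = pd.to_timedelta(df.time, unit='T')
s = df.set_index(t).groupby('id')['value'].resample('3T').max().reset_index()
s['time'] = s.groupby('id').cumcount()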
Use np.r_ and .iloc with groupby:
import numpy as np

df.groupby('id')['value'].apply(lambda x: x.iloc[np.r_[2:len(x):3, -1]])
Output:
id
1 2 5
5 16
7 20
2 10 8
11 10
4 13 6
Name: value, dtype: int64
Going a little further with column naming, etc.:
df_out = df.groupby('id')['value']\
.apply(lambda x: x.iloc[np.r_[2:len(x):3,-1]]).reset_index()
df_out.assign(time=df_out.groupby('id').cumcount()).drop('level_1', axis=1)
Output:
id value time
0 1 5 0
1 1 16 1
2 1 20 2
3 2 8 0
4 2 10 1
5 4 6 0
How can I create a dataframe containing all the combinations of a column's values (e.g. Usage) for a specific group (e.g. userid) in a dataframe, using pandas in Python?
For example:
If this is the dataframe I have,
user-id serial-number value day
1 2 10 1
1 2 20 2
1 2 30 3
1 2 40 4
1 2 50 5
1 2 60 6
1 2 70 7
1 2 80 8
1 2 90 9
1 2 100 10
1 2 200 11
1 2 300 12
1 2 400 13
2 3 11 1
2 3 12 2
2 3 13 3
2 3 14 4
2 3 15 5
2 3 16 6
2 3 17 7
2 3 18 8
I need the resultant dataframe to be:
(combinations for the first value in the "value" column)
user-id serial-number value value1 day
1 2 10 10 1
1 2 10 20 1
1 2 10 30 1
1 2 10 40 1
1 2 10 50 1
1 2 10 60 1
1 2 10 70 1
1 2 10 80 1
1 2 10 90 1
1 2 10 100 1
1 2 10 200 1
1 2 10 300 1
1 2 10 400 1
.
.
.
2 3 11 11 1
2 3 11 12 1
2 3 11 13 1
2 3 11 14 1
2 3 11 15 1
2 3 11 16 1
2 3 11 17 1
2 3 11 18 1
Similarly, I want to do it for all the values in the "value" column.
Any suggestions?
from itertools import product

def elongate(s, k):
    # Repeat the Series k times so its length matches the number of product pairs
    return pd.concat([s] * k, ignore_index=True)

subset = pd.DataFrame()
for _, i in test1.groupby('user-id'):   # test1 is the input dataframe
    vals = i['value']
    val, val1 = zip(*product(vals, vals))
    vals_len = len(vals)
    res = pd.DataFrame({'user-id': i['user-id'].pipe(elongate, vals_len),
                        'serial-number': i['serial-number'].pipe(elongate, vals_len),
                        'day': i['day'].pipe(elongate, vals_len),
                        'value': val,
                        'value1': val1})
    subset = pd.concat([subset, res], ignore_index=True)

print(subset)
This worked for me perfectly.
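For reference, a self-merge on the group key produces the same within-group cross join in one step and keeps day aligned with the left-hand value (a sketch, assuming the input frame is named test1 as above):
pairs = test1.merge(test1[['user-id', 'value']].rename(columns={'value': 'value1'}),
                    on='user-id')
print(pairs)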