I have a pandas DataFrame that looks like this:
time value group
0 1 12 1
1 2 14 1
2 3 15 2
3 4 15 1
4 5 18 2
5 6 20 1
6 7 19 2
7 8 24 2
I now want to calculate the spread between group 1 and group 2 based on the latest values. That is, for each row I want to take the latest value seen so far for group 1 and for group 2 and compute (value of group 1) - (value of group 2).
For the example above, the output should look like this:
time value group diff
0 1 12 1 0
1 2 14 1 0
2 3 15 2 -1
3 4 15 1 0
4 5 18 2 -3
5 6 20 1 2
6 7 19 2 1
7 8 24 2 -4
The only function I could find so far is DataFrame.diff(), but it doesn't do what I need, so I would really appreciate some help here. Thanks!
You can forward-fill the values for group 1 and group 2 separately and then take the difference:
df['diff'] = df.value.where(df.group == 1).ffill() - df.value.where(df.group == 2).ffill()
df
time value group diff
0 1 12 1 NaN
1 2 14 1 NaN
2 3 15 2 -1.0
3 4 15 1 0.0
4 5 18 2 -3.0
5 6 20 1 2.0
6 7 19 2 1.0
7 8 24 2 -4.0
If you need to replace the leading NaN values with 0, use fillna: df['diff'] = df['diff'].fillna(0).
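For completeness, a self-contained sketch of the whole approach, using the sample data from the question:

import pandas as pd

df = pd.DataFrame({
    'time': [1, 2, 3, 4, 5, 6, 7, 8],
    'value': [12, 14, 15, 15, 18, 20, 19, 24],
    'group': [1, 1, 2, 1, 2, 1, 2, 2],
})

# where() keeps only the values of one group (NaN elsewhere); ffill()
# then carries the latest seen value of that group forward to every row
latest_g1 = df['value'].where(df['group'] == 1).ffill()
latest_g2 = df['value'].where(df['group'] == 2).ffill()
df['diff'] = (latest_g1 - latest_g2).fillna(0)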
Good morning,
I have a problem when trying to replace some values. I have a DataFrame with a column "loc10p" that separates the records into 10 groups, and within each group I have split the records into smaller subgroups. However, the subgroup numbering restarts at 1 in every group instead of continuing from the last subgroup number of the previous group. For example:
c2[c2.loc10p.isin([1,2])].sort_values(['loc10p','subgrupoloc10'])[['loc10p','subgrupoloc10']]
loc10p subgrupoloc10
1 1 1
7 1 1
15 1 1
0 1 2
14 1 2
30 1 2
31 1 2
2 2 1
8 2 1
9 2 1
16 2 1
17 2 1
18 2 2
23 2 2
How can I transform that into something like the following:
loc10p subgrupoloc10
1 1 1
7 1 1
15 1 1
0 1 2
14 1 2
30 1 2
31 1 2
2 2 3
8 2 3
9 2 3
16 2 3
17 2 3
18 2 4
23 2 4
I tried a loop that separates each group into its own DataFrame and then replaces the subgroup values with a running counter, but it didn't replace anything:
w = 1
temporal = []
for e in range(1, 11):
    temp = c2[c2['loc10p'] == e]
    temporal.append(temp)
for e, i in zip(temporal, range(1, 9)):
    try:
        e.loc[:, 'subgrupoloc10'] = w
        w += 1
    except:
        pass
Any help will be really appreciated!!
Try with ngroup:
df['out'] = df.groupby(['loc10p','subgrupoloc10']).ngroup()+1
Out[204]:
1 1
7 1
15 1
0 2
14 2
30 2
31 2
2 3
8 3
9 3
16 3
17 3
18 4
23 4
dtype: int64
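One note on the numbering: with the default groupby(..., sort=True), ngroup numbers the (loc10p, subgrupoloc10) pairs in sorted key order, which is exactly what produces the consecutive 1, 2, 3, 4 here. If you want to overwrite the original column instead of adding a new one, a sketch:

# renumber the subgroups consecutively across all loc10p groups, starting at 1
c2['subgrupoloc10'] = c2.groupby(['loc10p', 'subgrupoloc10']).ngroup() + 1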
Try:
groups = (df["subgrupoloc10"] != df["subgrupoloc10"].shift()).cumsum()
df["subgrupoloc10"] = groups
print(df)
Prints:
loc10p subgrupoloc10
1 1 1
7 1 1
15 1 1
0 1 2
14 1 2
30 1 2
31 1 2
2 2 3
8 2 3
9 2 3
16 2 3
17 2 3
18 2 4
23 2 4
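This is the usual shift-and-compare trick: testing each value against its predecessor marks the first row of every run with True, and the cumulative sum of those booleans assigns a running id to each run. A sketch of the intermediate steps, assuming the data from the question:

starts = df["subgrupoloc10"] != df["subgrupoloc10"].shift()
# starts is True on the first row of each run, False elsewhere
print(starts.cumsum().tolist())
# [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4]

Note that this relies on the subgroup label changing at every boundary; if a new loc10p group could begin with the same subgrupoloc10 value as the row above it, you would want to include loc10p in the comparison as well.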
I have a large DataFrame in the following format. I need to keep only the rows starting from the first occurrence of values == 1 within each ID, through the rest of that ID. This should reset for each ID, so the slice begins at the first row of an ID where values is 1 and ends where that ID's rows end.
d = {'ID': [1,1,1,1,1,2,2,2,2,2,3,3,3,3,4,4,4,4,4,4,4,4,4,5,5,5,5,5],
     'values': [0,0,0,1,0,1,0,1,1,1,0,1,0,0,0,0,0,0,1,1,0,1,0,1,1,1,1,1]}
df = pd.DataFrame(data=d)
df
The expected output is:
ND = {'ID':[1,1,2,2,2,2,2,3,3,3,4,4,4,4,4,5,5,5,5,5],\
'values':[1,0,1,0,1,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1]}
df_final=pd.DataFrame(ND)
df_final
IIUC,
df[df.groupby('ID')['values'].transform('cummax')==1]
Output:
ID values
3 1 1
4 1 0
5 2 1
6 2 0
7 2 1
8 2 1
9 2 1
11 3 1
12 3 0
13 3 0
18 4 1
19 4 1
20 4 0
21 4 1
22 4 0
23 5 1
24 5 1
25 5 1
26 5 1
27 5 1
Details: cummax keeps the value at 1 for every row after the first 1 is found within each ID. Comparing the result to 1 creates a boolean series, which is then used for boolean indexing.
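To make that intermediate step visible, here is a small sketch (assuming the df built from the question's dictionary) that surfaces the cumulative max as its own column:

# flag is 0 until the first 1 within each ID, then stays 1 for the rest of that ID
flag = df.groupby('ID')['values'].transform('cummax')
print(df.assign(flag=flag).head(8))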
If your values column contains only 0 and 1, you can use groupby.cummax, which will replace each 0 that comes after a 1 (per ID) with 1, and then use the result as a boolean mask:
df_ = df[df.groupby('ID')['values'].cummax().astype(bool).to_numpy()]
print(df_)
ID values
3 1 1
4 1 0
5 2 1
6 2 0
7 2 1
8 2 1
9 2 1
11 3 1
12 3 0
13 3 0
18 4 1
19 4 1
20 4 0
21 4 1
22 4 0
23 5 1
24 5 1
25 5 1
26 5 1
27 5 1
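Both answers build the same mask and differ only in how the per-ID cumulative max is turned into booleans (comparing == 1 versus casting with .astype(bool)). A quick equivalence check, as a sketch:

m1 = df.groupby('ID')['values'].transform('cummax') == 1
m2 = df.groupby('ID')['values'].cummax().astype(bool)
assert m1.equals(m2)  # identical boolean masks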
Say I have a data frame that looks like this:
Id ColA
1 2
2 2
3 3
4 5
5 10
6 12
7 18
8 20
9 25
10 26
I would like my code to create a new column at the end of the DataFrame that splits the observations into 5 equal-sized segments, labeled from 5 down to 1.
Id ColA Segment
1 2 5
2 2 5
3 3 4
4 5 4
5 10 3
6 12 3
7 18 2
8 20 2
9 25 1
10 26 1
I tried the following code, but it doesn't work:
df['segment'] = pd.qcut(df['Id'],5)
I also want to know what would happen if the total number of observations were not divisible by 5.
Actually, you were closer to the answer than you think. This will work regardless of whether len(df) is a multiple of 5 or not.
bins = 5
df['Segment'] = bins - pd.qcut(df['Id'], bins).cat.codes
df
Id ColA Segment
0 1 2 5
1 2 2 5
2 3 3 4
3 4 5 4
4 5 10 3
5 6 12 3
6 7 18 2
7 8 20 2
8 9 25 1
9 10 26 1
Where,
pd.qcut(df['Id'], bins).cat.codes
0    0
1    0
2    1
3    1
4    2
5    2
6    3
7    3
8    4
9    4
dtype: int8
These codes represent the categorical intervals returned by pd.qcut as integers; subtracting them from bins reverses the order and shifts the labels into the range 5 down to 1.
Another example, for a DataFrame with 7 rows.
df = df.head(7).copy()
df['Segment'] = bins - pd.qcut(df['Id'], bins).cat.codes
df
Id ColA Segment
0 1 2 5
1 2 2 5
2 3 3 4
3 4 5 3
4 5 10 2
5 6 12 1
6 7 18 1
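As an alternative, pd.qcut can attach the reversed labels directly through its labels parameter, which avoids the subtraction step entirely; a sketch assuming the same df and bins:

# label the quantile bins 5..1 directly instead of post-processing cat.codes
df['Segment'] = pd.qcut(df['Id'], bins, labels=range(bins, 0, -1)).astype(int)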
This should work:
df['segment'] = np.linspace(1, 6, len(df), False, dtype=int)
It creates an array of integers between 1 and 5 with the same length as your DataFrame. If you want the labels to run from 5 down to 1, just add [::-1] at the end of the line.
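Put together as a minimal runnable sketch, assuming numpy is imported as np and df is the 10-row frame from the question:

import numpy as np

# evenly spaced floats in [1, 6) truncated to ints 1..5, then reversed to run 5..1
df['Segment'] = np.linspace(1, 6, len(df), endpoint=False, dtype=int)[::-1]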
I am trying to extract the max value for a column based on the index. I have this series:
Hour Values
1 0
1 3
1 1
2 0
2 5
2 4
...
23 3
23 4
23 2
24 1
24 9
24 2
and am looking to add a new column 'Max Value' that holds the maximum of the 'Values' column for each Hour:
Hour Values Max Value
1 0 3
1 3 3
1 1 3
2 0 5
2 5 5
2 4 5
...
23 3 4
23 4 4
23 2 4
24 1 9
24 9 9
24 2 9
I can do this in Excel, but I am new to pandas. The closest I have come is this scratchy effort, but I get a syntax error on the first '=':
df['Max Value'] = 0
df['Max Value'][(df['Hour'] =1)] = df['Value'].max()
Use the transform('max') method:
In [61]: df['Max Value'] = df.groupby('Hour')['Values'].transform('max')
In [62]: df
Out[62]:
Hour Values Max Value
0 1 0 3
1 1 3 3
2 1 1 3
3 2 0 5
4 2 5 5
5 2 4 5
6 23 3 4
7 23 4 4
8 23 2 4
9 24 1 9
10 24 9 9
11 24 2 9
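The key point is that transform returns a result aligned with the original index, so it can be assigned straight back as a column (a plain groupby('Hour')['Values'].max() would return one row per hour instead). A minimal self-contained sketch with a slice of the sample data:

import pandas as pd

df = pd.DataFrame({'Hour': [1, 1, 1, 2, 2, 2],
                   'Values': [0, 3, 1, 0, 5, 4]})
# broadcast each hour's maximum back onto every row of that hour
df['Max Value'] = df.groupby('Hour')['Values'].transform('max')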