Let's say I have the following data:
import pandas as pd
df=pd.DataFrame({'Days':[1,2,3,4,1,2,3,4],
'Flag':["First","First","First","First","Second","Second","Second","Second"],
'Payments':[1,2,3,4,9,3,1,6]})
I want to create a cumulative sum of Payments, but it has to reset when Flag changes from First to Second. Any help?
The output that I'm looking for is the following:
Not sure if this is what you want since you didn't provide an output, but try this:
df=pd.DataFrame({'Days':[1,2,3,4,1,2,3,4],
'Flag':["First","Second","First","Second","First","Second","Second","First"],
'Payments':[1,2,3,4,9,3,1,6]})
# make groups using consecutive Flags
groups = df.Flag.shift().ne(df.Flag).cumsum()
# groupby the groups and cumulatively sum payments
df['cumsum'] = df.groupby(groups).Payments.cumsum()
df
You can use df['Flag'].ne(df['Flag'].shift()).cumsum() to generate a grouper that will group by changes in the Flag column. Then, group by that, and cumsum:
df['cumsum'] = df['Payments'].groupby(df['Flag'].ne(df['Flag'].shift()).cumsum()).cumsum()
Output:
>>> df
Days Flag Payments cumsum
0 1 First 1 1
1 2 First 2 3
2 3 First 3 6
3 4 First 4 10
4 1 Second 9 9
5 2 Second 3 12
6 3 Second 1 13
7 4 Second 6 19
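To see what the grouper actually contains, here is a minimal sketch on the question's data. It keeps the intermediate steps as separate variables; the names `changed` and `run_id` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Days': [1, 2, 3, 4, 1, 2, 3, 4],
                   'Flag': ["First"] * 4 + ["Second"] * 4,
                   'Payments': [1, 2, 3, 4, 9, 3, 1, 6]})

# True wherever Flag differs from the previous row (row 0 compares against NaN, so it is True too)
changed = df['Flag'].ne(df['Flag'].shift())
# Running count of changes gives one integer label per consecutive run of equal flags
run_id = changed.cumsum()
# The cumulative sum restarts at the first row of every run
df['cumsum'] = df['Payments'].groupby(run_id).cumsum()

print(run_id.tolist())        # [1, 1, 1, 1, 2, 2, 2, 2]
print(df['cumsum'].tolist())  # [1, 3, 6, 10, 9, 12, 13, 19]
```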
What is wrong with simply grouping by Flag?
df['Cumulative Payments'] = df.groupby('Flag')['Payments'].cumsum()
Days Flag Payments Cumulative Payments
0 1 First 1 1
1 2 First 2 3
2 3 First 3 6
3 4 First 4 10
4 1 Second 9 9
5 2 Second 3 12
6 3 Second 1 13
7 4 Second 6 19
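`groupby('Flag')` gives the same result here only because each flag value occupies a single contiguous block. The small sketch below uses hypothetical data in which "First" comes back after "Second", and shows where the two groupings diverge:

```python
import pandas as pd

# Hypothetical data: the "First" flag reappears after a "Second" run
df = pd.DataFrame({'Flag': ["First", "First", "Second", "Second", "First", "First"],
                   'Payments': [1, 2, 3, 4, 5, 6]})

# Grouping on the Flag value: the second "First" run continues the first one
by_value = df.groupby('Flag')['Payments'].cumsum()

# Grouping on consecutive runs: every change of Flag starts a new sum
runs = df['Flag'].ne(df['Flag'].shift()).cumsum()
by_run = df.groupby(runs)['Payments'].cumsum()

print(by_value.tolist())  # [1, 3, 3, 7, 8, 14]
print(by_run.tolist())    # [1, 3, 3, 7, 5, 11]
```

Only the run-based grouper restarts the sum when a flag value reappears.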
I have a column with faulty values. It is supposed to count cycles, but the device the data comes from resets the count after 50, so I was left with, for example, [1,1,1,1,2,2,2,3,3,3,3,...,50,50,50,1,1,1,2,2,2,2,3,3,3,...,50,50,.....,50]
Here is my attempt, which I can't even get to work (for simplicity, I made the data reset after 10 cycles):
import pandas as pd

data = {'Cyc-Count': [1,1,2,2,2,3,4,5,6,7,7,7,8,9,10,1,1,1,2,3,3,3,3,
                      4,4,5,6,6,6,7,8,8,8,8,9,10]}
df = pd.DataFrame(data)
x = 0
count = 0
old_value = df.at[x, 'Cyc-Count']
for x in range(x, len(df) - 1):
    if df.at[x, 'Cyc-Count'] == df.at[x + 1, 'Cyc-Count']:
        old_value = df.at[x + 1, 'Cyc-Count']
        df.at[x + 1, 'Cyc-Count'] = count
    else:
        old_value = df.at[x + 1, 'Cyc-Count']
        count += 1
        df.at[x + 1, 'Cyc-Count'] = count
I need to fix this, preferably without using if statements at all.
The desired output for the example above is:
data = {'Cyc-Count':[1,1,2,2,2,3,4,5,6,7,7,7,8,9,10,11,11,11,12,13,13,13,13,
14,14,15,16,16,16,17,18,18,18,18,19,20]}
Hint: a big issue with my method is that the last value is hard to change, because when comparing it with the value at index+1, that index doesn't even exist.
IIUC, you want to continue the count when the counter decreases.
You can use vectorized code:
s = df['Cyc-Count'].shift()
# fillna(..., downcast='infer') is deprecated in recent pandas, so cast explicitly
df['Cyc-Count2'] = (df['Cyc-Count']
                    + s.where(s.gt(df['Cyc-Count']))
                       .fillna(0).astype(int)
                       .cumsum()
                   )
Or, to modify the column in place:
s = df['Cyc-Count'].shift()
df['Cyc-Count'] += (s.where(s.gt(df['Cyc-Count']))
                     .fillna(0).astype(int).cumsum()
                    )
Output:
Cyc-Count Cyc-Count2
0 1 1
1 1 1
2 1 1
3 1 1
4 2 2
5 2 2
6 2 2
7 3 3
8 3 3
9 3 3
10 3 3
11 4 4
12 5 5
13 5 5
14 5 5
15 1 6
16 1 6
17 1 6
18 2 7
19 2 7
20 2 7
21 2 7
22 3 8
23 3 8
24 3 8
25 4 9
26 5 10
27 5 10
28 1 11
29 2 12
30 2 12
31 3 13
32 4 14
33 5 15
34 5 15
Input used:
l = [1,1,1,1,2,2,2,3,3,3,3,4,5,5,5,1,1,1,2,2,2,2,3,3,3,4,5,5,1,2,2,3,4,5,5]
df = pd.DataFrame({'Cyc-Count': l})
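As a check against the question's own example (counter resetting after 10), here is a minimal sketch that runs the approach above. For comparison, it also runs a simpler alternative that assumes the device always wraps after exactly PERIOD cycles. PERIOD is an assumption, not something the data tells you. The sketch uses fillna(0).astype(int) because the downcast keyword of fillna is deprecated in recent pandas.

```python
import pandas as pd

# Input and desired output copied from the question (the counter resets after 10)
data = {'Cyc-Count': [1,1,2,2,2,3,4,5,6,7,7,7,8,9,10,1,1,1,2,3,3,3,3,
                      4,4,5,6,6,6,7,8,8,8,8,9,10]}
expected = [1,1,2,2,2,3,4,5,6,7,7,7,8,9,10,11,11,11,12,13,13,13,13,
            14,14,15,16,16,16,17,18,18,18,18,19,20]
df = pd.DataFrame(data)
cyc = df['Cyc-Count']

# 1) Approach above: at each reset, add the value reached just before the reset
prev = cyc.shift()
offset = prev.where(prev.gt(cyc)).fillna(0).astype(int).cumsum()
df['general'] = cyc + offset

# 2) Alternative, assuming the device always wraps after exactly PERIOD cycles
PERIOD = 10  # assumption: matches the simplified example in the question
resets = cyc.diff().lt(0).cumsum()  # number of resets seen so far
df['fixed_period'] = cyc + resets * PERIOD

print(df['general'].tolist() == expected)       # True
print(df['fixed_period'].tolist() == expected)  # True
```

The two columns agree here. They diverge if the last logged value before a reset is below PERIOD: the first approach adds whatever value was last seen, while the second always adds PERIOD.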
You can use df.loc to access a group of rows and columns by label(s) or a boolean array.
Syntax: df.loc[df['column name'] condition, 'column name or the new one'] = 'value if condition is met'
For example:
import pandas as pd
numbers = {'set_of_numbers': [1,2,3,4,5,6,7,8,9,10,0,0]}
df = pd.DataFrame(numbers, columns=['set_of_numbers'])
print(df)
df.loc[df['set_of_numbers'] == 0, 'set_of_numbers'] = 999
df.loc[df['set_of_numbers'] == 5, 'set_of_numbers'] = 555
print(df)
Before: 'set_of_numbers': [1,2,3,4,5,6,7,8,9,10,0,0]
After: 'set_of_numbers': [1,2,3,4,555,6,7,8,9,10,999,999]
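The syntax line above also mentions assigning to a new column. Here is a minimal sketch of that case, using a hypothetical column `label`. Rows that do not match the condition are left as NaN, because the column did not exist before.

```python
import pandas as pd

df = pd.DataFrame({'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 0]})

# Hypothetical new column 'label': matching rows get 'big',
# every other row is NaN because the column is created by this assignment
df.loc[df['set_of_numbers'] > 5, 'label'] = 'big'
print(df['label'].tolist())
```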
My current DataFrame looks like this:
x y z x c name status
1 2 3 2 5 Jon Work
1 2 5 4 5 Adam Work
9 7 3 9 5 Adam Holiday
3 2 3 4 5 Anna Work
1 4 6 8 5 Anna Work
4 1 6 8 5 Kate Off
2 1 6 1 5 Jon Off
My lists with specific values look like this:
name = ['Jon', 'Adam']
status = ['Off', 'Work']
Using those lists, I need to create a new DataFrame for each unique element in the "status" list. It should look like this:
df_off:
x y z x c name status
2 1 6 1 5 Jon Off
There is only one row, because the name Kate is not in the list name.
df_Work:
x y z x c name status
1 2 3 2 5 Jon Work
1 2 5 4 5 Adam Work
In the second DataFrame there is no "Anna" because she is not in the list "name".
I hope this is clear. Do you have any idea how I can solve this?
Regards
Tomasz
First, filter your data using:
name = ['Jon', 'Adam']
status = ['Off', 'Work']
df[df['name'].isin(name)&df['status'].isin(status)]
Then use groupby and collect the groups into a dictionary:
conditions = df['name'].isin(name)&df['status'].isin(status)
dfs = {'df_%s' % k:v for k,v in df[conditions].groupby('status')}
Then access your dataframes using:
>>> dfs['df_Work']
x y z x.1 c name status
0 1 2 3 2 5 Jon Work
1 1 2 5 4 5 Adam Work
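Here is how the filter and the dictionary comprehension look together on the question's table. This is a sketch: an f-string replaces the '%' formatting, and the duplicated x column is named x.1 to match the output above.

```python
import pandas as pd

# The question's table; the second "x" column is named "x.1" as in the output above
df = pd.DataFrame(
    [[1, 2, 3, 2, 5, 'Jon', 'Work'],
     [1, 2, 5, 4, 5, 'Adam', 'Work'],
     [9, 7, 3, 9, 5, 'Adam', 'Holiday'],
     [3, 2, 3, 4, 5, 'Anna', 'Work'],
     [1, 4, 6, 8, 5, 'Anna', 'Work'],
     [4, 1, 6, 8, 5, 'Kate', 'Off'],
     [2, 1, 6, 1, 5, 'Jon', 'Off']],
    columns=['x', 'y', 'z', 'x.1', 'c', 'name', 'status'])

name = ['Jon', 'Adam']
status = ['Off', 'Work']

# Keep rows whose name AND status are both in the lists
conditions = df['name'].isin(name) & df['status'].isin(status)
# One sub-DataFrame per remaining status value
dfs = {f'df_{k}': v for k, v in df[conditions].groupby('status')}

print(sorted(dfs))                      # ['df_Off', 'df_Work']
print(dfs['df_Off']['name'].tolist())   # ['Jon']
print(dfs['df_Work']['name'].tolist())  # ['Jon', 'Adam']
```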
You can even group by multiple columns:
dfs = {'df_%s_%s' % k:v for k,v in df.groupby(['name', 'status'])}
dfs['df_Adam_Work']
If your goal is to save the subframes:
for groupname, sub in df[conditions].groupby('status'):
    sub.to_excel(f'df_{groupname}.xlsx')
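Here is a minimal sketch of the saving loop. It writes CSV files to a temporary directory instead of Excel, because to_excel needs an Excel writer engine such as openpyxl installed. The loop variable is `sub` rather than `df`, so the original frame is not overwritten.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'name': ['Jon', 'Adam', 'Adam', 'Kate', 'Jon'],
                   'status': ['Work', 'Work', 'Holiday', 'Off', 'Off']})
conditions = df['name'].isin(['Jon', 'Adam']) & df['status'].isin(['Off', 'Work'])

out_dir = tempfile.mkdtemp()  # throwaway directory for the demo
written = []
for groupname, sub in df[conditions].groupby('status'):
    path = os.path.join(out_dir, f'df_{groupname}.csv')
    sub.to_csv(path, index=False)
    written.append(os.path.basename(path))

print(sorted(written))  # ['df_Off.csv', 'df_Work.csv']
```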
I have the following dataframe:
df = pd.DataFrame({'Value': [0, 1, 2,3, 4,5,6,7,8,9],'Name': ['John', 'Jim', 'John','Jim', 'John','Jim','Jim','John','Jim','John']})
df
Value Name
0 0 John
1 1 Jim
2 2 John
3 3 Jim
4 4 John
5 5 Jim
6 6 Jim
7 7 John
8 8 Jim
9 9 John
I would like to select the top n items per Name and compute their mean from the Value column.
I have tried this:
df['Top2Mean'] = df.groupby(['Name'])['Value'].nlargest(2).transform('mean')
But I get the following error:
ValueError: transforms cannot produce aggregated results
My expected result is a new column called Top2Mean with an 8 next to John and a 7 next to Jim.
Thanks in advance!
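Let's calculate the mean on level 0 (the Name level), then map the calculated means to the Name column to broadcast the aggregated results. Series.mean(level=...) was removed in pandas 2.0, so group on the index level instead:
top2 = df.groupby('Name')['Value'].nlargest(2).groupby(level=0).mean()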
df['Top2Mean'] = df['Name'].map(top2)
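Here is a self-contained sketch of the map approach on the question's data. It uses groupby(level=0).mean(), since Series.mean(level=...) no longer exists in pandas 2.0.

```python
import pandas as pd

df = pd.DataFrame({'Value': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'Name': ['John', 'Jim', 'John', 'Jim', 'John',
                            'Jim', 'Jim', 'John', 'Jim', 'John']})

n = 2
# The n largest values per Name; the result is indexed by (Name, original row)
top_n = df.groupby('Name')['Value'].nlargest(n)
# Average per Name by grouping on the first index level
top_n_mean = top_n.groupby(level=0).mean()
# Broadcast the per-Name mean back onto every row
df['Top2Mean'] = df['Name'].map(top_n_mean)

print(top_n_mean.to_dict())  # {'Jim': 7.0, 'John': 8.0}
```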
If we need to group on multiple columns, for example Name and City, then we have to take the mean on level=['Name', 'City'] and map the calculated means using MultiIndex.map:
c = ['Name', 'City']
top2 = df.groupby(c)['Value'].nlargest(2).groupby(level=c).mean()
df['Top2Mean'] = df.set_index(c).index.map(top2)
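Here is a sketch of the multi-column variant. It assumes a hypothetical City column that is not in the question's data, and uses groupby(level=c).mean() in place of mean(level=c).

```python
import pandas as pd

# Hypothetical data with an extra City column (not in the question)
df = pd.DataFrame({'Name': ['John', 'John', 'John', 'John', 'Jim', 'Jim', 'Jim'],
                   'City': ['A', 'A', 'A', 'B', 'A', 'A', 'B'],
                   'Value': [1, 5, 9, 4, 2, 6, 8]})

c = ['Name', 'City']
top2 = df.groupby(c)['Value'].nlargest(2)
# One mean per (Name, City) pair, indexed by a two-level MultiIndex
top2_mean = top2.groupby(level=c).mean()
# Look up each row's (Name, City) pair in that MultiIndex
df['Top2Mean'] = df.set_index(c).index.map(top2_mean)

print(df['Top2Mean'].tolist())  # [7.0, 7.0, 7.0, 4.0, 4.0, 4.0, 8.0]
```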
Alternative approach using groupby and transform with a custom lambda function:
df['Top2Mean'] = df.groupby('Name')['Value']\
.transform(lambda v: v.nlargest(2).mean())
Value Name Top2Mean
0 0 John 8
1 1 Jim 7
2 2 John 8
3 3 Jim 7
4 4 John 8
5 5 Jim 7
6 6 Jim 7
7 7 John 8
8 8 Jim 7
9 9 John 8
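The question asks for the top n items, so it may help to make n a parameter. Here is a minimal sketch that wraps the transform approach in a hypothetical helper, add_top_n_mean.

```python
import pandas as pd

df = pd.DataFrame({'Value': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'Name': ['John', 'Jim', 'John', 'Jim', 'John',
                            'Jim', 'Jim', 'John', 'Jim', 'John']})


def add_top_n_mean(frame, n):
    """Return a copy of frame with a Top{n}Mean column (hypothetical helper)."""
    out = frame.copy()
    # transform returns one value per row: each group's mean is repeated on all its rows
    out[f'Top{n}Mean'] = (out.groupby('Name')['Value']
                             .transform(lambda v: v.nlargest(n).mean()))
    return out


result = add_top_n_mean(df, 3)
print(result.head(2))
```

Working on a copy leaves the original frame unchanged, so the helper can be called with several values of n.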