rm_id a b c d r_id
12 TRUE TRUE TRUE 0.2 1
13 TRUE TRUE TRUE 0.32 1
14 TRUE TRUE TRUE 0.02 1
15 TRUE TRUE FALSE 1.2 1
16 TRUE TRUE TRUE 0.05 1
17 FALSE TRUE FALSE 0.06 2
18 FALSE TRUE TRUE 0.8 1
19 TRUE TRUE FALSE 0.32 2
20 FALSE TRUE TRUE 0.54 1
13 TRUE TRUE FALSE 0.12 2
14 FALSE TRUE TRUE 0.012 2
16 FALSE FALSE FALSE 0.5 2
12 TRUE FALSE FALSE 0.9 2
11 FALSE TRUE TRUE 0.37 1
Hi Everyone:
I have the table above. When I filter the values by r_id, I want to get the result displayed below, i.e. an aggregate of each column. Can you help me?
rm_id a b c d r_id
12 TRUE TRUE TRUE 0.2 1
13 TRUE TRUE TRUE 0.32 1
14 TRUE TRUE TRUE 0.02 1
15 TRUE TRUE FALSE 1.2 1
16 TRUE TRUE TRUE 0.05 1
18 FALSE TRUE TRUE 0.8 1
20 FALSE TRUE TRUE 0.54 1
11 FALSE TRUE TRUE 0.37 1
FALSE TRUE FALSE 3.5
Use GroupBy.agg with a dictionary mapping the boolean columns to GroupBy.all and d to sum (see the final snippet below).
If the data contains the strings TRUE and FALSE, first convert them to real booleans:
print (df[['a','b','c']].dtypes)
a object
b object
c object
dtype: object
# check the actual values
print (df[['a','b','c']].stack().unique())
['TRUE' 'FALSE']
# replace the strings with booleans
df[['a','b','c']] = df[['a','b','c']].replace({'TRUE':True, 'FALSE':False})
print (df[['a','b','c']].dtypes)
a bool
b bool
c bool
dtype: object
df1 = df.groupby('r_id', as_index=False).agg({'a':'all', 'b':'all','c':'all', 'd':'sum'})
print (df1)
r_id a b c d
0 1 False True False 3.500
1 2 False False False 1.912
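For reference, here is a self-contained sketch that rebuilds the table from the question (booleans already converted) and also shows how to append the aggregate row under the rows filtered for one r_id, which is the display the question asks for. The pd.concat step at the end is my own illustration, not part of the original answer.

import pandas as pd

# Reconstruction of the table from the question
df = pd.DataFrame({
    'rm_id': [12, 13, 14, 15, 16, 17, 18, 19, 20, 13, 14, 16, 12, 11],
    'a': [True, True, True, True, True, False, False, True, False, True, False, False, True, False],
    'b': [True, True, True, True, True, True, True, True, True, True, True, False, False, True],
    'c': [True, True, True, False, True, False, True, False, True, False, True, False, False, True],
    'd': [0.2, 0.32, 0.02, 1.2, 0.05, 0.06, 0.8, 0.32, 0.54, 0.12, 0.012, 0.5, 0.9, 0.37],
    'r_id': [1, 1, 1, 1, 1, 2, 1, 2, 1, 2, 2, 2, 2, 1],
})

# Per-group summary, as in the answer above
df1 = df.groupby('r_id', as_index=False).agg({'a': 'all', 'b': 'all', 'c': 'all', 'd': 'sum'})
print(df1)

# To reproduce the "filtered rows plus summary row" display for r_id == 1
subset = df[df['r_id'] == 1]
summary = subset.agg({'a': 'all', 'b': 'all', 'c': 'all', 'd': 'sum'})
print(pd.concat([subset, summary.to_frame().T], ignore_index=True))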
Hello, I have boolean Series on which I have performed logical operations. Since they are Series, I have used & instead of and. Here are the Series:
z = ~(df['HOME ZIP'].isin(zip_series['zipcd_ZIP_CD']))
Here is the type of z:
type(z)
pandas.core.series.Series
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False
24 False
25 False
26 False
27 False
28 False
29 False
30 False
Name: HOME ZIP, dtype: bool
Similarly:
y = df['HOME ZIP'].astype(str).str.len() != 0
x = (df['HOME ZIP'].isnull() == False)
The values of x and y are all True:
0 True
1 True
2 True
3 True
4 True
5 True
6 True
7 True
8 True
9 True
10 True
11 True
12 True
13 True
14 True
15 True
16 True
17 True
18 True
19 True
20 True
21 True
22 True
23 True
24 True
25 True
26 True
27 True
28 True
29 True
30 True
Name: HOME ZIP, dtype: bool
The values of x & y & z are:
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False
24 False
25 False
26 False
27 False
28 False
29 False
30 False
Name: HOME ZIP, dtype: bool
But when I combine x, y, and z into a single statement, I get a different output:
(df['HOME ZIP'].isnull() == False) & df['HOME ZIP'].astype(str).str.len() != 0 & ~(df['HOME ZIP'].isin(zip_series['zipcd_ZIP_CD']))
0 True
1 True
2 True
3 True
4 True
5 True
6 True
7 True
8 True
9 True
10 True
11 True
12 True
13 True
14 True
15 True
16 True
17 True
18 True
19 True
20 True
21 True
22 True
23 True
24 True
25 True
26 True
27 True
28 True
29 True
30 True
Name: HOME ZIP, dtype: bool
You should wrap each condition in parentheses: & has higher precedence than comparison operators like !=, so without them the conditions are not grouped the way you intend.
((df['HOME ZIP'].isnull() == False) &
 (df['HOME ZIP'].astype(str).str.len() != 0) &
 (~(df['HOME ZIP'].isin(zip_series['zipcd_ZIP_CD']))))
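Here is a tiny, hypothetical illustration (not the asker's data) of why the parentheses matter: because & outranks comparisons such as !=, the unparenthesized version is grouped as (A & B) != (0 & C) rather than as three separate boolean tests combined with &.

import pandas as pd

s = pd.Series([0, 5, 10])

# Parses as (s.notna() & s) != (0 & (s > 3)) because & outranks the comparisons
unparenthesized = s.notna() & s != 0 & (s > 3)

# Each comparison is evaluated first, then combined element-wise
parenthesized = s.notna() & (s != 0) & (s > 3)

print(unparenthesized.tolist())   # [False, True, False]
print(parenthesized.tolist())     # [False, True, True]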
I want to return a boolean mask using separate columns. Where an End value appears in Item, I want to return False.
I can match those rows, but I also want to account for all unique values in Seq: for each unique group in Seq, if any row matches the previous condition, then return False for all rows of that group.
import pandas as pd

df = pd.DataFrame({
    'Item' : ['Start','A','B','B','G','Start','A','B','B','A','X','Start','A','H'],
})
End = ['X','Y','Z']
df['Seq'] = df['Item'].eq('Start').groupby(df['Item'].eq('Start').cumsum()).transform('idxmax')
m2 = df.Item.isin(End)
Current output:
0 True
1 True
2 True
3 True
4 True
5 True
6 True
7 True
8 True
9 True
10 False
11 True
12 True
13 True
Intended output:
0 True
1 True
2 True
3 True
4 True
5 True
6 False
7 False
8 False
9 False
10 False
11 True
12 True
13 True
Instead of idxmax, use max and then negate the result:
~df.Item.isin(End).groupby(df.Item.eq('Start').cumsum()).transform('max')
0 True
1 True
2 True
3 True
4 True
5 False
6 False
7 False
8 False
9 False
10 False
11 True
12 True
13 True
Name: Item, dtype: bool
To exclude the Start rows from the check (so they stay True):
~(df.Item.isin(End).groupby(df.Item.eq('Start').cumsum()).transform('max') & df.Item.ne('Start'))
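Put together as a runnable sketch, with the frame copied from the question; this is my own consolidation of the two snippets above (max and any behave the same here because the mask is boolean), and it reproduces the intended output:

import pandas as pd

df = pd.DataFrame({
    'Item': ['Start', 'A', 'B', 'B', 'G', 'Start', 'A', 'B', 'B', 'A', 'X',
             'Start', 'A', 'H'],
})
End = ['X', 'Y', 'Z']

# Each 'Start' opens a new group
group_id = df['Item'].eq('Start').cumsum()

# False for every row of a group containing an End item, except the Start row itself
out = ~(df['Item'].isin(End).groupby(group_id).transform('max') & df['Item'].ne('Start'))
print(out)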
Group the boolean mask m2 by Seq, transform with any, then negate the output:
~(m2.groupby(df['Seq']).transform('any'))
0 True
1 True
2 True
3 True
4 True
5 False
6 False
7 False
8 False
9 False
10 False
11 True
12 True
13 True
Name: Item, dtype: bool
I have a dataframe like this:
Bool Hour
0 False 12
1 False 24
2 False 12
3 False 24
4 True 12
5 False 24
6 False 12
7 False 24
8 False 12
9 False 24
10 False 12
11 True 24
and I would like to backfill each True value in the 'Bool' column back to the previous row where 'Hour' is 12. The result would be something like this:
Bool Hour Result
0 False 12 False
1 False 24 False
2 False 12 True <- desired backfill
3 False 24 True <- desired backfill
4 True 12 True
5 False 24 False
6 False 12 False
7 False 24 False
8 False 12 False
9 False 24 False
10 False 12 True <- desired backfill
11 True 24 True
Any help is greatly appreciated! Thank you very much!
This is a little bit hard to achieve; here we can use groupby with idxmax:
s=(~df.Bool&df.Hour.eq(12)).iloc[::-1].groupby(df.Bool.iloc[::-1].cumsum()).transform('idxmax')
df['result']=df.index>=s.iloc[::-1]
df
Out[375]:
Bool Hour result
0 False 12 False
1 False 24 False
2 False 12 True
3 False 24 True
4 True 12 True
5 False 24 False
6 False 12 False
7 False 24 False
8 False 12 False
9 False 24 False
10 False 12 True
11 True 24 True
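The same two lines, repeated as a self-contained sketch with the frame rebuilt from the question's table, in case you want to run it directly; the inline comments are my own reading of the code.

import pandas as pd

df = pd.DataFrame({
    'Bool': [False, False, False, False, True, False,
             False, False, False, False, False, True],
    'Hour': [12, 24, 12, 24, 12, 24, 12, 24, 12, 24, 12, 24],
})

# Rows that are still False but where Hour hits 12 -- the candidate backfill targets
candidates = ~df.Bool & df.Hour.eq(12)

# Scanning bottom-up, each True in Bool opens a new group; idxmax returns the
# nearest candidate row at or above that True
s = candidates.iloc[::-1].groupby(df.Bool.iloc[::-1].cumsum()).transform('idxmax')

# A row is marked True if it sits at or after that candidate row
df['result'] = df.index >= s.iloc[::-1]
print(df)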
IIUC, you can do:
# shift each True in Bool up one row
s = df['Bool'].shift(-1)
# keep only the Trues, backfill them within blocks that start at each Hour==12 row, then or with Bool
df['Result'] = df['Bool'] | s.where(s).groupby(df['Hour'].eq(12).cumsum()).bfill()
Output:
Bool Hour Result
0 False 12 False
1 False 24 False
2 False 12 True
3 False 24 True
4 True 12 True
5 False 24 False
6 False 12 False
7 False 24 False
8 False 12 False
9 False 24 False
10 False 12 True
11 True 24 True
Create a group id s from runs of consecutive equal values in Bool, and a helper s1 marking rows that are followed by a True. Group the Hour-equals-12 indicator by s; transform('sum') minus cumsum counts how many Hour==12 rows remain below the current row in its run, so eq(0) marks the last Hour==12 row of each run and everything after it. Combine that with s1 and or it with Bool:
# run id for consecutive equal values in Bool
s = df.Bool.ne(df.Bool.shift()).cumsum()
# True for rows that have a True in Bool at or below them
s1 = df.where(df.Bool).Bool.bfill()
# Hour==12 indicator, grouped per run
g = df.Hour.eq(12).groupby(s)
# rows from the last Hour==12 of each run onward, limited by s1, or-ed with Bool
df['bfill_Bool'] = (g.transform('sum') - g.cumsum()).eq(0) & s1 | df.Bool
Out[905]:
Bool Hour bfill_Bool
0 False 12 False
1 False 24 False
2 False 12 True
3 False 24 True
4 True 12 True
5 False 24 False
6 False 12 False
7 False 24 False
8 False 12 False
9 False 24 False
10 False 12 True
11 True 24 True
My dataframe df:
SCHOOL CLASS GRADE
A Spanish nan
A Spanish nan
A Math 4000
A Math 7830
A Math 3893
B . nan
B . nan
B Biology 1929
B Biology 4839
B Biology 8195
C Spanish nan
C English 2003
C English 1000
C Biology 4839
C Biology 8191
If I do:
school_has_only_two_classes = (df.groupby('SCHOOL').CLASS
                               .transform(lambda series: series.nunique()) == 2)
I get
0 True
1 True
2 True
3 True
4 True
5 True
6 True
7 True
8 True
9 True
10 False
11 False
12 False
13 False
14 False
15 False
The transform works fine, e.g. for school C. But if I do:
school_has_spanish = df.groupby('SCHOOL').CLASS.transform(lambda series: series.str.contains('^Spanish$',regex=True))
or
school_has_spanish = df.groupby('SCHOOL').CLASS.transform(lambda series: series=='Spanish')
I get the following result which is not what I was expecting:
0 True
1 True
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 True
11 False
12 False
13 False
14 False
15 False
The transform just does not spread the True values to the other rows of the group. The result I was expecting:
0 True
1 True
2 True
3 True
4 False
5 False
6 False
7 False
8 False
9 False
10 True
11 True
12 True
13 True
14 True
15 True
Any help is appreciated.
Check any with str.contains:
df.CLASS.str.contains('Spanish').groupby(df.SCHOOL).transform('any')
Out[230]:
0 True
1 True
2 True
3 True
4 True
5 False
6 False
7 False
8 False
9 False
10 True
11 True
12 True
13 True
14 True
Name: CLASS, dtype: bool
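As a runnable sketch, rebuilding the frame from the table in the question (GRADE is omitted since it is not needed for the mask):

import pandas as pd

df = pd.DataFrame({
    'SCHOOL': list('AAAAABBBBBCCCCC'),
    'CLASS': ['Spanish', 'Spanish', 'Math', 'Math', 'Math',
              '.', '.', 'Biology', 'Biology', 'Biology',
              'Spanish', 'English', 'English', 'Biology', 'Biology'],
})

# Row-level check first, then spread each school's result to all of its rows
school_has_spanish = df.CLASS.str.contains('Spanish').groupby(df.SCHOOL).transform('any')
print(school_has_spanish)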
Given the following dataframe:
col_1 col_2
False 1
False 1
False 1
False 1
False 1
False 1
False 1
False 1
False 1
False 1
False 1
False 1
False 1
False 1
False 2
True 2
False 2
False 2
True 2
False 2
False 2
False 2
False 2
False 2
False 2
False 2
False 2
False 2
False 2
False 2
How can I create a new id that helps identify when a True value is present in col_1? That is, each time a True value appears in the first column I would like to fill the new column backward with a counter starting from one. For example, this is the expected output for the above dataframe:
col_1 col_2 new_id
False 1 1
False 1 1
False 1 1
False 1 1
False 1 1
False 1 1
False 1 1
False 1 1
False 1 1
False 1 1
False 1 1
False 1 1
False 1 1
False 1 1
False 2 1
True 2 1 --------- ^ (fill with 1 and increase the counter)
False 2 2
False 2 2
True 2 2 --------- ^ (fill with 2 and increase the counter)
False 2 3
False 2 3
False 2 3
False 2 3
False 2 3
False 2 3
False 2 3
False 2 3
False 2 3
False 2 3
False 2 3
True 2 3 --------- ^ (fill with 3 and increase the counter)
The problem is that I do not know how to create the id, although I know that pandas provides a bfill method that may help to achieve this. So far I have tried to iterate with a simple for loop:
count = 0
for index, row in df.iterrows():
    if row['col_1'] == False:
        print(count + 1)
    else:
        print(row['col_2'] + 1)
However, I do not know how to increase the counter to the next number. Also I tried to create a function and then apply it to the dataframe:
def create_id(col_1, col_2):
    counter = 0
    if col_1 == True and col_2.bool() == True:
        return counter + 1
    else:
        pass
Nevertheless, I lose control of filling the column backward.
Just do it with cumsum:
# running count of Trues, shifted one row so each True row keeps the current id
df['new_id'] = (df.col_1.cumsum().shift().fillna(0) + 1).astype(int)
df
Out[210]:
col_1 col_2 new_id
0 False 1 1
1 False 1 1
2 False 1 1
3 False 1 1
4 False 1 1
5 False 1 1
6 False 1 1
7 False 1 1
8 False 1 1
9 False 1 1
10 False 1 1
11 False 1 1
12 False 1 1
13 False 1 1
14 False 2 1
15 True 2 1
16 False 2 2
17 False 2 2
18 True 2 2
19 False 2 3
20 False 2 3
21 False 2 3
22 False 2 3
23 False 2 3
24 False 2 3
25 False 2 3
26 False 2 3
27 False 2 3
28 False 2 3
29 False 2 3
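A quick, hypothetical illustration of why the shift is there: cumsum alone already increments the counter on the True row itself, while shifting by one keeps each True row inside the id it is being filled with.

import pandas as pd

flags = pd.Series([False, True, False, True, False])

without_shift = flags.cumsum() + 1
with_shift = (flags.cumsum().shift().fillna(0) + 1).astype(int)

print(without_shift.tolist())   # [1, 2, 2, 3, 3] -- True rows already bumped
print(with_shift.tolist())      # [1, 1, 2, 2, 3] -- True rows keep the current id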
If you aim to append the new_id column to your dataframe, you can also build it with a plain loop:
new_id = []
counter = 1
for index, row in df.iterrows():
    new_id += [counter]
    if row['col_1'] == True:
        counter += 1
df['new_id'] = new_id