Below is a script that builds a simplified version of the DataFrame in question:
import pandas as pd
df = pd.DataFrame({
'id' : [1,1,1,1,2,2,2,2,3,3,3,3],
'feature' : ['cd_player', 'sat_nav', 'sub_woofer', 'usb_port','cd_player', 'sat_nav', 'sub_woofer', 'usb_port','cd_player', 'sat_nav', 'sub_woofer', 'usb_port'],
'feature_value' : [1,1,1,0,1,0,0,1,1,1,1,0],
})
df
id feature feature_value
0 1 cd_player 1
1 1 sat_nav 1
2 1 sub_woofer 1
3 1 usb_port 0
4 2 cd_player 1
5 2 sat_nav 0
6 2 sub_woofer 0
7 2 usb_port 1
8 3 cd_player 1
9 3 sat_nav 1
10 3 sub_woofer 1
11 3 usb_port 0
What I would like to do is create a new column that counts the number of 0 values for each feature, as in the DataFrame below.
INTENDED DF:
id feature feature_value no_value_count
0 1 cd_player 1 0
1 1 sat_nav 1 1
2 1 sub_woofer 1 1
3 1 usb_port 0 2
4 2 cd_player 1 0
5 2 sat_nav 0 1
6 2 sub_woofer 0 1
7 2 usb_port 1 2
8 3 cd_player 1 0
9 3 sat_nav 1 1
10 3 sub_woofer 1 1
11 3 usb_port 0 2
Any help would be greatly appreciated.
IIUC: since feature_value is strictly 0/1 and every id carries every feature, the number of 0s per feature is simply the number of distinct ids minus the number of 1s:
df["count"] = df["id"].nunique() - df.groupby("feature")["feature_value"].transform("sum")
print(df)
id feature feature_value count
0 1 cd_player 1 0
1 1 sat_nav 1 1
2 1 sub_woofer 1 1
3 1 usb_port 0 2
4 2 cd_player 1 0
5 2 sat_nav 0 1
6 2 sub_woofer 0 1
7 2 usb_port 1 2
8 3 cd_player 1 0
9 3 sat_nav 1 1
10 3 sub_woofer 1 1
11 3 usb_port 0 2
You can map the feature column to the result of a groupby.sum over the boolean mask where feature_value equals (eq) 0:
df['no_value_count'] = df['feature'].map(
    df['feature_value'].eq(0).groupby(df['feature']).sum()
)
print(df)
id feature feature_value no_value_count
0 1 cd_player 1 0
1 1 sat_nav 1 1
2 1 sub_woofer 1 1
3 1 usb_port 0 2
4 2 cd_player 1 0
5 2 sat_nav 0 1
6 2 sub_woofer 0 1
7 2 usb_port 1 2
8 3 cd_player 1 0
9 3 sat_nav 1 1
10 3 sub_woofer 1 1
11 3 usb_port 0 2
From what I understand, you can try:
df['feature_value'].eq(0).groupby(df['feature']).transform('sum')
0 0.0
1 1.0
2 1.0
3 2.0
4 0.0
5 1.0
6 1.0
7 2.0
8 0.0
9 1.0
10 1.0
11 2.0
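Note that the counts come back as floats because a boolean Series is being summed. If you want integer counts assigned back to the frame, a small cast (a sketch along the same lines) tidies this up:
df['no_value_count'] = (df['feature_value'].eq(0)
                          .groupby(df['feature'])
                          .transform('sum')
                          .astype(int))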
I created a dataframe where the last column shows the binary conversion of the previous column. Now I want to flip the last one or two bits of every binary number in that column (R), without affecting the other bits.
How can I do that? Following is my sample code and output.
For example, for the 1st row, I want the binary number to become 001101 (from 001100) after flipping the last bit.
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1, 3, size=(10, 8)), columns=list('ABCDEFGH'))
df['Q'] = df.sum(axis=1)                               # row sums
df['R'] = df.Q.apply(lambda x: format(int(x), '06b'))  # 6-digit binary string
O/P:
A B C D E F G H Q R
0 2 2 1 1 1 1 2 2 12 001100
1 2 1 1 1 1 1 2 1 10 001010
2 2 2 1 2 2 2 1 2 14 001110
3 1 2 2 2 1 1 2 1 12 001100
4 1 2 2 1 1 2 1 1 11 001011
5 2 1 1 2 1 1 2 1 11 001011
6 1 2 2 1 1 2 1 2 12 001100
7 2 2 1 2 1 1 1 1 11 001011
8 2 1 2 2 2 2 2 1 14 001110
9 1 2 1 1 1 2 2 2 12 001100
One way is to use pandas string slicing:
df = pd.DataFrame(np.random.randint(1, 3, size=(10, 8)), columns=list('ABCDEFGH'))
df['Q'] = df.sum(axis=1)
df['R'] = df.Q.apply(lambda x: format(int(x), '06b'))
print("Before:\n", df)
# keep everything but the last character, then append the flipped (0 <-> 1) last bit
df.R = df.R.str.slice(stop=-1) + (1 - df.R.str.slice(start=-1).astype(int)).astype(str)
print("\nAfter:\n", df)
Output:
Before:
A B C D E F G H Q R
0 1 1 2 2 1 1 1 2 11 001011
1 2 2 1 2 1 1 1 2 12 001100
2 1 1 2 1 1 1 1 1 9 001001
3 2 1 2 2 1 2 2 1 13 001101
4 1 1 2 2 1 1 1 2 11 001011
5 2 2 2 1 2 2 2 2 15 001111
6 1 2 1 2 2 1 2 1 12 001100
7 1 1 1 1 1 2 2 2 11 001011
8 1 1 1 2 1 2 1 1 10 001010
9 2 2 1 2 2 1 1 1 12 001100
After:
A B C D E F G H Q R
0 1 1 2 2 1 1 1 2 11 001010
1 2 2 1 2 1 1 1 2 12 001101
2 1 1 2 1 1 1 1 1 9 001000
3 2 1 2 2 1 2 2 1 13 001100
4 1 1 2 2 1 1 1 2 11 001010
5 2 2 2 1 2 2 2 2 15 001110
6 1 2 1 2 2 1 2 1 12 001101
7 1 1 1 1 1 2 2 2 11 001010
8 1 1 1 2 1 2 1 1 10 001011
9 2 2 1 2 2 1 1 1 12 001101
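A numeric alternative (a sketch, not part of the answer above): flipping the last bit of an integer is the same as XOR with 1 (and flipping the last two bits is XOR with 3), so the flip can be done on Q before formatting:
# flip the last bit of the integer sum, then format as a 6-digit binary string
df['R'] = (df['Q'] ^ 1).apply(lambda x: format(int(x), '06b'))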
So I am trying to count the number of consecutive identical values in a dataframe and put that information into a new column, but I want the count to build up iteratively.
Here is what I have so far:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 3, size=(15, 4)), columns=list('ABCD'))
# label each run of consecutive equal values in column A
df['subgroupA'] = (df.A != df.A.shift(1)).cumsum()
# total size of each run
dfg = df.groupby(by='subgroupA', as_index=False).apply(lambda grp: len(grp))
dfg.rename(columns={None: 'numConsec'}, inplace=True)
df = df.merge(dfg, how='left', on='subgroupA')
df
df
Here is the result:
A B C D subgroupA numConsec
0 2 1 1 1 1 1
1 1 2 1 0 2 2
2 1 0 2 1 2 2
3 0 1 2 0 3 1
4 1 0 0 1 4 1
5 0 2 2 1 5 2
6 0 2 1 1 5 2
7 1 0 0 1 6 1
8 0 2 0 0 7 4
9 0 0 0 2 7 4
10 0 2 1 1 7 4
11 0 2 2 0 7 4
12 1 2 0 1 8 1
13 0 1 1 0 9 1
14 1 1 1 0 10 1
The problem is that the numConsec column holds the full run length for every row, whereas I want it to reflect the count as it accumulates while you read down the dataframe. My real dataframe is too large to loop through row by row, as that would be too slow. I need to do it in a pythonic way and make it look like this:
A B C D subgroupA numConsec
0 2 1 1 1 1 1
1 1 2 1 0 2 1
2 1 0 2 1 2 2
3 0 1 2 0 3 1
4 1 0 0 1 4 1
5 0 2 2 1 5 1
6 0 2 1 1 5 2
7 1 0 0 1 6 1
8 0 2 0 0 7 1
9 0 0 0 2 7 2
10 0 2 1 1 7 3
11 0 2 2 0 7 4
12 1 2 0 1 8 1
13 0 1 1 0 9 1
14 1 1 1 0 10 1
Any ideas?
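One vectorized way (a sketch, reusing the subgroupA run labels already built above) is groupby.cumcount, which numbers the rows within each run and so gives exactly the iterative count:
# position within each run of consecutive values, starting at 1
df['numConsec'] = df.groupby('subgroupA').cumcount() + 1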
I have a pandas data frame which looks like below:
ID Value
1 2
2 6
3 3
4 5
I want a new dataframe which gives
ID Value
1 0
1 1
1 2
2 0
2 1
2 2
2 3
2 4
2 5
2 6
3 0
3 1
3 2
3 3
4 0
4 1
4 2
4 3
4 4
4 5
Any kind of suggestions would be appreciated.
Use reindex with Index.repeat, then groupby.cumcount to regenerate the Value column:
(df.reindex(df.index.repeat(df.Value + 1))
   .assign(Value=lambda x: x.groupby('ID').cumcount()))
Out[611]:
ID Value
0 1 0
0 1 1
0 1 2
1 2 0
1 2 1
1 2 2
1 2 3
1 2 4
1 2 5
1 2 6
2 3 0
2 3 1
2 3 2
2 3 3
3 4 0
3 4 1
3 4 2
3 4 3
3 4 4
3 4 5
Try (note that np.arange is applied to each one-row group, so this relies on each ID appearing only once):
new_df = (df.groupby('ID').Value
            .apply(lambda x: pd.Series(np.arange(x + 1)))
            .reset_index()
            .drop(columns='level_1'))
ID Value
0 1 0
1 1 1
2 1 2
3 2 0
4 2 1
5 2 2
6 2 3
7 2 4
8 2 5
9 2 6
10 3 0
11 3 1
12 3 2
13 3 3
14 4 0
15 4 1
16 4 2
17 4 3
18 4 4
19 4 5
Using stack and a list comprehension:
vals = [np.arange(i+1) for i in df.Value]
(pd.DataFrame(vals, index=df.ID)
.stack().reset_index(1, drop=True).astype(int).to_frame('Value'))
Value
ID
1 0
1 1
1 2
2 0
2 1
2 2
2 3
2 4
2 5
2 6
3 0
3 1
3 2
3 3
4 0
4 1
4 2
4 3
4 4
4 5
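On pandas 0.25+, explode offers yet another route (a sketch): build the list 0..Value for each row, then explode it into separate rows.
out = (df.assign(Value=df['Value'].apply(lambda v: list(range(v + 1))))  # one list per row
         .explode('Value')                                              # one row per list element
         .astype({'Value': int})
         .reset_index(drop=True))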
I have a pandas dataframe with three columns, and I want to drop all rows where the unique combination of df['person'], df['id'], and df['day'] occurs two times or fewer. Is there a simple way to do this in pandas?
[In]:
person id day
1 2 1
1 2 1
1 2 1
1 2 1
1 1 1
1 1 1
1 1 1
1 0 1
1 2 2
2 2 2
2 2 2
2 2 2
1 3 1
1 3 1
1 3 1
1 0 1
2 2 2
[Out]:
person id day
1 2 1
1 2 1
1 2 1
1 1 1
1 1 1
1 1 1
2 2 2
2 2 2
2 2 2
1 3 1
1 3 1
1 3 1
2 2 2
We can use transform to build a helper column Info holding the size of each (person, id, day) group:
df['Info'] = df.groupby(list(df)).id.transform('count')
df
Out[444]:
person id day Info
0 1 2 1 4
1 1 2 1 4
2 1 2 1 4
3 1 2 1 4
4 1 1 1 3
5 1 1 1 3
6 1 1 1 3
7 1 0 1 2
8 1 2 2 1
9 2 2 2 4
10 2 2 2 4
11 2 2 2 4
12 1 3 1 3
13 1 3 1 3
14 1 3 1 3
15 1 0 1 2
16 2 2 2 4
Then you can keep the rows whose group size exceeds 2 and drop the helper column:
df[df.Info > 2].drop(columns='Info')
Out[447]:
person id day
0 1 2 1
1 1 2 1
2 1 2 1
3 1 2 1
4 1 1 1
5 1 1 1
6 1 1 1
9 2 2 2
10 2 2 2
11 2 2 2
12 1 3 1
13 1 3 1
14 1 3 1
16 2 2 2
Alternatively, groupby.filter does it in one line (concise, though typically slower than the transform-based mask above on large frames):
df.groupby(['person', 'id', 'day']).filter(lambda x: x.shape[0] > 2)
Reference: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.DataFrameGroupBy.filter.html
I have the following short dataframe:
A B C
1 1 3
2 1 3
3 2 3
4 2 3
5 0 0
I want the output to look like this:
A B C
1 1 3
2 1 3
3 0 0
4 0 0
5 0 0
1 1 3
2 1 3
3 2 3
4 2 3
5 0 0
Use pd.MultiIndex.from_product with the unique values of A and B, then reindex against it, filling missing combinations with 0:
cols = list('AB')
mux = pd.MultiIndex.from_product([df.A.unique(), df.B.unique()], names=cols)
df.set_index(cols).reindex(mux, fill_value=0).reset_index()
A B C
0 1 1 3
1 1 2 0
2 1 0 0
3 2 1 3
4 2 2 0
5 2 0 0
6 3 1 0
7 3 2 3
8 3 0 0
9 4 1 0
10 4 2 3
11 4 0 0
12 5 1 0
13 5 2 0
14 5 0 0