Below is a script that builds a simplified version of the DataFrame in question:
import pandas as pd
df = pd.DataFrame({
'id' : [1,1,1,1,2,2,2,2,3,3,3,3],
'feature' : ['cd_player', 'sat_nav', 'sub_woofer', 'usb_port','cd_player', 'sat_nav', 'sub_woofer', 'usb_port','cd_player', 'sat_nav', 'sub_woofer', 'usb_port'],
'feature_value' : [1,1,1,0,1,0,0,1,1,1,1,0],
})
df
id feature feature_value
0 1 cd_player 1
1 1 sat_nav 1
2 1 sub_woofer 1
3 1 usb_port 0
4 2 cd_player 1
5 2 sat_nav 0
6 2 sub_woofer 0
7 2 usb_port 1
8 3 cd_player 1
9 3 sat_nav 1
10 3 sub_woofer 1
11 3 usb_port 0
What I would like to do is create a new column that counts the number of 0 values for each feature, as in the DataFrame below.
INTENDED DF:
id feature feature_value no_value_count
0 1 cd_player 1 0
1 1 sat_nav 1 1
2 1 sub_woofer 1 1
3 1 usb_port 0 2
4 2 cd_player 1 0
5 2 sat_nav 0 1
6 2 sub_woofer 0 1
7 2 usb_port 1 2
8 3 cd_player 1 0
9 3 sat_nav 1 1
10 3 sub_woofer 1 1
11 3 usb_port 0 2
Any help would be greatly appreciated.
IIUC: since feature_value is strictly 0/1 and every id carries every feature, the number of 0s per feature is simply the number of distinct ids minus the number of 1s:
df["count"] = df["id"].nunique() - df.groupby("feature")["feature_value"].transform("sum")
print(df)
id feature feature_value count
0 1 cd_player 1 0
1 1 sat_nav 1 1
2 1 sub_woofer 1 1
3 1 usb_port 0 2
4 2 cd_player 1 0
5 2 sat_nav 0 1
6 2 sub_woofer 0 1
7 2 usb_port 1 2
8 3 cd_player 1 0
9 3 sat_nav 1 1
10 3 sub_woofer 1 1
11 3 usb_port 0 2
You can map the feature column to the result of a groupby.sum over the boolean mask where feature_value equals (eq) 0:
df['no_value_count'] = df['feature'].map(
    df['feature_value'].eq(0).groupby(df['feature']).sum()
)
print(df)
id feature feature_value no_value_count
0 1 cd_player 1 0
1 1 sat_nav 1 1
2 1 sub_woofer 1 1
3 1 usb_port 0 2
4 2 cd_player 1 0
5 2 sat_nav 0 1
6 2 sub_woofer 0 1
7 2 usb_port 1 2
8 3 cd_player 1 0
9 3 sat_nav 1 1
10 3 sub_woofer 1 1
11 3 usb_port 0 2
From what I understand, you can try:
df['feature_value'].eq(0).groupby(df['feature']).transform('sum')
0 0.0
1 1.0
2 1.0
3 2.0
4 0.0
5 1.0
6 1.0
7 2.0
8 0.0
9 1.0
10 1.0
11 2.0
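Note that the counts come back as floats because a boolean Series is being summed. If you want integer counts assigned back to the frame, a small cast (a sketch along the same lines) tidies this up:
df['no_value_count'] = (df['feature_value'].eq(0)
                          .groupby(df['feature'])
                          .transform('sum')
                          .astype(int))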
I created a dataframe where the last column shows the binary conversion of the previous column. Now I want to flip the last one or two bits of every binary number in that column (R), without affecting the other bits.
How can I do that? Following is my sample code and output.
For example, for the 1st row, I want the binary number to become 001101 (from 001100) after flipping the last bit.
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1, 3, size=(10, 8)), columns=list('ABCDEFGH'))
df['Q'] = df.sum(axis=1)                               # row sums
df['R'] = df.Q.apply(lambda x: format(int(x), '06b'))  # 6-digit binary string
O/P:
A B C D E F G H Q R
0 2 2 1 1 1 1 2 2 12 001100
1 2 1 1 1 1 1 2 1 10 001010
2 2 2 1 2 2 2 1 2 14 001110
3 1 2 2 2 1 1 2 1 12 001100
4 1 2 2 1 1 2 1 1 11 001011
5 2 1 1 2 1 1 2 1 11 001011
6 1 2 2 1 1 2 1 2 12 001100
7 2 2 1 2 1 1 1 1 11 001011
8 2 1 2 2 2 2 2 1 14 001110
9 1 2 1 1 1 2 2 2 12 001100
One way is to use pandas string slicing:
df = pd.DataFrame(np.random.randint(1, 3, size=(10, 8)), columns=list('ABCDEFGH'))
df['Q'] = df.sum(axis=1)
df['R'] = df.Q.apply(lambda x: format(int(x), '06b'))
print("Before:\n", df)
# keep everything but the last character, then append the flipped (0 <-> 1) last bit
df.R = df.R.str.slice(stop=-1) + (1 - df.R.str.slice(start=-1).astype(int)).astype(str)
print("\nAfter:\n", df)
Output:
Before:
A B C D E F G H Q R
0 1 1 2 2 1 1 1 2 11 001011
1 2 2 1 2 1 1 1 2 12 001100
2 1 1 2 1 1 1 1 1 9 001001
3 2 1 2 2 1 2 2 1 13 001101
4 1 1 2 2 1 1 1 2 11 001011
5 2 2 2 1 2 2 2 2 15 001111
6 1 2 1 2 2 1 2 1 12 001100
7 1 1 1 1 1 2 2 2 11 001011
8 1 1 1 2 1 2 1 1 10 001010
9 2 2 1 2 2 1 1 1 12 001100
After:
A B C D E F G H Q R
0 1 1 2 2 1 1 1 2 11 001010
1 2 2 1 2 1 1 1 2 12 001101
2 1 1 2 1 1 1 1 1 9 001000
3 2 1 2 2 1 2 2 1 13 001100
4 1 1 2 2 1 1 1 2 11 001010
5 2 2 2 1 2 2 2 2 15 001110
6 1 2 1 2 2 1 2 1 12 001101
7 1 1 1 1 1 2 2 2 11 001010
8 1 1 1 2 1 2 1 1 10 001011
9 2 2 1 2 2 1 1 1 12 001101
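A numeric alternative (a sketch, not part of the answer above): flipping the last bit of an integer is the same as XOR with 1 (and flipping the last two bits is XOR with 3), so the flip can be done on Q before formatting:
# flip the last bit of the integer sum, then format as a 6-digit binary string
df['R'] = (df['Q'] ^ 1).apply(lambda x: format(int(x), '06b'))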
So I am trying to count the number of consecutive identical values in a dataframe and put that information into a new column, but I want the count to build up iteratively.
Here is what I have so far:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 3, size=(15, 4)), columns=list('ABCD'))
# label each run of consecutive equal values in column A
df['subgroupA'] = (df.A != df.A.shift(1)).cumsum()
# total size of each run
dfg = df.groupby(by='subgroupA', as_index=False).apply(lambda grp: len(grp))
dfg.rename(columns={None: 'numConsec'}, inplace=True)
df = df.merge(dfg, how='left', on='subgroupA')
df
df
Here is the result:
A B C D subgroupA numConsec
0 2 1 1 1 1 1
1 1 2 1 0 2 2
2 1 0 2 1 2 2
3 0 1 2 0 3 1
4 1 0 0 1 4 1
5 0 2 2 1 5 2
6 0 2 1 1 5 2
7 1 0 0 1 6 1
8 0 2 0 0 7 4
9 0 0 0 2 7 4
10 0 2 1 1 7 4
11 0 2 2 0 7 4
12 1 2 0 1 8 1
13 0 1 1 0 9 1
14 1 1 1 0 10 1
The problem is that the numConsec column holds the full run length for every row, whereas I want it to reflect the count as it accumulates while you read down the dataframe. My real dataframe is too large to loop through row by row, as that would be too slow. I need to do it in a pythonic way and make it look like this:
A B C D subgroupA numConsec
0 2 1 1 1 1 1
1 1 2 1 0 2 1
2 1 0 2 1 2 2
3 0 1 2 0 3 1
4 1 0 0 1 4 1
5 0 2 2 1 5 1
6 0 2 1 1 5 2
7 1 0 0 1 6 1
8 0 2 0 0 7 1
9 0 0 0 2 7 2
10 0 2 1 1 7 3
11 0 2 2 0 7 4
12 1 2 0 1 8 1
13 0 1 1 0 9 1
14 1 1 1 0 10 1
Any ideas?
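One vectorized way (a sketch, reusing the subgroupA run labels already built above) is groupby.cumcount, which numbers the rows within each run and so gives exactly the iterative count:
# position within each run of consecutive values, starting at 1
df['numConsec'] = df.groupby('subgroupA').cumcount() + 1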
I have a pandas data frame which looks like below:
ID Value
1 2
2 6
3 3
4 5
I want a new dataframe which gives
ID Value
1 0
1 1
1 2
2 0
2 1
2 2
2 3
2 4
2 5
2 6
3 0
3 1
3 2
3 3
4 0
4 1
4 2
4 3
4 4
4 5
Any kind of suggestions would be appreciated.
Use reindex with Index.repeat, then groupby.cumcount to regenerate the Value column:
(df.reindex(df.index.repeat(df.Value + 1))
   .assign(Value=lambda x: x.groupby('ID').cumcount()))
Out[611]:
ID Value
0 1 0
0 1 1
0 1 2
1 2 0
1 2 1
1 2 2
1 2 3
1 2 4
1 2 5
1 2 6
2 3 0
2 3 1
2 3 2
2 3 3
3 4 0
3 4 1
3 4 2
3 4 3
3 4 4
3 4 5
Try (note that np.arange is applied to each one-row group, so this relies on each ID appearing only once):
new_df = (df.groupby('ID').Value
            .apply(lambda x: pd.Series(np.arange(x + 1)))
            .reset_index()
            .drop(columns='level_1'))
ID Value
0 1 0
1 1 1
2 1 2
3 2 0
4 2 1
5 2 2
6 2 3
7 2 4
8 2 5
9 2 6
10 3 0
11 3 1
12 3 2
13 3 3
14 4 0
15 4 1
16 4 2
17 4 3
18 4 4
19 4 5
Using stack and a list comprehension:
vals = [np.arange(i+1) for i in df.Value]
(pd.DataFrame(vals, index=df.ID)
.stack().reset_index(1, drop=True).astype(int).to_frame('Value'))
Value
ID
1 0
1 1
1 2
2 0
2 1
2 2
2 3
2 4
2 5
2 6
3 0
3 1
3 2
3 3
4 0
4 1
4 2
4 3
4 4
4 5
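On pandas 0.25+, explode offers yet another route (a sketch): build the list 0..Value for each row, then explode it into separate rows.
out = (df.assign(Value=df['Value'].apply(lambda v: list(range(v + 1))))  # one list per row
         .explode('Value')                                              # one row per list element
         .astype({'Value': int})
         .reset_index(drop=True))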
I have a pandas dataframe with three columns, and I want to drop all rows where the unique combination of df['person'], df['id'], and df['day'] occurs two times or fewer. Is there a simple way to do this in pandas?
[In]:
person id day
1 2 1
1 2 1
1 2 1
1 2 1
1 1 1
1 1 1
1 1 1
1 0 1
1 2 2
2 2 2
2 2 2
2 2 2
1 3 1
1 3 1
1 3 1
1 0 1
2 2 2
[Out]:
person id day
1 2 1
1 2 1
1 2 1
1 1 1
1 1 1
1 1 1
2 2 2
2 2 2
2 2 2
1 3 1
1 3 1
1 3 1
2 2 2
We can use transform to build a helper column Info holding the size of each (person, id, day) group:
df['Info'] = df.groupby(list(df)).id.transform('count')
df
Out[444]:
person id day Info
0 1 2 1 4
1 1 2 1 4
2 1 2 1 4
3 1 2 1 4
4 1 1 1 3
5 1 1 1 3
6 1 1 1 3
7 1 0 1 2
8 1 2 2 1
9 2 2 2 4
10 2 2 2 4
11 2 2 2 4
12 1 3 1 3
13 1 3 1 3
14 1 3 1 3
15 1 0 1 2
16 2 2 2 4
Then you can keep the rows whose group size exceeds 2 and drop the helper column:
df[df.Info > 2].drop(columns='Info')
Out[447]:
person id day
0 1 2 1
1 1 2 1
2 1 2 1
3 1 2 1
4 1 1 1
5 1 1 1
6 1 1 1
9 2 2 2
10 2 2 2
11 2 2 2
12 1 3 1
13 1 3 1
14 1 3 1
16 2 2 2
Alternatively, groupby.filter does it in one line (concise, though typically slower than the transform-based mask above on large frames):
df.groupby(['person', 'id', 'day']).filter(lambda x: x.shape[0] > 2)
Reference: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.DataFrameGroupBy.filter.html
I have the following short dataframe:
A B C
1 1 3
2 1 3
3 2 3
4 2 3
5 0 0
I want the output to look like this:
A B C
1 1 3
2 1 3
3 0 0
4 0 0
5 0 0
1 1 3
2 1 3
3 2 3
4 2 3
5 0 0
Use pd.MultiIndex.from_product with the unique values of A and B, then reindex against it, filling missing combinations with 0:
cols = list('AB')
mux = pd.MultiIndex.from_product([df.A.unique(), df.B.unique()], names=cols)
df.set_index(cols).reindex(mux, fill_value=0).reset_index()
A B C
0 1 1 3
1 1 2 0
2 1 0 0
3 2 1 3
4 2 2 0
5 2 0 0
6 3 1 0
7 3 2 3
8 3 0 0
9 4 1 0
10 4 2 3
11 4 0 0
12 5 1 0
13 5 2 0
14 5 0 0