Delete keys with missing values? - python

My dataframe is
ID Alphabet Number1 Number2
1 A NaN 9
1 A 3 5
1 A 1 4
1 A 2 4
2 B 7 3
2 B 2 8
2 B 4 1
2 B 8 5
3 C 2 2
3 C 1 9
4 D 2 3
4 D 6 2
4 D 8 NaN
I got unique Alphabets by doing
df.groupby('Alphabet')['ID'].nunique()
and the result is
A 1
B 1
C 1
D 1
but I want to store only the Alphabets that do NOT have any missing data in them.
I want the result to look like
B 1
C 1
and from this console result, how would I store "B" and "C" into a list?

IIUC, using all()
s = df.groupby('Alphabet').apply(lambda x: x.notnull().all()).all(axis=1)
df.groupby('Alphabet').ID.nunique()[s[s].index]
Out[1082]:
Alphabet
B 1
C 1
Name: ID, dtype: int64
Or
df.loc[df.Alphabet.isin(s[s].index)].groupby('Alphabet').ID.nunique()
Out[1095]:
Alphabet
B 1
C 1
Name: ID, dtype: int64
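To answer the last part of the question, the group labels without missing data are simply the True entries of s, so (a small sketch reusing the s computed above):
alphabet_list = s[s].index.tolist()
print(alphabet_list)
# ['B', 'C']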

Related

pandas dataframe duplicate values count not properly working

The value counts are df['ID'].value_counts().values, which gives:
array([4, 3, 3, 1], dtype=int64)
input:
ID emp
a 1
a 1
b 1
a 1
b 1
c 1
c 1
a 1
b 1
c 1
d 1
When I shuffle the ID column and then run
df.loc[~df.duplicated(keep='first', subset=['ID']), 'emp']= df['ID'].value_counts().values
output:
ID emp
a 4
c 3
d 3
c 1
b 1
a 1
c 1
a 1
b 1
b 1
a 1
expected result:
ID emp
a 4
c 3
d 1
c 1
b 3
a 1
c 1
a 1
b 1
b 1
a 1
Problem: the count is assigned without checking which ID it belongs to.
The problem is that df['ID'].value_counts() returns a Series with a different number of values than the original data. To fill the new column with the counts, use Series.map:
df.loc[~df.duplicated(subset=['ID']), 'emp'] = df['ID'].map(df['ID'].value_counts())
Or use GroupBy.transform with 'size':
df.loc[~df.duplicated(subset=['ID']), 'emp'] = df.groupby('ID')['ID'].transform('size')
The output Series has only 4 values and cannot be assigned back directly, because its index (df['ID'].value_counts().index) differs from df.index:
print (df['ID'].value_counts())
a 4
b 3
c 3
d 1
Name: ID, dtype: int64
If you convert it to a NumPy array, the values are assigned purely by position: ~df.duplicated(subset=['ID']) is True four times (once for each of the groups a, b, c, d), so the four counts are written in the order 4, 3, 3, 1 rather than matched to their IDs, which is the reason for the wrong output:
print (df['ID'].value_counts().values)
[4 3 3 1]
What is needed is a new column (Series) aligned with df.index:
print (df['ID'].map(df['ID'].value_counts()))
0 4
1 4
2 3
3 4
4 3
5 3
6 3
7 4
8 3
9 3
10 1
Name: ID, dtype: int64
print (df.groupby('ID')['ID'].transform('size'))
0 4
1 4
2 3
3 4
4 3
5 3
6 3
7 4
8 3
9 3
10 1
Name: ID, dtype: int64
For your given sample dataframe, df.loc[~df.duplicated(keep='first', subset=['ID']), 'emp'] = df['ID'].value_counts().values happens to produce the desired output, but only because the row order matches the value_counts order; a safer version is:
cond = ~df.duplicated(keep='first', subset=['ID'])
df.loc[cond, 'emp'] = df.loc[cond, 'ID'].map(df['ID'].value_counts())
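A minimal, self-contained sketch of the map approach, reconstructing the shuffled input from the expected output above:
import pandas as pd

df = pd.DataFrame({'ID': ['a', 'c', 'd', 'c', 'b', 'a', 'c', 'a', 'b', 'b', 'a'],
                   'emp': 1})
cond = ~df.duplicated(subset=['ID'])
# value_counts gives a=4, b=3, c=3, d=1; map aligns each count with its row's ID
df.loc[cond, 'emp'] = df['ID'].map(df['ID'].value_counts())
print(df)  # the first occurrence of each ID carries its count, the rest keep 1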

Python Counting Same Values For Specific Columns

If I have a dataframe:
A B C D
1 1 2 2 1
2 1 1 2 1
3 3 1 0 1
4 2 4 4 4
I want to add columns B and C together and count how many of the sums differ from column D. The desired output is:
A B C B+C D
1 1 2 2 4 1
2 1 1 2 3 1
3 3 1 0 1 1
4 2 4 4 8 4
There are 3 rows where "B+C" differs from "D".
Could you please help me with this?
You could do something like:
df.B.add(df.C).ne(df.D).sum()
# 3
If you need to add the column:
df['B+C'] = df.B.add(df.C)
diff = df['B+C'].ne(df.D).sum()
print(f'There are {diff} rows where "B+C" differs from "D"')
# There are 3 rows where "B+C" differs from "D"
df.insert(3, 'B+C', df['B'] + df['C'])
Here 3 is the column position at which 'B+C' is inserted.
df.head()
A B C B+C D
0 1 2 2 4 1
1 1 1 2 3 1
2 3 1 0 1 1
3 2 4 4 8 4
After that you can follow the steps of @yatu:
df['B+C'].ne(df['D'])
0 True
1 True
2 False
3 True
dtype: bool
df['B+C'].ne(df['D']).sum()
3
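For reference, a self-contained sketch combining both answers (reconstructing the sample frame from the question):
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3, 2],
                   'B': [2, 1, 1, 4],
                   'C': [2, 2, 0, 4],
                   'D': [1, 1, 1, 4]})
df.insert(3, 'B+C', df['B'] + df['C'])  # insert the sum at column position 3
diff = df['B+C'].ne(df['D']).sum()      # count rows where the sum differs from D
print(diff)  # 3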

How to pandas groupby one column and filter dataframe based on the minimum unique values of another column?

I have a data frame that looks like this:
CP AID type
1 1 b
1 2 b
1 3 a
2 4 a
2 4 b
3 5 b
3 6 a
3 7 b
I would like to group by the CP column and filter so it only returns rows where the CP has at least 3 unique values in the AID column.
The result should look like this:
CP AID type
1 1 b
1 2 b
1 3 a
3 5 b
3 6 a
3 7 b
You can groupby in combination with unique:
m = df.groupby('CP').AID.transform('unique').str.len() >= 3
print(df[m])
CP AID type
0 1 1 b
1 1 2 b
2 1 3 a
5 3 5 b
6 3 6 a
7 3 7 b
Or as RafaelC mentioned in the comments:
m = df.groupby('CP').AID.transform('nunique').ge(3)
print(df[m])
CP AID type
0 1 1 b
1 1 2 b
2 1 3 a
5 3 5 b
6 3 6 a
7 3 7 b
You can also do it with an explicit count, as long as you use nunique so that duplicate AID values within a CP are not over-counted, and >= 3 so that "at least 3" is honored:
count = df1[['CP', 'AID']].groupby('CP')['AID'].nunique().reset_index()
df1 = df1[df1['CP'].isin(count.loc[count['AID'] >= 3, 'CP'])]
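A third option, not in the answers above, is GroupBy.filter, which keeps whole groups satisfying a predicate; a sketch (typically slower than the transform-based masks on large frames, but arguably the most readable):
out = df.groupby('CP').filter(lambda g: g['AID'].nunique() >= 3)
print(out)  # rows for CP 1 and 3 only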

Pandas reverse column values groupwise

I want to reverse a column's values in my dataframe, but only on an individual "groupby" level. Below is a minimal demonstration example, where I want to "flip" values that belong to the same letter A, B or C:
df = pd.DataFrame({"group": ["A", "A", "A", "B", "B", "B", "B", "C", "C"],
                   "value": [1, 3, 2, 4, 4, 2, 3, 2, 5]})
group value
0 A 1
1 A 3
2 A 2
3 B 4
4 B 4
5 B 2
6 B 3
7 C 2
8 C 5
My desired output looks like this (the column is added instead of replaced purely for demonstration purposes):
group value value_desired
0 A 1 2
1 A 3 3
2 A 2 1
3 B 4 3
4 B 4 2
5 B 2 4
6 B 3 4
7 C 2 5
8 C 5 2
As always, when I don't see a proper vectorized approach, I end up messing with loops just for the sake of the final output, and my current code hurts me very much:
for i in list(set(df["group"].values.tolist())):
    reversed_group = df.loc[df["group"] == i, "value"].values.tolist()[::-1]
    df.loc[df["group"] == i, "value_desired"] = reversed_group
Pandas gurus, please show me the way :)
You can use transform
In [900]: df.groupby('group')['value'].transform(lambda x: x[::-1])
Out[900]:
0 2
1 3
2 1
3 3
4 2
5 4
6 4
7 5
8 2
Name: value, dtype: int64
Details
In [901]: df['value_desired'] = df.groupby('group')['value'].transform(lambda x: x[::-1])
In [902]: df
Out[902]:
group value value_desired
0 A 1 2
1 A 3 3
2 A 2 1
3 B 4 3
4 B 4 2
5 B 2 4
6 B 3 4
7 C 2 5
8 C 5 2
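One caveat, as an assumption about newer pandas versions rather than part of the original answer: because x[::-1] keeps its original index, some versions will re-align the result and silently undo the reversal. Passing the underlying array sidesteps index alignment:
# reverse the raw values so pandas cannot re-align them by index
df['value_desired'] = df.groupby('group')['value'].transform(lambda x: x.values[::-1])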

Handling duplicate rows in python

I have a data frame df, let's say with 5 columns: a, b, c, d, e.
a b c d e
1 6 x 8 3
2 3 y 2 3
3 5 d 1 1
3 4 g 3 4
5 3 z 3 1
This is what I want to do: for all the rows with the same value in column a, I want to drop duplicates, but the value of column b should be summed across those rows, and for the rest of the columns I want to keep the first value.
Final Data frame will be :
a b c d e
1 6 x 8 3
2 3 y 2 3
3 9 d 1 1
5 3 z 3 1
How to do this?
I'd assign to column 'b' the result of grouping on 'a' and summing; you can then drop the duplicates:
In [171]:
df['b'] = df.groupby('a')['b'].transform('sum')
df
Out[171]:
a b c d e
0 1 6 x 8 3
1 2 3 y 2 3
2 3 9 d 1 1
3 3 9 g 3 4
4 5 3 z 3 1
In [172]:
df.drop_duplicates('a')
Out[172]:
a b c d e
0 1 6 x 8 3
1 2 3 y 2 3
2 3 9 d 1 1
4 5 3 z 3 1
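The same result in a single step, as a sketch using an aggregation dict over the same columns:
out = df.groupby('a', as_index=False).agg({'b': 'sum', 'c': 'first', 'd': 'first', 'e': 'first'})
print(out)  # matches the frame above, with a fresh 0..3 index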
