I have a dataframe that looks like this:
     A    B  C    D
0    0  abc  0  cdf
1  abf    0  0  afg
And I want to replace any string value with 1.
The expected outcome should look like:
   A  B  C  D
0  0  1  0  1
1  1  0  0  1
Any help on how to do this is appreciated.
The safe way:
df.apply(pd.to_numeric, errors='coerce').fillna(1)
Out[217]:
A B C D
0 0.0 1.0 0 1.0
1 1.0 0.0 0 1.0
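The floats appear because the intermediate NaN forces a float dtype; if you want integers back, one extra cast does it (my addition, assuming every remaining value is a whole number):
df.apply(pd.to_numeric, errors='coerce').fillna(1).astype(int)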
And for the case shown, a more direct route:
(~df.isin([0,'0'])).astype(int)
Out[221]:
A B C D
0 0 1 0 1
1 1 0 0 1
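If the requirement is literally "replace any string with 1" rather than "replace anything non-zero with 1", a type-based sketch along the same lines (my own variant built on the sample frame; note that applymap is deprecated in favor of DataFrame.map from pandas 2.1):

import pandas as pd

df = pd.DataFrame({'A': [0, 'abf'], 'B': ['abc', 0],
                   'C': [0, 0], 'D': ['cdf', 'afg']})

# Replace every string cell with 1; numeric cells pass through unchanged
res = df.applymap(lambda x: 1 if isinstance(x, str) else x)
print(res)
#    A  B  C  D
# 0  0  1  0  1
# 1  1  0  0  1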
I have a dictionary like this:
Dictionary:
{1: ['A','B','B','C'],
 2: ['A','B','C','D','D','E','E','E','E'],
 3: ['C','C','C','D','D','D','D']}
I want to convert this dictionary into a data frame with the keys as the index and the distinct list values as columns, where each cell holds the count of that value, like this:
DataFrame:
A B C D E
1 1 2 1 0 0
2 1 1 1 2 4
3 0 0 3 4 0
Please help me with how I can achieve this data frame!
You can utilize Counter here:
from collections import Counter
import pandas as pd
d = {1: ['A','B','B','C'],
     2: ['A','B','C','D','D','E','E','E','E'],
     3: ['C','C','C','D','D','D','D']}
count_dict = {k: Counter(v) for k, v in d.items()}
res = pd.DataFrame.from_dict(count_dict, orient='index').fillna(0).astype('int')
print(res)
# A B C D E
# 1 1 2 1 0 0
# 2 1 1 1 2 4
# 3 0 0 3 4 0
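Since Counter is a dict subclass, pd.DataFrame accepts the counters directly, so the same idea compresses to a one-liner (my own rewrite, reusing d and Counter from above):

res = pd.DataFrame({k: Counter(v) for k, v in d.items()}).T.fillna(0).astype(int)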
You can do it like this also:
pd.DataFrame.from_dict(d, orient='index').T.apply(pd.Series.value_counts).T.fillna(0)
Output:
A B C D E
1 1.0 2.0 1.0 0.0 0.0
2 1.0 1.0 1.0 2.0 4.0
3 0.0 0.0 3.0 4.0 0.0
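The float counts are just an artifact of fillna on columns that contained NaN; chaining .astype(int) restores integers (a small tweak, not part of the original answer):

pd.DataFrame.from_dict(d, orient='index').T.apply(pd.Series.value_counts).T.fillna(0).astype(int)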
Let us try explode with pd.crosstab:
s = pd.Series(d).explode()
out = pd.crosstab(s.index, s)
Out[257]:
col_0  A  B  C  D  E
row_0
1      1  2  1  0  0
2      1  1  1  2  4
3      0  0  3  4  0
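The col_0/row_0 labels are just crosstab's default axis names; clearing them is purely cosmetic (my addition, reusing s from above):

out = pd.crosstab(s.index, s)
out.index.name = None     # drops the row_0 label
out.columns.name = None   # drops the col_0 label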
Suppose I have the following dataframe:
A B C D E F
1 1 1 0 0 0
0 0 0 0 0 0
1 1 0.9 1 0 0
0 0 0 0 -1.95 0
0 0 0 0 2.75 0
1 1 1 1 1 1
I want to select rows that contain only zeros and ones (and at least one of each) in columns C, D, E and F. For this example, the expected output is
A B C D E F
1 1 1 0 0 0
How can I do this with considering a range of columns in pandas?
Thanks in advance.
Let's try boolean indexing with loc to filter the rows:
c = ['C', 'D', 'E', 'F']
df.loc[df[c].isin([0, 1]).all(1) & df[c].eq(0).any(1) & df[c].eq(1).any(1)]
Result:
A B C D E F
0 1 1 1.0 0 0.0 0
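For reference, a self-contained version of that answer, rebuilding the sample frame from the question (the exact dtypes are my assumption):

import pandas as pd

df = pd.DataFrame({'A': [1, 0, 1, 0, 0, 1],
                   'B': [1, 0, 1, 0, 0, 1],
                   'C': [1, 0, 0.9, 0, 0, 1],
                   'D': [0, 0, 1, 0, 0, 1],
                   'E': [0, 0, 0, -1.95, 2.75, 1],
                   'F': [0, 0, 0, 0, 0, 1]})

c = ['C', 'D', 'E', 'F']
# keep rows whose C..F values are drawn only from {0, 1}
# and contain at least one 0 and at least one 1
mask = df[c].isin([0, 1]).all(axis=1) & df[c].eq(0).any(axis=1) & df[c].eq(1).any(axis=1)
print(df.loc[mask])
#    A  B    C  D    E  F
# 0  1  1  1.0  0  0.0  0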
Try apply and loc, restricting the check to the four relevant columns:
print(df.loc[df[['C', 'D', 'E', 'F']].apply(lambda x: sorted(x.drop_duplicates().tolist()) == [0, 1], axis=1)])
Output:
A B C D E F
0 1 1 1.0 0 0.0 0
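An equivalent set-based phrasing of the same check, reusing df and c from the sketch above (my own variant, relying on 0 == 0.0 for set membership):

mask = df[c].apply(lambda row: set(row) == {0, 1}, axis=1)
print(df.loc[mask])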
I want the below solution in pandas (Python 3.5). I have a partial solution in SQL in an earlier post.
Hi, I have a dataframe as below with thousands of IDs. Each ID has sub-IDs within it, as shown. The sub-IDs may change on a daily basis: a new sub-ID may be added, or an existing sub-ID may be lost.
I need to create 2 new columns which will flag whenever a sub-ID is added or lost.
So, in the below format you can see that on the 12th a new sub-ID 'd' is added, and on the 13th an existing sub-ID ('c') is lost. I want to create a new column/flag to track these sub-IDs. Can you please help me with this?
When a sub-ID gets removed, I would like it to have an additional row, with the IS_REMOVED column = 1 on the date it is actually removed. The sample input/output dataframes are below. Thanks.
Sample input dataframe:
ID Sub Id Date
1 a 3/11/2016
1 b 3/11/2016
1 c 3/11/2016
1 a 3/12/2016
1 b 3/12/2016
1 c 3/12/2016
1 d 3/12/2016
1 a 3/13/2016
1 b 3/13/2016
1 d 3/13/2016
Sample Output:
ID SUBID UPDDATE IS_NEW IS_REMOVED
1 a 2016-03-11 0 0
1 b 2016-03-11 0 0
1 c 2016-03-11 0 0
1 a 2016-03-12 0 0
1 b 2016-03-12 0 0
1 c 2016-03-12 0 0
1 d 2016-03-12 1 0
1 a 2016-03-13 0 0
1 b 2016-03-13 0 0
1 c 2016-03-13 0 1
1 d 2016-03-13 0 0
One way you could do this, and visualize the results as you go, is to use pd.crosstab:
df_out = pd.crosstab([df['ID'], df['Date']], df['Sub Id'])
df_diff = df_out.diff().fillna(0).stack()
pd.concat([df.set_index(['ID', 'Date', 'Sub Id']),
           df_diff.eq(1).mul(1).rename('IS_NEW'),
           df_diff.eq(-1).mul(1).rename('IS_REMOVED')], axis=1)\
  .reset_index()
Output:
ID Date Sub Id IS_NEW IS_REMOVED
0 1 2016-03-11 a 0 0
1 1 2016-03-11 b 0 0
2 1 2016-03-11 c 0 0
3 1 2016-03-11 d 0 0
4 1 2016-03-12 a 0 0
5 1 2016-03-12 b 0 0
6 1 2016-03-12 c 0 0
7 1 2016-03-12 d 1 0
8 1 2016-03-13 a 0 0
9 1 2016-03-13 b 0 0
10 1 2016-03-13 c 0 1
11 1 2016-03-13 d 0 0
Visualize results:
print(df_out)
Sub Id a b c d
ID Date
1 2016-03-11 1 1 1 0
2016-03-12 1 1 1 1
2016-03-13 1 1 0 1
print(df_out.diff().fillna(0))
Sub Id a b c d
ID Date
1 2016-03-11 0.0 0.0 0.0 0.0
2016-03-12 0.0 0.0 0.0 1.0
2016-03-13 0.0 0.0 -1.0 0.0
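Putting the whole answer together as a runnable sketch (my reconstruction; the input is rebuilt from the sample, with Date parsed to datetime):

import pandas as pd

df = pd.DataFrame({'ID': [1] * 10,
                   'Sub Id': list('abcabcdabd'),
                   'Date': pd.to_datetime(['3/11/2016'] * 3
                                          + ['3/12/2016'] * 4
                                          + ['3/13/2016'] * 3)})

# in the diffed crosstab, +1 means the sub-ID appeared that day, -1 that it disappeared
df_out = pd.crosstab([df['ID'], df['Date']], df['Sub Id'])
df_diff = df_out.diff().fillna(0).stack()

res = pd.concat([df.set_index(['ID', 'Date', 'Sub Id']),
                 df_diff.eq(1).mul(1).rename('IS_NEW'),
                 df_diff.eq(-1).mul(1).rename('IS_REMOVED')],
                axis=1).reset_index()
print(res)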
I have the following dataframes:
The MAIN dataframe A:
A B
0 1 0
1 1 0
The second dataframe B:
A B
0 0 1
1 1 0
The third dataframe C:
A B C
0 1 0 0
1 0 1 1
2 0 0 1
In Python pandas, I want to add A, B and C in such a way that the resulting dataframe D has the same column and row structure as the MAIN dataframe A, while the values of the common rows/columns are added.
A + B + C
A B
0 2 1
1 2 1
And by union addition, I mean that if a value is > 1, make it 1. So the final
A + B + C is:
A B
0 1 1
1 1 1
As you can see, the structure of the first dataframe A is maintained while the values from common rows and columns are added. The common rows and columns vary, so I need code that detects them automatically. Any ideas how to do this?
UPDATE
Please note that the data frames can be multi-indexed:
For example:
A
A B
0 a 2 1
1 a 2 1
C
A B C
0 a 1 0 0
0 b 1 0 0
0 b 1 0 0
1 a 0 1 1
2 c 0 0 1
In this case I am expecting A + C to be:
A B
0 a 3 1
1 a 2 2
Thereby keeping the structure of MAIN dataframe A. Then 'binarized' to
A B
0 a 1 1
1 a 1 1
((dfA + dfB + dfC).reindex(index=dfA.index, columns=dfA.columns) >= 1).astype(int)
Out[252]:
A B
0 1 1
1 1 1
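A self-contained version of that expression with the sample frames rebuilt from the question (my reconstruction):

import pandas as pd

dfA = pd.DataFrame({'A': [1, 1], 'B': [0, 0]})
dfB = pd.DataFrame({'A': [0, 1], 'B': [1, 0]})
dfC = pd.DataFrame({'A': [1, 0, 0], 'B': [0, 1, 0], 'C': [0, 1, 1]})

# outer-aligned sum, cut back to dfA's rows/columns, then binarized
res = ((dfA + dfB + dfC).reindex(index=dfA.index, columns=dfA.columns) >= 1).astype(int)
print(res)
#    A  B
# 0  1  1
# 1  1  1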
Updated:
(A + C).reindex(A.index, columns=A.columns)
Out[297]:
A B
0 a 3.0 1.0
1 a 2.0 2.0
IIUC:
In [56]: (d1 + d2 + d3).dropna(how='all').dropna(axis=1, how='all').ne(0).astype(np.int8)
Out[56]:
A B
0 1 1
1 1 1
UPDATE:
In [129]: idx = A.index.intersection(C.index)
In [131]: (A.loc[idx] | C.loc[idx, A.columns]).gt(0).astype('int8')
Out[131]:
A B
0 a 1 1
1 a 1 1
Depending a bit on how much of your given structure will generalize,
In [50]: df_a | df_b | df_c.loc[df_a.index, df_a.columns]
Out[50]:
A B
0 1 1
1 1 1
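If partial overlap should contribute to the sum rather than produce NaN before the reindex, add with fill_value is an alternative to plain + (my own variant, reusing dfA/dfB/dfC from the sketch earlier; on this data both give the same binarized result):

res = (dfA.add(dfB, fill_value=0)
          .add(dfC, fill_value=0)
          .reindex(index=dfA.index, columns=dfA.columns)
          .ge(1).astype(int))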
For the following data frame df1:
sentence A B C D F G
dizzy 1 1 0 0 k 1
Head 0 0 1 0 l 1
nausea 0 0 0 1 fd 1
zap 1 0 1 0 g 1
dizziness 0 0 0 1 V 1
I need to create a dictionary from column sentence with columns A, B, C, and D.
In the next step, I need to map the sentences column in data frame df2 to the values of A, B, C, and D. The output is like this:
sentences A B C D
dizzy 1 1 0 0
happy
Head 0 0 1 0
nausea 0 0 0 1
fill out
zap 1 0 1 0
dizziness 0 0 0 1
This is my code, but it only handles one column; I do not know how to do it for several columns:
equiv = df1.set_index('sentence')['A'].to_dict()
df2['A'] = df2['sentences'].apply(lambda x: equiv.get(x, np.nan))
Thanks.
IIUC:
Setup:
In [164]: df1
Out[164]:
sentence A B C D F G
0 dizzy 1 1 0 0 k 1
1 Head 0 0 1 0 l 1
2 nausea 0 0 0 1 fd 1
3 zap 1 0 1 0 g 1
4 dizziness 0 0 0 1 V 1
In [165]: df2
Out[165]:
sentences
0 dizzy
1 happy
2 Head
3 nausea
4 fill out
5 zap
6 dizziness
Solution:
In [174]: df2[['sentences']].merge(df1[['sentence','A','B','C','D']],
left_on='sentences',
right_on='sentence',
how='outer')
Out[174]:
sentences sentence A B C D
0 dizzy dizzy 1.0 1.0 0.0 0.0
1 happy NaN NaN NaN NaN NaN
2 Head Head 0.0 0.0 1.0 0.0
3 nausea nausea 0.0 0.0 0.0 1.0
4 fill out NaN NaN NaN NaN NaN
5 zap zap 1.0 0.0 1.0 0.0
6 dizziness dizziness 0.0 0.0 0.0 1.0
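If you'd rather stay with the lookup approach the question started from, a sketch that loops over the four target columns (my own generalization, rebuilding df1/df2 from the setup above):

import pandas as pd

df1 = pd.DataFrame({'sentence': ['dizzy', 'Head', 'nausea', 'zap', 'dizziness'],
                    'A': [1, 0, 0, 1, 0], 'B': [1, 0, 0, 0, 0],
                    'C': [0, 1, 0, 1, 0], 'D': [0, 0, 1, 0, 1],
                    'F': ['k', 'l', 'fd', 'g', 'V'], 'G': [1] * 5})
df2 = pd.DataFrame({'sentences': ['dizzy', 'happy', 'Head', 'nausea',
                                  'fill out', 'zap', 'dizziness']})

# Series.map aligns on the lookup index, leaving NaN where there is no match
lookup = df1.set_index('sentence')
for col in ['A', 'B', 'C', 'D']:
    df2[col] = df2['sentences'].map(lookup[col])
print(df2)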