I have a df like this:
0 1 2 3 4 5
abc 0 1 0 0 1
bcd 0 0 1 0 0
def 0 0 0 1 0
How can I replace each 1 in the dataframe with the name of the column it sits in?
The desired output looks like this:
0 1 2 3 4 5
abc 0 2 0 0 5
bcd 0 0 3 0 0
def 0 0 0 4 0
Let us try
df.loc[:,'1':] = df.loc[:,'1':] * df.columns[1:].astype(int)
df
Out[468]:
0 1 2 3 4 5
0 abc 0 2 0 0 5
1 bcd 0 0 3 0 0
2 def 0 0 0 4 0
We can use np.where over the whole dataframe:
values = np.where(df.eq(1), df.columns, df)
df = pd.DataFrame(values, columns=df.columns)
0 1 2 3 4 5
0 abc 0 2 0 0 5
1 bcd 0 0 3 0 0
2 def 0 0 0 4 0
I'd suggest simply applying the logic column by column: wherever the value in a given column is 1, set it to the column name.
for col in df.columns:
    df.loc[df[col] == 1, col] = col
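A self-contained sketch of the loop approach, assuming the column labels are the strings '0' to '5' and that the replacement values should be integers rather than strings. The label column '0' is skipped, and `result` is a name introduced here for illustration.

```python
import pandas as pd

# Rebuild the example frame from the question (string column labels assumed).
df = pd.DataFrame(
    {
        "0": ["abc", "bcd", "def"],
        "1": [0, 0, 0],
        "2": [1, 0, 0],
        "3": [0, 1, 0],
        "4": [0, 0, 1],
        "5": [1, 0, 0],
    }
)

result = df.copy()
# Skip the first (text) column and write the label back as an integer,
# so the numeric columns keep an integer dtype.
for col in result.columns[1:]:
    result.loc[result[col] == 1, col] = int(col)

print(result)
```

Converting the label with `int(col)` avoids mixing strings into numeric columns; if string labels are wanted instead, the columns would need an object dtype first.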
Related
I have a dataframe df
ID ID2 escto1 escto2 escto3
1 A 1 0 0
2 B 0 1 0
3 C 0 0 3
4 D 0 2 0
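I want to select the columns either by position or by a wildcard on the column name, like 'escto*', and set a helper column to 1 when any of them is greater than 0. In pseudocode:
if df.iloc[:, 2:]>0 then df.helper=1
or
df.loc[(df.iloc[:, 3:]>0,'Transfer')]=1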
So that output becomes
ID ID2 escto1 escto2 escto3 helper
1 A 1 0 0 1
2 B 0 1 0 1
3 C 0 0 3 1
4 D 0 2 0 1
One option is to use the boolean output of the comparison:
df.assign(helper = df.filter(like='escto').gt(0).any(axis=1).astype(int))
ID ID2 escto1 escto2 escto3 helper
0 1 A 1 0 0 1
1 2 B 0 1 0 1
2 3 C 0 0 3 1
3 4 D 0 2 0 1
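For the wildcard part of the question, a minimal sketch using a regular expression anchored at the start of the column name, which is stricter than `like='escto'` (that matches the substring anywhere). The variable name `escto_cols` is introduced here for illustration.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "ID2": ["A", "B", "C", "D"],
        "escto1": [1, 0, 0, 0],
        "escto2": [0, 1, 0, 2],
        "escto3": [0, 0, 3, 0],
    }
)

# Keep only columns whose name starts with "escto".
escto_cols = df.filter(regex=r"^escto")

# 1 if any selected column in the row is greater than 0, else 0.
df["helper"] = escto_cols.gt(0).any(axis=1).astype(int)

print(df)
```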
I want to join two dataframes keeping all differing values.
Should be easy, but I did not find a related post in here.
DF1:
0 1 2 3 4
0 0 0 0 0 1
1 0 0 0 0 1
2 0 0 0 0 1
DF2:
0 1 2 3 4
0 0 0 2 0 0
1 0 0 2 0 0
2 0 0 2 0 0
Result:
0 1 2 3 4
0 0 0 2 0 1
1 0 0 2 0 1
2 0 0 2 0 1
If both have the same dimensions and are filled with zeros as in your example, you can simply sum them up:
df1 = pd.DataFrame(data = [[0,0,0,1],[0,0,0,1]])
df2 = pd.DataFrame(data = [[0,2,0,0],[0,2,0,0]])
df1 + df2
0 1 2 3
0 0 2 0 1
1 0 2 0 1
But maybe you want a more flexible answer
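One possible reading of "more flexible", sketched here as an assumption rather than the answer's own continuation: keep the values of the first frame wherever they are nonzero and fill the remaining cells from the second frame. Unlike addition, this does not add values together when both frames are nonzero in the same cell (the first frame wins).

```python
import pandas as pd

# Frames from the question: three rows, five columns.
df1 = pd.DataFrame([[0, 0, 0, 0, 1]] * 3)
df2 = pd.DataFrame([[0, 0, 2, 0, 0]] * 3)

# Where df1 is nonzero keep df1, otherwise take the aligned value from df2.
result = df1.where(df1.ne(0), df2)

print(result)
```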
I have a pandas dataframe:
Index 0 1 2
0 0 1 0
1 0 0 1
2 1 0 0
3 0 0 1
How do I create a new dataframe that holds, for each row, the name of the column where the value is 1?
Expected output:
Index type
0 1
1 2
2 0
3 2
Use DataFrame.dot if the columns contain only 0 and 1 values:
# if Index is the index, not a column
df['type'] = df.dot(df.columns)
# if Index is a column, or the first column must be omitted
#df['type'] = df.iloc[:, 1:].dot(df.columns[1:])
print (df)
0 1 2 type
Index
0 0 1 0 1
1 0 0 1 2
2 1 0 0 0
3 0 0 1 2
The solution also works correctly if a row has no 1 value; it then returns an empty string:
df['type'] = df.dot(df.columns)
print (df)
0 1 2 type
Index
0 0 0 0
1 0 0 1 2
2 1 0 0 0
3 0 0 1 2
Here's a way using np.nonzero, assuming exactly one nonzero value per row (it returns column positions rather than labels):
_, df['type'] = np.nonzero(df.values)
print(df)
0 1 2 type
0 0 1 0 1
1 0 0 1 2
2 1 0 0 0
3 0 0 1 2
As this looks like dummy-encoded data, you can also use pandas.DataFrame.idxmax:
>>> df['type'] = df.idxmax(axis=1)
>>> df
0 1 2 type
0 0 1 0 1
1 0 0 1 2
2 1 0 0 0
3 0 0 1 2
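One caveat with idxmax: for a row that contains no 1 at all, it still returns the first column label. A minimal sketch of guarding against that, assuming string column labels and using a modified example with one all-zero row; `has_one` and `df_type` are names introduced here for illustration.

```python
import pandas as pd

# Example with an all-zero row at position 1 (modified from the question).
df = pd.DataFrame(
    [[0, 1, 0], [0, 0, 0], [1, 0, 0], [0, 0, 1]],
    columns=["0", "1", "2"],
)

# True for rows that contain at least one 1.
has_one = df.eq(1).any(axis=1)

# idxmax gives the label of the first maximum; blank it out where no 1 exists.
df_type = df.idxmax(axis=1).where(has_one)

print(df_type)
```

Rows without a 1 end up as missing values instead of silently pointing at column '0'.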
I have a dataframe where some cells contain lists of multiple values. How can I create new columns based on the unique values of those lists? The lists can contain values already seen in previous observations, and they can also be empty. How do I create one new column per value (one-hot encoding)?
Edit: note that each list is stored as a string (the data is within quotation marks):
data = {'tokens': ['["Spain", "Germany", "England", "Japan"]',
'["Spain", "Germany"]',
'["Morocco"]',
'[]',
'["Japan"]',
'[]']}
my_new_pd = pd.DataFrame(data)
0 ["Spain", "Germany", "England", "Japan"]
1 ["Spain", "Germany"]
2 ["Morocco"]
3 []
4 ["Japan", ""]
5 []
Name: tokens, dtype: object
I want something like
tokens_Spain tokens_Germany tokens_England tokens_Japan tokens_Morocco
0 1 1 1 1 0
1 1 1 0 0 0
2 0 0 0 0 1
3 0 0 0 0 0
4 0 0 1 1 0
5 0 0 0 0 0
Method one uses sklearn, since you already have a list-type column in your dataframe:
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
yourdf=pd.DataFrame(mlb.fit_transform(df['tokens']),columns=mlb.classes_, index=df.index)
Method two: explode first, then build the dummies (sum(level=0) was removed in pandas 2.0, so group by the index level instead):
df['tokens'].explode().str.get_dummies().groupby(level=0).sum().add_prefix('tokens_')
tokens_A tokens_B tokens_C tokens_D tokens_Z
0 1 1 1 1 0
1 1 1 0 0 0
2 0 0 0 0 1
3 0 0 0 0 0
4 0 0 0 1 1
5 0 0 0 0 0
Method three is kind of like an "explode" on axis=0, followed by summing the duplicated column names:
pd.get_dummies(pd.DataFrame(df.tokens.tolist()),prefix='tokens',prefix_sep='_').T.groupby(level=0,sort=False).sum().T
tokens_A tokens_D tokens_Z tokens_B tokens_C
0 1 1 0 1 1
1 1 0 0 1 0
2 0 0 1 0 0
3 0 0 0 0 0
4 0 1 1 0 0
5 0 0 0 0 0
Update
df['tokens'].explode().str.get_dummies().groupby(level=0).sum().add_prefix('tokens_')
tokens_England tokens_Germany tokens_Japan tokens_Morocco tokens_Spain
0 1 1 1 0 1
1 0 1 0 0 1
2 0 0 0 1 0
3 0 0 0 0 0
4 1 0 1 0 0
5 0 0 0 0 0
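Since the edited question stores each list as a string, the methods above need real lists first. A minimal sketch, assuming every cell is valid JSON (as in the `data` dictionary shown in the question) so that `json.loads` can parse it; `parsed` and `dummies` are names introduced here for illustration.

```python
import json

import pandas as pd

data = {
    "tokens": [
        '["Spain", "Germany", "England", "Japan"]',
        '["Spain", "Germany"]',
        '["Morocco"]',
        "[]",
        '["Japan"]',
        "[]",
    ]
}
df = pd.DataFrame(data)

# Turn each JSON string into an actual Python list.
parsed = df["tokens"].apply(json.loads)

# One row per list element (empty lists become a single NaN row),
# one-hot encode, then collapse back to one row per original index.
dummies = (
    pd.get_dummies(parsed.explode(), prefix="tokens", dtype=int)
    .groupby(level=0)
    .sum()
)

print(dummies)
```

Rows that came from empty lists stay in the result with all zeros, because the NaN produced by `explode` is not encoded as a category.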
I have a dataframe with the following column:
df = pd.DataFrame({"A": [1,2,1,2,2,2,0,1,0]})
and I want:
df2 = pd.DataFrame({"0": [0,0,0,0,0,0,1,0,1],"1": [1,0,1,0,0,0,0,1,0],"2": [0,1,0,1,1,1,0,0,0]})
Is there an elegant way of doing this in a one-liner?
Note:
I can do this using df['0'] = df['A'].apply(find_zeros)
I don't mind if 'A' is included in the final result.
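Use get_dummies (dtype=int keeps the 0/1 output shown below; recent pandas versions return booleans by default):
df2 = pd.get_dummies(df.A, dtype=int)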
print (df2)
0 1 2
0 0 1 0
1 0 0 1
2 0 1 0
3 0 0 1
4 0 0 1
5 0 0 1
6 1 0 0
7 0 1 0
8 1 0 0
In [50]: df.A.astype(str).str.get_dummies()
Out[50]:
0 1 2
0 0 1 0
1 0 0 1
2 0 1 0
3 0 0 1
4 0 0 1
5 0 0 1
6 1 0 0
7 0 1 0
8 1 0 0
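A short check that get_dummies reproduces the requested frame, assuming string column labels are wanted as in the question's `df2`. The name `expected` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 1, 2, 2, 2, 0, 1, 0]})

# One indicator column per distinct value of A, as integers.
df2 = pd.get_dummies(df["A"], dtype=int)

# The question's target uses string labels "0", "1", "2".
df2.columns = df2.columns.astype(str)

expected = pd.DataFrame(
    {
        "0": [0, 0, 0, 0, 0, 0, 1, 0, 1],
        "1": [1, 0, 1, 0, 0, 0, 0, 1, 0],
        "2": [0, 1, 0, 1, 1, 1, 0, 0, 0],
    }
)

print(df2)
```

To keep 'A' alongside the indicators, `df.join(df2)` would attach them by index.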