I have a dataframe with many predefined columns. One of its columns holds, for each row, the name of one of the other columns.
I want to write the value 1 into the column whose name matches that string.
For example, this is my current situation:
df = pd.DataFrame(0,index=[0,1,2,3],columns = ["string","a","b","c","d"])
df["string"] = ["b", "b", "c", "a"]
string a b c d
------------------------------
b 0 0 0 0
b 0 0 0 0
c 0 0 0 0
a 0 0 0 0
And this is what I would like the desired result to be like:
string a b c d
------------------------------
b 0 1 0 0
b 0 1 0 0
c 0 0 1 0
a 1 0 0 0
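You can use get_dummies on df['string'] and update the DataFrame in place. Passing dtype=int keeps the indicators as 0/1, because recent pandas versions return booleans by default:
df.update(pd.get_dummies(df['string'], dtype=int))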
Updated df:
  string  a  b  c  d
0      b  0  1  0  0
1      b  0  1  0  0
2      c  0  0  1  0
3      a  1  0  0  0
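As a self-contained sketch of the same idea, the snippet below builds the indicator columns with get_dummies and assigns them back directly instead of calling update. The flag_cols list and the reindex step are assumptions added for illustration; they make sure that a column with no matching string (such as d) is still present and stays at 0.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame(0, index=[0, 1, 2, 3], columns=["string", "a", "b", "c", "d"])
df["string"] = ["b", "b", "c", "a"]

# Hypothetical helper list: the columns that should receive the 0/1 flags.
flag_cols = ["a", "b", "c", "d"]

# One indicator column per distinct string; reindex adds the missing
# columns (here "d") filled with 0 and fixes the column order.
dummies = pd.get_dummies(df["string"], dtype=int).reindex(columns=flag_cols, fill_value=0)

# Overwrite the flag columns with the indicators (rows align on the index).
df[flag_cols] = dummies
print(df)
```

Assigning the whole block at once avoids any dtype surprises that an in-place update might cause across pandas versions.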
You can also use boolean indexing with .loc. The general pattern is:
df.loc[df["column_name"] == "some_value", "column_name"] = "value"
In your case:
df.loc[df["string"] == "b", "b"] = 1
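The single assignment above only handles the value "b". A minimal sketch of how it might be extended to every value follows; the loop and the membership check are assumptions added for illustration, not part of the original answer.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame(0, index=[0, 1, 2, 3], columns=["string", "a", "b", "c", "d"])
df["string"] = ["b", "b", "c", "a"]

# Repeat the .loc assignment once per distinct value in "string".
for name in df["string"].unique():
    # Skip values that are not column names; otherwise .loc would
    # silently create a new column.
    if name in df.columns:
        df.loc[df["string"] == name, name] = 1

print(df)
```

This loops over the distinct values rather than over the rows, so the number of iterations is bounded by the number of flag columns.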
This is my csv file:
A B C D J
0 1 0 0 0
0 0 0 0 0
1 1 1 0 0
0 0 0 0 0
0 0 7 0 7
Each time, I need to select two columns and delete every row in which both of them are 0. For example, if I select A and B:
Input
A B
0 1
0 0
1 1
0 0
0 0
Output
A B
0 1
1 1
Then I select A and C, and so on.
I used this code for A and B, but it returns errors:
import pandas as pd
df = pd.read_csv('Book1.csv')
a=df['A']
b=df['B']
indexes_to_drop = []
for i in df.index:
    if df[(a==0) & (b==0)] :
        indexes_to_drop.append(i)
df.drop(df.index[indexes_to_drop], inplace=True )
Any help please!
First we make your desired combinations of column A with all the rest, then we use iloc to select the correct rows per column combination:
idx_ranges = [[0,i] for i in range(1, len(df.columns))]
dfs = [df[df.iloc[:, idx].ne(0).any(axis=1)].iloc[:, idx] for idx in idx_ranges]
print(dfs[0], '\n')
print(dfs[1], '\n')
print(dfs[2], '\n')
print(dfs[3])
   A  B
0  0  1
2  1  1

   A  C
2  1  1
4  0  7

   A  D
2  1  0

   A  J
2  1  0
4  0  7
Do not iterate. Create a Boolean Series to slice your DataFrame:
cols = ['A', 'B']
m = df[cols].ne(0).any(axis=1)
df.loc[m]
   A  B  C  D  J
0  0  1  0  0  0
2  1  1  1  0  0
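For reference, here is a self-contained sketch of the boolean-mask approach wrapped in a small function. The function name drop_all_zero_rows is hypothetical, and the frame is built inline instead of being read from Book1.csv so that the snippet runs on its own.

```python
import pandas as pd

# Same values as the CSV in the question, built inline.
df = pd.DataFrame({
    "A": [0, 0, 1, 0, 0],
    "B": [1, 0, 1, 0, 0],
    "C": [0, 0, 1, 0, 7],
    "D": [0, 0, 0, 0, 0],
    "J": [0, 0, 0, 0, 7],
})

def drop_all_zero_rows(frame, cols):
    """Return the selected columns, keeping rows where at least one is non-zero."""
    mask = frame[cols].ne(0).any(axis=1)  # True where any selected column != 0
    return frame.loc[mask, cols]

ab = drop_all_zero_rows(df, ["A", "B"])
ac = drop_all_zero_rows(df, ["A", "C"])
print(ab)
print(ac)
```

The original df is left untouched, so the same function can be called again for the next pair of columns.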
You can get all combinations and store them in a dict with itertools.combinations. Use .loc to select both the rows and columns you care about.
from itertools import combinations
d = {c: df.loc[df[list(c)].ne(0).any(axis=1), list(c)]
     for c in list(combinations(df.columns, 2))}
d[('A', 'B')]
# A B
#0 0 1
#2 1 1
d[('C', 'J')]
# C J
#2 1 0
#4 7 7
I have a dataframe that looks like this :
A B C
1 0 0
1 1 0
0 1 0
0 0 1
I want to replace all values with the respective column name, so that the data looks like:
A B C
A 0 0
A B 0
0 B 0
0 0 C
Afterwards, I want to create a column that is a list of all column values like so:
A B C D
A 0 0 ['A','0','0']
A B 0 ['A','B','0']
0 B 0 ['0','B','0']
0 0 C ['0','0','C']
Finally, I want to group by column D and count the number of occurrences for each pattern.
You can do this with mul, which multiplies each value by its column name:
df.mul(df.columns).replace('',0)
Out[63]:
   A  B  C
0  A  0  0
1  A  B  0
2  0  B  0
3  0  0  C
#df['D']=df.mul(df.columns).replace('',0).values.tolist()
There must be cleaner ways to achieve this, but you can use:
for column in df:
    df[column] = df[column].astype(str).replace("1", column)
df["D"] = df.values.tolist()
Output:
   A  B  C          D
0  A  0  0  [A, 0, 0]
1  A  B  0  [A, B, 0]
2  0  B  0  [0, B, 0]
3  0  0  C  [0, 0, C]
PS: W-B's answer is the cleaner way.
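Neither answer covers the last step of the question, counting how often each pattern occurs. A rough sketch of one way to do it follows. It uses numpy.where for the labelling and joins each row into a string, because lists cannot be hashed and therefore cannot serve as group keys. The fifth row is a duplicate added only so that one pattern occurs twice; the names labels, patterns and counts are hypothetical.

```python
import numpy as np
import pandas as pd

# Frame from the question plus one duplicated row (illustration only).
df = pd.DataFrame({
    "A": [1, 1, 0, 0, 1],
    "B": [0, 1, 1, 0, 1],
    "C": [0, 0, 0, 1, 0],
})

# Put the column name wherever the value is 1, and the string "0" elsewhere.
labels = np.where(df.eq(1).to_numpy(), df.columns.to_numpy(dtype=object), "0")

# Join each row into one hashable string such as "A,B,0".
patterns = pd.Series([",".join(row) for row in labels], index=df.index, name="pattern")

# Count the occurrences of each pattern.
counts = patterns.value_counts()
print(counts)
```

If the list form is needed for display, it can be kept in a separate column while the joined string is used for counting.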
I am trying to do a multiple column select then replace in pandas
df:
a b c d e
0 1 1 0 none
0 0 0 1 none
1 0 0 0 none
0 0 0 0 none
Select the positions where any or all of a, b, c, d are non-zero:
i, j = np.where(df)
s=pd.Series(dict(zip(zip(i, j),
df.columns[j]))).reset_index(-1, drop=True)
s:
0 b
0 c
1 d
2 a
Now I want to replace the values in column e by the series:
df['e'] = s.values
so that e looks like:
e:
b, c
d
a
none
But the problem is that the length of the series differs from the number of rows in the dataframe.
Any idea on how I can do this?
Use DataFrame.dot to multiply the values by the column names, strip the trailing separator with rstrip, and finally use numpy.where to replace empty strings with None:
e = df.dot(df.columns + ', ').str.rstrip(', ')
df['e'] = np.where(e.astype(bool), e, None)
print (df)
   a  b  c  d     e
0  0  1  1  0  b, c
1  0  0  0  1     d
2  1  0  0  0     a
3  0  0  0  0  None
You can locate the 1's and use their locations as boolean indexes into the dataframe columns:
df['e'] = (df==1).apply(lambda x: df.columns[x], axis=1)\
.str.join(",").replace('','none')
# a b c d e
#0 0 1 1 0 b,c
#1 0 0 0 1 d
#2 1 0 0 0 a
#3 0 0 0 0 none
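Both answers produce exactly one value per row, which is what resolves the length mismatch in the question. The sketch below shows the same row-wise idea with a plain list comprehension rather than dot or apply. It assumes the flags live in columns a to d; the name flag_cols is hypothetical.

```python
import pandas as pd

# Frame from the question.
df = pd.DataFrame({
    "a": [0, 0, 1, 0],
    "b": [1, 0, 0, 0],
    "c": [1, 0, 0, 0],
    "d": [0, 1, 0, 0],
    "e": ["none"] * 4,
})

# Assumed flag columns; "e" is excluded because it holds the result.
flag_cols = ["a", "b", "c", "d"]

# Boolean matrix: True wherever a flag is non-zero.
flags = df[flag_cols].ne(0).to_numpy()

# One joined string per row, so the result always has len(df) entries.
names = [", ".join(col for col, on in zip(flag_cols, row) if on) for row in flags]

# Rows without any flag keep the placeholder "none".
df["e"] = [name if name else "none" for name in names]
print(df)
```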
I have a pandas DataFrame that looks like the following example:
tags tag1 tag2 tag3
0 [a,b,c] 0 0 0
1 [a,b] 0 0 0
2 [b,d] 0 0 0
...
n [a,b,d] 0 0 0
I want to encode the tags as 1s in the columns tag1, tag2, tag3 if they are present in the tags array for that row.
However, I can't quite figure out how to iterate over the rows properly; my idea so far is as follows:
for i, row in dataset.iterrows():
    for tag in row[0]:
        for column in range (1,4):
            if dataset.iloc[:,column].index == tag:
                dataset.set_value(i, column, 1)
However, upon returning the dataset from this method, the columns are still all at 0 value.
Thank you!
It seems you need:
astype to convert the column to strings if it contains lists
str.strip to remove the []
str.get_dummies to build the indicator columns
df1 = df['tags'].astype(str).str.strip('[]').str.get_dummies(', ')
print (df1)
   'a'  'b'  'c'  'd'
0    1    1    1    0
1    1    1    0    0
2    0    1    0    1
3    1    1    0    1
Finally, add df1 to the original DataFrame with concat:
df = pd.concat([df,df1], axis=1)
print (df)
        tags  tag1  tag2  tag3  'a'  'b'  'c'  'd'
0  [a, b, c]     0     0     0    1    1    1    0
1     [a, b]     0     0     0    1    1    0    0
2     [b, d]     0     0     0    0    1    0    1
3  [a, b, d]     0     0     0    1    1    0    1
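If the tags column holds real Python lists of strings (an assumption, since the question does not say how the column was built), a variant that avoids the quoted column names is to join each list with a separator first. The sample rows below are the four shown in the answer; the separator "|" is assumed not to occur inside any tag.

```python
import pandas as pd

# Assumed input: each cell of "tags" is a list of strings.
df = pd.DataFrame({"tags": [["a", "b", "c"], ["a", "b"], ["b", "d"], ["a", "b", "d"]]})

# Join every list into "a|b|c", then split it back into 0/1 indicator columns.
dummies = df["tags"].str.join("|").str.get_dummies(sep="|")

# Attach the indicator columns to the original frame.
result = pd.concat([df, dummies], axis=1)
print(result)
```

The resulting columns are named a, b, c, d without surrounding quotes, because no string representation of the list is involved.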
Suppose I have a dataframe like this:
Knownvalue A B C D E F G H
17.3413 0 0 0 0 0 0 0 0
33.4534 0 0 0 0 0 0 0 0
What I want to do is this: when Knownvalue is between 0 and 10, A is changed from 0 to 1; when Knownvalue is between 10 and 20, B is changed from 0 to 1, and so on.
It should be like this after changing:
Knownvalue A B C D E F G H
17.3413 0 1 0 0 0 0 0 0
33.4534 0 0 0 1 0 0 0 0
Does anyone know a way to make this change?
I first bucket the Knownvalue Series into a list of integers equal to its value floor-divided by ten (e.g. 27.87 // 10 = 2). These buckets are the integer positions of the desired columns. Because Knownvalue is in the first column, I add one to these values.
Next, I enumerate through these bin values, which effectively gives me tuple pairs of row and column integer indices. I use iat to set the values at these locations to 1.
import pandas as pd
import numpy as np
# Create some sample data.
df_vals = pd.DataFrame({'Knownvalue': np.random.random(5) * 50})
df = pd.concat([df_vals, pd.DataFrame(np.zeros((5, 5)), columns=list('ABCDE'))], axis=1)
# Create desired column locations based on the `Knownvalue`.
bins = (df.Knownvalue // 10).astype('int').tolist()
>>> bins
[4, 3, 0, 1, 0]
# Set these locations equal to 1.
for idx, col in enumerate(bins):
    df.iat[idx, col + 1] = 1  # The first column is the `Knownvalue`, hence col + 1
>>> df
   Knownvalue  A  B  C  D  E
0   47.353937  0  0  0  0  1
1   37.460338  0  0  0  1  0
2    3.797964  1  0  0  0  0
3   18.323131  0  1  0  0  0
4    7.927030  1  0  0  0  0
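The same bucketing can also be written without the Python loop, using NumPy integer indexing to set one cell per row. This is only a sketch on the two values from the question; it assumes every Knownvalue lies in [0, 80), since anything outside that range would index the wrong column or raise an IndexError. The names letters and onehot are hypothetical.

```python
import numpy as np
import pandas as pd

# The two sample values from the question.
df = pd.DataFrame({"Knownvalue": [17.3413, 33.4534]})
letters = list("ABCDEFGH")

# Column position for each row: 0 for [0, 10), 1 for [10, 20), and so on.
bins = (df["Knownvalue"] // 10).astype(int).to_numpy()

# Start with all zeros and set exactly one cell per row to 1.
onehot = np.zeros((len(df), len(letters)), dtype=int)
onehot[np.arange(len(df)), bins] = 1

result = pd.concat([df, pd.DataFrame(onehot, index=df.index, columns=letters)], axis=1)
print(result)
```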
A different approach would be to reconstruct the frame from the Knownvalue column using get_dummies:
>>> import string
>>> # reindex adds the buckets that do not occur in the data as all-zero columns
>>> new_cols = pd.get_dummies(df["Knownvalue"] // 10, dtype=int).reindex(columns=range(8), fill_value=0)
>>> new_cols.columns = list(string.ascii_uppercase)[:len(new_cols.columns)]
>>> pd.concat([df[["Knownvalue"]], new_cols], axis=1)
   Knownvalue  A  B  C  D  E  F  G  H
0     17.3413  0  1  0  0  0  0  0  0
1     33.4534  0  0  0  1  0  0  0  0
get_dummies does the hard work:
>>> (df.Knownvalue//10)
0 1
1 3
Name: Knownvalue, dtype: float64
>>> pd.get_dummies((df.Knownvalue//10))
   1  3
0  1  0
1  0  1
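As get_dummies only creates columns for buckets that actually occur, another option is to label the buckets with pd.cut first. The result is categorical, so get_dummies emits a column for every label, including empty ones. The sketch below assumes eight buckets of width 10 covering [0, 80); the names edges, labels and buckets are hypothetical.

```python
import pandas as pd

# The two sample values from the question.
df = pd.DataFrame({"Knownvalue": [17.3413, 33.4534]})

labels = list("ABCDEFGH")
edges = list(range(0, 90, 10))  # 0, 10, ..., 80: nine edges, eight buckets

# right=False makes each bucket closed on the left, e.g. [10, 20) -> "B".
buckets = pd.cut(df["Knownvalue"], bins=edges, labels=labels, right=False)

# A categorical input yields one indicator column per category, used or not.
onehot = pd.get_dummies(buckets, dtype=int)
onehot.columns = labels  # plain string labels instead of a categorical index

result = pd.concat([df, onehot], axis=1)
print(result)
```

Values outside the assumed range become missing in buckets and end up as all-zero rows in onehot.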