This question already has answers here:
Remap values in pandas column with a dict, preserve NaNs
(11 answers)
Closed 1 year ago.
I have a pandas dataframe of the form:
Emotion Text
0 0 Say , Jim , how about going for a few beers af...
1 0 You know that is tempting but is really not g...
2 0 What do you mean ? It will help us to relax .
3 0 Do you really think so ? I don't . It will ju...
4 0 I guess you are right.But what shall we do ? ...
What I want to do is convert the Emotion column based on a dictionary mapping:
EMOTIONS = {0: 'neutral', 1: 'anger', 2: 'disgust', 3: 'fear', 4: 'happiness'}
Original answer - apply
You can create another column applying a function to your original column.
>>> emotions_dict = {0: "neutral", 1: "anger", 2: "disgust", 3: "fear", 4: "happiness"}
>>> df["emotions_str"] = df["emotions"].apply(lambda el: emotions_dict[el])
>>> df
emotions text emotions_str
0 0 foo neutral
1 0 bar neutral
2 0 baz neutral
3 0 hello neutral
If you want to override your numeric column, you can just replace emotions_str with emotions.
map
The same result can be achieved with map; the assignment becomes:
>>> df["emotions_str"] = df["emotions"].map(emotions_dict)
Values not found in the dictionary will be converted to NaN.
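A small sketch of that NaN behaviour, and one way to handle it: chaining fillna falls back to the original code for unmapped values (the tiny frame here is illustrative, not from the question):

```python
import pandas as pd

emotions_dict = {0: "neutral", 1: "anger"}
df = pd.DataFrame({"emotions": [0, 1, 5]})  # 5 has no mapping

# map() produces NaN for keys missing from the dict;
# fillna() restores the original value in those positions
df["emotions_str"] = df["emotions"].map(emotions_dict).fillna(df["emotions"])
```

Without the fillna, the row with code 5 would simply hold NaN in the new column.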
This question already has an answer here:
Adding a new pandas column with mapped value from a dictionary [duplicate]
(1 answer)
Closed 2 years ago.
I am iterating through a dataframe and pulling out specific lines and then enriching those lines with some other elements. I have a dictionary that has the following definition mapping:
testdir = {0: 'zero', 40: 'forty', 60: 'sixty', 80: 'eighty'}
When I pull out a specific line from the original dataframe, which looks like this:
a b c x str
0 0 0 0 100.0 aaaa
I want the str cell to be set to the string mapped from column c (which is 0), so the output should be:
a b c x str
0 0 0 0 100.0 zero
and then, after meeting some other conditions, a new line is pulled out from the original dataframe, and the output should be:
a b c x str
0 0 0 0 100.0 zero
3 4 30 60 100.0 sixty
I tried to use the map() method, something like:
df['str'][-1] = df['c'][-1].map(testdir)
but I'm erroring all over the place!
map is intended for pd.Series, so if you can, map the entire column once it is fully populated; that way you avoid the overhead of multiple calls to map:
df['str'] = df.c.map(testdir)
print(df)
a b c x str
0 0 0 0 100.0 zero
3 4 30 60 100.0 sixty
Note that, to correctly index the dataframe on a single cell, and map with the dictionary, you need something like:
testdir[df.iat[-1, 2]]
Chained indexing such as df['c'][-1] is discouraged in the docs, and is especially problematic when assigning: it may operate on a copy, so the original dataframe is not updated.
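Putting the two points together, a hedged sketch of updating just the last row's str cell without chained indexing (the frame below is a stand-in for the question's data):

```python
import pandas as pd

testdir = {0: 'zero', 40: 'forty', 60: 'sixty', 80: 'eighty'}
df = pd.DataFrame({'a': [0, 4], 'b': [0, 30], 'c': [0, 60],
                   'x': [100.0, 100.0], 'str': ['aaaa', 'bbbb']})

# Look up the last row's 'c' value in the dict, then assign the single
# 'str' cell with positional .iloc -- a single indexing call, no chaining
df.iloc[-1, df.columns.get_loc('str')] = testdir[df.iloc[-1, df.columns.get_loc('c')]]
```

Because the assignment goes through one indexer call, it writes into the original dataframe rather than a copy.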
This question already has an answer here:
Pandas rolling sum on string column
(1 answer)
Closed 3 years ago.
I have a Pandas dataframe that looks like the below code. I need to add a dynamic column that concatenates every value in a sequence before a given line. A loop sounds like the logical solution but would be super inefficient over a very large dataframe (1M+ rows).
user_id=[1,1,1,1,2,2,2,3,3,3,3,3]
variable=["A","B","C","D","A","B","C","A","B","C","D","E"]
sequence=[0,1,2,3,0,1,2,0,1,2,3,4]
df=pd.DataFrame(list(zip(user_id,variable,sequence)),columns =['User_ID', 'Variables','Seq'])
# Need to add a column dynamically
df['dynamic_column']=["A","AB","ABC","ABCD","A","AB","ABC","A","AB","ABC","ABCD","ABCDE"]
I need to be able to create the dynamic column in an efficient way based on the user_id and the sequence number. I have played with the pandas shift function and that just results in having to create a loop. Looking for some easy efficient way of creating that dynamic concatenated column.
This is a cumulative sum (cumsum) within each group:
df['dynamic_column'] = df.groupby('User_ID').Variables.apply(lambda x: x.cumsum())
Output:
0 A
1 AB
2 ABC
3 ABCD
4 A
5 AB
6 ABC
7 A
8 AB
9 ABC
10 ABCD
11 ABCDE
Name: Variables, dtype: object
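As a self-contained sketch of the same idea (shortened data for brevity): on object (string) dtype, cumsum concatenates values, and transform keeps the result aligned with the original index, so it can be assigned straight back:

```python
import pandas as pd

df = pd.DataFrame({'User_ID': [1, 1, 1, 2, 2],
                   'Variables': ["A", "B", "C", "A", "B"]})

# For string columns, cumsum concatenates within each group;
# transform returns a Series aligned with df's index
df['dynamic_column'] = df.groupby('User_ID')['Variables'].transform(lambda x: x.cumsum())
```

This avoids any explicit Python loop over the rows, which matters at the question's 1M+ row scale.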
Your question is a little vague, but would something like this work?
df['DynamicColumn'] = df['user_id'] + df['sequencenumber']
This question already has an answer here:
Python - splitting dataframe into multiple dataframes based on column values and naming them with those values [duplicate]
(1 answer)
Closed 4 years ago.
I am learning python and pandas and am having trouble overcoming an error while trying to subset a data frame.
I have an input data frame:
df0-
Index Group Value
1 A 10
2 A 15
3 B 20
4 C 10
5 C 10
df0.dtypes-
Group object
Value float64
That I am trying to split out into unique values based off of the Group column. With the output looking something like this:
df1-
Index Group Value
1 A 10
2 A 15
df2-
Index Group Value
3 B 20
df3-
Index Group Value
4 C 10
5 C 10
So far I have written this code to subset the input:
UniqueGroups = df0['Group'].unique().tolist()
OutputFrame = {}
for x in UniqueAgencies:
    ReturnFrame[str('ConsolidateReport_')+x] = UniqueAgencies[df0['Group']==x]
The code above returns the following error, which I can't quite get my head around. Can anyone point me in the right direction?
*** TypeError: list indices must be integers or slices, not str
You can use groupby to iterate over the groups:
for _, g in df0.groupby('Group'):
    print(g)
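If you want the named-frame behaviour the question was after, one sketch is a dict comprehension over those same groups (the 'ConsolidateReport_' prefix is the question's own; the data below is illustrative):

```python
import pandas as pd

df0 = pd.DataFrame({'Group': ['A', 'A', 'B', 'C', 'C'],
                    'Value': [10.0, 15.0, 20.0, 10.0, 10.0]})

# One sub-frame per unique Group value, keyed by a generated name
frames = {'ConsolidateReport_' + g: sub for g, sub in df0.groupby('Group')}
```

This sidesteps the TypeError entirely: each sub-frame comes from groupby, so there is no list being indexed with a boolean Series.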
How do I apply one-hot encoding only to the columns holding numeric categorical values? I want to modify the same dataframe; it has other features with string values. Thanks.
If you've got a dataframe, you can use pd.get_dummies(...).
>>> import pandas as pd
>>> s = pd.Series(list('abca'))
>>> pd.get_dummies(s)
a b c
0 1 0 0
1 0 1 0
2 0 0 1
3 1 0 0
You can check out the Docs for more.
There is also an optional columns argument which takes in a list of the columns to turn into dummies.
Here is an SO question pertaining to how to get a list of columns and types.
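To restrict the encoding to the numeric categorical columns while leaving string features untouched, a sketch of passing the columns argument (the column names here are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'color_code': [0, 1, 2, 0],     # numeric categorical
                   'name': ['a', 'b', 'c', 'a']})  # string feature, left alone

# Only the listed columns are dummified; all other columns pass through
out = pd.get_dummies(df, columns=['color_code'])
```

The encoded columns get names like color_code_0, color_code_1, ..., and the original color_code column is dropped, while name survives unchanged.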