This question already has answers here:
Remap values in pandas column with a dict, preserve NaNs
(11 answers)
Closed 1 year ago.
I have a pandas dataframe of the form:
Emotion Text
0 0 Say , Jim , how about going for a few beers af...
1 0 You know that is tempting but is really not g...
2 0 What do you mean ? It will help us to relax .
3 0 Do you really think so ? I don't . It will ju...
4 0 I guess you are right.But what shall we do ? ...
What I want to do is convert the Emotion column based on a dictionary mapping:
EMOTIONS = {0: 'neutral', 1: 'anger', 2: 'disgust', 3: 'fear', 4: 'happiness'}
Original answer - apply
You can create another column applying a function to your original column.
>>> emotions_dict = {0: "neutral", 1: "anger", 2: "disgust", 3: "fear", 4: "happiness"}
>>> df["emotions_str"] = df["emotions"].apply(lambda el: emotions_dict[el])
>>> df
emotions text emotions_str
0 0 foo neutral
1 0 bar neutral
2 0 baz neutral
3 0 hello neutral
If you want to override your numeric column, you can just replace emotions_str with emotions.
map
The same result can be achieved with map; the assignment becomes:
>>> df["emotions_str"] = df["emotions"].map(emotions_dict)
Values not found in the dictionary will be converted to NaN.
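A small sketch of that NaN behaviour, and one way to handle it: chaining fillna falls back to the original code for unmapped values (the tiny frame here is illustrative, not from the question):

```python
import pandas as pd

emotions_dict = {0: "neutral", 1: "anger"}
df = pd.DataFrame({"emotions": [0, 1, 5]})  # 5 has no mapping

# map() produces NaN for keys missing from the dict;
# fillna() restores the original value in those positions
df["emotions_str"] = df["emotions"].map(emotions_dict).fillna(df["emotions"])
```

Without the fillna, the row with code 5 would simply hold NaN in the new column.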
This question already has an answer here:
Adding a new pandas column with mapped value from a dictionary [duplicate]
(1 answer)
Closed 2 years ago.
I am iterating through a dataframe and pulling out specific lines and then enriching those lines with some other elements. I have a dictionary that has the following definition mapping:
testdir = {0: 'zero', 40: 'forty', 60: 'sixty', 80: 'eighty'}
When I pull out a specific line from the original dataframe, which looks like this:
a b c x str
0 0 0 0 100.0 aaaa
I want the str cell to be set to the string mapped from column c (which is 0), so the output should be:
a b c x str
0 0 0 0 100.0 zero
and then, after meeting some other conditions, a new line is pulled out from the original dataframe, and the output should be:
a b c x str
0 0 0 0 100.0 zero
3 4 30 60 100.0 sixty
I tried to use the map() method, something like:
df['str'][-1] = df['c'][-1].map(testdir)
but I'm erroring all over the place!
map is intended for pd.Series, so if you can, map the entire column once it is fully populated; that way you avoid the overhead of multiple calls to map:
df['str'] = df.c.map(testdir)
print(df)
a b c x str
0 0 0 0 100.0 zero
3 4 30 60 100.0 sixty
Note that, to correctly index the dataframe on a single cell, and map with the dictionary, you need something like:
testdir[df.iat[-1, 2]]
Chained indexing such as df['c'][-1] is discouraged in the docs, and is especially problematic when assigning: it may operate on a copy, so the original dataframe is not updated.
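Putting the two points together, a hedged sketch of updating just the last row's str cell without chained indexing (the frame below is a stand-in for the question's data):

```python
import pandas as pd

testdir = {0: 'zero', 40: 'forty', 60: 'sixty', 80: 'eighty'}
df = pd.DataFrame({'a': [0, 4], 'b': [0, 30], 'c': [0, 60],
                   'x': [100.0, 100.0], 'str': ['aaaa', 'bbbb']})

# Look up the last row's 'c' value in the dict, then assign the single
# 'str' cell with positional .iloc -- a single indexing call, no chaining
df.iloc[-1, df.columns.get_loc('str')] = testdir[df.iloc[-1, df.columns.get_loc('c')]]
```

Because the assignment goes through one indexer call, it writes into the original dataframe rather than a copy.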
This question already has an answer here:
Pandas rolling sum on string column
(1 answer)
Closed 3 years ago.
I have a Pandas dataframe that looks like the below code. I need to add a dynamic column that concatenates every value in a sequence before a given line. A loop sounds like the logical solution but would be super inefficient over a very large dataframe (1M+ rows).
user_id=[1,1,1,1,2,2,2,3,3,3,3,3]
variable=["A","B","C","D","A","B","C","A","B","C","D","E"]
sequence=[0,1,2,3,0,1,2,0,1,2,3,4]
df=pd.DataFrame(list(zip(user_id,variable,sequence)),columns =['User_ID', 'Variables','Seq'])
# Need to add a column dynamically
df['dynamic_column']=["A","AB","ABC","ABCD","A","AB","ABC","A","AB","ABC","ABCD","ABCDE"]
I need to be able to create the dynamic column in an efficient way based on the user_id and the sequence number. I have played with the pandas shift function and that just results in having to create a loop. Looking for some easy efficient way of creating that dynamic concatenated column.
This is a cumulative sum (cumsum) within each group:
df['dynamic_column'] = df.groupby('User_ID').Variables.apply(lambda x: x.cumsum())
Output:
0 A
1 AB
2 ABC
3 ABCD
4 A
5 AB
6 ABC
7 A
8 AB
9 ABC
10 ABCD
11 ABCDE
Name: Variables, dtype: object
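As a self-contained sketch of the same idea (shortened data for brevity): on object (string) dtype, cumsum concatenates values, and transform keeps the result aligned with the original index, so it can be assigned straight back:

```python
import pandas as pd

df = pd.DataFrame({'User_ID': [1, 1, 1, 2, 2],
                   'Variables': ["A", "B", "C", "A", "B"]})

# For string columns, cumsum concatenates within each group;
# transform returns a Series aligned with df's index
df['dynamic_column'] = df.groupby('User_ID')['Variables'].transform(lambda x: x.cumsum())
```

This avoids any explicit Python loop over the rows, which matters at the question's 1M+ row scale.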
Your question is a little vague, but would something like this work?
df['DynamicColumn'] = df['user_id'] + df['sequencenumber']
This question already has an answer here:
Python - splitting dataframe into multiple dataframes based on column values and naming them with those values [duplicate]
(1 answer)
Closed 4 years ago.
I am learning python and pandas and am having trouble overcoming an error while trying to subset a data frame.
I have an input data frame:
df0-
Index Group Value
1 A 10
2 A 15
3 B 20
4 C 10
5 C 10
df0.dtypes-
Group object
Value float64
That I am trying to split out into unique values based off of the Group column. With the output looking something like this:
df1-
Index Group Value
1 A 10
2 A 15
df2-
Index Group Value
3 B 20
df3-
Index Group Value
4 C 10
5 C 10
So far I have written this code to subset the input:
UniqueGroups = df0['Group'].unique().tolist()
OutputFrame = {}
for x in UniqueAgencies:
    ReturnFrame[str('ConsolidateReport_')+x] = UniqueAgencies[df0['Group']==x]
The code above returns the following error, which I can't quite get my head around. Can anyone point me in the right direction?
*** TypeError: list indices must be integers or slices, not str
You can use groupby to iterate over the groups:
for _, g in df0.groupby('Group'):
    print(g)
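If you want the named-frame behaviour the question was after, one sketch is a dict comprehension over those same groups (the 'ConsolidateReport_' prefix is the question's own; the data below is illustrative):

```python
import pandas as pd

df0 = pd.DataFrame({'Group': ['A', 'A', 'B', 'C', 'C'],
                    'Value': [10.0, 15.0, 20.0, 10.0, 10.0]})

# One sub-frame per unique Group value, keyed by a generated name
frames = {'ConsolidateReport_' + g: sub for g, sub in df0.groupby('Group')}
```

This sidesteps the TypeError entirely: each sub-frame comes from groupby, so there is no list being indexed with a boolean Series.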
How do I apply one-hot encoding only to the columns holding numeric categorical values? I want to modify the same dataframe; it has other features with string values. Thanks.
If you've got a dataframe, you can use pd.get_dummies(...).
>>> import pandas as pd
>>> s = pd.Series(list('abca'))
>>> pd.get_dummies(s)
a b c
0 1 0 0
1 0 1 0
2 0 0 1
3 1 0 0
You can check out the Docs for more.
There is also an optional columns argument which takes in a list of the columns to turn into dummies.
Here is an SO question pertaining to how to get a list of columns and types.
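To restrict the encoding to the numeric categorical columns while leaving string features untouched, a sketch of passing the columns argument (the column names here are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'color_code': [0, 1, 2, 0],     # numeric categorical
                   'name': ['a', 'b', 'c', 'a']})  # string feature, left alone

# Only the listed columns are dummified; all other columns pass through
out = pd.get_dummies(df, columns=['color_code'])
```

The encoded columns get names like color_code_0, color_code_1, ..., and the original color_code column is dropped, while name survives unchanged.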