This question already has answers here:
Pandas make new column from string slice of another column
(3 answers)
Closed 4 months ago.
import pandas as pd
data = [['Tom', '5-123g'], ['Max', '6-745.0d'], ['Bob', '5-900.0e'], ['Ben', '2-345'], ['Eva', '9-712.x']]
df = pd.DataFrame(data, columns=['Person', 'Action'])
My current df has two columns, ['Person'] and ['Action']. I want to shorten the values in the "Action" column to a length of 5 and store them in a new column.
I need it to look like this:
Person Action Action_short
0 Tom 5-123g 5-123
1 Max 6-745.0d 6-745
2 Bob 5-900.0e 5-900
3 Ben 2-345 2-345
4 Eva 9-712.x 9-712
What I've tried:
Checking the type of the column:
df['Action'].dtypes
The output is:
dtype('O')
Then I tried:
df['Action'] = df['Action'].map(str)
df['Action_short'] = df.Action.str.slice(start=0, stop=5)
I also tried it with:
df['Action'] = df['Action'].astype(str)
df['Action'] = df['Action'].values.astype(str)
df['Action'] = df['Action'].map(str)
df['Action'] = df['Action'].apply(str)
and with:
df['Action_short'] = df.Action.str.slice(0, 5)
df['Action_short'] = df.Action.apply(lambda x: x[:5])
df['pos'] = df['Action'].str.find('.')
df['new_var'] = df.apply(lambda x: x['Action'][0:x['pos']],axis=1)
The output from all my versions was:
Person Action Action_short
0 Tom 5-123g 5-12
1 Max 6-745.0d 6-745
2 Bob 5-900.0e 5-90
3 Ben 2-345 2-34
4 Eva 9-712.x 9-712
The lambda function does not work either: a value like 3-222 gets sliced to 3-22.
I don't understand why it works for some rows but not for others.
Try this:
df['Action_short'] = df['Action'].str.slice(0, 5)
By using .str on a single column of a DataFrame (which is a pd.Series), you can access pandas' vectorized string methods, which are designed to mirror the operations on standard Python strings. Note that a whole DataFrame has no .str accessor; you have to select a column first.
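A minimal sketch of that point, using two rows of the question's sample data: .str.slice works on the selected column, while asking a DataFrame for .str raises AttributeError.

```python
import pandas as pd

df = pd.DataFrame({'Person': ['Tom', 'Max'], 'Action': ['5-123g', '6-745.0d']})

# .str works on a single column, which is a Series
short = df['Action'].str.slice(0, 5)

# A whole DataFrame has no .str accessor, so this raises AttributeError
try:
    df.str
    frame_has_str = True
except AttributeError:
    frame_has_str = False

print(short.tolist())
```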
# slice by specifying the length you need
df['Action_short']=df['Action'].str[:5]
df
Person Action Action_short
0 Tom 5-123g 5-123
1 Max 6-745.0d 6-745
2 Bob 5-900.0e 5-900
3 Ben 2-345 2-345
4 Eva 9-712.x 9-712
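On the question's sample data, every version above returns 5 characters. The question does not show the raw data, so the following is only an assumption: if some of the real values carry a leading space (for example ' 5-123g'), a 5-character slice keeps that space and shows only 4 visible characters, which would match the pattern of 5-12 and 2-34 in the question. A minimal sketch with hypothetical padded values, stripping whitespace before slicing:

```python
import pandas as pd

# Hypothetical data: two values have a leading space (not confirmed by the question)
data = [['Tom', ' 5-123g'], ['Max', '6-745.0d'], ['Ben', ' 2-345']]
df = pd.DataFrame(data, columns=['Person', 'Action'])

# Plain slicing keeps the leading space, so only 4 visible characters remain
df['naive'] = df['Action'].str[:5]

# Strip surrounding whitespace first, then take the first 5 characters
df['Action_short'] = df['Action'].str.strip().str[:5]
print(df)
```

If repr() of an affected value shows the padding (e.g. df['Action'].map(repr)), this is the likely cause.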
This question already has answers here:
Merge two dataframes by index
(7 answers)
Closed 1 year ago.
I am working with the Adult dataset, and I split the dataframe in order to label encode the categorical columns. Now I want to combine the new dataframe with the original one column-wise. What is the simplest way to do this?
Original dataframe:
age salary
32 3000
25 2300
After label encoding a few columns:
country gender
1 1
4 2
I want to combine the two dataframes side by side, so that the final result is:
age salary country gender
32 3000 1 1
25 2300 4 2
Any insights are helpful.
Let's call the two dataframes df1 and df2. Then:
df1.merge(df2,left_index=True, right_index=True)
You can use .join() if the dataframes' rows are matched by index, as follows:
.join() performs a left join by default, and it joins on the index by default.
df1.join(df2)
In addition to the simple syntax, it has the extra advantage that, when you put your master/original dataframe on the left, the left join ensures that the index of the master dataframe is retained in the result.
Result:
age salary country gender
0 32 3000 1 1
1 25 2300 4 2
You may find your solution in pandas.concat:
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.array([[32,3000],[25,2300]]), columns=['age', 'salary'])
df2 = pd.DataFrame(np.array([[1,1],[4,2]]), columns=['country', 'gender'])
pd.concat([df1, df2], axis=1)
age salary country gender
0 32 3000 1 1
1 25 2300 4 2
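Both .join() and pd.concat(..., axis=1) align rows on the index, not on position. A minimal sketch, with made-up index labels 10 and 11 standing in for a frame whose index no longer starts at 0 (an assumption; the question does not say how the frames were split), of what goes wrong and how resetting the indexes fixes it:

```python
import pandas as pd

# Hypothetical: df1 kept non-default index labels, df2 has a fresh 0..n-1 index
df1 = pd.DataFrame({'age': [32, 25], 'salary': [3000, 2300]}, index=[10, 11])
df2 = pd.DataFrame({'country': [1, 4], 'gender': [1, 2]})

# Aligning on mismatched indexes gives 4 rows padded with NaN
misaligned = pd.concat([df1, df2], axis=1)

# Resetting the index makes rows line up by position instead
aligned = pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)
joined = df1.reset_index(drop=True).join(df2)
print(aligned)
```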
This question already has an answer here:
Difference between "as_index = False", and "reset_index()" in pandas groupby
(1 answer)
Closed 3 years ago.
I have this dataframe
df =
name age character
0 A 10 fire
1 A 15 water
2 A 20 earth
3 A 25 air
4 B 10 fire
5 B 7 air
I grouped it with groupby:
df = df.groupby('name').aggregate(list)
Then I get this output:
age character
name
---------------------------------------
A [10,15,20,25] [fire, water, earth, air]
B [10, 7] [fire, air]
I tried to pivot this dataframe based on the name column, but after the groupby, name is no longer one of the columns:
print(df.columns)
>>> ["age", "character"]
How can I turn name back into a regular column so that I can use it for the pivot?
EDIT
The expected output is:
name age character
---------------------------------------
A [10,15,20,25] [fire, water, earth, air]
B [10, 7] [fire, air]
df = df.reset_index()
However, pivoting list data is usually unhelpful.
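A minimal sketch of the reset_index() fix on the question's sample data. The pivot at the end works on the original long-format frame rather than the list-valued one; which pivot the asker actually wanted is an assumption here.

```python
import pandas as pd

df = pd.DataFrame({'name': ['A', 'A', 'A', 'A', 'B', 'B'],
                   'age': [10, 15, 20, 25, 10, 7],
                   'character': ['fire', 'water', 'earth', 'air', 'fire', 'air']})

# groupby moves 'name' into the index; reset_index() turns it back into a column
grouped = df.groupby('name').aggregate(list)
flat = grouped.reset_index()
print(flat)

# Assumed goal: one row per name, one column per character, ages as values.
# Pivoting the ungrouped frame avoids list-valued cells entirely.
wide = df.pivot(index='name', columns='character', values='age')
print(wide)
```

Combinations that do not occur (for example B with water) come out as NaN in the pivot.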
This question already has answers here:
How do I create a new column from the output of pandas groupby().sum()?
(4 answers)
Closed 3 years ago.
I have a dataframe with a series of days, the users active on each day, and the events for each user on that day. I want to add another column with the total number of events for each user over the whole time span.
I can make it work with this code but I'm certain there's a more elegant way to do it. Please let me know what could be better!
import pandas as pd
df1 = pd.DataFrame({'users': ['Sara', 'James', 'Sara', 'James'],
'events': [3, 2, 5, 1]
})
df2 = df1.groupby('users').sum()
df2.rename(columns= {'events' : 'total'}, inplace=True)
df3 = pd.merge(df1, df2, how='left', on='users')
This gives me the output I want with 8 in every Sara row and 3 in every James row.
There is indeed: the transform method. It returns a result with the same index as your original dataframe, so you can assign it directly as a new column:
df1['total'] = df1.groupby('users')['events'].transform('sum')
print(df1)
users events total
0 Sara 3 8
1 James 2 3
2 Sara 5 8
3 James 1 3
Just as a test, comparing with your merge result:
df1 == df3
users events total
0 True True True
1 True True True
2 True True True
3 True True True
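A minimal sketch on the question's data contrasting transform('sum') with a second pattern (our choice, not from the answer above): aggregate once, then look up each user's total with Series.map.

```python
import pandas as pd

df1 = pd.DataFrame({'users': ['Sara', 'James', 'Sara', 'James'],
                    'events': [3, 2, 5, 1]})

# transform('sum') broadcasts each group's total back onto every row of that group
df1['total'] = df1.groupby('users')['events'].transform('sum')

# Alternative: one total per user (a Series indexed by user), then map it onto the rows
totals = df1.groupby('users')['events'].sum()
df1['total_via_map'] = df1['users'].map(totals)
print(df1)
```

transform keeps everything in one expression; the map version is handy when the per-user totals are also needed on their own.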
More here:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transform.html
https://pbpython.com/pandas_transform.html
Suppose you have two columns (A = 'Name', B = 'Name_Age'). Is there a quick way to remove 'Name' from 'Name_Age' to get 'Age', like a reversed concatenation?
I've thought about splitting the string, but in some cases there is no separator to split on, so I really need a way to remove the string in one column from the string in another.
#example data below:
import pandas as pd
data = {'Name':['Mark','Matt','Michael'], 'Name_Age':['Mark 14','Matt 29','Michael 18']}
df = pd.DataFrame(data)
You can use the pandas apply function, which lets you pass your own function to every row of the dataframe:
def age_from_name_age(name, name_age):
    return name_age.replace(name, '').strip()

df['Age'] = df.apply(lambda x: age_from_name_age(x['Name'], x['Name_Age']),
                     axis='columns')
age_from_name_age takes two strings (a name and a name_age), and returns just the age. Then, in the apply statement, I define an anonymous lambda function that just takes in a row and passes the correct fields to age_from_name_age.
Using string slicing:
df['Age'] = df.apply(lambda row: row['Name_Age'][len(row['Name']):], axis=1).astype(int)
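Since the question mentions values with no separator, here is a sketch covering that case. The row 'Anna7' is hypothetical, and strip_prefix is an illustrative helper name. It removes the name only when it is a prefix (unlike str.replace, which would also remove a later occurrence) and uses a list comprehension over zip instead of apply.

```python
import pandas as pd

data = {'Name': ['Mark', 'Matt', 'Michael', 'Anna'],
        'Name_Age': ['Mark 14', 'Matt 29', 'Michael 18', 'Anna7']}  # 'Anna7' is hypothetical
df = pd.DataFrame(data)

def strip_prefix(name, name_age):
    # Hypothetical helper: drop `name` only when it starts the string, then trim spaces
    if name_age.startswith(name):
        name_age = name_age[len(name):]
    return name_age.strip()

# Iterate over the two columns in parallel; no separator is needed
df['Age'] = [strip_prefix(n, na) for n, na in zip(df['Name'], df['Name_Age'])]
print(df)
```

Add .astype(int) to the Age column if numeric ages are needed.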
You can use str.split() with a space separator to split the values into separate columns, and then rename the columns.
1) Using str.split()
>>> df['Name_Age'].str.split(" ", expand=True).rename(columns={0:'Name', 1:'Age'})
Name Age
0 Mark 14
1 Matt 29
2 Michael 18
OR
>>> df = df['Name_Age'].str.split(" ", expand=True).rename(columns={0:'Name', 1:'Age'})
>>> df
Name Age
0 Mark 14
1 Matt 29
2 Michael 18
OR, by converting the split lists into a new dataframe:
>>> pd.DataFrame(df.Name_Age.str.split().tolist(), columns="Name Age".split())
Name Age
0 Mark 14
1 Matt 29
2 Michael 18
2) Another option, using str.partition:
>>> df['Name_Age'].str.partition(" ", True).rename(columns={0:'Name', 2:'Age'}).drop(1, axis=1)
Name Age
0 Mark 14
1 Matt 29
2 Michael 18
3) Another option, using df.assign with a lambda
Use split() with the default separator and assign the values back as a new column, Age:
>>> df.assign(Age = df.Name_Age.apply(lambda x: x.split()[1]))
Name Name_Age Age
0 Mark Mark 14 14
1 Matt Matt 29 29
2 Michael Michael 18 18
OR
>>> df.Name_Age.apply(lambda x: pd.Series(str(x).split())).rename({0:"Name",1:"Age"}, axis=1)
Name Age
0 Mark 14
1 Matt 29
2 Michael 18
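All of the split-based options assume the name itself contains no space. If a name could contain one (the row 'Mary Ann 33' below is hypothetical, not part of the question's data), splitting once from the right keeps the full name together:

```python
import pandas as pd

df = pd.DataFrame({'Name_Age': ['Mark 14', 'Matt 29', 'Mary Ann 33']})  # last row is hypothetical

# Split only at the last space, so everything before it stays in the Name column
parts = (df['Name_Age'].str.rsplit(' ', n=1, expand=True)
           .rename(columns={0: 'Name', 1: 'Age'}))
parts['Age'] = parts['Age'].astype(int)
print(parts)
```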
This question already has answers here:
pandas: records with lists to separate rows
(3 answers)
Closed 4 years ago.
I have a pandas dataframe as shown here:
id pos value sent
1 a/b/c test/test2/test3 21
2 d/a test/test5 21
I would like to split (explode) df['pos'] and df['value'] so that the dataframe looks like this:
id pos value sent
1 a test 21
1 b test2 21
1 c test3 21
2 d test 21
2 a test5 21
It doesn't work if I split each column separately and then concatenate them, à la:
pos = df.token.str.split('/', expand=True).stack().str.strip().reset_index(level=1, drop=True)
df1 = pd.concat([pos,value], axis=1, keys=['pos','value'])
Any ideas? I'd really appreciate it.
EDIT:
I tried using this solution here : https://stackoverflow.com/a/40449726/4219498
But I get the following error:
TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'
I suppose this is a numpy related issue although I'm not sure how this happens. I'm using Python 2.7.14
I tend to avoid the stack magic in favour of building a new dataframe from scratch. This is usually also more efficient. Below is one way.
import numpy as np
import pandas as pd
from itertools import chain
lens = list(map(len, df['pos'].str.split('/')))
res = pd.DataFrame({'id': np.repeat(df['id'], lens),
'pos': list(chain.from_iterable(df['pos'].str.split('/'))),
'value': list(chain.from_iterable(df['value'].str.split('/'))),
'sent': np.repeat(df['sent'], lens)})
print(res)
id pos sent value
0 1 a 21 test
0 1 b 21 test2
0 1 c 21 test3
1 2 d 21 test
1 2 a 21 test5
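On newer pandas, DataFrame.explode can explode several list columns together. Passing a list of columns requires pandas 1.3 or later (so not the Python 2.7 setup in the question). A minimal sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2],
                   'pos': ['a/b/c', 'd/a'],
                   'value': ['test/test2/test3', 'test/test5'],
                   'sent': [21, 21]})

# Turn each '/'-separated string into a list, then explode both list columns in step.
# Each row's lists must have equal length, otherwise explode raises ValueError.
res = (df.assign(pos=df['pos'].str.split('/'),
                 value=df['value'].str.split('/'))
         .explode(['pos', 'value'], ignore_index=True))
print(res)
```

ignore_index=True gives a fresh 0..n-1 index instead of the repeated original labels.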