How to select a certain number of characters from a column in Python? - python

For example, there is a column 'ID' in a dataframe.
One of the entries is, for example, '13245993, 3004992'.
I only want to keep '13245993'.
The same applies to every row in column 'ID'.
How can I change the data in each row of column 'ID'?

You can try it like this: apply slicing to the ID column to get the required result. I am using 3 as the number of characters here.
import pandas as pd
data = {'Name':['Tom', 'nick', 'krish', 'jack'],
'ID':[90877, 10909, 12223, 12334]}
df=pd.DataFrame(data)
print('Before change')
print(df)
df["ID"]=df["ID"].apply(lambda x: (str(x)[:3]))
print('After change')
print(df)
Output:
Before change
    Name     ID
0    Tom  90877
1   nick  10909
2  krish  12223
3   jack  12334
After change
    Name   ID
0    Tom  908
1   nick  109
2  krish  122
3   jack  123
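If the goal is the part before the comma rather than a fixed number of characters, as in the question's '13245993, 3004992' example, splitting on the comma may be a closer fit. This is a minimal sketch, assuming the ID column holds strings; the sample frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data modelled on the question's example entry
df = pd.DataFrame({'ID': ['13245993, 3004992', '555, 777', '42']})

# Split each entry on the comma, keep the first piece, and trim stray spaces.
# astype(str) is a guard in case the column also contains numbers.
df['ID'] = df['ID'].astype(str).str.split(',').str[0].str.strip()
print(df)
```

Entries without a comma are left unchanged, because the split then yields a single piece.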

You could do something like:
data[data['ID'] == '13245993']
This will give you the rows where ID is '13245993'.
More Indepth Code
I hope this answers your question if not please let me know.
With best regards

Related

How to collapse all rows in pandas dataframe across all columns

I am trying to collapse all the rows of a dataframe into one single row across all columns.
My data frame looks like the following:
name  job       value
bob   business  100
NaN   dentist   NaN
jack  NaN       NaN
I am trying to get the following output:
name      job               value
bob jack  business dentist  100
I am trying to group across all columns; I do not care if the value column is converted to dtype object (string).
I'm just trying to collapse all the rows across all columns.
I've tried groupby(index=0) but did not get good results.
You could apply join:
out = df.apply(lambda x: ' '.join(x.dropna().astype(str))).to_frame().T
Output:
name job value
0 bob jack business dentist 100.0
Try this:
new_df = df.agg(lambda x: x.dropna().astype(str).tolist()).str.join(' ').to_frame().T
Output:
>>> new_df
name job value
0 bob jack business dentist 100.0
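For reference, here is a self-contained version of the join approach that can be run as-is. It is a sketch that assumes the missing cells in the question are real NaN values (not the strings 'NAN' or 'Nan'); the helper name join_non_null is made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame, assuming the blanks are genuine NaN values
df = pd.DataFrame({
    'name': ['bob', np.nan, 'jack'],
    'job': ['business', 'dentist', np.nan],
    'value': [100, np.nan, np.nan],
})

def join_non_null(col):
    """Join the non-missing values of one column into a single string."""
    return ' '.join(col.dropna().astype(str))

# apply() calls the helper once per column; to_frame().T turns the
# resulting Series into a one-row dataframe
out = df.apply(join_non_null).to_frame().T
print(out)
```

The value column prints as 100.0 because the NaN entries force that column to a float dtype before it is converted to strings.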

Update an existing column in one dataframe based on the value of a column in another dataframe

I have two csv files as my raw data to read into different dataframes. One is called 'employee' and another is called 'origin'. However, I cannot upload the files here so I hardcoded the data into the dataframes below. The task I'm trying to solve is to update the 'Eligible' column in employee_details with 'Yes' or 'No' based on the value of the 'Country' column in origin_details. If Country = UK, then put 'Yes' in the Eligible column for that Personal_ID. Else, put 'No'.
import pandas as pd
import numpy as np
employee = {
'Personal_ID': ['1000123', '1100258', '1104682', '1020943'],
'Name': ['Tom', 'Joseph', 'Krish', 'John'],
'Age': ['40', '35', '43', '51'],
'Eligible': ' '}
origin = {
'Personal_ID': ['1000123', '1100258', '1104682', '1020943', '1573482', '1739526'],
'Country': ['UK', 'USA', 'FRA', 'SWE', 'UK', 'AU']}
employee_details = pd.DataFrame(employee)
origin_details = pd.DataFrame(origin)
employee_details['Eligible'] = np.where((origin_details['Country']) == 'UK', 'Yes', 'No')
print(employee_details)
print(origin_details)
The code above raises the following error:
ValueError: Length of values (6) does not match length of index (4)
However, I am expecting to see the following output:
Personal_ID Name Age Eligible
0 1000123 Tom 40 Yes
1 1100258 Joseph 35 No
2 1104682 Krish 43 No
3 1020943 John 51 No
I also don't want to delete anything in my dataframes to match the size specified in the ValueError message because I may need the extra Personal_IDs in the origin_details later. Alternatively, I can keep all the existing Personal_ID's in the raw data (employee_details, origin_details) and create a new dataframe from those to extract the records which have the same Personal_ID's and determine the np.where() condition from there.
Please advise! Any help is appreciated, thank you!
You can merge the two dataframes on Personal_ID and then use np.where.
Merge with how='outer' to keep all personal IDs:
df_merge = pd.merge(employee_details, origin_details, on='Personal_ID', how='outer')
df_merge['Eligible'] = np.where(df_merge['Country']=='UK', 'Yes', 'No')
Personal_ID Name Age Eligible Country
0 1000123 Tom 40 Yes UK
1 1100258 Joseph 35 No USA
2 1104682 Krish 43 No FRA
3 1020943 John 51 No SWE
4 1573482 NaN NaN Yes UK
5 1739526 NaN NaN No AU
If you don't want to keep all personal IDs, you can merge with how='inner' and you won't see the NaNs:
df_merge = pd.merge(employee_details, origin_details, on='Personal_ID', how='inner')
df_merge['Eligible'] = np.where(df_merge['Country']=='UK', 'Yes', 'No')
Personal_ID Name Age Eligible Country
0 1000123 Tom 40 Yes UK
1 1100258 Joseph 35 No USA
2 1104682 Krish 43 No FRA
3 1020943 John 51 No SWE
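You are passing a pandas Series into a NumPy function, np.where(origin_details['Country'] == 'UK', ...), which returns a plain array of 6 values with no index, so it cannot be assigned to a 4-row dataframe. I believe this is the problem.
Try:
employee_details['Eligible'] = origin_details['Country'].apply(lambda x:"Yes" if x=='UK' else "No")
This returns a pandas Series, so the assignment is aligned on the row index instead of failing on the length mismatch. Note that it matches rows by index position, not by Personal_ID, so it only gives the right answer when both dataframes list the IDs in the same order.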
First, about the exception: you were lucky it was raised. If your tables had been the same length, your code would have run without any error.
But the code makes an assumption you may not have considered: that the IDs appear in the same order in both tables. As in the example, one table can contain more IDs than the other, and if the tables had the same length but a different order, you would have got incorrect Eligible values for some rows. The correct way to do this is as follows.
First, join the tables on Personal_ID, using a left join so that you do not lose employees that have no origin info for their Personal_ID:
combine_df = pd.merge(employee_details, origin_details, on='Personal_ID', how='left')
Then use the apply function to fill the column:
combine_df['Eligible'] = combine_df['Country'].apply(lambda x:'Yes' if x=='UK' else 'No')
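If you would rather not carry the Country column into the employee table, a lookup by ID is another option. This is a sketch, assuming each Personal_ID appears only once in origin_details; the names country_by_id and country are made up for illustration.

```python
import numpy as np
import pandas as pd

employee_details = pd.DataFrame({
    'Personal_ID': ['1000123', '1100258', '1104682', '1020943'],
    'Name': ['Tom', 'Joseph', 'Krish', 'John'],
})
origin_details = pd.DataFrame({
    'Personal_ID': ['1000123', '1100258', '1104682', '1020943', '1573482', '1739526'],
    'Country': ['UK', 'USA', 'FRA', 'SWE', 'UK', 'AU'],
})

# Build a Personal_ID -> Country lookup (assumes IDs are unique in origin_details)
country_by_id = origin_details.set_index('Personal_ID')['Country']

# Look up each employee's country by ID; IDs without a match become NaN,
# which compares unequal to 'UK' and therefore ends up as 'No'
country = employee_details['Personal_ID'].map(country_by_id)
employee_details['Eligible'] = np.where(country.eq('UK'), 'Yes', 'No')
print(employee_details)
```

Both dataframes keep their original rows, so the extra Personal_IDs in origin_details remain available for later use.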

How can I create a column in an existing df from dictionary and using conditions?

I have the following df (just for example):
data={'Name': ['Tom', 'Joseph', 'Krish', 'John']}
df=pd.DataFrame(data)
print(df)
city={"New York": "123",
"LA":"456",
"Miami":"789"}
Output:
Name
0 Tom
1 Joseph
2 Krish
3 John
I would like to add another column to the df which will be based on the city dictionary.
I would like to do it by the following conditions:
If the Name is Tom or Krish then they should get 123 (New York).
If the Name is John then he should get 456 (LA).
If the Name is Joseph then he should get 789 (Miami).
Thanks in advance :)
Try via loc and boolean masking:
df.loc[df['Name'].isin(['Tom','Krish']),'City']='New York'
df.loc[df['Name'].eq('John'),'City']='LA'
df.loc[df['Name'].eq('Joseph'),'City']='Miami'
Finally:
df['Value']=df['City'].map(city)
#you can also use replace() in place of map()
Or, with np.select:
import numpy as np
# conditions are listed in the same order as the city choices below
cond=[
df['Name'].isin(['Tom','Krish']),
df['Name'].eq('John'),
df['Name'].eq('Joseph')
]
df['City']=np.select(cond,['New York','LA','Miami'])
df['Value']=df['City'].map(city)
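Since the conditions are plain name-to-city assignments, a second dictionary can replace the masks altogether. This is a sketch; name_to_city is a hypothetical helper dictionary written out from the conditions in the question.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Joseph', 'Krish', 'John']})
city = {"New York": "123", "LA": "456", "Miami": "789"}

# Hypothetical helper dictionary built from the question's conditions
name_to_city = {
    'Tom': 'New York',
    'Krish': 'New York',
    'John': 'LA',
    'Joseph': 'Miami',
}

# First map each name to its city, then map the city to its code
df['City'] = df['Name'].map(name_to_city)
df['Value'] = df['City'].map(city)
print(df)
```

Names that are missing from name_to_city end up as NaN in both new columns, which makes gaps in the mapping easy to spot.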

Check if pandas column contains text in another dataframe and replace values

I have two df's, one for user names and another for real names. I'd like to know how I can check if I have a real name in my first df using the data of the other, and then replace it.
For example:
import pandas as pd
df1 = pd.DataFrame({'userName':['peterKing', 'john', 'joe545', 'mary']})
df2 = pd.DataFrame({'realName':['alice','peter', 'john', 'francis', 'joe', 'carol']})
df1
userName
0 peterKing
1 john
2 joe545
3 mary
df2
realName
0 alice
1 peter
2 john
3 francis
4 joe
5 carol
My code should replace 'peterKing' and 'joe545', since these user names contain names that appear in my df2. I tried using str.contains, but with it I can only verify whether a name appears or not.
The output should be like this:
userName
0 peter
1 john
2 joe
3 mary
Can someone help me with that? Thanks in advance!
You can use loc[row, column] together with the Series.str.contains method to select the user names that need to be replaced with the real names. In my opinion, this solution is clear in terms of readability.
for real_name in df2['realName'].to_list():
    # overwrite every user name that contains this real name
    df1.loc[df1['userName'].str.contains(real_name), 'userName'] = real_name
Output:
userName
0 peter
1 john
2 joe
3 mary
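The same replacement can also be written without an explicit loop by building one regular expression from all real names. This is a sketch rather than a drop-in replacement: it keeps the first real name found in each user name, and the names pattern and matched are made up for illustration.

```python
import re

import pandas as pd

df1 = pd.DataFrame({'userName': ['peterKing', 'john', 'joe545', 'mary']})
df2 = pd.DataFrame({'realName': ['alice', 'peter', 'john', 'francis', 'joe', 'carol']})

# One capture group that matches any of the real names; re.escape guards
# against names containing regex metacharacters
pattern = '(' + '|'.join(re.escape(name) for name in df2['realName']) + ')'

# extract() returns the matched real name, or NaN when none is found
matched = df1['userName'].str.extract(pattern, expand=False)

# Keep the original user name wherever no real name matched
df1['userName'] = matched.fillna(df1['userName'])
print(df1)
```

Like the loop version, this is case-sensitive and matches substrings anywhere in the user name.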

Creating a new column in a specific place in Pandas [duplicate]

This question already has answers here:
how do I insert a column at a specific column index in pandas?
(6 answers)
Closed 2 years ago.
I would like to create a new column in Python and place in a specific position. For instance, let "example" be the following dataframe:
import pandas as pd
example = pd.DataFrame({
'name': ['alice','bob','charlie'],
'age': [25,26,27],
'job': ['manager','clerk','policeman'],
'gender': ['female','male','male'],
'nationality': ['french','german','american']
})
I would like to create a new column containing twice the values of the column "age":
example['age_times_two']= example['age'] *2
Yet, this code creates a column at the end of the dataframe. I would like to place it as the third column, or, in other words, the column right next to the column "age". How could this be done:
a) By setting an absolute place to the new column (e.g. third position)?
b) By setting a relative place for the new column (e.g. right to the column "age")?
You can use df.insert here.
example.insert(2,'age_times_two',example['age']*2)
example
name age age_times_two job gender nationality
0 alice 25 50 manager female french
1 bob 26 52 clerk male german
2 charlie 27 54 policeman male american
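For the relative placement asked about in (b), the position can be looked up from the column name instead of being hardcoded. This is a sketch, assuming the column names are unique so that get_loc returns a single integer.

```python
import pandas as pd

example = pd.DataFrame({
    'name': ['alice', 'bob', 'charlie'],
    'age': [25, 26, 27],
    'job': ['manager', 'clerk', 'policeman'],
    'gender': ['female', 'male', 'male'],
    'nationality': ['french', 'german', 'american'],
})

# Find where 'age' sits and insert the new column directly to its right
position = example.columns.get_loc('age') + 1
example.insert(position, 'age_times_two', example['age'] * 2)
print(example)
```

Note that insert modifies the dataframe in place and raises an error if a column with that name already exists.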
This is a more manual way of doing it:
example['age_times_two']= example['age'] *2
cols = list(example.columns.values)
cols
This gives you a list of all the columns, which you can rearrange manually and use in the code below:
example = example[['name', 'age', 'age_times_two', 'job', 'gender', 'nationality']]
Another way to do it:
example['age_times_two']= example['age'] *2
cols = example.columns.tolist()
cols = cols[:2]+cols[-1:]+cols[2:-1]
example = example[cols]
print(example)
name age age_times_two job gender nationality
0 alice 25 50 manager female french
1 bob 26 52 clerk male german
2 charlie 27 54 policeman male american
