So I've got my dataframe (df1):
Number Name Gender Hobby
122 John Male -
123 Patrick Male -
124 Rudy Male -
I want to fill in the Hobby column based on the Number column. Assume I have lists of hobbies, keyed by Number, in different dataframes. For example (df2):
Number Hobby
124 Soccer
... ...
... ...
and df3 :
Number Hobby
122 Basketball
... ...
... ...
How can I achieve this dataframe:
Number Name Gender Hobby
122 John Male Basketball
123 Patrick Male -
124 Rudy Male Soccer
So far I've already tried the following solution:
Select rows from a DataFrame based on values in a column in pandas
but it only selects some data. How can I update the 'Hobby' column?
Thanks in advance.
You can use map; merge or join will also achieve it (here df is your main dataframe and df1 is one of the hobby lookup tables):
df['Hobby']=df.Number.map(df1.set_index('Number').Hobby)
df
Out[155]:
Number Name Gender Hobby
0 122 John Male NaN
1 123 Patrick Male NaN
2 124 Rudy Male Soccer
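If the hobbies are spread over several lookup frames (df2, df3, ...), one way is to concatenate them into a single Number-to-Hobby mapping first and keep the '-' placeholder where nothing matches. A minimal sketch, assuming the column names from the question:
import pandas as pd

# Data as laid out in the question.
df1 = pd.DataFrame({'Number': [122, 123, 124],
                    'Name': ['John', 'Patrick', 'Rudy'],
                    'Gender': ['Male', 'Male', 'Male'],
                    'Hobby': ['-', '-', '-']})
df2 = pd.DataFrame({'Number': [124], 'Hobby': ['Soccer']})
df3 = pd.DataFrame({'Number': [122], 'Hobby': ['Basketball']})

# Stack every lookup table into one Number -> Hobby Series,
# map it onto df1, and keep '-' where no hobby was found.
lookup = pd.concat([df2, df3]).set_index('Number')['Hobby']
df1['Hobby'] = df1['Number'].map(lookup).fillna('-')
print(df1)
#    Number     Name Gender       Hobby
# 0     122     John   Male  Basketball
# 1     123  Patrick   Male           -
# 2     124     Rudy   Male      Soccer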
I'm looking to combine multiple rows in a dataframe into a single row based on one column.
This is what my df looks like:
id Name score
0 1234 jim 34
1 5678 james 45
2 4321 Macy 56
3 1234 Jim 78
4 5678 James 80
I want to combine rows that share the same "id", joining their scores, so the output would look like:
id Name score
0 1234 jim 34,78
1 5678 james 45,80
2 4321 Macy 56
Basically I want to do the reverse of the explode function. How can I achieve this using pandas dataframe?
Try agg with groupby:
out = df.groupby('id',as_index=False).agg({'Name':'first','score':lambda x : ','.join(x.astype(str))})
Out[29]:
id Name score
0 1234 jim 34,78
1 4321 Macy 56
2 5678 james 45,80
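Note that the result above comes back sorted by id. To keep the ids in their original order, as in the desired output, one option (a sketch, assuming the same df) is to pass sort=False:
# sort=False keeps the groups in order of first appearance.
out = (df.groupby('id', as_index=False, sort=False)
         .agg({'Name': 'first',
               'score': lambda x: ','.join(x.astype(str))}))
print(out)
#      id   Name  score
# 0  1234    jim  34,78
# 1  5678  james  45,80
# 2  4321   Macy     56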
I would like to create a new column in Python and place it in a specific position. For instance, let "example" be the following dataframe:
import pandas as pd
example = pd.DataFrame({
    'name': ['alice', 'bob', 'charlie'],
    'age': [25, 26, 27],
    'job': ['manager', 'clerk', 'policeman'],
    'gender': ['female', 'male', 'male'],
    'nationality': ['french', 'german', 'american']
})
I would like to create a new column to contain the values of the column "age":
example['age_times_two']= example['age'] *2
Yet, this code creates a column at the end of the dataframe. I would like to place it as the third column, or, in other words, the column right next to the column "age". How could this be done:
a) By setting an absolute place to the new column (e.g. third position)?
b) By setting a relative place for the new column (e.g. right to the column "age")?
You can use df.insert here.
example.insert(2,'age_times_two',example['age']*2)
example
name age age_times_two job gender nationality
0 alice 25 50 manager female french
1 bob 26 52 clerk male german
2 charlie 27 54 policeman male american
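For the relative placement asked about in b), you can look up the position of "age" at run time instead of hard-coding 2. A small sketch, assuming you start from the original example frame (insert raises an error if the column already exists):
# Insert directly to the right of 'age', wherever that column sits.
pos = example.columns.get_loc('age') + 1
example.insert(pos, 'age_times_two', example['age'] * 2)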
This is a slightly more manual way of doing it:
example['age_times_two']= example['age'] *2
cols = list(example.columns.values)
cols
This gives you a list of all the columns; you can rearrange them manually and pass the new order, as in the code below:
example = example[['name', 'age', 'age_times_two', 'job', 'gender', 'nationality']]
Another way to do it:
example['age_times_two']= example['age'] *2
cols = example.columns.tolist()
cols = cols[:2]+cols[-1:]+cols[2:-1]
example = example[cols]
print(example)
name age age_times_two job gender nationality
0 alice 25 50 manager female french
1 bob 26 52 clerk male german
2 charlie 27 54 policeman male american
I am new to MySQL and just getting started with some basic concepts. I have been trying to solve this for a while now; any help is appreciated.
I have a list of users with two phone numbers. I want to compare the two phone-number columns and generate an extra row when the values differ, and otherwise keep the row unchanged.
The processed data would look like the second table.
Is there any way to achieve this in MySQL?
I also don't mind doing the transformation in a dataframe and then loading it into a table.
id username primary_phone landline
1 John 222 222
2 Michael 123 121
3 lucy 456 456
4 Anderson 900 901
Thanks!!!
Use DataFrame.melt, then drop the variable column and apply DataFrame.drop_duplicates:
df = (df.melt(['id','username'], value_name='phone')
.drop('variable', axis=1)
.drop_duplicates()
.sort_values('id'))
print (df)
id username phone
0 1 John 222
1 2 Michael 123
5 2 Michael 121
2 3 lucy 456
3 4 Anderson 900
7 4 Anderson 901
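An equivalent sketch without melt, stacking the two phone columns with pd.concat (column names taken from the question's table):
import pandas as pd

# Sample data as in the question.
df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'username': ['John', 'Michael', 'lucy', 'Anderson'],
                   'primary_phone': [222, 123, 456, 900],
                   'landline': [222, 121, 456, 901]})

# Put both phone columns under one 'phone' column, then drop the
# duplicate rows that appear when primary_phone equals landline.
parts = [df[['id', 'username', col]].rename(columns={col: 'phone'})
         for col in ('primary_phone', 'landline')]
out = (pd.concat(parts)
         .drop_duplicates()
         .sort_values('id')
         .reset_index(drop=True))
print(out)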
TL;DR:
Have 2 dataframes of different sizes, but one 'id' column (in both dfs) that is supposed to act as an index. Need to merge them, group by 'sector' and 'gender' and count/sum the entries in each group.
Long version:
I have a dataframe with 'id', 'sector', among other columns, from company personnel. Another dataframe has 'id' and 'gender'. Examples below:
df1:
row* id sector other columns
1 0 Operational ...
2 0 Administrative ...
3 1 Sales ...
4 2 IT ...
5 3 Operational ...
6 3 IT ...
7 4 Sales ...
[...]
150 100 Operational ...
151 100 Sales ...
152 101 IT ...
*I don't really have a 'row' column, it's there just to make it easier to understand my problem.
df2:
row* id gender
1 0 Male
2 1 Female
3 2 Female
4 3 Male
5 4 Male
[...]
101 100 Male
102 101 Female
As you can see, one person can be in more than one sector (which seems to make my problem more complicated).
I need to merge them together and then count how many males and females are in each sector.
FIRST PROBLEM
Decided to make a new df to get only the columns 'id' and 'sector'.
df3 = df1[['id','sector']]
df3 = df3.merge(df2)
I get:
No common columns to perform merge on. Merge options: left_on=None,
right_on=None, left_index=False, right_index=False
Tried using .join() instead of .merge() and I get:
['id'] not in index"
Then I tried reset_index(), found in some of the answers around here, but it didn't really solve my issue.
df1 = df1.reset_index()
df3 = df1[['id','sector']]
df3 = df3.join(df2)
What I got was this:
row* id sector gender
1 0 Operational Male
2 0 Administrative Female
3 1 Sales Female
4 2 IT Male
5 3 Operational Male
6 3 IT ...
7 4 Sales ...
[...]
150 100 Operational NaN
151 100 Sales NaN
152 101 IT NaN
It didn't respect the 'id' and just concatenated the column to the side. Since df2 only had 102 rows, I got NaN in the remaining rows (103 to 152), aside from the fact that 'gender' was no longer accurate.
SECOND PROBLEM
Decided to power through that in order to get the rest of the work done. I tried this:
df3 = df3.groupby('sector','gender').size()
It raises:
No axis named gender for object type < class 'pandas.core.frame.DataFrame'>
This doesn't really make sense to me, because I can call df3.gender and get the (entire) expected series. If I remove 'gender' from the line above, it actually groups, but that alone doesn't work for me. I also tried passing the column names before groupby, to no avail.
Expected result should be something like this:
sector gender sum
operational male 20
operational female 5
administrative male 10
administrative female 17
sales male 12
sales female 13
IT male 1
IT female 11
Not sure if I can answer my own question, but I think I should since the issue is resolved.
The solutions were very simple, even though I don't understand some of the issues I got.
First problem: I added on='id' to the merge:
df3 = df1[['id','sector']].merge(df2, on='id')
Second problem: I was just missing a list, as pointed out by @DYZ:
df3.groupby(['sector','gender']).size()
Feeling quite stupid right now... Must be tired. Thanks DYZ and sorry for the trouble.
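Putting both fixes together, a minimal end-to-end sketch (assuming df1 carries 'id' and 'sector' and df2 carries 'id' and 'gender', as in the question):
# Attach each person's gender to their sector rows, then count every
# (sector, gender) combination.
df3 = df1[['id', 'sector']].merge(df2, on='id')
counts = (df3.groupby(['sector', 'gender'])
             .size()
             .reset_index(name='sum'))
print(counts)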
Given a table (/dataFrame) x:
name day earnings revenue
Oliver 1 100 44
Oliver 2 200 69
John 1 144 11
John 2 415 54
John 3 33 10
John 4 82 82
Is it possible to split the table into two tables based on the name column (which acts as an index), and nest the two tables under the same object (not sure about the exact term to use)? So in the example above, tables[0] will be:
name day earnings revenue
Oliver 1 100 44
Oliver 2 200 69
and tables[1] will be:
name day earnings revenue
John 1 144 11
John 2 415 54
John 3 33 10
John 4 82 82
Note that the number of rows in each 'sub-table' may vary.
Cheers,
Create a dictionary of DataFrames:
dfs = dict(tuple(df.groupby('name')))
And then select by keys - value of name column:
print (dfs['Oliver'])
print (dfs['John'])
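If you prefer positional access like tables[0] and tables[1], as in the question, you can collect the groups into a list instead. A sketch, assuming df is the frame shown above; sort=False keeps the groups in order of first appearance (Oliver first, then John):
# One sub-DataFrame per name, in order of first appearance.
tables = [group for _, group in df.groupby('name', sort=False)]
print(tables[0])  # Oliver's rows
print(tables[1])  # John's rows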