I've got two datasets of equal length, each with only one column. I'm trying to combine them into one dataset with two columns.
What I've tried gives me one column with all the values from the first dataframe, but the second column is all NaNs.
Please help.
I have tried .join & .merge & pd.concat & .add & ...
df_low_rename = df_low_sui.rename(index=str, columns={'suicides/100k pop': 'low_gdp'})
df_high_rename = df_high_sui.rename(index=str, columns={'suicides/100k pop': 'high_gdp'})
df_combined = df_low_rename.add(df_high_rename)
df_combined
Output
Pandas merge function works.
Dataset 1:
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df1 = pd.DataFrame(data,columns=['Name','Age'])
print(df1)
output:
Name Age
0 Alex 10
1 Bob 12
2 Clarke 13
Dataset 2:
data2 = [['Alex','Science'],['Bob','Physics'],['Clarke','Social']]
df2 = pd.DataFrame(data2,columns=['Name','Courses'])
print(df2)
output:
Name Courses
0 Alex Science
1 Bob Physics
2 Clarke Social
merging the datasets:
final=pd.merge(df1,df2)
output:
Name Age Courses
0 Alex 10 Science
1 Bob 12 Physics
2 Clarke 13 Social
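When no key is given, pd.merge joins on the columns the two frames have in common (here Name). A minimal sketch that spells the key out explicitly, reusing the same data as above; the variable name final_explicit is only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame([['Alex', 10], ['Bob', 12], ['Clarke', 13]],
                   columns=['Name', 'Age'])
df2 = pd.DataFrame([['Alex', 'Science'], ['Bob', 'Physics'], ['Clarke', 'Social']],
                   columns=['Name', 'Courses'])

# Naming the key makes the intent explicit; how='inner' is merge's default.
final_explicit = pd.merge(df1, df2, on='Name', how='inner')
print(final_explicit)
```

Note that this only helps when the two frames share a key column; two single-column frames with different column names have no common key to merge on.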
I believe a join will do it for you. Something like this:
df_low_rename.join(df_high_rename)
Try with concat on the column axis:
combined = pd.concat([df_low_rename, df_high_rename], axis=1)
Both data sets didn't have the same indexes. I fixed it like this:
df_low_rename = df_low_rename.reset_index(drop=True)
df_high_rename = df_high_rename.reset_index(drop=True)
Then I used the join function:
df_combined = df_low_rename.join(df_high_rename)
df_combined
This way, I got the right output. Thanks to everyone who tried to help me and I apologize for this rookie mistake.
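For anyone hitting the same thing, here is a small sketch of why the second column came out as NaN. The values and index labels are made up for illustration; only the column names low_gdp and high_gdp come from the question. join aligns rows by index label, so frames with no labels in common produce NaNs until the indexes are reset:

```python
import pandas as pd

# Hypothetical data: same length, but the index labels do not overlap.
low = pd.DataFrame({'low_gdp': [1.0, 2.0, 3.0]}, index=[10, 11, 12])
high = pd.DataFrame({'high_gdp': [4.0, 5.0, 6.0]}, index=[20, 21, 22])

# Rows are matched by index label, so nothing lines up here.
misaligned = low.join(high)
print(misaligned)

# Resetting both indexes to 0..n-1 makes the rows match by position.
aligned = low.reset_index(drop=True).join(high.reset_index(drop=True))
print(aligned)
```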
I am working with an adult dataset where I split the dataframe to label encode categorical columns. Now I want to append the new dataframe with the original dataframe. What is the simplest way to perform the same?
Original Dataframe:
age salary
32 3000
25 2300
After label encoding a few columns:
country gender
1 1
4 2
I want to append the above dataframe and the final result should be the following.
age salary country gender
32 3000 1 1
25 2300 4 2
Any insights are helpful.
Let's consider two dataframes named df1 and df2:
df1.merge(df2,left_index=True, right_index=True)
You can use .join() if the dataframes' rows are matched by index.
.join() is a left join by default and joins on the index by default:
df1.join(df2)
In addition to simple syntax, it has the extra advantage that when you put your master/original dataframe on the left, left join ensures that the dataframe indexes of the master are retained in the result.
Result:
age salary country gender
0 32 3000 1 1
1 25 2300 4 2
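A short sketch comparing the two index-based approaches on the data from the question. When both frames carry the same index they give the same rows; the difference only shows up when the right-hand frame is missing a row, which is simulated here with df2.iloc[:1] (an assumption made purely for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({'age': [32, 25], 'salary': [3000, 2300]})
df2 = pd.DataFrame({'country': [1, 4], 'gender': [1, 2]})

# Identical indexes: both calls return the same four columns and two rows.
joined = df1.join(df2)
merged = df1.merge(df2, left_index=True, right_index=True)

# Right-hand frame missing the second row (hypothetical case).
df2_short = df2.iloc[:1]
left_joined = df1.join(df2_short)   # left join: keeps both rows of df1
inner_merged = df1.merge(df2_short, left_index=True, right_index=True)  # inner join: keeps one row

print(left_joined)
print(inner_merged)
```

If you want merge to keep every row of the original frame as well, pass how='left'.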
You may find your solution in pandas.concat.
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.array([[32,3000],[25,2300]]), columns=['age', 'salary'])
df2 = pd.DataFrame(np.array([[1,1],[4,2]]), columns=['country', 'gender'])
pd.concat([df1, df2], axis=1)
age salary country gender
0 32 3000 1 1
1 25 2300 4 2
Suppose I have the following df that I would like to reshape:
df6 = pd.DataFrame({'name':['Sara', 'John', 'Jack'],
'trip places': ['UK,UK,UK', 'US,US,US', 'AUS,AUS,AUS'],
'Trip code': ['UK322,UK454,UK4441', 'US664,US4544,US44', 'AUS11,AUS11,AUS11']
})
df6
Looks like:
name trip places Trip code
0 Sara UK,UK,UK UK322,UK454,UK4441
1 John US,US,US US664,US4544,US44
2 Jack AUS,AUS,AUS AUS11,AUS11,AUS11
I want to add a new column, let's say df6['total-info'], that merges the two current columns trip places and Trip code into two rows per name, so the output looks like this:
name total-info
0 Sara UK,UK,UK
UK322,UK454,UK4441
1 John US,US,US
US664,US4544,US44
2 Jack AUS,AUS,AUS
AUS11,AUS11,AUS11
I tried many methods (grouping, stack/unstack, pivot, etc.), but nothing I tried generates the output I need, and I am not familiar enough with pandas to know the best function for this. I also tried concatenation, but it produced a single column with the comma-separated values of both columns joined together.
Use set_index, stack, droplevel then reset_index and specify the new column name:
df7 = (
df6
.set_index('name') # Preserve during reshaping
.stack() # Reshape
.droplevel(1) # Remove column names
.reset_index(name='total-info') # reset_index and name new column
)
df7:
name total-info
0 Sara UK,UK,UK
1 Sara UK322,UK454,UK4441
2 John US,US,US
3 John US664,US4544,US44
4 Jack AUS,AUS,AUS
5 Jack AUS11,AUS11,AUS11
Or, if name is to be part of the MultiIndex, append name to the index and call to_frame
after stack and droplevel instead:
df7 = (
df6
.set_index('name', append=True) # Preserve during reshaping
.stack() # Reshape
.droplevel(2) # Remove column names
.to_frame(name='total-info') # Make DataFrame and name new column
)
total-info
name
0 Sara UK,UK,UK
Sara UK322,UK454,UK4441
1 John US,US,US
John US664,US4544,US44
2 Jack AUS,AUS,AUS
Jack AUS11,AUS11,AUS11
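As an alternative to stack, not part of the answer above, the same long layout can probably be reached with melt. This is only a sketch: melt groups rows by source column rather than by name, so it keeps the original index (ignore_index=False) and uses a stable sort to bring each person's two rows back together. The name df7_melt is illustrative.

```python
import pandas as pd

df6 = pd.DataFrame({
    'name': ['Sara', 'John', 'Jack'],
    'trip places': ['UK,UK,UK', 'US,US,US', 'AUS,AUS,AUS'],
    'Trip code': ['UK322,UK454,UK4441', 'US664,US4544,US44', 'AUS11,AUS11,AUS11'],
})

df7_melt = (
    df6
    .melt(id_vars='name', value_name='total-info', ignore_index=False)  # keep original row labels
    .sort_index(kind='mergesort')   # stable sort: 'trip places' stays ahead of 'Trip code'
    .drop(columns='variable')       # drop the column that held the old column names
    .reset_index(drop=True)
)
print(df7_melt)
```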
I would like to transpose a Pandas DataFrame from rows to columns, where the number of rows is dynamic. The transposed DataFrame must therefore also have a dynamic number of columns.
I succeeded using iterrows() and concat() methods, but I would like to optimize my code.
Please find my current code:
import pandas as pd
expected_results_transposed = pd.DataFrame()
for i, r in expected_results.iterrows():
    t = pd.Series([r.get('B')], name=r.get('A'))
    expected_results_transposed = pd.concat([expected_results_transposed, t], axis=1)
print("CURRENT CASE EXPECTED RESULTS TRANSPOSED:\n{0}\n".format(expected_results_transposed))
Please find an illustration of expected result :
picture of expected result
Do you have any solution to optimize my code using standard Pandas DataFrame methods/options?
Thank you for your help :)
Use DataFrame.transpose + DataFrame.set_index:
new_df=df.set_index('A').T.reset_index(drop=True)
new_df.columns.name=None
Example
df2=pd.DataFrame({'A':'Mike Ana Jon Jenny'.split(),'B':[1,2,3,4]})
print(df2)
A B
0 Mike 1
1 Ana 2
2 Jon 3
3 Jenny 4
new_df=df2.set_index('A').T.reset_index(drop=True)
new_df.columns.name=None
print(new_df)
Mike Ana Jon Jenny
0 1 2 3 4
I have two dataframes like the following:
df1
id name
-------------------------
0 43 c
1 23 t
2 38 j
3 9 s
df2
user id
--------------------------------------------------
0 222087 27,26
1 1343649 6,47,17
2 404134 18,12,23,22,27,43,38,20,35,1
3 1110200 9,23,2,20,26,47,37
I want to split all the ids in df2 into multiple rows and join the resultant dataframe to df1 on "id".
I do the following:
b = pd.DataFrame(df2['id'].str.split(',').tolist(), index=df2.user_id).stack()
b = b.reset_index()[[0, 'user_id']] # var1 variable is currently labeled 0
b.columns = ['Item_id', 'user_id']
When I try to merge, I get NaNs in the resultant dataframe.
pd.merge(b, df1, on = "id", how="left")
id user name
-------------------------------------
0 27 222087 NaN
1 26 222087 NaN
2 6 1343649 NaN
3 47 1343649 NaN
4 17 1343649 NaN
So, I tried doing the following:
b['name']=np.nan
for i in range(0, len(df1)):
    b['name'][(b['id'] == df1['id'][i])] = df1['name'][i]
It still gives the same result as above. I am confused as to what could cause this because I am sure both of them should work!
Any help would be much appreciated!
I read similar posts on SO, but none seemed to have a concrete answer. I am also not sure whether this is a coding problem at all.
Thanks in advance!
The problem is that you need to convert the id column in df2 to int, because the output of string functions is always a string, even when the values look numeric.
df2.id = df2.id.astype(int)
Another solution is to convert df1.id to string:
df1.id = df1.id.astype(str)
You get NaNs because nothing matches: str values do not match int values.
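Putting the pieces together, here is a minimal end-to-end sketch on the data from the question. It swaps the asker's tolist()/stack() step for str.split plus DataFrame.explode (available in pandas 0.25 and later), which is my substitution, not something from the question. The column name user_id is assumed from the asker's code.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [43, 23, 38, 9], 'name': ['c', 't', 'j', 's']})
df2 = pd.DataFrame({
    'user_id': [222087, 1343649, 404134, 1110200],
    'id': ['27,26', '6,47,17', '18,12,23,22,27,43,38,20,35,1', '9,23,2,20,26,47,37'],
})

# One row per id: split the comma-separated string into a list, then explode it.
b = df2.assign(id=df2['id'].str.split(',')).explode('id')

# The split pieces are still strings; convert so they can match df1's integer ids.
b['id'] = b['id'].astype(int)

merged = b.merge(df1, on='id', how='left')
print(merged)
```

Ids that do not appear in df1 still get NaN in name, which is expected with how='left'.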
If I have a pandas Dataframe like such
and I want to transform it in a way that it results in
Is there a correct, idiomatic way to achieve this? I am looking for a good pattern.
Use a pivot table:
pd.pivot_table(df,index='name',columns=['property'],aggfunc=sum).fillna(0)
Output:
price
Property boat dog house
name
Bob 0 5 4
Josh 0 2 0
Sam 3 0 0
Sidenote: Pasting in your df's helps so people can use pd.read_clipboard instead of generating the df themselves.
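Since the original frame was not pasted as text, here is a self-contained sketch with an input I reconstructed so that it is consistent with the output above. The input rows and the column names name, property and price are assumptions. Passing values='price' and fill_value=0 is a small variation on the call above that avoids the extra price level in the columns and the separate fillna step:

```python
import pandas as pd

# Hypothetical input, chosen to reproduce the output shown above.
df = pd.DataFrame({
    'name': ['Bob', 'Bob', 'Josh', 'Sam'],
    'property': ['dog', 'house', 'dog', 'boat'],
    'price': [5, 4, 2, 3],
})

result = pd.pivot_table(
    df,
    index='name',
    columns='property',
    values='price',
    aggfunc='sum',    # sum prices when a name/property pair occurs more than once
    fill_value=0,     # missing combinations become 0 instead of NaN
)
print(result)
```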