Get data from another data frame in python - python

My data frame df1:
ID Date
0 90 02/01/2021
1 101 01/01/2021
2 30 12/01/2021
My data frame df2:
ID City 01/01/2021 02/01/2021 12/01/2021
0 90 A 20 14 22
1 101 B 15 10 5
2 30 C 12 9 13
I need to create a column 'New' in df1. It should contain the values from df2 that match the 'ID' and 'Date' of each row in df1. I am having difficulty merging the data. How can I do it?

Use DataFrame.melt with DataFrame.merge:
df22 = df2.drop(columns='City').melt(['ID'], var_name='Date', value_name='Val')
df = df1.merge(df22, how='left')
print (df)
ID Date Val
0 90 02/01/2021 14
1 101 01/01/2021 15
2 30 12/01/2021 13
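
A minimal self-contained sketch of the melt-then-merge approach above, rebuilding the two frames from the question so it can be run directly. The names long_df2 and result are illustrative only, and the value column is called 'New' to match what the question asks for.

```python
import pandas as pd

# Frames rebuilt from the question's sample data
df1 = pd.DataFrame({"ID": [90, 101, 30],
                    "Date": ["02/01/2021", "01/01/2021", "12/01/2021"]})
df2 = pd.DataFrame({"ID": [90, 101, 30],
                    "City": ["A", "B", "C"],
                    "01/01/2021": [20, 15, 12],
                    "02/01/2021": [14, 10, 9],
                    "12/01/2021": [22, 5, 13]})

# Wide -> long: one row per (ID, Date) pair, value stored in 'New'
long_df2 = df2.drop(columns="City").melt(id_vars="ID", var_name="Date", value_name="New")

# Left merge keeps every row of df1 and looks up the matching value
result = df1.merge(long_df2, on=["ID", "Date"], how="left")
print(result)
```

With how="left", a row of df1 whose ID/Date pair is missing from df2 would get NaN in 'New' instead of being dropped.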

You can melt and merge:
df1.merge(df2.melt(id_vars=['ID', 'City'], var_name='Date'), on=['ID', 'Date'])
output:
ID Date City value
0 90 02/01/2021 A 14
1 101 01/01/2021 B 15
2 30 12/01/2021 C 13
Alternative:
df1.merge(df2.melt(id_vars='ID',
                   value_vars=df2.filter(regex='/').columns,
                   var_name='Date'),
          on=['ID', 'Date'])
output:
ID Date value
0 90 02/01/2021 14
1 101 01/01/2021 15
2 30 12/01/2021 13

Related

Python: Iterate Over Year and Month in DatetimeIndex

I have two DataFrames:
df1:
A B
Date
01/01/2020 2 4
02/01/2020 6 8
df2:
A B
Date
01/01/2020 5 10
I want to get the following:
df3:
A B
Date
01/01/2020 10 40
02/01/2020 30 80
What I want is to multiply the column entries based on year and month in DatetimeIndex. But I'm not sure how to iterate over datetime.
Use to_numpy(); NumPy broadcasts the single row of df2 across every row of df1:
df3=pd.DataFrame(df1.to_numpy()*df2.to_numpy(),index=df1.index,columns=df1.columns)
output of df3:
A B
Date
01/01/2020 10 40
02/01/2020 30 80
You may need reindex to align df2 with df1 by year and month:
df1.index = pd.to_datetime(df1.index,dayfirst=True)
df2.index = pd.to_datetime(df2.index,dayfirst=True)
df2.index = df2.index.strftime('%Y-%m')
df1[:] *= df2.reindex(df1.index.strftime('%Y-%m')).values
df1
output:
A B
Date
2020-01-01 10 40
2020-01-02 30 80
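
A hedged, self-contained sketch of the same year-and-month alignment. It swaps the strftime keys for monthly periods (DatetimeIndex.to_period), which is a different technique from the answer above; the names factors and df3 are illustrative, and it assumes df2 has at most one row per month.

```python
import pandas as pd

# Frames rebuilt from the question; dates are day-first strings
df1 = pd.DataFrame({"A": [2, 6], "B": [4, 8]},
                   index=pd.to_datetime(["01/01/2020", "02/01/2020"], dayfirst=True))
df1.index.name = "Date"
df2 = pd.DataFrame({"A": [5], "B": [10]},
                   index=pd.to_datetime(["01/01/2020"], dayfirst=True))
df2.index.name = "Date"

# Key the multipliers by year-month (assumes one row per month in df2)
factors = df2.copy()
factors.index = df2.index.to_period("M")

# Look up the factor row for each date in df1, then multiply element-wise
aligned = factors.reindex(df1.index.to_period("M"))
df3 = df1 * aligned.to_numpy()
print(df3)
```

A month present in df1 but absent from df2 would produce NaN rows after reindex, which makes missing multipliers visible rather than silently skipped.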

How to Transpose dataframe column when duplicate entries exist in python?

I am having difficulty in transposing a certain column in python.
I have the following df
ID Value Date
1 15 2019/01/01
1 13 2019/02/01
1 17 2019/03/01
2 16 2019/01/01
2 14 2019/02/01
2 15 2019/03/01
I want to create a df such that the duplicates from ID column are removed and the Values get transposed
ID Value_01 Value_02 Value_03
1 15 13 17
2 16 14 15
Use cumcount with groupby to number the rows within each ID, then crosstab to spread those numbers into columns:
df1 = df.assign(key=df.groupby('ID').cumcount() + 1)
df2 = pd.crosstab(df1["ID"], df1["key"], df1["Value"], aggfunc='first').add_prefix(
"Value_"
).reset_index().rename_axis(None, axis=1)
print(df2)
ID Value_1 Value_2 Value_3
0 1 15 13 17
1 2 16 14 15
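
A small runnable sketch of the same idea, using DataFrame.pivot instead of crosstab and zero-padding the counter so the columns come out as Value_01, Value_02, ... as in the question. The name wide and the label format are illustrative assumptions.

```python
import pandas as pd

# Frame rebuilt from the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "Value": [15, 13, 17, 16, 14, 15],
    "Date": ["2019/01/01", "2019/02/01", "2019/03/01"] * 2,
})

# Running counter within each ID: 1, 2, 3, ...
counter = df.groupby("ID").cumcount() + 1

# Turn the counter into zero-padded column labels, then pivot long -> wide
wide = (df.assign(key=counter.map(lambda n: f"Value_{int(n):02d}"))
          .pivot(index="ID", columns="key", values="Value")
          .rename_axis(None, axis=1)
          .reset_index())
print(wide)
```

pivot raises an error if an (ID, key) pair occurs twice, which cannot happen here because cumcount makes the key unique within each ID.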

How to write a function that transforms the columns of my dataframe into a single column?

I have a dataframe like this:
A = ID Material1 Materia2 Material3
14 0 0 0
24 1 0 0
12 1 1 0
25 0 0 2
I want to have all information in one column like this:
A = ID Materials
14 Nan
24 Material1
12 Material1
12 Material2
25 Material3
25 Material3
Can anyone help me write a function, please?
Use DataFrame.melt, then repeat each row by its count with Index.repeat and DataFrame.loc:
df1 = df.melt('ID', var_name='Materials')
df1 = df1.loc[df1.index.repeat(df1['value'])].drop('value', axis=1).reset_index(drop=True)
print (df1)
ID Materials
0 24 Material1
1 12 Material1
2 12 Materia2
3 25 Material3
4 25 Material3
EDIT: To also keep IDs whose counts are all 0 (as rows with missing values), take the original df['ID'] as a one-column DataFrame without duplicates via DataFrame.drop_duplicates, and left-join the repeated rows onto it with DataFrame.merge:
df1 = df.melt('ID', var_name='Materials')
df0 = df[['ID']].drop_duplicates()
print (df0)
ID
0 14
1 24
2 12
3 25
df2 = df1.loc[df1.index.repeat(df1['value'])].drop('value', axis=1).reset_index(drop=True)
df2 = df0.merge(df2, on='ID', how='left')
print (df2)
ID Materials
0 14 NaN
1 24 Material1
2 12 Material1
3 12 Materia2
4 25 Material3
5 25 Material3
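
The steps above can be sketched as one self-contained function. The function name to_long_materials is hypothetical, and the sample frame assumes the second column is spelled 'Material2' (the question's data shows 'Materia2').

```python
import pandas as pd

def to_long_materials(df: pd.DataFrame) -> pd.DataFrame:
    """Repeat each material name by its count; IDs with no material get NaN."""
    long_df = df.melt("ID", var_name="Materials")
    # Repeat every melted row 'value' times (rows with a count of 0 disappear)
    repeated = (long_df.loc[long_df.index.repeat(long_df["value"])]
                       .drop(columns="value")
                       .reset_index(drop=True))
    # Left-join on the unique IDs so IDs with only zeros are kept as NaN rows
    ids = df[["ID"]].drop_duplicates()
    return ids.merge(repeated, on="ID", how="left")

df = pd.DataFrame({
    "ID": [14, 24, 12, 25],
    "Material1": [0, 1, 1, 0],
    "Material2": [0, 0, 1, 0],  # assumed spelling
    "Material3": [0, 0, 0, 2],
})
result = to_long_materials(df)
print(result)
```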

Pandas iterate two dataframes

I have two DataFrames: one has a list of several IDs, and the other has each person's name and ID.
I want to loop over them so that when the ID in df1 equals the ID in df2, the name from df2 is added as a new column in df1.
I tried to adapt this fuzzywuzzy code that I found, but it didn't create the column.
for key,row in df.iterrows():
    choices = str(list(df2.NAME_ID.unique()))
    names = process.extract(str(row['P1_ID']), choices, limit=2)[0][0]
    name = df2[df2['NAME_ID'] == names]['NAME']
    if not name.empty:
        df.loc[key,'Name'] = name
import pandas as pd
df = pd.read_clipboard(sep=r'\s\s+')
GAME_DATE_EST GAME_ID GAME_STATUS_TEXT P1_ID P2_ID SEASON P1_ID PTS_P1
0 2020-01-01 21900504 Final 1610612764 1610612753 2019 1610612764 10
1 2020-01-01 21900505 Final 1610612752 1610612757 2019 1610612752 9
2 2020-01-01 21900506 Final 1610612749 1610612750 2019 1610612749 10
3 2020-01-01 21900507 Final 1610612747 1610612756 2019 1610612747 8
4 2019-12-31 21900497 Final 1610612766 1610612738 2019 1610612766 9
df2
NAME_ID STANDINGSDATE NAME G W L W_PCT
0 1610612747 2020-01-01 Math 34 27 7 0.79
1 1610612743 2020-01-01 John 33 23 10 0.70
2 1610612746 2020-01-01 Elias 35 24 11 0.69
3 1610612745 2020-01-01 Alexander 34 23 11 0.68
4 1610612742 2020-01-01 Michael 33 21 12 0.64
I hope you understand and can help me
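For that, you can do a simple join. Note that join matches the on column of df against the index of df2, so set NAME_ID as the index first:
newdf = df.join(df2.set_index('NAME_ID'), on='P1_ID', how='left')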
Based on your given data, you can try:
df.merge(df2[['NAME_ID','NAME']], left_on=['P1_ID'], right_on=['NAME_ID'], how='left')
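
A minimal sketch of that left merge on a trimmed-down version of the question's data, with no loop or fuzzy matching, since the IDs match exactly. Only a few rows and columns are reproduced, and the name out is illustrative.

```python
import pandas as pd

# Trimmed-down versions of the question's frames
df = pd.DataFrame({
    "GAME_ID": [21900504, 21900505, 21900507],
    "P1_ID": [1610612764, 1610612752, 1610612747],
})
df2 = pd.DataFrame({
    "NAME_ID": [1610612747, 1610612743],
    "NAME": ["Math", "John"],
})

# Left merge: keep every game, attach NAME where P1_ID equals NAME_ID
out = (df.merge(df2[["NAME_ID", "NAME"]],
                left_on="P1_ID", right_on="NAME_ID", how="left")
         .drop(columns="NAME_ID"))
print(out)
```

Rows whose P1_ID has no match in df2 get NaN in NAME, which is what happens for most rows in the sample shown in the question.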

Convert the data frame from long to wide format and dynamically name columns

I am converting the data frame from long to wide format, however the problem I am facing is generating the right number of translated columns and dynamically renaming the new data frame columns.
Let's say I have a sample data frame as follows:
data = {'name':['Tom', 'nick', 'Tom', 'nick','Tom'], 'id':[20, 21, 20, 21,22], 'plan' : [100,101,102,101,100], 'drug' : ['a','b','b','c','a']}
df = pd.DataFrame(data)
drug id name plan
a 20 Tom 100
b 21 nick 101
b 20 Tom 102
c 21 nick 101
a 22 Tom 100
So for every given name and id I want to create multiple columns for plan and drug. For example, there are 3 distinct plans and 3 distinct drugs, so ideally I should get 6 new columns that indicate whether a particular plan/drug has been taken or not.
I tried converting from long to wide but I am not getting the desired result.
Convert long to wide:
df1 = df.groupby(['name','id'])[['plan', 'drug']].apply(lambda x: pd.DataFrame(x.values)).unstack().reset_index()
Actual output:
name id 0 1 0 1
Tom 20 100 102 a b
nick 21 101 101 b c
Tom 22 100 None a None
Expected output:
name id 100 101 102 a b c
Tom 20 1 0 1 1 1 0
Tom 22 1 0 0 1 0 0
nick 21 0 1 0 0 1 1
Use get_dummies with max:
df1 = pd.get_dummies(df.set_index(['name','id']).astype(str), dtype=int).groupby(level=[0,1], sort=False).max().reset_index()
print(df1)
name id plan_100 plan_101 plan_102 drug_a drug_b drug_c
0 Tom 20 1 0 1 1 1 0
1 nick 21 0 1 0 0 1 1
2 Tom 22 1 0 0 1 0 0
df2 = (pd.get_dummies(df.set_index(['name','id']).astype(str),
                      prefix='', prefix_sep='', dtype=int)
         .groupby(level=[0,1], sort=False)
         .max()
         .reset_index())
print(df2)
name id 100 101 102 a b c
0 Tom 20 1 0 1 1 1 0
1 nick 21 0 1 0 0 1 1
2 Tom 22 1 0 0 1 0 0
EDIT: Solution with DataFrame.pivot_table, concat and DataFrame.clip:
df1 = df.pivot_table(index=['name','id'],
                     columns=['plan'],
                     aggfunc='size',
                     fill_value=0)
df2 = df.pivot_table(index=['name','id'],
                     columns=['drug'],
                     aggfunc='size',
                     fill_value=0)
df = pd.concat([df1, df2], axis=1).clip(upper=1).reset_index()
print(df)
name id 100 101 102 a b c
0 Tom 20 1 0 1 1 1 0
1 Tom 22 1 0 0 1 0 0
2 nick 21 0 1 0 0 1 1
import pandas as pd
data = {
'name':['Tom', 'nick', 'Tom', 'nick','Tom'],
'id':[20, 21, 20, 21,22],
'plan': [100,101,102,101,100],
'drug': ['a','b','b','c','a']
}
df = pd.DataFrame(data)
plans = df.groupby(['name', 'id', 'plan']).size().unstack()
drugs = df.groupby(['name', 'id', 'drug']).size().unstack()
merged_df = pd.merge(plans, drugs, left_index=True, right_index=True)
merged_df = merged_df.fillna(0)
This gets the plan and drug counts for each name and id (that is what size() followed by unstack() does),
then merges the two results on their index (which is set to name and id),
and finally uses fillna to replace NaN with 0.
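
A hedged variant of the groupby/size/unstack approach that yields 0/1 integer indicators directly: unstack(fill_value=0) avoids the NaN step, and clip(upper=1) caps repeated pairs (nick, id 21 has plan 101 twice). The name wide is illustrative.

```python
import pandas as pd

data = {
    'name': ['Tom', 'nick', 'Tom', 'nick', 'Tom'],
    'id': [20, 21, 20, 21, 22],
    'plan': [100, 101, 102, 101, 100],
    'drug': ['a', 'b', 'b', 'c', 'a'],
}
df = pd.DataFrame(data)

# Count occurrences per (name, id), one column per plan / per drug
plans = df.groupby(['name', 'id', 'plan']).size().unstack(fill_value=0)
drugs = df.groupby(['name', 'id', 'drug']).size().unstack(fill_value=0)

# Join on the shared (name, id) index and cap counts at 1 to get indicators
wide = plans.join(drugs).clip(upper=1).reset_index()
print(wide)
```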
