Merge 2 dataframes on Days and Month - python

I have the following dataframes:
print(df1)
day month quantity Operation_type
21 6 6 2
24 6 4 2
...
print(df2)
day month quantity Operation_type
22 6 10 1
23 6 15 1
...
I would like to get the following dataset:
print(final_df)
day month quantity Operation_type
21 6 6 2
22 6 10 1
23 6 15 1
24 6 4 2
...
I tried using:
final_df = pd.merge(df1, df2, on=['day','month'])
but it creates a huge dataset and does not seem to work properly.
Furthermore, if day and month are the same, I would like to place the row whose Operation_type == 2 before the one with Operation_type == 1.
How can I solve this problem?

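To combine the DataFrames into one, you don't want merge; you want pd.concat. To get the ordering right, use DataFrame.sort_values, sorting by month before day so that rows from different months stay in chronological order, and with Operation_type descending so that 2 comes before 1 on the same date:
final_df = pd.concat([df1, df2]).sort_values(by=['month', 'day', 'Operation_type'],
                                             ascending=[True, True, False])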
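A minimal runnable sketch of the concat-then-sort idea, using only the rows shown in the question. The extra row (24, 6, 7, 1) is hypothetical and is there only to show the tie-break on Operation_type; sorting by month before day is an assumption made so the order stays chronological if more than one month is present.

```python
import pandas as pd

# Rows shown in the question (the "..." rows are not reproduced here).
df1 = pd.DataFrame({"day": [21, 24], "month": [6, 6],
                    "quantity": [6, 4], "Operation_type": [2, 2]})
# The last row (24, 6, 7, 1) is hypothetical, added only to show the tie-break.
df2 = pd.DataFrame({"day": [22, 23, 24], "month": [6, 6, 6],
                    "quantity": [10, 15, 7], "Operation_type": [1, 1, 1]})

final_df = (
    pd.concat([df1, df2], ignore_index=True)          # stack the rows
      .sort_values(by=["month", "day", "Operation_type"],
                   ascending=[True, True, False])     # type 2 before type 1
      .reset_index(drop=True)
)
print(final_df)
```

On day 24 the Operation_type == 2 row lands before the Operation_type == 1 row, which is the ordering the question asks for.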

You can perform an outer merge to achieve this result.
res = pd.merge(df1, df2, how='outer').sort_values('day')
# day month quantity Operation_type
# 0 21 6 6 2
# 2 22 6 10 1
# 3 23 6 15 1
# 1 24 6 4 2
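One difference between the two answers that may matter: an outer merge with no `on` argument joins on all shared columns, so a row that is identical in both frames is matched and appears once, while pd.concat keeps both copies. The sketch below uses hypothetical frames (not the question's data) that share one identical row.

```python
import pandas as pd

# Hypothetical frames that share one identical row (day 22).
left = pd.DataFrame({"day": [21, 22], "month": [6, 6],
                     "quantity": [6, 10], "Operation_type": [2, 1]})
right = pd.DataFrame({"day": [22, 23], "month": [6, 6],
                      "quantity": [10, 15], "Operation_type": [1, 1]})

stacked = pd.concat([left, right], ignore_index=True)
merged = pd.merge(left, right, how="outer")  # joins on all common columns

print(len(stacked))  # the shared row appears twice
print(len(merged))   # the shared row is matched and appears once
```

If the two frames never contain the same row, both approaches should return the same set of rows.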

Related

Python Monthly Change Calculation (Pandas)

Here is data
id date population
1 2021-5 21
2 2021-5 22
3 2021-5 23
4 2021-5 24
1 2021-4 17
2 2021-4 24
3 2021-4 18
4 2021-4 29
1 2021-3 20
2 2021-3 29
3 2021-3 17
4 2021-3 22
I want to calculate the monthly change in population for each id, so the result will be:
id date delta
1 5 .2353
1 4 -.15
2 5 -.1519
2 4 -.2083
3 5 .2174
3 4 .0556
4 5 -.2083
4 4 .3182
delta := (this month - last month) / last month
How to approach this in pandas? I'm thinking of groupby but don't know what to do next
remember there might be more dates. but results is always
Use GroupBy.pct_change after sorting the rows by id and date, then remove the missing rows in the delta column:
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(['id','date'], ascending=[True, False])
df['delta'] = df.groupby('id')['population'].pct_change(-1)
df = df.dropna(subset=['delta'])
print (df)
id date population delta
0 1 2021-05-01 21 0.235294
4 1 2021-04-01 17 -0.150000
1 2 2021-05-01 22 -0.083333
5 2 2021-04-01 24 -0.172414
2 3 2021-05-01 23 0.277778
6 3 2021-04-01 18 0.058824
3 4 2021-05-01 24 -0.172414
7 4 2021-04-01 29 0.318182
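For anyone who wants to run this end to end, here is a self-contained sketch with the question's data. The dates are written zero-padded ("2021-05") and parsed with an explicit format; that is an assumption made here so the parsing is unambiguous, not something from the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4] * 3,
    # zero-padded months are an assumption for unambiguous parsing
    "date": ["2021-05"] * 4 + ["2021-04"] * 4 + ["2021-03"] * 4,
    "population": [21, 22, 23, 24, 17, 24, 18, 29, 20, 29, 17, 22],
})
df["date"] = pd.to_datetime(df["date"], format="%Y-%m")

# Newest month first within each id, so the "next" row is the previous month.
df = df.sort_values(["id", "date"], ascending=[True, False])

# periods=-1 compares each row with the row after it in the same group.
df["delta"] = df.groupby("id")["population"].pct_change(periods=-1)

# The oldest month of each id has nothing to compare against.
result = df.dropna(subset=["delta"])
print(result)
```

Each delta is (this month - last month) / last month, for example (21 - 17) / 17 for id 1 in May.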
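Try this (it assumes the rows of each id are ordered newest first, as in the question, so x.iloc[1] is last month):
df.groupby('id')['population'].rolling(2).apply(lambda x: (x.iloc[0] - x.iloc[1]) / x.iloc[1]).dropna()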
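Maybe you could try something like:
data['date'] = pd.to_datetime(data['date'])
data = data.sort_values(['id', 'date'])
prev = data.groupby('id')['population'].shift()  # previous month's population within each id
data['delta'] = (data['population'] - prev) / prev
With this approach the first row of each id is NaN, but for the rest this should work.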

Sort part of DataFrame in Python pandas, return new column with order depending on row values

My first question here, I hope this is understandable.
I have a pandas DataFrame:
order_numbers x_closest_autobahn
0 34 3
1 11 3
2 5 3
3 8 12
4 2 12
I would like to get a new column with the order_number per closest_autobahn in ascending order:
order_numbers x_closest_autobahn order_number_autobahn_x
2 5 3 1
1 11 3 2
0 34 3 3
4 2 12 1
3 8 12 2
I have tried:
df['order_number_autobahn_x'] = ([df.loc[(df['x_closest_autobahn'] == 3)]].sort_values(by=['order_numbers'], ascending=True, inplace=True))
I have looked at slicing, sort_values and reset_index
df.sort_values(by=['order_numbers'], ascending=True, inplace=True)
df = df.reset_index() # reset index to the order after sort
df['order_numbers_index'] = df.index
but I can't seem to get the DataFrame I am looking for.
Use DataFrame.sort_values on both columns, and GroupBy.cumcount for the counter:
df = df.sort_values(['x_closest_autobahn','order_numbers'])
df['order_number_autobahn_x'] = df.groupby('x_closest_autobahn').cumcount().add(1)
print (df)
order_numbers x_closest_autobahn order_number_autobahn_x
2 5 3 1
1 11 3 2
0 34 3 3
4 2 12 1
3 8 12 2
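If the rows should stay in their original order, a per-group rank is one possible alternative to sorting plus cumcount. This is a sketch of a different technique than the answer above, using the question's data; method="first" breaks ties by row order, which is an assumption about how ties should be handled.

```python
import pandas as pd

df = pd.DataFrame({"order_numbers": [34, 11, 5, 8, 2],
                   "x_closest_autobahn": [3, 3, 3, 12, 12]})

# Rank order_numbers within each autobahn group, smallest first.
df["order_number_autobahn_x"] = (
    df.groupby("x_closest_autobahn")["order_numbers"]
      .rank(method="first")
      .astype(int)
)
print(df)
```

The numbering per autobahn should match the cumcount result; only the row order of the frame differs.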

Merge two Series according to their index

After a long time of googling, I have not found a solution to my (probably often asked) problem.
I have two Dataframes:
DF1: DF2:
val val
index index
1 3 2 5
3 10 4 15
5 20 7 35
6 30 8 40
and need an output like this:
DF_out:
val
index
1 3
2 5
3 10
4 15
5 20
6 30
7 35
8 40
DF1 and DF2 should be combined and sorted according to their indices.
Side notes:
DF1 and DF2 never have the same index twice
The values of the dataframes are always sequel
I would very much appreciate your help!
Use concat with DataFrame.sort_index:
df = pd.concat([DF1, DF2]).sort_index()
print (df)
val
index
1 3
2 5
3 10
4 15
5 20
6 30
7 35
8 40
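A self-contained version with the frames from the question. Passing verify_integrity=True is an optional extra not in the answer: it makes pd.concat raise a ValueError if the two indexes overlap, which checks the side note that DF1 and DF2 never share an index value.

```python
import pandas as pd

DF1 = pd.DataFrame({"val": [3, 10, 20, 30]},
                   index=pd.Index([1, 3, 5, 6], name="index"))
DF2 = pd.DataFrame({"val": [5, 15, 35, 40]},
                   index=pd.Index([2, 4, 7, 8], name="index"))

# Stack the rows, fail loudly on duplicate index values, then order by index.
DF_out = pd.concat([DF1, DF2], verify_integrity=True).sort_index()
print(DF_out)
```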

How to write the function that transforms the columns of my dataframe to a single column?

I have a dataframe like this:
A = ID Material1 Materia2 Material3
14 0 0 0
24 1 0 0
12 1 1 0
25 0 0 2
I want to have all information in one column like this:
A = ID Materials
14 Nan
24 Material1
12 Material1
12 Material2
25 Material3
25 Material3
Can anyone help me write a function, please?
Use DataFrame.melt, then repeat the rows by their counts with Index.repeat and DataFrame.loc:
df1 = df.melt('ID', var_name='Materials')
df1 = df1.loc[df1.index.repeat(df1['value'])].drop('value', axis=1).reset_index(drop=True)
print (df1)
ID Materials
0 24 Material1
1 12 Material1
2 12 Materia2
3 25 Material3
4 25 Material3
EDIT: To also keep the IDs whose materials are all 0, as rows with missing values, use DataFrame.merge with a left join on a one-column DataFrame of the original df['ID'], with duplicates removed by DataFrame.drop_duplicates:
df1 = df.melt('ID', var_name='Materials')
df0 = df[['ID']].drop_duplicates()
print (df0)
ID
0 14
1 24
2 12
3 25
df2 = df1.loc[df1.index.repeat(df1['value'])].drop('value', axis=1).reset_index(drop=True)
df2 = df0.merge(df2, on='ID', how='left')
print (df2)
ID Materials
0 14 NaN
1 24 Material1
2 12 Material1
3 12 Materia2
4 25 Material3
5 25 Material3
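Since the question asks for a function, the steps above could be wrapped roughly like this. The name expand_materials is hypothetical, and the column is spelled Materia2 exactly as in the question's data.

```python
import pandas as pd

df = pd.DataFrame({"ID": [14, 24, 12, 25],
                   "Material1": [0, 1, 1, 0],
                   "Materia2": [0, 0, 1, 0],   # spelled as in the question
                   "Material3": [0, 0, 0, 2]})

def expand_materials(frame):
    """Hypothetical helper: one row per unit of material, NaN for IDs with none."""
    # Wide to long: one row per (ID, material column) with its count in "value".
    long = frame.melt(id_vars="ID", var_name="Materials")
    # Repeat each row "value" times; rows with a count of 0 disappear.
    repeated = long.loc[long.index.repeat(long["value"])].drop(columns="value")
    # Left join on the original IDs so IDs without any material are kept.
    ids = frame[["ID"]].drop_duplicates()
    return ids.merge(repeated, on="ID", how="left")

out = expand_materials(df)
print(out)
```

The left merge keeps the IDs in their original order, so ID 14 comes first with a missing value in Materials.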

Concatenating two Pandas DataFrames while maintaining index order

Basic question - I am trying to concatenate two DataFrames, with the resulting DataFrame preserving the index in order of the original two. For example:
df = pd.DataFrame({'Houses':[10,20,30,40,50], 'Cities':[3,4,7,6,1]}, index = [1,2,4,6,8])
df2 = pd.DataFrame({'Houses':[15,25,35,45,55], 'Cities':[1,8,11,14,4]}, index = [0,3,5,7,9])
Using pd.concat([df, df2]) simply appends df2 to the end of df. I am trying to instead concatenate them so that the index comes out in the correct order (0 through 9).
Use concat with the sort parameter to avoid a warning, and then DataFrame.sort_index:
df = pd.concat([df, df2], sort=False).sort_index()
print(df)
Cities Houses
0 1 15
1 3 10
2 4 20
3 8 25
4 7 30
5 11 35
6 6 40
7 14 45
8 1 50
9 4 55
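A quick way to check the result, using the frames from the question. The sort argument of pd.concat is left at its default here, on the assumption that both frames have the same columns in the same order, so only the row index needs sorting.

```python
import pandas as pd

df = pd.DataFrame({"Houses": [10, 20, 30, 40, 50], "Cities": [3, 4, 7, 6, 1]},
                  index=[1, 2, 4, 6, 8])
df2 = pd.DataFrame({"Houses": [15, 25, 35, 45, 55], "Cities": [1, 8, 11, 14, 4]},
                   index=[0, 3, 5, 7, 9])

# Stack the rows of both frames, then order them by index label.
combined = pd.concat([df, df2]).sort_index()

print(combined.index.tolist())
print(combined.index.is_monotonic_increasing)
```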
Try using:
print(df.T.join(df2.T).T.sort_index())
Output:
Cities Houses
0 1 15
1 3 10
2 4 20
3 8 25
4 7 30
5 11 35
6 6 40
7 14 45
8 1 50
9 4 55
