Pandas arithmetic between two different-sized dataframes given common columns - python

DF 1
| ColA | Colb | Stock | Date |
| -------- | -------------- | -------- | ---------- |
| A | 1 | 3 | 2022-26-12 |
| B | 2 | 3 | 2022-26-12 |
| C | 3 | 3 | 2022-26-12 |
DF 2
| ColA | Colb | Sales | Date |
| -------- | -------------- | -------- | ---------- |
| A | 1 | 1 | 2022-26-12 |
| B | 2 | 1 | 2022-26-12 |
| C | 3 | 1 | 2022-26-12 |
Given any number of columns to join on, how do you do DataFrame arithmetic in pandas? For instance, suppose I wanted to subtract the two DataFrames above to get something like this:
STOCK AT END OF THE DAY
| ColA | Colb | Stock | Date |
| -------- | -------------- | -------- | ---------- |
| A | 1 | 2 | 2022-26-12 |
| B | 2 | 2 | 2022-26-12 |
| C | 3 | 2 | 2022-26-12 |
So Stock - Sales, joined on all the common columns (ColA, Colb and Date in this case).
Edit:
The equivalent SQL code to my problem is:
SELECT
    DF1.ColA,
    DF1.Colb,
    DF1.Date,
    DF1.Stock - COALESCE(DF2.Sales, 0)
FROM DF1
LEFT JOIN DF2
    ON DF1.ColA = DF2.ColA
    AND DF1.Colb = DF2.Colb
    AND DF1.Date = DF2.Date

If the two DataFrames have the same number of rows, aligned in the same order, you can do something like this:
df3 = df1[['ColA', 'Colb', 'Date']].copy()
df3['Stock'] = df1.Stock - df2.Sales
However, if they differ, merge them first and then do the arithmetic:
df3 = pd.merge(df1, df2, on='ColA', how='inner')
df3['Stock'] = df3.Stock - df3.Sales
In your case, based on your edited question:
# left join on all the common key columns (same as the SQL LEFT JOIN)
df3 = pd.merge(df1, df2, how='left', on=['ColA', 'Colb', 'Date'])
# unmatched rows get NaN Sales; treat them as 0, like COALESCE(DF2.Sales, 0)
df3['Sales'] = df3['Sales'].fillna(0)
# then do your subtraction
df3['Stock'] = df3['Stock'] - df3['Sales']
# keep only the columns you want
df3 = df3[['ColA', 'Colb', 'Stock', 'Date']]
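For reference, a self-contained run of the merge-and-subtract flow on the sample data from the question (a sketch; the Date values are kept as the literal strings shown above):

import pandas as pd

df1 = pd.DataFrame({'ColA': ['A', 'B', 'C'], 'Colb': [1, 2, 3],
                    'Stock': [3, 3, 3], 'Date': ['2022-26-12'] * 3})
df2 = pd.DataFrame({'ColA': ['A', 'B', 'C'], 'Colb': [1, 2, 3],
                    'Sales': [1, 1, 1], 'Date': ['2022-26-12'] * 3})

# left join, coalesce missing Sales to 0, then subtract
df3 = df1.merge(df2, how='left', on=['ColA', 'Colb', 'Date'])
df3['Stock'] = df3['Stock'] - df3['Sales'].fillna(0)
print(df3[['ColA', 'Colb', 'Stock', 'Date']])
# expected: Stock is 2 for each of A, B and C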

Related

Map IDs in one dataframe to the corresponding names in another dataframe [duplicate]

I have below 2 dataframes:
df_1:
| | assign_to_id |
| | ------------ |
| 0 | 1, 2 |
| 1 | 2 |
| 2 | 3,4,5 |
df_2:
| | id | name |
| | ------------| -----------|
| 0 | 1 | John |
| 1 | 2 | Adam |
| 2 | 3 | Max |
| 3 | 4 | Martha |
| 4 | 5 | Robert |
I want to map the IDs in df_1 to the names in df_2 by matching their IDs:
final_df:
| | assign_to_name |
| | ----------------- |
| 0 | John, Adam |
| 1 | Adam |
| 2 | Max,Martha,Robert |
I don't know how to achieve this. Looking forward to some help.
The idea is to split the column on ',', map each piece through a dictionary, and then join back with ',':
d = df_2.assign(id=df_2['id'].astype(str)).set_index('id')['name'].to_dict()
f = lambda x: ','.join(d[y] for y in x.split(',') if y in d)
df_1['assign_to_name'] = df_1['assign_to_id'].replace(r'\s+', '', regex=True).apply(f)
print(df_1)
assign_to_id assign_to_name
0 1, 2 John,Adam
1 2 Adam
2 3,4,5 Max,Martha,Robert
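An alternative sketch of the same mapping using explode plus map instead of a lambda (a variation added here, assuming the same df_1 and df_2 as above and that every id appears in df_2):

# one id per row, original row index preserved
s = df_1['assign_to_id'].str.split(r'\s*,\s*').explode()
# look up each id's name in df_2
names = s.map(df_2.astype({'id': str}).set_index('id')['name'])
# re-join the names per original row
df_1['assign_to_name'] = names.groupby(level=0).agg(','.join)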

Pandas merge and keep only non-matching records [duplicate]

How can I merge/join these two dataframes ONLY on "id", producing 3 new dataframes using pandas in Python?
1) R1 = merged records
2) R2 = (DF1 - merged records)
3) R3 = (DF2 - merged records)
First dataframe (DF1)
| id | name |
|-----------|-------|
| 1 | Mark |
| 2 | Dart |
| 3 | Julia |
| 4 | Oolia |
| 5 | Talia |
Second dataframe (DF2)
| id | salary |
|-----------|--------|
| 1 | 20 |
| 2 | 30 |
| 3 | 40 |
| 4 | 50 |
| 6 | 33 |
| 7 | 23 |
| 8 | 24 |
| 9 | 28 |
My solution for R1:
R1 = pd.merge(DF1, DF2, on='id', how='inner')
I am unsure whether that is the easiest way to get R2 and R3.
R2 should look like
| id | name |
|-----------|-------|
| 5 | Talia |
R3 should look like:
| id | salary |
|-----------|--------|
| 6 | 33 |
| 7 | 23 |
| 8 | 24 |
| 9 | 28 |
You can turn on the indicator parameter in merge and filter on the resulting _merge values:
total_merge = df1.merge(df2, on='id', how='outer', indicator=True)
R1 = total_merge[total_merge['_merge']=='both']
R2 = total_merge[total_merge['_merge']=='left_only']
R3 = total_merge[total_merge['_merge']=='right_only']
Update: Ben's suggestion would be something like this:
dfs = {k: v for k, v in total_merge.groupby('_merge')}
and then you can do, for examples:
dfs['both']
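One caveat with the outer-merge approach: R2 and R3 still carry the other frame's columns filled with NaN. A small sketch of trimming them back to the original shapes (column lists taken from the question's DF1 and DF2):

R1 = total_merge.loc[total_merge['_merge'] == 'both', ['id', 'name', 'salary']]
R2 = total_merge.loc[total_merge['_merge'] == 'left_only', ['id', 'name']]
R3 = total_merge.loc[total_merge['_merge'] == 'right_only', ['id', 'salary']]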

Intersect two dataframes in Pandas with respect to first dataframe?

I want to intersect two Pandas dataframes (1 and 2) based on two columns (A and B) present in both dataframes. However, I would like the result to contain only the first dataframe's columns and rows, omitting any row whose (A, B) pair is not found in the second dataframe.
So for example:
Dataframe 1:
A | B | Extra | Columns | In | 1 |
----------------------------------
1 | 2 | Extra | Columns | In | 1 |
1 | 3 | Extra | Columns | In | 1 |
1 | 5 | Extra | Columns | In | 1 |
Dataframe 2:
A | B | Extra | Columns | In | 2 |
----------------------------------
1 | 3 | Extra | Columns | In | 2 |
1 | 4 | Extra | Columns | In | 2 |
1 | 5 | Extra | Columns | In | 2 |
should return:
A | B | Extra | Columns | In | 1 |
----------------------------------
1 | 3 | Extra | Columns | In | 1 |
1 | 5 | Extra | Columns | In | 1 |
Is there a way I can do this simply?
You can use df.merge:
df = df1.merge(df2, on=['A','B'], how='inner').drop('2', axis=1)
how='inner' is the default; it is spelled out here only to make the merge behavior explicit. Note that any non-key column names shared by both frames will come back with _x/_y suffixes.
As #piRSquared suggested, a cleaner way is:
df1.merge(df2[['A', 'B']], how='inner')
This selects only the key columns from df2, so the result keeps exactly the first dataframe's columns.
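A minimal runnable sketch, with hypothetical simplified column names standing in for the extra columns:

import pandas as pd

df1 = pd.DataFrame({'A': [1, 1, 1], 'B': [2, 3, 5], 'extra1': ['x', 'y', 'z']})
df2 = pd.DataFrame({'A': [1, 1, 1], 'B': [3, 4, 5], 'extra2': ['p', 'q', 'r']})

# keep only df1 rows whose (A, B) pair also appears in df2;
# with no 'on' given, merge uses the common columns A and B
result = df1.merge(df2[['A', 'B']], how='inner')
print(result)
# expected rows: (1, 3, 'y') and (1, 5, 'z')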

Dataframe add column: with count of rows where condition

I have got a dataframe (df) in python with 2 columns: ID and Date.
| ID | Date |
| ------------- |:-------------:|
| 1 | 06-14-2019 |
| 1 | 06-10-2019 |
| 2 | 06-16-2019 |
| 3 | 06-12-2019 |
| 3 | 06-12-2019 |
I'm trying to add a column to the dataframe which contains the count of rows where ID matches ID of the current row and Date <= Date of the current row.
Like the following:
| ID | Date | Count |
| ------------- |:-------------:|:-------------:|
| 1 | 06-14-2019 | 2 |
| 1 | 06-10-2019 | 1 |
| 2 | 06-16-2019 | 1 |
| 3 | 06-12-2019 | 2 |
| 3 | 06-12-2019 | 2 |
I have tried something like:
grouped = df.groupby(['ID'])
df['count'] = df.apply(lambda row: grouped.get_group[row['ID']][grouped.get_group(row['ID'])['Date'] < row['Date']]['ID'].size, axis=1)
This results in the following error:
TypeError: ("'method' object is not subscriptable", 'occurred at index 0')
Suggestions are welcome.
I forgot to mention: my real dataframe contains almost 4 million rows, so I'm looking for a smart and fast solution that won't take too long to run.
Using df.iterrows():
df['Count'] = None
for idx, value in df.iterrows():
    # count rows sharing this ID whose Date is on or before this row's Date
    df.iloc[idx, -1] = len(df[(df.ID == value['ID']) & (df.Date <= value['Date'])])
Output:
+---+----+------------+-------+
| | ID | Date | Count |
+---+----+------------+-------+
| 0 | 1 | 06-14-2019 | 2 |
| 1 | 1 | 06-10-2019 | 1 |
| 2 | 2 | 06-16-2019 | 1 |
| 3 | 3 | 06-12-2019 | 2 |
| 4 | 3 | 06-12-2019 | 2 |
+---+----+------------+-------+
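Given the almost 4 million rows mentioned in the question, iterating row by row will be slow. A vectorized sketch of the same count (a suggestion added here, not part of the original answer): within each ID group, the 'max' rank of a Date equals the number of rows whose Date is less than or equal to it.

# parse Date so it compares chronologically rather than as a string
df['Date'] = pd.to_datetime(df['Date'], format='%m-%d-%Y')
# 'max' rank = count of rows in the group with Date <= this row's Date
df['Count'] = df.groupby('ID')['Date'].rank(method='max').astype(int)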

How to merge rows with same string, but sum up the rows connected

I have the following DataFrame
| name | number |
|------|--------|
| a | 1 |
| a | 1 |
| a | 1 |
| b | 2 |
| b | 2 |
| b | 2 |
| c | 3 |
| c | 3 |
| c | 3 |
| d | 4 |
| d | 4 |
| d | 4 |
I wish to merge all the rows with the same name, summing their number values so each name keeps its total.
Desired output:
| name | number |
|------|--------|
| a | 3 |
| b | 6 |
| c | 9 |
| d | 12 |
It seems you need groupby and aggregate sum:
df = df.groupby('name', as_index=False)['number'].sum()
#or
#df = df.groupby('name')['number'].sum().reset_index()
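For the sample above, a quick self-contained check (a sketch):

import pandas as pd

df = pd.DataFrame({'name': list('aaabbbcccddd'),
                   'number': [1] * 3 + [2] * 3 + [3] * 3 + [4] * 3})
print(df.groupby('name', as_index=False)['number'].sum())
# expected: a -> 3, b -> 6, c -> 9, d -> 12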
Assuming DataFrame is your table name, the SQL equivalent would be:
SELECT name, SUM(number) AS number FROM DataFrame GROUP BY name
Insert the result after deleting the original rows.
