How to add values from one column to another table using join? - python

I am having difficulties merging two tables. I would like to add a column from Table B into Table A based on one key.
Table A (632 rows) contains the following columns:
part_number / part_designation / AC / AC_program
Table B (4,674 rows) contains the following columns:
part_ref / supplier_id / supplier_name / ac_program
I would like to add the supplier_name values into Table A
I have succeeded in performing a left join based on the condition tableA.part_number == tableB.part_ref.
However, when I look at the resulting table, additional rows were created: I now have 683 rows instead of the initial 632 rows of Table A. How do I keep the same number of rows while including the supplier_name values in Table A? Below is a graph of my transformations:
Here is my code:

Table B seems to contain duplicate part_ref values. The join operation creates an extra record in your original table for every duplicate match in Table B. You can check this yourself:
import pandas as pd

# if these two numbers differ, part_ref is not unique and the join will add rows
print(len(pd.unique(updated_ref_table.part_ref)))
print(updated_ref_table.shape[0])
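If one supplier per part is enough, a minimal fix is to drop the duplicate part_ref rows from Table B before the left join. A sketch (tableA and tableB stand in for your actual DataFrame names; drop_duplicates keeps the first supplier listed for each part by default):

suppliers = tableB.drop_duplicates(subset='part_ref')[['part_ref', 'supplier_name']]

# left join keeps exactly one row per row of Table A
merged = tableA.merge(suppliers, how='left',
                      left_on='part_number', right_on='part_ref').drop(columns='part_ref')

print(merged.shape[0])  # still 632 rows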

Related

Joining portions of a python dictionary using a reference dataframe

I have a dictionary of dataframes with keys that look like this. It's called frames1.
dict_keys(['TableA','TableB','TableC','TableD'])
I also have a 'master' dataframe that tells me how to join these dataframes.
Gold Table   Silver Table 1   Silver Table 2   Join Type   Left_Attr   Right_Attr
System       Table A          Table B          left        ID          applic_id
System       Table C          Table A          right       fam         famid
System       Table A          Table D          left        NameID      name
The "System" gold table is the combination of all 3 rows. In other words, I need to join Table A to Table B on the attributes listed and then use that output as my NEW Table A when I join Table C and Table A in row 2. Then I need to use that table to as my NEW Table A to join to Table D. This creates the final "System" Table.
What I've tried:
for i in range(len(master)):
    System = pd.merge(frames1[master.iloc[i, 1]], frames1[master.iloc[i, 2]],
                      how=master.iloc[i, 3],
                      left_on=master.iloc[i, 4], right_on=master.iloc[i, 5])
This only handles one master row at a time, and each pass overwrites the previous result as it goes on. How would I go about creating a for loop to join these together?
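One way is to feed each join's output back into frames1 as the new "Table A" before the next pass. A sketch, assuming the master columns are ordered as shown above (Gold Table, Silver Table 1, Silver Table 2, Join Type, Left_Attr, Right_Attr) and that the spaces in the table names simply need to be stripped to match the dict keys:

import pandas as pd

System = None
for i in range(len(master)):
    # 'Table A' in the master -> 'TableA' key in frames1 (assumed naming mismatch)
    left_key = master.iloc[i, 1].replace(' ', '')
    right_key = master.iloc[i, 2].replace(' ', '')
    System = pd.merge(frames1[left_key], frames1[right_key],
                      how=master.iloc[i, 3],
                      left_on=master.iloc[i, 4],
                      right_on=master.iloc[i, 5])
    # the result becomes the new Table A for the next master row
    frames1['TableA'] = System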

Pandas convert data from two tables into third table. Cross Referencing and converting unique rows to columns

I have the following tables:
Table A
listData = {'id': [1, 2, 3], 'date': ['06-05-2021', '07-05-2021', '17-05-2021']}
tableA = pd.DataFrame(listData, columns=['id', 'date'])
Table B
detailData = {'code': ['D123', 'F268', 'A291', 'D123', 'F268', 'A291'], 'id': ['1', '1', '1', '2', '2', '2'], 'stock': [5, 5, 2, 10, 11, 8]}
tableB = pd.DataFrame(detailData, columns=['code', 'id', 'stock'])
OUTPUT TABLE
output = {'code':['D123','F268','A291'],'06-05-2021':[5,5,2],'07-05-2021':[10,11,8]}
pd.DataFrame(output,columns=['code','06-05-2021','07-05-2021'])
Note: the code above is hard-coded to produce the output. I need to generate the output table from Table A and Table B.
Here is a brief explanation of how the output table is generated, in case it is not self-explanatory.
The id column in Table B needs to be cross-referenced against Table A so that each row of Table B gets its date instead of the id.
Then all the unique dates should be turned into columns, and the corresponding stock values moved into the newly created date columns.
I am not sure where to start. I am new to pandas and have only ever used it for simple data manipulation. If anyone can suggest where to get started, it would be of great help.
Try:
tableA['id'] = tableA['id'].astype(str)
tableB.merge(tableA, on='id').pivot(index='code', columns='date', values='stock')
Output:
date  06-05-2021  07-05-2021
code
A291           2           8
D123           5          10
F268           5          11
Details:
First, merge on id; this is like doing a SQL join. The dtypes must match, hence casting id to str with astype.
Next, reshape the dataframe using pivot to get code by date.
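If Table B could ever contain two stock values for the same code and date, pivot will raise a ValueError on the duplicate entries; pivot_table aggregates them instead (here summing, as a sketch):

# same merge, but aggregate duplicate (code, date) pairs instead of failing
tableB.merge(tableA, on='id').pivot_table(index='code', columns='date',
                                          values='stock', aggfunc='sum')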

How to consider foreign key relations in (pandas) pivot_tables?

With PowerPivot in Excel I am able to
a) create a data model that consists of several connected tables (see example below)
b) create a pivot table based on that model
For display, the pivot table does not use id (=integer) values but the corresponding string values as row/column headers.
With pandas I could
a) load and join those related tables
b) create a pivot table based on the joined table
pivot_table = pandas.pivot_table(
    joined_table,
    index=["scenario_name"],   # entries to show as row headers
    columns="param_name",      # entries to show as column headers
    values="value",            # entries to aggregate and show as cells
    aggfunc=numpy.sum,         # aggregation function(s)
)
However, with huge tables I would expect it to be more efficient if pivot_table operated on the non-joined data table and applied the string values only for result display.
=> Is there a convenient way to consider foreign key relations when using pandas DataFrame and pivot_table?
I would expect something like
pivot_table = pandas.pivot_table(
    {"data": data_table,
     "scenario": scenario_table,
     "param": param_table},
    index=["scenario:name"],   # entries to show as row headers
    columns="param:name",      # entries to show as column headers
    values="data:value",       # entries to aggregate and show as cells
    aggfunc=numpy.sum,         # aggregation function(s)
)
=> If not, are there alternative libraries to pandas that could handle related tables as a source for pivot tables?
Small example table structure:
table "data"
id scenario_id param_id value
1 1 1 100
2 1 2 200
table "scenario"
id name
1 reference
2 best_case
table "param"
id name
1 solar
2 wind
scenario_id of table data points to id of table scenario
param_id of table data points to id of table param
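One workaround that avoids joining the full tables is to map only the id columns to their names right before pivoting. A sketch against the small example above, reusing the data_table, scenario_table and param_table names from the question:

import pandas
import numpy

# id -> name lookup Series built from the dimension tables
scenario_names = scenario_table.set_index("id")["name"]
param_names = param_table.set_index("id")["name"]

pivot_table = pandas.pivot_table(
    data_table.assign(
        scenario_name=data_table["scenario_id"].map(scenario_names),
        param_name=data_table["param_id"].map(param_names),
    ),
    index=["scenario_name"],
    columns="param_name",
    values="value",
    aggfunc=numpy.sum,
)

This still materializes two name columns, but only as lookups on the fact table rather than a full merge of every column.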

Pandas -- set row values based on values in another table

I have two tables in pandas. One is about 10,000+ rows that looks like this:
Table 1
col_1 date state ratio [50 more cols]
A 10/12 NY .5
A 12/05 MA NaN
.........
I have another table that's about 10 rows that looks like this:
Table 2
date state ratio
12/05 MA .9
12/03 MA .8
............
I need to set the ratio in table 1 based on the date and state values from table 2. The ideal solution would be to merge on date and state, but that creates two columns: ratio_x and ratio_y
I need a way to set the ratio in table 1 to the corresponding ratio in table 2 where the date and states both match. The ratios in table 1 can be overwritten.
If this can be done correctly by merging then that works too.
Edit: You can consider table 2 as being meant to map to specific state values (so all the states in table 2 are MA in this example)
You'll need to choose which ratio value to take first. Assuming you want ratios from table 2 to take precedence:
# join in ratio from the other table
table1 = table1.join(table2.set_index(["date", "state"])["ratio"].to_frame("ratio2"), on=["date", "state"])
# take ratio2 first, then the existing ratio value if ratio2 is null
table1["ratio"] = table1["ratio2"].fillna(table1["ratio"])
# delete the ratio2 column
del table1["ratio2"]
First create a mapping series from df2:
s = df2.set_index(['date', 'state'])['ratio']
Then feed to df1:
# look up each (date, state) pair of df1, keeping the existing ratio where there is no match
df1['ratio'] = (df1.set_index(['date', 'state']).index.map(s.get)
                   .to_series(index=df1.index)
                   .fillna(df1['ratio']))
Precedence is given to ratios in df2.

Create or populate a table in python based on other tables

I am new to python/pandas and have a fairly basic question.
I have 2 tables with numerous columns and "ID" as the primary key.
I want to create another table with conditions based on the 2 tables.
For example: Table A, Table B --> Table C
In SQL I would write something like this:
create table TableC as select
a.ID,
case when b.Field1=1000 and a.Field1=50 then 20 else 0 end as FieldA,
case when b.Field2=15 and a.Field2=100 then 100 else 0 end as FieldB
from TableA a, TableB b
where a.ID=b.ID
order by 1
I am struggling to put together a similar Table C using Python.
I have tried to make a function, but I can't seem to include more than one table in the function, nor create a new table based on multiple tables.
Any help will be much appreciated.
IIUC
import numpy as np

# after the merge, TableA's columns get the _x suffix and TableB's get _y,
# so b.Field1 -> Field1_y and a.Field1 -> Field1_x, matching the SQL conditions
TableC = TableA.merge(TableB, on='ID')
TableC['FieldA'] = np.where((TableC.Field1_y == 1000) & (TableC.Field1_x == 50), 20, 0)
TableC['FieldB'] = np.where((TableC.Field2_y == 15) & (TableC.Field2_x == 100), 100, 0)
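If you also want the SQL's column selection and order by 1, a small follow-up (the merge keeps the join key as a single ID column, so sorting on it mirrors order by 1):

# keep only the output columns and sort by ID, as the SQL does
TableC = TableC[['ID', 'FieldA', 'FieldB']].sort_values('ID').reset_index(drop=True)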
