I am having difficulty merging 2 tables: I would like to add a column from Table B to Table A based on a single key.
Table A (632 rows) contains the following columns:
part_number / part_designation / AC / AC_program
Table B (4,674 rows) contains the following columns:
part_ref / supplier_id / supplier_name / ac_program
I would like to add the supplier_name values into Table A
I have succeeded in building a left join on the condition tableA.part_number == tableB.part_ref
However, when I look at the resulting table, additional rows were created: I now have 683 rows instead of the initial 632 rows of Table A. How do I keep the same number of rows while still bringing the supplier_name values into Table A?
Table B seems to contain duplicate part_ref values. A left join creates a new row in the result for every matching row in Table B, so each duplicated key multiplies rows in your original table. You can check for duplicates like this:
import pandas as pd

# If these two numbers differ, part_ref contains duplicates
print(len(pd.unique(updated_ref_table.part_ref)))
print(updated_ref_table.shape[0])
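To keep exactly 632 rows, drop the duplicate keys in Table B before merging. A minimal sketch, assuming tableA and tableB hold the two tables described above (the variable names are illustrative):

import pandas as pd

# Keep a single supplier row per part_ref so the left join cannot
# multiply rows in Table A (which duplicate to keep is a business choice)
tableB_unique = tableB.drop_duplicates(subset='part_ref')

merged = tableA.merge(tableB_unique[['part_ref', 'supplier_name']],
                      how='left',
                      left_on='part_number',
                      right_on='part_ref').drop(columns='part_ref')

# merged now has the same 632 rows as tableA, plus supplier_name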
I have a dictionary of dataframes with keys that look like this. It's called frames1.
dict_keys(['TableA','TableB','TableC','TableD'])
I also have a 'master' dataframe that tells me how to join these dataframes.
Gold Table | Silver Table 1 | Silver Table 2 | Join Type | Left_Attr | Right_Attr
System     | Table A        | Table B        | left      | ID        | applic_id
System     | Table C        | Table A        | right     | fam       | famid
System     | Table A        | Table D        | left      | NameID    | name
The "System" gold table is the combination of all 3 rows. In other words, I need to join Table A to Table B on the attributes listed and then use that output as my NEW Table A when I join Table C and Table A in row 2. Then I need to use that table to as my NEW Table A to join to Table D. This creates the final "System" Table.
What I've tried:
for i in range(len(master)):
    System = pd.merge(frames1[master.iloc[i, 1]], frames1[master.iloc[i, 2]],
                      how=master.iloc[i, 3],
                      left_on=master.iloc[i, 4], right_on=master.iloc[i, 5])
This only performs one two-table merge per iteration, and each iteration overwrites System instead of building on the previous result. How would I go about writing a for loop that chains these joins together?
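A minimal sketch of one way to do it: carry the running result forward in a variable and substitute it wherever "Table A" reappears in a later master row. This assumes the master columns are ordered as in the example above, and that the space in names like 'Table A' is the only mismatch with frames1's keys:

import pandas as pd

# The master rows say 'Table A' while frames1's keys look like 'TableA';
# strip spaces when looking tables up (an assumption on my part)
def lookup(name):
    return frames1[name.replace(' ', '')]

result = None
for i in range(len(master)):
    left_name, right_name = master.iloc[i, 1], master.iloc[i, 2]
    # After the first join, the running result stands in for Table A
    # wherever it reappears in a later row
    left = result if result is not None and left_name == 'Table A' else lookup(left_name)
    right = result if result is not None and right_name == 'Table A' else lookup(right_name)
    result = pd.merge(left, right,
                      how=master.iloc[i, 3],        # 'left' or 'right'
                      left_on=master.iloc[i, 4],    # e.g. 'ID'
                      right_on=master.iloc[i, 5])   # e.g. 'applic_id'

System = result  # the final "System" gold table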
I have the following tables:
Table A
listData = {'id':[1,2,3],'date':['06-05-2021','07-05-2021','17-05-2021']}
pd.DataFrame(listData,columns=['id','date'])
Table B
detailData = {'code':['D123','F268','A291','D123','F268','A291'],'id':['1','1','1','2','2','2'],'stock':[5,5,2,10,11,8]}
pd.DataFrame(detailData,columns=['code','id','stock'])
OUTPUT TABLE
output = {'code':['D123','F268','A291'],'06-05-2021':[5,5,2],'07-05-2021':[10,11,8]}
pd.DataFrame(output,columns=['code','06-05-2021','07-05-2021'])
Note: The code above is hard-coded for the output. I need to generate the output table from Table A and Table B.
Here is a brief explanation of how the output table is generated, in case it is not self-explanatory.
The id column needs to be cross-referenced from Table A to Table B, replacing each id in Table B with its date from Table A.
Then all the unique dates in Table B should be made into columns, and the corresponding stock values need to be moved into the newly created date columns.
I am not sure where to start. I am new to pandas and have only ever used it for simple data manipulation. If anyone can suggest where to get started, it would be of great help.
Try:
tableA['id'] = tableA['id'].astype(str)
tableB.merge(tableA, on='id').pivot(index='code', columns='date', values='stock')
Output:
date 06-05-2021 07-05-2021
code
A291 2 8
D123 5 10
F268 5 11
Details:
First, merge on id; this is like doing a SQL join. The dtypes must match for the merge, hence the astype conversion to str.
Next, reshape the dataframe using pivot to get code by date.
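For reference, a self-contained version of the above, using the data from the question (note that pivot requires keyword arguments in pandas >= 2.0):

import pandas as pd

listData = {'id': [1, 2, 3], 'date': ['06-05-2021', '07-05-2021', '17-05-2021']}
tableA = pd.DataFrame(listData, columns=['id', 'date'])

detailData = {'code': ['D123', 'F268', 'A291', 'D123', 'F268', 'A291'],
              'id': ['1', '1', '1', '2', '2', '2'],
              'stock': [5, 5, 2, 10, 11, 8]}
tableB = pd.DataFrame(detailData, columns=['code', 'id', 'stock'])

# Align dtypes before merging: tableA has integer ids, tableB strings
tableA['id'] = tableA['id'].astype(str)

# Merge attaches the date to each stock row; pivot reshapes code-by-date
print(tableB.merge(tableA, on='id').pivot(index='code', columns='date', values='stock'))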
With PowerPivot in Excel I am able to
a) create a data model that consists of several connected tables (see example below)
b) create a pivot table based on that model
For display, the pivot table does not use id (=integer) values but the corresponding string values as row/column headers.
With pandas I could
a) load and join those related tables
b) create a pivot table based on the joined table
pivot_table = pandas.pivot_table(
    joined_table,
    index=["scenario_name"],  # entries to show as row headers
    columns='param_name',     # entries to show as column headers
    values='value',           # entries to aggregate and show as cells
    aggfunc=numpy.sum,        # aggregation function(s)
)
However, with huge tables I would expect it to be more efficient if pivot_table operated on the non-joined data table and applied the string values only when displaying the result.
=> Is there a convenient way to take foreign-key relations into account when using pandas DataFrame and pivot_table?
I would expect something like
pivot_table = pandas.pivot_table(
    {"data": data_table,
     "scenario": scenario_table,
     "param": param_table},
    index=["scenario:name"],  # entries to show as row headers
    columns="param:name",     # entries to show as column headers
    values="data:value",      # entries to aggregate and show as cells
    aggfunc=numpy.sum,        # aggregation function(s)
)
=> If not, are there some alternatives libraries to pandas that could handle related tables as source for pivot tables?
Small example table structure:
table "data"
id scenario_id param_id value
1 1 1 100
2 1 2 200
table "scenario"
id name
1 reference
2 best_case
table "param"
id name
1 solar
2 wind
scenario_id in table data points to id in table scenario
param_id in table data points to id in table param
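Pandas has no built-in notion of foreign keys, but you can approximate the "label only for display" idea by pivoting on the raw id columns and translating the axes afterwards. A minimal sketch, assuming data_table, scenario_table and param_table are DataFrames holding the example tables above:

import pandas as pd

# Aggregate on the integer keys only; the full data table is never joined
pivot = pd.pivot_table(data_table,
                       index="scenario_id",
                       columns="param_id",
                       values="value",
                       aggfunc="sum")

# Only now translate the id axes into display names from the lookup tables
pivot.index = pivot.index.map(scenario_table.set_index("id")["name"])
pivot.columns = pivot.columns.map(param_table.set_index("id")["name"])

This keeps the expensive aggregation on small integer columns and touches the lookup tables only for the much smaller result.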
Another example with some more columns:
I have two tables in pandas. One is about 10,000+ rows that looks like this:
Table 1
col_1 date state ratio [50 more cols]
A 10/12 NY .5
A 12/05 MA NaN
.........
I have another table that's about 10 rows that looks like this:
Table 2
date state ratio
12/05 MA .9
12/03 MA .8
............
I need to set the ratio in table 1 based on the date and state values from table 2. The ideal solution would be to merge on date and state, but that creates two columns: ratio_x and ratio_y.
I need a way to set the ratio in table 1 to the corresponding ratio in table 2 wherever the date and state both match. The ratios in table 1 can be overwritten.
If this can be done correctly by merging then that works too.
Edit: You can consider table 2 as being meant to map to specific state values (so all the states in table 2 are MA in this example)
You'll need to choose which ratio value to take first. Assuming you want ratios from table 2 to take precedence:
# join in ratio from the other table
table1 = table1.join(
    table2.set_index(["date", "state"])["ratio"].to_frame("ratio2"),
    on=["date", "state"],
)
# take ratio2 first, then the existing ratio value if ratio2 is null
table1["ratio"] = table1["ratio2"].fillna(table1["ratio"])
# delete the ratio2 column
del table1["ratio2"]
First create a mapping series from df2:
s = df2.set_index(['date', 'state'])['ratio']
Then feed to df1:
# Index.map returns an Index; wrap it in a Series aligned with df1
# so unmatched rows can fall back to the existing ratio
mapped = df1.set_index(['date', 'state']).index.map(s.get)
df1['ratio'] = pd.Series(mapped, index=df1.index).fillna(df1['ratio'])
Precedence is given to ratios in df2.
I am new to python/pandas and have a fairly basic question.
I have 2 tables with numerous columns and "ID" as the primary key.
I want to create another table with conditions based on the 2 tables.
For example: Table A, Table B --> Table C
In SQL I would write something like this:
create table TableC as select
a.ID,
case when b.Field1=1000 and a.Field1=50 then 20 else 0 end as FieldA,
case when b.Field2=15 and a.Field2=100 then 100 else 0 end as FieldB
from TableA a, TableB b
where a.ID=b.ID
order by 1
I am struggling to put together a similar Table C using Python.
I have tried writing a function, but I can't seem to include more than one table in the function, nor create a new table based on multiple tables.
Any help will be much appreciated.
IIUC
import numpy as np

# After the merge, _x columns come from TableA and _y columns from TableB,
# matching a.Field1/a.Field2 and b.Field1/b.Field2 in the SQL
TableC = TableA.merge(TableB, on='ID')
TableC['FieldA'] = np.where((TableC.Field1_y == 1000) & (TableC.Field1_x == 50), 20, 0)
TableC['FieldB'] = np.where((TableC.Field2_y == 15) & (TableC.Field2_x == 100), 100, 0)
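To mirror the SQL more closely (keep only ID, FieldA and FieldB, ordered by ID, as in the select above), you could finish with something like:

TableC = TableC[['ID', 'FieldA', 'FieldB']].sort_values('ID').reset_index(drop=True)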