Joining portions of a python dictionary using a reference dataframe - python

I have a dictionary of dataframes with keys that look like this. It's called frames1.
dict_keys(['TableA','TableB','TableC','TableD'])
I also have a 'master' dataframe that tells me how to join these dataframes.
Gold Table   Silver Table 1   Silver Table 2   Join Type   Left_Attr   Right_Attr
System       Table A          Table B          left        ID          applic_id
System       Table C          Table A          right       fam         famid
System       Table A          Table D          left        NameID      name
The "System" gold table is the combination of all three rows. In other words, I need to join Table A to Table B on the listed attributes, then use that output as my NEW Table A when I join Table C and Table A in row 2. Then I need to use that result as my NEW Table A to join to Table D. This creates the final "System" table.
What I've tried:
for i in range(len(master)):
    System = pd.merge(frames1[master.iloc[i, 1]], frames1[master.iloc[i, 2]],
                      how=master.iloc[i, 3],
                      left_on=master.iloc[i, 4], right_on=master.iloc[i, 5])
This performs each join in isolation, so every iteration overwrites the previous result instead of building on it. How would I write a loop that chains these joins together?
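One way to chain the joins is to keep an accumulator and substitute it for Table A wherever Table A appears in a row of the master table. The frames and master below are toy stand-ins with assumed column names and key spellings, since the real data isn't shown:

```python
import pandas as pd

# Toy stand-ins for the dictionary of dataframes and the master table;
# column names and values are illustrative assumptions.
frames1 = {
    'TableA': pd.DataFrame({'ID': [1, 2], 'famid': [10, 20], 'NameID': ['x', 'y']}),
    'TableB': pd.DataFrame({'applic_id': [1, 2], 'b_val': ['b1', 'b2']}),
    'TableC': pd.DataFrame({'fam': [10, 20], 'c_val': ['c1', 'c2']}),
    'TableD': pd.DataFrame({'name': ['x', 'y'], 'd_val': ['d1', 'd2']}),
}
master = pd.DataFrame({
    'Gold Table':     ['System', 'System', 'System'],
    'Silver Table 1': ['TableA', 'TableC', 'TableA'],
    'Silver Table 2': ['TableB', 'TableA', 'TableD'],
    'Join Type':      ['left', 'right', 'left'],
    'Left_Attr':      ['ID', 'fam', 'NameID'],
    'Right_Attr':     ['applic_id', 'famid', 'name'],
})

ACC_NAME = 'TableA'            # the accumulated result stands in for Table A
system = frames1[ACC_NAME]
for i in range(len(master)):
    left_name, right_name = master.iloc[i, 1], master.iloc[i, 2]
    # Substitute the running result wherever Table A appears in this row.
    left = system if left_name == ACC_NAME else frames1[left_name]
    right = system if right_name == ACC_NAME else frames1[right_name]
    system = pd.merge(left, right,
                      how=master.iloc[i, 3],
                      left_on=master.iloc[i, 4],
                      right_on=master.iloc[i, 5])
```

The key difference from the attempt above is that `system` feeds back into the next iteration rather than being rebuilt from the raw frames each time.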

Related

Checking if a string is in a column from another table

I have two dataframes. Table A is about locations and Table B is about sold products. I need to bring the location ID into Table B. The criteria is: if tableB[col1] is in tableA[col1], and tableB[col2] is in tableA[col2], and so on, then bring the location ID into Table B.
I couldn't figure out how to go forward with it.
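If "is in" means the values must match exactly, one hedged sketch is a left merge on the shared key columns. The column names (`col1`, `col2`, `location_id`) are hypothetical, since the question gives no concrete schema:

```python
import pandas as pd

# Hypothetical schemas: tableA maps key columns to a location_id,
# tableB lists sold products with the same key columns.
tableA = pd.DataFrame({'col1': ['NY', 'LA'], 'col2': ['US', 'US'],
                       'location_id': [1, 2]})
tableB = pd.DataFrame({'col1': ['NY', 'LA', 'NY'], 'col2': ['US', 'US', 'US'],
                       'product': ['p1', 'p2', 'p3']})

# A left merge on all shared key columns carries location_id into table B
# without dropping any of table B's rows.
tableB = tableB.merge(tableA[['col1', 'col2', 'location_id']],
                      on=['col1', 'col2'], how='left')
```

If "is in" instead means substring containment rather than equality, a merge won't work directly and a row-wise match would be needed instead.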

Pandas convert data from two tables into third table. Cross Referencing and converting unique rows to columns

I have the following tables:
Table A
listData = {'id':[1,2,3],'date':['06-05-2021','07-05-2021','17-05-2021']}
pd.DataFrame(listData,columns=['id','date'])
Table B
detailData = {'code':['D123','F268','A291','D123','F268','A291'],'id':['1','1','1','2','2','2'],'stock':[5,5,2,10,11,8]}
pd.DataFrame(detailData,columns=['code','id','stock'])
OUTPUT TABLE
output = {'code':['D123','F268','A291'],'06-05-2021':[5,5,2],'07-05-2021':[10,11,8]}
pd.DataFrame(output,columns=['code','06-05-2021','07-05-2021'])
Note: the code above hard-codes the output; I need to generate the output table from Table A and Table B.
Here is a brief explanation of how the output table is generated, in case it is not self-explanatory.
The id column in Table B needs to be cross-referenced against Table A so that each row of Table B gets its corresponding date.
Then all the unique dates should be made into columns, and the corresponding stock values need to be shifted into the newly created date columns.
I am not sure where to start to do this. I am new to pandas and have only ever used it for simple data manipulation. If anyone can suggest me where to get started, it will be of great help.
Try:
tableA['id'] = tableA['id'].astype(str)
tableB.merge(tableA, on='id').pivot(index='code', columns='date', values='stock')
Output:
date 06-05-2021 07-05-2021
code
A291 2 8
D123 5 10
F268 5 11
Details:
First, merge on id; this is like doing a SQL join. The dtypes must match, hence converting id to str with astype.
Next, reshape the dataframe using pivot to get code by date.

How to add values from one column to another table using join?

I am having difficulties merging 2 tables. I would like to add a column from Table B into Table A based on one key.
Table A (632 rows) contains the following columns:
part_number / part_designation / AC / AC_program
Table B (4,674 rows) contains the following columns:
part_ref / supplier_id / supplier_name / ac_program
I would like to add the supplier_name values into Table A
I have succeeded in compiling a left join based on the condition tableA.part_number == tableB.part_ref.
However, when I look at the resulting table, additional rows were created: I now have 683 rows instead of the initial 632 rows in Table A. How do I keep the same number of rows while including the supplier_name values in Table A? Below is a graph of my transformations:
Here is my code:
Table B seems to contain duplicate part_ref values. The join operation creates a new record in your original table for each duplicate in Table B. You can verify this by comparing the number of unique keys to the row count:
import pandas as pd
print(len(pd.unique(updated_ref_table.part_ref)))
print(updated_ref_table.shape[0])
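One way to keep the original 632 rows is to de-duplicate Table B on the key before joining, assuming any one of the duplicate supplier rows is acceptable. The data below is a toy illustration of the pattern:

```python
import pandas as pd

# Toy data: part_ref 'p1' is duplicated in Table B, which would
# otherwise inflate the left join from 2 rows to 3.
tableA = pd.DataFrame({'part_number': ['p1', 'p2'],
                       'part_designation': ['d1', 'd2']})
tableB = pd.DataFrame({'part_ref': ['p1', 'p1', 'p2'],
                       'supplier_name': ['s1', 's1b', 's2']})

# Keep one row per part_ref before merging so the left join
# cannot add rows to Table A.
dedup = tableB.drop_duplicates(subset='part_ref')
result = tableA.merge(dedup[['part_ref', 'supplier_name']],
                      left_on='part_number', right_on='part_ref',
                      how='left')
```

Whether dropping duplicates is correct depends on the data: if the duplicates carry different supplier names, you first need a rule for which one to keep.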

Create or populate a table in python based on other tables

I am new to python/pandas and have a fairly basic question.
I have 2 tables with numerous columns and "ID" as the primary key.
I want to create another table with conditions based on the 2 tables.
For example: Table A, Table B --> Table C
In SQL I would write something like this:
create table TableC as select
a.ID,
case when b.Field1=1000 and a.Field1=50 then 20 else 0 end as FieldA,
case when b.Field2=15 and a.Field2=100 then 100 else 0 end as FieldB
from TableA a, TableB b
where a.ID=b.ID
order by 1
I am struggling to put together a similar Table C using Python.
I have tried writing a function, but I can't seem to include more than one table in the function, nor create a new table based on multiple tables.
Any help will be much appreciated.
IIUC:
import numpy as np

TableC = TableA.merge(TableB, on='ID')
# after the merge, _x columns come from TableA (a) and _y columns from TableB (b)
TableC['FieldA'] = np.where((TableC.Field1_y == 1000) & (TableC.Field1_x == 50), 20, 0)
TableC['FieldB'] = np.where((TableC.Field2_y == 15) & (TableC.Field2_x == 100), 100, 0)

Python with mongodb

I am a complete beginner with Python.
I have two tables, Table A and Table B. Table A has 1M records and Table B has 14M records, and each record is a very long sentence (paragraph) containing special characters, numbers, etc.
I want to split each record into words, compare each row of Table A column 1 against every row of Table B column 1, and find the top 5 highest (most relevant) matches from Table B for each.
Comparing 1M x 14M rows directly takes too long. Could anyone please suggest the right way to do this in Python with MongoDB?
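One possible direction is an inverted index, so each Table A row only touches the Table B rows that share at least one word instead of all 14M. The lists below are toy stand-ins (a real run would stream documents from MongoDB), and scoring by shared-word count is just one plausible reading of "highest match", since the question doesn't define relevance:

```python
from collections import defaultdict
import heapq

# Toy stand-ins for the two collections.
table_a = ["red apple pie", "fast blue car"]
table_b = ["apple pie recipe", "blue car wash", "red apple tart", "green tea"]

# Build an inverted index over Table B: word -> set of row indices.
index = defaultdict(set)
for j, text in enumerate(table_b):
    for word in set(text.split()):
        index[word].add(j)

def top5(query):
    # Score only the Table B rows that share at least one word.
    scores = defaultdict(int)
    for word in set(query.split()):
        for j in index[word]:
            scores[j] += 1  # one point per shared word
    return heapq.nlargest(5, scores, key=scores.get)

matches = {q: top5(q) for q in table_a}
```

For real data you would also want tokenization that strips the special characters, and MongoDB's own text index (`$text` search with `textScore`) is worth considering before rolling your own.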
