I am given a data frame (Table 1) with the following format; it has an id, col1, col2, and a json_col.
id col1 col2 json_col
1 a b json1
2 a c json2
3 b d json3
4 c a json4
5 d e json5
I have a new table (Table 2), and I would like to union the JSON files for each of its rows:

col1 col2 col3 col4 union_json
a    b              json1
a    b    d         union of json1 and json3
a    b    d    e    union of json1, json3, and json5
c    a              json4
Here is an example of Table 1
df1 = pd.DataFrame({'col1': ['a', 'a', 'b', 'c', 'd'],
                    'col2': ['b', 'c', 'd', 'a', 'e'],
                    'col3': [{"origin":"a","destination":"b", "arc":[{"Type":"763","Number":"20"}]},
                             {"origin":"a","destination":"c", "arc":[{"Type":"763","Number":"50"}]},
                             {"origin":"a","destination":"d", "arc":[{"Type":"723","Number":"40"}]},
                             {"origin":"c","destination":"a", "arc":[{"Type":"700","Number":"30"}]},
                             {"origin":"d","destination":"e", "arc":[{"Type":"700","Number":"40"}]}]})
And here is an example of Table 2:
df2 = pd.DataFrame({'col1': ['a', 'a', 'a', 'c'],
                    'col2': ['b', 'b', 'b', 'a'],
                    'col3': ['', 'd', 'd', ''],
                    'col4': ['', '', 'e', '']})
The union of json1 and json3 should look like this:
[[{"origin":"a","destination":"b", "arc":[{"Type":"763","Number":"20"}]}],
[{"origin":"a","destination":"d", "arc":[{"Type":"723","Number":"40"}]}]]
I hope I've understood your question right:
from itertools import combinations

def fn(x):
    out, non_empty_vals = [], x[x != ""]
    for c in combinations(non_empty_vals, 2):
        out.extend(df1.loc[df1[["col1", "col2"]].eq(c).all(axis=1), "col3"])
    return out

df2["union_json"] = df2.apply(fn, axis=1)
print(df2.to_markdown(index=False))
Prints:
| col1 | col2 | col3 | col4 | union_json |
|:-----|:-----|:-----|:-----|:-----------|
| a    | b    |      |      | [{'origin': 'a', 'destination': 'b', 'arc': [{'Type': '763', 'Number': '20'}]}] |
| a    | b    | d    |      | [{'origin': 'a', 'destination': 'b', 'arc': [{'Type': '763', 'Number': '20'}]}, {'origin': 'a', 'destination': 'd', 'arc': [{'Type': '723', 'Number': '40'}]}] |
| a    | b    | d    | e    | [{'origin': 'a', 'destination': 'b', 'arc': [{'Type': '763', 'Number': '20'}]}, {'origin': 'a', 'destination': 'd', 'arc': [{'Type': '723', 'Number': '40'}]}, {'origin': 'd', 'destination': 'e', 'arc': [{'Type': '700', 'Number': '40'}]}] |
| c    | a    |      |      | [{'origin': 'c', 'destination': 'a', 'arc': [{'Type': '700', 'Number': '30'}]}] |
Dataframes used:
df1
col1 col2 col3
0 a b {'origin': 'a', 'destination': 'b', 'arc': [{'Type': '763', 'Number': '20'}]}
1 a c {'origin': 'a', 'destination': 'c', 'arc': [{'Type': '763', 'Number': '50'}]}
2 b d {'origin': 'a', 'destination': 'd', 'arc': [{'Type': '723', 'Number': '40'}]}
3 c a {'origin': 'c', 'destination': 'a', 'arc': [{'Type': '700', 'Number': '30'}]}
4 d e {'origin': 'd', 'destination': 'e', 'arc': [{'Type': '700', 'Number': '40'}]}
df2
col1 col2 col3 col4
0 a b
1 a b d
2 a b d e
3 c a
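The same result can also be obtained with a single merge instead of a boolean lookup per pair; a sketch against the frames above (the 'arc' detail is trimmed for brevity):

```python
import pandas as pd
from itertools import combinations

df1 = pd.DataFrame({'col1': ['a', 'a', 'b', 'c', 'd'],
                    'col2': ['b', 'c', 'd', 'a', 'e'],
                    'col3': [{'origin': 'a', 'destination': 'b'},
                             {'origin': 'a', 'destination': 'c'},
                             {'origin': 'a', 'destination': 'd'},
                             {'origin': 'c', 'destination': 'a'},
                             {'origin': 'd', 'destination': 'e'}]})
df2 = pd.DataFrame({'col1': ['a', 'a', 'a', 'c'],
                    'col2': ['b', 'b', 'b', 'a'],
                    'col3': ['', 'd', 'd', ''],
                    'col4': ['', '', 'e', '']})

# one row per (col1, col2) pair drawn from each df2 row's non-empty values
pairs = (df2.apply(lambda r: list(combinations(r[r != ''], 2)), axis=1)
            .explode()
            .apply(pd.Series)
            .set_axis(['col1', 'col2'], axis=1)
            .reset_index()
            .rename(columns={'index': 'row'}))

# a single merge replaces the per-pair boolean lookup
matched = pairs.merge(df1, on=['col1', 'col2'])
df2['union_json'] = matched.groupby('row')['col3'].agg(list)
```

This trades the per-pair `.loc` scans for one hash join, which should scale better when Table 1 is large.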
Related
How can I do a VLOOKUP like in Excel with pandas? I'm a total beginner in Python. My first and second dataframes look like this:
data_01 = pd.DataFrame({'Tipe Car':['A', 'B', 'C', 'D'], 'Branch':['UD', 'UA', 'UK', 'UA'], 'Area':['1A', '1B', '1C', '1D']})
data_02 = pd.DataFrame({'Tipe Car':['A', 'B', 'E', 'F'], 'Branch':['UD', 'UA', 'UK', 'UA']})
and the expected output is:
data_03 = pd.DataFrame({'Tipe Car':['A', 'B', 'E', 'F'], 'Branch':['UD', 'UA', 'UK', 'UA'], 'Area':['1A', '1B', 'NaN', 'NaN']})
Use pandas.DataFrame.join
import pandas as pd
df1 = pd.DataFrame({'Tipe Car':['A', 'B', 'C', 'D'], 'Branch':['UD', 'UA', 'UK', 'UA'], 'Area':['1A', '1B', '1C', '1D']})
df2 = pd.DataFrame({'Tipe Car':['A', 'B', 'E', 'F'], 'Branch':['UD', 'UA', 'UK', 'UA']})
df1.set_index('Tipe Car').join(df2.set_index('Tipe Car'), how='right', lsuffix='_df1', rsuffix='_df2')
>>>
Branch_df1 Area Branch_df2
Tipe Car
A UD 1A UD
B UA 1B UA
E NaN NaN UK
F NaN NaN UA
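A left merge is a slightly more direct way to get the same lookup without touching the index; a sketch:

```python
import pandas as pd

data_01 = pd.DataFrame({'Tipe Car': ['A', 'B', 'C', 'D'],
                        'Branch': ['UD', 'UA', 'UK', 'UA'],
                        'Area': ['1A', '1B', '1C', '1D']})
data_02 = pd.DataFrame({'Tipe Car': ['A', 'B', 'E', 'F'],
                        'Branch': ['UD', 'UA', 'UK', 'UA']})

# how='left' keeps every row of data_02 and fills Area with NaN where
# 'Tipe Car' has no match -- the closest pandas analogue to VLOOKUP
data_03 = data_02.merge(data_01[['Tipe Car', 'Area']], on='Tipe Car', how='left')
```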
From a dataframe structured like this
A B
0 1 2
1 3 4
I need to get a list like this:
[{"A": 1, "B": 2}, {"A": 3, "B": 4}]
Since the desired output is a list of dicts keyed by column name, it looks like you want:
df.to_dict('records')
example:
df = pd.DataFrame({'A': [1, 3],
                   'B': [2, 4]})
df.to_dict('records')
output:
[{'A': 1, 'B': 2}, {'A': 3, 'B': 4}]
other options
df.values.tolist()
[[1, 2], [3, 4]]
df.T.to_dict('list')
{0: [1, 2], 1: [3, 4]}
I am attempting to iterate over all rows in a pandas dataframe and shift the leftmost columns of each row to the right until all non-empty values in the row are contiguous. The amount of movement depends on the number of empty cells between the first empty value and the cutoff column.
In this case I am attempting to 'close the gap' so that the values in the leftmost columns reach column 'ddd', touching the specific cutoff column 'eee'. The correlating 'abc' rows should help to visualize the problem.
Column 'eee' and the columns to the right of 'eee' should not be touched or moved.
df = pd.DataFrame({
    'aaa': ['a', 'a', 'a', 'a', 'a', 'a'],
    'bbb': ['', 'b', 'b', 'b', '', 'b'],
    'ccc': ['', '', 'c', 'c', '', 'c'],
    'ddd': ['', '', '', 'd', '', ''],
    'eee': ['b', 'c', 'd', 'e', 'b', 'd'],
    'fff': ['c', 'd', 'e', 'f', 'c', 'e'],
    'ggg': ['d', 'e', 'f', 'g', 'd', 'f']
})
In rows 1 and 5: 'a' would be moved over 3 column indices to column 'ddd'.
In row 2: ['a', 'b'] would be moved over 2 column indices to columns ['ccc', 'ddd'] respectively.
etc.
finalOutput = {
'aaa': ['', '', '', 'a', '', ''],
'bbb': ['', '', 'a', 'b', '', 'a'],
'ccc': ['', 'a', 'b', 'c', '', 'b'],
'ddd': ['a', 'b', 'c', 'd', 'a', 'c'],
'eee': ['b', 'c', 'd', 'e', 'b', 'd'],
'fff': ['c', 'd', 'e', 'f', 'c', 'e'],
'ggg': ['d', 'e', 'f', 'g', 'd', 'f']
}
You can do this:
import numpy as np
from collections import Counter

keep_cols = df.columns[:df.columns.get_loc('eee')]
df.loc[:, keep_cols] = [np.roll(v, Counter(v)['']) for v in df[keep_cols].values]
print(df):
aaa bbb ccc ddd eee fff ggg
0 a b c d
1 a b c d e
2 a b c d e f
3 a b c d e f g
4 a b c d
5 a b c d e f
Explanation:
You want to consider only the columns to the left of 'eee', so those are stored in keep_cols.
Next, each row needs to be shifted to the right by some amount, but by how much? For the shift itself I used numpy's roll; the amount is the number of blank values in the row, which I counted with Counter from collections.
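A minimal sketch of that roll-by-blank-count step on a single row (note it relies on the blanks sitting at the end of the row, as they do in this data):

```python
import numpy as np
from collections import Counter

# row 1's values left of 'eee', taken from the example dataframe
v = np.array(['a', 'b', '', ''])
shift = Counter(v)['']          # number of blanks = how far to roll right
rolled = np.roll(v, shift)
print(rolled)                   # ['' '' 'a' 'b']
```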
I have a df after read_excel where some values of one column (with strings) are split across several rows. How can I merge them back?
for example:
the df i have
{'CODE': ['A', None, 'B', None, None, 'C'],
'TEXT': ['A', 'a', 'B', 'b', 'b', 'C'],
'NUMBER': ['1', None, '2', None, None,'3']}
the df i want
{'CODE': ['A','B','C'],
'TEXT': ['Aa','Bbb','C'],
'NUMBER': ['1','2','3']}
I can't find the right solution. I tried importing the data in different ways, but that did not help either.
You can forward fill the missing values (or Nones), then group and aggregate: join the TEXT values and take the first non-None value for the NUMBER column:
d = {'CODE': ['A', None, 'B', None, None, 'C'],
'TEXT': ['A', 'a', 'B', 'b', 'b', 'C'],
'NUMBER': ['1', None, '2', None, None,'3']}
df = pd.DataFrame(d)
df1 = df.groupby(df['CODE'].ffill()).agg({'TEXT':''.join, 'NUMBER':'first'}).reset_index()
print (df1)
CODE TEXT NUMBER
0 A Aa 1
1 B Bbb 2
2 C C 3
You can also generate the aggregation dictionary:
cols = df.columns.difference(['CODE'])
d1 = dict.fromkeys(cols, 'first')
d1['TEXT'] = ''.join
df1 = df.groupby(df['CODE'].ffill()).agg(d1).reset_index()
I have a table like this:

Group  Item
A      a, b, c
B      b, c, d

And I want to convert it to this:

Item  Group
a     A
b     A, B
c     A, B
d     B

What is the best way to achieve this?
Thank you!!
If you are working in pandas, you can use 'explode' to unpack the items, then a 'tolist' lambda for the grouping stage.
Here is some info on the 'explode' method: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html
import pandas as pd
df = pd.DataFrame(data={'Group': ['A', 'B'], 'Item': [['a','b','c'], ['b','c','d']]})
Exploding
df.explode('Item').reset_index(drop=True).to_dict(orient='records')
[{'Group': 'A', 'Item': 'a'},
{'Group': 'A', 'Item': 'b'},
{'Group': 'A', 'Item': 'c'},
{'Group': 'B', 'Item': 'b'},
{'Group': 'B', 'Item': 'c'},
{'Group': 'B', 'Item': 'd'}]
Exploding and then using 'to_list' lambda
df.explode('Item').groupby('Item')['Group'].apply(lambda x: x.tolist()).reset_index().to_dict(orient='records')
[{'Item': 'a', 'Group': ['A']},
{'Item': 'b', 'Group': ['A', 'B']},
{'Item': 'c', 'Group': ['A', 'B']},
{'Item': 'd', 'Group': ['B']}]
Not the most efficient, but very short:
>>> table = {'A': ['a', 'b', 'c'], 'B': ['b', 'c', 'd']}
>>> reversed_table = {v: [k for k, vs in table.items() if v in vs] for v in set(v for vs in table.values() for v in vs)}
>>> print(reversed_table)
{'b': ['A', 'B'], 'c': ['A', 'B'], 'd': ['B'], 'a': ['A']}
With dictionaries, you would typically approach it like this:
table = {'A': ['a', 'b', 'c'], 'B': ['b', 'c', 'd']}
revtable = dict()
for v,keys in table.items():
for k in keys:
revtable.setdefault(k,[]).append(v)
print(revtable)
# {'a': ['A'], 'b': ['A', 'B'], 'c': ['A', 'B'], 'd': ['B']}
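The same inversion reads a little cleaner with collections.defaultdict in place of setdefault:

```python
from collections import defaultdict

table = {'A': ['a', 'b', 'c'], 'B': ['b', 'c', 'd']}
revtable = defaultdict(list)
for group, items in table.items():
    for item in items:
        revtable[item].append(group)   # each item collects the groups it appears in
print(dict(revtable))
# {'a': ['A'], 'b': ['A', 'B'], 'c': ['A', 'B'], 'd': ['B']}
```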
Assuming that your tables are in the form of a pandas dataframe, you could try something like this:
import pandas as pd
import numpy as np
# Create initial dataframe
data = {'Group': ['A', 'B'], 'Item': [['a','b','c'], ['b','c','d']]}
df = pd.DataFrame(data=data)
Group Item
0 A [a, b, c]
1 B [b, c, d]
# Expand number of rows based on list column ("Item") contents
list_col = 'Item'
df = pd.DataFrame({
col:np.repeat(df[col].values, df[list_col].str.len())
for col in df.columns.drop(list_col)}
).assign(**{list_col:np.concatenate(df[list_col].values)})[df.columns]
Group Item
0 A a
1 A b
2 A c
3 B b
4 B c
5 B d
*Above snippet taken from here, which includes a more detailed explanation of the code
# Perform groupby operation
df = df.groupby('Item')['Group'].apply(list).reset_index(name='Group')
Item Group
0 a [A]
1 b [A, B]
2 c [A, B]
3 d [B]
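If the target output uses comma-separated strings ('A, B') rather than lists, the final groupby can aggregate with a string join instead; a sketch:

```python
import pandas as pd

df = pd.DataFrame({'Group': ['A', 'B'], 'Item': [['a', 'b', 'c'], ['b', 'c', 'd']]})
out = (df.explode('Item')
         .groupby('Item')['Group']
         .agg(', '.join)             # 'A, B' instead of ['A', 'B']
         .reset_index())
```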