Converting table directly to tree structure with pandas

I want to convert a CSV file in this format:
into an HDF5 file with this structure:
I am using pandas. Is there a simple way to do that?

You can use nested dictionaries via collections.defaultdict for this:
from collections import defaultdict
import pandas as pd
# read csv file
# df = pd.read_csv('input.csv', header=None)
df = pd.DataFrame([['A', 'a', 'a1'],
                   ['A', 'a', 'a2'],
                   ['A', 'b', 'b1'],
                   ['A', 'b', 'b2'],
                   ['A', 'c', 'c1'],
                   ['A', 'c', 'c2']],
                  columns=['col1', 'col2', 'col3'])
# two-level tree: col1 -> col2 -> list of col3 values
d = defaultdict(lambda: defaultdict(list))
for row in df.itertuples():
    # row[0] is the index, so row[1], row[2], row[3] are col1, col2, col3
    d[row[1]][row[2]].append(row[3])
Result
defaultdict(<function __main__.<lambda>>,
{'A': defaultdict(list,
{'a': ['a1', 'a2'],
'b': ['b1', 'b2'],
'c': ['c1', 'c2']})})
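If you would rather end up with plain dicts (they print cleanly and do not silently create keys on lookup), a minimal sketch using two nested groupby calls builds the same two-level tree. It assumes the three-column layout of the sample data above; tree is an illustrative name, and this is an alternative to the defaultdict answer, not part of it.

```python
import pandas as pd

# Same sample rows as above; in practice these would come from pd.read_csv(...)
df = pd.DataFrame([['A', 'a', 'a1'],
                   ['A', 'a', 'a2'],
                   ['A', 'b', 'b1'],
                   ['A', 'b', 'b2'],
                   ['A', 'c', 'c1'],
                   ['A', 'c', 'c2']],
                  columns=['col1', 'col2', 'col3'])

# Outer groupby builds the top level, inner groupby the second level;
# the leaves are the col3 values of each (col1, col2) group as a list.
tree = {
    top: {mid: leaf['col3'].tolist() for mid, leaf in grp.groupby('col2')}
    for top, grp in df.groupby('col1')
}
print(tree)  # {'A': {'a': ['a1', 'a2'], 'b': ['b1', 'b2'], 'c': ['c1', 'c2']}}
```

Each extra level of the hierarchy would need one more nested groupby, so for deeper trees the loop-based approaches are easier to generalize.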

Thanks, I will check out defaultdict. My solution is probably more hacky, but in case someone needs something customizable:
import pandas as pd
df = pd.DataFrame([['A', 'a', 'a1'],
                   ['A', 'a', 'a2'],
                   ['A', 'b', 'b1'],
                   ['A', 'b', 'b2'],
                   ['A', 'c', 'c1'],
                   ['A', 'c', 'c2']],
                  columns=['col1', 'col2', 'col3'])
cols = ['col1', 'col2', 'col3']
children = {p: {} for p in cols}  # per level: parent value -> set of child values
parent = {p: {} for p in cols}    # per level: child value -> parent value
for _, row in df.iterrows():
    for i in range(len(cols) - 1):
        _parent = row[cols[i]]
        _child = row[cols[i + 1]]
        # each child maps back to its parent one level up
        parent[cols[i + 1]][_child] = _parent
        # each parent collects its children one level down
        children[cols[i]].setdefault(_parent, set()).add(_child)
Result:
parent =
{'col1': {},
'col2': {'a': 'A', 'b': 'A', 'c': 'A'},
'col3': {'a1': 'a', 'a2': 'a', 'b1': 'b', 'b2': 'b', 'c1': 'c', 'c2': 'c'}}
Then you can walk up and down your hierarchy.
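Walking up, for example, could look like the following minimal sketch. It copies the parent result above and assumes the same cols ordering; path_to_root is a hypothetical helper name, not something from the original answer.

```python
cols = ['col1', 'col2', 'col3']
# parent mapping as produced by the loop above
parent = {'col1': {},
          'col2': {'a': 'A', 'b': 'A', 'c': 'A'},
          'col3': {'a1': 'a', 'a2': 'a', 'b1': 'b', 'b2': 'b', 'c1': 'c', 'c2': 'c'}}

def path_to_root(node, level):
    """Hypothetical helper: return the chain from the top level down to `node`.

    `level` is the name of the column that `node` belongs to.
    """
    path = [node]
    # step upward one column at a time; col1 has no parent, so stop before it
    for i in range(cols.index(level), 0, -1):
        node = parent[cols[i]][node]
        path.append(node)
    return path[::-1]

print(path_to_root('b2', 'col3'))  # ['A', 'b', 'b2']
```

Walking down works the same way with the children mapping, except each step fans out into a set of nodes instead of a single parent.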

Related

Divide dataframe into list of rows containing all columns

From a DataFrame structured like this:
   A  B
0  1  2
1  3  4
I need to get a list like this:
[{"A": 1, "B": 2}, {"A": 3, "B": 4}]

For a list of row values (without the column names), you can use:
df.values.tolist()
example:
df = pd.DataFrame([['A', 'B', 'C'],
                   ['D', 'E', 'F']])
df.values.tolist()
output:
[['A', 'B', 'C'],
 ['D', 'E', 'F']]
Other options (df.to_dict('records') gives the list of {column: value} dicts asked for):
df.T.to_dict('list')
{0: ['A', 'B', 'C'],
 1: ['D', 'E', 'F']}
df.to_dict('records')
[{0: 'A', 1: 'B', 2: 'C'},
 {0: 'D', 1: 'E', 2: 'F'}]
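Applied to the frame from the question, to_dict('records') produces exactly the requested list of dicts. A minimal sketch; rows is an illustrative name.

```python
import pandas as pd

# The frame from the question: columns A and B, two rows
df = pd.DataFrame({'A': [1, 3], 'B': [2, 4]})

# orient='records' yields one {column: value} dict per row
rows = df.to_dict('records')
print(rows)
```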

Reverse the group/items in Python

I have a table like this:
Group  Item
A      a, b, c
B      b, c, d
And I want to convert it to this:
Item  Group
a     A
b     A, B
c     A, B
d     B
What is the best way to achieve this?
Thank you!!

If you are working in pandas, you can use explode to unpack each list into one row per item, then group by item and collect each group with a tolist lambda.
Here is the documentation for the explode method: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html
import pandas as pd
df = pd.DataFrame(data={'Group': ['A', 'B'], 'Item': [['a','b','c'], ['b','c','d']]})
Exploding
df.explode('Item').reset_index(drop=True).to_dict(orient='records')
[{'Group': 'A', 'Item': 'a'},
{'Group': 'A', 'Item': 'b'},
{'Group': 'A', 'Item': 'c'},
{'Group': 'B', 'Item': 'b'},
{'Group': 'B', 'Item': 'c'},
{'Group': 'B', 'Item': 'd'}]
Exploding, then grouping by Item and collecting each group with a tolist lambda:
df.explode('Item').groupby('Item')['Group'].apply(lambda x: x.tolist()).reset_index().to_dict(orient='records')
[{'Item': 'a', 'Group': ['A']},
{'Item': 'b', 'Group': ['A', 'B']},
{'Item': 'c', 'Group': ['A', 'B']},
{'Item': 'd', 'Group': ['B']}]
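The answer above assumes the Item column already holds Python lists. If it instead holds comma-separated strings, as the question's table suggests, a minimal sketch (assuming ', ' is the separator; reversed_df is an illustrative name) splits them first and joins the groups back into strings:

```python
import pandas as pd

# Assumption: Item holds comma-separated strings, as in the question's table
df = pd.DataFrame({'Group': ['A', 'B'], 'Item': ['a, b, c', 'b, c, d']})

reversed_df = (
    df.assign(Item=df['Item'].str.split(', '))  # 'a, b, c' -> ['a', 'b', 'c']
      .explode('Item')                           # one row per (Group, Item) pair
      .groupby('Item')['Group']
      .agg(', '.join)                            # ['A', 'B'] -> 'A, B'
      .reset_index()
)
print(reversed_df)
```

Use .agg(list) instead of .agg(', '.join) if you want lists rather than strings in the Group column.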

Not the most efficient, but very short:
>>> table = {'A': ['a', 'b', 'c'], 'B': ['b', 'c', 'd']}
>>> reversed_table = {v: [k for k, vs in table.items() if v in vs] for v in set(v for vs in table.values() for v in vs)}
>>> print(reversed_table)
{'b': ['A', 'B'], 'c': ['A', 'B'], 'd': ['B'], 'a': ['A']}

With dictionaries, you would typically approach it like this:
table = {'A': ['a', 'b', 'c'], 'B': ['b', 'c', 'd']}
revtable = dict()
for v, keys in table.items():
    for k in keys:
        # create the list the first time k is seen, then append the group
        revtable.setdefault(k, []).append(v)
print(revtable)
# {'a': ['A'], 'b': ['A', 'B'], 'c': ['A', 'B'], 'd': ['B']}

Assuming that your tables are in the form of a pandas dataframe, you could try something like this:
import pandas as pd
import numpy as np
# Create initial dataframe
data = {'Group': ['A', 'B'], 'Item': [['a','b','c'], ['b','c','d']]}
df = pd.DataFrame(data=data)
Group Item
0 A [a, b, c]
1 B [b, c, d]
# Expand number of rows based on list column ("Item") contents
list_col = 'Item'
df = pd.DataFrame({
    col: np.repeat(df[col].values, df[list_col].str.len())
    for col in df.columns.drop(list_col)}
).assign(**{list_col: np.concatenate(df[list_col].values)})[df.columns]
Group Item
0 A a
1 A b
2 A c
3 B b
4 B c
5 B d
*Above snippet taken from here, which includes a more detailed explanation of the code
# Perform groupby operation
df = df.groupby('Item')['Group'].apply(list).reset_index(name='Group')
Item Group
0 a [A]
1 b [A, B]
2 c [A, B]
3 d [B]

Convert dictionary to list with some data omitted

I'm trying to convert a dictionary of the format:
d = {'A1': ['a', 'a', 'A2 (A3-)', 'a'],
'B1': ['b', 'b', 'B2 (B3-)', 'b'],
'C1': ['c', 'c', 'C2 (C3)-', 'c']}
To a list of the form:
e = [['A1', 'A2', 'A3'], ['B1', 'B2', 'B3'], ['C1', 'C2', 'C3']]
I know I should use regex to get the A2 and A3 data, but I'm having trouble putting this all together...

import re
regex = re.compile(r'(\w+) \((\w+)-.*')
# I suppose that you meant (C3-) and not (C3)-
d = {'A1': ['a', 'a', 'A2 (A3-)', 'a'], 'B1': ['b', 'b', 'B2 (B3-)', 'b'], 'C1': ['c', 'c', 'C2 (C3-)', 'c']}
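out = []
for key, values_list in d.items():
    # the third element holds e.g. 'A2 (A3-)'; capture 'A2' and 'A3'
    v2, v3 = regex.match(values_list[2]).groups()
    out.append([key, v2, v3])
print(out)
# [['A1', 'A2', 'A3'], ['B1', 'B2', 'B3'], ['C1', 'C2', 'C3']]
Since Python 3.7, dicts keep insertion order, so the result follows the key order of d. On older versions the order is arbitrary, because the dict was unordered.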
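If the question's 'C2 (C3)-' is not a typo, a slightly looser pattern handles both spellings. This is a minimal sketch assuming the value of interest is always the third element of each list and starts with "<word> (<word>"; pattern and e are illustrative names.

```python
import re

# Question's data, including the 'C2 (C3)-' variant for C1
d = {'A1': ['a', 'a', 'A2 (A3-)', 'a'],
     'B1': ['b', 'b', 'B2 (B3-)', 'b'],
     'C1': ['c', 'c', 'C2 (C3)-', 'c']}

# Only capture "<word> (<word>"; whatever follows the second word
# ('-)' or ')-') is ignored, so both spellings match
pattern = re.compile(r'(\w+) \((\w+)')

e = [[key, *pattern.match(values[2]).groups()] for key, values in d.items()]
print(e)  # [['A1', 'A2', 'A3'], ['B1', 'B2', 'B3'], ['C1', 'C2', 'C3']]
```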

Divide list to multiple lists based on elements value

I have the following list:
initial_list = [['B', 'D', 'A', 'C', 'E']]
On each element of the list I apply a function and put the results in a dictionary:
for state in initial_list:
    next_dict[state] = move([state], alphabet)
This gives the following result:
next_dict = {'D': ['E'], 'B': ['D'], 'A': ['C'], 'C': ['C'], 'E': ['D']}
What I would like to do is separate the keys from initial_list based on their values in the next_dict dictionary, i.e. group together the elements of the first list that have the same value in next_dict:
new_list = [['A', 'C'], ['B', 'E'], ['D']]
'A' and 'C' stay in the same group because they share the value 'C', 'B' and 'E' share a group because their value is 'D', and 'D' is in its own group.
How can I achieve this result?

You need itertools.groupby, after sorting your list by the next_dict values. As the itertools documentation puts it:
It generates a break or new group every time the value of the key
function changes (which is why it is usually necessary to have sorted
the data using the same key function).
from itertools import groupby
initial_list = ['B', 'D', 'A', 'C', 'E']
def move(letter):
    return {'A': 'C', 'C': 'C', 'D': 'E', 'E': 'D', 'B': 'D'}.get(letter)
sorted_list = sorted(initial_list, key=move)
print([list(v) for k, v in groupby(sorted_list, key=move)])
#=> [['A', 'C'], ['B', 'E'], ['D']]
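To see why the sort matters, compare groupby on the unsorted and the sorted list. A minimal sketch; mapping reproduces the move table above, and the other names are illustrative.

```python
from itertools import groupby

mapping = {'A': 'C', 'C': 'C', 'D': 'E', 'E': 'D', 'B': 'D'}
data = ['B', 'D', 'A', 'C', 'E']

# Without sorting, groupby only merges *consecutive* items with equal keys,
# so 'B' and 'E' (both mapped to 'D') end up in separate groups
unsorted_groups = [list(g) for _, g in groupby(data, key=mapping.get)]

# Sorting by the same key first makes equal keys adjacent
sorted_groups = [list(g) for _, g in groupby(sorted(data, key=mapping.get), key=mapping.get)]

print(unsorted_groups)  # [['B'], ['D'], ['A', 'C'], ['E']]
print(sorted_groups)    # [['A', 'C'], ['B', 'E'], ['D']]
```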

The simplest way is to use itertools.groupby with next_dict.get as the key function:
>>> from itertools import groupby
>>> next_dict = {'D': ['E'], 'B': ['D'], 'A': ['C'], 'C': ['C'], 'E': ['D']}
>>> initial_list = ['B', 'D', 'A', 'C', 'E']
>>> [list(i) for _, i in groupby(sorted(initial_list, key=next_dict.get), next_dict.get)]
[['A', 'C'], ['B', 'E'], ['D']]

I'm not sure this is exactly what you want, but you can group the items by their values in next_dict:
>>> next_dict = {'D': 'E', 'B': 'D', 'A': 'C', 'C': 'C', 'E': 'D'}
>>> # external library but one can also use a defaultdict.
>>> from iteration_utilities import groupedby
>>> groupings = groupedby(['B', 'D', 'A', 'C', 'E'], key=next_dict.__getitem__)
>>> groupings
{'C': ['A', 'C'], 'D': ['B', 'E'], 'E': ['D']}
and then convert that to a list of their values:
>>> list(groupings.values())
[['A', 'C'], ['D'], ['B', 'E']]
Combine everything into a one-liner (not really recommended but a lot of people prefer that):
>>> list(groupedby(['B', 'D', 'A', 'C', 'E'], key=next_dict.__getitem__).values())
[['A', 'C'], ['D'], ['B', 'E']]
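As the comment in that answer notes, the same grouping works with the standard library's defaultdict instead of the external package. A minimal sketch, using the flat next_dict from that answer:

```python
from collections import defaultdict

next_dict = {'D': 'E', 'B': 'D', 'A': 'C', 'C': 'C', 'E': 'D'}

groupings = defaultdict(list)
for item in ['B', 'D', 'A', 'C', 'E']:
    # the item's value in next_dict becomes its group key
    groupings[next_dict[item]].append(item)

new_list = list(groupings.values())
print(new_list)  # [['B', 'E'], ['D'], ['A', 'C']]
```

The groups come out in the order their keys are first seen (Python 3.7+ dict ordering), so sort new_list if you need the order shown in the question.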

Try this:
next_next_dict = {}
for key in next_dict:
    value = next_dict[key][0]
    if value in next_next_dict:
        next_next_dict[value].append(key)
    else:
        next_next_dict[value] = [key]
new_list = list(next_next_dict.values())
Or this:
new_list = []
for value in next_dict.values():
    new_value = [key for key in next_dict.keys() if next_dict[key] == value]
    if new_value not in new_list:
        new_list.append(new_value)

We can sort your list with your dictionary mapping, and then use itertools.groupby to form the groups. The only amendment I made here is making your initial list an actual flat list.
>>> from itertools import groupby
>>> initial_list = ['B', 'D', 'A', 'C', 'E']
>>> next_dict = {'D': ['E'], 'B': ['D'], 'A': ['C'], 'C': ['C'], 'E': ['D']}
>>> s_key = lambda x: next_dict[x]
>>> [list(v) for k, v in groupby(sorted(initial_list, key=s_key), key=s_key)]
[['A', 'C'], ['B', 'E'], ['D']]

python reverse/transpose a dictionary [duplicate]

This question already has answers here:
Invert keys and values of the original dictionary
I am looking to transpose a dictionary in Python, and after looking around I was not able to find a solution for this. Does anybody know how I could reverse a dictionary like the following input:
graph = {'A': ['B', 'C'],
         'B': ['C', 'D'],
         'C': ['D'],
         'D': ['C'],
         'E': ['F'],
         'F': ['C']}
so that I get something like:
newgraph = {'A': [''],
            'B': ['A'],
            'C': ['A', 'B', 'D', 'F'],
            'D': ['B', 'C'],
            'E': [''],
            'F': ['E']}

Use defaultdict:
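from collections import defaultdict
newgraph = defaultdict(list)
for x, adj in graph.items():
    for y in adj:
        newgraph[y].append(x)
While it doesn't seem to make any sense to have the empty string '' in the empty lists, it's certainly possible. Iterate over graph rather than newgraph, because nodes with no incoming edges ('A' and 'E') are not yet keys of newgraph:
for x in graph:
    newgraph[x] = newgraph[x] or ['']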

Use defaultdict:
>>> from collections import defaultdict
>>> from pprint import pprint
>>> graph = {'A': ['B', 'C'],
...          'B': ['C', 'D'],
...          'C': ['D'],
...          'D': ['C'],
...          'E': ['F'],
...          'F': ['C']}
>>> new_graph = defaultdict(list)
>>> for ele in graph.keys():
...     new_graph[ele] = []
...
>>> for k, v in graph.items():
...     for ele in v:
...         new_graph[ele].append(k)
...
>>> pprint(dict(new_graph))
{'A': [],
 'B': ['A'],
 'C': ['A', 'B', 'D', 'F'],
 'D': ['B', 'C'],
 'E': [],
 'F': ['E']}

It's also possible without defaultdict.
Here I've left the empty keys in the new dict with the value None.
graph = {'A': ['B', 'C'],
         'B': ['C', 'D'],
         'C': ['D'],
         'D': ['C'],
         'E': ['F'],
         'F': ['C']}
g = dict.fromkeys(graph.keys())
for k, v in graph.items():
    for x in v:
        if g[x]:
            g[x] += [k]
        else:
            g[x] = [k]
for k in sorted(graph.keys()):
    print(k, ':', g[k])
Output:
A : None
B : ['A']
C : ['A', 'B', 'D', 'F']
D : ['B', 'C']
E : None
F : ['E']