Convert a Python dictionary with set values to a binary dataframe - python

I have a dictionary where the values are sets:
my_dict = {1: {'a', 'b'}, 2: {'a', 'c'}, 3: {'b', 'c', 'd'}, 4: {'a'}}
I would like to convert it to a binary dataframe where the rows are the keys and the columns are the members of the sets. For the above example, the output is as follows:
   a  b  c  d
1  1  1  0  0
2  1  0  1  0
3  0  1  1  1
4  1  0  0  0
How can I do it in an efficient and scalable manner?

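You can use pandas' Series.str.get_dummies, like this:
import pandas as pd
my_dict = {1: {'a', 'b'}, 2: {'a', 'c'}, 3: {'b', 'c', 'd'}, 4: {'a'}}
# Join each set into a '|'-separated string, then expand it into 0/1 indicator columns
df = pd.Series({k: list(v) for k, v in my_dict.items()}).str.join('|').str.get_dummies()
print(df)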
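To make it clearer what the one-liner computes, here is a minimal pure-Python sketch of the same binary table. The helper name binary_rows is hypothetical (it is not part of pandas), and the columns are sorted only so that they match the table in the question.

```python
def binary_rows(sets_by_key):
    """Return (columns, rows), where rows maps each key to a list of 0/1 flags."""
    # Columns are the sorted union of all set members (hypothetical ordering choice)
    columns = sorted(set().union(*sets_by_key.values()))
    # For every key, flag each column with 1 if it is a member of that key's set
    rows = {key: [int(col in members) for col in columns]
            for key, members in sets_by_key.items()}
    return columns, rows


my_dict = {1: {'a', 'b'}, 2: {'a', 'c'}, 3: {'b', 'c', 'd'}, 4: {'a'}}
columns, rows = binary_rows(my_dict)
print(columns)
for key, row in rows.items():
    print(key, row)
```

If pandas is available, the rows dictionary could probably be passed to pd.DataFrame.from_dict with orient='index' and columns=columns to get a dataframe, but that step is not shown here.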

Related

Unpack values from nested dictionaries with depth level = 1

Is there a more elegant way to unpack values from nested dictionaries (depth level = 1) in a set?
d = {1: {10: 'a',
11: 'b'},
2: {20: 'a',
21: 'c'}}
print(set(c for b in [[*set(a.values())] for a in d.values()] for c in b))
# {'a', 'b', 'c'}
You can iterate over the values of each nested dict and collect them in a set.
d = {1: {10: 'a',
11: 'b'},
2: {20: 'a',
21: 'c'}}
res = set(v for val in d.values() for v in val.values())
print(res)
# {'a', 'b', 'c'}
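As a possible alternative, the same flattening can be sketched with itertools.chain.from_iterable or with set().union; both are standard library features, and the variable names res and alt are only illustrative.

```python
from itertools import chain

d = {1: {10: 'a', 11: 'b'},
     2: {20: 'a', 21: 'c'}}

# Chain the values of every inner dict into one stream, then deduplicate
res = set(chain.from_iterable(inner.values() for inner in d.values()))

# Equivalent: union all inner value collections into an empty set
alt = set().union(*(inner.values() for inner in d.values()))

print(res == alt)
```

Which form reads better is a matter of taste; the nested comprehension above does the same work.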

Reverse the group/items in Python

I have a table like this:
Group  Item
A      a, b, c
B      b, c, d
And I want to convert it to this:
Item  Group
a     A
b     A, B
c     A, B
d     B
What is the best way to achieve this?
Thank you!!
If you are working in pandas, you can use 'explode' to unpack the items, then group by item and collect the groups into lists with a lambda that calls 'tolist'.
Here is some info on the 'explode' method: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html.
import pandas as pd
df = pd.DataFrame(data={'Group': ['A', 'B'], 'Item': [['a','b','c'], ['b','c','d']]})
Exploding
df.explode('Item').reset_index(drop=True).to_dict(orient='records')
[{'Group': 'A', 'Item': 'a'},
{'Group': 'A', 'Item': 'b'},
{'Group': 'A', 'Item': 'c'},
{'Group': 'B', 'Item': 'b'},
{'Group': 'B', 'Item': 'c'},
{'Group': 'B', 'Item': 'd'}]
Exploding and then using 'to_list' lambda
df.explode('Item').groupby('Item')['Group'].apply(lambda x: x.tolist()).reset_index().to_dict(orient='records')
[{'Item': 'a', 'Group': ['A']},
{'Item': 'b', 'Group': ['A', 'B']},
{'Item': 'c', 'Group': ['A', 'B']},
{'Item': 'd', 'Group': ['B']}]
Not the most efficient, but very short:
>>> table = {'A': ['a', 'b', 'c'], 'B': ['b', 'c', 'd']}
>>> reversed_table = {v: [k for k, vs in table.items() if v in vs] for v in set(v for vs in table.values() for v in vs)}
>>> print(reversed_table)
{'b': ['A', 'B'], 'c': ['A', 'B'], 'd': ['B'], 'a': ['A']}
With dictionaries, you would typically approach it like this:
table = {'A': ['a', 'b', 'c'], 'B': ['b', 'c', 'd']}
revtable = dict()
for group, items in table.items():
    for item in items:
        # Create the list for this item on first sight, then record the group
        revtable.setdefault(item, []).append(group)
print(revtable)
# {'a': ['A'], 'b': ['A', 'B'], 'c': ['A', 'B'], 'd': ['B']}
Assuming that your tables are in the form of a pandas dataframe, you could try something like this:
import pandas as pd
import numpy as np
# Create initial dataframe
data = {'Group': ['A', 'B'], 'Item': [['a','b','c'], ['b','c','d']]}
df = pd.DataFrame(data=data)
Group Item
0 A [a, b, c]
1 B [b, c, d]
# Expand number of rows based on list column ("Item") contents
list_col = 'Item'
df = pd.DataFrame({
col:np.repeat(df[col].values, df[list_col].str.len())
for col in df.columns.drop(list_col)}
).assign(**{list_col:np.concatenate(df[list_col].values)})[df.columns]
Group Item
0 A a
1 A b
2 A c
3 B b
4 B c
5 B d
*Above snippet taken from here, which includes a more detailed explanation of the code
# Perform groupby operation
df = df.groupby('Item')['Group'].apply(list).reset_index(name='Group')
Item Group
0 a [A]
1 b [A, B]
2 c [A, B]
3 d [B]
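If the table holds comma-separated strings, as the question's layout suggests, rather than lists, a small sketch along these lines may help. The function name reverse_groups and the dict-of-strings input are assumptions made for illustration.

```python
def reverse_groups(table):
    """Map each item to a comma-separated string of the groups containing it."""
    reversed_table = {}
    for group, items in table.items():
        # Split "a, b, c" into individual items and drop surrounding spaces
        for item in items.split(','):
            reversed_table.setdefault(item.strip(), []).append(group)
    # Join the collected groups back into the "A, B" style of the question
    return {item: ', '.join(groups) for item, groups in reversed_table.items()}


table = {'A': 'a, b, c', 'B': 'b, c, d'}  # assumed input shape
result = reverse_groups(table)
print(result)
```

This makes a single pass over the data, so it should scale linearly with the total number of items.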

Update dictionary while condition

I am working on code that is given a dictionary of dictionaries which looks like this:
D = {1: {2: 'a', 3: 'b'}, 10: {11: 'a', 12: 'b'}}
where 1 and 10 are keys, while the inner dictionaries {2: 'a', 3: 'b'} and {11: 'a', 12: 'b'} are the results of adding 1 or 2 to the key: 1 + 1 = 2 and 1 + 2 = 3. 'a' and 'b' indicate an addition of 1 or 2, respectively.
From D I'd like to keep applying those additions to its new products, which I can get by doing this:
products = list(set([l for x, y in D.items() for l, m in y.items()]))
products = [2,3,11,12]
I build the list from a set just to avoid applying additions to products that are already in D.
So applying the additions to every item in products and adding the results to D ends up in something like this:
D = {1: {2: 'a', 3: 'b'}, 10: {11: 'a', 12: 'b'}, 2: {3: 'a', 4: 'b'}, 3: {4: 'a', 5: 'b'}, 11: {12: 'a', 13: 'b'}, 12: {13: 'a', 14: 'b'}}
Note the new keys and their new inner dictionaries (products).
The thing is that I'd like to keep doing this with the new products in a while loop until a given number is reached.
For instance for the next iteration products will be:
products = [3,4,5,12,13,14]
They should be used to apply additions if they are not in D, so this can be easily done by:
for i in products[:]:  # iterate over a copy, since the list is modified inside the loop
    if i in D:
        products.remove(i)
which will lead us to:
products = [4,5,13,14] # 3 and 12 are already on D
So we should apply addition to these products and add them to D
So I guess that to achieve this there must be something like:
D = {1: {2: 'a', 3: 'b'}, 10: {11: 'a', 12: 'b'}}
i = 0
while i < 4:  # just an example of 4 iterations
    products = list(set([l for x, y in D.items() for l, m in y.items()]))
    for j in products[:]:  # iterate over a copy, since the list is modified inside the loop
        if j in D:
            products.remove(j)
    # apply additions
    # update D or use an auxiliary dict and then append to D
    i += 1
There's just a single variable i, there.
You'll want to use for j on the inner loop,
to avoid disturbing the outer loop.
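One way the whole loop from the question could be filled in is sketched below. The helper name expand and the additions mapping ('a' adds 1, 'b' adds 2) are assumptions taken from the question's description, not a confirmed implementation.

```python
def expand(D, additions, iterations):
    """Repeatedly add entries for products that are not yet keys of D."""
    D = {key: dict(inner) for key, inner in D.items()}  # work on a copy
    for _ in range(iterations):
        # All products seen so far, deduplicated
        products = {p for inner in D.values() for p in inner}
        # Keep only the products that have not been expanded yet
        new_keys = [p for p in products if p not in D]
        if not new_keys:
            break
        for p in new_keys:
            # Apply every addition to the product and record its label
            D[p] = {p + step: label for label, step in additions.items()}
    return D


start = {1: {2: 'a', 3: 'b'}, 10: {11: 'a', 12: 'b'}}
additions = {'a': 1, 'b': 2}  # assumed meaning of the labels
after_one = expand(start, additions, 1)
after_two = expand(start, additions, 2)
print(sorted(after_one))
print(sorted(after_two))
```

Filtering with a list comprehension avoids removing items from a list while iterating over it, which can silently skip elements.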

Permutation mapping of two lists in python

How can I create a permutation mapping of two lists in python?
For example I have two lists [1,2,3] and ['A','B','C']
Then my code should generate a list of 6 dictionaries
[ {1:'A',2:'B',3:'C'},
{1:'A',2:'C',3:'B'},
{1:'B',2:'A',3:'C'},
{1:'B',2:'C',3:'A'},
{1:'C',2:'A',3:'B'},
{1:'C',2:'B',3:'A'} ]
Using zip and itertools.permutations in a list comprehension:
>>> from itertools import permutations
>>> L1 = [1,2,3]
>>> L2 = ['A','B','C']
>>> [dict(zip(L1, p)) for p in permutations(L2)]
[{1: 'A', 2: 'B', 3: 'C'},
{1: 'A', 2: 'C', 3: 'B'},
{1: 'B', 2: 'A', 3: 'C'},
{1: 'B', 2: 'C', 3: 'A'},
{1: 'C', 2: 'A', 3: 'B'},
{1: 'C', 2: 'B', 3: 'A'}]
You seem to permute only the values of the dicts, so you could do something like
from itertools import permutations
dicts = []
keys = [1, 2, 3]
for values in permutations(['A', 'B', 'C']):
new_dict = dict(zip(keys, values))
dicts.append(new_dict)

Merging dictionaries using a counter

I have the following dictionaries (example):
>>> x = {'a': 'foo', 'b': 'foobar'}
>>> y = {'c': 'barfoo', 'd': 'bar'}
I want to take the keys of each and make them the values of another dict, say z, such that the keys of z are an incrementing counter running up to the combined length of both dicts.
>>> z = {1: 'a', 2: 'b', 3: 'c', 4: 'd'}
As you can see, the keys of z are an incrementing counter and the values are the keys of x and y.
How do I achieve this? I have tried various solutions and playing with zip, but none seem to work. Probably because I have to update the z dict in succession.
Any suggestions?
In [1]: import itertools
In [2]: x = {'a': 'foo', 'b': 'foobar'}
In [3]: y = {'c': 'barfoo', 'd': 'bar'}
In [4]: z = [key for key in itertools.chain(x, y)]
In [5]: z
Out[5]: ['a', 'b', 'c', 'd']
In [6]: dict(enumerate(z))
Out[6]: {0: 'a', 1: 'b', 2: 'c', 3: 'd'}
In [7]: dict(enumerate(z, 1))
Out[7]: {1: 'a', 2: 'b', 3: 'c', 4: 'd'}
If you want duplicate keys to occur only once, replace [4] with this:
z = set(key for key in itertools.chain(x, y))
Note that you also could do everything at once (for this example I've added 'a': 'meow' to y):
In [15]: dict(enumerate(set(key for key in itertools.chain(x, y)), 1))
Out[15]: {1: 'a', 2: 'c', 3: 'b', 4: 'd'}
In [16]: dict(enumerate((key for key in itertools.chain(x, y)), 1))
Out[16]: {1: 'a', 2: 'b', 3: 'a', 4: 'c', 5: 'd'}
import itertools as it
{i+1:k for i,k in enumerate(it.chain(x,y))}
# {1: 'a', 2: 'b', 3: 'c', 4: 'd'}
Note that dict- (and related set-) comprehensions are new in v2.7+.
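If duplicate keys should appear only once and the original order matters, a set is not ideal because it does not preserve order. A minimal sketch, assuming Python 3.7+ where dicts keep insertion order, uses dict.fromkeys for the deduplication (the extra 'a': 'meow' entry in y mirrors the example above):

```python
from itertools import chain

x = {'a': 'foo', 'b': 'foobar'}
y = {'a': 'meow', 'c': 'barfoo', 'd': 'bar'}

# dict.fromkeys keeps the first occurrence of each key, in insertion order
unique_keys = dict.fromkeys(chain(x, y))

# Number the deduplicated keys starting from 1
z = dict(enumerate(unique_keys, 1))
print(z)
```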
