Dataframe to Dictionary including List of dictionaries - python

I am trying to convert the dataframe below to a dictionary.
I want to group by the column containing A (n1) and collect the repeating sequences into a list. For example:
Example 1:
n1 v1 v2
2 A C 3
3 A D 4
4 A C 5
5 A D 6
Expected output:
{'A': [{'C':'3','D':'4'},{'C':'5','D':'6'}]}
Example 2:
n1 n2 v1 v2
s1 A C 3
s1 A D 4
s1 A C 5
s1 A D 6
s1 B P 6
s1 B Q 3
Expected Output:
{'s1': {'A': [{'C': 3, 'D': 4}, {'C': 5, 'D': 6}], 'B': {'P': 6, 'Q': 3}}}
So basically C and D repeat as a sequence. I want to group each C and D pair into one dictionary and make a list of those dictionaries if the pair occurs multiple times.
Please note that I am currently using the code below:
def recur_dictify(frame):
    if len(frame.columns) == 1:
        if frame.values.size == 1: return frame.values[0][0]
        return frame.values.squeeze()
    grouped = frame.groupby(frame.columns[0])
    d = {k: recur_dictify(g.iloc[:,1:]) for k,g in grouped}
    return d
This returns:
{'s1': {'A': {'C': array(['3', '5'], dtype=object), 'D': array(['4', '6'], dtype=object)}, 'B': {'E': '5', 'F': '6'}}}
Also, there can be another series, s2, in which E, F, G, E, F, G repeat, plus some keys such as X and Y that have single values.

Let's create a function dictify which creates a dictionary with top-level keys from the n1 column and groups the repeating occurrences of values in column v1 into separate sub-dictionaries:
from collections import defaultdict
def dictify(df):
    dct = defaultdict(list)
    for k, g in df.groupby(['n1', df.groupby(['n1', 'v1']).cumcount()]):
        dct[k[0]].append(dict([*g[['v1', 'v2']].values]))
    return dict(dct)
dictify(df)
{'A': [{'C': 3, 'D': 4}, {'C': 5, 'D': 6}]}
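As a minimal sketch of why the cumcount trick works, assuming Example 1's data with integer v2 values, the snippet below prints the occurrence numbers and then builds each sub-dictionary with zip over the two columns (a swap-in for unpacking .values, not the answer's exact code):

```python
import pandas as pd
from collections import defaultdict

# Example 1 from the question (v2 stored as ints for this sketch)
df = pd.DataFrame({'n1': ['A', 'A', 'A', 'A'],
                   'v1': ['C', 'D', 'C', 'D'],
                   'v2': [3, 4, 5, 6]})

# cumcount numbers the repeats of each (n1, v1) pair: first C/D -> 0, second C/D -> 1
occurrence = df.groupby(['n1', 'v1']).cumcount()
print(occurrence.tolist())  # [0, 0, 1, 1]

def dictify(df):
    dct = defaultdict(list)
    occ = df.groupby(['n1', 'v1']).cumcount()
    # group by (n1, occurrence number): each group holds exactly one C/D sequence
    for (name, _), g in df.groupby(['n1', occ]):
        dct[name].append(dict(zip(g['v1'], g['v2'])))
    return dict(dct)

print(dictify(df))
```

Grouping on the occurrence number rather than on row position means the C/D rows do not need to be adjacent, only in the same relative order.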
UPDATE:
If there can be a variable number of primary grouping keys, i.e. [n1, n2, ...], we can use a more generic method:
import numpy as np
def update(dct, keys, val):
    k, *_ = keys
    dct[k] = update(dct.get(k, {}), _, val) if _ \
        else [*np.hstack([dct[k], [val]])] if k in dct else val
    return dct
def dictify(df, keys):
    dct = dict()
    for k, g1 in df.groupby(keys):
        for _, g2 in g1.groupby(g1.groupby('v1').cumcount()):
            update(dct, k, dict([*g2[['v1', 'v2']].values]))
    return dict(dct)
dictify(df, ['n1', 'n2'])
{'s1': {'A': [{'C': 3, 'D': 4}, {'C': 5, 'D': 6}], 'B': {'P': 6, 'Q': 3}}}
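For comparison, here is a sketch of the same nesting without NumPy, assuming Example 2's data. The helper name dictify_nested and its k_col/v_col parameters are hypothetical; it builds the intermediate levels with dict.setdefault and applies the "single occurrence becomes a plain dict" rule explicitly instead of through np.hstack:

```python
import pandas as pd

# Example 2 from the question
df2 = pd.DataFrame({'n1': ['s1'] * 6,
                    'n2': ['A', 'A', 'A', 'A', 'B', 'B'],
                    'v1': ['C', 'D', 'C', 'D', 'P', 'Q'],
                    'v2': [3, 4, 5, 6, 6, 3]})

def dictify_nested(df, keys, k_col='v1', v_col='v2'):
    """Hypothetical helper: nest by `keys`, one {v1: v2} dict per repeat of the sequence."""
    out = {}
    for key, g1 in df.groupby(keys):
        key = key if isinstance(key, tuple) else (key,)   # older pandas may yield a scalar
        occ = g1.groupby(k_col).cumcount()                # 0 for the first run, 1 for the second, ...
        records = [dict(zip(g2[k_col], g2[v_col])) for _, g2 in g1.groupby(occ)]
        node = out
        for part in key[:-1]:                             # walk/create the intermediate levels
            node = node.setdefault(part, {})
        # a sequence that occurs once collapses to a plain dict, matching the expected output
        node[key[-1]] = records[0] if len(records) == 1 else records
    return out

print(dictify_nested(df2, ['n1', 'n2']))
```

Collecting all records for a group first and deciding list-vs-dict at the end avoids having to convert an existing dict into a list when a second occurrence turns up.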

Here is a simple one line statement that solves your problem:
def df_to_dict(df):
    return {name: [dict(x.to_dict('split')['data'])
                   for _, x in d.drop(columns='name').groupby(d.index // 2)]
            for name, d in df.groupby('name')}
Here is an example:
import pandas as pd
df = pd.DataFrame({'name': ['A'] * 4,
                   'v1': ['C', 'D'] * 2,
                   'v2': [3, 4, 5, 6]})
print(df_to_dict(df))
Output:
{'A': [{'C': 3, 'D': 4}, {'C': 5, 'D': 6}]}
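Note that d.index // 2 uses the original index labels, so it only pairs rows correctly when each name's rows are contiguous and start at an even label. A sketch that chunks by position within each group instead, assuming every sequence has the same length (the chunk parameter is hypothetical), could look like this:

```python
import numpy as np
import pandas as pd

def df_to_dict(df, chunk=2):
    # `chunk` is a hypothetical parameter: rows per repeating sequence (2 for C, D)
    out = {}
    for name, d in df.groupby('name'):
        pos = np.arange(len(d)) // chunk      # 0, 0, 1, 1, ... by position within the group
        out[name] = [dict(zip(x['v1'], x['v2'])) for _, x in d.groupby(pos)]
    return out

# interleaved rows: the index labels of group 'A' are 0, 2, 4, 6
df = pd.DataFrame({'name': ['A', 'B'] * 4,
                   'v1': ['C', 'P', 'D', 'Q'] * 2,
                   'v2': [3, 6, 4, 3, 5, 1, 6, 2]})
print(df_to_dict(df))
```

With the label-based d.index // 2, group 'A' here would split into four single-entry dicts instead of two pairs.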

Related

Python: rearrange the grouped dictionary

I am trying to develop some code and half of it is done: I am grouping my dictionary. Now I want to create a function to go back from b_dict to a_dict.
I want to print it like this:
Expected output:
a_dict: {'A': 1, 'B': 2, 'C': 3, 'D': 1, 'E': 2, 'F': 3} # Original dictionary
Grouped dict: {1: ['A', 'D'], 2: ['B', 'E'], 3: ['C', 'F']} # Grouped dictionary
Expected dict: {'A': 1, 'D': 1, 'B': 2, 'E': 2, 'C': 3, 'F': 3} # Expected second output with the go_back function. The current code cannot do this
Code:
a_dict = {'A': 1, 'B': 2, 'C': 3, 'D': 1, 'E': 2, 'F': 3}
print('a_dict: ', a_dict)
def fun_dict(a_dict):
    b_dict = {}
    for i, v in a_dict.items():
        b_dict[v] = [i] if v not in b_dict.keys() else b_dict[v] + [i]
    return b_dict
def go_back(b_dict):
    #
    # Need a function to convert b_dict to c_dict to go back as the expected output
    #
    pass
b_dict = fun_dict(a_dict)
print('Grouped dict: ', b_dict)
c_dict = fun_dict(b_dict)
print('Went to the original dict: ', c_dict)
The go_back you want could be like this:
def go_back(b_dict):
    r = {}
    for k, vv in b_dict.items():
        for v in vv:
            r[v] = k
    return r
Result:
Went to the original dict: {'A': 1, 'D': 1, 'B': 2, 'E': 2, 'C': 3, 'F': 3}
Here is a proposal:
def go_back(b_dict):
    return {e: k for k, v in b_dict.items() for e in v}
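A quick round-trip check of the comprehension version is sketched below; fun_dict is rewritten here with dict.setdefault (a swap-in for the question's conditional expression, same result). The inversion is lossless because every key of a_dict maps to exactly one value:

```python
def fun_dict(a_dict):
    b_dict = {}
    for key, value in a_dict.items():
        b_dict.setdefault(value, []).append(key)   # group keys under their shared value
    return b_dict

def go_back(b_dict):
    # flatten each list back into individual key -> value entries
    return {e: k for k, v in b_dict.items() for e in v}

a_dict = {'A': 1, 'B': 2, 'C': 3, 'D': 1, 'E': 2, 'F': 3}
b_dict = fun_dict(a_dict)
c_dict = go_back(b_dict)
print(b_dict)            # {1: ['A', 'D'], 2: ['B', 'E'], 3: ['C', 'F']}
print(c_dict == a_dict)  # True: dict equality ignores key order
```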

How to combine a list of dictionaries to one dictionary

I have a list of dicts:
d =[{'a': 4}, {'b': 20}, {'c': 5}, {'d': 3}]
I want to remove the curly braces and convert d to a single dict which looks like:
d ={'a': 4, 'b': 20, 'c': 5, 'd': 3}
If you don't mind duplicate keys replacing earlier keys you can use:
from functools import reduce # Python 3 compatibility
d = reduce(lambda a, b: dict(a, **b), d)
This merges the first two dictionaries then merges each following dictionary into the result built so far.
Demo:
>>> d =[{'a': 4}, {'b': 20}, {'c': 5}, {'d': 3}]
>>> reduce(lambda a, b: dict(a, **b), d)
{'a': 4, 'c': 5, 'b': 20, 'd': 3}
Or, if you need this to work for arbitrary (non-string) keys (and you are using Python 3.5 or greater):
>>> d =[{4: 4}, {20: 20}, {5: 5}, {3: 3}]
>>> reduce(lambda a, b: dict(a, **b), d) # This won't work
TypeError: keywords must be strings
>>> reduce(lambda a, b: {**a, **b}, d) # Use this instead
{4: 4, 20: 20, 5: 5, 3: 3}
The first solution hacks the behaviour of keyword arguments to the dict function. The second solution is using the more general ** operator introduced in Python 3.5.
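If the extra reduce step feels heavy, collections.ChainMap is another standard-library option. This is a sketch with a duplicate key 4 added for illustration: ChainMap returns the value from the first mapping that contains a key, so the list is reversed to get the same "later dicts win" result as {**a, **b}:

```python
from collections import ChainMap
from functools import reduce

ds = [{4: 4}, {20: 20}, {5: 5}, {3: 3}, {4: 'last'}]   # key 4 repeated on purpose

# reduce with unpacking: each later dict overrides earlier keys
merged_reduce = reduce(lambda a, b: {**a, **b}, ds)

# ChainMap looks keys up in its maps from first to last, so reverse the list
merged_chain = dict(ChainMap(*reversed(ds)))

print(merged_reduce)                  # {4: 'last', 20: 20, 5: 5, 3: 3}
print(merged_reduce == merged_chain)  # True
```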
You just need to iterate over d and merge each element into a new dict (e.g. newD) with update():
d =[{'a': 4}, {'b': 20}, {'c': 5}, {'d': 3}]
newD = {}
for entry in d:
newD.update(entry)
>>> newD
{'c': 5, 'b': 20, 'a': 4, 'd': 3}
Note: if there are duplicate keys in d, the last value will appear in newD.
To overwrite the values of existing keys, a simple brute-force solution is:
nd = {}
for el in d:
    for k, v in el.items():
        nd[k] = v
or, written as a dictionary comprehension:
d = {k:v for el in d for k,v in el.items()}
a = [{'a': 4}, {'b': 20}, {'c': 5}, {'d': 3}]
b = {}
for c in a:
    b.update(c)
# b is now {'a': 4, 'b': 20, 'c': 5, 'd': 3}
If order is important on Python versions before 3.7 (from 3.7 on, a plain dict keeps insertion order):
from collections import OrderedDict
a = [{'a': 4}, {'b': 20}, {'c': 5}, {'d': 3}]
newD = OrderedDict()
for c in a:
    newD.update(c)
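On Python 3.7+ a plain dict is enough. A minimal sketch, with a duplicate 'a' added purely for illustration, shows both the ordering and the last-value-wins behaviour of update():

```python
a = [{'a': 4}, {'b': 20}, {'c': 5}, {'d': 3}, {'a': 99}]   # 'a' repeated on purpose
merged = {}
for c in a:
    merged.update(c)
print(list(merged))   # ['a', 'b', 'c', 'd']: an overwritten key keeps its first position
print(merged['a'])    # 99: the value comes from the last dict that contained 'a'
```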

How to access indexes of a list of dictionaries?

Suppose I am given a list of dictionaries, where
dict1 = dict(a = 2, b = 5, c = 7)
dict2 = dict(c = 5, d = 5, e = 1)
dict3 = dict(e = 2, f = 4, g = 10)
list_of_dictionaries = [dict1, dict2, dict3]
How would I find the value at the highest index (i.e. in the latest dictionary)?
So if I were to write a method to delete an item from the list of dictionaries, let's say I want to delete c.
How would I be able to delete the c from the second dictionary instead of the first?
The key is iterating over the list in reverse with slicing (a_list[::-1]).
From there, once you find a dictionary that matches the requirement, alter it and leave the function or loop; hence the early returns.
This code:
def get_last(bucket, key):
    # walk the list from the last dict to the first
    for d in bucket[::-1]:
        if key in d:
            return d[key]
    return None
def set_last(bucket, key, val):
    for d in bucket[::-1]:
        if key in d:
            d[key] = val
            return
def pop_last(bucket, key):
    for d in bucket[::-1]:
        if key in d:
            return d.pop(key)
    return None
dict1 = {'a': 2, 'b': 5, 'c': 7}
dict2 = {'c': 5, 'd': 5, 'e': 1}
dict3 = {'e': 2, 'f': 4, 'g': 10}
list_of_dictionaries = [dict1, dict2, dict3]
print(get_last(list_of_dictionaries, 'c'))
set_last(list_of_dictionaries, 'c', 7)
print(list_of_dictionaries)
popped = pop_last(list_of_dictionaries, 'c')
print(popped)
print(list_of_dictionaries)
Gives (Python 3.7+):
5
[{'a': 2, 'b': 5, 'c': 7}, {'c': 7, 'd': 5, 'e': 1}, {'e': 2, 'f': 4, 'g': 10}]
7
[{'a': 2, 'b': 5, 'c': 7}, {'d': 5, 'e': 1}, {'e': 2, 'f': 4, 'g': 10}]
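A compact variant is sketched below; find_last is a hypothetical helper name. It uses reversed(), which iterates backwards without building the copy that bucket[::-1] creates, and returns the matching dict itself so the caller can read, set, or delete as needed:

```python
def find_last(bucket, key):
    """Hypothetical helper: the last dict in `bucket` that contains `key`, or None."""
    return next((d for d in reversed(bucket) if key in d), None)

dict1 = {'a': 2, 'b': 5, 'c': 7}
dict2 = {'c': 5, 'd': 5, 'e': 1}
dict3 = {'e': 2, 'f': 4, 'g': 10}
list_of_dictionaries = [dict1, dict2, dict3]

target = find_last(list_of_dictionaries, 'c')   # this is dict2
print(target['c'])                               # 5
del target['c']                                  # dict1 keeps its own 'c'
print(list_of_dictionaries)
```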
I am not exactly sure what you mean, but I want to show you a couple of things that might help:
First, here is how your dictionaries look when written as dict literals:
dict1 = {"a" :2, "b" : 5, "c" :7}
dict2 = {"c" :5, "d" :5, "e" :1}
dict3 = {"e" :2, "f" :4, "g" :10}
Then you asked this: "How would I be able to delete the c from the second dictionary instead of the first?"
You can delete it this way:
del dict2["c"]
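If the key might not be present, dict.pop with a default is a gentler sketch than del, which raises KeyError for a missing key:

```python
dict2 = {"c": 5, "d": 5, "e": 1}
removed = dict2.pop("c", None)   # 5: pop returns the value it removed
again = dict2.pop("c", None)     # None: with a default, a missing key does not raise
print(removed, again, dict2)     # 5 None {'d': 5, 'e': 1}
```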

Updating a dictionary in python

I've been stuck on this question for quite some time and just can't figure it out. I want to understand what I'm missing and why it's needed.
What I need to do is make a function which adds each given key/value pair to the dictionary. The argument key_value_pairs will be a list of tuples in the form (key, value).
def add_to_dict(d, key_value_pairs):
    newinputs = [] # creates new list
    for key, value in key_value_pairs:
        d[key] = value # updates element of key with value
        if key in key_value_pairs:
            newinputs.append((d[key], value)) # adds d[key] and value to list
    return newinputs
I can't figure out how to update the "value" variable when d and key_value_pairs have different keys.
The first three of these scenarios work but the rest fail
>>> d = {}
>>> add_to_dict(d, [])
[]
>>> d
{}
>>> d = {}
>>> add_to_dict(d, [('a', 2)])
[]
>>> d
{'a': 2}
>>> d = {'b': 4}
>>> add_to_dict(d, [('a', 2)])
[]
>>> d
{'a':2, 'b':4}
>>> d = {'a': 0}
>>> add_to_dict(d, [('a', 2)])
[('a', 0)]
>>> d
{'a':2}
>>> d = {'a': 0, 'b': 1}
>>> add_to_dict(d, [('a', 2), ('b', 4)])
[('a', 2), ('b', 1)]
>>> d
{'a': 2, 'b': 4}
>>> d = {'a': 0}
>>> add_to_dict(d, [('a', 1), ('a', 2)])
[('a', 0), ('a', 1)]
>>> d
{'a': 2}
Thanks
Python has this feature built-in:
>>> d = {'b': 4}
>>> d.update({'a': 2})
>>> d
{'a': 2, 'b': 4}
Or, if you're not allowed to use dict.update:
>>> d = dict(d.items() + {'a': 2}.items()) # doesn't work in Python 3
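In Python 3, items() returns a view that does not support +, so a sketch of two working equivalents (both build a new dict and leave d untouched) is:

```python
d = {'b': 4}
d1 = dict(list(d.items()) + list({'a': 2}.items()))  # turn the views into lists first
d2 = {**d, 'a': 2}                                    # dict unpacking, Python 3.5+
print(d1, d2)  # {'b': 4, 'a': 2} {'b': 4, 'a': 2}
```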
With Python 3.9+ you can use the |= update operator:
>>> d = {'b': 4}
>>> d |= {'a': 2}
>>> d
{'a': 2, 'b': 4}
Here's a more elegant solution than Eric's second snippet:
>>> a = {'a' : 1, 'b' : 2}
>>> b = {'a' : 2, 'c' : 3}
>>> c = dict(a, **b)
>>> a
{'a': 1, 'b': 2}
>>> b
{'a': 2, 'c': 3}
>>> c
{'a': 2, 'b': 2, 'c': 3}
It works in both Python 2 and 3, although in Python 3 every key of b must be a string.
And of course, the update method:
>>> a
{'a': 1, 'b': 2}
>>> b
{'a': 2, 'c': 3}
>>> a.update(b)
>>> a
{'a': 2, 'b': 2, 'c': 3}
However, be careful with the latter: update() mutates the dict in place, so it can cause issues when another name refers to the same object, as here:
>>> a = {'a' : 1, 'b' : 2}
>>> b = {'a' : 2, 'c' : 3}
>>> c = a
>>> c.update(b)
>>> a
{'a': 2, 'b': 2, 'c': 3}
>>> b
{'a': 2, 'c': 3}
>>> c
{'a': 2, 'b': 2, 'c': 3}
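One way to avoid that aliasing, sketched below, is to copy before updating; copy() (like dict(a)) is shallow, so nested mutable values would still be shared:

```python
a = {'a': 1, 'b': 2}
b = {'a': 2, 'c': 3}
c = a.copy()      # a new dict object, so updating c no longer touches a
c.update(b)
print(a)  # {'a': 1, 'b': 2}
print(c)  # {'a': 2, 'b': 2, 'c': 3}
```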
Python 3.9 introduces two new operators for dictionaries: union (|) and in-place union (|=). You can use | to merge two dictionaries, while |= updates a dictionary in place. Let's consider two dictionaries, d1 and d2:
d1 = {"name": "Arun", "height": 170}
d2 = {"age": 21, "height": 170}
d3 = d1 | d2 # d3 is the union of d1 and d2
print(d3)
Output:
{'name': 'Arun', 'height': 170, 'age': 21}
Update d1 with d2
d1 |= d2
print(d1)
Output:
{'name': 'Arun', 'height': 170, 'age': 21}
You can update d1 with a new key, weight, as follows:
d1 |= {"weight": 80}
print(d1)
Output:
{'name': 'Arun', 'height': 170, 'age': 21, 'weight': 80}
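The example above uses the same height in both dicts, so it does not show what happens on a conflict. A sketch with a different height (175, a made-up value) shows that the right-hand operand wins and that | leaves d1 unchanged (Python 3.9+):

```python
d1 = {"name": "Arun", "height": 170}
d2 = {"age": 21, "height": 175}
d3 = d1 | d2   # new dict; for a shared key the right-hand value wins
print(d3)      # {'name': 'Arun', 'height': 175, 'age': 21}
print(d1)      # {'name': 'Arun', 'height': 170}: | does not modify its operands
```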
So if I understand you correctly, you want to return a list of tuples with (key, old_value) for the keys that were replaced.
You have to save the old value before you replace it:
def add_to_dict(d, key_value_pairs):
    newinputs = [] # creates new list
    for key, value in key_value_pairs:
        if key in d:
            newinputs.append((key, d[key])) # save the old value first
        d[key] = value # updates element of key with value
    return newinputs
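Running this version against the question's remaining scenarios is sketched below. Under the (key, old_value) reading, the second case returns [('a', 0), ('b', 1)] rather than the [('a', 2), ('b', 1)] listed in the question, which looks like a typo there:

```python
def add_to_dict(d, key_value_pairs):
    newinputs = []                       # (key, old_value) for each key that gets overwritten
    for key, value in key_value_pairs:
        if key in d:
            newinputs.append((key, d[key]))
        d[key] = value
    return newinputs

d = {'a': 0}
print(add_to_dict(d, [('a', 2)]), d)              # [('a', 0)] {'a': 2}
d = {'a': 0, 'b': 1}
print(add_to_dict(d, [('a', 2), ('b', 4)]), d)    # [('a', 0), ('b', 1)] {'a': 2, 'b': 4}
d = {'a': 0}
print(add_to_dict(d, [('a', 1), ('a', 2)]), d)    # [('a', 0), ('a', 1)] {'a': 2}
```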
Each key in a Python dict corresponds to exactly one value, so a key from key_value_pairs that is not yet in d simply becomes a new entry rather than replacing anything.
Is newinputs supposed to contain the key/value pairs that were previously not present in d? If so:
def add_to_dict(d, key_value_pairs):
    newinputs = []
    for key, value in key_value_pairs:
        if key not in d:
            newinputs.append((key, value))
        d[key] = value
    return newinputs
Is newinputs supposed to contain the key/value pairs where the key was present in d and then changed? If so:
def add_to_dict(d, key_value_pairs):
    newinputs = []
    for key, value in key_value_pairs:
        if key in d:
            newinputs.append((key, value))
        d[key] = value
    return newinputs
If I understand you correctly, you only want to add the keys that do not already exist in the dictionary. Here is the code:
def add_to_dict(d, key_value_pairs):
    newinputs = []
    for key, value in key_value_pairs:
        if key not in d:
            d[key] = value
            newinputs.append((key, value))
    return newinputs
For each key in the new list of (key, value) pairs, check whether the key is new to the dictionary and add it only in that case.
Hope it helps ;)

Python - Find non mutual items in two dicts

Let's say I have two dictionaries:
a = {'a': 1, 'b': 2, 'c': 3}
b = {'b': 2, 'c': 3, 'd': 4, 'e': 5}
What's the most pythonic way to find the non mutual items between the two of them such that for a and b I would get:
{'a': 1, 'd': 4, 'e': 5}
I had thought:
{key: b[key] for key in b if not a.get(key)}
but that only goes one way (b items not in a) and
a_only = {key: a[key] for key in a if not b.get(key)}.items()
b_only = {key: b[key] for key in b if not a.get(key)}.items()
dict(a_only + b_only)
seems very messy. Any other solutions?
>>> dict(set(a.items()) ^ set(b.items()))
{'a': 1, 'e': 5, 'd': 4}
Try the symmetric difference of the key sets:
out = {}
for key in set(a.keys()) ^ set(b.keys()):
    out[key] = a.get(key, b.get(key))
diff = {key: a[key] for key in a if key not in b}
diff.update((key,b[key]) for key in b if key not in a)
This is just a slightly cheaper version of what you have.
>>> a = {'a': 1, 'b': 2, 'c': 3}
>>> b = {'b': 2, 'c': 3, 'd': 4, 'e': 5}
>>> keys = set(a.keys()).symmetric_difference(set(b.keys()))
>>> result = {}
>>> for k in keys: result[k] = a.get(k, b.get(k))
...
>>> result
{'a': 1, 'e': 5, 'd': 4}
Whether this is less messy than your version is debatable, but at least it doesn't re-implement symmetric_difference.
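In Python 3, dict.keys() views support set operators directly, so the set() calls can be dropped. A minimal sketch:

```python
a = {'a': 1, 'b': 2, 'c': 3}
b = {'b': 2, 'c': 3, 'd': 4, 'e': 5}
only_one = a.keys() ^ b.keys()                        # {'a', 'd', 'e'}; set order is arbitrary
result = {k: a[k] if k in a else b[k] for k in only_one}
print(result == {'a': 1, 'd': 4, 'e': 5})             # True
```

Comparing keys rather than (key, value) items matters when a shared key has different values in the two dicts: an item-based symmetric difference would keep both pairs, and dict() would then keep whichever one it happens to see last.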
