Make a dict/JSON from a string with duplicate keys in Python

I have a string that could be parsed as JSON or as a dict. My string variable looks like this:
my_string_variable = """{
"a":1,
"b":{
"b1":1,
"b2":2
},
"b": {
"b1":3,
"b2":2,
"b4":8
}
}"""
When I do json.loads(my_string_variable), I have a dict but only the second value of the key "b" is kept, which is normal because a dict can't contain duplicate keys.
What would be the best way to have some sort of defaultdict like this :
result = {
"a": 1,
"b": [{"b1": 1, "b2": 2}, {"b1": 3, "b2": 2, "b4": 8}],
}
I have already looked for similar questions but they all deal with dicts or lists as an input and then create defaultdicts to handle the duplicate keys.
In my case I have a string variable and I would want to know if there is a simple way to achieve this.

Something like the following can be done:
import json

def join_duplicate_keys(ordered_pairs):
    d = {}
    for k, v in ordered_pairs:
        if k in d:
            if type(d[k]) == list:
                # key already merged into a list: add the new value
                d[k].append(v)
            else:
                # second occurrence: wrap the old and new values in a list
                newlist = []
                newlist.append(d[k])
                newlist.append(v)
                d[k] = newlist
        else:
            d[k] = v
    return d

raw_post_data = '{"a":1, "b":{"b1":1,"b2":2}, "b": { "b1":3, "b2":2,"b4":8} }'
newdict = json.loads(raw_post_data, object_pairs_hook=join_duplicate_keys)
print(newdict)
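Please note that the code above relies on the value type check, if type(d[k]) == list, to recognise keys it has already merged. If the original string itself has a list as the value of a duplicated key, that list is mistaken for an already merged one and later values are appended to it, so some extra handling is required to make the code robust.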
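One possible way to avoid that ambiguity, as a minimal sketch rather than a definitive fix: remember which keys have already been merged in a separate set instead of inspecting the value type. The `duplicated` set and the sample string `raw` are illustrative assumptions, not part of the original answer.

```python
import json

def join_duplicate_keys(ordered_pairs):
    d = {}
    duplicated = set()  # assumption: tracks keys already turned into a list of values
    for k, v in ordered_pairs:
        if k in duplicated:
            # third or later occurrence: extend the existing list of values
            d[k].append(v)
        elif k in d:
            # second occurrence: wrap the old and new values, even if one is a list
            d[k] = [d[k], v]
            duplicated.add(k)
        else:
            d[k] = v
    return d

raw = '{"a": [1, 2], "a": 3, "b": [4]}'
result = json.loads(raw, object_pairs_hook=join_duplicate_keys)
print(result)  # {'a': [[1, 2], 3], 'b': [4]}
```

Here the list `[1, 2]` stays intact as the first value of "a", and the non-duplicated list under "b" is left alone.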

The accepted answer is perfectly fine. I just wanted to show another approach.
First, you store every value in a list so that later values for the same key are easy to accumulate. At the end, you call pop on the lists that hold only one item, because a single item means that key had no duplicates:
import json
from collections import defaultdict

my_string_variable = '{"a":1, "b":{"b1":1,"b2":2}, "b": { "b1":3, "b2":2,"b4":8} }'

def join_duplicate_keys(ordered_pairs):
    d = defaultdict(list)
    for k, v in ordered_pairs:
        d[k].append(v)
    return {k: v.pop() if len(v) == 1 else v for k, v in d.items()}

d = json.loads(my_string_variable, object_pairs_hook=join_duplicate_keys)
print(d)
Output:
{'a': 1, 'b': [{'b1': 1, 'b2': 2}, {'b1': 3, 'b2': 2, 'b4': 8}]}
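It may help to know that json.loads calls object_pairs_hook for every JSON object it decodes, not only the outermost one, so duplicates inside nested objects are merged as well. A small sketch, where the `nested` sample string is made up for illustration:

```python
import json
from collections import defaultdict

def join_duplicate_keys(ordered_pairs):
    d = defaultdict(list)
    for k, v in ordered_pairs:
        d[k].append(v)
    # unwrap keys that occurred only once
    return {k: v[0] if len(v) == 1 else v for k, v in d.items()}

# hypothetical input: "k" is duplicated inside the first inner object,
# and "outer" is duplicated at the top level
nested = '{"outer": {"k": 1, "k": 2}, "outer": {"k": 3}}'
result = json.loads(nested, object_pairs_hook=join_duplicate_keys)
print(result)  # {'outer': [{'k': [1, 2]}, {'k': 3}]}
```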

Related

Recursive convert values to string using dictionary comprehension

Using dictionary comprehension is it possible to convert all values recursively to string?
I have this dictionary
d = {
"root": {
"a": "1",
"b": 2,
"c": 3,
"d": 4
}
}
I tried
{k: str(v) for k, v in d.items()}
But the code above turns the entire root value into string and I want this:
d = {"root": {"a": "1", "b": "2", "c": "3", "d": "4"}}
This is not a dictionary comprehension, but it works, it's just one line, and it's recursive!
(f := lambda d: {k: f(v) for k, v in d.items()} if type(d) == dict else str(d))(d)
It only works with Python 3.8+ though (because of the use of an assignment expression).
You could do a recursive solution for arbitrarily nested dicts, but if you only have 2 levels the following is sufficient:
{k: {k2: str(v2) for k2, v2 in v.items()} for k, v in d.items()}
Since root's value is a dictionary, your code almost works. You just need a second comprehension that iterates over the items of each inner dictionary:
newDict = {k: {k2: str(v2) for k2, v2 in d[k].items()} for k, v in d.items()}
Output:
{'root': {'a': '1', 'b': '2', 'c': '3', 'd': '4'}}
The following solution does not use a dictionary comprehension, but it is recursive and can transform dictionaries of any depth, which I don't think is possible using a comprehension alone:
def convert_to_string(d):
    # Converts the values in place; the function itself returns None.
    for key, value in d.items():
        if isinstance(value, dict):
            convert_to_string(value)
        else:
            d[key] = str(value)
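If modifying the input is not wanted, a recursive function can also build a new dictionary by calling itself from inside a comprehension. This is only a sketch; the name `to_str_recursive` is made up for illustration.

```python
def to_str_recursive(value):
    # Recurse into nested dicts; convert every other value with str().
    if isinstance(value, dict):
        return {k: to_str_recursive(v) for k, v in value.items()}
    return str(value)

d = {"root": {"a": "1", "b": 2, "c": 3, "d": 4}}
converted = to_str_recursive(d)
print(converted)  # {'root': {'a': '1', 'b': '2', 'c': '3', 'd': '4'}}
print(d["root"]["b"])  # 2 (the original dictionary is left unchanged)
```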
I found a simpler way to achieve this using the json module:
import json
string_json = json.dumps(d)  # Convert to a JSON string
d = json.loads(string_json, parse_int=str)  # This converts every int to str recursively.
Or, as a function that also converts floats:
def dictionary_string(dictionary: dict) -> dict:
    return json.loads(json.dumps(dictionary), parse_int=str, parse_float=str)
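One caveat with the JSON round trip, shown as a small sketch: parse_int and parse_float only affect numbers, so booleans and None come back unchanged, and the data must be JSON-serializable in the first place. The `sample` dictionary is a made-up example.

```python
import json

# hypothetical data mixing numbers with non-numeric JSON values
sample = {"a": 1, "b": 2.5, "c": True, "d": None, "e": {"f": 3}}
converted = json.loads(json.dumps(sample), parse_int=str, parse_float=str)
print(converted)
# {'a': '1', 'b': '2.5', 'c': True, 'd': None, 'e': {'f': '3'}}
```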

How do I check to see if values in a dict are the exact same?

I currently have a dictionary d whose keys are strings and whose values are other dicts.
Across the values of d, how can I check which key/value pairs are the same in ALL of them?
Example dictionary:
zybook, zybooks and zybookz are keys. There can be more than three keys, but I only show three here. The values of d are other dicts of the form {file name: number}.
d = {"zybook":
{
"noodle.json": 5,
"testing.json": 1,
"none.json": 5
},
"zybooks":
{
"noodle.json": 5,
"ok.json": 1
},
"zybookz":
{
"noodle.json": 5
}
}
Expected Output:
Because {"noodle.json": 5} is the same in zybook, zybooks, and zybookz, the output should be another dictionary containing the pairs that match in all 3.
{"noodle.json": 5}
My attempt:
I honestly don't know how to approach this.
d = {"zybook": { "noodle.json": 5, "testing.json": 1, "none.json": 5},
"zybooks": {"noodle.json": 5, "ok.json": 1},
"zybookz": {"noodle.json": 5}
}
for key, value in d.items():
    for k, v in value.items():
        if
from functools import reduce
sets = (set(val.items()) for val in d.values())
desired = dict(reduce(set.intersection, sets))
print(desired)
# {'noodle.json': 5}
We first form a set out of the (file_name, num) pairs of each dictionary. Then reduce walks through the sets cumulatively, intersecting each one with the running result, which leaves only the pairs present in every dictionary. Lastly, we convert the result to a dict as needed.
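The set-based approach needs the numbers (the inner values) to be hashable. As a hedged alternative that only relies on equality, every pair of the first inner dict can be checked against the remaining ones. This sketch assumes d has at least one entry; the names `first`, `rest` and `common` are illustrative.

```python
d = {
    "zybook": {"noodle.json": 5, "testing.json": 1, "none.json": 5},
    "zybooks": {"noodle.json": 5, "ok.json": 1},
    "zybookz": {"noodle.json": 5},
}

# assumption: d is not empty, so there is a first inner dict to compare against
first, *rest = d.values()
common = {
    k: v
    for k, v in first.items()
    # keep the pair only if every other inner dict has the same key and value
    if all(k in other and other[k] == v for other in rest)
}
print(common)  # {'noodle.json': 5}
```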
Try this:
from collections import Counter
res = {z[0]: z[1] for z, count in Counter([(k, v) for x in d for k, v in d[x].items()]).items() if count == len(d)}
Using only built-in Python methods:
new = []
for v in d.values():
    new += list(v.items())
# [('noodle.json', 5), ('testing.json', 1), ('none.json', 5), ('noodle.json', 5), ('ok.json', 1), ('noodle.json', 5)]
cnt_dict = {v: new.count(v) for v in new}
# {('noodle.json', 5): 3, ('testing.json', 1): 1, ('none.json', 5): 1, ('ok.json', 1): 1}
# keep only the pairs that occur in every inner dict
d2 = {k[0]: k[1] for k, v in cnt_dict.items() if v == len(d)}
print(d2)
# {'noodle.json': 5}

Pythonic way to get keys and values from nested dictionaries

Just wondering, before I start to work on a function: I always like to hear some Pythonic solutions.
I am trying to get keys and values from nested dictionaries.
For example:
a = {'one': {'animal': 'chicken'},
'two': {'fish': {'sea':'shark'}}}
Is there any Pythonic way to get values from a nested dictionary? Like getting straight to the value of 'fish'?
Thanks in advance
If you want to find all the items with the "fish" key in the nested dictionary, you can adapt @Imran's answer to "Flatten nested Python dictionaries, compressing keys":
import collections.abc

def get_by_key_in_nested_dict(d, key, parent_key='', sep='_'):
    items = []
    for k, v in d.items():
        new_key = parent_key + sep + k if parent_key else k
        if key == k:
            items.append((new_key, v))
        # collections.MutableMapping was removed in Python 3.10; use collections.abc
        if isinstance(v, collections.abc.MutableMapping):
            items.extend(get_by_key_in_nested_dict(v, key, new_key, sep).items())
    return dict(items)
With:
test = {
'one': {
'animal': 'chicken'
},
'two': {
'fish': {
'sea':'shark',
'fish':0
}
},
'fish':[1,2,3]
}
get_by_key_in_nested_dict(test,"fish")
You get all the items that have the key "fish"
{
'fish': [1, 2, 3],
'two_fish': {'fish': 0, 'sea': 'shark'},
'two_fish_fish': 0
}
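If the path to the value is already known, a shorter option is to walk the keys one by one. The following is a minimal sketch using functools.reduce with operator.getitem; the helper name `get_by_path` is made up for illustration, and a missing key raises KeyError.

```python
from functools import reduce
import operator

a = {'one': {'animal': 'chicken'},
     'two': {'fish': {'sea': 'shark'}}}

def get_by_path(mapping, path):
    # Apply mapping[key] for each key in path, starting from the outer dict.
    return reduce(operator.getitem, path, mapping)

print(get_by_path(a, ['two', 'fish']))         # {'sea': 'shark'}
print(get_by_path(a, ['two', 'fish', 'sea']))  # shark
```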

Adding multiple values to an existing dictionary as SETS

I have a dictionary where I have the data already inside, i.e. keys have values and some of them have more than one value.
For example:
i = {"a": "111", "b": "222", "c": ["333", "444"]}
How can I change the type of the multiple values? I want them to be sets, not lists, such as:
i = {"a": {"111"}, "b": {"222"}, "c": {"333", "444"}}
One similar post is this one:
How to add multiple values to a dictionary key in python? [closed]
There it is explained how to add multiple elements to a dictionary, but they always seem to be lists.
How to change the type of the multiple values?
OR how to add them to the dictionary as sets, not lists?
Using a dict-comprehension makes converting an existing dict very easy:
i = {"a": "111", "b": "222", 'c': ["333", "444"]}
{k: set(v) if isinstance(v, list) else v for k, v in i.items()}
This converts all values that are lists to sets and leaves the other values unchanged.
In a single line of code (note that set() applied to a string splits it into its characters, so "111" becomes {'1'}):
>>> i = {"a": "111", "b": "222", "c": ["333", "444"]}
>>> {k: set(v) for k, v in i.items()}
{'b': {'2'}, 'a': {'1'}, 'c': {'444', '333'}}
Or with a few more steps:
>>> i = {"a": "111", "b": "222", "c": ["333", "444"]}
>>> for k, v in i.items():
...     i[k] = set(v)
...
>>> i
{'b': {'2'}, 'a': {'1'}, 'c': {'444', '333'}}
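The outputs above show that set("111") yields {'1'} rather than {"111"}. To get the result asked for in the question, one option (a sketch, assuming every non-list value should become a one-element set) is to wrap those values in a set literal instead:

```python
i = {"a": "111", "b": "222", "c": ["333", "444"]}

# lists become sets of their items; any other value is wrapped as a one-element set
as_sets = {k: set(v) if isinstance(v, list) else {v} for k, v in i.items()}
print(as_sets == {"a": {"111"}, "b": {"222"}, "c": {"333", "444"}})  # True
```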
Instead of doing
my_dict['key'] = ['333', '444']
use a set literal:
my_dict['key'] = {'333', '444'}
That looks like a dict literal, but because the braces contain plain values rather than key: value pairs, Python treats it as a set.

What's the most pythonic way to merge 2 dictionaries, but make the values the average values?

d1 = { 'apples': 2, 'oranges':5 }
d2 = { 'apples': 1, 'bananas': 3 }
result_dict = { 'apples': 1.5, 'oranges': 5, 'bananas': 3 }
What's the best way to do this?
Here is one way:
result = dict(d2)
for k in d1:
    if k in result:
        result[k] = (result[k] + d1[k]) / 2.0
    else:
        result[k] = d1[k]
This would work for any number of dictionaries:
dicts = ({"a": 5}, {"b": 2, "a": 10}, {"a": 15, "b": 4})
keys = set()
averaged = {}
for d in dicts:
    keys.update(d.keys())
for key in keys:
    values = [d[key] for d in dicts if key in d]
    averaged[key] = float(sum(values)) / len(values)
print(averaged)
# {'a': 10.0, 'b': 3.0}
Update: @mhyfritz showed how you could reduce 3 lines to one!
dicts = ({"a": 5}, {"b": 2, "a": 10}, {"a": 15, "b": 4})
averaged = {}
keys = set().union(*dicts)
for key in keys:
    values = [d[key] for d in dicts if key in d]
    averaged[key] = float(sum(values)) / len(values)
print(averaged)
Your question was for the most 'Pythonic' way.
I think for a problem like this, the Pythonic way is one that is very clear. There are many ways to implement the solution to this problem! If you really do have only 2 dicts then the solutions that assume this are great because they are much simpler (and easier to read and maintain as a result). However, it's often a good idea to have the general solution because it means you won't need to duplicate the bulk of the logic for other cases where you have 3 dictionaries, for example.
As an addendum, phant0m's answer is nice because it uses a lot of Python's features to make the solution readable. We see a list comprehension:
[d[key] for d in dicts if key in d]
Use of Python's very useful set type:
keys = set()
keys.update(d.keys())
And generally, good use of Python's type methods and globals:
d.keys()
keys.update( ... )
keys.update
len(values)
Thinking of and implementing an algorithm to solve this problem is one thing, but making it this elegant and readable by utilising the power of the language is what most people would deem 'Pythonic'.
(I would use phant0m's solution)
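For reuse, the general approach could be packaged as a function. This is only a sketch: the name `average_dicts` and the use of defaultdict to group values per key are illustrative choices, not taken from the answers above.

```python
from collections import defaultdict

def average_dicts(*dicts):
    # Group every value under its key, then average each group.
    grouped = defaultdict(list)
    for d in dicts:
        for key, value in d.items():
            grouped[key].append(value)
    return {key: sum(values) / len(values) for key, values in grouped.items()}

d1 = {'apples': 2, 'oranges': 5}
d2 = {'apples': 1, 'bananas': 3}
print(average_dicts(d1, d2))  # {'apples': 1.5, 'oranges': 5.0, 'bananas': 3.0}
```

A key that is missing from a dictionary simply contributes no value, so it does not pull the average towards zero.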
Yet another way:
result = dict(d1)
for (k, v) in d2.items():
    result[k] = (result.get(k, v) + v) / 2.0
A Counter and some Generators are useful in this situation
General Case:
>>> d1 = { 'apples': 2, 'oranges':5 }
>>> d2 = { 'apples': 1, 'bananas': 3 }
>>> all_d=[d1,d2]
>>> from collections import Counter
>>> counts=Counter(k for d in all_d for k in d)
>>> counts
Counter({'apples': 2, 'oranges': 1, 'bananas': 1})
>>> s=lambda k: sum((d.get(k,0) for d in all_d))
>>> result_set=dict(((k,1.0*s(k)/counts[k]) for k in counts.keys()))
>>> result_set
{'apples': 1.5, 'oranges': 5.0, 'bananas': 3.0}
d1 = { 'apples': 2, 'oranges':5 }
d2 = { 'apples': 1, 'bananas': 3, 'oranges':0 }
dicts = [d1, d2]
result_dict = {}
# collect every value seen for each key
for current in dicts:
    for key, value in current.items():
        if key in result_dict:
            result_dict[key].append(value)
        else:
            result_dict[key] = [value]
# replace each list of values with its average
for key, values in result_dict.items():
    result_dict[key] = float(sum(values)) / len(values)
print(result_dict)
