I have JSON like this:
pd = {
    "RP": [
        {
            "Name": "PD",
            "Value": "qwe"
        },
        {
            "Name": "qwe",
            "Value": "change"
        }
    ],
    "RFN": [
        "All"
    ],
    "RIT": [
        {
            "ID": "All",
            "IDT": "All"
        }
    ]
}
I am trying to change the value "change" to "changed". This is a dictionary within a list which is within another dictionary. Is there a better/more efficient/more Pythonic way to do this than what I did below:
for key, value in pd.items():
    ls = pd[key]
    for d in ls:
        if type(d) == dict:
            for k, v in d.items():
                if v == 'change':
                    pd[key][ls.index(d)][k] = "changed"
This seems pretty inefficient given the number of times I am passing over the data.
String replacement could work if you don't want to write depth/breadth-first search.
>>> import json
>>> json.loads(json.dumps(pd).replace('"Value": "change"', '"Value": "changed"'))
{'RP': [{'Name': 'PD', 'Value': 'qwe'}, {'Name': 'qwe', 'Value': 'changed'}],
'RFN': ['All'],
'RIT': [{'ID': 'All', 'IDT': 'All'}]}
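If you do want the explicit traversal instead, a small recursive function keeps it readable. A minimal sketch (assumes only dicts and lists are nested, as in the sample, and mutates pd in place):

def replace_value(obj, old, new):
    # Walk nested dicts/lists, swapping any value equal to `old` for `new`.
    if isinstance(obj, dict):
        for k, v in obj.items():
            if v == old:
                obj[k] = new
            else:
                replace_value(v, old, new)
    elif isinstance(obj, list):
        for item in obj:
            replace_value(item, old, new)

replace_value(pd, 'change', 'changed')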
I have the following sample list of dictionaries and I would like to replace any "." in the dictionaries with a "_", so the list would look like the second list below.
I tried using replace but get the following error:
AttributeError: 'dict' object has no attribute 'replace'
when I try something like this:
orig = [
    {
        "health": "good",
        "status": "up",
        "date": "2022.03.10",
        "device.id": "device01"
    },
    {
        "health": "poor",
        "status": "down",
        "date": "2022.03.10",
        "device.id": "device02"
    }
]

length = len(orig)
for i in range(length):
    orig[i].replace(".", "_")
Current list:
[
    {
        "health": "good",
        "status": "up",
        "date": "2022.03.10",
        "device.id": "device01"
    },
    {
        "health": "poor",
        "status": "down",
        "date": "2022.03.10",
        "device.id": "device02"
    }
]
The new list should look like this:
[
    {
        "health": "good",
        "status": "up",
        "date": "2022_03_10",
        "device_id": "device01"
    },
    {
        "health": "poor",
        "status": "down",
        "date": "2022_03_10",
        "device_id": "device02"
    }
]
I don't see how what you're trying would even run. On the line orig[i].replace(".", "_"), orig[i] will be a dict, and since a dict has no replace() method, you'll get an error trying to execute that line.
You need to be working one additional level down, operating on each of the key/value pairs in each dict. Here's one solution:
orig = [{"health": "good", "status": "up", "date": "2022.03.10", "device.id": "device01"},
        {"health": "poor", "status": "down", "date": "2022.03.10", "device.id": "device02"}]

result = []
for inner_dict in orig:
    new_inner = {}
    for k, v in inner_dict.items():
        new_inner[k.replace('.', '_')] = v.replace('.', '_')
    result.append(new_inner)
print(result)
If the keys didn't need to change, this would be simpler (see the other two answers that don't get it right). You then wouldn't have to create a new structure, but could just work on the values within the existing structure. But since the keys will also change, it's easiest just to build a new result from scratch, as shown here.
Result:
[{'health': 'good', 'status': 'up', 'date': '2022_03_10', 'device_id': 'device01'}, {'health': 'poor', 'status': 'down', 'date': '2022_03_10', 'device_id': 'device02'}]
Try this:
orig = list(map(lambda item: dict((k.replace('.', '_'), v.replace('.', '_')) for k, v in item.items()), orig))
The output should be what you want.
Basically, the original data is a list of dicts, and the target is to normalize (replace "." with "_") each key and value in those dicts.
The inner transformation uses dict() to produce a new dict from the original one: dict((k.replace('.', '_'), v.replace('.', '_')) for k, v in item.items())
And the outer part is a Pythonic map operation for iterating over the list.
Actually, @CryptoFool's answer is probably clearer for beginners.
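For reference, the same transformation written as comprehensions reads a little more directly (an equivalent sketch, no map/lambda):

orig = [{k.replace('.', '_'): v.replace('.', '_') for k, v in item.items()}
        for item in orig]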
The answer by @CryptoFool seems like the one you want. A slightly more blunt-force answer might be to just work with strings.
import json

orig = [
    {"health": "good", "status": "up", "date": "2022.03.10", "device.id": "device01"},
    {"health": "poor", "status": "down", "date": "2022.03.10", "device.id": "device02"}
]

orig_new = json.loads(json.dumps(orig).replace(".", "_"))
print(orig_new)
That will give you:
[
    {'health': 'good', 'status': 'up', 'date': '2022_03_10', 'device_id': 'device01'},
    {'health': 'poor', 'status': 'down', 'date': '2022_03_10', 'device_id': 'device02'}
]
The following seems to do the trick:
def convert(list_dict, old_text, new_text):
    def replace_dict(old_dict, old_text, new_text):
        return {key.replace(old_text, new_text): val.replace(old_text, new_text)
                for key, val in old_dict.items()}
    for i in range(len(list_dict)):
        list_dict[i] = replace_dict(list_dict[i], old_text, new_text)

orig = [{"health": "good", "status": "up", "date": "2022.03.10", "device.id": "device01"},
        {"health": "poor", "status": "down", "date": "2022.03.10", "device.id": "device02"}]
convert(orig, '.', '_')
print(orig)
Basically, it modifies the original list in place, but builds a replacement dictionary for each element.
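The printed result then matches the target format:

[{'health': 'good', 'status': 'up', 'date': '2022_03_10', 'device_id': 'device01'}, {'health': 'poor', 'status': 'down', 'date': '2022_03_10', 'device_id': 'device02'}]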
You need to iterate over the list of dictionaries, or change the format:
# first solution
new_list_of_dictionaries = []
for dictionary in orig:
    new_dictionary = {}
    for k, v in dictionary.items():
        new_dictionary[k.replace(".", "_")] = v.replace(".", "_")
    new_list_of_dictionaries.append(new_dictionary)
orig = new_list_of_dictionaries

# second solution
import json
orig = json.loads(json.dumps(orig).replace(".", "_"))
I'm comparing JSON from two different API endpoints to see which records need an update, which need a create, and which need a delete. So, by comparing the two JSON files, I want to end up with three JSON files, one for each operation.
The json at both endpoints is structured like this (but they use different keys for same sets of values; different problem):
{
    "records": [{
        "id": "id-value-here",
        "c": {
            "d": "eee"
        },
        "f": {
            "l": "last",
            "f": "first"
        },
        "g": ["100", "89", "9831", "09112", "800"]
    }, {
        …
    }]
}
So the json is represented as a list of dictionaries (with further nested lists and dictionaries).
If an id value ("id":) from one endpoint's JSON (j1) exists in the other endpoint's JSON (j2), then that record should be added to j_update.
So far I have something like this, but I can see that .values() doesn't work because it's trying to operate on the list instead of on all the listed dictionaries(?):
j_update = {r for r in j1['records'] if r['id'] in
            j2.values()}
This doesn't return an error, but it creates an empty set using test json files.
Seems like this should be simple, but I think I'm tripping over the nesting of dictionaries in a list that represents the JSON. Do I need to flatten j2, or is there a simpler dictionary method Python has to achieve this?
====edit j1 and j2====
have same structure, use different keys; toy data
j1
{
    "records": [{
        "field_5": 2329309841,
        "field_12": {
            "email": "cmix@etest.com"
        },
        "field_20": {
            "last": "Mixalona",
            "first": "Clara"
        },
        "field_28": ["9002329309999", "9002329309112"],
        "field_44": ["1002329309832"]
    }, {
        "field_5": 2329309831,
        "field_12": {
            "email": "mherbitz345@test.com"
        },
        "field_20": {
            "last": "Herbitz",
            "first": "Michael"
        },
        "field_28": ["9002329309831", "9002329309112", "8002329309999"],
        "field_44": ["1002329309832"]
    }, {
        "field_5": 2329309855,
        "field_12": {
            "email": "nkatamaran@test.com"
        },
        "field_20": {
            "first": "Noriss",
            "last": "Katamaran"
        },
        "field_28": ["9002329309111", "8002329309112"],
        "field_44": ["1002329309877"]
    }]
}
j2
{
    "records": [{
        "id": 2329309831,
        "email": {
            "email": "mherbitz345@test.com"
        },
        "name_primary": {
            "last": "Herbitz",
            "first": "Michael"
        },
        "assign": ["8003329309831", "8007329309789"],
        "hr_id": ["1002329309877"]
    }, {
        "id": 2329309884,
        "email": {
            "email": "yinleeshu@test.com"
        },
        "name_primary": {
            "last": "Lee Shu",
            "first": "Yin"
        },
        "assign": ["8002329309111", "9003329309831", "9002329309111", "8002329309999", "8002329309112"],
        "hr_id": ["1002329309832"]
    }, {
        "id": 23293098338,
        "email": {
            "email": "amlouis@test.com"
        },
        "name_primary": {
            "last": "Maxwell Louis",
            "first": "Albert"
        },
        "assign": ["8002329309111", "8007329309789", "9003329309831", "8002329309999", "8002329309112"],
        "hr_id": ["1002329309877"]
    }]
}
If you read the JSON, you get a dict. You are looking for a particular key inside the list of values.
if 'records' in j2:
    r = j2['records'][0].get('id', [])  # defaults if id does not exist
It is prettier to do a recursive search, but I don't know your data's organization well enough to quickly come up with a solution.
To give an idea of a recursive search, consider this example:
def recursiveSearch(dictionary, target):
    if target in dictionary:
        return dictionary[target]
    for key, value in dictionary.items():
        if isinstance(value, dict):
            # Search the nested dict; use a separate name so the original
            # target isn't overwritten when nothing is found.
            result = recursiveSearch(value, target)
            if result:
                return result

a = {'test': 'b', 'test1': dict(x=dict(z=3), y=2)}
print(recursiveSearch(a, 'z'))  # -> 3
You tried:
j_update = {r for r in j1['records'] if r['id'] in j2.values()}
Aside from the r['id'] / r['field_5'] problem, you have:
>>> list(j2.values())
[[{'id': 2329309831, ...}, ...]]
The ids are buried inside a list and a dict, thus the test r['id'] in j2.values() always returns False.
The basic solution is fairly simple.
First, create a set of j2 ids:
>>> present_in_j2 = set(record["id"] for record in j2["records"])
Then, rebuild the json structure of j1 but without the j1 field_5 that are not present in j2:
>>> {"records":[record for record in j1["records"] if record["field_5"] in present_in_j2]}
{'records': [{'field_5': 2329309831, 'field_12': {'email': 'mherbitz345@test.com'}, 'field_20': {'last': 'Herbitz', 'first': 'Michael'}, 'field_28': ['9002329309831', '9002329309112', '8002329309999'], 'field_44': ['1002329309832']}]}
It works, but it's not totally satisfying because of the weird keys of j1. Let's try to convert j1 to a more friendly format:
def map_keys(json_value, conversion_table):
    """Map the keys of a json value.

    This is a recursive DFS."""
    def map_keys_aux(json_value):
        """Capture the conversion table."""
        if isinstance(json_value, list):
            return [map_keys_aux(v) for v in json_value]
        elif isinstance(json_value, dict):
            return {conversion_table.get(k, k): map_keys_aux(v)
                    for k, v in json_value.items()}
        else:
            return json_value
    return map_keys_aux(json_value)
The function focuses on dictionary keys: conversion_table.get(k, k) is conversion_table[k] if the key is present in the conversion table, or the key itself otherwise.
>>> j1toj2 = {"field_5":"id", "field_12":"email", "field_20":"name_primary", "field_28":"assign", "field_44":"hr_id"}
>>> mapped_j1 = map_keys(j1, j1toj2)
Now, the code is cleaner and the output may be more useful for a PUT:
>>> d1 = {record["id"]:record for record in mapped_j1["records"]}
>>> present_in_j2 = set(record["id"] for record in j2["records"])
>>> {"records":[record for record in mapped_j1["records"] if record["id"] in present_in_j2]}
{'records': [{'id': 2329309831, 'email': {'email': 'mherbitz345@test.com'}, 'name_primary': {'last': 'Herbitz', 'first': 'Michael'}, 'assign': ['9002329309831', '9002329309112', '8002329309999'], 'hr_id': ['1002329309832']}]}
Note: this is not a simple two-way map; the conversion is the important part.
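Since the conversion table is a plain dict, you can invert it for the other direction, but note the caveat behind that remark: map_keys renames every matching key at every depth, so the inner "email" key would get renamed too. A quick illustrative sketch:

>>> j2toj1 = {v: k for k, v in j1toj2.items()}
>>> map_keys({"email": {"email": "mherbitz345@test.com"}}, j2toj1)
{'field_12': {'field_12': 'mherbitz345@test.com'}}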
I'm writing an application that will send and receive messages with a certain structure, which I must convert from and to an internal structure.
For example, the message:
{
    "Person": {
        "name": {
            "first": "John",
            "last": "Smith"
        }
    },
    "birth_date": "1997.01.12",
    "points": "330"
}
This must be converted to:
{
    "Person": {
        "firstname": "John",
        "lastname": "Smith",
        "birth": datetime.date(1997, 1, 12),
        "points": 330
    }
}
And vice-versa.
These messages have a lot of information, so I want to avoid having to manually write converters for both directions. Is there any way in Python to specify the mapping once, and use it for both cases?
In my research, I found an interesting Haskell library called JsonGrammar which allows for this (it's for JSON, but that's irrelevant for the case). But my knowledge of Haskell isn't good enough to attempt a port.
That's actually quite an interesting problem. You could define a list of transformations, for example in the form (key1, func_1to2, key2, func_2to1) or a similar format, where each key could contain separators to indicate different levels of the dict, like "Person.name.first".
import datetime

noop = lambda x: x

relations = [("Person.name.first", noop, "Person.firstname", noop),
             ("Person.name.last", noop, "Person.lastname", noop),
             ("birth_date", lambda s: datetime.date(*map(int, s.split("."))),
              "Person.birth", lambda d: d.strftime("%Y.%m.%d")),
             ("points", int, "Person.points", str)]
Then, iterate the elements in that list and transform the entries in the dictionary according to whether you want to go from form A to B or vice versa. You will also need some helper function for accessing keys in nested dictionaries using those dot-separated keys.
def deep_get(d, key):
    for k in key.split("."):
        d = d[k]
    return d

def deep_set(d, key, val):
    *first, last = key.split(".")
    for k in first:
        d = d.setdefault(k, {})
    d[last] = val

def convert(d, mapping, atob):
    res = {}
    for a, x, b, y in mapping:
        a, b, f = (a, b, x) if atob else (b, a, y)
        deep_set(res, b, f(deep_get(d, a)))
    return res
Example:
>>> d1 = {"Person": { "name": { "first": "John", "last": "Smith" } },
... "birth_date": "1997.01.12",
... "points": "330" }
...
>>> print(convert(d1, relations, True))
{'Person': {'birth': datetime.date(1997, 1, 12),
'firstname': 'John',
'lastname': 'Smith',
'points': 330}}
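The reverse direction reuses the same table; converting the result back should round-trip to the original (a quick check, assuming the code above):

>>> d2 = convert(d1, relations, True)
>>> convert(d2, relations, False)
{'Person': {'name': {'first': 'John', 'last': 'Smith'}}, 'birth_date': '1997.01.12', 'points': '330'}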
Tobias has answered it quite well. If you are looking for a library that handles model transformation dynamically, you can explore Python's model transformation library PyEcore.
PyEcore allows you to handle models and metamodels (structured data models), and gives you the pieces you need for building Model-Driven-Engineering-based tools and other applications based on a structured data model. It supports out of the box:
Data inheritance,
Two-ways relationship management (opposite references),
XMI (de)serialization,
JSON (de)serialization, etc.
Edit
I have found something more interesting for you, with an example similar to yours: check out JsonBender.
import json
from jsonbender import bend, K, S

MAPPING = {
    'Person': {
        'firstname': S('Person', 'name', 'first'),
        'lastname': S('Person', 'name', 'last'),
        'birth': S('birth_date'),
        'points': S('points')
    }
}

source = {
    "Person": {
        "name": {
            "first": "John",
            "last": "Smith"
        }
    },
    "birth_date": "1997.01.12",
    "points": "330"
}

result = bend(MAPPING, source)
print(json.dumps(result))
Output:
{"Person": {"lastname": "Smith", "points": "330", "firstname": "John", "birth": "1997.01.12"}}
Here is my take on this (converter lambdas and dot-based notation idea taken from tobias_k):
import datetime

converters = {
    (str, datetime.date): lambda s: datetime.date(*map(int, s.split("."))),
    (datetime.date, str): lambda d: d.strftime("%Y.%m.%d"),
}

mapping = [
    ('Person.name.first', str, 'Person.firstname', str),
    ('Person.name.last', str, 'Person.lastname', str),
    ('birth_date', str, 'Person.birth', datetime.date),
    ('points', str, 'Person.points', int),
]

def convert_doc(doc, mapping, converters, inverse=False):
    converted = {}
    for keys1, type1, keys2, type2 in mapping:
        if inverse:
            keys1, type1, keys2, type2 = keys2, type2, keys1, type1
        # Fall back to the target type itself (e.g. int, str) as converter.
        converter = converters.get((type1, type2), type2)
        keys1 = keys1.split('.')
        keys2 = keys2.split('.')
        obj1 = doc
        while keys1:
            k, *keys1 = keys1
            obj1 = obj1[k]
        dict2 = converted
        while len(keys2) > 1:
            k, *keys2 = keys2
            dict2 = dict2.setdefault(k, {})
        dict2[keys2[0]] = converter(obj1)
    return converted
# Test
doc1 = {
    "Person": {
        "name": {
            "first": "John",
            "last": "Smith"
        }
    },
    "birth_date": "1997.01.12",
    "points": "330"
}

doc2 = {
    "Person": {
        "firstname": "John",
        "lastname": "Smith",
        "birth": datetime.date(1997, 1, 12),
        "points": 330
    }
}
assert doc2 == convert_doc(doc1, mapping, converters)
assert doc1 == convert_doc(doc2, mapping, converters, inverse=True)
The nice things are that you can reuse converters (even to convert different document structures) and that you only need to define non-trivial conversions. The drawback is that, as written, every pair of types must always use the same conversion (though it could be extended to add optional alternative conversions).
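One possible shape for those alternative conversions: let per-field overrides, keyed by the pair of dotted paths, take precedence over the (type1, type2) table. A hypothetical sketch against the names above (field_converters and pick_converter are invented here):

# Hypothetical extension: per-field converters win over the type-pair table.
field_converters = {
    # ('birth_date', 'Person.birth'): some_other_date_parser,
}

def pick_converter(keys1, type1, keys2, type2):
    conv = field_converters.get((keys1, keys2))
    return conv if conv else converters.get((type1, type2), type2)

Inside convert_doc, the line converter = converters.get((type1, type2), type2) would become converter = pick_converter(keys1, type1, keys2, type2), before the keys are split.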
You can use lists to describe paths to values in objects, paired with type-converting functions, for example:
import datetime

from_paths = [
    (['Person', 'name', 'first'], None),
    (['Person', 'name', 'last'], None),
    (['birth_date'], lambda s: datetime.date(*map(int, s.split(".")))),
    (['points'], lambda s: int(s))
]

to_paths = [
    (['Person', 'firstname'], None),
    (['Person', 'lastname'], None),
    (['Person', 'birth'], lambda d: d.strftime("%Y.%m.%d")),
    (['Person', 'points'], str)
]
and a little function to convert from one to the other (much like tobias suggests, but without string splitting, and using reduce to get values from the dict):
import operator
from functools import reduce

def convert(from_paths, to_paths, obj):
    to_obj = {}
    for (from_keys, convfn), (to_keys, _) in zip(from_paths, to_paths):
        # Follow the path into the source object.
        value = reduce(operator.getitem, from_keys, obj)
        if convfn:
            value = convfn(value)
        # Build the destination path, creating intermediate dicts as needed.
        curr_lvl_dict = to_obj
        for key in to_keys[:-1]:
            curr_lvl_dict = curr_lvl_dict.setdefault(key, {})
        curr_lvl_dict[to_keys[-1]] = value
    return to_obj
test:
from_json = '''{
    "Person": {
        "name": {
            "first": "John",
            "last": "Smith"
        }
    },
    "birth_date": "1997.01.12",
    "points": "330"
}'''
>>> import json
>>> obj = json.loads(from_json)
>>> new_obj = convert(from_paths, to_paths, obj)
>>> new_obj
{'Person': {'lastname': u'Smith',
'points': 330,
'birth': datetime.date(1997, 1, 12), 'firstname': u'John'}}
>>> convert(to_paths, from_paths, new_obj)
{'birth_date': '1997.01.12',
'Person': {'name': {'last': u'Smith', 'first': u'John'}},
'points': '330'}
>>>
I have a list of dicts where a particular value is repeated multiple times, and I would like to remove the duplicate values.
My list:
te = [
    {
        "Name": "Bala",
        "phone": "None"
    },
    {
        "Name": "Bala",
        "phone": "None"
    },
    {
        "Name": "Bala",
        "phone": "None"
    },
    {
        "Name": "Bala",
        "phone": "None"
    }
]
function to remove duplicate values:
def removeduplicate(it):
    seen = set()
    for x in it:
        if x not in seen:
            yield x
            seen.add(x)
When I call this function I get a generator object.
<generator object removeduplicate at 0x0170B6E8>
When I try to iterate over the generator I get TypeError: unhashable type: 'dict'
Is there a way to remove the duplicate values, or to iterate over the generator?
You can easily remove duplicates with a dictionary comprehension, since a dictionary does not allow duplicate keys, as below:
te = [
    {
        "Name": "Bala",
        "phone": "None"
    },
    {
        "Name": "Bala",
        "phone": "None"
    },
    {
        "Name": "Bala",
        "phone": "None"
    },
    {
        "Name": "Bala",
        "phone": "None"
    },
    {
        "Name": "Bala1",
        "phone": "None"
    }
]

unique = {each['Name']: each for each in te}.values()
print unique

Output:
[{'phone': 'None', 'Name': 'Bala1'}, {'phone': 'None', 'Name': 'Bala'}]
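Be aware this dedupes on Name alone, so two records sharing a Name but differing elsewhere would collapse into one. To dedupe on the entire record instead, the same trick works with a frozenset of the items as the key (assuming all values are hashable):

unique = {frozenset(each.items()): each for each in te}.values()

Two dicts are equal exactly when their items() sets are equal, so this keeps one representative per distinct record.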
Because you can't add a dict to a set. From this question:
You're trying to use a dict as a key to another dict or in a set. That does not work because the keys have to be hashable.
As a general rule, only immutable objects (strings, integers, floats, frozensets, tuples of immutables) are hashable (though exceptions are possible).
>>> foo = dict()
>>> bar = set()
>>> bar.add(foo)
Traceback (most recent call last):
File "<input>", line 1, in <module>
TypeError: unhashable type: 'dict'
>>>
Instead, you're already using if x not in seen, so just use a list:
>>> te = [
...     {
...         "Name": "Bala",
...         "phone": "None"
...     },
...     {
...         "Name": "Bala",
...         "phone": "None"
...     },
...     {
...         "Name": "Bala",
...         "phone": "None"
...     },
...     {
...         "Name": "Bala",
...         "phone": "None"
...     }
... ]
>>> def removeduplicate(it):
...     seen = []
...     for x in it:
...         if x not in seen:
...             yield x
...             seen.append(x)
...
>>> removeduplicate(te)
<generator object removeduplicate at 0x7f3578c71ca8>
>>> list(removeduplicate(te))
[{'phone': 'None', 'Name': 'Bala'}]
>>>
You can still use a set for duplicate detection, you just need to convert the dictionary into something hashable such as a tuple. Your dictionaries can be converted to tuples by tuple(d.items()) where d is a dictionary. Applying that to your generator function:
def removeduplicate(it):
    seen = set()
    for x in it:
        t = tuple(x.items())
        if t not in seen:
            yield x
            seen.add(t)
>>> for d in removeduplicate(te):
...     print(d)
...
{'phone': 'None', 'Name': 'Bala'}
>>> te.append({'Name': 'Bala', 'phone': '1234567890'})
>>> te.append({'Name': 'Someone', 'phone': '1234567890'})
>>> for d in removeduplicate(te):
...     print(d)
...
{'phone': 'None', 'Name': 'Bala'}
{'phone': '1234567890', 'Name': 'Bala'}
{'phone': '1234567890', 'Name': 'Someone'}
This provides faster lookup (avg. O(1)) than a "seen" list (O(n)). Whether it is worth the extra computation of converting every dict into a tuple depends on the number of dictionaries that you have and how many duplicates there are. If there are a lot of duplicates, a "seen" list will grow quite large, and testing whether a dict has already been seen could become an expensive operation. This might justify the tuple conversion - you would have to test/profile it.
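One caveat worth knowing: tuple(d.items()) is insertion-order sensitive, so two equal dicts whose keys were added in a different order would produce different tuples and both be kept. Sorting the items first makes the key canonical (same generator, one changed line):

def removeduplicate(it):
    seen = set()
    for x in it:
        t = tuple(sorted(x.items()))  # canonical regardless of key order
        if t not in seen:
            yield x
            seen.add(t)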
I just use md5 to compare everything.
import hashlib
import json

filtered_json = []
md5_list = []
for item in json_fin:
    # Hash the compact JSON serialization of each record.
    md5_result = hashlib.md5(json.dumps(item, separators=(',', ':')).encode("utf-8")).hexdigest()
    if md5_result not in md5_list:
        md5_list.append(md5_result)
        filtered_json.append(item)
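Same key-order caveat as the tuple approach above: json.dumps serializes keys in insertion order, so equal dicts with different key order hash differently. Passing sort_keys=True canonicalizes the serialization:

md5_result = hashlib.md5(
    json.dumps(item, sort_keys=True, separators=(',', ':')).encode("utf-8")
).hexdigest()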