Using .values() with list of dictionaries? - python

I'm comparing json files between two different API endpoints to see which json records need an update, which need a create and what needs a delete. So, by comparing the two json files, I want to end up with three json files, one for each operation.
The json at both endpoints is structured like this (but they use different keys for the same sets of values; that's a different problem):
{
    "records": [{
        "id": "id-value-here",
        "c": {
            "d": "eee"
        },
        "f": {
            "l": "last",
            "f": "first"
        },
        "g": ["100", "89", "9831", "09112", "800"]
    }, {
        …
    }]
}
So the json is represented as a list of dictionaries (with further nested lists and dictionaries).
If a given json endpoint (j1) id value ("id":) exists in the other endpoint json (j2), then that record should be added to j_update.
So far I have something like this, but I can see that .values() doesn't work because it's trying to operate on the list instead of on all the listed dictionaries(?):
j_update = {r for r in j1['records'] if r['id'] in j2.values()}
This doesn't return an error, but it creates an empty set using test json files.
Seems like this should be simple, but I think I'm tripping over the nesting of dictionaries in a list that represents the json. Do I need to flatten j2, or is there a simpler dictionary method in Python to achieve this?
==== edit: j1 and j2 ====
Both have the same structure but use different keys; toy data:
j1
{
    "records": [{
        "field_5": 2329309841,
        "field_12": {
            "email": "cmix@etest.com"
        },
        "field_20": {
            "last": "Mixalona",
            "first": "Clara"
        },
        "field_28": ["9002329309999", "9002329309112"],
        "field_44": ["1002329309832"]
    }, {
        "field_5": 2329309831,
        "field_12": {
            "email": "mherbitz345@test.com"
        },
        "field_20": {
            "last": "Herbitz",
            "first": "Michael"
        },
        "field_28": ["9002329309831", "9002329309112", "8002329309999"],
        "field_44": ["1002329309832"]
    }, {
        "field_5": 2329309855,
        "field_12": {
            "email": "nkatamaran@test.com"
        },
        "field_20": {
            "first": "Noriss",
            "last": "Katamaran"
        },
        "field_28": ["9002329309111", "8002329309112"],
        "field_44": ["1002329309877"]
    }]
}
j2
{
    "records": [{
        "id": 2329309831,
        "email": {
            "email": "mherbitz345@test.com"
        },
        "name_primary": {
            "last": "Herbitz",
            "first": "Michael"
        },
        "assign": ["8003329309831", "8007329309789"],
        "hr_id": ["1002329309877"]
    }, {
        "id": 2329309884,
        "email": {
            "email": "yinleeshu@test.com"
        },
        "name_primary": {
            "last": "Lee Shu",
            "first": "Yin"
        },
        "assign": ["8002329309111", "9003329309831", "9002329309111", "8002329309999", "8002329309112"],
        "hr_id": ["1002329309832"]
    }, {
        "id": 23293098338,
        "email": {
            "email": "amlouis@test.com"
        },
        "name_primary": {
            "last": "Maxwell Louis",
            "first": "Albert"
        },
        "assign": ["8002329309111", "8007329309789", "9003329309831", "8002329309999", "8002329309112"],
        "hr_id": ["1002329309877"]
    }]
}

Reading the json gives you a dict. You are looking for a particular key inside the list of values.
if 'records' in j2:
    r = j2['records'][0].get('id', [])  # default if 'id' does not exist
It would be prettier to do a recursive search, but I don't know how your data is organized well enough to quickly come up with a solution.
To give an idea of a recursive search, consider this example:
def recursiveSearch(dictionary, target):
    # Return the value stored under `target`, descending into nested dicts.
    if target in dictionary:
        return dictionary[target]
    for value in dictionary.values():
        if isinstance(value, dict):
            result = recursiveSearch(value, target)
            if result is not None:
                return result
    return None

a = {'test': 'b', 'test1': dict(x=dict(z=3), y=2)}
print(recursiveSearch(a, 'z'))  # prints 3

You tried:
j_update = {r for r in j1['records'] if r['id'] in j2.values()}
Aside from the r['id'] vs. r['field_5'] key problem, you have:
>>> list(j2.values())
[[{'id': 2329309831, ...}, ...]]
The ids are buried inside a list and a dict, thus the test r['id'] in j2.values() always returns False.
The basic solution is fairly simple.
First, create a set of j2 ids:
>>> present_in_j2 = set(record["id"] for record in j2["records"])
Then, rebuild the json structure of j1 but without the j1 field_5 that are not present in j2:
>>> {"records":[record for record in j1["records"] if record["field_5"] in present_in_j2]}
{'records': [{'field_5': 2329309831, 'field_12': {'email': 'mherbitz345@test.com'}, 'field_20': {'last': 'Herbitz', 'first': 'Michael'}, 'field_28': ['9002329309831', '9002329309112', '8002329309999'], 'field_44': ['1002329309832']}]}
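Since the question ultimately wants three files, the same id-set logic extends to create and delete as well; a minimal sketch (assuming, as the question implies, that an update is a j1 record whose id exists in j2, a create is a j1 record whose id does not, and a delete is a j2 record whose id is absent from j1):
j1_ids = {record["field_5"] for record in j1["records"]}
j2_ids = {record["id"] for record in j2["records"]}

# Partition records by id membership in the other endpoint.
j_update = {"records": [r for r in j1["records"] if r["field_5"] in j2_ids]}
j_create = {"records": [r for r in j1["records"] if r["field_5"] not in j2_ids]}
j_delete = {"records": [r for r in j2["records"] if r["id"] not in j1_ids]}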
The filtering expression works, but it's not totally satisfying because of the weird keys of j1. Let's try to convert j1 to a more friendly format:
def map_keys(json_value, conversion_table):
    """Map the keys of a json value.

    This is a recursive DFS."""
    def map_keys_aux(json_value):
        """Capture the conversion table."""
        if isinstance(json_value, list):
            return [map_keys_aux(v) for v in json_value]
        elif isinstance(json_value, dict):
            return {conversion_table.get(k, k): map_keys_aux(v) for k, v in json_value.items()}
        else:
            return json_value
    return map_keys_aux(json_value)
The function focuses on dictionary keys: conversion_table.get(k, k) is conversion_table[k] if the key is present in the conversion table, or the key itself otherwise.
>>> j1toj2 = {"field_5":"id", "field_12":"email", "field_20":"name_primary", "field_28":"assign", "field_44":"hr_id"}
>>> mapped_j1 = map_keys(j1, j1toj2)
Now, the code is cleaner and the output may be more useful for a PUT:
>>> d1 = {record["id"]:record for record in mapped_j1["records"]}
>>> present_in_j2 = set(record["id"] for record in j2["records"])
>>> {"records":[record for record in mapped_j1["records"] if record["id"] in present_in_j2]}
{'records': [{'id': 2329309831, 'email': {'email': 'mherbitz345@test.com'}, 'name_primary': {'last': 'Herbitz', 'first': 'Michael'}, 'assign': ['9002329309831', '9002329309112', '8002329309999'], 'hr_id': ['1002329309832']}]}
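As a side note, the d1 index built above is not used by the final expression, but it is handy to keep: it gives O(1) access to a record body by id when building the PUT payload. For example:
>>> d1[2329309831]["name_primary"]
{'last': 'Herbitz', 'first': 'Michael'}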

Related

retrieving data from json with python

I have nested JSON data. I want to get the value of the key "name" inside the "value" dictionary, based on the "id" inside the "key" dictionary (the user enters the id). I don't want to use positional indexing, because the positions change on every URL. Also, the data is large, so I need a one-line solution (without a for loop).
Code
import requests, re, json
r = requests.get('https://www.trendyol.com/apple/macbook-air-13-m1-8gb-256gb-ssd-altin-p-67940132').text
json_data1 = json.loads(re.search(r"window.__PRODUCT_DETAIL_APP_INITIAL_STATE__=({.*}});window", r).group(1))
print(json_data1)
print('json_data1:',json_data1['product']['attributes'][0]['value']['name'])
Output
{'product': {'attributes': [{'key': {'name': 'İşlemci Tipi', 'id': 168}, 'value': {'name': 'Apple M1', 'id': 243383}, 'starred': True, 'description': '', 'mediaUrls': []}, {'key': {'name': 'SSD Kapasitesi', 'id': 249}..........
json_data1: Apple M1
JSON Data
{
    "product": {
        "attributes": [
            {
                "key": { "name": "İşlemci Tipi", "id": 168 },
                "value": { "name": "Apple M1", "id": 243383 },
                "starred": true,
                "description": "",
                "mediaUrls": []
            },
            {
                "key": { "name": "SSD Kapasitesi", "id": 249 },
                "value": { "name": "256 GB", "id": 3376 },
                "starred": true,
                "description": "",
                "mediaUrls": []
            },
            ...
        ]
    }
}
Expected output is getting the value by key id (the type must be str):
input >> id: 168
output >> name: Apple M1
Since you originally didn't want a for loop, but now it's a matter of speed, here's a solution with a for loop; you can test it and see if it's faster than the one you already had.
import json

with open("file.json") as f:
    data = json.load(f)

search_key = int(input("Enter id: "))
for i in range(0, len(data['product']['attributes'])):
    if search_key == data['product']['attributes'][i]['key']['id']:
        print(data['product']['attributes'][i]['value']['name'])
Input >> Enter id: 168
Output >> Apple M1
I found a solution with a for loop. It works fast, so I preferred it.
for i in json_data1['product']['attributes']:
    cpu = list(list(i.values())[0].values())[1]
    if cpu == 168:
        print(list(list(i.values())[1].values())[0])
Iteration is unavoidable if the index is unknown, but the cost can be reduced substantially by using a generator expression and Python's built-in next function:
next((x["value"]["name"] for x in data["product"]["attributes"] if x["key"]["id"] == 168), None)
To verify that a generator expression is in fact faster than a for loop, here is a comparison of the running times of xFranko's for-loop solution and the next() expression:
import time

def time_func(func):
    def timer(*args):
        time1 = time.perf_counter()
        func(*args)
        time2 = time.perf_counter()
        return (time2 - time1) * 1000
    return timer

number_of_attributes = 100000

data = {
    "product": {
        "attributes": [
            {
                "key": {"name": "İşlemci Tipi", "id": i},
                "value": {"name": "name" + str(i), "id": 243383},
                "starred": True,
                "description": "",
                "mediaUrls": []
            } for i in range(number_of_attributes)
        ]
    }
}

def getName_generator(id):
    return next((x["value"]["name"] for x in data["product"]["attributes"] if x["key"]["id"] == id), None)

def getName_for_loop(id):
    return_value = None
    for i in range(0, len(data['product']['attributes'])):
        if id == data['product']['attributes'][i]['key']['id']:
            return_value = data['product']['attributes'][i]['value']['name']
    return return_value

print("Generator:", time_func(getName_generator)(0))
print("For loop:", time_func(getName_for_loop)(0))
print()
print("Generator:", time_func(getName_generator)(number_of_attributes - 1))
print("For loop:", time_func(getName_for_loop)(number_of_attributes - 1))
My results:
Generator: 0.0075999999999964984
For loop: 43.73920000000003
Generator: 23.633300000000023
For loop: 49.839699999999986
Conclusion:
For large data sets, a generator expression is indeed faster, even if it has to traverse the entire set.
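A further caveat: if the same data will be queried repeatedly, building a lookup dict once beats both approaches, since every subsequent lookup is O(1). A minimal sketch (the last entry wins if ids repeat):
id_to_name = {x["key"]["id"]: x["value"]["name"]
              for x in data["product"]["attributes"]}
print(id_to_name.get(168))  # 'Apple M1' with the original data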

Create partial dict from recursively nested field list

After parsing a URL parameter for partial responses, e.g. ?fields=name,id,another(name,id),date, I'm getting back an arbitrarily nested list of strings, representing the individual keys of a nested JSON object:
['name', 'id', ['another', ['name', 'id']], 'date']
The goal is to map that parsed 'graph' of keys onto an original, larger dict and just retrieve a partial copy of it, e.g.:
input_dict = {
    "name": "foobar",
    "id": "1",
    "another": {
        "name": "spam",
        "id": "42",
        "but_wait": "there is more!"
    },
    "even_more": {
        "nesting": {
            "why": "not?"
        }
    },
    "date": 1584567297
}
should simplify to:
output_dict = {
    "name": "foobar",
    "id": "1",
    "another": {
        "name": "spam",
        "id": "42"
    },
    "date": 1584567297,
}
So far, I've glanced at nested defaultdicts, addict and glom, but the mappings they take as inputs are not compatible with my list (I might have missed something, of course), and I end up with garbage.
How can I do this programmatically, accounting for any nesting that might occur?
You can use:
def rec(d, f):
    result = {}
    for i in f:
        if isinstance(i, list):
            result[i[0]] = rec(d[i[0]], i[1])
        else:
            result[i] = d[i]
    return result

f = ['name', 'id', ['another', ['name', 'id']], 'date']
rec(input_dict, f)
output:
{'name': 'foobar',
'id': '1',
'another': {'name': 'spam', 'id': '42'},
'date': 1584567297}
Here the assumption is that, in a nested list, the first element is a valid key from the upper level and the second element contains valid keys of the nested dict that is the value for the first element.
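If fields can be requested that don't exist in the input dict, a tolerant variant (an assumption, since the question doesn't specify how missing keys should be handled) can simply skip them:
def rec_safe(d, f):
    # Like rec, but silently skips keys absent from d.
    result = {}
    for i in f:
        if isinstance(i, list):
            if i[0] in d:
                result[i[0]] = rec_safe(d[i[0]], i[1])
        elif i in d:
            result[i] = d[i]
    return result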

Pull key from json file when values is known (groovy or python)

Is there any way to pull the key from JSON if the only thing I know is the value? (In groovy or python)
An example:
I know the "_number" value and I need the key.
So let's say the known _number is 2; as output, I should get dsf34f43f34f34f.
{
    "id": "8e37ecadf4908f79d58080e6ddbc",
    "project": "some_project",
    "branch": "master",
    "current_revision": "3rtgfgdfg2fdsf",
    "revisions": {
        "43g5g534534rf34f43f": {
            "_number": 3,
            "created": "2019-04-16 09:03:07.459000000",
            "uploader": {
                "_account_id": 4
            },
            "description": "Rebase"
        },
        "dsf34f43f34f34f": {
            "_number": 2,
            "created": "2019-04-02 10:54:14.682000000",
            "uploader": {
                "_account_id": 2
            },
            "description": "Rebase"
        }
    }
}
With Groovy:
def json = new groovy.json.JsonSlurper().parse("x.json" as File)
println(json.revisions.findResult{ it.value._number==2 ? it.key : null })
// => dsf34f43f34f34f
Python 3 (assuming the data is saved in data.json):
import json

with open('data.json') as f:
    json_data = json.load(f)

for rev, revdata in json_data['revisions'].items():
    if revdata['_number'] == 2:
        print(rev)
Prints all revs where _number equals 2.
Using a set comprehension (d is the loaded JSON dict):
print({k for k, v in d['revisions'].items() if v.get('_number') == 2})
OUTPUT:
{'dsf34f43f34f34f'}
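If only a single matching key is needed as a plain string rather than a set, next with a generator expression is a small variation on the same idea:
print(next((k for k, v in d['revisions'].items()
            if v.get('_number') == 2), None))
# dsf34f43f34f34f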

How do I check for duplicate entries before I add an entry to the dictionary

Given I have the following dictionary, which stores key (entry_id) / value (entry_body, entry_title) pairs:
"entries": {
    "1": {
        "body": "ooo",
        "title": "jack"
    },
    "2": {
        "body": "ooo",
        "title": "john"
    }
}
How do I check whether the title of an entry that I want to add to the dictionary already exists?
For example, this is the new entry that I want to add:
{
    "body": "nnnn",
    "title": "jack"
}
Have you thought about changing your data structure? Without context, the IDs of the entries seem a little useless. Your question suggests you only want to store unique titles, so why not make them your keys?
Example:
"entries": {
"jack": "ooo",
"john": "ooo"
}
That way you can do an efficient if newname in entries membership test.
EDIT:
Based on your comment you can still preserve the IDs by extending the data structure:
"entries": {
"jack": {
"body": "ooo",
"id": 1
},
"john": {
"body": "ooo",
"id": 2
}
}
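With that shape, the duplicate check and the insert reduce to a couple of lines; a minimal sketch (add_entry is a hypothetical helper):
def add_entry(entries, title, body, entry_id):
    # Reject duplicate titles via an O(1) key lookup.
    if title in entries:
        return False
    entries[title] = {"body": body, "id": entry_id}
    return True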
I agree with @Christian König's answer: your data structure seems like it could be made clearer and more efficient. Still, if you need a solution to this setup in particular, here's one that will work, and it automatically adds new integer keys to the entries dict.
I've added an extra case to show both a rejection and an accepted update.
def existing_entry(e, d):
    return [True for entry in d["entries"].values() if entry["title"] == e["title"]]

def update_entries(e, entries):
    if not any(existing_entry(e, entries)):
        current_keys = [int(x) for x in list(entries["entries"].keys())]
        next_key = str(max(current_keys) + 1)
        entries["entries"][next_key] = e
        print("Updated:", entries)
    else:
        print("Existing entry found.")

update_entries(new_entry_1, data)
update_entries(new_entry_2, data)
Output:
Existing entry found.
Updated:
{'entries':
{'1': {'body': 'ooo', 'title': 'jack'},
'2': {'body': 'ooo', 'title': 'john'},
'3': {'body': 'qqqq', 'title': 'jill'}
}
}
Data:
data = {"entries": {"1": {"body": "ooo", "title": "jack"},"2": {"body": "ooo","title": "john"}}}
new_entry_1 = {"body": "nnnn", "title": "jack"}
new_entry_2 = {"body": "qqqq", "title": "jill"}
This should work?
entry_dict = {
    "1": {"body": "ooo", "title": "jack"},
    "2": {"body": "ooo", "title": "john"}
}

def does_title_exist(title):
    for entry_id, sub_dict in entry_dict.items():
        if sub_dict["title"] == title:
            print("Title %s already in dictionary at entry %s" % (title, entry_id))
            return True
    return False

print("Does the title exist? %s" % does_title_exist("jack"))
print("Does the title exist? %s" % does_title_exist("jack"))
As Christian suggests above, this seems like an inefficient data structure for the job. If you just need index IDs, a list may be better.
I think to achieve this, one will have to traverse the dictionary.
'john' in [the_dict[en]['title'] for en in the_dict]
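A generator expression with any() avoids building the intermediate list and stops at the first match:
any(e['title'] == 'john' for e in the_dict.values())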

Bidirectional data structure conversion in Python

Note: this is not a simple two-way map; the conversion is the important part.
I'm writing an application that will send and receive messages with a certain structure, which I must convert from and to an internal structure.
For example, the message:
{
"Person": {
"name": {
"first": "John",
"last": "Smith"
}
},
"birth_date": "1997.01.12",
"points": "330"
}
This must be converted to:
{
    "Person": {
        "firstname": "John",
        "lastname": "Smith",
        "birth": datetime.date(1997, 1, 12),
        "points": 330
    }
}
And vice-versa.
These messages have a lot of information, so I want to avoid having to manually write converters for both directions. Is there any way in Python to specify the mapping once, and use it for both cases?
In my research, I found an interesting Haskell library called JsonGrammar which allows for this (it's for JSON, but that's irrelevant for the case). But my knowledge of Haskell isn't good enough to attempt a port.
That's actually quite an interesting problem. You could define a list of transformations, for example in the form (key1, func_1to2, key2, func_2to1), or a similar format, where a key can contain separators to indicate different levels of the dict, like "Person.name.first".
import datetime

noop = lambda x: x

relations = [("Person.name.first", noop, "Person.firstname", noop),
             ("Person.name.last", noop, "Person.lastname", noop),
             ("birth_date", lambda s: datetime.date(*map(int, s.split("."))),
              "Person.birth", lambda d: d.strftime("%Y.%m.%d")),
             ("points", int, "Person.points", str)]
Then, iterate the elements in that list and transform the entries in the dictionary according to whether you want to go from form A to B or vice versa. You will also need some helper functions for accessing keys in nested dictionaries using those dot-separated keys.
def deep_get(d, key):
    for k in key.split("."):
        d = d[k]
    return d

def deep_set(d, key, val):
    *first, last = key.split(".")
    for k in first:
        d = d.setdefault(k, {})
    d[last] = val

def convert(d, mapping, atob):
    res = {}
    for a, x, b, y in mapping:
        a, b, f = (a, b, x) if atob else (b, a, y)
        deep_set(res, b, f(deep_get(d, a)))
    return res
Example:
>>> d1 = {"Person": { "name": { "first": "John", "last": "Smith" } },
... "birth_date": "1997.01.12",
... "points": "330" }
...
>>> print(convert(d1, relations, True))
{'Person': {'birth': datetime.date(1997, 1, 12),
'firstname': 'John',
'lastname': 'Smith',
'points': 330}}
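Running the same mapping in the opposite direction (atob=False) restores the original wire format, which covers the "vice-versa" half of the question:
>>> d2 = convert(d1, relations, True)
>>> convert(d2, relations, False)
{'Person': {'name': {'first': 'John', 'last': 'Smith'}},
 'birth_date': '1997.01.12',
 'points': '330'}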
Tobias has answered this quite well. If you are looking for a library that handles model transformation dynamically, you can explore Python's model transformation library PyEcore.
PyEcore allows you to handle models and metamodels (structured data models), and gives you the keys you need for building model-driven-engineering-based tools and other applications based on a structured data model. It supports out of the box:
Data inheritance,
Two-way relationship management (opposite references),
XMI (de)serialization,
JSON (de)serialization, etc.
Edit
I have found something more interesting for you, with an example similar to yours: check out JsonBender.
import json
from jsonbender import bend, K, S

MAPPING = {
    'Person': {
        'firstname': S('Person', 'name', 'first'),
        'lastname': S('Person', 'name', 'last'),
        'birth': S('birth_date'),
        'points': S('points')
    }
}

source = {
    "Person": {
        "name": {
            "first": "John",
            "last": "Smith"
        }
    },
    "birth_date": "1997.01.12",
    "points": "330"
}

result = bend(MAPPING, source)
print(json.dumps(result))
Output:
{"Person": {"lastname": "Smith", "points": "330", "firstname": "John", "birth": "1997.01.12"}}
Here is my take on this (converter lambdas and dot-based notation idea taken from tobias_k):
import datetime

converters = {
    (str, datetime.date): lambda s: datetime.date(*map(int, s.split("."))),
    (datetime.date, str): lambda d: d.strftime("%Y.%m.%d"),
}

mapping = [
    ('Person.name.first', str, 'Person.firstname', str),
    ('Person.name.last', str, 'Person.lastname', str),
    ('birth_date', str, 'Person.birth', datetime.date),
    ('points', str, 'Person.points', int),
]

def convert_doc(doc, mapping, converters, inverse=False):
    converted = {}
    for keys1, type1, keys2, type2 in mapping:
        if inverse:
            keys1, type1, keys2, type2 = keys2, type2, keys1, type1
        # Fall back to the target type itself (e.g. int, str) as the converter.
        converter = converters.get((type1, type2), type2)
        keys1 = keys1.split('.')
        keys2 = keys2.split('.')
        obj1 = doc
        while keys1:
            k, *keys1 = keys1
            obj1 = obj1[k]
        dict2 = converted
        while len(keys2) > 1:
            k, *keys2 = keys2
            dict2 = dict2.setdefault(k, {})
        dict2[keys2[0]] = converter(obj1)
    return converted
# Test
doc1 = {
    "Person": {
        "name": {
            "first": "John",
            "last": "Smith"
        }
    },
    "birth_date": "1997.01.12",
    "points": "330"
}

doc2 = {
    "Person": {
        "firstname": "John",
        "lastname": "Smith",
        "birth": datetime.date(1997, 1, 12),
        "points": 330
    }
}

assert doc2 == convert_doc(doc1, mapping, converters)
assert doc1 == convert_doc(doc2, mapping, converters, inverse=True)
The nice things are that you can reuse converters (even to convert different document structures) and that you only need to define non-trivial conversions. The drawback is that, as written, every pair of types must always use the same conversion (though this could be extended to allow optional alternative conversions).
You can use lists to describe paths to values in objects, together with type-converting functions, for example:
import datetime

from_paths = [
    (['Person', 'name', 'first'], None),
    (['Person', 'name', 'last'], None),
    (['birth_date'], lambda s: datetime.date(*map(int, s.split(".")))),
    (['points'], lambda s: int(s))
]

to_paths = [
    (['Person', 'firstname'], None),
    (['Person', 'lastname'], None),
    (['Person', 'birth'], lambda d: d.strftime("%Y.%m.%d")),
    (['Person', 'points'], str)
]
and a little function to convert back and forth (much like tobias suggests, but without string splitting, and using reduce to get values from the dict):
import json
import operator
from functools import reduce

def convert(from_paths, to_paths, obj):
    to_obj = {}
    for (from_keys, convfn), (to_keys, _) in zip(from_paths, to_paths):
        value = reduce(operator.getitem, from_keys, obj)
        if convfn:
            value = convfn(value)
        curr_lvl_dict = to_obj
        for key in to_keys[:-1]:
            curr_lvl_dict = curr_lvl_dict.setdefault(key, {})
        curr_lvl_dict[to_keys[-1]] = value
    return to_obj
test:
from_json = '''{
    "Person": {
        "name": {
            "first": "John",
            "last": "Smith"
        }
    },
    "birth_date": "1997.01.12",
    "points": "330"
}'''

>>> obj = json.loads(from_json)
>>> new_obj = convert(from_paths, to_paths, obj)
>>> new_obj
{'Person': {'lastname': 'Smith',
            'points': 330,
            'birth': datetime.date(1997, 1, 12),
            'firstname': 'John'}}
>>> convert(to_paths, from_paths, new_obj)
{'birth_date': '1997.01.12',
 'Person': {'name': {'last': 'Smith', 'first': 'John'}},
 'points': '330'}
