How to properly keep structure when removing keys in JSON using Python?

I'm using this as a reference: Elegant way to remove fields from nested dictionaries
I have a large amount of JSON-formatted data here, and we've determined a list of unnecessary keys (and all their underlying values) that we can remove.
I'm a bit new to working with JSON and Python specifically (I mostly did sysadmin work) and initially thought it was just a plain dictionary of dictionaries. While some of the data does look like that, several more pieces of data consist of dictionaries of lists, which can in turn contain more lists or dictionaries with no specific pattern.
The idea is to keep the data identical EXCEPT for the specified keys and associated values.
Test Data:
to_be_removed = ['leecher_here']

easy_modo = {
    'hello_wold': 'konnichiwa sekai',
    'leeching_forbidden': 'wanpan kinshi',
    'leecher_here': 'nushiyowa'
}

lunatic_modo = {
    'hello_wold': {
        'leecher_here': 'nushiyowa',
        'goodbye_world': 'aokigahara'
    },
    'leeching_forbidden': 'wanpan kinshi',
    'leecher_here': 'nushiyowa',
    'something_inside': {
        'hello_wold': 'konnichiwa sekai',
        'leeching_forbidden': 'wanpan kinshi',
        'leecher_here': 'nushiyowa'
    },
    'list_o_dicts': [
        {
            'hello_wold': 'konnichiwa sekai',
            'leeching_forbidden': 'wanpan kinshi',
            'leecher_here': 'nushiyowa'
        }
    ]
}
Obviously, the original question posted there doesn't account for lists.
My code, modified appropriately to work with my requirements:
from copy import deepcopy

def remove_key(json, trash):
    """
    <snip>
    """
    keys_set = set(trash)
    modified_dict = {}
    if isinstance(json, dict):
        for key, value in json.items():
            if key not in keys_set:
                if isinstance(value, dict):
                    modified_dict[key] = remove_key(value, keys_set)
                elif isinstance(value, list):
                    for ele in value:
                        modified_dict[key] = remove_key(ele, trash)
                else:
                    modified_dict[key] = deepcopy(value)
    return modified_dict
I'm sure something is messing with the structure, because the code doesn't pass the test I wrote, where the expected data is exactly the same as the input minus the removed keys. The test shows that, yes, it's properly removing the data, but the parts that are supposed to be lists of dictionaries come back as plain dictionaries instead, which will have unfortunate implications down the line.
I'm sure it's because the function always returns a dictionary, but I don't know how to proceed from here in order to maintain the structure.
At this point, I need help figuring out what I could have overlooked.

When you go through your json file, you only need to determine whether it is a list, a dict or neither. Here is a recursive way to modify your input dict in place:
def remove_key(d, trash=None):
    if not trash:
        trash = []
    if isinstance(d, dict):
        keys = [k for k in d]
        for key in keys:
            if any(key == s for s in trash):
                del d[key]
        for value in d.values():
            remove_key(value, trash)
    elif isinstance(d, list):
        for value in d:
            remove_key(value, trash)

remove_key(lunatic_modo, to_be_removed)
remove_key(easy_modo, to_be_removed)
Result:
{
    "hello_wold": {
        "goodbye_world": "aokigahara"
    },
    "leeching_forbidden": "wanpan kinshi",
    "something_inside": {
        "hello_wold": "konnichiwa sekai",
        "leeching_forbidden": "wanpan kinshi"
    },
    "list_o_dicts": [
        {
            "hello_wold": "konnichiwa sekai",
            "leeching_forbidden": "wanpan kinshi"
        }
    ]
}
{
    "hello_wold": "konnichiwa sekai",
    "leeching_forbidden": "wanpan kinshi"
}
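If you want to keep the originals untouched instead of modifying them in place (the question asks to keep the data identical except for the removed keys), a copy-based variant along the same lines is sketched below; remove_key_copy is just an illustrative name, and it returns a new structure while preserving lists as lists.
from copy import deepcopy

def remove_key_copy(data, trash):
    """Return a copy of data with every key in trash removed, keeping lists as lists."""
    if isinstance(data, dict):
        return {k: remove_key_copy(v, trash) for k, v in data.items() if k not in trash}
    if isinstance(data, list):
        return [remove_key_copy(item, trash) for item in data]
    return deepcopy(data)

cleaned = remove_key_copy(lunatic_modo, set(to_be_removed))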

Related

Nested Dictionary (JSON): Merge multiple keys stored in a list to access its value from the dict

I have a JSON object with an unknown number of keys & values. I need to store the user's selection in a list & then access the selected key's value (it's guaranteed that the keys in the list are always stored in the correct sequence).
Example
I need to access value_key1-2.
mydict = {
    'key1': {
        'key1-1': {
            'key1-2': 'value_key1-2'
        },
    },
    'key2': 'value_key2'
}
I can see the keys & they're limited, so I can manually use:
>>> print(mydict['key1']['key1-1']['key1-2'])
value_key1-2
Now after storing the user's selections in a list, we have the following list:
Uselection = ['key1', 'key1-1', 'key1-2']
How can I convert those list elements into the similar code we used earlier?
How can I automate it using Python?
You have to loop over the list of keys and update the "current value" at each step.
val = mydict
try:
    for key in Uselection:
        val = val[key]
except KeyError:
    ...  # handle non-existing keys here
Another, more 'posh' way to do the same (not generally recommended):
from functools import reduce
val = reduce(dict.get, Uselection, mydict)
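A quick check of both approaches against the example data (a sketch using the variable names from the question):
from functools import reduce

mydict = {'key1': {'key1-1': {'key1-2': 'value_key1-2'}}, 'key2': 'value_key2'}
Uselection = ['key1', 'key1-1', 'key1-2']

# loop version
val = mydict
for key in Uselection:
    val = val[key]
print(val)  # value_key1-2

# reduce version
print(reduce(dict.get, Uselection, mydict))  # value_key1-2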

sort values from a dictionary/json file

I've got this discord.py command that makes a leaderboard from a JSON file.
cogs/coins.json (the dictionary) looks like this:
{
    "781524858026590218": {
        "name": "kvbot test platform",
        "total_coins": 129,
        "data": {
            "564050979079585803": {
                "name": "Bluesheep33",
                "coins": 127
            },
            "528647474596937733": {
                "name": "ACAT_",
                "coins": 2
            }
        }
    }
}
(The numeric string keys in the JSON file are Discord guild/member IDs.)
How do I make the code shorter and clearer?
Thanks in advance for helping, because I really don't know the solution.
When it comes to finding (sorting) the first ten items within a dict, there is a much easier way than repeatedly going through the dict and doing different things each time. A few small improvements also help, such as dict.get for safe access.
Based on the sample JSON data:
import json

with open('cogs/coins.json', 'r') as f:
    coins_data = json.load(f)

# .get() is safe access to the dict: it returns None instead of raising if the guild id is missing
guild_data = coins_data.get(str(ctx.guild.id))
if guild_data is None:  # if data not found
    await ctx.send('No data')
    return

# dict.items() returns pairs of (key, value)
members_coins = list(guild_data['data'].items())

# sort the list by the 'coins' key of the value part of each pair, reverse for descending order
members_coins.sort(key=lambda x: x[1]['coins'], reverse=True)

output = ''
# list[:10] takes the first 10 items (if the list is smaller, that's okay, Python doesn't mind)
for member_id, vals in members_coins[:10]:
    output += f'{vals["name"]}: {vals["coins"]}\n'
    # output += f'<#{member_id}>: {vals["coins"]}\n'  # if you want a "mention" display of the user
await ctx.send(output)
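If you prefer a single expression for the top ten, sorted() can replace the in-place sort; this sketch reuses guild_data from the snippet above:
top_ten = sorted(guild_data['data'].items(),
                 key=lambda item: item[1]['coins'],
                 reverse=True)[:10]
output = '\n'.join(f'{vals["name"]}: {vals["coins"]}' for _, vals in top_ten)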

Merge deep JSON files in Python

I have two JSON files: one contains a fully defined object with multiple levels of nesting, the other contains a stripped-back version of the same object that lists just the elements that need to be changed.
File 1 example
{
    "toplevel": {
        "value": {
            "settings": [
                {
                    "name": "A Default Value",
                    "region": "US",
                    "inner": {
                        "name": "Another Default",
                        "setting": "help"
                    }
                }
            ]
        }
    }
}
File 2 example
{
    "toplevel": {
        "value": {
            "settings": [
                {
                    "name": "A Real Value",
                    "inner": {
                        "name": "Another Real Value"
                    }
                }
            ]
        }
    }
}
I want to merge the updates from file 2 into file 1.
My output should look like:
{
    "toplevel": {
        "value": {
            "settings": [
                {
                    "name": "A Real Value",
                    "region": "US",
                    "inner": {
                        "name": "Another Real Value",
                        "setting": "help"
                    }
                }
            ]
        }
    }
}
So far I've tried:
f1 = json.load(file1)
f2 = json.load(file2)
f1['toplevel']['value']['settings'][0].update(f2['toplevel']['value']['settings'][0].items())
it works perfectly for the top level items, but obviously it overwrites the whole of the "inner" object, removing the "setting" key inside it.
Is there a way to traverse the whole tree and replace only the non-dictionary values? I don't have access to external libraries other than json and collections (for the ordered dict)
It depends slightly on what you want.
Solution 1
If you simply want to replace all values by the new dictionary, you can use the following options:
result = {**file_1, **file_2}
from pprint import pprint
pprint(result)
This will result in:
{'toplevel': {'value': {'settings': [{'inner': {'name': 'Another Real Value'},
'name': 'A Real Value'}]}}}
Alternatively you can use
file_1.update(file_2)
pprint(file_1)
Which will lead to the same outcome, but will update file_1 in place.
Solution 2
If you only want to update the specific key in the nesting, and leave all other values intact, you can do this using recursion. In your example you are using dict, list and str values. So I will build the recursion using the same types.
def update_dict(original, update):
    for key, value in update.items():
        # Add new key values
        if key not in original:
            original[key] = update[key]
            continue

        # Update the old key values with the new key values
        if key in original:
            if isinstance(value, dict):
                update_dict(original[key], update[key])
            if isinstance(value, list):
                update_list(original[key], update[key])
            if isinstance(value, (str, int, float)):
                original[key] = update[key]
    return original


def update_list(original, update):
    # Make sure the order is equal, otherwise it is hard to compare the items.
    assert len(original) == len(update), "Can only handle equal length lists."

    for idx, (val_original, val_update) in enumerate(zip(original, update)):
        if not isinstance(val_original, type(val_update)):
            raise ValueError(f"Different types! {type(val_original)}, {type(val_update)}")
        if isinstance(val_original, dict):
            original[idx] = update_dict(original[idx], update[idx])
        if isinstance(val_original, (tuple, list)):
            original[idx] = update_list(original[idx], update[idx])
        if isinstance(val_original, (str, int, float)):
            original[idx] = val_update
    return original
The above might be a bit harder to understand, but I will try to explain it.
There are two methods, one which will merge two dictionaries and one that tries to merge two lists.
Merging dictionaries
In order to merge the two dictionaries I go over all the keys and values of the update dictionary, because this will probably be the smaller of the two.
The first block puts new keys in the original dictionary; this adds values that weren't in the original dictionary at the start.
The second block is updating the nested values. There I distinguish three cases:
If the value is another dict, run the dictionary merge again, but one level deeper.
If the value is a list (or tuple), run the list merge function.
If the value is a str (or int, float), replace the original value with the updated value.
Merging lists
This is a bit trickier than dictionaries, because lists do not have keys that I can match on, only positions. Therefore I have to make a heavy assumption that the update list always contains the same number of elements as the original; see the Limitations below for how to skip list elements you don't want to change.
Since the lists are of the same length, I can assume that the indices of the lists are matching. Now in order to check if all the values are the same, we have to do the following:
Make sure that the value types are the same, otherwise we will throw an error since I am not sure how to handle that case.
If the values are dictionaries, use the merging of dictionaries.
If the values are a list (or tuple), use the list merging.
If the values are str (or int, float), override the original in place.
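To see the list handling in isolation, here is a tiny hand-made example (hypothetical data, not taken from the question):
original = [{'a': 1, 'b': 2}, 'keep-or-replace']
update = [{'b': 3}, 'replaced']
print(update_list(original, update))
# [{'a': 1, 'b': 3}, 'replaced']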
Result
using:
from pprint import pprint
pprint(update_dict(file_1, file_2))
The final result will be:
{'toplevel': {'value': {'settings': [{'inner': {'name': 'Another Real Value',
'setting': 'help'},
'name': 'A Real Value',
'region': 'US'}]}}}
Note that, in contrast with the first solution, the values 'setting': 'help' and 'region': 'US' are still present in the original dictionary.
Limitations
Due to the same length constraint, if you do not want to update an element in the list you have to pass the same element type, but empty.
Example on how to ignore a list update:
... {'settings': [
        {},                      # do not update the first element
        {'name': 'A new name'}   # update the second element
    ]
}
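Since update_dict modifies original in place, you may want to keep file_1 intact as well; a small wrapper using copy.deepcopy (a sketch, merge_copy is an illustrative name) leaves both inputs untouched:
from copy import deepcopy

def merge_copy(original, update):
    # Return a merged copy, leaving both input dictionaries unmodified.
    merged = deepcopy(original)
    update_dict(merged, update)
    return merged

merged = merge_copy(file_1, file_2)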

Iteration through nested JSON objects

I'm a beginner in Python pulling JSON data consisting of nested objects (dictionaries?). I'm trying to iterate through everything to locate a key all of them share, and select only the objects that have a specific value in that key. I spent days researching and applying and now everything is kind of blurring together in some mix of JS/Python analysis paralysis. This is the general format for the JSON data:
{
    "things": {
        "firstThing": {
            "one": "x",
            "two": "y",
            "three": "z"
        },
        "secondThing": {
            "one": "a",
            "two": "b",
            "three": "c"
        },
        "thirdThing": {
            "one": "x",
            "two": "y",
            "three": "z"
        }
    }
}
In this example I want to isolate the dictionaries where two == y. I'm unsure if I should be using:
- JSON selection (things.things[i].two)
- a for loop through things, then things[i] looking for two
- k/v pairs when I have 3 sets of keys
Can anyone point me in the right direction?
Assuming this is only ever one level deep (things), and you want a 'duplicate' of this dictionary with only the matching child dicts included, then you can do this with a dictionary comprehension:
data = {
    "things": {
        "firstThing": {
            "one": "x",
            "two": "y",
            "three": "z"
        },
        "secondThing": {
            "one": "a",
            "two": "b",
            "three": "c"
        },
        "thirdThing": {
            "one": "x",
            "two": "y",
            "three": "z"
        }
    }
}
print({"things": {k:v for k, v in data['things'].items() if 'two' in v and v['two'] == 'y'}})
Since you've tagged this with python I assume you'd prefer a python solution. If you know that your 'two' key (whatever it is) is only present at the level of objects that you want, this might be a nice place for a recursive solution: a generator that takes a dictionary and yields any sub-dictionaries that have the correct key and value. This way you don't have to think too much about the structure of your data. Something like this will work, if you're using at least Python 3.3:
def findSubdictsMatching(target, targetKey, targetValue):
    if not isinstance(target, dict):
        # base case
        return
    # check "in" rather than get() to allow None as target value
    if targetKey in target and target[targetKey] == targetValue:
        yield target
    else:
        for key, value in target.items():
            yield from findSubdictsMatching(value, targetKey, targetValue)
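Called against the data dict from above, the generator can be consumed like this (a sketch; wrap it in list() to collect the matches):
matches = list(findSubdictsMatching(data, 'two', 'y'))
print(matches)
# [{'one': 'x', 'two': 'y', 'three': 'z'}, {'one': 'x', 'two': 'y', 'three': 'z'}]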
This code allows you to collect the objects with "two": "y" into a list:
import json

m = '{"things":{"firstThing":{"one":"x","two":"y","three":"z"},"secondThing":{"one":"a","two":"b","three":"c"},"thirdThing":{"one":"x","two":"y","three":"z"}}}'
l = json.loads(m)
y_objects = []
for key in l["things"]:
    l_2 = l["things"][key]
    for key_1 in l_2:
        if key_1 == "two":
            if l_2[key_1] == 'y':
                y_objects.append(l_2)
print(y_objects)
Console:
[{'one': 'x', 'two': 'y', 'three': 'z'}, {'one': 'x', 'two': 'y', 'three': 'z'}]

Python `dict` indexed by tuple: Getting a slice of the pie

Let's say I have
my_dict = {
    ("airport", "London"): "Heathrow",
    ("airport", "Tokyo"): "Narita",
    ("hipsters", "London"): "Soho"
}
What is an efficient (no scanning of all keys), yet elegant way to get all airports out of this dictionary, i.e. expected output ["Heathrow", "Narita"]. In databases that can index by tuples, it's usually possible to do something like
airports = my_dict.get(("airport",*))
(but usually only with the 'stars' sitting at the rightmost places in the tuple since the index usually is only stored in one order).
Since I imagine Python indexes dictionaries with tuple keys in a similar way (using the keys' inherent order), I imagine there might be a method I could use to slice the index this way?
Edit1: Added expected output
Edit2: Removed last phrase. Added '(no scanning of all keys)' to the conditions to make it clearer.
The way your data is currently organized doesn't allow efficient lookup - essentially you have to scan all the keys.
Dictionaries are hash tables behind the scenes, and the only way to access a value is to get the hash of the key - and for that, you need the whole key.
Use a nested hierarchy like this, so you can do a direct O(1) lookup:
my_dict = {
    "airport": {
        "London": "Heathrow",
        "Tokyo": "Narita",
    },
    "hipsters": {
        "London": "Soho"
    }
}
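With that layout, looking up all airports is a single dictionary access (a sketch; .get() with an empty dict as the default avoids a KeyError for categories that aren't present):
airports = list(my_dict.get("airport", {}).values())
print(airports)  # ['Heathrow', 'Narita']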
Check "airport" is present in the every key in the dictionary.
Demo:
>>> [value for key, value in my_dict.items() if "airport" in key]
['Narita', 'Heathrow']
>>>
Yes, a nested dictionary will be the better option.
>>> my_dict = {
... "airport": {
... "London": "Heathrow",
... "Tokyo": "Narita",
... },
... "hipsters": {
... "London": "Soho"
... }
... }
>>>
>>> if "airport" in my_dict:
... result = my_dict["airport"].values()
... else:
... result = []
...
>>> print result
['Heathrow', 'Narita']
>>>
What I'd like to avoid, if possible, is to go through all dictionary keys and filter them down.
Why? Why do you think Python is doing the equivalent of a DB full table scan? Filtering a dictionary does not mean sequentially scanning it.
Python:
[value for key, value in my_dict.items() if key[0] == "airport"]
Output:
['Narita', 'Heathrow']
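If you already have the tuple-keyed dictionary and want the nested layout the other answers suggest, a one-time conversion pass (a sketch; build_nested is just an illustrative name) gives you O(1) category lookups afterwards:
from collections import defaultdict

def build_nested(flat_dict):
    # Group a {(category, city): value} dict into {category: {city: value}}.
    nested = defaultdict(dict)
    for (category, city), value in flat_dict.items():
        nested[category][city] = value
    return dict(nested)

nested = build_nested(my_dict)
print(list(nested["airport"].values()))  # ['Heathrow', 'Narita']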
