Limit number of items / length of json for logging - python

I am working on an API that returns JSON. I am logging my responses, and sometimes the JSON is just absurdly long and basically clogs my log files. Is there a neat way to reduce the length of a JSON, purely for visually logging the data? (not in effect in production)
The basic approach is to reduce arrays over a length of 5 to [first 2, "...", last 2], and dictionaries with more than 4 items to {first 4, "..." : "..."}
The code below is ugly. I am aware that it should be a recursive solution that reduces the items in the same way for a JSON of arbitrary depth - it currently only does so for depth 2.
def log_reducer(response_log):
    original_response_log = response_log
    try:
        if type(response_log) == dict:
            if len(response_log) >= 4: # {123456}
                response_log = dict(list(response_log.items())[:4])
                response_log.update({"...": "..."}) # {1234...}
            for key, value in response_log.items():
                if type(value) == list:
                    if len(value) >= 5: # {key:[123456]}
                        new_item = value[:2] + ['...'] + value[-2:] # {[12...56]}
                        response_log.update({key: new_item})
                if type(value) == dict:
                    if len(value) >= 4: # {key:{123456}}
                        reduced_dict = dict(list(value.items())[:4])
                        reduced_dict.update({"...": "..."})
                        response_log.update({key: reduced_dict}) # {{1234...}}
        elif type(response_log) == list:
            if len(response_log) >= 5: # [123456]
                response_log = response_log[:2] + ['...'] + response_log[-2:] # [12...56]
            for inner_item in response_log:
                if type(inner_item) == list:
                    if len(inner_item) >= 5: # [[123456]]
                        reduced_list = inner_item[:2] + ['...'] + inner_item[-2:] # [[12...56]]
                        response_log.remove(inner_item)
                        response_log.append(reduced_list)
                if type(inner_item) == dict:
                    if len(inner_item) >= 4: # [{123456}]
                        reduced_dict = dict(list(inner_item.items())[:4])
                        reduced_dict.update({"...": "..."}) # [{1234...}]
                        response_log.remove(inner_item)
                        response_log.append(reduced_dict)
    except Exception as e:
        return original_response_log
    return response_log
The returned response_log is then logged with logger.info(str(response_log))
As you can see, the fact that there can be either arrays or dictionaries at every level makes this task a little more complex, and I am struggling to find a library or code snippet of any kind which would simplify this. If anyone wants to give it a shot, I would appreciate it a lot.
You can use a test JSON like this to see it in effect:
test_json = {"works": [1, 2, 3, 4, 5, 6],
             "not_affected": [{"1": "1", "2": "2", "3": "3", "4": "4", "5": "5"}],
             "1": "1", "2": "2", "3": "3",
             "removed": "removed"
             }
print("original", test_json)
reduced_log = log_reducer(test_json)
print("reduced", reduced_log)
print("original", test_json)
reduced_log = log_reducer([test_json]) # <- increases nesting depth
print("reduced", reduced_log)

This answer uses @calceamenta's idea, but implements the actual cutting-down logic:
def recursive_reduce(obj):
    # Scalars are returned unchanged.
    if isinstance(obj, (float, str, int, bool, type(None))):
        return obj
    if isinstance(obj, dict):
        keys = sorted(obj)
        if len(keys) > 5:
            # Keep the first two and last two keys, with a marker in between.
            new_dict = {k: recursive_reduce(obj[k]) for k in keys[:2]}
            new_dict["..."] = "..."
            new_dict.update((k, recursive_reduce(obj[k])) for k in keys[-2:])
        else:
            new_dict = {k: recursive_reduce(obj[k]) for k in keys}
        return new_dict
    if isinstance(obj, list):
        if len(obj) > 5:
            items = obj[:2] + ["..."] + obj[-2:]
        else:
            items = obj
        # Build a new list so the caller's data is never modified.
        return [recursive_reduce(v) for v in items]
    return str(obj)
test_json = {"works": [1, 2, 3, 4, 5, 6],
             "not_affected": [{"1": "1", "2": "2", "3": "3", "4": "4", "5": "5"}],
             "1": "1", "2": "2", "3": "3",
             "removed": "removed"
             }
print("original", test_json)
reduced_log = recursive_reduce(test_json)
print("reduced", reduced_log)
Output:
original {'works': [1, 2, 3, 4, 5, 6], 'not_affected': [{'1': '1', '2': '2', '3': '3', '4': '4', '5': '5'}], '1': '1', '2': '2', '3': '3', 'removed': 'removed'}
reduced {'1': '1', '2': '2', '...': '...', 'removed': 'removed', 'works': [1, 2, '...', 5, 6]}
Hope this helps :)
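For comparison, here is a minimal sketch that sticks to the thresholds described in the question (lists longer than 5 become first 2, "...", last 2; dicts with more than 4 items keep the first 4 plus a "..." marker) and returns a new object instead of touching the input. The names shorten, max_list and max_dict are made up for this example, and it assumes the data is already JSON-like (dicts, lists and scalars).

```python
import json
from itertools import islice

def shorten(obj, max_list=5, max_dict=4):
    """Return a shortened copy of a JSON-like object, for logging only."""
    if isinstance(obj, dict):
        # Keep the first max_dict items in insertion order.
        reduced = {k: shorten(v, max_list, max_dict)
                   for k, v in islice(obj.items(), max_dict)}
        if len(obj) > max_dict:
            reduced["..."] = "..."
        return reduced
    if isinstance(obj, list):
        if len(obj) > max_list:
            obj = obj[:2] + ["..."] + obj[-2:]
        return [shorten(v, max_list, max_dict) for v in obj]
    return obj  # scalars are passed through unchanged

sample = {"works": [1, 2, 3, 4, 5, 6],
          "nested": [{"1": "1", "2": "2", "3": "3", "4": "4", "5": "5"}],
          "a": 1, "b": 2, "c": 3}
short = shorten(sample)
print(json.dumps(short))
```

Because every level builds a fresh dict or list, the original response can still be returned to the client untouched after logging.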

You can customize the string representation of dicts and lists in Python by defining a __str__() method (on your own subclasses, since the built-in types cannot be patched). Alternatively, just recursively call a print function on all elements. It can start from simple boilerplate like this:
def custom_print(obj):
    log_str = ''
    if type(obj) == list:
        for item in obj:
            log_str += custom_print(item)
    elif type(obj) == dict:
        for k, item in obj.items():
            log_str += str(k) + ': ' + custom_print(item)
    else:
        # Leaf value: add its plain string form.
        log_str += str(obj) + ' '
    return log_str
Extend this custom log function with your own truncation rules and use it to print into your log file as per your log file format.
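If an approximate, repr-style cut-off is good enough, the standard library's reprlib module already truncates nested containers, so no custom recursion is needed. A minimal sketch; the limits chosen here are arbitrary, and the result is a repr-like string rather than valid JSON:

```python
import reprlib

short_repr = reprlib.Repr()
short_repr.maxlist = 4     # show at most 4 list items, then "..."
short_repr.maxdict = 4     # show at most 4 dict items, then "..."
short_repr.maxlevel = 6    # how many nesting levels to descend into
short_repr.maxstring = 60  # longer strings are shortened in the middle

data = {"works": [1, 2, 3, 4, 5, 6], "name": "x"}
line = short_repr.repr(data)
print(line)
```

Note that reprlib cuts off the tail of a container rather than keeping the last items, so it does not reproduce the [first 2, "...", last 2] layout from the question.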

Related

Python: JSON, Problem with list of dictionaries, certain value into my dictionary

I have been working on this for a long time but can't seem to get it right. I have to use a JSON file to build a dictionary whose keys are the 'userId' values and whose values are the number of 'completed' tasks for each user. The best I got was the answer: {1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 90}, with the code below:
import requests
response1 = requests.get("https://jsonplaceholder.typicode.com/todos")
data1 = response1.json()
dict1 = {}
keys = []
values = []
for user in data1:
    if user not in keys or values:
        keys.append(user['userId'])
        values.append(0)
for key, value in zip(keys, values):
    dict1[key] = value
for user in data1:
    if user['completed'] == True:
        dict1[key] += 1
print(dict1)
I feel like this next code would be closer, but I can't figure out how to get it to work:
import requests
response1 = requests.get("https://jsonplaceholder.typicode.com/todos")
data1 = response1.json()
dict1 = {}
keys = []
values = []
for user in data1:
    if user not in keys or values:
        keys.append(user['userId'])
        values.append(0)
for key, value in zip(keys, values):
    dict1[key] = value
for key, value in data1.items():
    if user['completed'] == True:
        dict1[key].update += 1
print(dict1)
After this, the output is just
" line 24, in
for key, value in data1.items():
AttributeError: 'list' object has no attribute 'items'",
I do get why, I just don't know how to continue from here.
I would really appreciate anyone's help with this obnoxious task.
Can you try this?
import requests
response1 = requests.get("https://jsonplaceholder.typicode.com/todos")
data1 = response1.json()
dict1 = {}
keys = []
values = []
for user in data1:
    if user not in keys or values:
        keys.append(user['userId'])
        values.append(0)
for key, value in zip(keys, values):
    dict1[key] = value
print(data1)
for x in data1:
    for key, value in x.items():
        if key == "completed":
            if value == True:
                dict1[x["userId"]] += 1
print(dict1)
Try with this approach:
import json
import requests
response1 = requests.get("https://jsonplaceholder.typicode.com/todos")
data1 = response1.json()
dict1 = {}
for user in data1:
    uid = user['userId']
    if user['completed']:
        dict1[uid] = dict1.get(uid, 0) + 1
    elif uid not in dict1:
        dict1[uid] = 0
print(dict1)
print(json.dumps(dict1, indent=2))
If needed, you can also simplify the above logic using defaultdict, and leverage the fact that bool is a subclass of int:
from collections import defaultdict
dict1 = defaultdict(int)
for user in data1:
    dict1[user['userId']] += user['completed']
Output:
{1: 11, 2: 8, 3: 7, 4: 6, 5: 12, 6: 6, 7: 9, 8: 11, 9: 8, 10: 12}
{
  "1": 11,
  "2": 8,
  "3": 7,
  "4": 6,
  "5": 12,
  "6": 6,
  "7": 9,
  "8": 11,
  "9": 8,
  "10": 12
}
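The same counting can also be written with collections.Counter. The sketch below runs offline on a small hand-made list (the todos variable and its contents are invented stand-ins for the /todos response) so the result is easy to check by eye:

```python
from collections import Counter

# Hypothetical stand-in for the /todos response: a list of dicts.
todos = [
    {"userId": 1, "id": 1, "completed": False},
    {"userId": 1, "id": 2, "completed": True},
    {"userId": 2, "id": 3, "completed": True},
    {"userId": 2, "id": 4, "completed": True},
    {"userId": 3, "id": 5, "completed": False},
]

# Start every user at 0 so users with no completed tasks still appear.
completed = Counter({todo["userId"]: 0 for todo in todos})
# Then add one for each completed task.
completed.update(todo["userId"] for todo in todos if todo["completed"])
print(dict(completed))
```

The key point compared with the code in the question is that the counter is indexed by the userId of the todo currently being looked at, not by a loop variable left over from an earlier loop.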

Dictionary/recursive/ counting parts (exercise[LOGIC])

Problem: create a recursive function that, given an input key, returns the number of basic components needed to build that key.
EX 1) input = "Engine"
output = Engine ==> metal: 3, rubber: 2
EX 2) input = "metal"
output = metal ==> metal: 1
EX 3) input = "piston"
output = piston ==> metal: 1, rubber: 1
car= {
    "Engine" : ["pistons", "timing belt", "metal" ,"metal"],
    "Pistons" : ["Metal", "rubber"],
    "timing belt" : ["rubber"],
    "metal" : [],
    "rubber" : []
}
my code has different variable names and key name, but it's the same idea
parts = {
    'A': ['B', 'B', 'C'],
    'B': [],
    'C': ['D','E','F'],
    'D': [],
    'E': ['B','D'],
    'F': []
}
#above here its user input
counter_dictio = {
    'A': [],
    'B': [],
    'C': [],
    'D': [],
    'E': [],
    'F': []
}
def desamble(key, dictionary):
    #check if array is empty
    #ccounter +=1
    if (len(dictionary[key])) == 0:
        counter_dictio[key].append(key)
    #if array is populated
    #enter to this array
    #desample(i, dictionary)
    else:
        for i in dictionary[key]:
            desamble(i, dictionary)
key = "A"
desamble(key, parts)
One way to go is:
from collections import Counter
car= {
    "engine": ["pistons", "timing belt", "metal", "metal"],
    "pistons": ["metal", "rubber"],
    "timing belt": ["rubber"],
    "metal": [],
    "rubber": []
}
def ingredients(key, dct):
    if dct[key] == []:
        yield key
    else:
        for sub_part in dct[key]:
            yield from ingredients(sub_part, dct)
print(*ingredients('engine', car)) # metal rubber rubber metal metal
print(Counter(ingredients('engine', car))) # Counter({'metal': 3, 'rubber': 2})
ingredients makes a generator of ingredients, so you can use Counter to count them.
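If you would rather have the recursive function return the counts directly and print them in the "key ==> part: n" format from the exercise, something along these lines should work. The function names count_parts and describe are made up for this sketch, and it assumes all keys are spelled consistently (lowercase here) and that the dictionary has no cycles:

```python
from collections import Counter

car = {
    "engine": ["pistons", "timing belt", "metal", "metal"],
    "pistons": ["metal", "rubber"],
    "timing belt": ["rubber"],
    "metal": [],
    "rubber": [],
}

def count_parts(key, parts):
    """Count the basic components (parts without sub-parts) needed for key."""
    sub_parts = parts[key]
    if not sub_parts:
        # Base case: a basic component counts as one of itself.
        return Counter({key: 1})
    total = Counter()
    for sub in sub_parts:
        total += count_parts(sub, parts)
    return total

def describe(key, parts):
    counts = count_parts(key, parts)
    body = ", ".join(f"{name}: {n}" for name, n in sorted(counts.items()))
    return f"{key} ==> {body}"

print(describe("engine", car))
print(describe("metal", car))
```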
Here is code that will group by your components
from collections import defaultdict
parts = {
    'A': ['B', 'B', 'C'],
    'B': [],
    'C': ['D', 'E', 'F'],
    'D': [],
    'E': ['B', 'D'],
    'F': []
}
result = defaultdict(dict)
for k, v in parts.items():
    row = result[k] # used to create empty dict on initial empty list
    for item in v:
        if row.get(item) is None:
            row[item] = 1
        else:
            row[item] += 1
This results in the following dict, which counts only the direct sub-parts of each key:
{'A': {'B': 2, 'C': 1}, 'B': {}, 'C': {'D': 1, 'E': 1, 'F': 1}, 'D': {}, 'E': {'B': 1, 'D': 1}, 'F': {}}
Another solution, without recursion and hard-coded for a predetermined dictionary, would be:
#==== user input====
key = "E"
parts = {
    'A': ['B', 'B', 'C'],
    'B': [],
    'C': ['D','E','F'],
    'D': [],
    'E': ['B','D'],
    'F': []
}
#====end user input=====
#Function for the predetermined dictionary
def desamble(key, dictionary):
    if key == "B" or key == "D" or key == "F":
        print(key + "==> " + key + ": 1")
    elif key == "E":
        print(key + "==> " + "B: 1, D: 1" )
    elif key == "C":
        print(key + "==> " + "B: 1, D: 2, F: 1" )
    elif key == "A":
        print(key + "==> " + "B: 3, D: 2, F: 1" )
    else:
        print("Key " + key + " is not defined in dictionary")
#====end====
desamble(key, parts)
Another recursive way to solve this problem, which also handles circularity, i.e. the case where parts refer to each other.
EX)
dictionary = {
    "A": "B",
    "B": "A"
}
from collections import Counter
def func(key, dictionary, current=None):
    if not current:
        current = set() # could also be a list
    if key in current:
        raise ValueError # or whichever
    if not dictionary.get(key):
        return [key]
    ret = []
    for subkey in dictionary[key]:
        ret.extend(func(subkey, dictionary, current.union({key})))
    return ret
print(Counter(func("A", parts))) # parts is the dictionary from the question
#by officerthegeeks

Keep track of level of nested keys for recursive dict function?

I have a recursive function that grabs all keys from a python dictionary no matter what level of nesting. While this works great, I am trying to keep track of the level of nesting for each key at the same time. Some kind of counter, but not sure how to implement it. Below is what I have so far:
d = {"12": "a",
     "3": "b",
     "8": {
         "12": "c",
         "25": "d"
     }
     }
keys_list = []
def iterate(dictionary):
    for key, value in dictionary.items():
        if key not in keys_list:
            keys_list.append(key)
        if isinstance(value,dict):
            iterate(value)
            continue
iterate(d)
This returns:
keys_list = ['12', '3', '8', '25']
Right now the nested "12" is being ignored because it is already in the list, but I need some sort of unique identifier for the second 12 so it is included too. Any thoughts?
You could add a depth argument to your recursive function:
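d = {"12": "a",
     "3": "b",
     "8": {
         "12": "c",
         "25": "d"
     }
     }
keys_list = []
def iterate(dictionary, depth=0):
    for key, value in dictionary.items():
        # Store every key with its depth; no membership check is needed
        # because (key, depth) tuples tell repeated keys apart.
        keys_list.append((key, depth))
        if isinstance(value,dict):
            # Pass depth + 1 down instead of changing depth in place, so
            # keys that follow the nested dict keep the current depth.
            iterate(value, depth + 1)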
iterate(d)
print(keys_list)
Output:
[('12', 0), ('3', 0), ('8', 0), ('12', 1), ('25', 1)]
This gives a list of tuples, where the first value in each tuple is the key, and the second value is the depth.
EDIT
The code below should cover different cases more reliably than the code above (but I did change your iterate function somewhat):
d = {"12": "a",
     "3": "b",
     "8": {
         "12": "c",
         "25": "d"
     },
     "test":"a"
     }
KEYS = []
DEPTH = 0 # keep counter updated globally too
def iterate(dictionary):
    global DEPTH
    for key, value in dictionary.items():
        KEYS.append((key, DEPTH))
        if isinstance(value, dict):
            DEPTH += 1
            iterate(value)
            DEPTH -= 1 # step back up one level once the nested dict is done
iterate(d)
print(KEYS)
Output:
[('12', 0), ('3', 0), ('8', 0), ('12', 1), ('25', 1), ('test', 0)]
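If the goal is a truly unique identifier for each key, recording the full path of parent keys works without any global state. A minimal sketch; walk_keys is a made-up name, and the sample dictionary adds one extra nesting level to show that depths beyond 1 come out right:

```python
def walk_keys(dictionary, path=()):
    """Yield (path, depth) for every key; path is the tuple of keys leading to it."""
    for key, value in dictionary.items():
        current = path + (key,)
        yield current, len(path)
        if isinstance(value, dict):
            # Recurse with the extended path; depth follows from its length.
            yield from walk_keys(value, current)

d = {"12": "a", "3": "b", "8": {"12": "c", "25": {"7": "e"}}, "test": "a"}
result = list(walk_keys(d))
for path, depth in result:
    print(depth, "/".join(path))
```

Here the top-level "12" is identified by ("12",) and the nested one by ("8", "12"), so both can be kept.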

What can I do to speed up this code for a scrabble cheat?

I'm making a Python script that accepts 7 letters and returns the highest scoring word along with all other possible words. At the moment it has a few "loops in loops" and other things that will slow down the process.
import json
#open file and read the words, output as a list
def load_words():
    try:
        filename = "dictionary_2.json"
        with open(filename,"r") as english_dictionary:
            valid_words = json.load(english_dictionary)
            return valid_words
    except Exception as e:
        return str(e)
#make dictionary shorter as there will be maximum 7 letters
def quick():
    s = []
    for word in load_words():
        if len(word) <= 7:
            s.append(word)
    return s
# takes letters from user and creates all combinations of the letters
def scrabble_input(a):
    l=[]
    for i in range(len(a)):
        if a[i] not in l:
            l.append(a[i])
        for s in scrabble_input(a[:i] + a[i + 1:]):
            if (a[i] + s) not in l:
                l.append(a[i] + s)
    return l
#finds all words that can be made with the input by matching combo's to the dictionary and returns them
def word_check(A):
    words_in_dictionary = quick()
    for word in scrabble_input(A):
        if word in words_in_dictionary:
            yield word
#gives each word a score
def values(input):
    # scrabble values
    score = {"a": 1, "c": 3, "b": 3, "e": 1, "d": 2, "g": 2,
             "f": 4, "i": 1, "h": 4, "k": 5, "j": 8, "m": 3,
             "l": 1, "o": 1, "n": 1, "q": 10, "p": 3, "s": 1,
             "r": 1, "u": 1, "t": 1, "w": 4, "v": 4, "y": 4,
             "x": 8, "z": 10}
    word_total = 0
    for word in word_check(input):
        for i in word:
            word_total = word_total + score[i.lower()]
        yield (word_total, str(word))
        word_total = 0
#prints the tuples that have (scrabble score, word used)
def print_words(a):
    for i in values(a):
        print(i)
#final line to run, prints answer
def answer(a):
    print ('Your highest score is', max(values(a))[0], ', and below are all possible words:')
    print_words(a)
answer(input("Enter your 7 letters"))
I have removed some of the for loops and have tried to make the JSON dictionary I found shorter by limiting it to words of at most 7 letters. I suppose I could do that once up front so that it doesn't need to happen each time I run the script. Any other tips on how to speed it up?
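One possible direction, sketched below under some assumptions: instead of generating thousands of letter arrangements and testing each one with "in" against a list (a linear scan every time), walk the word list once and check whether each word can be built from the rack by comparing letter counts. The names SCORES, can_build and best_words are invented for this sketch, the tiny words list stands in for dictionary_2.json, and all words are assumed to be lowercase.

```python
from collections import Counter

SCORES = {"a": 1, "c": 3, "b": 3, "e": 1, "d": 2, "g": 2,
          "f": 4, "i": 1, "h": 4, "k": 5, "j": 8, "m": 3,
          "l": 1, "o": 1, "n": 1, "q": 10, "p": 3, "s": 1,
          "r": 1, "u": 1, "t": 1, "w": 4, "v": 4, "y": 4,
          "x": 8, "z": 10}

def can_build(word, rack_counts):
    """True if word uses no letter more often than the rack provides."""
    # Counter subtraction keeps only positive counts, so an empty
    # result means every letter of the word is covered by the rack.
    return not (Counter(word) - rack_counts)

def best_words(rack, words):
    """Return (score, word) pairs for all buildable words, best first."""
    rack_counts = Counter(rack.lower())
    found = [(sum(SCORES[ch] for ch in w), w)
             for w in words
             if len(w) <= len(rack) and can_build(w, rack_counts)]
    return sorted(found, reverse=True)

# Hypothetical tiny word list standing in for dictionary_2.json.
words = ["cat", "act", "tack", "at", "dog", "zzz"]
print(best_words("tacxkqz", words))
```

This does one pass over the dictionary per rack. If you prefer to keep the permutation approach, loading the words into a set once (rather than a list) already makes each membership test much cheaper.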

Given a mapping of user accounts to dates and a sorted list of dates, efficiently return a mapping of the indexes at which each user's date falls in the sorted list

I am duplicating Facebook's chat read receipt system. I wrote some basic code I think works. However my boss thinks it would be slow. I have no algorithms training. What is the most efficient way to return a mapping of indexes to numbers where the numbers are between two numbers in a sorted list and the index is the index of the first number in the between pair?
# Given say {"a": 3, "b": 10, "c": 7, "d": 19} and [1,5,15] return {0: ["a"], 1: ["b", "c"], 2: ["d"]}
def find_read_to(read_dates, message_dates):
    read_indexes_to_user_ids = {}
    for user_id in read_dates:
        for i, date in enumerate(message_dates):
            last_index = i + 1 == len(message_dates)
            next_index = -1 if last_index else i + 1
            if last_index or (read_dates[user_id] >= date and read_dates[user_id] < message_dates[next_index]):
                if i in read_indexes_to_user_ids:
                    read_indexes_to_user_ids[i].append(user_id)
                else:
                    read_indexes_to_user_ids[i] = [user_id]
                break
    return read_indexes_to_user_ids
find_read_to({"a": 3, "b": 10, "c": 7, "d": 19}, [1,5,15])
Version using bisect module
import bisect
def find_read_to(read_dates, message_dates):
    read_indexes_to_user_ids = {}
    user_ids, read_dates = zip(*read_dates.items())
    def find_between(read_date):
        answer = bisect.bisect_left(message_dates, read_date)
        answer -= 1
        if answer == -1:
            return None
        return answer
    indexes_for_read_up_to = map(find_between, read_dates)
    for i, index_for_read_up_to in enumerate(indexes_for_read_up_to):
        user_id = user_ids[i]
        if index_for_read_up_to is None:
            continue
        if index_for_read_up_to in read_indexes_to_user_ids:
            read_indexes_to_user_ids[index_for_read_up_to].append(user_id)
        else:
            read_indexes_to_user_ids[index_for_read_up_to] = [user_id]
    return read_indexes_to_user_ids
find_read_to({"a": 3, "b": 10, "c": 7, "d": 19}, [1,5,15])
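A more compact variant of the bisect idea, as a sketch rather than a drop-in replacement. It makes two assumptions that are not stated in the question: a read date exactly equal to a message date counts as having read that message (hence bisect_right, which matches the ">=" test in the loop version), and users who read before the first message are left out of the result.

```python
import bisect
from collections import defaultdict

def find_read_to(read_dates, message_dates):
    """Map message index -> user ids whose read date is at or after that
    message and before the next one. message_dates must be sorted."""
    result = defaultdict(list)
    for user_id, read_date in read_dates.items():
        # bisect_right counts the messages dated <= read_date;
        # subtracting 1 gives the index of the last message read.
        index = bisect.bisect_right(message_dates, read_date) - 1
        if index >= 0:  # skip users who read before the first message
            result[index].append(user_id)
    return dict(result)

print(find_read_to({"a": 3, "b": 10, "c": 7, "d": 19}, [1, 5, 15]))
```

Each lookup is a binary search, so the total work grows with (number of users) x log(number of messages) instead of (number of users) x (number of messages) as in the nested-loop version.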
