Looping through all JSON children - python

Right now I have a for loop that checks, one child at a time, whether a key's value equals a variable.
I do this by indexing [0] and [1] to get the first two children. There could be up to four children; is there a more efficient way to do this than elif?
from collections import OrderedDict

# INITIALIZE NEW FILTERED DICTIONARY (RETAINING TOP LEVEL ITEMS)
newdata = OrderedDict({k: v for k, v in data.items() if k in ['stop_id', 'stop_name']})
newdata['mode'] = []
arrivalarray = []
# ITERATE CONDITIONALLY KEEPING NEEDED SECTIONS
for i in data['mode']:
    if i['route'][0]['route_name'] == line:
        if i['route'][0]['direction'][0]['direction_name'] == direction:
            for s in i['route'][0]['direction'][0]['trip']:
                arrivalarray.append(s['pre_away'])
        elif i['route'][0]['direction'][1]['direction_name'] == direction:
            for s in i['route'][0]['direction'][1]['trip']:
                arrivalarray.append(s['pre_away'])

Yes, you could use recursion instead of iteration. Walking nested data recursively like this is a depth-first search (DFS).
def traverse_json(obj, depth):
    """Collect leaf values from nested dicts, going at most `depth` levels deep."""
    if depth == 0:
        return []
    data = []
    for key in obj.keys():
        if isinstance(obj[key], dict):
            data += traverse_json(obj[key], depth - 1)
        else:
            data.append(obj[key])
    return data
You could start with the maximum depth you need.
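The function above only descends into dicts, but the data in the question is mostly lists of dicts. The following is a minimal DFS sketch that descends into both. It collects every value stored under a given key at any depth and does not filter by route or direction. The name find_values and the sample data are hypothetical.

```python
def find_values(obj, key):
    """Depth-first search: yield every value stored under `key` in nested dicts/lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            else:
                # not the key we want: search inside the value
                yield from find_values(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_values(item, key)
    # any other type (str, int, ...) is a leaf with no children: yield nothing


# Hypothetical sample shaped like the question's data
sample = {
    "stop_id": "1",
    "mode": [
        {"route": [{"route_name": "Red",
                    "direction": [
                        {"direction_name": "North", "trip": [{"pre_away": "120"}]},
                        {"direction_name": "South", "trip": [{"pre_away": "300"}, {"pre_away": "900"}]},
                    ]}]},
    ],
}

print(list(find_values(sample, "pre_away")))  # ['120', '300', '900']
```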

Once you've loaded the JSON data, it's no longer JSON data. It's just a nested series of Python lists, dicts, strings, etc. As such, you can do what you'd do for any Python data structure, such as use a for loop to iterate over the elements of a list:
for d in i['route'][0]['direction']:
    if d['direction_name'] == direction:
        for s in d['trip']:
            arrivalarray.append(s['pre_away'])
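The same idea can be applied to the whole loop from the question. Instead of hard-coding [0], it iterates over every route and every direction. This is a minimal sketch that assumes the structure implied by the question (mode -> route -> direction -> trip). The function name arrival_times and the sample values are hypothetical. Unlike the original loop, it also checks routes beyond the first one.

```python
def arrival_times(data, line, direction):
    """Collect pre_away for every trip on the matching route and direction."""
    arrivalarray = []
    for mode in data["mode"]:
        for route in mode["route"]:          # any number of routes
            if route["route_name"] != line:
                continue
            for d in route["direction"]:     # any number of directions
                if d["direction_name"] == direction:
                    for s in d["trip"]:
                        arrivalarray.append(s["pre_away"])
    return arrivalarray


# Hypothetical sample data
data = {
    "mode": [
        {"route": [{"route_name": "Red",
                    "direction": [
                        {"direction_name": "North", "trip": [{"pre_away": "120"}]},
                        {"direction_name": "South", "trip": [{"pre_away": "300"}, {"pre_away": "900"}]},
                    ]}]},
        {"route": [{"route_name": "Blue",
                    "direction": [
                        {"direction_name": "South", "trip": [{"pre_away": "60"}]},
                    ]}]},
    ],
}

print(arrival_times(data, "Red", "South"))  # ['300', '900']
```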

Related

How to detect last call of a recursive function?

I have a list of complex dictionaries like this:
data = [
    {
        "l_1_k": 1,
        "l_1_ch": [
            {
                "l_2_k": 2,
                "l_2_ch": [...more levels]
            },
            {
                "l_2_k": 3,
                "l_2_ch": [...more levels]
            }
        ]
    },
    ...more items
]
I'm trying to flatten this structure to a list of rows like this:
list = [
    { "l_1_k": 1, "l_2_k": 2, ... },
    { "l_1_k": 1, "l_2_k": 3, ... },
]
I need this list to build a pandas data frame.
So I recurse into each nesting level, and at the last level I'm trying to append to the rows list.
def build_dict(d, row_dict, rows):
    # d is the data dictionary at each nesting level
    # row_dict is the final row dictionary
    # rows is the final list of rows
    for key, value in d.items():
        if not isinstance(value, list):
            row_dict[key] = value
        else:
            for child in value:
                build_dict(child, row_dict, rows)
    rows.append(row_dict)  # <- How to detect the last recursion and call the append
I'm calling this function like this:
rows = []
for row in data:
    build_dict(d=row, row_dict={}, rows=rows)
My question is how to detect the last call of this recursive function if I do not know how many nesting levels there are. With the current code, the row is duplicated at each nesting level.
Or, is there a better approach to obtain the final result?
After looking up some ideas, the solution I have in mind is this:
Declare the following function, taken from here:
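def find_depth(d):
    # counts levels of nested dicts, looking inside lists as well
    if isinstance(d, dict):
        return 1 + (max(map(find_depth, d.values())) if d else 0)
    if isinstance(d, list):
        return max(map(find_depth, d)) if d else 0
    return 0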
In your function, increment depth every time you go one level deeper:
def build_dict(d, row_dict, rows, depth=1):
    # depth = 1 for the beginning
    for key, value in d.items():
        if not isinstance(value, list):
            row_dict[key] = value
        else:
            for child in value:
                build_dict(child, row_dict, rows, depth + 1)
Finally, test whether you have reached the maximum depth, and if so, append the row at the end of the function. You will need an extra parameter, max_depth:
def build_dict(d, row_dict, rows, max_depth, depth=1):
    # depth = 1 for the beginning
    for key, value in d.items():
        if not isinstance(value, list):
            row_dict[key] = value
        else:
            for child in value:
                # pass a copy so sibling branches don't overwrite each other's values
                build_dict(child, dict(row_dict), rows, max_depth, depth + 1)
    if depth == max_depth:
        rows.append(row_dict)
Call the function as:
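for row in data:
    build_dict(d=row, row_dict={}, rows=rows, max_depth=find_depth(row))
Keep in mind that since I don't have a dataset to test with, there might be a syntax error or two in there, but the approach should be fine. It does assume that every branch is equally deep: rows that end at a shallower level are never appended.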
I don't think it is good practice to play with mutable arguments (like row_dict) that the function modifies in place.
Also, I think that the function in the recursion loop should never be aware of the level it is in. That's the point of the recursion. Instead, you need to think about what the function should return, and when it should exit the recursion loop to climb back to the zeroth level. On the climb back, higher level function calls handle the return value of lower level function calls.
Here is code that I think will work. I am not sure it is optimal in terms of computing time.
Edit: fixed to return a list of dicts instead of a single dict.
def build_dict(d):
    """
    Return a list of flat row dicts; a dict with no lower-level
    list of dicts yields a single row.
    """
    lst = [{}]  # partial rows built so far at this level
    for key, value in d.items():
        if not isinstance(value, list):
            # scalar value: add it to every partial row
            for elm in lst:
                elm[key] = value
        else:
            # recurse into each child and collect its flattened rows
            lst_lower_levels = []
            for child in value:
                lst_lower_levels.extend(build_dict(child))
            # combine every partial row with every lower-level row
            new_lst = []
            for elm in lst:
                for elm_ll in lst_lower_levels:
                    new_lst.append({**elm, **elm_ll})
            lst = new_lst
    return lst

rows = []
for row in data:
    rows.extend(build_dict(d=row))
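The same "let the recursion decide when a row is finished" idea can also be written as a generator. It carries the fields collected so far down the recursion and yields a finished row whenever a dict has no list-valued children, so no depth counting is needed. This is a minimal sketch. The name iter_rows and the sample data are hypothetical, and it assumes each dict has at most one list-valued key holding child dicts, as in the question's example.

```python
def iter_rows(d, prefix=None):
    """Yield one flat row dict per deepest-level dict in a nested structure.

    Assumes each dict has at most one list-valued key, holding child dicts.
    """
    row = dict(prefix or {})  # copy, so sibling branches never share one dict
    children = []
    for key, value in d.items():
        if isinstance(value, list):
            children = value  # lower level: recurse once all scalars are collected
        else:
            row[key] = value
    if not children:
        yield row  # nothing below this level, so the row is complete
    else:
        for child in children:
            yield from iter_rows(child, row)


# Hypothetical three-level sample
data = [
    {
        "l_1_k": 1,
        "l_1_ch": [
            {"l_2_k": 2, "l_2_ch": [{"l_3_k": 4}, {"l_3_k": 5}]},
            {"l_2_k": 3, "l_2_ch": [{"l_3_k": 6}]},
        ],
    },
]

rows = [r for item in data for r in iter_rows(item)]
for r in rows:
    print(r)
# {'l_1_k': 1, 'l_2_k': 2, 'l_3_k': 4}
# {'l_1_k': 1, 'l_2_k': 2, 'l_3_k': 5}
# {'l_1_k': 1, 'l_2_k': 3, 'l_3_k': 6}
```

The resulting list of dicts can be passed directly to pandas.DataFrame(rows).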

In Python, How to assign value of variable to the dictionary, where the variable will keep getting values for each iteration

Ex:
for x in myresult:
    y = str(x)
    if y.startswith('(') and y.endswith(')'):
        y = y[2:-3]
        y = y.replace("\\", "").replace(";", '')
        chr_num = y.find("property_name")
        chr_num = chr_num + 15
        PropertyName = y[chr_num:-1]
        chr_num1 = y.find("phrase_value")
        chr_num1 = chr_num1 + 14
        chr_num2 = y.find("where")
        chr_num2 = chr_num2 - 2
        PhraseValue = y[chr_num1:chr_num2]
This is the existing code. Now I want to store 'PhraseValue' in a dictionary or array.
NOTE: PhraseValue gets a new value on each iteration.
This is a very basic question. In your case, PropertyName and PhraseValue are overwritten on each iteration and contain only the last values at the end of the loop.
If you want to store multiple values, the easiest structure is a list.
ret = []  # empty list
for x in some_iterator():
    y = some_computation(x)
    ret.append(y)  # add the value to the list
# ret contains all the y's
If you want to use a dict, you have to compute a key and a value:
ret = {}  # empty dict
for x in some_iterator():
    y = some_computation(x)
    k = the_key(x)  # maybe k = x
    ret[k] = y  # map k to y
# ret contains all the k's and their mapped values.
The choice between a list and a dict depends on your specific problem: use a dict if you want to find values by key, like in a dictionary; use a list if you need ordered values.
Assuming that PropertyName is the key, then you could simply add
results = {}
before the loop, and
results[PropertyName] = PhraseValue
as the last line of the if statement inside the loop.
This solution does have one problem. What if a given PropertyName occurs more than once? The above solution would only keep the last found value.
If you want to keep all values, you can use collections.defaultdict:
import collections
results = collections.defaultdict(list)
Then, as the last line of the if statement inside the loop:
results[PropertyName].append(PhraseValue)
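To see the difference between the two approaches, here is a small runnable comparison. The (PropertyName, PhraseValue) pairs are hypothetical stand-ins for what the parsing loop produces, one pair per iteration.

```python
import collections

# Hypothetical (PropertyName, PhraseValue) pairs, one per loop iteration
parsed = [("color", "red"), ("size", "large"), ("color", "blue")]

last_only = {}
all_values = collections.defaultdict(list)
for property_name, phrase_value in parsed:
    last_only[property_name] = phrase_value          # later values overwrite earlier ones
    all_values[property_name].append(phrase_value)   # every value is kept

print(last_only)         # {'color': 'blue', 'size': 'large'}
print(dict(all_values))  # {'color': ['red', 'blue'], 'size': ['large']}
```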

How to iterate over nested data when there is no reliable order but need of accessing and checking all elements of the lowest level?

I came across this question in a very specific context but I soon realized that it has a quite general relevance.
FYI: I'm getting data from a framework, and at one point I have transformed it into a list of unordered pairs (it could be a list of lists or tuples of any size as well, but at the moment I have 100% pairs). In my case these pairs represent relationships between data objects, and I want to refine my data.
I have a list of unordered tuples and want a list of objects, or in this case a dict of dicts. If the same letter indicates the same class and differing numbers indicate different instances, I want to accomplish this transformation:
[(a1, x1), (x2, a2), (y1, a2), (y1, a1)] -> {a1:{"y":y1,"x":x1},a2:{"y":y1,"x":x2}}
Note that many "a"s can be connected to the same "x" or "y", but every "a" has at most one "x" and at most one "y". I can rely on neither the order of the tuples nor the order of each tuple's elements (because the framework does not distinguish between "a" and "x"), and I obviously don't care about the order of elements in my dicts; I just need the proper relations. There are many other pairs I don't care about, and they can contain "a", "x" or "y" elements as well.
So the main question is "How to iterate over nested data when there is no reliable order but a need of accessing and checking all elements of the lowest level?"
I tried it in several ways but they don't seem right. For simplicity I just check for A-X pairs here:
def first_draft(list_of_pairs):
    result = {}
    for pair in list_of_pairs:
        if pair[0].__class__ is A and pair[1].__class__ is X:
            result[pair[0]] = {"X": pair[1]}
        if pair[0].__class__ is X and pair[1].__class__ is A:
            result[pair[1]] = {"X": pair[0]}
    return result

def second_draft(list_of_pairs):
    result = {}
    for pair in list_of_pairs:
        for index, item in enumerate(pair):
            if item.__class__ is A:
                other_index = (index + 1) % 2
                if pair[other_index].__class__ is X:
                    result[item] = {"X": pair[other_index]}
    return result

def third_draft(list_of_pairs):
    result = {}
    for pair in list_of_pairs:
        for item in pair:
            if item.__class__ is A:
                for any_item in pair:
                    if any_item.__class__ is X:
                        result[item] = {"X": any_item}
    return result
The third draft actually works for sub-lists of any size and gets rid of the unpythonic integer indexing, but it iterates over the same list while already iterating over it. And quintuple nesting for just one line of code? That does not seem right to me. I learned: "When you have an iteration problem in Python and don't know a good solution, there is a great solution in itertools!" I just didn't find one.
Does anyone know a built-in that can help me, or simply a better way to implement my methods?
You can do something like this with strings:
l = [('a1', 'x1', 'z3'), ('x2', 'a2'), ('y1', 'a2'), ('y1', 'a1')]
res = {}
for tup in l:
    main_class = ""
    sub_classes = []
    for item in tup:
        if item.startswith('a'):
            main_class = item
            sub_classes = list(tup)
            sub_classes.remove(main_class)
    if not main_class:
        continue  # no "a" element in this tuple: skip it
    if main_class not in res:
        res[main_class] = {}
    for item in sub_classes:
        res[main_class][item[0]] = item
If your objects aren't strings, you just need to change if item.startswith('a'): to something that determines whether an item should be the key.
This also handles tuples greater than length two. It iterates each tuple, finding the "main class", and then removes it from a list version of the tuple (so that the new list is all the sub classes).
Looks like Ned Batchelder (who said that whenever you have a problem with iterables and don't think there is a nice solution in Python, there is a solution in itertools) was right. I finally found a solution I overlooked last time: itertools.permutations.
from itertools import permutations

def final_draft(list_of_pairs):
    result = {}
    for pair in list_of_pairs:
        for permutation in permutations(pair):
            if permutation[0].__class__ is A:
                my_a = permutation[0]
                if permutation[1].__class__ is X:
                    my_x = permutation[1]
                    if my_a not in result:
                        result[my_a] = {}
                    result[my_a]["key for X"] = my_x
    return result
I still have quintuple nesting because I added a check for whether the key exists (so my original drafts would have sextuple nesting and two productive lines of code). However, I got rid of the double iteration over the same iterable, and I have both minimal index usage and the possibility of working with triplets in the future.
One could avoid the assignments, but I prefer my_a over permutation[0].
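Here is a runnable sketch of the permutations approach. The question's real classes are not shown, so Node, A, X, Y and Z below are hypothetical stand-ins, and the helper name relate is also hypothetical. It keys the inner dict by the partner's class name, handles A-Y pairs as well as A-X pairs, and ignores pairs whose partner class is not wanted. Using permutations(pair, 2) produces every ordered pair of elements, so it also works for tuples longer than two.

```python
from itertools import permutations


class Node:
    """Hypothetical stand-in for the framework's objects."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


class A(Node): pass
class X(Node): pass
class Y(Node): pass
class Z(Node): pass  # a class we do not care about


def relate(list_of_pairs, partner_classes=(X, Y)):
    """Map each A instance to {partner class name: partner} for the wanted partner classes."""
    result = {}
    for pair in list_of_pairs:
        # every ordered pair of elements, so the element order in the tuple does not matter
        for first, second in permutations(pair, 2):
            if type(first) is A and type(second) in partner_classes:
                result.setdefault(first, {})[type(second).__name__] = second
    return result


a1, a2, x1, x2, y1, z1 = A("a1"), A("a2"), X("x1"), X("x2"), Y("y1"), Z("z1")
pairs = [(a1, x1), (x2, a2), (y1, a2), (y1, a1), (a1, z1), (x1, y1)]
print(relate(pairs))  # {a1: {'X': x1, 'Y': y1}, a2: {'X': x2, 'Y': y1}}
```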

Python nested lists within dict element

I'm trying to create the following data structure (a list containing several lists) within a shared dict:
{'my123': [['TEST'],['BLA']]}
code:
records = manager.dict({})
<within some loop>
dictkey = "my123"
tempval = "TEST"  # as an example, gets new values with every loop
list = []
list.append(tempval)
if dictkey not in records.keys():
    records[dictkey] = [list]
else:
    records[dictkey][0].append([tempval])
The first list within the dict element 'my123' gets populated with "TEST", but when I loop a second time (where tempval is "BLA"), the list doesn't get nested.
Instead I'm getting:
{'my123': [['TEST']]}
What am I doing wrong in the else statement?
Edit:
I have modified the code, but the list still doesn't get added:
records = manager.dict({})
<within some loop>
dictkey = "my123"
tempval = "TEST"  # as an example, gets new values with every loop
list = []
list.append(tempval)
if dictkey == "my123":
    print tempval  # prints new values with every loop to make sure
if dictkey not in records.keys():
    records[dictkey] = [list]
else:
    records[dictkey].append([list])
Remove the [0] part from the last line. The value in the dictionary is already a list. It is that list you wish to append the second list (['BLA']) to.
You're almost there. You will want to append the list like so:
records = manager.dict({})
# within some loop
dictkey = "my123"
tempval = "TEST"  # as an example, gets new values with every loop
temp_list = [tempval]  # holds a list of value
if dictkey not in records:
    records[dictkey] = [temp_list]
else:
    records[dictkey].append(temp_list)  # append list of value
I've found the solution. It looks like the append in the else statement doesn't work for class multiprocessing.managers.DictProxy: records[dictkey] returns a copy of the stored list, so appending to that copy never reaches the shared dict. The changed value has to be re-assigned to the proxy.
I've modified the else statement and now it's working.
records = manager.dict({})
< within some loop >
dictkey = "my123"
tempval = "TEST"  # as an example, gets new values with every loop
temp_list = [tempval]  # holds a list of value
if dictkey not in records:
    records[dictkey] = [temp_list]
else:
    records[dictkey] = records.get(dictkey, []) + [temp_list]
Thanks everyone for your help!
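The re-assignment pattern can be wrapped in a small helper. This is a minimal sketch, and the helper name add_record is hypothetical. It works on a plain dict as well as on a Manager().dict() proxy because it builds a new list and assigns it back instead of mutating the stored list in place. The multiprocessing part sits under the usual if __name__ == "__main__": guard.

```python
from multiprocessing import Manager


def add_record(records, dictkey, tempval):
    """Append [tempval] to the list stored under dictkey.

    The list is rebuilt and re-assigned rather than mutated in place, so the
    change is also visible through a Manager().dict() proxy.
    """
    temp_list = [tempval]
    records[dictkey] = records.get(dictkey, []) + [temp_list]


if __name__ == "__main__":
    with Manager() as manager:
        records = manager.dict()
        for value in ["TEST", "BLA"]:
            add_record(records, "my123", value)
        print(records.copy())  # {'my123': [['TEST'], ['BLA']]}
```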

List index out of range

I'm trying to create my own Hash data structure in python. In __init__ I initialize a list (m_list) of size m and in another function I add hashes to it from my hash function.
I'm now trying to search through the list, looking for value k. I'm getting a list index out of range error on the if self.m_list[i] == k: line.
from random import choice

class Hash:
    def __init__(self, n, m, m_list=None):
        self.n = n
        self.m = m
        self.a = choice(range(1, n))
        self.b = choice(range(n))
        if m_list is None:
            m_list = []
        self.m_list = m_list * m

    def search(self, k):
        found = False
        for i in self.m_list:
            if i is not None and found is False:
                if self.m_list[i] == k:
                    found = True
        if found:
            print True
        else:
            print False
I created m_list using guidelines from Create an empty list in python with certain size
There are multiple problems with this code:
1) Indexing a list with its own contents.
for i in self.m_list:
When you loop over a list in Python using this syntax, the variable (i) holds a value from the list, not the index of that iteration.
There are two ways to solve this. If, for some reason, you need the index, you can use the range function to create the indices and loop over them, like so:
for i in range(len(self.m_list)):
    if not found and self.m_list[i] == k:
        found = True
Or you can just use Python's native iteration over the contents of the list:
for item in self.m_list:
    if not found and item == k:
        found = True
Another option, if you want easy access to both the index and the value, is to use enumerate. enumerate yields tuples containing the index and the value, so you can use Python's tuple unpacking to access both:
for i, val in enumerate(self.m_list):
    if val == k:
        ...
    if i == some_index:
        ...
The original code returns True only if some value v in m_list satisfies m_list[v] == k, which is almost certainly not what you intended.
2) As noted in Peter's answer, [] * m always gives [], so no matter which indices are used, the list will have zero length and therefore any index will be out of range. To get a list of length m, you need one element in the list to duplicate. You can use None or 0 as that value: [0] * m gives a list of m zeroes, and [None] * m gives a list of m None values.
You are not creating a list of size m. [] * m gives you [], as you can see in an interactive shell. The linked answers show how multiplying a list will shallow copy the contents of the list m times, but of course [] has no contents to copy. Try if m_list is None: m_list = [None] * m or something similar.
Your search method makes no sense to me (there are better ways to store just the existence of integers) but that's a separate problem.
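Putting both fixes together, a minimal sketch of the class could look like the following. It keeps only the parts shown in the question and drops the m_list parameter for simplicity. The question does not show the hash function or the insert method, so the direct slot assignment in the usage lines is a hypothetical stand-in.

```python
from random import choice


class Hash:
    """Sketch of the question's class with both fixes applied; hashing/insert omitted."""

    def __init__(self, n, m):
        self.n = n
        self.m = m
        self.a = choice(range(1, n))
        self.b = choice(range(n))
        # [None] * m gives m empty slots; [] * m would give an empty list
        self.m_list = [None] * m

    def search(self, k):
        """Return True if k is stored in any slot of m_list."""
        return k in self.m_list


h = Hash(n=10, m=5)
print(len(h.m_list))              # 5
h.m_list[3] = 42                  # hypothetical stand-in for the question's insert step
print(h.search(42), h.search(7))  # True False
```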
