I have a dictionary with thousands of elements of the following structure:
full_dict={'A': ['B','C','D','E'],
           'B': ['A','C','D','E','F','X'],
           'X': ['W','Y','Z','S'],
           'S': ['W','K','T'],
           ...}
where every letter is a word.
Every word can be both a key and a value (together with other words) for another dict element.
I am trying to find a "path" from one word to another. For example, a path from 'A' to 'S' is A-B-X-S, as S is among the values of X, X is among the values of B, and B is among the values of A.
Currently, I am using this code:
query=['A','S']
if 'S' in full_dict['A']:
    print('Found during 1st iteration')
else:
    for i in full_dict['A']:
        if 'S' in full_dict[i]:
            print('Found during 2nd iteration')
        else:
            for ii in full_dict[i]:
                etc.
I nest this 10 levels deep and it works fine, but I wonder if there is a better way to do it.
Thank you in advance.
You can use the networkx package:
import networkx as nx
full_dict = {'A': ['B','C','D','E'],
             'B': ['A','C','D','E','F','X'],
             'X': ['W','Y','Z','S'],
             'S': ['W','K','T'],
             }
g = nx.DiGraph()  # directed graph
for k, v in full_dict.items():
    g.add_edges_from((k, to) for to in v)
print(*nx.all_simple_paths(g, 'A', 'S')) # ['A', 'B', 'X', 'S']
A nested loop doesn't work in general: at best it needs many levels of nesting, and at worst we don't know in advance how many levels we would need.
Because of this, one could think of recursion instead. But that can get complicated, since you may need to put real effort into things like breaking potential infinite loops or remembering intermediate results (to reduce computation).
So in my opinion, the best approach is a package that readily makes use of the graph structure of your data.
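If you would rather not add a dependency, the infinite-loop concern above is handled by a breadth-first search with a visited set. The sketch below uses only the standard library (collections.deque); the function name shortest_word_path is made up for illustration, and unlike nx.all_simple_paths it returns just one shortest path.

```python
from collections import deque


def shortest_word_path(graph, source, target):
    """Breadth-first search over a dict of word -> list of next words.

    Returns the shortest path as a list of words, or None if target is
    unreachable. The 'previous' dict doubles as the visited set, so cycles
    such as 'A' -> 'B' -> 'A' cannot make the search loop forever.
    """
    if source == target:
        return [source]
    previous = {source: None}  # word -> the word we reached it from
    queue = deque([source])
    while queue:
        word = queue.popleft()
        for nxt in graph.get(word, []):  # words with no entry are dead ends
            if nxt in previous:
                continue  # already visited
            previous[nxt] = word
            if nxt == target:
                # Walk the back-pointers from target to source, then reverse
                path = [nxt]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


full_dict = {
    'A': ['B', 'C', 'D', 'E'],
    'B': ['A', 'C', 'D', 'E', 'F', 'X'],
    'X': ['W', 'Y', 'Z', 'S'],
    'S': ['W', 'K', 'T'],
}
print(shortest_word_path(full_dict, 'A', 'S'))  # ['A', 'B', 'X', 'S']
```

Because BFS explores words in order of distance from the source, the first time it reaches the target it has found a shortest chain.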
External libs are useful if you're doing this for production and need high performance, optimizations, all possible paths, etc.
If you're doing this as an exercise, can be sure of clean input (no loops within the word lists), or don't want an external lib for just one operation, then you can try recursion:
full_dict = {
    'A': ['B','C','D','E'],
    'B': ['A','C','D','E','F','X'],
    'X': ['W','Y','Z','S'],
    'S': ['W','K','T'],
}
def find_path(source, target):
    for key, words in full_dict.items():
        if target in words:
            yield target
            if key == source:  # check if it's source
                yield source
            else:
                # otherwise, look for this key as new target
                yield from find_path(source, key)
            break  # prevent further checking of remaining items
Usage:
list(find_path('A', 'S'))
# ['S', 'X', 'B', 'A']
# to get it in the right order, reverse it
list(find_path('A', 'S'))[::-1]
# ['A', 'B', 'X', 'S']
The approach works backwards: it looks for the target word in the word lists and then yields each key along the chain back to the source. This avoids checking the word lists of every word reachable from the source, which creates many chains that may or may not reach the target.
See the Python documentation on yield expressions for more info.
Note that Python 3's default recursion limit is 1000, so if the word chain may be longer, either increase the limit or use another method. For example, with the limit set to 4 and this modified dict, full_dict = {'A': ['B', 'D', 'E'], 'B': ['A', 'D', 'E', 'F', 'X'], 'X': ['W', 'Y', 'Z', 'C'], 'C': ['W', 'K', 'T', 'S'], 'S': []}, you'd hit the limit for A to S, which is now a chain of 5 (['A', 'B', 'X', 'C', 'S']).
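The limit can be raised with sys.setrecursionlimit, but another way out is to turn the backward search into a loop. This is a sketch of that idea (find_path_iterative is a hypothetical name); like find_path, it follows the first key whose word list contains the current word, but it takes the dict as a parameter instead of reading the global full_dict, and it gives up on cycles instead of recursing forever.

```python
def find_path_iterative(source, target, graph):
    """Loop-based version of the backward search done by find_path.

    Follows the *first* key whose word list contains the current word,
    starting from target, until it reaches source. Returns the path from
    source to target, or None if the chain breaks or revisits a word.
    """
    path = [target]
    seen = {target}
    current = target
    while current != source:
        for key, words in graph.items():
            if current in words:
                break  # 'key' now holds the first key listing 'current'
        else:
            return None  # no key lists the current word: dead end
        if key in seen:
            return None  # following first matches loops back: a cycle
        seen.add(key)
        path.append(key)
        current = key
    return path[::-1]  # collected target -> source, so reverse it


modified = {'A': ['B', 'D', 'E'], 'B': ['A', 'D', 'E', 'F', 'X'],
            'X': ['W', 'Y', 'Z', 'C'], 'C': ['W', 'K', 'T', 'S'], 'S': []}
print(find_path_iterative('A', 'S', modified))  # ['A', 'B', 'X', 'C', 'S']
```

No stack frames pile up, so the chain length is limited only by memory.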
You could make it a recursive function:
def findLink(path, target, links):
    if not isinstance(path, list): path = [path]  # start path with initial word
    for linked in links.get(path[-1], []):  # go through links of word
        if linked == target: return path + [target]  # done, return full path
        if linked in path: continue  # avoid circular linking
        foundPath = findLink(path + [linked], target, links)  # dig deeper
        if foundPath: return foundPath  # reached end on that path
Usage:
full_dict = {'A': ['B','C','D','E'],
             'B': ['A','C','D','E','F','X'],
             'X': ['W','Y','Z','S'],
             'S': ['W','K','T']}
print(findLink('A', 'S', full_dict))
Output:
['A', 'B', 'X', 'S']
Could you help me figure out something in an exercise from the Lists chapter (Ch. 8, https://www.py4e.com/)?
Here is my code:
def delete_head(t):
    del t[0]
letters = ['a', 'b', 'c', 'd', 'f', 'g']
q = delete_head(letters)
print(q)
delete_head(letters)
print(letters)
I got this output:
None
['c', 'd', 'f', 'g']
I can't understand why the output isn't ['b', 'c', 'd', 'f', 'g'].
q = delete_head(letters)
This deletes 'a'
delete_head(letters)
and this deletes 'b' (since 'a' is already gone)
You are calling delete_head() twice: first on line 4, where you assign the return value of the function to q, and a second time on line 6.
Notice that q is None. Your function gets a reference to the original list and deletes its first item in place, but it has no return statement, so its return value is None. The change isn't lost, because it was made to the original list through that reference.
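A minimal corrected version, assuming the goal is to remove only the first letter: call delete_head() once and print the list itself, since the function changes it in place. The name result is only illustrative.

```python
def delete_head(t):
    """Remove the first element of t in place; implicitly returns None."""
    del t[0]


letters = ['a', 'b', 'c', 'd', 'f', 'g']
result = delete_head(letters)  # call it exactly once
print(result)   # None: the function has no return statement
print(letters)  # ['b', 'c', 'd', 'f', 'g']
```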
I am interested in simulating the sample space for the following question on a probability assignment:
A man will carve pumpkins for his two daughters and three sons. His wife will bring each kid’s pumpkins in a completely random order. The man has decided that as soon as he has carved pumpkins for two of his sons, he would ask his wife to carve the remaining pumpkins. Let W denote the number of pumpkins he will carve.
So the resulting sample space of W would look something like this:
sample_space=[['S','S'],
['S','D','S'],
['S','D','D','S'],
['D','S','S'],
['D','S','D','S'],
['D','D','S','S']]
I was thinking about having two lists, one of sons, one of daughters:
son_list1=['S','S','S']
daughter_list1=['D','D']
And then combining them in every possible order:
result_list1=[['S','S','S','D','D'],
['S','S','D','S','D'],
['S','S','D','D','S'],
['S','D','S','S','D'],
['S','D','S','D','S'],
['S','D','D','S','S'],
['D','S','S','S','D'],
['D','S','S','D','S'],
['D','S','D','S','S'],
['D','D','S','S','S']]
I don't know whether it would be easier to number each son and each daughter and then combine them, so that we have:
son_list2=['S1','S2','S3']
daughter_list2=['D1','D2']
where this resulting list would be something like:
result_list2=[['S1','S2','S3','D1','D2'],
['S1','S3','S2','D1','D2'],
['S2','S1','S3','D1','D2'],
['S2','S3','S1','D1','D2'],
['S3','S1','S2','D1','D2'],
['S3','S2','S1','D1','D2'],
...
['D2','D1','S3','S2','S1']]
If this method would be easier, I could just remove the numbers after result_list2 is generated and then delete the repeats.
Either way, once I have the result in the form of result_list1, I could create a "son counter", go through each list, stop when the counter reaches 2, and then delete the repeats to get the sample_space list.
Is there any better logic?
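The plan described in the question can be sketched with the standard library: set(itertools.permutations(...)) produces the 10 distinct orderings of result_list1 (as tuples), and a small helper, truncate_at_second_son (a name chosen here for illustration), cuts each one off at the second son before the repeats are dropped.

```python
from itertools import permutations


def truncate_at_second_son(order):
    """Return the prefix of order up to and including the second 'S'."""
    sons_seen = 0
    for i, child in enumerate(order):
        if child == 'S':
            sons_seen += 1
            if sons_seen == 2:
                return list(order[:i + 1])
    return list(order)  # fewer than two sons: keep everything


children = ['S', 'S', 'S', 'D', 'D']
# set() drops duplicate orderings caused by the identical 'S'/'D' labels;
# sorting in reverse ('S' > 'D') reproduces the order of result_list1
result_list1 = sorted(set(permutations(children)), reverse=True)

sample_space = []
for order in result_list1:
    prefix = truncate_at_second_son(order)
    if prefix not in sample_space:  # delete the repeats
        sample_space.append(prefix)
print(sample_space)
# [['S', 'S'], ['S', 'D', 'S'], ['S', 'D', 'D', 'S'],
#  ['D', 'S', 'S'], ['D', 'S', 'D', 'S'], ['D', 'D', 'S', 'S']]
```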
To solve this problem, I think the best solution would be to get all of the permutations of the order in which he carves each pumpkin.
I used the following code from GeeksforGeeks for getting all permutations of a list, and only changed some of the variable names to make it clearer.
def permutation(passed_list):
    # If passed_list is empty then there are no permutations
    if len(passed_list) == 0:
        return []
    # If there is only one element in passed_list then only
    # one permutation is possible
    if len(passed_list) == 1:
        return [passed_list]
    # Find the permutations for passed_list if there is
    # more than one element
    perm_list = []  # empty list that will store all permutations found
    # Iterate the input (passed_list) and calculate the permutations
    for i in range(len(passed_list)):
        item = passed_list[i]
        # Extract passed_list[i] (item) from the list; remaining_list is
        # everything else
        remaining_list = passed_list[:i] + passed_list[i + 1:]
        # Generate all permutations where item is the first
        # element
        for p in permutation(remaining_list):
            perm_list.append([item] + p)
    return perm_list
Then you can iterate through all of the permutations, keeping track of the order as you go. Once you reach the second son, stop processing that permutation, add the order so far to your sample space, and move on to the next permutation.
if __name__ == '__main__':
    # List of all children. It doesn't matter what order this list is in
    children = ['S', 'S', 'S', 'D', 'D']
    # perms is the list of all permutations of children list
    perms = permutation(children)
    # This list will hold the resulting sample space you are looking for
    total_set = []
    # For each permutation
    for perm in perms:
        order = []  # Contains the order of whose pumpkin he carves
        son_counter = 0
        for child in perm:
            if child == 'S':  # compare values with ==, not identity with 'is'
                son_counter += 1
            # Update the order
            order.append(child)
            if son_counter == 2:
                # To keep from adding duplicate orders
                if order not in total_set:
                    total_set.append(order)
                # Reset the following two variables for the next iteration
                order = []
                son_counter = 0
                break
    print(total_set)
This gave me the following output:
[['S', 'S'], ['S', 'D', 'S'], ['S', 'D', 'D', 'S'], ['D', 'S', 'S'], ['D', 'S', 'D', 'S'], ['D', 'D', 'S', 'S']]
I believe this is the answer you are looking for.
Let me know if you have any questions!
You could build up the sample space recursively, one choice (son or daughter) at a time. For example,
def create_samples(n_sons, n_daughters):
    if n_sons == 0:
        # stop carving
        yield []
    elif n_daughters == 0:
        # must carve n_sons more pumpkins
        yield ['S'] * n_sons
    else:
        # choose to carve for a son
        for sample in create_samples(n_sons - 1, n_daughters):
            yield ['S'] + sample
        # choose to carve for a daughter
        for sample in create_samples(n_sons, n_daughters - 1):
            yield ['D'] + sample
samples = list(create_samples(2, 2))
# [['S', 'S'],
# ['S', 'D', 'S'],
# ['S', 'D', 'D', 'S'],
# ['D', 'S', 'S'],
# ['D', 'S', 'D', 'S'],
# ['D', 'D', 'S', 'S']]
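The function create_samples(n_sons, n_daughters) yields all samples that meet your condition, assuming he still has to carve n_sons more sons' pumpkins before stopping and n_daughters daughters' pumpkins are still waiting. That is why it is called with (2, 2): he stops after two sons, and there are two daughters.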
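The six orders in the sample space are not equally likely (for example, ['S', 'S'] has probability 3/5 * 2/4 = 3/10). If you also want the distribution of W, the numbered-children idea from the question helps: enumerate all 5! = 120 labeled orders and count how many pumpkins he carves in each. This sketch assumes that "completely random order" means every labeled order is equally likely; carved_count is a hypothetical helper name.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def carved_count(order):
    """Number of pumpkins carved up to and including the second son's."""
    sons = 0
    for position, child in enumerate(order, start=1):
        if child.startswith('S'):
            sons += 1
            if sons == 2:
                return position
    raise ValueError("order contains fewer than two sons")


children = ['S1', 'S2', 'S3', 'D1', 'D2']
all_orders = list(permutations(children))  # 120 equally likely orders
counts = Counter(carved_count(order) for order in all_orders)
distribution = {w: Fraction(n, len(all_orders)) for w, n in sorted(counts.items())}
print(distribution)  # {2: Fraction(3, 10), 3: Fraction(2, 5), 4: Fraction(3, 10)}
```

Using Fraction keeps the probabilities exact instead of rounding them to floats.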
I have a dataset of multiple local store rankings that I'm looking to aggregate / combine into one national ranking, programmatically. I know that the local rankings are by sales volume, but I am not given the sales volume so must use the relative rankings to create as accurate a national ranking as possible.
As a short example, let's say that we have 3 local ranking lists, from best ranking (1st) to worst ranking (last), that represent different geographic boundaries that can overlap with one another.
ranking_1 = ['J','A','Z','B','C']
ranking_2 = ['A','H','K','B']
ranking_3 = ['Q','O','A','N','K']
We know that J or Q is the highest ranked store, as both are highest in ranking_1 and ranking_3, respectively, and they appear above A, which is the highest in ranking_2. We know that O is next, as it's above A in ranking_3. A comes next, and so on...
If I did this correctly on paper, the output of this short example would be:
global_ranking = [('J',1.5),('Q',1.5),('O',3),('A',4),('H',6),('N',6),('Z',6),('K',8),('B',9),('C',10)]
Note that when we don't have enough data to determine which of two stores is ranked higher, we consider it a tie (i.e. we know that one of J or Q is the highest ranked store, but don't know which is higher, so we put them both at 1.5). In the actual dataset, there are 100+ lists of 1000+ items in each.
I've had fun trying to figure out this problem and am curious if anyone has any smart approaches to it.
A modified merge sort algorithm could help here. The modification should account for incomparable stores and thus build groups of incomparable elements that you are willing to consider equal (like Q and J).
This method analyzes all of the stores at the front of the rankings. If a store is not in a lower-than-first position in any other ranking list, then it belongs at this front level and is added to a 'level' list. Next, those stores are removed from the front of their lists, so that every list has a new front runner. Repeat the process until there are no stores left.
def rank_stores(rankings):
    """
    Rank stores by sales volume, using rankings that overlap between lists.
    :param rankings: list of rankings (lists of stores, best first); modified in place.
    :return: Ordered list of sets; stores in the same set share a rank.
    """
    rank_global = []
    # Evaluate all stores in the number one position; if they are not below
    # number one somewhere else, then they belong at this level.
    # Then remove them from the front of the list, and repeat.
    # Note: if two rankings contradict each other, tops stays empty and
    # this loop never ends.
    while sum([len(x) for x in rankings]) > 0:
        tops = []
        # Find out which of the number one stores are not in a lower position
        # somewhere else.
        for rank in rankings:
            if not rank:
                continue
            else:
                top = rank[0]
                add = True
                for rank_test in rankings:
                    if not rank_test:
                        continue
                    elif not rank_test[1:]:
                        continue
                    elif top in rank_test[1:]:
                        add = False
                        break
                    else:
                        continue
                if add:
                    tops.append(top)
        # Now add tops to total rankings list,
        # then go through the rankings and pop the top if in tops.
        rank_global.append(set(tops))
        # Remove the stores that just made it to the top.
        for rank in rankings:
            if not rank:
                continue
            elif rank[0] in tops:
                rank.pop(0)
            else:
                continue
    return rank_global
For the rankings provided:
ranking_1 = ['J','A','Z','B','C']
ranking_2 = ['A','H','K','B']
ranking_3 = ['Q','O','A','N','K']
rankings = [ranking_1, ranking_2, ranking_3]
Then calling the function:
rank_stores(rankings)
Results in:
[{'J', 'Q'}, {'O'}, {'A'}, {'H', 'N', 'Z'}, {'K'}, {'B'}, {'C'}]
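The layering that rank_stores performs is essentially Kahn's topological sort, where each ranking says "this store is above the next one". On Python 3.9+, the standard library's graphlib.TopologicalSorter can do the layering, and prepare() raises graphlib.CycleError if two rankings contradict each other. The tied_ranks helper then turns the layers into averaged positions (tied stores share the mean of the positions they span), which reproduces the 1.5 and 6 values in the question's expected output. Both function names are hypothetical; treat this as a sketch rather than a drop-in replacement.

```python
from graphlib import TopologicalSorter


def rank_layers(rankings):
    """Group stores into layers: a store is placed once every store
    known to be above it has been placed. Does not modify rankings."""
    sorter = TopologicalSorter()
    for ranking in rankings:
        for store in ranking:
            sorter.add(store)  # include stores that only appear alone
        for higher, lower in zip(ranking, ranking[1:]):
            sorter.add(lower, higher)  # lower has higher as a predecessor
    sorter.prepare()  # raises graphlib.CycleError on contradictory rankings
    layers = []
    while sorter.is_active():
        ready = sorter.get_ready()  # every store whose predecessors are done
        layers.append(set(ready))
        sorter.done(*ready)
    return layers


def tied_ranks(layers):
    """Give every store in a layer the average of the positions the layer spans."""
    ranks = []
    position = 1
    for layer in layers:
        average = position + (len(layer) - 1) / 2
        ranks.extend((store, average) for store in sorted(layer))
        position += len(layer)
    return ranks


layers = rank_layers([['J', 'A', 'Z', 'B', 'C'],
                      ['A', 'H', 'K', 'B'],
                      ['Q', 'O', 'A', 'N', 'K']])
print(tied_ranks(layers))
# [('J', 1.5), ('Q', 1.5), ('O', 3.0), ('A', 4.0), ('H', 6.0), ('N', 6.0),
#  ('Z', 6.0), ('K', 8.0), ('B', 9.0), ('C', 10.0)]
```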
In some circumstances there may not be enough information to determine definite rankings. Consider this underlying order:
['Z', 'A', 'B', 'J', 'K', 'F', 'L', 'E', 'W', 'X', 'Y', 'R', 'C']
We can derive the following rankings:
a = ['Z', 'A', 'B', 'F', 'E', 'Y']
b = ['Z', 'J', 'K', 'L', 'X', 'R']
c = ['F', 'E', 'W', 'Y', 'C']
d = ['J', 'K', 'E', 'W', 'X']
e = ['K', 'F', 'W', 'R', 'C']
f = ['X', 'Y', 'R', 'C']
g = ['Z', 'F', 'W', 'X', 'Y', 'R', 'C']
h = ['Z', 'A', 'E', 'W', 'C']
i = ['L', 'E', 'Y', 'R', 'C']
j = ['L', 'E', 'W', 'R']
k = ['Z', 'B', 'K', 'L', 'W', 'Y', 'R']
rankings = [a, b, c, d, e, f, g, h, i, j, k]
Calling the function:
rank_stores(rankings)
results in:
[{'Z'},
{'A', 'J'},
{'B'},
{'K'},
{'F', 'L'},
{'E'},
{'W'},
{'X'},
{'Y'},
{'R'},
{'C'}]
In this scenario there is not enough information to determine where 'J' should be relative to 'A' and 'B', only that it lies between 'Z' and 'K'.
Scaled up to hundreds of rankings and stores, some of the stores will not be ranked correctly on an absolute volume basis.
Problem:
I have some linked data and I want to build a tree structure like the one in this picture, and get the level of each item, because later I will do some calculations starting at the lowest level of my tree structure.
Expected Result:
I need to get a structure that gives me the items per level:
level 0: A
level 1: A = B, C, D
level 2: D = E, F, G
level 3: E = H, I, J, K
What I have tried so far:
I've tried this recursive code to simulate the behavior, but I'm unable to get the level of each item.
dict_item = {"A": ["B","C","D"], "D": ["E","F","G"], "E": ["H","I","J"]}
def build_bom(product):
    if not dict_item.get(product):
        return product
    else:
        return [build_bom(x) for x in dict_item.get(product)]
print(build_bom("A"))
My output is a nested list like this:
['B', 'C', [['H', 'I', 'J'], 'F', 'G']]
My question:
I'm not sure if this is the best approach to my problem.
And how can I get the desired output?
Here is the desired output:
[{"parent_E": ["H", "I", "J"]},
 {"parent_D": ["E", "F", "G"]},
 {"parent_A": ["D", "C", "B"]},
]
A list of dictionaries (where keys are parents and values are children); the first element in the list is the lowest level of my structure and the last is the highest.
PS: This is a simulation, but in the future I will have to work on large datasets with this code.
Any help will be appreciated.
This is how I would approach this problem. First, I'll generate the tree from your dict_item object.
dict_item = {"A": ["B","C","D"], "D": ["E","F","G"], "E": ["H","I","J"]}
def build_tree(x):
    if x in dict_item:
        return {x: [build_tree(v) for v in dict_item[x]]}
    else:
        return x
tree = build_tree("A")
print(tree)
# {'A': ['B', 'C', {'D': [{'E': ['H', 'I', 'J']}, 'F', 'G']}]}
Then, do a breadth-first search on the tree. Each time we hit an element that has children, we append it to a list:
from collections import deque
results = []
queue = deque([tree])
while queue:
    x = queue.popleft()  # oldest entry first: breadth-first order, O(1) per pop
    if isinstance(x, dict):
        parent, children = list(x.items())[0]
        results.append({'parent_' + parent: dict_item[parent]})
        for child in children:
            queue.append(child)
print(results)
# [{'parent_A': ['B', 'C', 'D']}, {'parent_D': ['E', 'F', 'G']}, {'parent_E': ['H', 'I', 'J']}]
Then all we need to do is reverse the list:
print(list(reversed(results)))
# [{'parent_E': ['H', 'I', 'J']}, {'parent_D': ['E', 'F', 'G']}, {'parent_A': ['B', 'C', 'D']}]
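If you also need each parent's level, or want to skip building the intermediate tree, you can run the breadth-first search directly on dict_item and carry the depth along with each item. A sketch, assuming the data is a true tree (a cycle would make this loop forever); parents_by_level is a hypothetical name, and the root's level is taken as 0.

```python
from collections import deque


def parents_by_level(children_of, root):
    """Return (level, parent, children) triples in breadth-first order.

    children_of maps each parent to its list of children; items that are
    not keys are treated as leaves.
    """
    triples = []
    queue = deque([(root, 0)])  # (item, depth) pairs still to visit
    while queue:
        item, level = queue.popleft()
        children = children_of.get(item, [])
        if children:
            triples.append((level, item, children))
            queue.extend((child, level + 1) for child in children)
    return triples


dict_item = {"A": ["B", "C", "D"], "D": ["E", "F", "G"], "E": ["H", "I", "J"]}
triples = parents_by_level(dict_item, "A")
# BFS visits levels in non-decreasing order, so reversing puts the deepest first
result = [{"parent_" + parent: children} for level, parent, children in reversed(triples)]
print(result)
# [{'parent_E': ['H', 'I', 'J']}, {'parent_D': ['E', 'F', 'G']}, {'parent_A': ['B', 'C', 'D']}]
print([(level, parent) for level, parent, _ in triples])
# [(0, 'A'), (1, 'D'), (2, 'E')]
```

Each item is visited once, so the cost grows linearly with the number of items, which matters for the larger datasets mentioned in the question.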