Python: Algorithm Traverse Tree that expands at each node - python

I have a dictionary with id and multiple values for each ID which are strings. For each value in that for each Id I make a database query and get set results:
{111: Run, Jump, swim}
{222: Eat, drink}
so for each value lets say run I execute a query and it returns another set of data then pick each value of that set and run query which will give another set and finally I come to a point where there is only one item get return. Once I finish get that final element of every single sub element of Run then I move to Jump and continue.
I asked this question before but didn't get any results so people told me to remove the code and just ask the question again. Here is the link for the same question which I ask few days ago. Do i need to implement something like disjoin set?

You can look at the categories/subcategories as a tree with N branches at each node (depending how many categories you have). From what I can gather you basically want to generate an in-order list of tree leaves.
One simple way of doing it is through generators (using the terminology from your original question):
def lookup(elem):
# do your SQL call here for a given category 'elem' and return
# a list of it's subcategories
return []
def leaves(lst):
if lst:
for elem in lst: # for every category
for sublist in leaves(lookup(elem)): # enumerate it's sub categories
yield sublist # and return it
yield elem # once lookup(elem) is [] return elem
d = { 111: [Run, Jump, swim] , 222: [Eat, drink] }
for key, lst in d.items():
print key, [elem for elem in leaves(lst)]
If you are unfamiliar with generators, they are simply iterator constructs that "yield" values instead of returning them. The difference is that yielding only pauses the iterator at that location which will continue where it stopped when the next value is requested.
With some clever recursion inside the generator you can simply parse the whole tree.
The [elem for elem in leaves(lst)] is a list comprehension and it simply builds a list containing elem for every element iterated by leaves.

This looks like plain tree traversal. You can use BFS or DFS.

Related

How to implement dicts / sets opposed to a list search, to increase speed

I am making a program that has to search through very long lists, and I have seen people suggesting that using sets and dicts speeds it up massively. However, I am at a loss as to how to make it work within my code. Currently, the program does this:
indexes = []
print("Collecting indexes...")
for term in sliced_5:
indexes.append(hex_crypted.index(term))
The code searches through the hex_crypted list, which contains 1,000,000+ terms, finds the index of the term, and then appends it to the the 'indexes' list.
I simply need to speed this process. Thanks for any help.
You want to build a lookup table so you don't need to repeatedly loop over hex_crypted. Then you can simply look up each term in the table.
print("Collecting indexes...")
lookup = {term: index for (index, term) in enumerate(hex_crypted)}
indexes = [lookup[term] for term in sliced_5]
The fastest method if you have a list is to do a set function on the list to return it as a set, but I don't think that is what you want to do in this case.
hex_crypted_set = set(hex_crypted)
If you need to keep that index for some reason, you'll want to instead build a dictionary first.
hex_crypted_dict = {}
for i in enumerate(hex_crypted):
hex_crypted_dict[i[1]] = i[0]
And then to get that index you just search the dict:
indexes = []
for term in sliced_5:
indexes.append(hex_crypted_dict[term])
You will end up with the appropriate indexes which correspond to the original long list and only iterate that long list one time, which will be a lot better performance than iterating it for every time you do a lookup.
The first step is to generate a dict, for example:
hex_crypted_dict = {v: i for i, v in enumerate(hex_crypted)}
Then your code changed to
indexes = []
hex_crypted_dict = {v: i for i, v in enumerate(hex_crypted)}
print("Collecting indexes...")
for term in sliced_5:
indexes.append(hex_crypted_dict[term])

How to search for strings within nested lists

One of the questions for an assignment I'm doing consists of looking within a nested lists consisting of "an ultrashort story and its author.", to find a string that was inputted by a user. Not to sure on how to go about this, here is the assignment brief below if anyone would like more clarification. There are also more questions I'm not to sure on eg "find all stories by a certain author". Some explanations, or point me in the right direction is greatly appreciated :)
list = []
mylist = [['a','b','c'],['d','e','f']]
string = input("String?")
if string in [elem for sublist in mylist for elem in sublist] == True:
list.append(elem)
This is just an example of something i've tried, the list above is similar enough to the one i'm actually using for the question. I've just currently been going through different methods of iterating over a nested lists and adding mathcing items to another list. above code is just one example of an attemp i've made at this proccess.
""" the image above states that the data is in the
form of an list of sublists, with each sublist containing
two strings
"""
stories = [
['story string 1', 'author string 1'],
['story string 2', 'author string 2']
]
""" find stories that contain a given string
"""
stories_with_substring = []
substring = 'some string' # search string
for story, author in stories:
# if the substring is not in the story, a ValueError is raised
try:
story.index(substring)
stories_with_substring.append((story, author))
except ValueError:
continue
""" find stories by a given author
"""
stories_by_author = []
target_author = 'first last'
for story, author in stories:
if author == target_author:
stories_by_author.append((story, author))
This line here
for story, author in stories:
'Unpacks' the array. It's equivalent to
for pair in stories:
story = pair[0]
author = pair[1]
Or to go even further:
i = 0
while i < len(stories):
pair = stories[i]
story = pair[0]
author = pair[1]
I'm sure you can see how useful this is when dealing with lists that contain lists/tuples.
You may need to call .lower() on some of the strings if you want the search to be case insensitive
You can do a few things here. Your example showed the use of a list comprehension, so let's focus on some other aspects of this problem.
Recursion
You can define a function that iterates through all the items in the top level list. Assuming you know for sure all items are either strings or more lists, you can use type() to check if each item is another list, or is a string. If it's a string, do your search - if it's a list, have your function call itself. Let's look at an example. Please note that we should never using variables named list or string - these are core value types and we don't want to accidentally overwrite them!
mylist = [['a','b','c'],['d','e','f']]
def find_nested_items(my_list, my_input):
results = []
for i in mylist:
if type(i) == list:
items = find_nested_items(i, my_input)
results += items
elif my_input in i:
results.append(i)
return results
We're doing a few things here:
Creating an empty list named results
Iterating through the top level items of my_list
If one of those items is another list, we have our function call itself - at some point this will trigger the condition where an item is not a list, and will eventually return the results from that. For now, we assume the results we're getting back are going to be correct, so we concatenate those results to our top level results list
If the item is not a list, we simply check for the existence of our input and if so, add it to our results list
This kind of recursion is typically very safe, because it's inherently limited by our data structure. It can't run forever unless the data structure itself is infinitely deep.
Generators
Next, let's look at a much cooler function of python 3: generators. Right now, we're doing all the work of collecting the results in one go. If we later on want to iterate through those results, we need to iterate over them separately.
Instead of doing that, we can define a generator. This works almost the same, practically speaking, but instead of collecting the results in one loop and then using them in a second, we can collect and use each result all within a single loop. A generator "yields" a value, then stops until it is called the next time. Let's modify our example to make it a generator:
mylist = [['a','b','c'],['d','e','f']]
def find_nested_items(my_list, my_input):
for i in mylist:
if type(i) == list:
yield from find_nested_items(i, my_input)
elif my_input in i:
yield i
You'll notice this version is a fair bit shorter. There's no need to hold items in a temporary list - each item is "yielded", which means it's passed directly to the caller to use immediately, and the caller will stop our generator until it needs the next value.
yield from basically does the same recursion, it simply sets up a generator within a generator to return those nested items back up the chain to the caller.
These are some good techniques to try - please give them a go!

Get unique entries in list of lists by an item

This seems like a fairly straightforward problem but I can't seem to find an efficient way to do it. I have a list of lists like this:
list = [['abc','def','123'],['abc','xyz','123'],['ghi','jqk','456']]
I want to get a list of unique entries by the third item in each child list (the 'id'), i.e. the end result should be
unique_entries = [['abc','def','123'],['ghi','jqk','456']]
What is the most efficient way to do this? I know I can use set to get the unique ids, and then loop through the whole list again. However, there are more than 2 million entries in my list and this is taking too long. Appreciate any pointers you can offer! Thanks.
How about this: Create a set that keeps track of ids already seen, and only append sublists where id's where not seen.
l = [['abc','def','123'],['abc','xyz','123'],['ghi','jqk','456']]
seen = set()
new_list = []
for sl in l:
if sl[2] not in seen:
new_list.append(sl)
seen.add(sl[2])
print new_list
Result:
[['abc', 'def', '123'], ['ghi', 'jqk', '456']]
One approach would be to create an inner loop. within the first loop you iterate over the outer list starting from 1, previously you will need to create an arraylist which will add the first element, inside the inner loop starting from index 0 you will check only if the third element is located as part of the third element within the arraylist current holding elements, if it is not found then on another arraylist whose scope is outside outher loop you will add this element, else you will use "continue" keyword. Finally you will print out the last arraylist created.

Search tuple elements within in list

I have a list in Python as
list_data = [('a','b',5),('aa','bb',50)]
and some variables:
a = ('a','b','2')
c = ('aaa','bbb','500')
Now how can I search if a is already there in list_data?
If yes add 2 to the value of a, if not append to list_data?
The result should be as
list_data = [('a','b',7),('aa','bb',50),('aaa','bbb','500')]
Actually, this question is a good way to several demonstrate Pythonic ways of doing things. So lets see what we can do.
In order to check if something is in python list you can just use operator in:
if a in list_data:
do_stuff()
But what you ask is a bit different. You want to do something like a search by multiple keys, if I understand correctly. In this case you can 'trim' your tuple by discarding last entry.
Slicing is handy for this:
value_trimmed = value[:-1]
Now you can make a list of trimmed tuples:
list_trimmed = []
for a in list_data:
list_trimmed.append(a[:-1])
And then search there:
if a[:-1] in list_trimmed:
do_smth()
This list can be constructed in a less verbose way using list_comprehension:
list_trimmed = [item[:-1] for item in list_data]
To find where your item exactly is you can use index() method of list:
list_trimmed.index(a[:-1])
This will return index of a[:-1] first occurrence in list_trimmed or throw if it cant be found. We can avoid explicitly checking if item is in the list, and do the insertion only if the exception is caught.
Your full code will look like this:
list_data = [('a','b',5), ('aa','bb',50)]
values_to_find = [('a','b','2'), ('aaa','bbb','500')]
list_trimmed = [item[:-1] for item in list_data]
for val in values_to_find:
val_trimmed = val[:-1]
try:
ind = list_trimmed.index(val_trimmed)
src_tuple = list_data[ind]
# we can't edit tuple inplace, since they are immutable in python
list_data[ind] = (src_tuple[0], src_tuple[1], src_tuple[2]+2)
except ValueError:
list_data.append(val)
print list_data
Of course, if speed or memory-efficiency is your main concern this code is not very appropriate, but you haven't mentioned these in your question, and that is not what python really about in my opinion.
Edit:
You haven't specified what happens when you check for ('aaa','bbb','500') second time - should we use the updated list and increment matching tuple's last element, or should we stick to the original list and insert another copy?
If we use updated list, it is not clear how to handle incrementing string '500' by 2 (we can convert it to integer, but you should have constructed your query appropriately in the first place).
Or maybe you meant add last element of tuple being searched to the tuple in list if found ? Please edit your question to make it clear.

How to implement a counter for each element of a python list?

Using Python 3.4
I've got a way that works, but I think there might be a better way.
I want to have a list with a method expand() which chooses a random element from the list, but every time that element is chosen, a counter is incremented. I tried subclassing str to be able to add attributes but it didn't work.
My main problem with what I've got is that the expression random.randint(0,len(self)-1) and using a local variable doesn't seem very Pythonic. Before I added the counter, I could just type random.choice(self)
class clauses(list):
def __init__(self):
self.uses = []
def __setitem__(self,key,value):
self.uses[key]=value
super().__setitem__(self,key,value)
def __delitem__(self,key):
del(self.uses[key])
super().__delitem__(key)
def append(self,value):
self.uses.append(0)
super().append(value)
def extend(self,sequence):
for x in sequence:
self.uses.append(0)
super().append(x)
def expand(self):
n = random.randint(0,len(self)-1)
self.uses[n] += 1
return(self[n])
Initializing an empty dictionary, along with your list should solve this, assuming there are no duplicate entries within the list.
When adding an element to the list, you can also add it to the dictionary by myDict[element]=0 where myDict is the initialized dictionary, and element is the item being added to the list.
Then, when the item is selected, you can simply do: myDict[element]+=1.
When dealing with an instance of duplicate entries, you could create a dictionary of dictionaries in which each key in the dictionary is a word, and the nested dictionary keys for each word are, say, index positions of the duplicate word (the values of course being the actual counts). This does add substantial complication, however, as when you remove an item from your list you will need to also update index positions. This nested data structure would like something like this though: { word1: {position1: count1}, word2: {position1: count1, position 2: count2}....}

Categories

Resources