How to search for strings within nested lists - python

One of the questions for an assignment I'm doing consists of looking within a nested lists consisting of "an ultrashort story and its author.", to find a string that was inputted by a user. Not to sure on how to go about this, here is the assignment brief below if anyone would like more clarification. There are also more questions I'm not to sure on eg "find all stories by a certain author". Some explanations, or point me in the right direction is greatly appreciated :)
list = []
mylist = [['a','b','c'],['d','e','f']]
string = input("String?")
if string in [elem for sublist in mylist for elem in sublist] == True:
list.append(elem)
This is just an example of something i've tried, the list above is similar enough to the one i'm actually using for the question. I've just currently been going through different methods of iterating over a nested lists and adding mathcing items to another list. above code is just one example of an attemp i've made at this proccess.

""" the image above states that the data is in the
form of an list of sublists, with each sublist containing
two strings
"""
stories = [
['story string 1', 'author string 1'],
['story string 2', 'author string 2']
]
""" find stories that contain a given string
"""
stories_with_substring = []
substring = 'some string' # search string
for story, author in stories:
# if the substring is not in the story, a ValueError is raised
try:
story.index(substring)
stories_with_substring.append((story, author))
except ValueError:
continue
""" find stories by a given author
"""
stories_by_author = []
target_author = 'first last'
for story, author in stories:
if author == target_author:
stories_by_author.append((story, author))
This line here
for story, author in stories:
'Unpacks' the array. It's equivalent to
for pair in stories:
story = pair[0]
author = pair[1]
Or to go even further:
i = 0
while i < len(stories):
pair = stories[i]
story = pair[0]
author = pair[1]
I'm sure you can see how useful this is when dealing with lists that contain lists/tuples.
You may need to call .lower() on some of the strings if you want the search to be case insensitive

You can do a few things here. Your example showed the use of a list comprehension, so let's focus on some other aspects of this problem.
Recursion
You can define a function that iterates through all the items in the top level list. Assuming you know for sure all items are either strings or more lists, you can use type() to check if each item is another list, or is a string. If it's a string, do your search - if it's a list, have your function call itself. Let's look at an example. Please note that we should never using variables named list or string - these are core value types and we don't want to accidentally overwrite them!
mylist = [['a','b','c'],['d','e','f']]
def find_nested_items(my_list, my_input):
results = []
for i in mylist:
if type(i) == list:
items = find_nested_items(i, my_input)
results += items
elif my_input in i:
results.append(i)
return results
We're doing a few things here:
Creating an empty list named results
Iterating through the top level items of my_list
If one of those items is another list, we have our function call itself - at some point this will trigger the condition where an item is not a list, and will eventually return the results from that. For now, we assume the results we're getting back are going to be correct, so we concatenate those results to our top level results list
If the item is not a list, we simply check for the existence of our input and if so, add it to our results list
This kind of recursion is typically very safe, because it's inherently limited by our data structure. It can't run forever unless the data structure itself is infinitely deep.
Generators
Next, let's look at a much cooler function of python 3: generators. Right now, we're doing all the work of collecting the results in one go. If we later on want to iterate through those results, we need to iterate over them separately.
Instead of doing that, we can define a generator. This works almost the same, practically speaking, but instead of collecting the results in one loop and then using them in a second, we can collect and use each result all within a single loop. A generator "yields" a value, then stops until it is called the next time. Let's modify our example to make it a generator:
mylist = [['a','b','c'],['d','e','f']]
def find_nested_items(my_list, my_input):
for i in mylist:
if type(i) == list:
yield from find_nested_items(i, my_input)
elif my_input in i:
yield i
You'll notice this version is a fair bit shorter. There's no need to hold items in a temporary list - each item is "yielded", which means it's passed directly to the caller to use immediately, and the caller will stop our generator until it needs the next value.
yield from basically does the same recursion, it simply sets up a generator within a generator to return those nested items back up the chain to the caller.
These are some good techniques to try - please give them a go!

Related

Search tuple elements within in list

I have a list in Python as
list_data = [('a','b',5),('aa','bb',50)]
and some variables:
a = ('a','b','2')
c = ('aaa','bbb','500')
Now how can I search if a is already there in list_data?
If yes add 2 to the value of a, if not append to list_data?
The result should be as
list_data = [('a','b',7),('aa','bb',50),('aaa','bbb','500')]
Actually, this question is a good way to several demonstrate Pythonic ways of doing things. So lets see what we can do.
In order to check if something is in python list you can just use operator in:
if a in list_data:
do_stuff()
But what you ask is a bit different. You want to do something like a search by multiple keys, if I understand correctly. In this case you can 'trim' your tuple by discarding last entry.
Slicing is handy for this:
value_trimmed = value[:-1]
Now you can make a list of trimmed tuples:
list_trimmed = []
for a in list_data:
list_trimmed.append(a[:-1])
And then search there:
if a[:-1] in list_trimmed:
do_smth()
This list can be constructed in a less verbose way using list_comprehension:
list_trimmed = [item[:-1] for item in list_data]
To find where your item exactly is you can use index() method of list:
list_trimmed.index(a[:-1])
This will return index of a[:-1] first occurrence in list_trimmed or throw if it cant be found. We can avoid explicitly checking if item is in the list, and do the insertion only if the exception is caught.
Your full code will look like this:
list_data = [('a','b',5), ('aa','bb',50)]
values_to_find = [('a','b','2'), ('aaa','bbb','500')]
list_trimmed = [item[:-1] for item in list_data]
for val in values_to_find:
val_trimmed = val[:-1]
try:
ind = list_trimmed.index(val_trimmed)
src_tuple = list_data[ind]
# we can't edit tuple inplace, since they are immutable in python
list_data[ind] = (src_tuple[0], src_tuple[1], src_tuple[2]+2)
except ValueError:
list_data.append(val)
print list_data
Of course, if speed or memory-efficiency is your main concern this code is not very appropriate, but you haven't mentioned these in your question, and that is not what python really about in my opinion.
Edit:
You haven't specified what happens when you check for ('aaa','bbb','500') second time - should we use the updated list and increment matching tuple's last element, or should we stick to the original list and insert another copy?
If we use updated list, it is not clear how to handle incrementing string '500' by 2 (we can convert it to integer, but you should have constructed your query appropriately in the first place).
Or maybe you meant add last element of tuple being searched to the tuple in list if found ? Please edit your question to make it clear.

Compare items in list with nested for-loop

I have a list of URLs in an open CSV which I have ordered alphabetically, and now I would like to iterate through the list and check for duplicate URLs. In a second step, the duplicate should then be removed from the list, but I am currently stuck on the checking part which I have tried to solve with a nested for-loop as follows:
for i in short_urls:
first_url = i
for s in short_urls:
second_url = s
if i == s:
print "duplicate"
else:
print "all good"
The print statements will obviously be replaced once the nested for-loop is working. Currently, the list contains a few duplicates, but my nested loop does not seem to work correctly as it does not recognise any of the duplicates.
My question is: are there better ways to do perform this exercise, and what is the problem with the current nested for-loop?
Many thanks :)
By construction, your method is faulty, even if you indent the if/else block correctly. For instance, imagine if you had [1, 2, 3] as short_urls for the sake of argument. The outer for loop will pick out 1 to compare to the list against. It will think it's finding a duplicate when in the inner for loop it encounters the first element, a 1 as well. Essentially, every element will be tagged as a duplicate and if you plan on removing duplicates, you'll end up with an empty list.
The better solution is to call set(short_urls) to get a set of your urls with the duplicates removed. If you want a list (as opposed to a set) of urls with the duplicates removed, you can convert the set back into a list with list(set(short_urls)).
In other words:
short_urls = ['google.com', 'twitter.com', 'google.com']
duplicates_removed_list = list(set(short_urls))
print duplicates_removed_list # Prints ['google.com', 'twitter.com']
if i == s:
is not inside the second for loop. You missed an indentation
for i in short_urls:
first_url = i
for s in short_urls:
second_url = s
if i == s:
print "duplicate"
else:
print "all good"
EDIT: Also you are comparing every element of an array with every element of the same array. This means compare the element at position 0 with the element at postion 0, which is obviously the same.
What you need to do is starting the second for at the position after that reached in the first for.

Python - List not converting to Tuple inorder to Sort

def mkEntry(file1):
for line in file1:
lst = (line.rstrip().split(","))
print("Old", lst)
print(type(lst))
tuple(lst)
print(type(lst)) #still showing type='list'
sorted(lst, key=operator.itemgetter(1, 2))
def main():
openFile = 'yob' + input("Enter the year <Do NOT include 'yob' or .'txt' : ") + '.txt'
file1 = open(openFile)
mkEntry(file1)
main()
TextFile:
Emma,F,20791
Tom,M,1658
Anthony,M,985
Lisa,F,88976
Ben,M,6989
Shelly,F,8975
and I get this output:
IndexError: string index out of range
I am trying to convert the lst to Tuple from List. So I will able to order the F to M and Smallest Number to Largest Numbers. In around line 7, it's still printing type list instead of type tuple. I don't know why it's doing that.
print(type(lst))
tuple(lst)
print(type(lst)) #still showing type='list'
You're not changing what lst refers to. You create a new tuple with tuple(lst) and immediately throw it away because you don't assign it to anything. You can do:
lst = tuple(lst)
Note that this will not fix your program. Notice that your sort operation is happening once per line of your file, which is not what you want. Try collecting each line into one sequence of tuples and then doing the sort.
Firstly, you are not saving the tuple you created anywhere:
tup = tuple(lst)
Secondly, there is no point in making it a tuple before sorting it - in fact, a list could be sorted in place as it's mutable, while a tuple would need another copy (although that's fairly cheap, the items it contains aren't copied).
Thirdly, the IndexError has nothing to do with whether it's a list or tuple, nor whether it is sorted. It most likely comes from the itemgetter, because there's a list item that doesn't have three entries in turn - for instance, the strings "F" or "M".
Fourthly, the sort you're doing, but not saving anywhere, is done on each individual line, not the table of data. Considering this means you're comparing a name, a number, and a gender, I rather doubt it's what you intended.
It's completely unclear why you're trying to convert data types, and the code doesn't match the structure of the data. How about moving back to the overview plan and sorting out what you want done? It could well be something like Python's csv module could help considerably.

python for loop list plus one item

I'm handling mouse clicks on objects based on the location of the object on the screen. I record the xy coord of the mouse click and see if it matches any of the objects that are allowed to be clicked on. The objects are in different lists or just single objects, but I want them in one big list so I can just loop through the whole thing once, if the click is on one of the objects, do the work, break. You can only click on one object.
First method: how i'm doing it now:
list = [obj1, obj2, obj3]
singleobj
copylist = list
copylist.append(singleobj)
for item in copylist:
if item.pos == mouseclick.pos:
doWork(item)
break
Second method: I'd rather do something like below, but obviously the list+singleobj is not valid:
for item in list+singleobj:
if item.pos == mouseclick.pos:
doWork(item)
break
Third method: Or if I absolutely had to, I could do this horrible terrible code:
list = [obj1, obj2, obj3]
foundobj = None
for item in list:
if item.pos == mouseclick.pos:
foundobj = item
break
if foundobj is None:
if singleobj.pos == mouseclick.pos:
foundobj = singleobj
#possibly repeated several times here....
if foundobj is not None:
doWork(foundobj)
The first method seems slow because I have to copy all of the (possibly many) lists and single objects into one list.
The second method seems ideal because it's compact and easy to maintain. Although, as it stands now it's simply pseudo code.
The third method is bulky and clunky.
Which method should I use? If the second, can you give the actual code?
For 2nd method you need itertools.chain
for item in itertools.chain(list, [singleobj]):
...
DrTyrsa's answer is what I was about to say about your precise question, but let me give you a few other tips:
copylist = list
copylist.append(singleobj)
this does not create a copy of the list, you might want to do copylist = list[:] or copylist = list(lst) (I changed the name here to lst, because list is a builtin. you know).
about your second method, you can do:
for item in list + [singleobj]:
...
and if you are going to use itertools, a tiny improvement is to use a tuple rather than a list to hold the extra object, as it's a little bit more lightweight:
for item in itertools.chain(list, (singleobj,)):
...
the other thing is that you should not be looping your objects to see if the coordinates match, you can have them indexed by their boundaries (in something like a BSP tree or a Quadtree) and do a faster lookup.

Python: Algorithm Traverse Tree that expands at each node

I have a dictionary with id and multiple values for each ID which are strings. For each value in that for each Id I make a database query and get set results:
{111: Run, Jump, swim}
{222: Eat, drink}
so for each value lets say run I execute a query and it returns another set of data then pick each value of that set and run query which will give another set and finally I come to a point where there is only one item get return. Once I finish get that final element of every single sub element of Run then I move to Jump and continue.
I asked this question before but didn't get any results so people told me to remove the code and just ask the question again. Here is the link for the same question which I ask few days ago. Do i need to implement something like disjoin set?
You can look at the categories/subcategories as a tree with N branches at each node (depending how many categories you have). From what I can gather you basically want to generate an in-order list of tree leaves.
One simple way of doing it is through generators (using the terminology from your original question):
def lookup(elem):
# do your SQL call here for a given category 'elem' and return
# a list of it's subcategories
return []
def leaves(lst):
if lst:
for elem in lst: # for every category
for sublist in leaves(lookup(elem)): # enumerate it's sub categories
yield sublist # and return it
yield elem # once lookup(elem) is [] return elem
d = { 111: [Run, Jump, swim] , 222: [Eat, drink] }
for key, lst in d.items():
print key, [elem for elem in leaves(lst)]
If you are unfamiliar with generators, they are simply iterator constructs that "yield" values instead of returning them. The difference is that yielding only pauses the iterator at that location which will continue where it stopped when the next value is requested.
With some clever recursion inside the generator you can simply parse the whole tree.
The [elem for elem in leaves(lst)] is a list comprehension and it simply builds a list containing elem for every element iterated by leaves.
This looks like plain tree traversal. You can use BFS or DFS.

Categories

Resources