This is a generic question and answer for a logical error I've seen in many questions from new programmers in a variety of languages.
The problem is searching an array for an element that matches some input criteria. The algorithm, in pseudo-code, looks something like this:
    for each element of Array:
        if element matches criteria:
            do something with element
            maybe break out of loop (if only interested in first match)
        else:
            print "Not found"
This code reports "Not found" even if it successfully finds a matching element.
The problem is that when you're searching for something linearly through an array, you can't know that it's not found until you reach the end of the array. The code in the question reports "Not found" for every non-matching element, even though there may be other matching elements.
The simple modification is to use a variable that tracks whether you found something, and then check this variable at the end of the loop.
    found = false
    for each element of Array:
        if element matches criteria:
            do something with element
            found = true
            maybe break out of loop (if only interested in first match)
    if not found:
        print "Not found"
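In Python, the flag approach might look like the sketch below. The function name report_matches and the sample data are made up for illustration; they are not from any question in particular.

```python
def report_matches(items, criteria):
    """Print each matching item, then report once if nothing matched."""
    found = False
    for element in items:
        if criteria(element):
            print("Found", element)
            found = True
    # The check happens after the loop, once every element has been seen.
    if not found:
        print("Not found")
    return found


has_even = report_matches([1, 3, 4, 7], lambda n: n % 2 == 0)
has_big = report_matches([1, 3, 4, 7], lambda n: n > 100)
```

The key point is that the "Not found" message is printed outside the loop, so it can appear at most once.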
Python has an else: block in its for loops. This executes code only if the loop runs to completion, rather than ending due to use of break. This allows you to avoid the found variable (although it might still be useful for later processing):
    for element in someIterable:
        if matchesCriteria(element):
            print("Found")
            break
    else:
        print("Not found")
Some languages have built-in mechanisms that can be used instead of writing your own loop.
Some languages have an any or some function that takes a callback function, and returns a boolean indicating whether it succeeds for any elements of the array.
If the language has an array filtering function, you can filter the input array with a function that checks the criteria, and then check whether the result is an empty array.
If you're trying to match an element exactly, most languages provide a find or index function that will search for a matching element.
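As a rough illustration of the three options above in Python (the list contents and the is_large helper are invented for this example):

```python
numbers = [3, 8, 15, 22]


def is_large(n):
    return n > 10


# any() answers "does at least one element match?" and stops at the first match.
has_large = any(is_large(n) for n in numbers)

# Filtering keeps every match; an empty result means nothing was found.
large_numbers = [n for n in numbers if is_large(n)]
nothing_found = not large_numbers

# For exact matches, `in` tests membership and list.index() gives the position.
# list.index() raises ValueError when the value is absent, so check first.
position = numbers.index(15) if 15 in numbers else -1
```

Other languages spell these differently, but the same three shapes (any/some, filter, find/index) are common.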
If you'll be searching frequently, it may be better to convert the array to a data structure that can be searched more efficiently. Most languages provide set and/or hash table data structures (the latter goes under many names depending on the language, e.g. associative array, map, dictionary), and these are typically searchable in O(1) time, while scanning an array is O(n).
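A minimal Python sketch of that conversion, assuming the elements are hashable (the usernames are placeholder data):

```python
usernames = ["alice", "bob", "carol"]

# Build the set once; each later membership test is a hash lookup on average.
username_set = set(usernames)
bob_exists = "bob" in username_set
dave_exists = "dave" in username_set

# A dict maps a key to associated data, here the position in the original list.
index_by_name = {name: i for i, name in enumerate(usernames)}
carol_index = index_by_name.get("carol")  # .get() returns None if the key is missing
```

Building the set or dict is itself O(n), so this only pays off when the same collection is searched more than once.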
I am really confused about what the "yield" keyword returns in a generator. What are its real use cases, and when should I use it?
How is it different from the "return" keyword?
What I have learned is that generators are better in terms of performance, but I cannot think of any real use case if asked in an interview!
Thanks in advance!
This may be useful for text processing. If you have a large corpus and want to normalize the characters in it, you would apply a normalize function to each text, for example.
You would want a function that loads each text only when you are about to use it, rather than loading the complete corpus, which may be too large for your computer's memory.
Example:
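    import os
    from lxml import etree

    def get_data(data_directory, parser):
        # Lazily yield the root element of each XML file, one file at a time.
        for filename in os.listdir(data_directory):
            if filename.endswith(".xml"):
                tree = etree.parse(os.path.join(data_directory, filename), parser=parser)
                yield tree.getroot()
            # Non-XML files are simply skipped; returning here would end the generator early.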
You have a directory where all your files are. You want to parse only the XML files.
You can do such processing with a yield statement as if you loaded all your data:
    for root in get_data(DATA_DIRECTORY, parser):
        result = process(root)
        save_result(result)
Return sends a specified value back to its caller whereas Yield can produce a sequence of values. We should use yield when we want to iterate over a sequence, but don’t want to store the entire sequence in memory.
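To make the contrast concrete, here is a small sketch with two made-up functions, squares_list and squares_gen, that produce the same values in the two styles:

```python
def squares_list(n):
    """Build and return the whole list at once."""
    return [i * i for i in range(n)]


def squares_gen(n):
    """Yield one square at a time; only the current value is held."""
    for i in range(n):
        yield i * i


all_at_once = squares_list(5)
lazy = squares_gen(5)
first = next(lazy)  # computes only the first value
rest = list(lazy)   # consumes whatever is left
```

Calling squares_gen(5) does not run the loop at all; the body only advances when the caller asks for the next value.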
You can read more about the differences here
The difference between yielding a single value and returning a single value is that yield wraps the value in an iterator, which is also called a stream or enumerator in other languages. A list is one example of an enumerator, and to simplify this answer, you can pretend that all iterators are just lists.
The difference between yielding many values (say, inside a for loop) and returning an iterator (or list) is when the values are calculated. With yield, one value is calculated and handed to the caller at a time. If the caller doesn't need the whole list of values, the rest of the list is never calculated.
However, when returning a list, the entire list must be calculated beforehand. Say you have this function:
    def findIndex(enumerator, item):
        idx = 0
        for value in enumerator:
            if (value == item):
                return idx
            idx = idx + 1
It takes an iterator, and searches for an item, returning the index of that item.
Now, here's where iterators make a difference. Imagine that you are going to call findIndex like this:
    findIndex(gimme_the_values(), 3)
Say that gimme_the_values is some function which calculates a list of integers; however, let's also say that, the process of calculating those integers takes a long time, for some reason. Maybe, you're scanning through a 1500 page document, looking for every number that occurs in it, and that's the list of values that you're returning.
Now, let's say that the first several numbers to occur in this document are 7, 1998, 3, and 18, and that the 3 occurs on the 40th page. If you define gimme_the_values to use yield, you can stop generating that "list" at page 40; you'll never even scan for and return the 18. However, if gimme_the_values returns a list instead of yielding, you have to scan every page and generate the whole list, even though you really only need the first three values in this case.
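The scenario can be simulated with a short sketch. The page numbers other than 40 are invented, the pages_scanned list exists only to show how far the scan got, and find_index is a snake_case variant of the function above:

```python
pages_scanned = []  # records which "pages" were actually visited


def gimme_the_values():
    # Stand-in for an expensive scan: each tuple is (page number, number found).
    document = [(1, 7), (12, 1998), (40, 3), (41, 18)]
    for page, number in document:
        pages_scanned.append(page)
        yield number


def find_index(values, item):
    for idx, value in enumerate(values):
        if value == item:
            return idx
    return None


result = find_index(gimme_the_values(), 3)
```

After the call, pages_scanned holds only the pages up to and including page 40, because the generator is abandoned as soon as find_index returns.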
I use PsychoPy2 v1.85.2 for my experiment on a Mac. After running an experiment I got the message below, and response.corr is inaccurate even though response.keys is recorded accurately in the Excel output. Please tell me how to get an accurate response.corr.
    FutureWarning: elementwise comparison failed; returning scalar
    instead, but in the future will perform elementwise comparison
      if (response0.keys == str(correctAns0)) or (response0.keys == correctAns0):
response0.keys will return a list, even if it contains just a single value. This is why it is named .keys rather than .key. e.g. if the subject pushed the 'a' key, the results would be the single element list ['a'].
You should treat it as a list and make comparisons like yours to a specified single item within that list. e.g.
# test against the zeroth list item rather than the entire list:
if response0.keys[0] == str(correctAns0): # etc
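The difference can be seen with plain Python values, without PsychoPy. This is only a sketch: keys stands in for response0.keys, and the is_correct helper is hypothetical, added here to guard against an empty or missing response.

```python
keys = ['a']        # what a .keys attribute would hold after a single 'a' press
correct_ans = 'a'

whole_list_matches = (keys == correct_ans)         # list vs. str: always False
first_key_matches = (keys[0] == str(correct_ans))  # str vs. str


def is_correct(keys, correct_ans):
    # Guard against an empty or missing response before indexing.
    return bool(keys) and keys[0] == str(correct_ans)
```

Converting the correct answer with str() also covers the case where it was stored as a number but the key press arrives as a string.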
    def process_filter_description(filter, images, ial):
        '''Return a new list containing only items from list images that pass
        the description filter (a str). ial is the related image association list.
        Matching is done in a case insensitive manner.
        '''
        images = []
        for items in ial:
Those are the only two lines of code I have so far. What is troubling me is the filter in the function. I really don't know what the filter is supposed to do or how to use it.
In no way am I asking for the full code. I just want help with what the filter is supposed to do and how I can use it.
Like I said in my comment, this is really vague. But I'll try to explain a little about the concept of a filter in python, specifically the filter() function.
The prototype of filter is: iterable <- filter(function, iterable).
iterable is something that can be iterated over. You can look up this term in the docs for a more exact explanation, but for your question, just know that a list is iterable.
function is a function that accepts a single element of the iterable you specify (in this case, an element of the list) and returns a boolean specifying whether the element should exist in the iterable that is returned. If the function returns True, the element will appear in the returned list, if False, it will not.
Here's a short example, showing how you can use the filter() function to filter out all even numbers (which I should point out, is the same as "filtering in" all odd numbers)
    def is_odd(i):
        return i % 2

    l = [1, 2, 3, 4, 5]  # This is a list
    fl = list(filter(is_odd, l))  # list() is needed in Python 3, where filter() returns an iterator
    print(fl)  # This will display [1, 3, 5]
You should convince yourself that is_odd works first. It will return 1 (=True) for odd numbers and 0 (=False) for even numbers.
In practice, you usually use a lambda function instead of defining a single-use top-level function, but you shouldn't worry about that, as this is just fine.
But anyway, you should be able to do something similar to accomplish your goal.
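For reference, here is what the lambda form looks like, followed by the same idea applied to case-insensitive string matching. The descriptions list is hypothetical data, not the structure of your ial, which I can't see from the question.

```python
numbers = [1, 2, 3, 4, 5]
odds = list(filter(lambda i: i % 2, numbers))

# Hypothetical data: plain description strings, not the asker's actual structure.
descriptions = ["Sunset over the BAY", "City skyline", "bay bridge at night"]
wanted = "Bay"

# Lower-casing both sides makes the substring test case insensitive.
matches = list(filter(lambda d: wanted.lower() in d.lower(), descriptions))
```

The predicate passed to filter() can be any function of one element, so the real version would pull the description out of whatever each ial item contains.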
Well it says in the description line:
Return a new list containing only items from list images that pass the description filter (a str)
...
Matching is done in a case insensitive manner
So I'm guessing the filter is just a string. Do you have any kind of text associated with the images, such as a description or name that could be matched against the filter string?
I'm fairly new to python and have found that I need to query a list about whether it contains a certain item.
The majority of the postings I have seen on various websites (including this similar stackoverflow question) have all suggested something along the lines of
    for i in list:
        if i == thingIAmLookingFor:
            return True
However, I have also found from one lone forum that
    if thingIAmLookingFor in list:
        # do work
works.
I am wondering if the if thing in list method is shorthand for the for i in list method, or if it is implemented differently.
I would also like to know which, if either, is preferred.
In your simple example it is of course better to use in.
However... in the question you link to, in doesn't work (at least not directly) because the OP does not want to find an object that is equal to something, but an object whose attribute n is equal to something.
One answer does mention using in on a list comprehension, though I'm not sure why a generator expression wasn't used instead:
    if 5 in (data.n for data in myList):
        print("Found it")
But this is hardly much of an improvement over the other approaches, such as this one using any:
    if any(data.n == 5 for data in myList):
        print("Found it")
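A self-contained version of both forms, using a made-up Data class with an attribute n as in the linked question:

```python
class Data:
    def __init__(self, n):
        self.n = n


my_list = [Data(1), Data(5), Data(9)]

has_five = any(data.n == 5 for data in my_list)
has_seven = 7 in (data.n for data in my_list)

# next() with a default returns the matching object itself, or None.
first_five = next((data for data in my_list if data.n == 5), None)
```

The next() form is handy when you need the matching object and not just a yes/no answer.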
The "if x in thing:" form is strongly preferred, not just because it takes less code, but because it also works on other data types and is (to me) easier to read.
I'm not sure how it's implemented, but I'd expect it to be quite a lot more efficient on data types that are stored in a more searchable form, e.g. sets or dictionary keys.
The if thing in somelist is the preferred and fastest way.
Under-the-hood that use of the in-operator translates to somelist.__contains__(thing) whose implementation is equivalent to: any((x is thing or x == thing) for x in somelist).
Note the condition tests identity and then equality.
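One place where the identity check is visible is NaN, which never compares equal to itself. A small sketch:

```python
nan = float("nan")

equal_to_itself = (nan == nan)           # False: NaN never compares equal
found_same_object = nan in [nan]         # True: the identity check succeeds first
found_other_nan = float("nan") in [nan]  # False: different object, and == fails
```

For ordinary values the two checks agree, so this only matters for objects with unusual equality behavior.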
    for i in list:
        if i == thingIAmLookingFor:
            return True
The above is a terrible way to test whether an item exists in a collection. It returns True from the function, so if you need the test as part of some code you'd need to move this into a separate utility function, or add thingWasFound = False before the loop and set it to True in the if statement (and then break), either of which is several lines of boilerplate for what could be a simple expression.
Plus, if you just use thingIAmLookingFor in list, this might execute more efficiently by doing fewer Python-level operations (it'll need to do the same operations, but maybe in C, as list is a builtin type). But even more importantly, if list is actually bound to some other collection like a set or a dictionary, thingIAmLookingFor in list will use the hash lookup mechanism such types support and be much more efficient, while using a for loop will force Python to go through every item in turn.
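One way to see the difference without timing anything is to count equality comparisons. The Counted class below is purely illustrative; it wraps a value and tallies calls to __eq__.

```python
class Counted:
    eq_calls = 0  # class-wide tally of equality comparisons

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.eq_calls += 1
        return isinstance(other, Counted) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


items = [Counted(i) for i in range(1000)]
target = Counted(999)  # equal to the last element, but a different object

Counted.eq_calls = 0
found_in_list = target in items
list_comparisons = Counted.eq_calls

as_set = set(items)
Counted.eq_calls = 0
found_in_set = target in as_set
set_comparisons = Counted.eq_calls
```

The list search has to compare against each element until it reaches the match at the end, while the set lookup only compares against entries whose hash matches the target's.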
Obligatory post-script: list is a terrible name for a variable that contains a list as it shadows the list builtin, which can confuse you or anyone who reads your code. You're much better off naming it something that tells you something about what it means.