Difference between two "contains" operations for Python lists

I'm fairly new to python and have found that I need to query a list about whether it contains a certain item.
The majority of the postings I have seen on various websites (including this similar Stack Overflow question) have all suggested something along the lines of
for i in list:
    if i == thingIAmLookingFor:
        return True
However, I have also found from one lone forum that
if thingIAmLookingFor in list:
    # do work
works.
I am wondering if the if thing in list method is shorthand for the for i in list method, or if it is implemented differently.
I would also like to know which, if either, is preferred.

In your simple example it is of course better to use in.
However... in the question you link to, in doesn't work (at least not directly) because the OP does not want to find an object that is equal to something, but an object whose attribute n is equal to something.
One answer does mention using in on a list comprehension, though I'm not sure why a generator expression wasn't used instead:
if 5 in (data.n for data in myList):
    print "Found it"
But this is hardly much of an improvement over the other approaches, such as this one using any:
if any(data.n == 5 for data in myList):
    print "Found it"

the "if x in thing:" format is strongly preferred, not just because it takes less code, but it also works on other data types and is (to me) easier to read.
I'm not sure how it's implemented, but I'd expect it to be quite a lot more efficient on datatypes that are stored in a more searchable form. eg. sets or dictionary keys.

The thing in somelist form is the preferred and fastest way.
Under the hood, that use of the in operator translates to somelist.__contains__(thing), whose implementation is equivalent to: any((x is thing or x == thing) for x in somelist).
Note the condition tests identity and then equality.
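A small sketch illustrating that order of checks, using NaN (which is never equal to itself) as an edge case; this is only a demonstration, not the actual CPython source:
nan = float('nan')
print(nan == nan)             # False: NaN is never equal to itself
print(nan in [nan])           # True: found via the identity ("is") check
print(float('nan') in [nan])  # False: a different object, and equality fails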

for i in list:
    if i == thingIAmLookingFor:
        return True
The above is a terrible way to test whether an item exists in a collection. It returns True from the function, so if you need the test as part of some other code you'd have to move it into a separate utility function, or add thingWasFound = False before the loop and set it to True in the if statement (and then break). Either way, that is several lines of boilerplate for what could be a simple expression.
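For illustration, this is roughly what that boilerplate looks like next to the one-line expression (my_items is just a placeholder name here, since the question's list shadows the builtin):
def contains(items, target):
    for i in items:
        if i == target:
            return True
    return False

found = contains(my_items, thingIAmLookingFor)
# versus simply:
found = thingIAmLookingFor in my_items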
Plus, if you just use thingIAmLookingFor in list, this might execute more efficiently by doing fewer Python-level operations (it still has to do the same work, but possibly in C, since list is a builtin type). Even more importantly, if list is actually bound to some other collection like a set or a dictionary, thingIAmLookingFor in list will use the hash lookup mechanism such types support and be much more efficient, while a for loop forces Python to go through every item in turn.
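A rough way to see that difference yourself (only a sketch; the absolute numbers will vary by machine, and the globals argument to timeit needs Python 3.5+):
import timeit
as_list = list(range(100000))
as_set = set(as_list)
print(timeit.timeit('99999 in as_list', globals=globals(), number=100))  # linear scan
print(timeit.timeit('99999 in as_set', globals=globals(), number=100))   # hash lookup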
Obligatory post-script: list is a terrible name for a variable that contains a list as it shadows the list builtin, which can confuse you or anyone who reads your code. You're much better off naming it something that tells you something about what it means.

Related

cleanest way to call one function on a list of items

In python 2, I used map to apply a function to several items, for instance, to remove all items matching a pattern:
map(os.remove, glob.glob("*.pyc"))
Of course I ignore the return value of os.remove; I just want all the files to be deleted. It created a temporary list for nothing, but it worked.
With Python 3, as map returns an iterator and not a list, the above code does nothing.
I found a workaround: since os.remove returns None, I use any to force iteration over the whole iterator without building a list (better performance):
any(map(os.remove,glob.glob("*.pyc")))
But it seems a bit hazardous, especially when applying it to methods that return something. Is there another way to do this as a one-liner without creating an unnecessary list?
The change from map() (and many other functions from 2.7 to 3.x) returning an iterator instead of a list is a memory-saving technique. For most cases, there is no performance penalty to writing out the loop more formally (it may even be preferred for readability).
I would provide an example, but #vaultah nailed it in the comments: still a one-liner:
for x in glob.glob("*.pyc"): os.remove(x)
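If you really do want a one-liner that exhausts the iterator without keeping its results, one known idiom (the itertools "consume" recipe) is to feed it to a zero-length deque; the explicit loop above is usually clearer, though:
import collections, glob, os
collections.deque(map(os.remove, glob.glob("*.pyc")), maxlen=0)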

Nested list comprehension against .NET Arrays within Arrays

I have a .NET structure that has arrays within arrays. I want to create a list of members of the items from a specific array within a specific array, using list comprehension in IronPython, if possible.
Here is what I am doing now:
tag_results = [item_result for item_result in results.ItemResults if item_result.ItemId == tag_id][0]
tag_vqts = [vqt for vqt in tag_results.VQTs]
tag_timestamps = [vqt.TimeStamp for vqt in tag_vqts]
So, get the single item result from the results array which matches my condition, then get the vqts arrays from those item results, THEN get all the timestamp members for each VQT in the vqts array.
Is wanting to do this in a single statement overkill? Later on, the timestamps are used in this manner:
vqts_to_write = [vqt for vqt in resampled_vqts if not vqt.TimeStamp in tag_timestamps]
I am not sure if a generator would be appropriate, since I am not really looping through them, I just want a list of all the timestamps for all the item results for this item/tag so that I can test membership in the list.
I have to do this multiple times for different contexts in my script, so I was just wondering if I am doing this in an efficient and pythonic manner. I am refactoring this into a method, which got me thinking about making it easier.
FYI, this is IronPython 2.6, embedded in a fixed environment that does not allow the use of numpy, pandas, etc. It is safe to assume I need a python 2.6 only solution.
My main question is:
Would collapsing this into a single line, if possible, obfuscate the code?
If collapsing is appropriate, would a method be overkill?
Two! My two main questions are:
Would collapsing this into a single line, if possible, obfuscate the code?
If collapsing is appropriate, would a method be overkill?
Is a generator appropriate for testing membership in a list?
Three! My three questions are... Amongst my questions are such diverse queries as...I'll come in again...
(it IS python...)
tag_results = [...][0] builds a whole new list just to get one item. This is what next() on a generator expression is for:
next(item_result for item_result in results.ItemResults if item_result.ItemId == tag_id)
which only iterates just enough to get a first item.
You can inline that, but I'd keep that as a separate expression for readability.
The remainder is easily put into one expression:
tag_results = next(item_result for item_result in results.ItemResults
                   if item_result.ItemId == tag_id)
tag_timestamps = [vqt.TimeStamp for vqt in tag_results.VQTs]
I'd make that a set if you only need to do membership testing:
tag_timestamps = set(vqt.TimeStamp for vqt in tag_results.VQTs)
Sets allow for constant time membership tests; testing against a list takes linear time as the whole list could end up being scanned for each such test.
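As a sketch using the question's own names, the later membership test then looks the same, but each check becomes a constant-time hash lookup:
vqts_to_write = [vqt for vqt in resampled_vqts
                 if vqt.TimeStamp not in tag_timestamps]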

Why python for loops don't default to one iteration for single objects

This may seem like an odd question, but why doesn't Python "iterate" over a single object by default?
I feel it would increase the resilience of for loops in low-level programming/simple scripts.
At the same time, it would promote sloppiness in defining data structures properly, and it clashes with strings being iterable character by character.
E.g.
x = 2
for a in x:
    print(a)
As opposed to:
x = [2]
for a in x:
    print(a)
Are there any reasons?
FURTHER INFO: I am writing a function that takes a column/multiple columns from a database and puts it into a list of lists. It would just be visually "nice" to have a number instead of a single-element list, without putting type handling into the function (probably me just being OCD again, though).
Pardon the slightly ambiguous question; this is a "why is it so?", not a "how to?". In my ignorant world, I would prefer integers to be iterable for the case of the above-mentioned function. So why is it not implemented? Is it because adding an __iter__ to the integer object would be an extra strain on the machine?
Discussion Points
Is an __iter__ too much of a drain on machine resources?
Do programmers want an exception to be thrown, since they expect integers to be non-iterable?
It brings up the idea that if you can't do it already, why not just allow it, since people will keep doing what they've been doing, unaffected (unless, of course, the previous point is what they want); and
From a set-theory perspective, I guess strictly a set does not contain itself, so it may not be mathematically "pure".
Python cannot iterate over an object that is not iterable.
The for loop actually calls the iterator protocol methods on the iterable (iter()/__iter__ and next()/__next__), which is how it extracts elements from it.
Non-iterable data types don't have these methods, so there is no way to extract elements from them.
This Stack Overflow question on checking whether an object is iterable is a great resource.
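A rough, simplified sketch of what the for statement does behind the scenes (the real protocol also has an older __getitem__ fallback), which shows why an int fails immediately:
x = 2
it = iter(x)        # raises TypeError: 'int' object is not iterable
while True:
    try:
        a = next(it)
    except StopIteration:
        break
    print(a)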
The problem is with the definition of "single object". Is "foo" a single object (Hint: it is an iterable with three strings)? Is [[1, 2, 3]][0] a single object (It is only one object, with 3 elements)?
The short answer is that there is no generalizable way to do it. However, you can write functions that have knowledge of your problem domain and can do conversions for you. I don't know your specific case, but suppose you want to handle an integer or list of integers transparently. You can create your own iterator:
def col_iter(item):
    if isinstance(item, int):
        yield item
    else:
        for i in item:
            yield i

x = 2
for a in col_iter(x):
    print(a)

y = [1, 2, 3, 4]
for a in col_iter(y):
    print(a)
The only thing that I can think of is that Python for loops are looking for something to iterate through, not just a value. If you think about it, what would the value of a be? If you want it to be the number 2, then you don't need the for loop in the first place. If you want it to go through 1, 2 or 0, 1, 2 then you want for a in range(x):. I'm not positive that's the answer you're looking for, but it's what I've got.

How clever is the python interpreter?

Probably a stupid question, but I am wondering in general, and if anyone knows, how much foresight the Python interpreter has, specifically in the field of regular expressions and text parsing.
Suppose my code at some point looks like this:
mylist = ['a', 'b', 'c', ... ]
if 'g' in mylist: print(mylist.index('g'))
Is there any safer way to do this, with a while loop or similar? I mean, will the index be looked up with a second pass from the beginning, or are the two 'g's (in the above line) the same thing in Python's mind?
It'll do the lookup both times. If it's worth it (for, say, a very big list), use try:
try:
    print(mylist.index('g'))
except ValueError:
    pass
The result of the containment check is not cached, and so the index will need to be discovered anew. And the dynamic nature of Python makes implicit caching of such a thing unreliable since the __contains__() method may mutate the object (although it would be a violation of several programming principles to do so).
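A contrived illustration of that point (bad practice, but perfectly legal Python), where the membership test itself changes the list:
class Weird(list):
    def __contains__(self, item):
        self.append(item)      # the membership test mutates the list!
        return super(Weird, self).__contains__(item)

w = Weird(['a', 'b'])
print('g' in w)       # True: the test itself inserted 'g'
print(w.index('g'))   # 2, but only because the first lookup changed the list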
Your code will result in two lookups, first to determine if 'g' is in the list and second to find the index. Python won't try to consolidate them into a single lookup. If you're worried about efficiency you can use a dictionary instead of a list which will make both lookups O(1) instead of O(n).
You can easily make a dict to look up. Something like this:
mydict = {k:v for v,k in enumerate(mylist)}
The overhead of creating the dict won't be worthwhile unless you are doing more than a few such lookups on the same list.
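Hypothetical usage once mydict has been built; both the membership test and the index lookup are then constant time:
if 'g' in mydict:
    print(mydict['g'])    # the position 'g' had in mylist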
A try/except block is a better option for finding the index of an element in a list:
try:
    print(mylist.index('g'))
except ValueError:
    print("value not in list")
Yeah, it will be looked up twice; the Python interpreter doesn't cache instructions, though I've been wondering if that's possible (for certain things). If this is an issue, you can use sets or dicts, both of which have constant lookup time.
Either way, it seems you are doing LBYL ("look before you leap"); in Python we tend towards EAFP ("easier to ask forgiveness than permission"), so it's quite common to wrap such things in try ... except blocks.

Using 'in' to match an attribute of Python objects in an array

I don't remember whether I was dreaming or not but I seem to recall there being a function which allowed something like,
foo in iter_attr(array of python objects, attribute name)
I've looked over the docs but this kind of thing doesn't fall under any obvious listed headers
Using a list comprehension would build a temporary list, which could eat all your memory if the sequence being searched is large. Even if the sequence is not large, building the list means iterating over the whole of the sequence before in could start its search.
The temporary list can be avoided by using a generator expression:
foo = 12
foo in (obj.id for obj in bar)
Now, as long as obj.id == 12 near the start of bar, the search will be fast, even if bar is infinitely long.
As #Matt suggested, it's a good idea to use hasattr if any of the objects in bar can be missing an id attribute:
foo = 12
foo in (obj.id for obj in bar if hasattr(obj, 'id'))
Are you looking to get a list of objects that have a certain attribute? If so, a list comprehension is the right way to do this.
result = [obj for obj in listOfObjs if hasattr(obj, 'attributeName')]
You could always write one yourself:
def iterattr(iterator, attributename):
    for obj in iterator:
        yield getattr(obj, attributename)
This will work with anything that iterates, be it a tuple, list, or whatever.
I love Python; it makes stuff like this very simple and no more of a hassle than necessary, and in use stuff like this is hugely elegant.
No, you were not dreaming. Python has a pretty excellent list comprehension system that lets you manipulate lists pretty elegantly, and depending on exactly what you want to accomplish, this can be done a couple of ways. In essence, what you're doing is saying "For item in list if criteria.matches", and from that you can just iterate through the results or dump the results into a new list.
I'm going to crib an example from Dive Into Python here, because it's pretty elegant and they're smarter than I am. Here they're getting a list of files in a directory, then filtering the list for all files that match a regular expression.
import os
import re

files = os.listdir(path)
test = re.compile(r"test\.py$", re.IGNORECASE)
files = [f for f in files if test.search(f)]
You could do this without regular expressions, for your example, for anything where your expression at the end returns true for a match. There are other options like using the filter() function, but if I were going to choose, I'd go with this.
Eric Sipple
The function you are thinking of is probably operator.attrgetter. For example, to get a list that contains the value of each object's "id" attribute:
import operator
ids = map(operator.attrgetter("id"), bar)
If you want to check whether the list contains an object with an id == 12, then a neat and efficient (i.e. doesn't iterate the whole list unnecessarily) way to do it is:
any(obj.id == 12 for obj in bar)
If you want to use 'in' with attrgetter, while still retaining lazy iteration of the list:
import operator,itertools
foo = 12
foo in itertools.imap(operator.attrgetter("id"), bar)
What I was thinking of can be achieved using list comprehensions, but I thought that there was a function that did this in a slightly neater way.
i.e. 'bar' is a list of objects, all of which have the attribute 'id'
The mythical functional way:
foo = 12
foo in iter_attr(bar, 'id')
The list comprehension way:
foo = 12
foo in [obj.id for obj in bar]
In retrospect the list comprehension way is pretty neat anyway.
If you plan on searching anything of remotely decent size, your best bet is going to be to use a dictionary or a set. Otherwise, you basically have to iterate through every element of the iterator until you get to the one you want.
If this isn't necessarily performance sensitive code, then the list comprehension way should work. But note that it is fairly inefficient because it goes over every element of the iterator and then goes BACK over it again until it finds what it wants.
Remember, python has one of the most efficient hashing algorithms around. Use it to your advantage.
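A sketch of the set-based approach this answer describes, reusing foo and bar from the question (assuming every object in bar has an id attribute): build the set of ids once, and every later test is a hash lookup rather than a scan.
ids = set(obj.id for obj in bar)
found = foo in ids    # constant-time membership test after the one-off set build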
I think:
#!/bin/python
bar in dict(Foo)
Is what you are thinking of. When trying to see if a certain key exists within a dictionary in python (python's version of a hash table) there are two ways to check. First is the has_key() method attached to the dictionary and second is the example given above. It will return a boolean value.
That should answer your question.
And now a little off topic to tie this in to the list comprehension answer previously given (for a bit more clarity). List Comprehensions construct a list from a basic for loop with modifiers. As an example (to clarify slightly), a way to use the in dict language construct in a list comprehension:
Say you have a two dimensional dictionary foo and you only want the second dimension dictionaries which contain the key bar. A relatively straightforward way to do so would be to use a list comprehension with a conditional as follows:
#!/bin/python
baz = dict([(key, value) for key, value in foo.items() if bar in value])
Note the if bar in value at the end of the statement; this is a modifying clause which tells the list comprehension to only keep those key-value pairs which meet the conditional. In this case baz is a new dictionary which contains only the dictionaries from foo which contain bar. (Hopefully I didn't miss anything in that code example... you may have to take a look at the list comprehension documentation found in the docs.python.org tutorials and at secnetix.de; both sites are good references if you have questions in the future.)
