Using 'in' to match an attribute of Python objects in an array - python

I don't remember whether I was dreaming or not but I seem to recall there being a function which allowed something like,
foo in iter_attr(array of python objects, attribute name)
I've looked over the docs, but this kind of thing doesn't fall under any obvious listed headings.

Using a list comprehension would build a temporary list, which could eat all your memory if the sequence being searched is large. Even if the sequence is not large, building the list means iterating over the whole of the sequence before in could start its search.
The temporary list can be avoided by using a generator expression:
foo = 12
foo in (obj.id for obj in bar)
Now, as long as obj.id == 12 near the start of bar, the search will be fast, even if bar is infinitely long.
As @Matt suggested, it's a good idea to use hasattr if any of the objects in bar can be missing an id attribute:
foo = 12
foo in (obj.id for obj in bar if hasattr(obj, 'id'))
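For instance, with a hypothetical Record class (not from the question) and an endless stream of objects, the membership test still returns as soon as a match is produced:
import itertools

class Record:
    def __init__(self, id):
        self.id = id

bar = (Record(i) for i in itertools.count())   # a never-ending source of objects
print(12 in (obj.id for obj in bar))           # True; only Record(0)..Record(12) are created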

Are you looking to get a list of objects that have a certain attribute? If so, a list comprehension is the right way to do this.
result = [obj for obj in listOfObjs if hasattr(obj, 'attributeName')]

You could always write one yourself:
def iterattr(iterator, attributename):
    for obj in iterator:
        yield getattr(obj, attributename)
This will work with anything that iterates, be it a tuple, list, or whatever.
I love Python; it makes stuff like this very simple and no more of a hassle than necessary, and in use stuff like this is hugely elegant.
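With the question's names (assuming bar is a list of objects that all have an id attribute), the membership test then reads:
foo = 12
foo in iterattr(bar, 'id')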

No, you were not dreaming. Python has a pretty excellent list comprehension system that lets you manipulate lists pretty elegantly, and depending on exactly what you want to accomplish, this can be done a couple of ways. In essence, what you're doing is saying "For item in list if criteria.matches", and from that you can just iterate through the results or dump the results into a new list.
I'm going to crib an example from Dive Into Python here, because it's pretty elegant and they're smarter than I am. Here they're getting a list of files in a directory, then filtering the list for all files that match a regular expression criteria.
files = os.listdir(path)
test = re.compile(r"test\.py$", re.IGNORECASE)
files = [f for f in files if test.search(f)]
You could do this without regular expressions, for your example, for anything where your expression at the end returns true for a match. There are other options like using the filter() function, but if I were going to choose, I'd go with this.
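For example, a non-regex sketch of roughly the same filter (keeping names that end in "test.py", ignoring case; path is assumed to be defined as above) could be:
import os
files = [f for f in os.listdir(path) if f.lower().endswith("test.py")]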
Eric Sipple

The function you are thinking of is probably operator.attrgetter. For example, to get a list that contains the value of each object's "id" attribute:
import operator
ids = map(operator.attrgetter("id"), bar)
If you want to check whether the list contains an object with an id == 12, then a neat and efficient (i.e. doesn't iterate the whole list unnecessarily) way to do it is:
any(obj.id == 12 for obj in bar)
If you want to use 'in' with attrgetter, while still retaining lazy iteration of the list:
import operator, itertools
foo = 12
foo in itertools.imap(operator.attrgetter("id"), bar)
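Note that itertools.imap exists only in Python 2. In Python 3 the built-in map() is already lazy, so the equivalent there is simply:
import operator
foo = 12
foo in map(operator.attrgetter("id"), bar)   # map() returns a lazy iterator in Python 3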

What I was thinking of can be achieved using list comprehensions, but I thought that there was a function that did this in a slightly neater way.
i.e. 'bar' is a list of objects, all of which have the attribute 'id'
The mythical functional way:
foo = 12
foo in iter_attr(bar, 'id')
The list comprehension way:
foo = 12
foo in [obj.id for obj in bar]
In retrospect the list comprehension way is pretty neat anyway.

If you plan on searching anything of remotely decent size, your best bet is going to be to use a dictionary or a set. Otherwise, you basically have to iterate through every element of the iterator until you get to the one you want.
If this isn't performance-sensitive code, then the list comprehension way should work. But note that it is fairly inefficient: it builds the whole list by going over every element of the iterator, and then in goes back over that list again until it finds what it wants.
Remember, Python's sets and dictionaries use a very efficient hashing implementation. Use it to your advantage.
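A minimal sketch of that idea, assuming the id values are hashable and bar does not change between lookups:
ids = {obj.id for obj in bar}   # built once, one pass over bar
12 in ids                       # each subsequent lookup is O(1) on average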

I think:
#!/usr/bin/env python
bar in dict(Foo)
Is what you are thinking of. When trying to see if a certain key exists within a dictionary in Python (Python's version of a hash table), there are two ways to check. The first is the has_key() method attached to the dictionary (Python 2 only; it was removed in Python 3), and the second is the example given above. Either will return a boolean value.
That should answer your question.
And now a little off topic, to tie this into the list comprehension answer previously given (for a bit more clarity). List comprehensions construct a list from a basic for loop with modifiers. As an example (to clarify slightly), here is a way to use the in dict language construct in a list comprehension:
Say you have a two-dimensional dictionary foo and you only want the second-dimension dictionaries which contain the key bar. A relatively straightforward way to do so would be to use a list comprehension with a conditional, as follows:
#!/usr/bin/env python
baz = dict([(key, value) for key, value in foo.items() if bar in value])
Note the if bar in value at the end of the statement; this is a modifying clause which tells the list comprehension to keep only those key-value pairs which meet the conditional. In this case baz is a new dictionary which contains only the inner dictionaries from foo which contain bar. (Hopefully I didn't miss anything in that code example; the list comprehension documentation in the tutorials at docs.python.org and at secnetix.de are both good references if you have questions in the future.)
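To make that concrete, here is a small made-up example (the data is purely illustrative):
foo = {'a': {'bar': 1, 'qux': 2}, 'b': {'qux': 3}}
bar = 'bar'
baz = dict([(key, value) for key, value in foo.items() if bar in value])
# baz is now {'a': {'bar': 1, 'qux': 2}}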

Related

How can I pass each element of a set to a function?

I have a set with multiple tuples: set1 = {(1,1),(2,1)} for example.
Now I want to pass each tuple of the set to a method with this signature: process_tuple(self, tuple).
I am doing it with a for loop like this:
for tuple in set1:
    process_tuple(tuple)
Is there a better way to do it?
Your question is basically "how can I loop without using a loop". While it's possible to do what you're asking without an explicit for loop, the loop is by far the clearest and best way to go.
There are some alternatives, but mostly they're just changing how the loop looks, not preventing it in the first place. If you want to collect the return values from the calls to your function in a list, you can use a list comprehension to build the list at the same time as you loop:
results = [process_tuple(tuple) for tuple in set1]
You can also do set or dict comprehensions if those seem useful to your specific needs. For example, you could build a dictionary mapping from the tuples in your set to their processed results with:
results_dict = {tuple: process_tuple(tuple) for tuple in set1}
If you don't want to write out for tuple in set1 at all, you could use the builtin map function to do the looping and passing of values for you. It returns an iterator, which you'll need to fully consume to run the function over the full input. Passing the map object to list sometimes makes sense, for instance, to convert inputs into numbers:
user_numbers = list(map(int, input("Enter space-separated integers: ").split()))
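Applied to the question's set (assuming process_tuple is callable as a plain function here, just as in the loop above), that would be:
results = list(map(process_tuple, set1))   # still loops, just implicitly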
But I'd also strongly encourage you to think of your current code as perhaps the best solution. Just because you can change it to something else, doesn't mean you should.

How to test if any string is in a list?

I'm trying to make an AI. The AI knows to say 'Hello' to 'hi' and to stop the program on 'bye', and if you say something it doesn't know it will ask you to define it. For example, if you say 'Hello' it will ask what that means. You type 'hi' and from then on when you say 'Hello' it will say 'Hello' back. I store everything in a list called knowledge. It works like this:
knowledge = [[term, definition], [term, definition], [term, definition]]
I am trying to add an edit function, where you type edit foo and it will ask for you to input a string, to change the definition of foo. However, I'm stuck. First, of course, I need to test if it already has a definition for foo. But I can't do that. I need to be able to do it regardless of the definition. In other languages, there is typeOf(). However type() doesn't seem to work. Here's what I have, but it doesn't work:
if [term, type(str)] in knowledge:
Can someone help?
As noted by tehhowch in the comments, a dictionary would be more appropriate as these are "key: value" pairs.
Using a dictionary...
knowledge = {'foo': 'foo def', 'bar': 'bar def', 'baz': 'baz def'}
searchTerm = 'foo'
searchTerm in knowledge
Out[1]: True
Storing knowledge as a list of lists makes this check fail because each item in knowledge is itself a [term, definition] list whose definition is an actual string, not a type object, so [term, type(str)] can never compare equal to any of them. Instead, you could pull the terms out as a separate list and then check that one list for the term you're looking for.
knowledge = [["foo", "foo definition"], ["bar", "bar definition"], ["baz", "baz
definition"]]
terms = [item[0] for item in knowledge]
searchTerm= "foo"
searchTerm in terms
Out[1]: True
As others have mentioned, Python would typically use a dict for this kind of associative array. Your approach is analogous to a Lisp data structure called an association list. These are less efficient than the hash-table structure used by dicts, but they still have some useful properties.
For example, if you look up a key by scanning through the pairs and getting the first one, this means that you can insert another pair with the same key at the front and it will shadow the old value. You don't have to remove it. This makes insertions fast (at least with Lisp-style linked lists). You can also "undo" this operation by deleting the new one, and the old one will then be found by the scanner.
Your check if [term, type(str)] in knowledge: could be made to work as
if [term, str] in ([term, type(definition)] for term, definition in knowledge):
This uses a generator expression to convert your term, definition pairs into term, type(definition) pairs on the fly.
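For example, with a small made-up knowledge list:
knowledge = [["Hello", "hi"], ["bye", "quit"]]
term = "Hello"
if [term, str] in ([t, type(d)] for t, d in knowledge):
    print("Term already has a string definition")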
You can use a dictionary to store definitions rather than a list of lists, and Python's isinstance function will help you check whether term belongs to a specific class or not. See the example below:
knowledge = {'Hello': 'greeting', 'Hi': 'greeting', 'Bye': 'good bye'}
term = "Hello"
if isinstance(term, str):
    if term in knowledge:
        print("Definition exists")
    else:
        print("Definition doesn't exist")
else:
    print("Entered term is not a string")

Efficient use of Python list comprehensions

I have a Python list of objects that could be pretty long. At particular times, I'm interested in all of the elements in the list that have a certain attribute, say flag, that evaluates to False. To do so, I've been using a list comprehension, like this:
objList = list()
# ... populate list
[x for x in objList if not x.flag]
Which seems to work well. After forming the sublist, I have a few different operations that I might need to do:
Subscript the sublist to get the element at index ind.
Calculate the length of the sublist (i.e. the number of elements that have flag == False).
Search the sublist for the first instance of a particular object (i.e. using the list's .index() method).
I've implemented these using the naive approach of just forming the sublist and then using its methods to get at the data I want. I'm wondering if there are more efficient ways to go about these. #1 and #3 at least seem like they could be optimized, because in #1 I only need the first ind + 1 matching elements of the sublist, not necessarily the entire result set, and in #3 I only need to search through the sublist until I find a matching element.
Is there a good Pythonic way to do this? I'm guessing I might be able to use the () syntax in some way to get a generator instead of creating the entire list, but I haven't happened upon the right way yet. I obviously could write loops manually, but I'm looking for something as elegant as the comprehension-based method.
If you need to do more than one of these operations, or do any of them more than once, the overhead of the other methods will be higher; the list is the best way. It's also probably the clearest, so if memory isn't a problem, then I'd recommend just going with it.
If memory/speed is a problem, then there are alternatives - note that speed-wise, these might actually be slower, depending on the common case for your software.
For your scenarios:
from itertools import dropwhile

# value = sublist[n]
value = nth((x for x in objList if not x.flag), n)
# value = len(sublist)
value = sum(not x.flag for x in objList)
# value = sublist.index(target)
# (dropwhile yields the matching element itself; pair with enumerate if you need its position)
value = next(dropwhile(lambda x: x != target, (x for x in objList if not x.flag)))
Using itertools.dropwhile() and the nth() recipe from the itertools docs.
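For reference, the nth() recipe from the itertools documentation is roughly:
from itertools import islice

def nth(iterable, n, default=None):
    """Return the nth item of an iterable, or a default if it is too short."""
    return next(islice(iterable, n, None), default)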
I'm going to assume you might do any of these three things, and you might do them more than once.
In that case, what you want is basically to write a lazily evaluated list class. It would keep two pieces of data, a real list cache of evaluated items, and a generator of the rest. You could then do ll[10] and it would evaluate up to the 10th item, ll.index('spam') and it would evaluate until it finds 'spam', and then len(ll) and it would evaluate the rest of the list, all the while caching in the real list what it sees so nothing is done more than once.
Constructing it would look like this:
LazyList(x for x in obj_list if not x.flag)
But nothing would actually be computed until you actually start using it as above.
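A minimal sketch of such a class might look like this (LazyList is this answer's own name, not a standard library type; only non-negative indices are handled):
class LazyList:
    """Wrap an iterable, evaluating and caching items only as they are needed."""

    def __init__(self, iterable):
        self._iter = iter(iterable)
        self._cache = []

    def _fill_to(self, n):
        # Pull items from the underlying iterator until the cache holds n + 1
        # of them, or the iterator runs out.
        while len(self._cache) <= n:
            try:
                self._cache.append(next(self._iter))
            except StopIteration:
                break

    def __getitem__(self, index):
        self._fill_to(index)
        return self._cache[index]

    def __len__(self):
        # Forces evaluation of whatever is left.
        self._cache.extend(self._iter)
        return len(self._cache)

    def index(self, value):
        i = 0
        while True:
            self._fill_to(i)
            if i >= len(self._cache):
                raise ValueError("%r is not in list" % (value,))
            if self._cache[i] == value:
                return i
            i += 1
Constructing it as LazyList(x for x in obj_list if not x.flag) then behaves as described above.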
Since you commented that your objList can change, if you don't also need to index or search objList itself, then you might be better off just storing two different lists, one with .flag = True and one with .flag = False. Then you can use the second list directly instead of constructing it with a list comprehension each time.
If this works in your situation, it is likely the most efficient way to do it.
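A minimal sketch of that bookkeeping, assuming you control the code that adds objects (the names here are illustrative):
flagged, unflagged = [], []

def add(obj):
    # Route each new object to the matching list as it is created.
    (flagged if obj.flag else unflagged).append(obj)

# The three operations then work directly on the pre-built list:
# unflagged[ind], len(unflagged), unflagged.index(target)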

Generate a list of distinct empty mutables

I need to initialize a list of defaultdicts. If they were, say, strings, this would be tidy:
list_of_dds = [string] * n
…but for mutables, you get right into a mess with that approach:
>>> x=[defaultdict(list)] * 3
>>> x[0]['foo'] = 'bar'
>>> x
[defaultdict(<type 'list'>, {'foo': 'bar'}), defaultdict(<type 'list'>, {'foo': 'bar'}), defaultdict(<type 'list'>, {'foo': 'bar'})]
What I do want is an iterable of freshly-minted distinct instances of defaultdicts. I can do this:
list_of_dds = [defaultdict(list) for i in xrange(n)]
but I feel a little dirty using a list comprehension here. I think there's a better approach. Is there? Please tell me what it is.
Edit:
This is why I feel the list comprehension is suboptimal. I'm not usually the pre-optimization type, but I can't bring myself to ignore the speed difference here:
>>> timeit('x=[string.letters]*100', setup='import string')
0.9318461418151855
>>> timeit('x=[string.letters for i in xrange(100)]', setup='import string')
12.606678009033203
>>> timeit('x=[[]]*100')
0.890861988067627
>>> timeit('x=[[] for i in xrange(100)]')
9.716886043548584
Your approach using the list comprehension is correct. Why do you think it's dirty? What you want is a list of things whose length is defined by some base set. List comprehensions create lists based on some base set. What's wrong with using a list comprehension here?
Edit: The speed difference is a direct consequence of what you are trying to do. [[]]*100 is faster, because it only has to create one list. Creating a new list each time is slower, yeah, but you have to expect it to be slower if you actually want 100 different lists.
(It doesn't create a new string each time in your string examples, but it's still slower, because the list comprehension can't "know" ahead of time that all the elements are going to be the same, so it still has to re-evaluate the expression every time. I don't know the internal details of the list comprehension, but it's possible there's also some list-resizing overhead, because it doesn't necessarily know the size of the index iterable to start with, so it can't preallocate the list. In addition, note that some of the slowdown in your string example is due to looking up string.letters on every iteration. On my system, using timeit.timeit('x=[letters for i in xrange(100)]', setup='from string import letters') instead, which looks up letters only once, cuts the time by about 30%.)
The list comprehension is exactly what you should use.
The problem with the list multiplication is that the list containing a single mutable object is created and then you try to duplicate it. But by trying to duplicate the object from the object itself, the code used to create it is no longer relevant. Nothing you do with the object is going to do what you want, which is run the code used to create it N times, because the object has no idea what code was used to create it.
You could use copy.copy or copy.deepcopy to duplicate it, but that puts you right back in the same boat because then the call to copy/deepcopy just becomes the code you need to run N times.
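To illustrate that point (a sketch, not a recommendation over the comprehension):
import copy
from collections import defaultdict

template = defaultdict(list)
list_of_dds = [copy.deepcopy(template) for _ in range(3)]   # still N deepcopy calls

list_of_dds[0]['foo'].append('bar')
# Only the first defaultdict is modified; the other two stay empty.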
A list comprehension is a very good fit here. What's wrong with it?

Difference between two "contains" operations for python lists

I'm fairly new to python and have found that I need to query a list about whether it contains a certain item.
The majority of the postings I have seen on various websites (including this similar stackoverflow question) have all suggested something along the lines of
for i in list:
    if i == thingIAmLookingFor:
        return True
However, I have also found from one lone forum that
if thingIAmLookingFor in list:
    # do work
works.
I am wondering if the if thing in list method is shorthand for the for i in list method, or if it is implemented differently.
I would also like to know which, if either, is preferred.
In your simple example it is of course better to use in.
However... in the question you link to, in doesn't work (at least not directly) because the OP does not want to find an object that is equal to something, but an object whose attribute n is equal to something.
One answer does mention using in on a list comprehension, though I'm not sure why a generator expression wasn't used instead:
if 5 in (data.n for data in myList):
    print("Found it")
But this is hardly much of an improvement over the other approaches, such as this one using any:
if any(data.n == 5 for data in myList):
    print("Found it")
the "if x in thing:" format is strongly preferred, not just because it takes less code, but it also works on other data types and is (to me) easier to read.
I'm not sure how it's implemented, but I'd expect it to be quite a lot more efficient on datatypes that are stored in a more searchable form. eg. sets or dictionary keys.
The if thing in somelist is the preferred and fastest way.
Under the hood, that use of the in operator translates to somelist.__contains__(thing), whose implementation is equivalent to: any((x is thing or x == thing) for x in somelist).
Note the condition tests identity and then equality.
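Spelled out as a Python-level sketch (CPython does this in C, but the logic is the same):
def list_contains(somelist, thing):
    # Roughly what `thing in somelist` does for a plain list.
    return any((x is thing or x == thing) for x in somelist)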
for i in list:
    if i == thingIAmLookingFor:
        return True
The above is a terrible way to test whether an item exists in a collection. It returns True from the function, so if you need the test as part of some code you'd need to move this into a separate utility function, or add thingWasFound = False before the loop and set it to True in the if statement (and then break), either of which is several lines of boilerplate for what could be a simple expression.
Plus, if you just use thingIAmLookingFor in list, this might execute more efficiently by doing fewer Python-level operations (it'll need to do the same work, but possibly in C, since list is a builtin type). Even more importantly, if list is actually bound to some other collection like a set or a dictionary, thingIAmLookingFor in list will use the hash lookup mechanism such types support and be much more efficient, while using a for loop will force Python to go through every item in turn.
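If you want to convince yourself of the difference, a quick comparison (the numbers are machine-dependent, so treat this only as a sketch):
import timeit

setup = "data_list = list(range(100000)); data_set = set(data_list)"
print(timeit.timeit("99999 in data_list", setup=setup, number=1000))   # linear scan
print(timeit.timeit("99999 in data_set", setup=setup, number=1000))    # hash lookup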
Obligatory post-script: list is a terrible name for a variable that contains a list as it shadows the list builtin, which can confuse you or anyone who reads your code. You're much better off naming it something that tells you something about what it means.
