How to test if any string is in a list? - python

I'm trying to make an AI. The AI knows to say 'Hello' to 'hi' and to stop the program on 'bye', and if you say something it doesn't know it will ask you to define it. For example, if you say 'Hello' it will ask what that means. You type 'hi' and from then on when you say 'Hello' it will say 'Hello' back. I store everything in a list called knowledge. It works like this:
knowledge = [[term, definition], [term, definition], [term, definition]]
I am trying to add an edit function, where you type edit foo and it will ask for you to input a string, to change the definition of foo. However, I'm stuck. First, of course, I need to test if it already has a definition for foo. But I can't do that. I need to be able to do it regardless of the definition. In other languages, there is typeOf(). However type() doesn't seem to work. Here's what I have, but it doesn't work:
if [term, type(str)] in knowledge:
Can someone help?

As noted by tehhowch in the comments, a dictionary would be more appropriate as these are "key: value" pairs.
Using a dictionary...
knowledge = {'foo': 'foo def', 'bar': 'bar def', 'baz': 'baz def'}
searchTerm= 'foo'
searchTerm in knowledge
Out[1]: True
Storing knowledge as a list of lists fails because each item in knowledge is itself a list. Therefore, searching those lists for a string type (your term) fails. Instead, you could pull the terms out as a separate list and then check that one list for the term you're looking for.
knowledge = [["foo", "foo definition"], ["bar", "bar definition"], ["baz", "baz
definition"]]
terms = [item[0] for item in knowledge]
searchTerm= "foo"
searchTerm in terms
Out[1]: True

As others have mentioned, Python would typically use a dict for this kind of associative array. You approach is analogous to a Lisp data structure called an Association List. These are less efficient than the hashtable structures used by dicts, but they still have some useful properties.
For example, if you look up a key by scanning through the pairs and getting the first one, this means that you can insert another pair with the same key at the front and it will shadow the old value. You don't have to remove it. This makes insertions fast (at least with Lisp-style linked lists). You can also "undo" this operation by deleting the new one, and the old one will then be found by the scanner.
Your check if [term, type(str)] in knowledge: could be made to work as
if [term, str] in ([term, type(definition)] for term, definition in knowledge):
This uses a generator expression to convert your term, definition pairs into term, type(definition) pairs on the fly.

You can use dictionary to store definitions rather than list of lists and python's isinstance function will help you check if term belongs to specific class or not. see below example:
knowledge = {'Hello':'greeting','Hi':'greeting','Bye':'good bye'}
term = "Hello"
if isinstance(term, str):
if term in knowledge:
print("Defination exist")
else:
print("Defination doesn't exist")
else:
print("Entered term is not string")

Related

Fast String "Startswith" Matching for Dict like object

I currently have some code which needs to be very performant, where I am essentially doing a string dictionary key lookup:
class Foo:
def __init__(self):
self.fast_lookup = {"a": 1, "b": 2}
def bar(self, s):
return self.fast_lookup[s]
self.fast_lookup has O(1) lookup time, and there is no try/if etc code that would slow down the lookup
Is there anyway to retain this speed while doing a "startswith" lookup instead? With the code above calling bar on s="az" would result in a key error, if it were changed to a "startswith" implementation then it would return 1.
NB: I am well aware how I could do this with a regex/startswith statement, I am looking for performance specifically for startswith dict lookup
An efficient way to do this would be to use the pyahocorasick module to construct a trie with the possible keys to match, then use the longest_prefix method to determine how much of a given string matches. If no "key" matched, it returns 0, otherwise it will say how much of the string passed exists in the automata.
After installing pyahocorasick, it would look something like:
import ahocorasick
class Foo:
def __init__(self):
self.fast_lookup = ahocorasick.Automaton()
for k, v in {"a": 1, "b": 2}.items():
self.fast_lookup.add_word(k, v)
def bar(self, s):
index = self.fast_lookup.longest_prefix(s)
if not index: # No prefix match at all
raise KeyError(s)
return self.fast_lookup.get(s[:index])
If it turns out the longest prefix doesn't actually map to a value (say, 'cat' is mapped, but you're looking up 'cab', and no other entry actually maps 'ca' or 'cab'), this will die with a KeyError. Tweak as needed to achieve precise behavior desired (you might need to use longest_prefix as a starting point and try to .get() for all substrings of that length or less until you get a hit for instance).
Note that this isn't the primary purpose of Aho-Corasick (it's an efficient way to search for many fixed strings in one or more long strings in a single pass), but tries as a whole are an efficient way to deal with prefix search of this form, and Aho-Corasick is implemented in terms of tries and provides most of the useful features of tries to make it more broadly useful (as in this case).
I dont fully understand the question, but what I would do is try and think of ways to reduce the work the lookup even has to do. If you know the basic searches the startswith is going to do, you can just add those as keys to the dictionary and values that point to the same object. Your dict will get pretty big pretty fast, however it will greatly reduce the lookup i believe. So maybe for a more dynamic method you can add dict keys for the first groups of letters up to three for each entry.
Without activly storing the references for each search, your code will always need to get each dict objects value until it gets one that matches. You cannot reduce that.

Find if any element of list exists in another list in Python

The question is about a quicker, ie. more pythonic, way to test if any of the elements in an iterable exists inside another iterable.
What I am trying to do is something like:
if "foo" in terms or "bar" in terms or "baz" in terms:
pass
But apparently this way repeats the 'in terms' clause and bloats the code, especially when
we are dealing with many more elements. So I wondered whether is a better way to do this in python.
You could also consider in your special case if it is possible to use sets instead of iterables. If both (foobarbaz and terms) are sets, then you can just write
if foobarbaz & terms:
pass
This isn't particularly faster than your way, but it is smaller code by far and thus probably better for reading.
And of course, not all iterables can be converted to sets, so it depends on your usecase.
Figured out a way, posting here for easy reference. Try this:
if any(term in terms for term in ("foo", "bar", "baz")):
pass
Faster than Alfe's answer, since it only tests for, rather than calculates the, intersection
if not set(terms).isdisjoint({'foo', 'bar', 'baz'}):
pass

How do the Python masters recognise and examine contents of Python 3 data structures?

I am a noob to Python.
I constantly find myself looking at a piece of code and trying to work out what is inside a data structure such as for example a dictionary. In fact the first thing I am trying to work out is "what sort of data structure is this?" and THEN I try to work out how to see what is inside it. I look at a variable and say "is this a dict, or a list, or a multidict or something else I'm not yet familiar with?". Then, "What's inside it?". It's consuming vast amounts of time and I just don't know if I'm taking the right approach.
So, the question is, "How do the Python masters find out what sort of data structure something is, and what techniques do they use to see what is inside those data structures?"
I hope the question is not too general but I'm spending ridiculous amounts of time just trying to fix issues with recognizing data structures and viewing their contents, let alone getting useful code written.
thanks heaps.
Using type() function for the variable will tell you the data type. For example:
inventory = {'cows': 4, 'pigs': 3, 'chickens': 5, 'bears': 2}
print(type(inventory))
will print
<class 'dict'>
which means the variable inventory is a dictionary.
Other possible data types are 'str' for string, 'int' for integer, 'float' for float,'tuple' for tuple, and 'bool' for boolean values.
To see what's inside a collection, you can simply use print() function.
aList = [ 'hunger', 'anger', 'burger']
print(aList)
will output
['hunger', 'anger', 'burger']
I usually care more about how a type is /used/ than what exactly a type is.
For example, if an object is used with say:
foo["hey"] = "there"
for key, value in foo.items():
print key, '->', value
Then I assume that 'foo' is some kind of dict-like object, and unless I have reason to investigate further, that's all I care about.
(Note: I'm still in python 2.x land, the syntax is slightly different in python 3.x, however the point remains)
In stead of "what is this?", with Python it can be better to ask "what does this do?" or "how is this used?". If you see something indexed, such as a['foo'], it shouldn't matter whether it is a dictionary or some other object, but simply that it is indexable by a string.
This idea is usually referred to as Duck Typing, so searching for this might give you some useful info. A quick search turned up this article, which seems relevant for you:
http://www.voidspace.org.uk/python/articles/duck_typing.shtml
I put an import pdb;pdb.set_trace() in the relevant place, and once in the debugger I use dir(), .__dict__ and pp, or any other forms of inspection necessary.

Difference between two "contains" operations for python lists

I'm fairly new to python and have found that I need to query a list about whether it contains a certain item.
The majority of the postings I have seen on various websites (including this similar stackoverflow question) have all suggested something along the lines of
for i in list
if i == thingIAmLookingFor
return True
However, I have also found from one lone forum that
if thingIAmLookingFor in list
# do work
works.
I am wondering if the if thing in list method is shorthand for the for i in list method, or if it is implemented differently.
I would also like to which, if either, is more preferred.
In your simple example it is of course better to use in.
However... in the question you link to, in doesn't work (at least not directly) because the OP does not want to find an object that is equal to something, but an object whose attribute n is equal to something.
One answer does mention using in on a list comprehension, though I'm not sure why a generator expression wasn't used instead:
if 5 in (data.n for data in myList):
print "Found it"
But this is hardly much of an improvement over the other approaches, such as this one using any:
if any(data.n == 5 for data in myList):
print "Found it"
the "if x in thing:" format is strongly preferred, not just because it takes less code, but it also works on other data types and is (to me) easier to read.
I'm not sure how it's implemented, but I'd expect it to be quite a lot more efficient on datatypes that are stored in a more searchable form. eg. sets or dictionary keys.
The if thing in somelist is the preferred and fastest way.
Under-the-hood that use of the in-operator translates to somelist.__contains__(thing) whose implementation is equivalent to: any((x is thing or x == thing) for x in somelist).
Note the condition tests identity and then equality.
for i in list
if i == thingIAmLookingFor
return True
The above is a terrible way to test whether an item exists in a collection. It returns True from the function, so if you need the test as part of some code you'd need to move this into a separate utility function, or add thingWasFound = False before the loop and set it to True in the if statement (and then break), either of which is several lines of boilerplate for what could be a simple expression.
Plus, if you just use thingIAmLookingFor in list, this might execute more efficiently by doing fewer Python level operations (it'll need to do the same operations, but maybe in C, as list is a builtin type). But even more importantly, if list is actually bound to some other collection like a set or a dictionary thingIAmLookingFor in list will use the hash lookup mechanism such types support and be much more efficient, while using a for loop will force Python to go through every item in turn.
Obligatory post-script: list is a terrible name for a variable that contains a list as it shadows the list builtin, which can confuse you or anyone who reads your code. You're much better off naming it something that tells you something about what it means.

Using 'in' to match an attribute of Python objects in an array

I don't remember whether I was dreaming or not but I seem to recall there being a function which allowed something like,
foo in iter_attr(array of python objects, attribute name)
I've looked over the docs but this kind of thing doesn't fall under any obvious listed headers
Using a list comprehension would build a temporary list, which could eat all your memory if the sequence being searched is large. Even if the sequence is not large, building the list means iterating over the whole of the sequence before in could start its search.
The temporary list can be avoiding by using a generator expression:
foo = 12
foo in (obj.id for obj in bar)
Now, as long as obj.id == 12 near the start of bar, the search will be fast, even if bar is infinitely long.
As #Matt suggested, it's a good idea to use hasattr if any of the objects in bar can be missing an id attribute:
foo = 12
foo in (obj.id for obj in bar if hasattr(obj, 'id'))
Are you looking to get a list of objects that have a certain attribute? If so, a list comprehension is the right way to do this.
result = [obj for obj in listOfObjs if hasattr(obj, 'attributeName')]
you could always write one yourself:
def iterattr(iterator, attributename):
for obj in iterator:
yield getattr(obj, attributename)
will work with anything that iterates, be it a tuple, list, or whatever.
I love python, it makes stuff like this very simple and no more of a hassle than neccessary, and in use stuff like this is hugely elegant.
No, you were not dreaming. Python has a pretty excellent list comprehension system that lets you manipulate lists pretty elegantly, and depending on exactly what you want to accomplish, this can be done a couple of ways. In essence, what you're doing is saying "For item in list if criteria.matches", and from that you can just iterate through the results or dump the results into a new list.
I'm going to crib an example from Dive Into Python here, because it's pretty elegant and they're smarter than I am. Here they're getting a list of files in a directory, then filtering the list for all files that match a regular expression criteria.
files = os.listdir(path)
test = re.compile("test\.py$", re.IGNORECASE)
files = [f for f in files if test.search(f)]
You could do this without regular expressions, for your example, for anything where your expression at the end returns true for a match. There are other options like using the filter() function, but if I were going to choose, I'd go with this.
Eric Sipple
The function you are thinking of is probably operator.attrgettter. For example, to get a list that contains the value of each object's "id" attribute:
import operator
ids = map(operator.attrgetter("id"), bar)
If you want to check whether the list contains an object with an id == 12, then a neat and efficient (i.e. doesn't iterate the whole list unnecessarily) way to do it is:
any(obj.id == 12 for obj in bar)
If you want to use 'in' with attrgetter, while still retaining lazy iteration of the list:
import operator,itertools
foo = 12
foo in itertools.imap(operator.attrgetter("id"), bar)
What I was thinking of can be achieved using list comprehensions, but I thought that there was a function that did this in a slightly neater way.
i.e. 'bar' is a list of objects, all of which have the attribute 'id'
The mythical functional way:
foo = 12
foo in iter_attr(bar, 'id')
The list comprehension way:
foo = 12
foo in [obj.id for obj in bar]
In retrospect the list comprehension way is pretty neat anyway.
If you plan on searching anything of remotely decent size, your best bet is going to be to use a dictionary or a set. Otherwise, you basically have to iterate through every element of the iterator until you get to the one you want.
If this isn't necessarily performance sensitive code, then the list comprehension way should work. But note that it is fairly inefficient because it goes over every element of the iterator and then goes BACK over it again until it finds what it wants.
Remember, python has one of the most efficient hashing algorithms around. Use it to your advantage.
I think:
#!/bin/python
bar in dict(Foo)
Is what you are thinking of. When trying to see if a certain key exists within a dictionary in python (python's version of a hash table) there are two ways to check. First is the has_key() method attached to the dictionary and second is the example given above. It will return a boolean value.
That should answer your question.
And now a little off topic to tie this in to the list comprehension answer previously given (for a bit more clarity). List Comprehensions construct a list from a basic for loop with modifiers. As an example (to clarify slightly), a way to use the in dict language construct in a list comprehension:
Say you have a two dimensional dictionary foo and you only want the second dimension dictionaries which contain the key bar. A relatively straightforward way to do so would be to use a list comprehension with a conditional as follows:
#!/bin/python
baz = dict([(key, value) for key, value in foo if bar in value])
Note the if bar in value at the end of the statement**, this is a modifying clause which tells the list comprehension to only keep those key-value pairs which meet the conditional.** In this case baz is a new dictionary which contains only the dictionaries from foo which contain bar (Hopefully I didn't miss anything in that code example... you may have to take a look at the list comprehension documentation found in docs.python.org tutorials and at secnetix.de, both sites are good references if you have questions in the future.).

Categories

Resources