I'm looking for a string.contains or string.indexof method in Python.
I want to do:
if not somestring.contains("blah"):
    continue
Use the in operator:
if "blah" not in somestring:
    continue
If it's just a substring search you can use string.find("substring").
You do have to be a little careful with find, index, and in though, as they are substring searches. In other words, this:
s = "This be a string"
if s.find("is") == -1:
    print("No 'is' here!")
else:
    print("Found 'is' in the string.")
This prints Found 'is' in the string. because "is" occurs inside the word "This". Similarly, if "is" in s: would evaluate to True. This may or may not be what you want.
Does Python have a string contains substring method?
99% of use cases will be covered using the keyword, in, which returns True or False:
'substring' in any_string
For the use case of getting the index, use str.find (which returns -1 on failure, and has optional positional arguments):
start = 0
stop = len(any_string)
any_string.find('substring', start, stop)
or str.index (like find but raises ValueError on failure):
start = 100
end = 1000
any_string.index('substring', start, end)
Explanation
Use the in comparison operator because
the language intends its usage, and
other Python programmers will expect you to use it.
>>> 'foo' in '**foo**'
True
The opposite (complement), which the original question asked for, is not in:
>>> 'foo' not in '**foo**' # returns False
False
This is semantically the same as not 'foo' in '**foo**' but it's much more readable and explicitly provided for in the language as a readability improvement.
Avoid using __contains__
The __contains__ method implements the behavior for in. This example,
str.__contains__('**foo**', 'foo')
returns True. You could also call this function from the instance of the superstring:
'**foo**'.__contains__('foo')
But don't. Methods that start with underscores are considered semantically non-public. The only reason to use this is when implementing or extending the in and not in functionality (e.g. if subclassing str):
class NoisyString(str):
    def __contains__(self, other):
        print(f'testing if "{other}" in "{self}"')
        return super(NoisyString, self).__contains__(other)

ns = NoisyString('a string with a substring inside')
and now:
>>> 'substring' in ns
testing if "substring" in "a string with a substring inside"
True
Don't use find and index to test for "contains"
Don't use the following string methods to test for "contains":
>>> '**foo**'.index('foo')
2
>>> '**foo**'.find('foo')
2
>>> '**oo**'.find('foo')
-1
>>> '**oo**'.index('foo')
Traceback (most recent call last):
  File "<pyshell#40>", line 1, in <module>
    '**oo**'.index('foo')
ValueError: substring not found
Other languages may have no methods to directly test for substrings, and so you would have to use these types of methods, but with Python, it is much more efficient to use the in comparison operator.
Also, these are not drop-in replacements for in. You may have to handle the exception or -1 cases, and if they return 0 (because they found the substring at the beginning) the boolean interpretation is False instead of True.
If you really mean not any_string.startswith(substring) then say it.
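The boolean pitfall described above can be seen directly. This is a small illustration with made-up strings, not a recommended pattern:

```python
s = "foobar"

# find() returns 0 for a match at the very start, and 0 is falsy.
found_at_start = s.find("foo")
# find() returns -1 when there is no match, and -1 is truthy.
not_found = s.find("xyz")

wrong_hit = bool(found_at_start)   # False, although "foo" is present
wrong_miss = bool(not_found)       # True, although "xyz" is absent

# The `in` operator gives the answer you actually meant.
right_hit = "foo" in s
right_miss = "xyz" in s
```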
Performance comparisons
We can compare various ways of accomplishing the same goal.
import timeit

def in_(s, other):
    return other in s

def contains(s, other):
    return s.__contains__(other)

def find(s, other):
    return s.find(other) != -1

def index(s, other):
    try:
        s.index(other)
    except ValueError:
        return False
    else:
        return True

perf_dict = {
    'in:True': min(timeit.repeat(lambda: in_('superstring', 'str'))),
    'in:False': min(timeit.repeat(lambda: in_('superstring', 'not'))),
    '__contains__:True': min(timeit.repeat(lambda: contains('superstring', 'str'))),
    '__contains__:False': min(timeit.repeat(lambda: contains('superstring', 'not'))),
    'find:True': min(timeit.repeat(lambda: find('superstring', 'str'))),
    'find:False': min(timeit.repeat(lambda: find('superstring', 'not'))),
    'index:True': min(timeit.repeat(lambda: index('superstring', 'str'))),
    'index:False': min(timeit.repeat(lambda: index('superstring', 'not'))),
}
And now we see that using in is much faster than the others.
Less time to do an equivalent operation is better:
>>> perf_dict
{'in:True': 0.16450627865128808,
'in:False': 0.1609668098178645,
'__contains__:True': 0.24355481654697542,
'__contains__:False': 0.24382793854783813,
'find:True': 0.3067379407923454,
'find:False': 0.29860888058124146,
'index:True': 0.29647137792585454,
'index:False': 0.5502287584545229}
How can in be faster than __contains__ if in uses __contains__?
This is a fine follow-on question.
Let's disassemble functions with the methods of interest:
>>> from dis import dis
>>> dis(lambda: 'a' in 'b')
  1           0 LOAD_CONST               1 ('a')
              2 LOAD_CONST               2 ('b')
              4 COMPARE_OP               6 (in)
              6 RETURN_VALUE
>>> dis(lambda: 'b'.__contains__('a'))
  1           0 LOAD_CONST               1 ('b')
              2 LOAD_METHOD              0 (__contains__)
              4 LOAD_CONST               2 ('a')
              6 CALL_METHOD              1
              8 RETURN_VALUE
so we see that the .__contains__ method has to be separately looked up and then called from the Python virtual machine - this should adequately explain the difference.
if needle in haystack: is the normal use, as @Michael says. It relies on the in operator, which is more readable and faster than a method call.
If you truly need a method instead of an operator (e.g. to do some weird key= for a very peculiar sort...?), that would be 'haystack'.__contains__. But since your example is for use in an if, I guess you don't really mean what you say;-). It's not good form (nor readable, nor efficient) to use special methods directly -- they're meant to be used, instead, through the operators and builtins that delegate to them.
in Python strings and lists
Here are a few useful examples that speak for themselves concerning the in operator:
>>> "foo" in "foobar"
True
>>> "foo" in "Foobar"
False
>>> "foo" in "Foobar".lower()
True
>>> "foo".capitalize() in "Foobar"
True
>>> "foo" in ["bar", "foo", "foobar"]
True
>>> "foo" in ["fo", "o", "foobar"]
False
>>> ["foo" in a for a in ["fo", "o", "foobar"]]
[False, False, True]
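Caveat: lists are iterables, and the in operator works on iterables in general, not just strings. On a list it tests whether some element equals the value, not whether the value is a substring of an element.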
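The same caveat applies to other containers. As a brief sketch (the ages dictionary is made-up sample data), in on a dict tests the keys, and there is no substring matching against the keys:

```python
ages = {"bob": 31, "john": 27}   # hypothetical sample data

key_hit = "bob" in ages              # True: `in` on a dict checks keys
value_as_key = 31 in ages            # False: values are not searched
value_hit = 31 in ages.values()      # True: search the values explicitly
partial_key = "bo" in ages           # False: keys are compared whole, not as substrings

tags = {"foo", "foobar"}
set_hit = "foo" in tags              # True: element equality, as with lists
set_partial = "oba" in tags          # False: "oba" is only a substring of an element
```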
If you want to compare strings in a more fuzzy way to measure how "alike" they are, consider using the Levenshtein package
Here's an answer that shows how it works.
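If installing a third-party package is not an option, the standard library's difflib gives a rough similarity score. Note that this is a different measure from Levenshtein distance (it is based on matching blocks, not edit operations), so treat it as a stand-in sketch; the function name similarity is hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Return a score in [0, 1]; 1.0 means the strings are identical (hypothetical helper)."""
    # The first argument is the "junk" predicate; None disables junk filtering.
    return SequenceMatcher(None, a, b).ratio()

same = similarity("apple", "apple")
close = similarity("apple", "appel")
unrelated = similarity("abc", "xyz")
```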
If you are happy with "blah" in somestring but want it to be a function/method call, you can probably do this
import operator
if not operator.contains(somestring, "blah"):
    continue
All operators in Python can be more or less found in the operator module including in.
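A function form is mostly useful when you need to pass the test around as a callable. A small sketch, assuming you want to filter a list of candidate substrings against one fixed string (the sample data is made up):

```python
from functools import partial
from operator import contains

haystack = "bob and john"
needles = ["bob", "john", "mike"]

# operator.contains(a, b) is equivalent to `b in a`, so fixing the first
# argument yields a one-argument predicate "is this needle in haystack?".
in_haystack = partial(contains, haystack)

present = list(filter(in_haystack, needles))
```

A list comprehension with in would read just as well here; partial only pays off when an API insists on a callable.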
So apparently there is nothing similar for vector-wise comparison. An obvious Python way to do so would be:
names = ['bob', 'john', 'mike']
any(st in 'bob and john' for st in names)
# True
any(st in 'mary and jane' for st in names)
# False
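If you also need to know which of the names matched, and not just whether any did, a list comprehension over the same test works. A minimal sketch; names_in is a hypothetical helper name:

```python
names = ['bob', 'john', 'mike']

def names_in(text, candidates):
    """Return the candidates that occur as substrings of `text`, in candidate order."""
    return [name for name in candidates if name in text]

matched = names_in('bob and john', names)
unmatched = names_in('mary and jane', names)
```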
You can use str.count().
It returns the number of times a substring appears in a string, as an integer.
For example:
string.count("bah")    # returns 0
string.count("Hello")  # returns 1
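A self-contained version of the same idea, with a made-up sample string. Two details are worth knowing: the search is case-sensitive, and overlapping occurrences are not counted.

```python
greeting = "Hello, hello, Hello"   # hypothetical sample string

exact_case = greeting.count("Hello")   # counts only the capitalised form
missing = greeting.count("bah")        # 0 means "not contained"
non_overlapping = "aaaa".count("aa")   # matches at 0 and 2, not at 1

contains_hello = greeting.count("Hello") > 0
```

For a plain yes/no test, in is still preferable, since count has to scan the whole string even after the first match.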
Here is your answer:
if "insert_char_or_string_here" in "insert_string_to_search_here":
    pass  # DOSTUFF
For checking if it is false:
if not "insert_char_or_string_here" in "insert_string_to_search_here":
    pass  # DOSTUFF
OR:
if "insert_char_or_string_here" not in "insert_string_to_search_here":
    pass  # DOSTUFF
You can use regular expressions to get the occurrences:
>>> import re
>>> print(re.findall(r'( |t)', to_search_in)) # searches for t or space
['t', ' ', 't', ' ', ' ']
Related
In Python, I can see that the keyword in can be effectively used to check for sub-string like: str1 in str2
Is there any way to write the reversed version of in, for example a (hypothetical) keyword contain to indicate that str2 contains str1? This is for the flow of writing code like:
if str contain "foo":
    return 1
elif str contain "bar":
    return 2
elif str contain "boo":
    return 3
elif str contain "far":
    return 4
else:
    return 5
Personally, I find the hypothetical code above more readable than using in:
if "foo" in str:
    return 1
elif "bar" in str:
    return 2
elif "boo" in str:
    return 3
elif "far" in str:
    return 4
else:
    return 5
In fact, there is a __contains__ method that works much like you described:
>>> 'the quick brown fox'.__contains__('quick')
True
There is also a contains function without the underscores:
>>> from operator import contains
>>> contains('the quick brown fox', 'quick')
True
You could use the str.find method, which returns -1 on failure:
if str.find('foo') != -1:
    pass  # do something
Example:
"asdfoo".find('foo')
# 3
"asdfo".find('foo')
# -1
The proposed contain keyword doesn't add any capability to the language. Perhaps you find it more readable, but the in operator exists in other languages. Also, note that "contain" is a verb that does not read so well when it's not properly conjugated.
Consider applying your reversal principle to other contexts, such as assignment: to be more readable, we should compute the value on the left, and then give the destination on the right.
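If the real concern is the readability of the long if/elif chain in the question, one alternative is to move the substring-to-result pairs into a table and loop over it. This is only a sketch; the function name classify and the RULES table are assumptions for illustration.

```python
# Order matters: the first matching substring wins, as in the if/elif chain.
RULES = [("foo", 1), ("bar", 2), ("boo", 3), ("far", 4)]

def classify(text, default=5):
    """Return the code of the first rule whose substring occurs in `text`."""
    for needle, code in RULES:
        if needle in text:
            return code
    return default
```

For example, classify("a bar") returns 2, and classify("barfoo") returns 1 because "foo" is checked first.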
I read this question
python: how to identify if a variable is an array or a scalar
but when using the following code I get a false on an np.array as can be demonstrated below.
import collections
import numpy as np
isinstance(np.arange(10), collections.Sequence)
# returns False
I find it a bit annoying that I can't do len(1) and simply get 1.
The only work around I can think of is a try except statement such as the following:
a = 1
try:
    print(len(a))
except TypeError:
    print(1)
Is there a more Pythonic way to do this?
collections.Sequence only applies to sequence objects, which are a very specific type of iterable object. Incidentally, a numpy.ndarray (which is returned by numpy.arange) is not a sequence.
You need to test for either collections.Iterable, which represents any iterable object:
>>> isinstance([1, 2, 3], collections.Iterable)
True
>>> isinstance(np.arange(10), collections.Iterable)
True
>>> isinstance(1, collections.Iterable)
False
>>>
or collections.Sized, which represents any object that works with len:
>>> isinstance([1, 2, 3], collections.Sized)
True
>>> isinstance(np.arange(10), collections.Sized)
True
>>> isinstance(1, collections.Sized)
False
>>>
You can then use a conditional expression or similar to do what you want:
print(len(a) if isinstance(a, collections.Iterable) else 1)
print(len(a) if isinstance(a, collections.Sized) else 1)
For a complete list of the available abstract base classes in the collections module, see Collections Abstract Base Classes in the Python docs.
I'll just throw in another potential option:
length = getattr(obj, '__len__', lambda:1)()
So get either the __len__ method from the object, or a function that always returns 1, then call it to get your result.
I wouldn't say it's Pythonic, but avoids an import and exception handling. However, I'd still go with comparing if it's a collections.Sized and a conditional statement and put it in a helper function called len_or_1 or something.
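The len_or_1 helper mentioned above could look like the sketch below. One detail to be aware of: since Python 3.3 the abstract base classes live in collections.abc, and the old aliases in collections were removed in Python 3.10, so the import is written accordingly.

```python
from collections.abc import Sized

def len_or_1(obj):
    """Return len(obj) for anything that supports len(), otherwise 1."""
    return len(obj) if isinstance(obj, Sized) else 1

scalar_len = len_or_1(1)
list_len = len_or_1([1, 2, 3])
string_len = len_or_1("ab")
```

Note that strings are Sized too, so a string counts by its characters here; whether that is what you want depends on your use case.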
Although this isn't pythonic as it uses numpy here is another neat way to make this work:
import numpy as np
a = 1
aSh = np.shape(a)
if len(aSh) == 0:
    print(1)
else:
    print(max(aSh))
which gives a behaviour that should work with scalars, lists and matrices.
Found this little oddity while playing around.
>>> 'Hello' == ('Hello' or 'World')
True
>>> 'Hello' == ('World' or 'Hello')
False
>>> 'Hello' == ('Hello' and 'World')
False
>>> 'Hello' == ('World' and 'Hello')
True
Is there some trick to this logic that I'm not getting? Why is the order of the strings the determining factor of these queries? Should I not be using parentheses at all? Why does changing to "and" flip the outputs?
Thanks a buncharooni.
In Python, all objects may be considered "truthy" or "falsy". Python uses this fact to create a sneaky shortcut when evaluating boolean logic. If it encounters a value that would allow the logic to "short-circuit", such as a True at the beginning of an or, or a False at the start of an and, it simply returns the definitive value. This works because that value itself evaluates appropriately to be truthy or falsy, and therefore whatever boolean context it's being used in continues to function as expected. In fact, such operations always just return the first value they encounter that allows them to fully evaluate the expression (even if it's the last value).
# "short-circuit" behavior
>>> 2 or 0
2
>>> 0 and 2
0
# "normal" (fully-evaluated) behavior
>>> 'cat' and 'dog'
'dog'
>>> 0 or 2
2
x or y returns the first operand if it's truthy, otherwise it returns the second operand.
x and y returns the first operand if it's falsy, otherwise it returns the second operand.
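Those two rules can be written out as ordinary functions, which makes the results in the question easy to check. This is an illustrative sketch only; or_like and and_like are hypothetical names, and unlike the real operators they evaluate both arguments up front instead of short-circuiting.

```python
def or_like(x, y):
    # Mirrors `x or y`: the first operand if it is truthy, else the second.
    return x if x else y

def and_like(x, y):
    # Mirrors `x and y`: the first operand if it is falsy, else the second.
    return y if x else x

# The four comparisons from the question, step by step.
r1 = 'Hello' == or_like('Hello', 'World')    # compares 'Hello' with 'Hello'
r2 = 'Hello' == or_like('World', 'Hello')    # compares 'Hello' with 'World'
r3 = 'Hello' == and_like('Hello', 'World')   # compares 'Hello' with 'World'
r4 = 'Hello' == and_like('World', 'Hello')   # compares 'Hello' with 'Hello'
```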
For what it looks like you're trying to accomplish you may prefer this:
'Hello' in ['Hello', 'World']
My Python IDE, PyCharm, by default suggests changing the following line of Python:
if variable != 0:
to
if variable is not 0:
Why does it suggest this? Does it matter at all for the execution (i.e. does this behave different for any edge cases)?
It's a bug. You should not test integers by identity. Although it may work ok for small integers, it's just an implementation detail.
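To answer the edge-case part of the question concretely: the two tests disagree for values that equal 0 without being the int object 0. The sketch below binds 0 to a name, because recent CPython versions emit a SyntaxWarning when is is used with a literal. The result for a plain int 0 is deliberately not asserted, since it depends on interpreter caching.

```python
zero = 0  # bound to a name to avoid the SyntaxWarning for `is` with a literal

candidates = [0.0, False, 1]

# != compares by value: 0.0 and False both compare equal to 0.
by_value = [x != zero for x in candidates]

# `is not` compares by identity: a float or a bool is never the int object 0.
by_identity = [x is not zero for x in candidates]
```

So code that must treat 0.0 or False like 0 changes behavior if != 0 is replaced with is not 0.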
If you were checking variable is False, that would be ok. Perhaps the IDE is tripped up by the semantics.
The != operator checks for inequality of value. The is operator checks for identity. CPython caches small integers such as 0, so for an int variable the two expressions happen to give the same result, but that is an implementation detail, and they differ for values such as 0.0 or False. The is not 0 form reads more like English, which is probably why the IDE is suggesting it (although I wouldn't accept the recommendation).
I did try some analysis. I dumped the bytecode for both expressions and the only difference is the comparison opcode: one has COMPARE_OP 3 (!=) and the other has COMPARE_OP 9 (is not). Otherwise they're the same. I then tried some performance runs and found that the time taken is negligibly higher for !=.
is not should be preferred if you are matching on an object's identity, not equality.
See these examples:
>>> a = [1, 2, 3]
>>> b = [1, 2, 3]  # both are equal
>>> if a is not b:
...     print('they are equal but they are not the same object')
...
they are equal but they are not the same object
>>> if a != b:
...     print('hello')  # prints nothing because both have the same value
...
>>> a = 100000
>>> b = 100000
>>> a is b
False
>>> if a is not b:
...     print('they are equal but they are not the same object')
...
they are equal but they are not the same object
>>> if a != b:
...     print('something')  # prints nothing as != compares their value, not identity
...
But if a and b hold small integers or short strings, a is not b will not behave this way, because CPython caches such objects and both names point to the same object.
>>> a=2
>>> b=2
>>> a is b
True
>>> a='wow'
>>> b='wow'
>>> a is b
True
>>> a=9999999
>>> b=9999999
>>> a is b
False
The operator is not checks for object identity and the operator != checks for object equality. I do not think you should do this in your case, but maybe your IDE suggests it for the general case?
Is it better to use the "is" operator or the "==" operator to compare two numbers in Python?
Examples:
>>> a = 1
>>> a is 1
True
>>> a == 1
True
>>> a is 0
False
>>> a == 0
False
Use ==.
Sometimes, on some python implementations, by coincidence, integers from -5 to 256 will work with is (in CPython implementations for instance). But don't rely on this or use it in real programs.
Others have answered your question, but I'll go into a little bit more detail:
Python's is compares identity - it asks the question "is this one thing actually the same object as this other thing" (similar to == in Java). So, there are some times when using is makes sense - the most common one being checking for None. Eg, foo is None. But, in general, it isn't what you want.
==, on the other hand, asks the question "is this one thing logically equivalent to this other thing". For example:
>>> [1, 2, 3] == [1, 2, 3]
True
>>> [1, 2, 3] is [1, 2, 3]
False
And this is true because classes can define the method they use to test for equality:
>>> class AlwaysEqual(object):
...     def __eq__(self, other):
...         return True
...
>>> always_equal = AlwaysEqual()
>>> always_equal == 42
True
>>> always_equal == None
True
But they cannot define the method used for testing identity (ie, they can't override is).
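Because is cannot be overridden, it is the reliable way to test for the None singleton, while == stays the right choice for numeric comparisons. A minimal sketch combining both (the function describe is hypothetical):

```python
def describe(value=None):
    """Classify a possibly-missing number."""
    # Identity test: None is a singleton, and a custom __eq__ cannot fool `is`.
    if value is None:
        return "missing"
    # Value test: 0, 0.0 and False all compare equal to 0.
    return "zero" if value == 0 else "nonzero"

class AlwaysEqual(object):
    def __eq__(self, other):
        return True

# `== None` is fooled by the custom __eq__, `is None` is not.
fooled = AlwaysEqual() == None
not_fooled = AlwaysEqual() is None
```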
>>> a = 255556
>>> a == 255556
True
>>> a is 255556
False
I think that should answer it ;-)
The reason is that some often-used objects, such as the booleans True and False, all 1-letter strings, and small integers, are allocated once by the interpreter, and each variable containing such an object refers to that single instance. Other numbers and larger strings are allocated on demand. The 255556, for instance, is allocated three times, and a different object is created each time. Therefore, according to is, they are not the same.
That will only work for small numbers, and it is implementation-dependent. CPython uses the same object instance for small integers (-5 through 256), but not for bigger numbers.
>>> a = 2104214124
>>> b = 2104214124
>>> a == b
True
>>> a is b
False
So you should always use == to compare numbers.
== is what you want, "is" just happens to work on your examples.
>>> 2 == 2.0
True
>>> 2 is 2.0
False
Use ==