What is the most pythonic way to use len on a scalar? - python

I read this question
python: how to identify if a variable is an array or a scalar
but when using the following code I get False on an np.array, as demonstrated below.
import collections
import numpy as np
isinstance(np.arange(10), collections.Sequence)
# returns False
I find it a bit annoying that I can't do len(1) and simply get 1.
The only work around I can think of is a try except statement such as the following:
a = 1
try:
    print len(a)
except TypeError:
    print 1
Is there a more Pythonic way to do this?

collections.Sequence only applies to sequence objects, which are a very specific type of iterable object. Incidentally, a numpy.ndarray (which is returned by numpy.arange) is not a sequence.
You need to test for either collections.Iterable, which represents any iterable object:
>>> isinstance([1, 2, 3], collections.Iterable)
True
>>> isinstance(np.arange(10), collections.Iterable)
True
>>> isinstance(1, collections.Iterable)
False
>>>
or collections.Sized, which represents any object that works with len:
>>> isinstance([1, 2, 3], collections.Sized)
True
>>> isinstance(np.arange(10), collections.Sized)
True
>>> isinstance(1, collections.Sized)
False
>>>
You can then use a conditional expression or similar to do what you want:
print len(a) if isinstance(a, collections.Iterable) else 1
print len(a) if isinstance(a, collections.Sized) else 1
For a complete list of the available abstract base classes in the collections module, see Collections Abstract Base Classes in the Python docs.

I'll just throw in another potential option:
length = getattr(obj, '__len__', lambda:1)()
So get either the __len__ method from the object, or a function that always returns 1, then call it to get your result.
I wouldn't say it's Pythonic, but avoids an import and exception handling. However, I'd still go with comparing if it's a collections.Sized and a conditional statement and put it in a helper function called len_or_1 or something.
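The helper suggested above might look like the following minimal sketch. It assumes Python 3, where the abstract base classes live in collections.abc rather than collections; the name len_or_1 comes from the answer, and the variable names are only illustrative.

```python
from collections.abc import Sized


def len_or_1(obj):
    """Return len(obj) for sized objects, and 1 for anything else (e.g. scalars)."""
    return len(obj) if isinstance(obj, Sized) else 1


scalar_length = len_or_1(1)          # scalars have no __len__
list_length = len_or_1([1, 2, 3])    # lists are Sized
empty_length = len_or_1([])          # an empty container still reports 0, not 1
```

Note that strings are Sized too, so len_or_1("ab") returns 2 rather than 1; whether that is desirable depends on what counts as a scalar in your code.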

Although this isn't Pythonic, since it relies on numpy, here is another neat way to make this work:
import numpy as np
a = 1
aSh = np.shape(a)
if len(aSh) == 0:
    print 1
else:
    print max(aSh)
which gives a behaviour that should work with scalars, lists and matrices.

Related

How to get the index of an integer from a list if the list contains a boolean?

I am just starting with Python.
How to get index of integer 1 from a list if the list contains a boolean True object before the 1?
>>> lst = [True, False, 1, 3]
>>> lst.index(1)
0
>>> lst.index(True)
0
>>> lst.index(0)
1
I think Python considers 0 as False and 1 as True in the argument of the index method. How can I get the index of integer 1 (i.e. 2)?
Also what is the reasoning or logic behind treating boolean object this way in list?
As from the solutions, I can see it is not so straightforward.
The documentation says that
Lists are mutable sequences, typically used to store collections of
homogeneous items (where the precise degree of similarity will vary by
application).
You shouldn't store heterogeneous data in lists.
The implementation of list.index only performs the comparison using Py_EQ (== operator). In your case that comparison returns truthy value because True and False have values of the integers 1 and 0, respectively (the bool class is a subclass of int after all).
However, you could use generator expression and the built-in next function (to get the first value from the generator) like this:
In [4]: next(i for i, x in enumerate(lst) if not isinstance(x, bool) and x == 1)
Out[4]: 2
Here we check if x is an instance of bool before comparing x to 1.
Keep in mind that next can raise StopIteration, in that case it may be desired to (re-)raise ValueError (to mimic the behavior of list.index).
Wrapping this all in a function:
def index_same_type(it, val):
    gen = (i for i, x in enumerate(it) if type(x) is type(val) and x == val)
    try:
        return next(gen)
    except StopIteration:
        raise ValueError('{!r} is not in iterable'.format(val)) from None
Some examples:
In [34]: index_same_type(lst, 1)
Out[34]: 2
In [35]: index_same_type(lst, True)
Out[35]: 0
In [37]: index_same_type(lst, 42)
ValueError: 42 is not in iterable
Booleans are integers in Python, and this is why you can use them just like any integer:
>>> 1 + True
2
>>> [1][False]
1
[this doesn't mean you should :)]
This is due to the fact that bool is a subclass of int, and almost always a boolean will behave just like 0 or 1 (except when it is cast to string - you will get "False" and "True" instead).
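A quick way to check these claims, as a small sketch that uses only built-ins (the variable names are illustrative):

```python
# bool is a subclass of int, so True and False compare equal to 1 and 0.
is_subclass = issubclass(bool, int)
equal_to_one = (True == 1)
# Booleans add up like integers.
sum_of_flags = sum([True, False, True])
# String conversion is where the difference becomes visible.
as_text = str(True)
```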
Here is one more idea for how you can achieve what you want (however, try to rethink your logic, taking into account the information above):
>>> class force_int(int):
...     def __eq__(self, other):
...         return int(self) == other and not isinstance(other, bool)
...
>>> force_int(1) == True
False
>>> lst.index(force_int(1))
2
This code redefines int's method, which is used to compare elements in the index method, to ignore booleans.
Here is a very simple naive one-liner solution using map and zip:
>>> zip(map(type, lst), lst).index((int, 1))
2
Here we map the type of each element and create a new list by zipping the types with the elements and ask for the index of (type, value).
And here is a generic iterative solution using the same technique:
>>> from itertools import imap, izip
>>> def index(xs, x):
...     it = (i for i, (t, e) in enumerate(izip(imap(type, xs), xs)) if (t, e) == x)
...     try:
...         return next(it)
...     except StopIteration:
...         raise ValueError(x)
...
>>> index(lst, (int, 1))
2
Here we basically do the same thing, but iteratively, so that it doesn't cost much in terms of memory. We build an iterator from the same expression as above, using imap and izip instead, and write a custom index function that returns the next value from the iterator or raises a ValueError if there is no match.
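itertools.imap and itertools.izip exist only in Python 2. Assuming Python 3, where the built-in map and zip are already lazy, the same idea can be sketched as follows (typed_index is a hypothetical name, not part of the original answer):

```python
def typed_index(xs, typed_value):
    """Return the index of the first element whose (type, value) pair matches."""
    # zip and map are lazy in Python 3, so no intermediate list is built.
    for i, pair in enumerate(zip(map(type, xs), xs)):
        if pair == typed_value:
            return i
    raise ValueError(typed_value)


lst = [True, False, 1, 3]
index_of_int_one = typed_index(lst, (int, 1))
index_of_true = typed_index(lst, (bool, True))
```

The pair (bool, True) does not match (int, 1) because the type objects bool and int compare unequal, even though True == 1.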
Try this.
for i, j in enumerate([True, False, 1, 3]):
    if not isinstance(j, bool) and j == 1:
        print i
Output:
2

Implementing .all() for a list of booleans?

Numpy has a great method .all() for arrays of booleans, that tests if all the values are true. I'd like to do the same without adding numpy to my project. Is there something similar in the standard library? Otherwise, how would you implement it?
I can of course think of the obvious way to do it:
def all_true(list_of_booleans):
    for v in list_of_booleans:
        if not v:
            return False
    return True
Is there a more elegant way, perhaps a one-liner?
There is; it is called all(), surprisingly. It is implemented exactly as you describe, albeit in C. Quoting the docs:
Return True if all elements of the iterable are true (or if the
iterable is empty). Equivalent to:
def all(iterable):
    for element in iterable:
        if not element:
            return False
    return True
New in version 2.5.
This is not limited to just booleans. Note that this takes an iterable; passing in a generator expression means only enough of the generator expression is going to be evaluated to test the hypothesis:
>>> from itertools import count
>>> c = count()
>>> all(i < 10 for i in c)
False
>>> next(c)
11
There is an equivalent any() function as well.
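any() short-circuits in the same way, stopping at the first true element. A small sketch mirroring the count() example above:

```python
from itertools import count

c = count()
# any() stops at the first true element: 0, 1, 2, 3 fail the test, 4 passes.
found = any(i > 3 for i in c)
# The counter was advanced only as far as needed, so the next value is 5.
next_value = next(c)
```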
There is a similar function, called all().

Python Conceptual Query [duplicate]

My Google-fu has failed me.
In Python, are the following two tests for equality equivalent?
n = 5
# Test one.
if n == 5:
    print 'Yay!'
# Test two.
if n is 5:
    print 'Yay!'
Does this hold true for objects where you would be comparing instances (a list say)?
Okay, so this kind of answers my question:
L = []
L.append(1)
if L == [1]:
    print 'Yay!'
# Holds true, but...
if L is [1]:
    print 'Yay!'
# Doesn't.
So == tests value where is tests to see if they are the same object?
is will return True if two variables point to the same object (in memory), == if the objects referred to by the variables are equal.
>>> a = [1, 2, 3]
>>> b = a
>>> b is a
True
>>> b == a
True
# Make a new copy of list `a` via the slice operator,
# and assign it to variable `b`
>>> b = a[:]
>>> b is a
False
>>> b == a
True
In your case, the second test only works because Python caches small integer objects, which is an implementation detail. For larger integers, this does not work:
>>> 1000 is 10**3
False
>>> 1000 == 10**3
True
The same holds true for string literals:
>>> "a" is "a"
True
>>> "aa" is "a" * 2
True
>>> x = "a"
>>> "aa" is x * 2
False
>>> "aa" is intern(x*2)
True
Please see this question as well.
There is a simple rule of thumb to tell you when to use == or is.
== is for value equality. Use it when you would like to know if two objects have the same value.
is is for reference equality. Use it when you would like to know if two references refer to the same object.
In general, when you are comparing something to a simple type, you are usually checking for value equality, so you should use ==. For example, the intention of your example is probably to check whether x has a value equal to 2 (==), not whether x is literally referring to the same object as 2.
Something else to note: because of the way the CPython reference implementation works, you'll get unexpected and inconsistent results if you mistakenly use is to compare for reference equality on integers:
>>> a = 500
>>> b = 500
>>> a == b
True
>>> a is b
False
That's pretty much what we expected: a and b have the same value, but are distinct entities. But what about this?
>>> c = 200
>>> d = 200
>>> c == d
True
>>> c is d
True
This is inconsistent with the earlier result. What's going on here? It turns out the reference implementation of Python caches integer objects in the range -5..256 as singleton instances for performance reasons. Here's an example demonstrating this:
>>> for i in range(250, 260): a = i; print "%i: %s" % (i, a is int(str(i)));
...
250: True
251: True
252: True
253: True
254: True
255: True
256: True
257: False
258: False
259: False
This is another obvious reason not to use is: the behavior is left up to implementations when you're erroneously using it for value equality.
Is there a difference between == and is in Python?
Yes, they have a very important difference.
==: check for equality - the semantics are that equivalent objects (that aren't necessarily the same object) will test as equal. As the documentation says:
The operators <, >, ==, >=, <=, and != compare the values of two objects.
is: check for identity - the semantics are that the object (as held in memory) is the object. Again, the documentation says:
The operators is and is not test for object identity: x is y is true
if and only if x and y are the same object. Object identity is
determined using the id() function. x is not y yields the inverse
truth value.
Thus, the check for identity is the same as checking for the equality of the IDs of the objects. That is,
a is b
is the same as:
id(a) == id(b)
where id is the builtin function that returns an integer that "is guaranteed to be unique among simultaneously existing objects" (see help(id)) and where a and b are any arbitrary objects.
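As a small illustration of that equivalence (a sketch; the names a, b and c are arbitrary):

```python
a = [1, 2, 3]
b = a          # second name for the same list object
c = list(a)    # new list with equal contents

same_object = a is b
same_id = id(a) == id(b)
copy_is_same = a is c
copy_same_id = id(a) == id(c)
copy_is_equal = a == c
```

The equivalence only holds while both objects are alive at the same time: an id may be reused once an object has been garbage collected.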
Other Usage Directions
You should use these comparisons for their semantics. Use is to check identity and == to check equality.
So in general, we use is to check for identity. This is usually useful when we are checking for an object that should only exist once in memory, referred to as a "singleton" in the documentation.
Use cases for is include:
None
enum values (when using Enums from the enum module)
usually modules
usually class objects resulting from class definitions
usually function objects resulting from function definitions
anything else that should only exist once in memory (all singletons, generally)
a specific object that you want by identity
Usual use cases for == include:
numbers, including integers
strings
lists
sets
dictionaries
custom mutable objects
other builtin immutable objects, in most cases
The general use case, again, for ==, is the object you want may not be the same object, instead it may be an equivalent one
PEP 8 directions
PEP 8, the official Python style guide for the standard library also mentions two use-cases for is:
Comparisons to singletons like None should always be done with is or
is not, never the equality operators.
Also, beware of writing if x when you really mean if x is not None --
e.g. when testing whether a variable or argument that defaults to None
was set to some other value. The other value might have a type (such
as a container) that could be false in a boolean context!
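The second PEP 8 point can be illustrated with a short sketch. The two functions below are hypothetical and differ only in how they test the argument:

```python
def describe(items=None):
    """Distinguish 'argument omitted' from 'empty container passed'."""
    if items is not None:
        return "got {} item(s)".format(len(items))
    return "nothing passed"


def describe_truthy(items=None):
    """Same idea, but the truthiness test treats an empty list like None."""
    if items:
        return "got {} item(s)".format(len(items))
    return "nothing passed"


explicit_empty = describe([])
truthy_empty = describe_truthy([])
```

With an empty list, describe reports "got 0 item(s)" while describe_truthy reports "nothing passed", which is the confusion PEP 8 warns about.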
Inferring equality from identity
If is is true, equality can usually be inferred - logically, if an object is itself, then it should test as equivalent to itself.
In most cases this logic is true, but it relies on the implementation of the __eq__ special method. As the docs say,
The default behavior for equality comparison (== and !=) is based on
the identity of the objects. Hence, equality comparison of instances
with the same identity results in equality, and equality comparison of
instances with different identities results in inequality. A
motivation for this default behavior is the desire that all objects
should be reflexive (i.e. x is y implies x == y).
and in the interests of consistency, recommends:
Equality comparison should be reflexive. In other words, identical
objects should compare equal:
x is y implies x == y
We can see that this is the default behavior for custom objects:
>>> class Object(object): pass
>>> obj = Object()
>>> obj2 = Object()
>>> obj == obj, obj is obj
(True, True)
>>> obj == obj2, obj is obj2
(False, False)
The contrapositive is also usually true - if some things test as not equal, you can usually infer that they are not the same object.
Since tests for equality can be customized, this inference does not always hold true for all types.
An exception
A notable exception is nan - it always tests as not equal to itself:
>>> nan = float('nan')
>>> nan
nan
>>> nan is nan
True
>>> nan == nan # !!!!!
False
Checking for identity can be a much quicker check than checking for equality (which might require recursively checking members).
But it cannot be substituted for equality where you may find more than one object as equivalent.
Note that comparing equality of lists and tuples will assume that identity of objects are equal (because this is a fast check). This can create contradictions if the logic is inconsistent - as it is for nan:
>>> [nan] == [nan]
True
>>> (nan,) == (nan,)
True
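The same identity shortcut applies to containment tests, so a value-based check is safer when looking for nan. A minimal sketch using math.isnan from the standard library:

```python
import math

nan = float('nan')
values = [1.0, nan, 2.0]

# Containment checks identity before equality, so the very same nan object is found...
found_same_object = nan in values
# ...even though nan never compares equal to itself.
equal_to_itself = (nan == nan)
# math.isnan tests the value and does not depend on object identity.
has_any_nan = any(math.isnan(v) for v in values)
```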
A Cautionary Tale:
The question is attempting to use is to compare integers. You shouldn't assume that an instance of an integer is the same instance as one obtained by another reference. This story explains why.
A commenter had code that relied on the fact that small integers (-5 to 256 inclusive) are singletons in Python, instead of checking for equality.
Wow, this can lead to some insidious bugs. I had some code that checked if a is b, which worked as I wanted because a and b are typically small numbers. The bug only happened today, after six months in production, because a and b were finally large enough to not be cached. – gwg
It worked in development. It may have passed some unittests.
And it worked in production - until the code checked for an integer larger than 256, at which point it failed in production.
This is a production failure that could have been caught in code review or possibly with a style-checker.
Let me emphasize: do not use is to compare integers.
== determines if the values are equal, while is determines if they are the exact same object.
What's the difference between is and ==?
== and is are different comparisons! As others already said:
== compares the values of the objects.
is compares the references of the objects.
In Python names refer to objects, for example in this case value1 and value2 refer to an int instance storing the value 1000:
value1 = 1000
value2 = value1
Because value2 refers to the same object is and == will give True:
>>> value1 == value2
True
>>> value1 is value2
True
In the following example the names value1 and value2 refer to different int instances, even if both store the same integer:
>>> value1 = 1000
>>> value2 = 1000
Because the same value (integer) is stored == will be True, that's why it's often called "value comparison". However is will return False because these are different objects:
>>> value1 == value2
True
>>> value1 is value2
False
When to use which?
Generally is is a much faster comparison. That's why CPython caches (or maybe reuses would be the better term) certain objects like small integers, some strings, etc. But this should be treated as implementation detail that could (even if unlikely) change at any point without warning.
You should only use is if you:
want to check if two objects are really the same object (not just the same "value"). One example can be if you use a singleton object as constant.
want to compare a value to a Python constant. The constants in Python are:
None
True [1]
False [1]
NotImplemented
Ellipsis
__debug__
classes (for example int is int or int is float)
there could be additional constants in built-in modules or 3rd party modules (for example np.ma.masked from the NumPy module)
In every other case you should use == to check for equality.
Can I customize the behavior?
There is some aspect to == that hasn't been mentioned already in the other answers: It's part of Pythons "Data model". That means its behavior can be customized using the __eq__ method. For example:
class MyClass(object):
    def __init__(self, val):
        self._value = val
    def __eq__(self, other):
        print('__eq__ method called')
        try:
            return self._value == other._value
        except AttributeError:
            raise TypeError('Cannot compare {0} to objects of type {1}'
                            .format(type(self), type(other)))
This is just an artificial example to illustrate that the method is really called:
>>> MyClass(10) == MyClass(10)
__eq__ method called
True
Note that by default (if no other implementation of __eq__ can be found in the class or the superclasses) __eq__ uses is:
class AClass(object):
    def __init__(self, value):
        self._value = value
>>> a = AClass(10)
>>> b = AClass(10)
>>> a == b
False
>>> a == a
True
So it's actually important to implement __eq__ if you want "more" than just reference-comparison for custom classes!
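As a variation on the MyClass example, a common convention is to return NotImplemented for unsupported operand types instead of raising. Python then tries the other operand's comparison and finally falls back to an identity check. A minimal sketch, assuming Python 3 (the Money class is hypothetical):

```python
class Money(object):
    def __init__(self, amount):
        self._amount = amount

    def __eq__(self, other):
        if not isinstance(other, Money):
            # Let Python try the reflected comparison, then fall back to identity.
            return NotImplemented
        return self._amount == other._amount

    def __hash__(self):
        # Defining __eq__ in Python 3 removes the default __hash__, so provide
        # one that agrees with equality.
        return hash(self._amount)


equal_values = Money(10) == Money(10)
different_type = Money(10) == 10
different_objects = Money(10) is Money(10)
```

Two Money(10) instances compare equal with == but are still distinct objects, so is reports False.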
On the other hand you cannot customize is checks. It will always compare just if you have the same reference.
Will these comparisons always return a boolean?
Because __eq__ can be re-implemented or overridden, it's not limited to return True or False. It could return anything (but in most cases it should return a boolean!).
For example with NumPy arrays the == will return an array:
>>> import numpy as np
>>> np.arange(10) == 2
array([False, False, True, False, False, False, False, False, False, False], dtype=bool)
But is checks will always return True or False!
[1] As Aaron Hall mentioned in the comments:
Generally you shouldn't do any is True or is False checks because one normally uses these "checks" in a context that implicitly converts the condition to a boolean (for example in an if statement). So doing the is True comparison and the implicit boolean cast is doing more work than just doing the boolean cast - and you limit yourself to booleans (which isn't considered pythonic).
Like PEP8 mentions:
Don't compare boolean values to True or False using ==.
Yes: if greeting:
No: if greeting == True:
Worse: if greeting is True:
They are completely different. is checks for object identity, while == checks for equality (a notion that depends on the two operands' types).
It is only a lucky coincidence that "is" seems to work correctly with small integers (e.g. 5 == 4+1). That is because CPython optimizes the storage of integers in the range (-5 to 256) by making them singletons. This behavior is totally implementation-dependent and not guaranteed to be preserved under all manner of minor transformative operations.
For example, Python 3.5 also makes short strings singletons, but slicing them disrupts this behavior:
>>> "foo" + "bar" == "foobar"
True
>>> "foo" + "bar" is "foobar"
True
>>> "foo"[:] + "bar" == "foobar"
True
>>> "foo"[:] + "bar" is "foobar"
False
https://docs.python.org/library/stdtypes.html#comparisons
is tests for identity
== tests for equality
Each (small) integer value is mapped to a single value, so every 3 is identical and equal. This is an implementation detail, not part of the language spec though
Your answer is correct. The is operator compares the identity of two objects. The == operator compares the values of two objects.
An object's identity never changes once it has been created; you may think of it as the object's address in memory.
You can control comparison behaviour of object values by defining a __cmp__ method or a rich comparison method like __eq__.
Have a look at Stack Overflow question Python's "is" operator behaves unexpectedly with integers.
What it mostly boils down to is that "is" checks to see if they are the same object, not just equal to each other (the numbers below 256 are a special case).
In a nutshell, is checks whether two references point to the same object or not. == checks whether two objects have the same value or not.
a=[1,2,3]
b=a #a and b point to the same object
c=list(a) #c points to different object
if a==b:
    print('#') #output:#
if a is b:
    print('##') #output:##
if a==c:
    print('###') #output:###
if a is c:
    print('####') #no output as c and a point to different object
Since the other answers in this post already cover the difference between == and is for comparing objects or variables in detail, I will focus on comparing strings with is and ==, which can give different results, and I would urge programmers to use them carefully.
For string comparison, make sure to use == instead of is:
str = 'hello'
if (str is 'hello'):
    print ('str is hello')
if (str == 'hello'):
    print ('str == hello')
Out:
str is hello
str == hello
But in the below example == and is will get different results:
str2 = 'hello sam'
if (str2 is 'hello sam'):
    print ('str2 is hello sam')
if (str2 == 'hello sam'):
    print ('str2 == hello sam')
Out:
str2 == hello sam
Conclusion and Analysis:
Use is carefully to compare between strings.
Since is compares objects, and since in Python 3+ every value, including a string, is an object, let's see what happened in the examples above.
In Python, the id function returns a value that is unique and constant for an object during its lifetime. The Python interpreter uses this id behind the scenes to compare two objects with the is keyword.
str = 'hello'
id('hello')
> 140039832615152
id(str)
> 140039832615152
But
str2 = 'hello sam'
id('hello sam')
> 140039832615536
id(str2)
> 140039832615792
As John Feminella said, most of the time you will use == and != because your objective is to compare values. I'd just like to categorise what you would do the rest of the time:
There is one and only one instance of NoneType, i.e. None is a singleton. Consequently foo == None and foo is None normally give the same result (unless the class of foo defines an unusual __eq__). However, the is test is faster, and the Pythonic convention is to use foo is None.
If you are doing some introspection or mucking about with garbage collection or checking whether your custom-built string interning gadget is working or suchlike, then you probably have a use-case for foo is bar.
True and False are also (now) singletons, but there is no use-case for foo == True and no use case for foo is True.
Most of them already answered to the point. Just as an additional note (based on my understanding and experimenting but not from a documented source), the statement
== if the objects referred to by the variables are equal
from above answers should be read as
== if the objects referred to by the variables are equal and objects belonging to the same type/class
I arrived at this conclusion based on the test below:
list1 = [1,2,3,4]
tuple1 = (1,2,3,4)
print(list1)
print(tuple1)
print(id(list1))
print(id(tuple1))
print(list1 == tuple1)
print(list1 is tuple1)
Here the contents of the list and tuple are same but the type/class are different.
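Whether == requires operands of the same type is not a general rule of the language; the result is decided by the __eq__ implementations of the types involved. A short sketch with built-in types shows both outcomes:

```python
# Lists and tuples never compare equal, even with the same contents.
list_vs_tuple = [1, 2, 3, 4] == (1, 2, 3, 4)
# Numeric types compare by value across types.
int_vs_float = 1 == 1.0
# bool is a subclass of int, so True equals 1.
bool_vs_int = True == 1
# set and frozenset also compare by contents.
set_vs_frozenset = {1, 2} == frozenset([1, 2])
```

Only the first comparison is False; the other three are True despite the differing types.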

Why does my IDE suggest to rewrite != 0 to is not 0

My Python IDE, PyCharm, by default suggests changing the following line of Python:
if variable != 0:
to
if variable is not 0:
Why does it suggest this? Does it matter at all for the execution (i.e. does this behave different for any edge cases)?
It's a bug. You should not test integers by identity. Although it may work ok for small integers, it's just an implementation detail.
If you were checking variable is False, that would be ok. Perhaps the IDE is tripped up by the semantics
The != operator checks for inequality of value. The is operator is used to check for identity. CPython caches small integers such as 0, so for plain ints the two expressions usually have the same effect, but that is an implementation detail, and they differ for other types (0.0 != 0 is False, while 0.0 is not 0 is True). The is not 0 form reads more like English, which is probably why the IDE is suggesting it (although I wouldn't accept the recommendation).
I did try some analysis. I dumped the bytecode for both the expressions and can't see any difference in the opcodes. One has COMPARE_OP 3 (!=) and the other has COMPARE_OP 9 (is not). They're the same. I then tried some performance runs and found that time taken is negligibly higher for the !=.
is not should be preferred if you are matching an object's identity, not its equality.
see these examples
>>> a=[1,2,3]
>>> b=[1,2,3] #both are equal
>>> if a is not b:
...     print('they are equal but they are not the same object')
they are equal but they are not the same object
>>> if a != b:
...     print('hello') #prints nothing because both have same value
>>> a=100000
>>> b=100000
>>> a is b
False
>>> if a is not b:
...     print('they are equal but they are not the same object')
they are equal but they are not the same object
>>> if a!=b:
...     print('something') #prints nothing as != matches their value not identity
But if the numbers stored in a and b are small integers or small strings then a is not b will not work as python does some caching, and they both point to the same object.
>>> a=2
>>> b=2
>>> a is b
True
>>> a='wow'
>>> b='wow'
>>> a is b
True
>>> a=9999999
>>> b=9999999
>>> a is b
False
The operator "is not" checks for object identity and the operator != checks for object equality. I do not think you should do this in your case, but maybe your IDE suggests this for the general case?

Does Python have a string 'contains' substring method?

I'm looking for a string.contains or string.indexof method in Python.
I want to do:
if not somestring.contains("blah"):
    continue
Use the in operator:
if "blah" not in somestring:
    continue
If it's just a substring search you can use string.find("substring").
You do have to be a little careful with find, index, and in though, as they are substring searches. For example:
s = "This be a string"
if s.find("is") == -1:
    print("No 'is' here!")
else:
    print("Found 'is' in the string.")
This prints Found 'is' in the string. because "is" occurs inside "This". Similarly, if "is" in s: would evaluate to True. This may or may not be what you want.
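If only a whole-word match is wanted, one option (not part of the original answer) is a regular expression with word boundaries. A minimal sketch using the standard re module:

```python
import re

s = "This be a string"

# Plain substring search: "is" is found inside "This".
substring_found = "is" in s
# \b matches a word boundary, so this only matches "is" as a separate word.
whole_word_found = re.search(r"\bis\b", s) is not None
whole_word_in_other = re.search(r"\bis\b", "This is a string") is not None
```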
Does Python have a string contains substring method?
99% of use cases will be covered using the keyword, in, which returns True or False:
'substring' in any_string
For the use case of getting the index, use str.find (which returns -1 on failure, and has optional positional arguments):
start = 0
stop = len(any_string)
any_string.find('substring', start, stop)
or str.index (like find but raises ValueError on failure):
start = 100
end = 1000
any_string.index('substring', start, end)
Explanation
Use the in comparison operator because
the language intends its usage, and
other Python programmers will expect you to use it.
>>> 'foo' in '**foo**'
True
The opposite (complement), which the original question asked for, is not in:
>>> 'foo' not in '**foo**' # returns False
False
This is semantically the same as not 'foo' in '**foo**' but it's much more readable and explicitly provided for in the language as a readability improvement.
Avoid using __contains__
The "contains" method implements the behavior for in. This example,
str.__contains__('**foo**', 'foo')
returns True. You could also call this function from the instance of the superstring:
'**foo**'.__contains__('foo')
But don't. Methods that start with underscores are considered semantically non-public. The only reason to use this is when implementing or extending the in and not in functionality (e.g. if subclassing str):
class NoisyString(str):
    def __contains__(self, other):
        print(f'testing if "{other}" in "{self}"')
        return super(NoisyString, self).__contains__(other)
ns = NoisyString('a string with a substring inside')
and now:
>>> 'substring' in ns
testing if "substring" in "a string with a substring inside"
True
Don't use find and index to test for "contains"
Don't use the following string methods to test for "contains":
>>> '**foo**'.index('foo')
2
>>> '**foo**'.find('foo')
2
>>> '**oo**'.find('foo')
-1
>>> '**oo**'.index('foo')
Traceback (most recent call last):
  File "<pyshell#40>", line 1, in <module>
    '**oo**'.index('foo')
ValueError: substring not found
Other languages may have no methods to directly test for substrings, and so you would have to use these types of methods, but with Python, it is much more efficient to use the in comparison operator.
Also, these are not drop-in replacements for in. You may have to handle the exception or -1 cases, and if they return 0 (because they found the substring at the beginning) the boolean interpretation is False instead of True.
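The truthiness pitfall is easy to reproduce. A small sketch (the variable names are illustrative):

```python
text = "foobar"

# find returns 0 here because the match is at the very start.
position = text.find("foo")
# Using the position as a truth value gives the wrong answer.
wrong_check = bool(position)
# Comparing against -1 works, but "in" says the same thing more directly.
explicit_check = position != -1
preferred_check = "foo" in text
# find returns -1 for a miss, and -1 is truthy.
miss_is_truthy = bool(text.find("xyz"))
```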
If you really mean not any_string.startswith(substring) then say it.
Performance comparisons
We can compare various ways of accomplishing the same goal.
import timeit
def in_(s, other):
    return other in s
def contains(s, other):
    return s.__contains__(other)
def find(s, other):
    return s.find(other) != -1
def index(s, other):
    try:
        s.index(other)
    except ValueError:
        return False
    else:
        return True
perf_dict = {
    'in:True': min(timeit.repeat(lambda: in_('superstring', 'str'))),
    'in:False': min(timeit.repeat(lambda: in_('superstring', 'not'))),
    '__contains__:True': min(timeit.repeat(lambda: contains('superstring', 'str'))),
    '__contains__:False': min(timeit.repeat(lambda: contains('superstring', 'not'))),
    'find:True': min(timeit.repeat(lambda: find('superstring', 'str'))),
    'find:False': min(timeit.repeat(lambda: find('superstring', 'not'))),
    'index:True': min(timeit.repeat(lambda: index('superstring', 'str'))),
    'index:False': min(timeit.repeat(lambda: index('superstring', 'not'))),
}
And now we see that using in is much faster than the others.
Less time to do an equivalent operation is better:
>>> perf_dict
{'in:True': 0.16450627865128808,
'in:False': 0.1609668098178645,
'__contains__:True': 0.24355481654697542,
'__contains__:False': 0.24382793854783813,
'find:True': 0.3067379407923454,
'find:False': 0.29860888058124146,
'index:True': 0.29647137792585454,
'index:False': 0.5502287584545229}
How can in be faster than __contains__ if in uses __contains__?
This is a fine follow-on question.
Let's disassemble functions with the methods of interest:
>>> from dis import dis
>>> dis(lambda: 'a' in 'b')
  1           0 LOAD_CONST               1 ('a')
              2 LOAD_CONST               2 ('b')
              4 COMPARE_OP               6 (in)
              6 RETURN_VALUE
>>> dis(lambda: 'b'.__contains__('a'))
  1           0 LOAD_CONST               1 ('b')
              2 LOAD_METHOD              0 (__contains__)
              4 LOAD_CONST               2 ('a')
              6 CALL_METHOD              1
              8 RETURN_VALUE
so we see that the .__contains__ method has to be separately looked up and then called from the Python virtual machine - this should adequately explain the difference.
if needle in haystack: is the normal use, as @Michael says. It relies on the in operator, which is more readable and faster than a method call.
If you truly need a method instead of an operator (e.g. to do some weird key= for a very peculiar sort...?), that would be 'haystack'.__contains__. But since your example is for use in an if, I guess you don't really mean what you say;-). It's not good form (nor readable, nor efficient) to use special methods directly -- they're meant to be used, instead, through the operators and builtins that delegate to them.
in Python strings and lists
Here are a few useful examples that speak for themselves concerning the in operator:
>>> "foo" in "foobar"
True
>>> "foo" in "Foobar"
False
>>> "foo" in "Foobar".lower()
True
>>> "foo".capitalize() in "Foobar"
True
>>> "foo" in ["bar", "foo", "foobar"]
True
>>> "foo" in ["fo", "o", "foobar"]
False
>>> ["foo" in a for a in ["fo", "o", "foobar"]]
[False, False, True]
Caveat: lists are iterables, and the in operator acts on iterables, not just strings.
If you want to compare strings in a more fuzzy way to measure how "alike" they are, consider using the Levenshtein package
Here's an answer that shows how it works.
If you are happy with "blah" in somestring but want it to be a function/method call, you can probably do this
import operator
if not operator.contains(somestring, "blah"):
    continue
All operators in Python can be more or less found in the operator module including in.
So apparently there is nothing similar for vector-wise comparison. An obvious Python way to do so would be:
names = ['bob', 'john', 'mike']
any(st in 'bob and john' for st in names)
>> True
any(st in 'mary and jane' for st in names)
>> False
You can use y.count().
It will return the integer value of the number of times a sub string appears in a string.
For example:
string.count("bah") >> 0
string.count("Hello") >> 1
Here is your answer:
if "insert_char_or_string_here" in "insert_string_to_search_here":
    pass  # DOSTUFF
For checking if it is false:
if not "insert_char_or_string_here" in "insert_string_to_search_here":
    pass  # DOSTUFF
OR:
if "insert_char_or_string_here" not in "insert_string_to_search_here":
    pass  # DOSTUFF
You can use regular expressions to get the occurrences:
>>> import re
>>> print(re.findall(r'( |t)', to_search_in)) # searches for t or space
['t', ' ', 't', ' ', ' ']
