Is there any low-level, implementation-related difference (performance-ish) between these approaches..?
# check if string is empty
# the preferred way it seems [1]
if string:
    print string
else:
    print "It's empty."
# versus [2]
if string is '':
# or [3]
if string == '':
For example, when testing for None, I still find it more readable and explicit to do:
if some_var is not None:
..instead of..
if not some_var:
if not some_var, at least for me, always reads "if some_var does not exist".
Which is better to use, what are the proper use cases for ==, is and bool-testing?
Never use is for (value) equality testing. Only use it to test for object identity. It may work for the example if string is '', but this is implementation dependent, and you can't rely on it.
>>> a = "hi"
>>> a is "hi"
True
>>> a = "hi there!"
>>> a is "hi there!"
False
Other than that, use whatever conveys the meaning of your code best.
I prefer the shorter if string:, but if string != '': may be more explicit.
Then again if variable: works on every kind of object, so if variable isn't confined to one type, this is better than if variable != "" and variable != 0: etc.
To expand on Tim Pietzcker's answer:
if string:
    print string
This tests whether string evaluates to True in a boolean context, i.e.
>>> bool("")
False
>>> bool(None)
False
>>> bool("test")
True
So it's not only testing if it is empty, but if it is None or empty. This could have an impact depending on how you treat None/empty.
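To make the None/empty distinction concrete, here is a minimal sketch. The function name describe is made up for illustration, and it assumes the value is either None or a string.

```python
def describe(value):
    """Classify a value that is expected to be None or a string."""
    if value is None:      # identity test: None is a singleton
        return "missing"
    if value == "":        # equality test: an actual empty string
        return "empty"
    return "non-empty"

# A plain truth test cannot tell the first two cases apart.
print(bool(None), bool(""))                            # False False
print(describe(None), describe(""), describe("test"))  # missing empty non-empty
```

If None and "" mean the same thing in your program, the plain if string: test is enough; the explicit checks only matter when they do not.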
Firstly, don't use if string is '': because this is not guaranteed to work. The fact that CPython interns short strings is an implementation detail and should not be relied on.
Using if string: to check that string is non-empty is, I think, a Pythonic way to do it.
But there is nothing wrong about using if string == ''.
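A small sketch of why == is the safe choice: two strings built at run time are equal, but whether they are the same object is up to the interpreter. The variable names here are only for illustration.

```python
parts = ["hi", " ", "there!"]
a = "".join(parts)   # built at run time
b = "".join(parts)   # built again, separately

print(a == b)   # True: same characters
print(a is b)   # implementation-dependent, do not rely on either outcome
```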
str = "chair table planttest"
substr = ["plant", "tab"]
x = func(substr, str)
print(x)
I want code that will return True if str contains all strings in the substr list, regardless of position. If str does not contain all the elements in substr, it should return False.
I have been trying to use re.search or re.findall but am having a very hard time understanding regex operators.
Thanks in advance!
You need to loop through all of the substrings and use the in operator to decide if they are in the string. Like this:
def hasAllSubstrings(partsArr, fullStr):
    allFound = True
    for part in partsArr:
        if part not in fullStr:
            allFound = False
    return allFound
# Test
full = "chair table planttest"
parts = ["plant", "tab"]
x = hasAllSubstrings(parts, full)
print(x)
Let's see what the hasAllSubstrings function does.
It creates a boolean variable that decides if all substrings have been found.
It loops through each part in the partsArr sent in to the function.
If that part is not in the full string fullStr, it sets the boolean to False.
If several parts are missing, the boolean simply stays False. If every part is found, it is never changed and stays True.
At the end, it returns whether it found everything.
After the function definition, we test the function with your example. There is also one last thing you should take note of: your variables shouldn't collide with built-ins. You used str as a variable name, and I changed it to full because str is the name of Python's built-in string type.
By using that name, you shadowed the built-in str for the rest of that scope. Make sure to avoid those names, as they can easily make your programs more confusing and harder to debug. Oh, and by the way, str is useful when you need to convert a number or some other object into a string.
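Here is a small sketch of what that shadowing looks like in practice. It rebinds str inside a function (the name shadowing_demo is made up) so the damage stays local to the example.

```python
def shadowing_demo():
    str = "chair table planttest"   # rebinding the name hides the built-in str
    try:
        return str(42)              # this now calls a string object, not the type
    except TypeError as exc:
        return "TypeError: %s" % exc

print(shadowing_demo())   # reports that a 'str' object is not callable
print(str(42))            # outside the function the built-in still works: 42
```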
You can simply use the built-in function all() and the in keyword:
fullstr = "chair table planttest"
substr = ["plant", "tab"]
x = all(s in fullstr for s in substr)
As a follow-up to @Lakshya Raj's answer, you can add break after allFound = False.
As soon as the first item of substr is not found in str, the result is already known to be False, so there is no need to loop further.
            allFound = False
            break
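Put together, the loop with the early exit could look like the sketch below. The snake_case names are my own; the logic is the same loop with break added.

```python
def has_all_substrings(parts, full):
    """Return True if every string in parts occurs somewhere in full."""
    all_found = True
    for part in parts:
        if part not in full:
            all_found = False
            break          # one miss settles the answer, stop looking
    return all_found

print(has_all_substrings(["plant", "tab"], "chair table planttest"))   # True
print(has_all_substrings(["plant", "sofa"], "chair table planttest"))  # False
```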
What is an efficient way to check that a string s in Python consists of just one character, say 'A'? Something like all_equal(s, 'A') which would behave like this:
all_equal("AAAAA", "A") = True
all_equal("AAAAAAAAAAA", "A") = True
all_equal("AAAAAfAAAAA", "A") = False
Two seemingly inefficient ways would be to: first convert the string to a list and check each element, or second to use a regular expression. Are there more efficient ways or are these the best one can do in Python? Thanks.
This is by far the fastest, several times faster than even count(); just time it with mgilson's excellent timing suite:
s == len(s) * s[0]
Here all the checking is done inside the Python C code which just:
allocates len(s) characters;
fills the space with the first character;
compares two strings.
The longer the string, the greater the time savings. However, as mgilson writes, it creates a copy of the string, so if your string is many millions of symbols long, that may become a problem.
As the timing results show, the fastest ways to solve the task generally do not execute any Python code per symbol. The set() solution also does all its work inside the C code of the Python library, yet it is still slow, probably because it handles the string's characters through the Python object interface.
UPD: Concerning the empty string case. What to do with it strongly depends on the task. If the task is "check if all the symbols in a string are the same", s == len(s) * s[0] is a valid answer (no symbols mean an error, and exception is ok). If the task is "check if there is exactly one unique symbol", empty string should give us False, and the answer is s and s == len(s) * s[0], or bool(s) and s == len(s) * s[0] if you prefer receiving boolean values. Finally, if we understand the task as "check if there are no different symbols", the result for empty string is True, and the answer is not s or s == len(s) * s[0].
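The three readings of the task can be written side by side as a quick sketch. The function names are invented for illustration; the expressions are the ones from the paragraph above.

```python
def all_same(s):
    # "All symbols are the same": raises IndexError for "" because s[0] does not exist.
    return s == len(s) * s[0]

def exactly_one_unique(s):
    # "Exactly one unique symbol": the empty string gives False.
    return bool(s) and s == len(s) * s[0]

def no_different_symbols(s):
    # "No two different symbols": the empty string gives True.
    return not s or s == len(s) * s[0]

print(all_same("AAAAA"), all_same("AAfAA"))                # True False
print(exactly_one_unique(""), no_different_symbols(""))    # False True
```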
>>> s = 'AAAAAAAAAAAAAAAAAAA'
>>> s.count(s[0]) == len(s)
True
This doesn't short circuit. A version which does short-circuit would be:
>>> all(x == s[0] for x in s)
True
However, I have a feeling that due to the optimized C implementation, the non-short-circuiting version will probably perform better on some strings (depending on size, etc.)
Here's a simple timeit script to test some of the other options posted:
import timeit
import re
def test_regex(s,regex=re.compile(r'^(.)\1*$')):
    return bool(regex.match(s))
def test_all(s):
    return all(x == s[0] for x in s)
def test_count(s):
    return s.count(s[0]) == len(s)
def test_set(s):
    return len(set(s)) == 1
def test_replace(s):
    return not s.replace(s[0],'')
def test_translate(s):
    return not s.translate(None,s[0])
def test_strmul(s):
    return s == s[0]*len(s)
tests = ('test_all','test_count','test_set','test_replace','test_translate','test_strmul','test_regex')
print "WITH ALL EQUAL"
for test in tests:
    print test, timeit.timeit('%s(s)'%test,'from __main__ import %s; s="AAAAAAAAAAAAAAAAA"'%test)
    if globals()[test]("AAAAAAAAAAAAAAAAA") != True:
        print globals()[test]("AAAAAAAAAAAAAAAAA")
        raise AssertionError
print
print "WITH FIRST NON-EQUAL"
for test in tests:
    print test, timeit.timeit('%s(s)'%test,'from __main__ import %s; s="FAAAAAAAAAAAAAAAA"'%test)
    if globals()[test]("FAAAAAAAAAAAAAAAA") != False:
        print globals()[test]("FAAAAAAAAAAAAAAAA")
        raise AssertionError
On my machine (OS-X 10.5.8, core2duo, python2.7.3) with these contrived (short) strings, str.count smokes set and all, and beats str.replace by a little, but is edged out by str.translate and strmul is currently in the lead by a good margin:
WITH ALL EQUAL
test_all 5.83863711357
test_count 0.947771072388
test_set 2.01028490067
test_replace 1.24682998657
test_translate 0.941282987595
test_strmul 0.629556179047
test_regex 2.52913498878
WITH FIRST NON-EQUAL
test_all 2.41147494316
test_count 0.942595005035
test_set 2.00480484962
test_replace 0.960338115692
test_translate 0.924381017685
test_strmul 0.622269153595
test_regex 1.36632800102
The timings could be slightly (or even significantly?) different between different systems and with different strings, so that would be worth looking into with an actual string you're planning on passing.
Ultimately, if you hit the best case for all often enough, and your strings are long enough, you might want to consider that one. It's a better algorithm, since it can stop at the first mismatch. I would avoid the set solution though, as I don't see any case where it could possibly beat out the count solution.
If memory could be an issue, you'll need to avoid str.translate, str.replace and strmul as those create a second string, but this isn't usually a concern these days.
You could convert to a set and check there is only one member:
len(set("AAAAAAAA")) == 1
Try using the built-in function all:
all(c == 'A' for c in s)
If you need to check that all the characters in the string are the same and equal to a given character, remove all duplicates and check that the result equals the single character.
>>> set("AAAAA") == set("A")
True
If you only want to know whether all characters are the same, whichever character that is, just check the length
>>> len(set("AAAAA")) == 1
True
Adding another solution to this problem
>>> not "AAAAAA".translate(None,"A")
True
Interesting answers so far. Here's another:
flag = True
for c in 'AAAAAAAfAAAA':
    if not c == 'A':
        flag = False
        break
The only advantage I can think of to mine is that it doesn't need to traverse the entire string if it finds an inconsistent character.
not len("AAAAAAAAA".replace('A', ''))
I'm confused.
foo = ("empty", 0)
foo[0] is "empty"
Returns False. This seems to be a problem with keyword strings, as "list" fails as well. "empt" and other strings return True. This only seems to happen with tuples: if foo is a list, the code returns True as well.
I've tested this with Python 3.4.3 and Python 3.5 and both behave this way. Python 2.7 doesn't seem to have this issue though and returns True as expected.
Am I missing some standard on tuples in Python 3? I've attempted to google-fu this problem but am coming up short.
Edit:
To clear things up, my exact question is why does
foo = ("empty", 0)
foo[0] is "empty"
return False, but
foo = ("empt", 0)
foo[0] is "empt"
return True?
As the other answers already mentioned: You are comparing strings by identity and this is likely to fail. Assumptions about the identity of string literals can not be made.
However, you actually found a subtle issue.
>>> t = ("list",); t[0] is "list"
False
>>> t = ("notlist",); t[0] is "notlist"
True
>>> t = ("list",)
>>> id(t[0])
140386135830064
>>> id("list")
140386137208400
>>> t[0] is "list"
False
>>> l = ("notlist",)
>>> id(l[0])
140386135830456
>>> id("notlist")
140386135830456
>>> l[0] is "notlist"
True
# interestingly, this works:
>>> ("list",)[0] is "list"
True
(Tested with Python 3.5.1+ interactive shell)
This is plainly implementation-dependent behavior by some component of python, presumably lexer or parser.
Bottom-line: Use == for string comparison, as long as you do not depend on object identity.
a is b
is equivalent to
id(a) == id(b)
As mentioned here Comparing strings
Also read this
Built-in Functions - id
Using a literal value with is is almost certainly going to give you an implementation-dependent result. The only literals that Python guarantees to be singletons are None, True, False and Ellipsis (...). Any other literal may or may not resolve to a reference to an existing object. You can't assume that the interpreter will "recognize" the duplicate value and use the same underlying object to represent it.
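If you really do need identity for strings, one option (my suggestion, not something the answers above rely on) is to make the sharing explicit with sys.intern, which returns one shared object per distinct string value. A minimal sketch:

```python
import sys

pieces = ["emp", "ty"]
a = "".join(pieces)   # built at run time
b = "".join(pieces)   # built again, separately

a_interned = sys.intern(a)
b_interned = sys.intern(b)

print(a == b)                     # True: equal values
print(a_interned is b_interned)   # True: both names refer to the interned object
```

For everyday comparisons == is still the right tool; interning only helps when you control both sides of the comparison.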
I'm trying to write a function that takes 2 arguments (2 strings, actually) and compares them (ignoring differences in upper/lower case). For example:
cmr_func('House', 'HouSe')
true
cmr_func('Chair123', 'CHAIr123')
true
cmr_func('Mandy123', 'Mandy1234')
False.
Well, I tried something, but it seems like a very stupid and badly designed function, which doesn't work anyway. I would like to get some ideas. I believe I need to use some built-in str function, but I'm not sure how they can help me.
I thought about using the in operator with some loop. But I don't know what kind of object I should apply the loop to.
def str_comp(a,b):
    for i in a:
        i.lower()
    for i in b:
        i.lower()
    if a == b:
        print 'true'
    else:
        print 'false'
Any hints or ideas are welcome. Thanks :)
You can just convert both strings to lower-case and compare those results:
def str_comp (a, b):
    return a.lower() == b.lower()
The idea behind this is normalization. Basically, you want to take any input string, and normalize it in a way that all other strings that are considered equal result in the same normalized string. And then, you just need to compare the normalized strings for equality.
Other operations you might consider are stripping whitespace (using str.strip()), or even more complex operations like converting umlauts to 2-letter combinations (e.g. ä to ae).
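As a rough sketch of that normalization idea, the helper below strips whitespace, lower-cases, and then applies a small umlaut table. The normalize name and the UMLAUTS mapping are hypothetical and only cover three letters; a real application would need a fuller table.

```python
# Hypothetical, deliberately tiny mapping used only for this example.
UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue"}

def normalize(text):
    """Reduce text to a canonical form so that 'equal' inputs compare equal."""
    text = text.strip().lower()
    for umlaut, replacement in UMLAUTS.items():
        text = text.replace(umlaut, replacement)
    return text

def str_comp(a, b):
    return normalize(a) == normalize(b)

print(str_comp("House", "HouSe"))            # True
print(str_comp("  Müller", "mueller "))      # True
print(str_comp("Mandy123", "Mandy1234"))     # False
```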
The problem with your solution is that you seem to assume that iterating over a string will allow you to modify the characters individually. But strings are immutable, so you cannot modify an existing string without creating a new one. As such, when you iterate over a string using for i in a you get many individual, independent strings for each character which are in no way linked to the original string a. So modifying i will not affect a.
Similarly, just calling str.lower() will not modify the string either (since it’s immutable), so instead, the function will return a new string with all letters converted to lower-case.
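A quick way to see both points for yourself (the variable names are just for the demo):

```python
a = "HouSe"
for ch in a:
    ch.lower()        # builds a new one-character string and throws it away

print(a)              # HouSe (unchanged)

lowered = a.lower()   # keep the returned copy instead
print(lowered)        # house
print(a)              # HouSe (still unchanged)
```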
Finally, you shouldn’t return a string “True” or “False”. Python has boolean constants True and False which should be used for that. And if you use them, you don’t need to do the following either:
if condition:
    return True
else:
    return False
Since condition already is interpreted as a boolean, you can just return the condition directly to get the same result:
return condition
First, you don't need to iterate over the string to make all characters lowercase.
You can just call:
a.lower()
b.lower()
Or you can do it all together:
def str_comp(a,b):
    return a.lower() == b.lower()
Don't forget that this returns True or False, a Boolean, not a string (in this case the string "True" or "False").
If you want to return a string, the function would be different:
def str_comp(a,b):
    if a.lower() == b.lower():
        return "True"
    return "False"
The function str.lower() actually works in a slightly different way:
This is no in-place modification. Calling a.lower() returns a copy of a with only lowercase letters and does not change a itself.
str.lower() can be called on whole strings, not just characters, so the for i in a loop won't be necessary.
Therefore you could simplify your function as follows:
def str_comp(a, b):
    if a.lower() == b.lower():
        print 'true'
    else:
        print 'false'
def are_strings_equal(string_1, string_2):
    return string_1.lower() == string_2.lower()
Is it correct to check for empty strings using is in Python? It does identity checking, while == tests equality.
Consider the following (the idea of using join is borrowed from this answer):
>>> ne1 = "aaa"
>>> ne2 = "".join('a' for _ in range(3))
>>> ne1 == ne2
True
>>> ne1 is ne2
False
>>>
so here is works as one may expect. Now take a look at this code:
>>> e1 = ""
>>> e2 = "aaa".replace("a", "")
>>> e3 = "" * 2
>>> e4 = "bbb".join(range(0))
>>> e1, e2, e3, e4
('', '', '', '')
>>> e1 is e2
True
>>> e1 is e3
True
>>> e1 is e4
True
>>> id(e1), id(e2), id(e3), id(e4)
(35963168, 35963168, 35963168, 35963168) # why?
The correct way to check for an empty string is to just do:
if yourstring:
    print "string is not empty"
i.e. bool(yourstring) will be False if your string is empty. The reason your example works is that CPython caches certain strings and integers and reuses them for efficiency. It's an implementation detail and shouldn't be relied upon.
A Python implementation may choose to intern small strings (well, it may choose to intern anything immutable, really); CPython does so.
You should never rely on this behavior. If you want to check if a string is the empty string, always use mystring == "".
If you're sure the object you're checking is always a string, you can also evaluate it in a boolean context (e.g., if mystring:), but keep in mind that this won't distinguish the empty string from 0, False, or None.
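To see which values a boolean test lumps together with the empty string, here is a small sketch over a handful of sample values (the list is arbitrary, chosen for illustration):

```python
candidates = ["", 0, False, None, [], "text"]

for value in candidates:
    is_falsy = not value              # what `if not mystring:` sees
    is_empty_string = value == ""     # what `mystring == ""` sees
    print(repr(value), is_falsy, is_empty_string)

# Values that a boolean test treats like "" even though they are not "".
falsy_but_not_empty_string = [v for v in candidates if not v and v != ""]
print(falsy_but_not_empty_string)     # [0, False, None, []]
```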
The best way to check for an empty sequence (string, list, ...) is:
if variable:
    pass
from Truth Value Testing
The following values are considered false:
any empty sequence, for example, '', (), []
Please also read the documentation regarding comparison:
is compares the identity
== compares the equality depending on the type
Strings are compared lexicographically using the numeric equivalents
(the result of the built-in function ord()) of their characters.
Unicode and 8-bit strings are fully interoperable in this behavior
The "is" operator returns True if both names point to the same object. Since CPython tries not to duplicate some strings (it may reuse an equal string that already exists in memory), you will often get the same object back, but this is not guaranteed.
I wouldn't use "is" to compare strings. It's like the "==" operator in Java: it usually works, but you may get new instances, and then it returns False. I prefer using ==, which calls the __eq__ method and returns True if both strings are equal, even if they are different objects.