In Scala I could test if a string has a capital letter like this:
val nameHasUpperCase = name.exists(_.isUpper)
The most comprehensive form in Python I can think of is:
import functools
a = 'asdFggg'
functools.reduce(lambda x, y: x or y, [c.isupper() for c in a])
-> True
Somewhat clumsy. Is there a better way to do this?
The closest to the Scala statement is probably an any(..) statement here:
any(x.isupper() for x in a)
This works with a generator expression: as soon as such an element is found, any(..) stops and returns True.
This produces:
>>> a ='asdFggg'
>>> any(x.isupper() for x in a)
True
Or another one with map(..):
any(map(str.isupper,a))
Another way of doing this would be comparing the original string to it being completely lower case:
>>> a ='asdFggg'
>>> a == a.lower()
False
And if you want this to return true, then use != instead of ==
There is also a regular-expression version (it needs import re, and [A-Z] only matches ASCII capitals):
nameHasUpperCase = bool(re.search(r'[A-Z]', name))
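For what it's worth, here is a small sketch comparing the three approaches above side by side. The helper names and the accented sample string are my own assumptions for illustration, not part of the question. It suggests that the [A-Z] regex only sees ASCII capitals, while isupper() and the lower() comparison are Unicode-aware.

```python
import re

def has_upper_any(text):
    """True if any character is uppercase (Unicode-aware, short-circuits)."""
    return any(ch.isupper() for ch in text)

def has_upper_lower_cmp(text):
    """True if lowercasing changes the string."""
    return text != text.lower()

def has_upper_ascii_regex(text):
    """True if the string contains an ASCII capital A-Z."""
    return bool(re.search(r'[A-Z]', text))

# '\u00c9cole' is "Ecole" with an accented capital E (assumed sample)
samples = ['asdFggg', 'asdfggg', '\u00c9cole']
results = {
    s: (has_upper_any(s), has_upper_lower_cmp(s), has_upper_ascii_regex(s))
    for s in samples
}
for flags in results.values():
    print(flags)
```

If the input can contain non-ASCII letters, the any(..) form is probably the safer default.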
How can I compare strings in a case insensitive way in Python?
I would like to encapsulate the comparison of a regular string to a repository string, using simple and Pythonic code. I would also like to be able to look up values in a dict keyed by strings using regular Python strings.
Assuming ASCII strings:
string1 = 'Hello'
string2 = 'hello'
if string1.lower() == string2.lower():
    print("The strings are the same (case insensitive)")
else:
    print("The strings are NOT the same (case insensitive)")
As of Python 3.3, casefold() is a better alternative:
string1 = 'Hello'
string2 = 'hello'
if string1.casefold() == string2.casefold():
    print("The strings are the same (case insensitive)")
else:
    print("The strings are NOT the same (case insensitive)")
If you want a more comprehensive solution that handles more complex unicode comparisons, see other answers.
Comparing strings in a case insensitive way seems trivial, but it's not. I will be using Python 3, since Python 2 is underdeveloped here.
The first thing to note is that case-removing conversions in Unicode aren't trivial. There is text for which text.lower() != text.upper().lower(), such as "ß":
>>> "ß".lower()
'ß'
>>> "ß".upper().lower()
'ss'
But let's say you wanted to caselessly compare "BUSSE" and "Buße". Heck, you probably also want to compare "BUSSE" and "BUẞE" equal - that's the newer capital form. The recommended way is to use casefold:
str.casefold()
Return a casefolded copy of the string. Casefolded strings may be used for
caseless matching.
Casefolding is similar to lowercasing but more aggressive because it is
intended to remove all case distinctions in a string. [...]
Do not just use lower. If casefold is not available, doing .upper().lower() helps (but only somewhat).
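To make the difference concrete, here is a minimal sketch using the same three spellings mentioned above. The variable names are mine; only str.lower(), str.upper() and str.casefold() are used.

```python
words = ["BUSSE", "Bu\u00dfe", "BU\u1e9eE"]  # BUSSE, Busse with sharp s, and with capital sharp s

# lower() keeps the sharp s, so the three spellings do not all agree
lowered = {w.lower() for w in words}

# casefold() expands both sharp-s forms to "ss"
folded = {w.casefold() for w in words}

# The fallback for when casefold is unavailable: upper() then lower()
fallback = {w.upper().lower() for w in words}

print(len(lowered), len(folded), len(fallback))
```

A set of size 1 means all three spellings compare equal under that conversion.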
Then you should consider accents. If your font renderer is good, you probably think "ê" == "ê" - but it doesn't:
>>> "ê" == "ê"
False
This is because the accent on the latter is a combining character.
>>> import unicodedata
>>> [unicodedata.name(char) for char in "ê"]
['LATIN SMALL LETTER E WITH CIRCUMFLEX']
>>> [unicodedata.name(char) for char in "ê"]
['LATIN SMALL LETTER E', 'COMBINING CIRCUMFLEX ACCENT']
The simplest way to deal with this is unicodedata.normalize. You probably want to use NFKD normalization, but feel free to check the documentation. Then one does
>>> unicodedata.normalize("NFKD", "ê") == unicodedata.normalize("NFKD", "ê")
True
To finish up, here it is expressed as functions:
import unicodedata
def normalize_caseless(text):
    return unicodedata.normalize("NFKD", text.casefold())
def caseless_equal(left, right):
    return normalize_caseless(left) == normalize_caseless(right)
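Since the question also asks about dict lookups, here is a rough sketch of how the same normalization could be used for keys. The two functions are repeated so the block runs on its own; caseless_lookup and the sample table are hypothetical names and data I made up for illustration.

```python
import unicodedata

def normalize_caseless(text):
    return unicodedata.normalize("NFKD", text.casefold())

def caseless_equal(left, right):
    return normalize_caseless(left) == normalize_caseless(right)

# Hypothetical table: keys are stored in their normalized form
table = {normalize_caseless("Bu\u00dfe"): "found"}

def caseless_lookup(mapping, key, default=None):
    """Look up key after applying the same normalization as the stored keys."""
    return mapping.get(normalize_caseless(key), default)

hit = caseless_lookup(table, "BUSSE")
miss = caseless_lookup(table, "Bus")

# Precomposed capital E-circumflex vs. small e plus combining circumflex
accents_equal = caseless_equal("\u00ca", "e\u0302")
print(hit, miss, accents_equal)
```

The key point is that the stored keys and the lookup key have to go through the same function, otherwise the hashes will not line up.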
Using Python 2, calling .lower() on each string or Unicode object...
string1.lower() == string2.lower()
...will work most of the time, but indeed doesn't work in the situations @tchrist has described.
Assume we have a file called unicode.txt containing the two strings Σίσυφος and ΣΊΣΥΦΟΣ. With Python 2:
>>> utf8_bytes = open("unicode.txt", 'r').read()
>>> print repr(utf8_bytes)
'\xce\xa3\xce\xaf\xcf\x83\xcf\x85\xcf\x86\xce\xbf\xcf\x82\n\xce\xa3\xce\x8a\xce\xa3\xce\xa5\xce\xa6\xce\x9f\xce\xa3\n'
>>> u = utf8_bytes.decode('utf8')
>>> print u
Σίσυφος
ΣΊΣΥΦΟΣ
>>> first, second = u.splitlines()
>>> print first.lower()
σίσυφος
>>> print second.lower()
σίσυφοσ
>>> first.lower() == second.lower()
False
>>> first.upper() == second.upper()
True
The Σ character has two lowercase forms, ς and σ, and .lower() won't help compare them case-insensitively.
However, in Python 3, lower() applies the Unicode final-sigma rule: a Σ at the end of a word lowercases to ς and any other Σ lowercases to σ, so calling lower() on both strings works correctly:
>>> s = open('unicode.txt', encoding='utf8').read()
>>> print(s)
Σίσυφος
ΣΊΣΥΦΟΣ
>>> first, second = s.splitlines()
>>> print(first.lower())
σίσυφος
>>> print(second.lower())
σίσυφος
>>> first.lower() == second.lower()
True
>>> first.upper() == second.upper()
True
So if you care about edge-cases like the three sigmas in Greek, use Python 3.
(For reference, Python 2.7.3 and Python 3.3.0b1 are shown in the interpreter printouts above.)
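The same check can be reproduced without the file. This is only a sketch for Python 3; the uppercase string is derived with upper() so that both strings are guaranteed to share the same code points.

```python
first = "Σίσυφος"
second = first.upper()  # the all-caps spelling of the same word

# lower() handles the word-final sigma, so the two strings agree
lower_equal = first.lower() == second.lower()

# The two lowercase sigmas on their own are still different under lower()
final_sigma, plain_sigma = "\u03c2", "\u03c3"
sigmas_lower_equal = final_sigma.lower() == plain_sigma.lower()

# casefold() maps the final sigma to the plain one, so it does not
# depend on where the sigma sits in the word
sigmas_fold_equal = final_sigma.casefold() == plain_sigma.casefold()
fold_equal = first.casefold() == second.casefold()

print(lower_equal, sigmas_lower_equal, sigmas_fold_equal, fold_equal)
```

So lower() happens to work for whole words here, but casefold() is the more robust choice when the sigma position can differ between the two strings.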
Section 3.13 of the Unicode standard defines algorithms for caseless matching.
X.casefold() == Y.casefold() in Python 3 implements the "default caseless matching" (D144).
Casefolding does not preserve the normalization of strings in all instances and therefore the normalization needs to be done ('å' vs. 'å'). D145 introduces "canonical caseless matching":
import unicodedata
def NFD(text):
    return unicodedata.normalize('NFD', text)
def canonical_caseless(text):
    return NFD(NFD(text).casefold())
NFD() is called twice for very infrequent edge cases involving the U+0345 character.
Example:
>>> 'å'.casefold() == 'å'.casefold()
False
>>> canonical_caseless('å') == canonical_caseless('å')
True
There is also compatibility caseless matching (D146) for cases such as '㎒' (U+3392), and "identifier caseless matching" to simplify and optimize caseless matching of identifiers.
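A sketch of what compatibility caseless matching could look like, following my reading of D146 as NFKD(casefold(NFKD(casefold(NFD(X))))). The function name compatibility_caseless is my own, and canonical_caseless is repeated so the block is self-contained.

```python
import unicodedata

def NFD(text):
    return unicodedata.normalize('NFD', text)

def NFKD(text):
    return unicodedata.normalize('NFKD', text)

def canonical_caseless(text):
    return NFD(NFD(text).casefold())

def compatibility_caseless(text):
    # Assumed reading of D146: fold, decompose compatibly, then repeat
    return NFKD(NFKD(NFD(text).casefold()).casefold())

square_mhz = '\u3392'  # the single-character "MHz" symbol

canonical_match = canonical_caseless(square_mhz) == canonical_caseless('MHZ')
compat_match = compatibility_caseless(square_mhz) == compatibility_caseless('MHZ')
print(canonical_match, compat_match)
```

Canonical matching leaves the symbol as one character, while the compatibility version decomposes it into the letters before folding.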
I saw this solution here using regex.
import re
if re.search('mandy', 'Mandy Pande', re.IGNORECASE):
    print("match")  # the search succeeds, so this branch runs
It works well with accents
In [42]: if re.search("ê","ê", re.IGNORECASE):
....: print(1)
....:
1
However, it does not match Unicode characters case-insensitively when their case mappings differ in length. Thanks to @Rhymoid for pointing this out; my understanding is that the pattern needs the exact symbol for the match to succeed. The output is as follows:
In [36]: "ß".lower()
Out[36]: 'ß'
In [37]: "ß".upper()
Out[37]: 'SS'
In [38]: "ß".upper().lower()
Out[38]: 'ss'
In [39]: if re.search("ß","ßß", re.IGNORECASE):
....: print(1)
....:
1
In [40]: if re.search("SS","ßß", re.IGNORECASE):
....: print(1)
....:
In [41]: if re.search("ß","SS", re.IGNORECASE):
....: print(1)
....:
You can use the casefold() method. It returns a case-folded copy of the string, so comparing the folded copies ignores case.
firstString = "Hi EVERYONE"
secondString = "Hi everyone"
if firstString.casefold() == secondString.casefold():
    print('The strings are equal.')
else:
    print('The strings are not equal.')
Output:
The strings are equal.
The usual approach is to uppercase the strings or lower case them for the lookups and comparisons. For example:
>>> "hello".upper() == "HELLO".upper()
True
>>>
How about converting to lowercase first? You can use string.lower().
A clean solution that I found, where I'm working with some constant file extensions:
from pathlib import Path
class CaseInsitiveString(str):
    def __eq__(self, __o: str) -> bool:
        return self.casefold() == __o.casefold()
GZ = CaseInsitiveString(".gz")
ZIP = CaseInsitiveString(".zip")
TAR = CaseInsitiveString(".tar")
path = Path("/tmp/ALL_CAPS.TAR.GZ")
GZ in path.suffixes, ZIP in path.suffixes, TAR in path.suffixes, TAR == ".tAr"
# (True, False, True, True)
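One caveat with a str subclass like this, as far as I can tell: overriding __eq__ alone makes instances unhashable, and != still uses the inherited case-sensitive str.__ne__. A hedged sketch of a variant that covers both (CaseInsensitiveStr is a hypothetical name, not the class from the answer above):

```python
class CaseInsensitiveStr(str):
    """str subclass that compares and hashes by its casefolded form."""

    def __eq__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        return self.casefold() == other.casefold()

    def __ne__(self, other):
        # str defines its own __ne__, so it has to be overridden as well
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Keep hashing consistent with the case-insensitive __eq__
        return hash(self.casefold())

TAR = CaseInsensitiveStr(".tar")
extensions = {CaseInsensitiveStr(".gz"), TAR}

print(TAR == ".tAr", TAR != ".TAR", CaseInsensitiveStr(".GZ") in extensions)
```

Note that a plain str such as ".GZ" still hashes case-sensitively, so set and dict lookups only behave caselessly when the lookup key is wrapped too.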
With pandas, you can pass case=False to str.contains():
data['Column_name'].str.contains('abcd', case=False)
def search_specificword(key, stng):
    key = key.lower()
    stng = stng.lower()
    flag_present = False
    if stng.startswith(key+" "):
        flag_present = True
    symb = [',','.']
    for i in symb:
        if stng.find(" "+key+i) != -1:
            flag_present = True
    if key == stng:
        flag_present = True
    if stng.endswith(" "+key):
        flag_present = True
    if stng.find(" "+key+" ") != -1:
        flag_present = True
    print(flag_present)
    return flag_present
Output:
search_specificword("Affordable housing", "to the core of affordable outHousing in europe")
False
search_specificword("Affordable housing", "to the core of affordable Housing, in europe")
True
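A shorter way to get roughly the same behaviour might be a word-boundary regex. This is a sketch, not a drop-in replacement: contains_phrase is a hypothetical name, and \b treats any non-word character as a boundary, so it also accepts punctuation other than ',' and '.'.

```python
import re

def contains_phrase(phrase, text):
    """Case-insensitive whole-phrase search using word boundaries."""
    # re.escape keeps any regex metacharacters in the phrase literal
    pattern = r'\b' + re.escape(phrase) + r'\b'
    return re.search(pattern, text, re.IGNORECASE) is not None

print(contains_phrase("Affordable housing",
                      "to the core of affordable outHousing in europe"))
print(contains_phrase("Affordable housing",
                      "to the core of affordable Housing, in europe"))
```

The two calls reuse the inputs from the output shown above and give the same False / True results.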
from re import search, IGNORECASE
def is_string_match(word1, word2):
    # Case-insensitive check of whether word1 occurs as a whole word in word2
    # word1: string
    # word2: string | list
    # if word2 is a list, check word1 against each of its items
    if isinstance(word2, list):
        for word in word2:
            if search(rf'\b{word1}\b', word, IGNORECASE):
                return True
        return False
    # otherwise check word1 against the single string word2
    if search(rf'\b{word1}\b', word2, IGNORECASE):
        return True
    return False
is_match_word = is_string_match("Hello", "hELLO")
True
is_match_word = is_string_match("Hello", ["Bye", "hELLO", "#vagavela"])
True
is_match_word = is_string_match("Hello", "Bye")
False
Consider using FoldedCase from jaraco.text:
>>> from jaraco.text import FoldedCase
>>> FoldedCase('Hello World') in ['hello world']
True
And if you want a dictionary keyed on text irrespective of case, use FoldedCaseKeyedDict from jaraco.collections:
>>> from jaraco.collections import FoldedCaseKeyedDict
>>> d = FoldedCaseKeyedDict()
>>> d['heLlo'] = 'world'
>>> list(d.keys()) == ['heLlo']
True
>>> d['hello'] == 'world'
True
>>> 'hello' in d
True
>>> 'HELLO' in d
True
def insenStringCompare(s1, s2):
    """ Method that takes two strings and returns True or False, based
    on if they are equal, regardless of case."""
    try:
        return s1.lower() == s2.lower()
    except AttributeError:
        # Non-string arguments: report the problem (the function then returns None)
        print("Please only pass strings into this method.")
        print("You passed a %s and %s" % (s1.__class__, s2.__class__))
Here is another approach using regex, which I have learned to love/hate over the last week, so I usually import it under a name that reflects how I'm feeling (in this case, yes).
Write a normal function that asks for input, then compile a pattern with something = re.compile(r'foo|spam', yes.I). re.I (yes.I below) is the same as IGNORECASE, but leaves less room for typos.
You then search the message with the compiled pattern. Regexes deserve a few pages of their own, but the point is that foo and spam are joined with | as alternatives and case is ignored.
If either is found, lost_n_found holds the match; if neither is found, it is None. When there is a match, the function returns the matched text in lower case using lost_n_found.group().lower().
This makes it much easier to match input whose case would otherwise matter. Lastly, (NCS) stands for "no one cares seriously...!" or "not case sensitive", whichever you prefer.
If anyone has any questions, get in touch.
import re as yes
def bar_or_spam():
    # Python 3; on Python 2 use raw_input instead of input
    message = input("\nEnter FoO for BaR or SpaM for EgGs (NCS): ")
    message_in_coconut = yes.compile(r'foo|spam', yes.I)
    # search() returns None when nothing matches, so check before calling group()
    lost_n_found = message_in_coconut.search(message)
    if lost_n_found is not None:
        return lost_n_found.group().lower()
    else:
        print("Make tea not love")
        return None
whatz_for_breakfast = bar_or_spam()
if whatz_for_breakfast == "foo":
    print("BaR")
elif whatz_for_breakfast == "spam":
    print("EgGs")
I'm trying to figure out the solution to this particular challenge and so far I'm stumped.
Basically, what I'm looking to do is:
Check if substring false exists in string s
If false exists in s, return the boolean True
However, I am not allowed to use any conditional statements at all.
Maybe there is a way to do this with objects?
There is always str.__contains__ if it's needed as a function somewhere:
In [69]: str.__contains__('**foo**', 'foo')
Out[69]: True
This could be used for things like filter or map:
sorted(some_list, key=partial(str.__contains__,'**foo**'))
The much more common use case is to assign a truth value for each element in a list using a comprehension. Then we can make use of the in keyword in Python:
In[70]: ['foo' in x for x in ['**foo**','abc']]
Out[70]: [True, False]
The latter should always be preferred. There are edge cases where only a function might be possible.
But even then you could pass a lambda and use the in expression:
sorted(some_list, key=lambda x: 'foo' in x)
Evaluating condition without using if statement:
True:
>>> s = 'abcdefalse'
>>> 'false' in s
True
False:
>>> s = 'abcdefals'
>>> 'false' in s
False
Return blank if False:
>>> s = 'abcdefals'
>>> 'false' in s or ''
''
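Putting it together for the original challenge, a minimal sketch with no conditional statements at all. The function name contains_false is made up; operator.contains is the standard-library function form of the in operator.

```python
import operator

def contains_false(s):
    """Return True if the substring 'false' occurs in s, without any if."""
    return 'false' in s

# Function form of the same test, handy when a callable is required
also_found = operator.contains('abcdefalse', 'false')

print(contains_false('abcdefalse'), contains_false('abcdefals'), also_found)
```

The in expression already evaluates to a bool, so it can be returned directly.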
I want to concat few strings together, and add the last one only if a boolean condition is True.
Like this (a, b and c are strings):
something = a + b + (c if <condition>)
But Python does not like it. Is there a nice way to do it without the else option?
Thanks! :)
Try the following, which avoids else. It works by indexing a two-element list: a False condition (0) selects the empty string and a True condition (1) selects c. The condition must be a bool (or the integer 0 or 1) for the indexing to work.
something = a + b + ['', c][condition]
I am not sure why you want to avoid using else, otherwise, the code below seems more readable:
something = a + b + (c if condition else '')
This should work for simple scenarios -
something = ''.join([a, b, c if condition else ''])
It is possible, but it's not very Pythonic:
something = a + b + c * condition
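This works because c * False returns '' while c * True returns the original c: bool is a subclass of int, and multiplying a string by 0 or 1 repeats it zero times or once. However, you must be careful here: condition may also be the integer 0 or 1, but a larger integer repeats c several times, and a non-integer value such as a string raises a TypeError.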
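If you do go the multiplication route, one way to sidestep the pitfall might be to coerce the condition with bool() first. A small sketch (join_optional is a hypothetical helper name):

```python
def join_optional(a, b, c, condition):
    """Concatenate a and b, appending c only when condition is truthy."""
    # bool() collapses any truthy value to True (1) and any falsy one to False (0)
    return a + b + c * bool(condition)

print(join_optional('x', 'y', 'z', True))
print(join_optional('x', 'y', 'z', False))
print(join_optional('x', 'y', 'z', 5))   # 5 is truthy, so c is added once
print('z' * 5)                           # without bool(), c would repeat
```

That said, the conditional expression with else '' is still the more readable option.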
Is there a nice way to do it without the else option?
Well, yes:
something = ''.join([a, b])
if condition:
    something = ''.join([something, c])
But I don't know whether you mean literally without else, or without the whole if statement.
a_list = ['apple', 'banana,orange', 'strawberry']
b_list = []
for i in a_list:
    for j in i.split(','):
        b_list.append(j)
print(b_list)
I'm not talking about the ternary operator. Can I write an if else statement in one line outside of a value expression? I want to shorten this code.
if x == 'A':
    return True
if x == 'B':
    return True
if x == 'C':
    return True
return False
You can use the in operator like this:
return x in ('A', 'B', 'C')
For Python 3.2+:
return x in {'A', 'B', 'C'}
From docs:
Python’s peephole optimizer now recognizes patterns such x in {1, 2, 3}
as being a test for membership in a set of constants. The optimizer
recasts the set as a frozenset and stores the pre-built constant.
Now that the speed penalty is gone, it is practical to start writing
membership tests using set-notation.
You can shorten your code by using in:
return x in ("A", "B", "C")
Or, if x is a single character:
return x in "ABC"
If your checks are more complicated than membership in a set and you want to reduce the number of lines of code, you can use conditional expressions:
return True if x == 'A' else True if x == 'B' else True if x == 'C' else False
If you need to apply the same test for many items in a sequence, you may want to use the any function:
return any(x == c for c in "ABC")
This will work for tests that are more complex than equality (so that in can't handle them) and for sequences that are too long for chained conditional expressions. Consider a "near equality" test for floating point numbers:
return any(abs(x - n) < epsilon for n in big_sequence_of_floats)
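As a runnable version of that last line, here is a sketch where the function name, the default epsilon and the sample values are all my own assumptions:

```python
def is_near_any(x, candidates, epsilon=1e-9):
    """True if x is within epsilon of at least one candidate."""
    return any(abs(x - n) < epsilon for n in candidates)

value = 0.1 + 0.2            # not exactly 0.3 in binary floating point
targets = [0.3, 1.0, 2.5]

exact = value in targets     # plain membership uses ==
near = is_near_any(value, targets)
print(exact, near)
```

This is the kind of case where in is not enough, because the comparison is not plain equality.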
I am new to python and was wondering if someone could help me out with this. I am trying to see if the elements in b are in a. This is my attempt. Currently I am not getting any output. Any help would be appreciated, thank you!
a = [1]
b = [1,2,3,4,5,6,7]
for each in b:
    if each not in a == True:
        print(each + "is not in a")
You are testing two different things, and the outcome is False; Python chains the comparison operators, so your condition is effectively (each not in a) and (a == True). The same chaining happens with in:
>>> 'a' in ['a'] == True
False
>>> ('a' in ['a']) and (['a'] == True)
False
>>> ('a' in ['a']) == True
True
You never need to test for True on an if statement anyway:
if each not in a:
is enough.
You should be able to just say:
if each not in a:
    print ("%d is not in a" % each)
Your actual expression is using operator chaining:
if a > b > c:
parses as:
if (a > b) and (b > c):
in Python, which means your expression is actually being parsed as:
if (each not in a) and (a == True):
but a == True is always False when a is a list, so that if block will never execute.
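You can check the parse yourself with the values from the question. A small sketch (the variable names on the left are just labels I picked):

```python
a = [1]
each = 2  # an element of b that is not in a

chained = each not in a == True               # how the question wrote it
explicit = (each not in a) and (a == True)    # what Python actually evaluates
grouped = (each not in a) == True             # what was probably intended
plain = each not in a                         # the idiomatic form

print(chained, explicit, grouped, plain)
```

The first two agree with each other, and only the last two give the answer the question was after.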
a = [1,2,3]
b = [1,2,3,4,5,6,7]
c = [7,8,9]
print(set(a) <= set(b))  # True: all elements of a are in b
print(set(c) <= set(b))  # False: 8 and 9 are not in b
To see which elements of b are missing from a, take the set difference:
set(b).difference(set(a))
You don't need ==True. Just:
if each not in a:
This is really easy using sets:
a = [1]
b = [1, 2]
only_in_b = set(b) - set(a)
# set([2])
Another way:
for bb in b:
    try:
        a.index(bb)
    except ValueError:
        # list.index raises ValueError when the value is missing
        print('value is not in the list: ' + str(bb))
I would like to add that if the two lists are large, this is not the best way to do it. Your algorithm is O(n^2). A better way is to traverse a, adding its elements as keys to a dictionary. Afterwards, traverse the second list, checking whether each element is already in the dictionary; this is an O(n) algorithm instead.
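A minimal sketch of that idea, using a set in place of the dictionary (it gives the same constant-time membership test on average). The function name missing_from is hypothetical, and the elements are assumed to be hashable.

```python
def missing_from(a, b):
    """Return the elements of b that do not occur in a, keeping b's order."""
    seen = set(a)  # one pass over a
    return [item for item in b if item not in seen]  # one pass over b

a = [1]
b = [1, 2, 3, 4, 5, 6, 7]
for each in missing_from(a, b):
    print("%d is not in a" % each)
```

Unlike set(b) - set(a), this keeps the original order of b and any duplicates in it.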