I'm looking for ignore case string comparison in Python.
I tried with:
if line.find('mandy') >= 0:
but no success for ignore case. I need to find a set of words in a given text file. I am reading the file line by line. The word on a line can be mandy, Mandy, MANDY, etc. (I don't want to use toupper/tolower, etc.).
I'm looking for the Python equivalent of the Perl code below.
if ($line=~/^Mandy Pande:/i)
If you don't want to use str.lower(), you can use a regular expression:
import re
if re.search('mandy', 'Mandy Pande', re.IGNORECASE):
    print('Found')  # re.search returns a match object, which is truthy
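For a closer translation of the Perl line if ($line=~/^Mandy Pande:/i), here is a minimal sketch. It assumes you only care whether the line starts with the phrase. re.match() only tries to match at the start of the string, so it plays the role of Perl's ^ anchor, and re.IGNORECASE plays the role of /i. The helper name starts_with_mandy is hypothetical.

```python
import re

# Compile once; re.IGNORECASE corresponds to Perl's /i flag.
# re.match() only attempts a match at position 0, like the ^ anchor.
MANDY = re.compile(r"Mandy Pande:", re.IGNORECASE)

def starts_with_mandy(line):
    """Return True if line begins with 'Mandy Pande:' in any capitalization."""
    return MANDY.match(line) is not None

for line in ["MANDY PANDE: hello", "mandy pande: hi", "Hi, Mandy Pande:"]:
    print(starts_with_mandy(line), repr(line))
```

The third line prints False because the phrase is not at the start of that line.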
BTW, you're looking for the .lower() method:
string1 = "hi"
string2 = "HI"
if string1.lower() == string2.lower():
    print("Equals!")
else:
    print("Different!")
One can use the in operator after applying str.casefold to both strings.
str.casefold is the recommended method for use in case-insensitive comparison.
Return a casefolded copy of the string. Casefolded strings may be used for caseless matching.
Casefolding is similar to lowercasing but more aggressive because it is intended to remove all case distinctions in a string. For example, the German lowercase letter 'ß' is equivalent to "ss". Since it is already lowercase, lower() would do nothing to 'ß'; casefold() converts it to "ss".
The casefolding algorithm is described in section 3.13 of the Unicode Standard.
New in version 3.3.
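To see the difference the documentation describes, here is a small sketch using the German 'ß'. The example strings are chosen for illustration only: lower() leaves 'ß' unchanged, so the lowercased strings differ, while casefold() maps 'ß' to "ss" and the casefolded strings compare equal.

```python
a = "Straße"
b = "STRASSE"

# lower(): "straße" vs "strasse", so the comparison fails
print(a.lower() == b.lower())        # False
# casefold(): both become "strasse"
print(a.casefold() == b.casefold())  # True
```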
For case-insensitive substring search:
needle = "TEST"
haystack = "testing"
if needle.casefold() in haystack.casefold():
    print('Found needle in haystack')
For case-insensitive string comparison:
a = "test"
b = "TEST"
if a.casefold() == b.casefold():
    print('a and b are equal, ignoring case')
Try:
if haystackstr.lower().find(needlestr.lower()) != -1:
    pass  # needlestr occurs in haystackstr, ignoring case
a = "MandY"
alow = a.lower()
if "mandy" in alow:
    print("true")
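As a workaround, you can also use s.lower() in text.lower(), where text is the string being searched (avoid naming that variable str, since it shadows the built-in type).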
You can use the in operator together with the lower() string method:
if "mandy" in line.lower():
    ...  # line contains "mandy" in some capitalization
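Since the question mentions searching a file line by line for a set of words, here is a minimal sketch that combines the in operator with casefold(). It assumes a plain substring test is good enough (so "mandy" would also match inside "mandyville"), and the function name lines_with_words is hypothetical. It accepts any iterable of lines, so an open file object works as well as a list.

```python
def lines_with_words(lines, words):
    """Yield (line_number, line) for each line containing any of words, ignoring case."""
    # Casefold the search words once instead of on every line.
    folded = [w.casefold() for w in words]
    for number, line in enumerate(lines, start=1):
        text = line.casefold()
        if any(w in text for w in folded):
            yield number, line.rstrip("\n")

# With a real file you would pass the file object instead, e.g.
# with open("input.txt", encoding="utf-8") as f:   # hypothetical file name
#     for number, line in lines_with_words(f, ["mandy"]): ...
sample = ["Hello MANDY\n", "nothing here\n", "mandy and Bob\n"]
for number, line in lines_with_words(sample, ["mandy", "alice"]):
    print(number, line)
```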
import re
if re.search('(?i)Mandy Pande:', line):
    ...
In [14]: re.match("mandy", "MaNdY", re.IGNORECASE)
Out[14]: <_sre.SRE_Match object at 0x23a08b8>
If you are working with a pandas Series, you can pass case=False to str.contains:
data['Column_name'].str.contains('abcd', case=False)
Or, if you are just comparing two strings, try the method below.
You can use the casefold() method. Comparing the casefolded versions of two strings ignores differences in case.
firstString = "Hi EVERYONE"
secondString = "Hi everyone"
if firstString.casefold() == secondString.casefold():
    print('The strings are equal.')
else:
    print('The strings are not equal.')
Output:
The strings are equal.
I am looking for a function that combines the methods isalpha() and isspace() into a single method.
I want to check if a given string only contains letters and/or spaces, for example:
"This is text".isalpha_or_space()
# True
However, with the 2 methods, I get:
"This is text".isalpha() or "This is text".isspace()
# False
since the string is neither purely alphabetic nor purely whitespace.
Of course, I could iterate over every character and check it for space or alpha.
I could also compare the string with ("abcdefghijklmnopqrstuvwxyz" + " ")
However, both of these approaches don't seem very pythonic to me - convince me otherwise.
The most Pythonic approach is to use a def for this:
def isalpha_or_space(self):
    if self == "":
        return False
    for char in self:
        if not (char.isalpha() or char.isspace()):
            return False
    return True
It is not easy to contribute this as a method on str, since Python does not encourage the monkeypatching of built-in types. My recommendation is just to leave this as a module level function.
Nonetheless, it is still possible to mimic the interface of a method, since most namespaces in Python are writable if you know where to find them. The suggestion below is not Pythonic, and relies on implementation detail.
>>> import gc
>>> def monkeypatch(type_, func):
...     gc.get_referents(type_.__dict__)[0][func.__name__] = func
...
>>> monkeypatch(str, isalpha_or_space)
>>> "hello world".isalpha_or_space()
True
Use a regular expression (regex):
>>> import re
>>> result = re.match(r'[a-zA-Z\s]+$', "This is text")
>>> bool(result)
True
Breakdown:
re - Python's regex module
[a-zA-Z\s] - Any letter or whitespace
+ - One or more of the previous item
$ - End of string
The above works with ASCII letters. For the full Unicode range on Python 3, unfortunately the regex is a bit complicated:
>>> result = re.match(r'([^\W\d_]|\s)+$', 'un café')
Breakdown:
(x|y) - x or y
[^\W\d_] - Any word character except a number or an underscore
From Mark Tolonen's answer on How to match all unicode alphabetic characters and spaces in a regex?
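To compare the two patterns side by side, here is a small sketch (the sample strings are illustrative). In Python 3, str patterns are Unicode-aware by default, so [^\W\d_] accepts letters such as 'é', which the ASCII class [a-zA-Z] rejects. Both patterns use raw strings so the backslashes reach the regex engine unchanged.

```python
import re

ASCII_ONLY = re.compile(r"[a-zA-Z\s]+$")
# Any word character except digits and underscore, i.e. letters, or whitespace
UNICODE_LETTERS = re.compile(r"([^\W\d_]|\s)+$")

for text in ["This is text", "un café", "abc123"]:
    print(repr(text), bool(ASCII_ONLY.match(text)), bool(UNICODE_LETTERS.match(text)))
```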
You can use the following solution:
s != '' and all(c.isalpha() or c.isspace() for c in s)
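Here is the one-liner wrapped in a function, with a few edge cases worth checking. The s != '' test mirrors str.isalpha(), which returns False for an empty string. Note that a whitespace-only string passes, and that isalpha() and isspace() are Unicode-aware, so accented letters, tabs, and newlines are accepted too.

```python
def isalpha_or_space(s):
    """True if s is non-empty and every character is a letter or whitespace."""
    return s != '' and all(c.isalpha() or c.isspace() for c in s)

print(isalpha_or_space("This is text"))  # True
print(isalpha_or_space("un café"))       # True: é is alphabetic
print(isalpha_or_space("tab\there"))     # True: \t is whitespace
print(isalpha_or_space("   "))           # True: whitespace only still passes
print(isalpha_or_space("abc 123"))       # False
print(isalpha_or_space(""))              # False
```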
How to check if a string has at least one of certain character?
If the string is cool = "Sam!", how do I check whether that string contains at least one '!'?
Use the in operator
>>> cool = "Sam!"
>>> '!' in cool
True
>>> '?' in cool
False
As you can see, '!' in cool returns a boolean, which you can use elsewhere in your code.
In Python, a string is a sequence (like an array); therefore, the in operator can be used to check for the existence of a character in a Python string. The in operator is used to assert membership in a sequence such as strings, lists, and tuples.
cool = "Sam!"
'!' in cool # True
Alternatively, you can use any of the following to get more information:
cool.find("!") # Returns the index of "!", which is 3
cool.index("!") # Same as find(), but raises ValueError if not found
cool.count("!") # Returns the number of occurrences of "!" in cool, which is 1
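The main difference between find() and index() shows up when the character is missing, so here is a short sketch of that case: find() returns -1, while index() raises ValueError, which you can catch.

```python
cool = "Sam!"

print(cool.find("!"))   # 3
print(cool.find("?"))   # -1: find() signals "not found" with -1
print(cool.count("!"))  # 1

try:
    cool.index("?")
except ValueError as exc:
    # index() raises instead of returning -1
    print("ValueError:", exc)
```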
More info that you may find helpful:
http://www.tutorialspoint.com/python/python_strings.htm
Use the following
cool="Sam!"
if "!" in cool:
    pass  # your code
Or just:
It_Is = "!" in cool
# some code
if It_Is:
    DoSmth()
else:
    DoNotDoSmth()
I need to write a function that returns the location of the first occurrence of a substring in a string in Python, without using string.find(sub).
def sublocation(string,sub):
    i=0
    while i<len(string):
        o=0
        while o<len(sub):
            u=i
            if string[i] == sub[o]:
                o=o+1
                u=u+1
                result=True
                i=len(string)
            else:
                i=i+1-u
                result=False
You can use find().
Ex:
>>> s = "this is the string for finding sub string"
>>> s.find('string')
12
I tried this:
myString = "Hellow World!"
index = str.find(myString, "World")
print(index)
And the output was :
>>>
7
>>>
Is that what you want?
You can use the built-in str.find method, e.g. "Hello world".find("world") returns 6. The index method does the same, but it raises a ValueError when the string does not contain the substring you are searching for.
I think it might help you to learn how to loop in Python. If the substrings are separated by spaces, you can use .split() to turn the string into a list of words, which you can iterate over directly; enumerate() even gives you the index.
def sub_search_location(inp_str,sub_string):
    for pos, sub in enumerate(inp_str.split(' ')):
        if sub_string == sub:
            return pos
    else:
        return False
>>> sub_search_location("this is the string for finding sub string", "string")
3
However, this only works for substrings without spaces:
>>> sub_search_location("this is the string for finding sub string", "sub string")
False
Of course you can use str.find as others have recommended, but looking at your code I think you have more important things to learn than just a call to an arbitrary method.
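For the original requirement (a character index, without calling find()), here is a minimal sketch of a naive search loop. It keeps the asker's function name and assumes you want the same convention as str.find: the index of the first occurrence, or -1 if there is none. It tries every start position where sub could still fit and compares character by character.

```python
def sublocation(string, sub):
    """Return the index of the first occurrence of sub in string, or -1."""
    # Only start positions where sub can still fit are worth trying.
    for i in range(len(string) - len(sub) + 1):
        j = 0
        # Advance while the characters keep matching.
        while j < len(sub) and string[i + j] == sub[j]:
            j += 1
        if j == len(sub):  # every character of sub matched at position i
            return i
    return -1

print(sublocation("this is the string for finding sub string", "string"))  # 12
print(sublocation("Hellow World!", "World"))                                # 7
print(sublocation("abc", "x"))                                              # -1
```

The loop is O(len(string) * len(sub)) in the worst case, which is fine for learning purposes; str.find uses a faster algorithm internally.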
Try using the .find() method. It returns the index of the first occurrence of the substring you're looking for, or -1 if the substring is not found.
>>> import re
>>> s = 'this is a test'
>>> reg1 = re.compile('test$')
>>> match1 = reg1.match(s)
>>> print(match1)
None
In Kiki, this pattern matches the "test" at the end of s. What am I missing? (I tried re.compile(r'test$') as well.)
Use
match1 = reg1.search(s)
instead. The match method only matches at the start of the string. From the documentation:
Python offers two different primitive operations based on regular expressions: re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string (this is what Perl does by default).
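A short sketch of the three anchoring behaviours on the string from the question: match() only tries position 0, search() scans the whole string, and re.fullmatch() (Python 3.4+) requires the entire string to match.

```python
import re

s = "this is a test"
pattern = re.compile(r"test$")

print(pattern.match(s))   # None: match() only tries position 0
print(pattern.search(s))  # a match object for the "test" at the end
print(bool(re.fullmatch(r"this is a test", s)))  # True: whole string matches
```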
Your regex does not match the full string. You can use search instead as Useless mentioned, or you can change your regex to match the full string:
'^this is a test$'
Or somewhat harder to read but somewhat less useless:
'^t[^t]*test$'
It depends on what you're trying to do.
This is because match returns None when it cannot find the expected pattern at the start of the string; when it does find it, it returns a match object (of type _sre.SRE_Match in older Python versions, re.Match in newer ones).
So, if you want a Boolean (True or False) result from match, check whether the result is None.
You could check whether a text matches like this:
import re
string_to_evaluate = "Your text that needs to be examined"
expected_pattern = "pattern"
if re.match(expected_pattern, string_to_evaluate) is not None:
    print("The text is as you expected!")
else:
    print("The text is not as you expected!")
How do I write a function that filters out all the non-letters from a string? For example, letters("jajk24me") should return "jajkme". (It needs to use a for loop.) Will the string.isalpha() method help me with this?
My attempt:
def letters(input):
    valids = []
    for character in input:
        if character in letters:
            valids.append(character)
    return (valids)
If it needs to be in that for loop, and a regular expression won't do, then this small modification of your loop will work:
def letters(input):
    valids = []
    for character in input:
        if character.isalpha():
            valids.append(character)
    return ''.join(valids)
(The ''.join(valids) at the end takes all of the characters that you have collected in a list and joins them together into a string. Your original function returned that list of characters instead.)
You can also filter out characters from a string:
def letters(input):
    return ''.join(filter(str.isalpha, input))
or with a list comprehension:
def letters(input):
    return ''.join([c for c in input if c.isalpha()])
or you could use a regular expression, as others have suggested.
import re
valids = re.sub(r"[^A-Za-z]+", '', my_string)
EDIT: If it needs to be a for loop, something like this should work:
output = ''
for character in my_string:
    if character.isalpha():
        output += character
See re.sub; for performance, consider using re.compile so the pattern is compiled only once.
Below is a short version that matches every run of characters outside the range A to Z and replaces it with the empty string. The re.I flag ignores case, so lowercase letters (a-z) are kept as well.
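A minimal sketch of the precompiled version, reusing the ASCII-only character class from the answer above. The constant name NON_LETTERS is illustrative. Note that the re module also caches recently used patterns internally, so the speedup from compiling yourself may be modest; and unlike isalpha(), this class removes non-ASCII letters such as 'é'.

```python
import re

# Compiled once at import time; every call reuses the same pattern object.
NON_LETTERS = re.compile(r"[^A-Za-z]+")

def letters(text):
    """Remove every run of characters that are not ASCII letters."""
    return NON_LETTERS.sub("", text)

print(letters("jajk24me"))  # jajkme
```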
import re
def charFilter(myString):
    return re.sub('[^A-Z]+', '', myString, flags=re.I)
If you really need that loop, there are many answers explaining that specifically. However, you might want to give a reason why you need a loop.
If you want to operate on the number sequences, and that's the reason for the loop, consider replacing the replacement string argument with a function, like this:
import re
def numberPrinter(match):
    # re.sub calls this once per removed run; print the run, then delete it
    print(match.group(0))
    return ''
def charFilter(myString):
    return re.sub('[^A-Z]+', numberPrinter, myString, flags=re.I)
The method string.isalpha() checks whether the string consists of alphabetic characters only. You can use it to check whether any modification is needed.
As to the other part of the question, pst is right. You can read about regular expressions in the Python docs: http://docs.python.org/library/re.html
They might seem daunting but are really useful once you get the hang of them.
Of course you can use isalpha. Also, valids can be a string.
Here you go:
def letters(input):
    valids = ""
    for character in input:
        if character.isalpha():
            valids += character
    return valids
This doesn't use a for loop, but that has already been thoroughly covered.
Might be a little late, and I'm not sure about performance, but I just thought of this solution which seems pretty nifty:
set(x).intersection(y)
You could use it like:
from string import ascii_letters
def letters(string):
    return ''.join(set(string).intersection(ascii_letters))
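NOTE:
This will not preserve the original order, and because it goes through a set, each letter appears at most once (repeated letters are dropped). In my use case that is fine, but be warned.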
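If you need the original order and repeated letters, a small variation keeps the set but only uses it for membership tests, walking the input in order. This is a sketch; the constant name ALLOWED is illustrative, and it is still limited to ASCII letters.

```python
from string import ascii_letters

# The set is used only for fast membership checks, not to build the result.
ALLOWED = frozenset(ascii_letters)

def letters(text):
    """Keep ASCII letters in their original order, including repeats."""
    return "".join(c for c in text if c in ALLOWED)

print(letters("jajk24me"))  # jajkme
```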