I'm having trouble figuring out how to check whether a string contains only letters and spaces. I have a feeling I should be testing every character with "for character in string", but I can't really figure out how that would work.
This is what I have now, but I know it doesn't work as intended because it only accepts letters, and I also need to allow spaces. For example, " MY dear aunt sally" should say yes, it contains only letters and spaces.
#Find if string only contains letters and spaces
if string.isalpha():
    print("Only alphabetic letters and spaces: yes")
else:
    print("Only alphabetic letters and spaces: no")
You can use a generator expression inside the built-in all() function:
if all(i.isalpha() or i.isspace() for i in my_string):
But note that i.isspace() checks whether the character is any whitespace character (tab, newline, etc.). If you only want to allow a plain space, you can compare with ' ' directly:
if all(i.isalpha() or i==' ' for i in my_string):
Demo:
>>> all(i.isalpha() or i==' ' for i in 'test string')
True
>>> all(i.isalpha() or i==' ' for i in 'test\tstring') #delimiter is tab
False
>>> all(i.isalpha() or i==' ' for i in 'test#string')
False
>>> all(i.isalpha() or i.isspace() for i in 'test string')
True
>>> all(i.isalpha() or i.isspace() for i in 'test\tstring') #delimiter is tab
True
>>> all(i.isalpha() or i.isspace() for i in 'test#string')
False
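If you need this check in several places, the all() expression can be wrapped in a small function. This is a minimal sketch; the name letters_and_spaces_only is hypothetical. One thing to be aware of: all() returns True for an empty iterable, so an empty string passes the bare expression. The sketch rejects it explicitly, which assumes an empty string should not count as "letters and spaces".

```python
def letters_and_spaces_only(text):
    """Return True if text is non-empty and every character is a letter or a plain space."""
    # all() over an empty string is True, so reject '' explicitly
    # (assumption: an empty string should not pass)
    if not text:
        return False
    return all(ch.isalpha() or ch == ' ' for ch in text)


print(letters_and_spaces_only(' MY dear aunt sally'))  # True
print(letters_and_spaces_only('test\tstring'))         # False: a tab is not a plain space
print(letters_and_spaces_only('test#string'))          # False
print(letters_and_spaces_only(''))                     # False
```

Swap ch == ' ' for ch.isspace() if tabs and newlines should be allowed too.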
Just another way, for fun; I know it's not that good:
>>> import re
>>> a = 'hello baby'
>>> b = 'hello1 baby'
>>> re.findall("[a-zA-Z ]",a)==list(a) # returns True if the string is only letters and spaces
True
>>> re.findall("[a-zA-Z ]",b)==list(b) # returns False
False
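One practical difference between this and the all()/isalpha() approach: str.isalpha() accepts any Unicode letter, while the class [a-zA-Z ] only covers ASCII letters. A small comparison on a made-up accented string:

```python
import re

text = 'caf\u00e9 au lait'  # 'café au lait' (sample input)

# str.isalpha() accepts any Unicode letter, including 'é'
isalpha_result = all(ch.isalpha() or ch == ' ' for ch in text)

# the ASCII-only character class skips 'é', so the lists differ
regex_result = re.findall('[a-zA-Z ]', text) == list(text)

print(isalpha_result)  # True
print(regex_result)    # False
```

Which behaviour is right depends on whether non-ASCII letters should be accepted.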
Chain replace with isalpha:
'a b'.replace(' ', '').isalpha() # True
replace returns a copy of the original string with the spaces removed. Because that return value is itself a string, you can call isalpha on it to test whether it contains only alphabetic characters.
To match all whitespace, you're probably going to want to use Kasra's answer, but just for completeness, I'll demonstrate using re.sub with a whitespace character class:
import re
re.sub(r'\s', '', 'a b').isalpha()
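A caveat for both replace-based checks: once the spaces are removed, a string made only of spaces becomes empty, and ''.isalpha() is False, so '   ' is rejected, whereas the all() generator would accept it. The two variants also differ on tabs. A quick comparison (the test strings are illustrative):

```python
import re

results = {}
for text in ['a b', '   ', '', 'a\tb']:
    replaced = text.replace(' ', '').isalpha()   # strips plain spaces only
    subbed = re.sub(r'\s', '', text).isalpha()   # strips all whitespace
    results[text] = (replaced, subbed)
    print(repr(text), replaced, subbed)
# 'a b' True True
# '   ' False False
# '' False False
# 'a\tb' False True
```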
The re.match call below returns a match object only if the input contains nothing but letters and spaces; in the second example, the lookahead (?=.*? ) additionally requires at least one space.
>>> re.match(r'[A-Za-z ]+$', 'test string')
<_sre.SRE_Match object; span=(0, 11), match='test string'>
>>> re.match(r'(?=.*? )[A-Za-z ]+$', 'test#bar')
>>>
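A subtlety with re.match and a trailing $: in Python's re module, $ also matches just before a newline at the end of the string, so 'test string\n' is still accepted. re.fullmatch (available since Python 3.4) only succeeds if the pattern consumes the entire string. A sketch:

```python
import re

pattern = r'[A-Za-z ]+'

# '$' may match right before a final '\n', so this still succeeds
with_dollar = re.match(pattern + '$', 'test string\n')

# fullmatch() must consume the whole string, newline included
with_fullmatch = re.fullmatch(pattern, 'test string\n')

print(bool(with_dollar))                            # True
print(bool(with_fullmatch))                         # False
print(bool(re.fullmatch(pattern, 'test string')))   # True
```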
Related
Is there any library that lets me check whether all the individual characters of one string exist in another string? When I try to use in, the characters have to appear together as a substring, so it only works for cases like 1234 and 123. I want something that checks individual characters instead. For the following code, I want the output: string2 is in string1.
string1 = '1234'
string2 = '24'
if string2 in string1:
    print('string2 is in string1')
else:
    print('string2 is not in string1')
You can use all() with a generator expression. It returns True only if every condition is true, and False otherwise:
string1 = '1234'
string2 = '24'
if all(x in string1 for x in string2):
    print('string2 is in string1')
else:
    print('string2 is not in string1')
Or, you can use set's issubset:
set(string2).issubset(string1)
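Both of these ignore how often a character occurs: '44' counts as contained in '1234' because '4' appears at least once. If repeated characters must also be present as many times (an assumption beyond what the question asks), collections.Counter can handle it. The helper name chars_available is hypothetical:

```python
from collections import Counter


def chars_available(needed, pool):
    """True if every character of needed occurs in pool at least as often."""
    # Counter subtraction keeps only positive counts, i.e. what pool is missing
    missing = Counter(needed) - Counter(pool)
    return not missing


print(chars_available('24', '1234'))   # True
print(chars_available('44', '1234'))   # False: '1234' has only one '4'
print(set('44').issubset('1234'))      # True: a set ignores repetition
```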
How can I check whether a string in Python contains any text or just whitespace?
Example:
" " should return False
"test" should return True
"tes t " should return True
teststring = " "
print(teststring.isspace())
# True
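Note that isspace() answers the opposite question: it returns True when the string is whitespace only, so you need to negate it to get the behaviour from the question. It also returns False for the empty string, so not ''.isspace() is True. A sketch that additionally treats '' as having no text (an assumption, since the question doesn't cover it); has_text is a hypothetical name:

```python
def has_text(s):
    # ''.isspace() is False, so check for emptiness first
    return bool(s) and not s.isspace()


for s in [" ", "test", "tes t ", ""]:
    print(repr(s), has_text(s))
# ' ' False
# 'test' True
# 'tes t ' True
# '' False
```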
The str.strip() method removes any leading and trailing whitespace from your string.
Then you can easily check what you want by checking the length of the new string.
>>> my_str_with_space = ' \r\n string \r\n '
>>> my_str_with_space
' \r\n string \r\n '
>>> my_str = my_str_with_space.strip()
>>> my_str
'string'
So you can create a simple function that checks whether the string is empty by checking its length.
>>> def str_not_empty(s):
...     return bool(len(s))
Then use it.
>>> str_not_empty(my_str)
True
>>> str_empty = ''
>>> str_not_empty(str_empty)
False
(The function is optional, but it was useful for the example)
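Since an empty string is already falsy, the check can also be done inline without the helper. A minimal sketch applied to the three examples from the question:

```python
examples = [" ", "test", "tes t "]

# strip() removes leading/trailing whitespace; an empty result is falsy
results = [bool(s.strip()) for s in examples]
print(results)  # [False, True, True]
```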
I tried matching words containing the letters "ab" or "ba", e.g. "ab"olition, f"ab"rics, pro"ba"ble. I came up with the following regular expression:
r"[Aa](?=[Bb])[Bb]|[Bb](?=[Aa])[Aa]"
But it also matches words that start or end with ", (, ), / and other non-alphanumeric characters. How can I get rid of those? I just want to match a list of words.
import sys
import re
word=[]
dict={}
f = open('C:/Python27/brown_half.txt', 'rU')
w = open('C:/Python27/brown_halfout.txt', 'w')
data = f.read()
word = data.split() # word is list
f.close()
for num2 in word:
    match2 = re.findall("\w*(ab|ba)\w*", num2)
    if match2:
        dict[num2] = (dict[num2] + 1) if num2 in dict.keys() else 1
for key2 in sorted(dict.iterkeys()):print "%s: %s" % (key2, dict[key2])
print len(dict.keys())
Here, I don't know how to combine this with the re.compile method that the first comment suggested...
To match all the words with ab or ba (case insensitive):
import re
text = 'fabh, obar! (Abtt) yybA, kk'
pattern = re.compile(r"(\w*(ab|ba)\w*)", re.IGNORECASE)
# to print all the matches
for match in pattern.finditer(text):
    print match.group(0)
# to print the first match
print pattern.search(text).group(0)
https://regex101.com/r/uH3xM9/1
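To connect this to the counting in the question, here is a Python 3 sketch that uses a compiled, case-insensitive pattern with findall() and collections.Counter instead of the manual dict. It runs on the sample text from this answer rather than brown_half.txt, and lowercasing the matches before counting is an assumption. The group is non-capturing, (?:...), so that findall() returns whole words rather than just 'ab'/'ba':

```python
import re
from collections import Counter

# Non-capturing group: findall() returns the whole match, not the group
pattern = re.compile(r"\w*(?:ab|ba)\w*", re.IGNORECASE)

text = 'fabh, obar! (Abtt) yybA, kk'
counts = Counter(m.lower() for m in pattern.findall(text))

for word in sorted(counts):
    print("%s: %s" % (word, counts[word]))
print(len(counts))
# abtt: 1
# fabh: 1
# obar: 1
# yyba: 1
# 4
```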
Regular expressions are not the best tool for the job in this case; they complicate things far too much for such a simple task. You can instead use Python's built-in in operator (this works in both Python 2 and 3):
sentence = "There are no probable situations whereby that may happen, or so it seems since the Abolition."
words = [''.join(filter(lambda x: x.isalpha(), token)) for token in sentence.split()]
for word in words:
    word = word.lower()
    if 'ab' in word or 'ba' in word:
        print('Word "{}" matches pattern!'.format(word))
As you can see, 'ab' in word evaluates to True if the string 'ab' is found as-is (that is, exactly) in word, and False otherwise. For example, 'ba' in 'probable' == True and 'ab' in 'Abolition' == False. The second line takes care of splitting the sentence into words and removing any punctuation characters. word = word.lower() makes word lowercase before the comparisons, so that for word = 'Abolition', 'ab' in word == True.
I would do it this way:
Strip the unwanted characters from your string using either of the two techniques below:
a - By building a translation dictionary and using the translate method:
>>> import string
>>> del_punc = dict.fromkeys(ord(c) for c in string.punctuation)
>>> s = 'abolition, fabrics, probable, test, case, bank;, halfback 1(ablution).'
>>> s = s.translate(del_punc)
>>> print(s)
abolition fabrics probable test case bank halfback 1ablution
b - using re.sub method:
>>> import string
>>> import re
>>> s = 'abolition, fabrics, probable, test, case, bank;, halfback 1(ablution).'
>>> s = re.sub(r'[%s]'%string.punctuation, '', s)
>>> print(s)
abolition fabrics probable test case bank halfback 1ablution
Next, find the words containing 'ab' or 'ba':
a - Splitting on whitespace and checking each word for the substrings, which is the approach I recommend:
>>> [x for x in s.split() if 'ab' in x.lower() or 'ba' in x.lower()]
['abolition', 'fabrics', 'probable', 'bank', 'halfback', '1ablution']
b - Using the re.finditer method:
>>> pat = re.compile(r'\b.*?(ab|ba).*?\b', re.IGNORECASE)
>>> for m in pat.finditer(s):
...     print(m.group())
...
abolition
 fabrics
 probable
 test case bank
 halfback
 1ablution
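The merged 'test case bank' match happens because .*? can also consume spaces, so a single match can run across several words. Restricting the pattern to word characters with \w* keeps each match inside one word. A sketch on the same cleaned string s:

```python
import re

s = 'abolition fabrics probable test case bank halfback 1ablution'

# \w* cannot cross a space, so every match stays within a single word
pat = re.compile(r'\b\w*(?:ab|ba)\w*\b', re.IGNORECASE)
matches = [m.group() for m in pat.finditer(s)]
print(matches)
# ['abolition', 'fabrics', 'probable', 'bank', 'halfback', '1ablution']
```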
string = "your string here"
lowercase = string.lower()
if 'ab' in lowercase or 'ba' in lowercase:
    print(True)
else:
    print(False)
Try this one
[(),/]*([a-z]|(ba|ab))+[(),/]*
Basically, I want the script to check the last character of the raw_input to see whether it's a digit. If it is, print true; otherwise, print false.
Here is the code:
import re
word = raw_input("input your alphanumeric word:")
end = re.search(r'\d+$', word)
if end is not None:
    print "numeric digit should be last"
else:
    print "true"
Regex is overkill for a problem as simple as this. Simply index the last element with word[-1] and check whether it's a digit with the built-in string method str.isdigit:
word[-1].isdigit()
Note that the word may be an empty string, in which case word[-1] raises an IndexError, so you have to handle that case:
bool(word) and word[-1].isdigit()
or, as suggested by @iCodez, use slicing instead of indexing, since slicing does not raise an IndexError for an empty string:
word[-1:].isdigit()
Example
>>> word = raw_input("input your alphanumeric word:")
input your alphanumeric word:asdf
>>> bool(word) and word[-1].isdigit()
False
>>> word = raw_input("input your alphanumeric word:")
input your alphanumeric word:asd1
>>> bool(word) and word[-1].isdigit()
True
>>> word = raw_input("input your alphanumeric word:")
input your alphanumeric word:
>>> bool(word) and word[-1].isdigit()
False
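The slicing variant needs no separate emptiness check, because ''[-1:] is '' and ''.isdigit() is False. A short sketch without raw_input, using the same sample words as the example above:

```python
samples = ['asdf', 'asd1', '']

# word[-1:] is '' for an empty word, so no IndexError can occur
results = {word: word[-1:].isdigit() for word in samples}
print(results)  # {'asdf': False, 'asd1': True, '': False}
```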
You can look at the last character of the string by indexing with [-1], and use the string method isdigit to see whether it is a digit.
s1 = 'hello world'
s2 = 'hello again3'
>>> s1[-1].isdigit()
False
>>> s2[-1].isdigit()
True
print 'true' if re.search(r'\d$', word) else 'false'
You could also use:
print 'true' if word[-1].isdigit() else 'false'
...but this will throw an IndexError if the word is zero-length.
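One difference between the regex and isdigit() versions shows up with non-ASCII input: str.isdigit() also accepts characters such as superscript digits, which \d does not match. For str patterns without re.ASCII, \d matches the Unicode decimal digits (category Nd), which is what str.isdecimal() checks. The sample word is made up:

```python
import re

word = 'abc\u00b2'  # 'abc²': ends in U+00B2 SUPERSCRIPT TWO (sample input)

isdigit_result = word[-1:].isdigit()          # '²' counts as a digit character
isdecimal_result = word[-1:].isdecimal()      # '²' is not a decimal digit (Nd)
regex_result = bool(re.search(r'\d$', word))  # \d only matches decimal digits
print(isdigit_result, isdecimal_result, regex_result)  # True False False
```

For plain ASCII input, all three checks agree.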
In Python, I have a lot of strings containing spaces.
I would like to remove all spaces from the text, except those inside quotation marks.
Example input:
This is "an example text" containing spaces.
And I want to get:
Thisis"an example text"containingspaces.
I don't think line.split() is suitable, because it removes all of the spaces from the text.
What do you recommend?
For the simple case where only " is used as the quote character:
>>> import re
>>> s = 'This is "an example text" containing spaces.'
>>> re.sub(r' (?=(?:[^"]*"[^"]*")*[^"]*$)', "", s)
'Thisis"an example text"containingspaces.'
Explanation:
[ ]        # Match a space
(?=        # only if an even number of quotes follows --> lookahead
 (?:       # This is true when the following can be matched:
  [^"]*"   # Any number of non-quote characters, then a quote, then
  [^"]*"   # the same thing again to get an even number of quotes.
 )*        # Repeat zero or more times.
 [^"]*     # Match any remaining non-quote characters
 $         # and then the end of the string.
)          # End of lookahead.
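The commented breakdown above can be used almost verbatim with the re.VERBOSE flag, which ignores unescaped whitespace and # comments in the pattern; that is why the space is written as [ ] (whitespace inside a character class is kept). A sketch; the name SPACE_OUTSIDE_QUOTES is just for illustration:

```python
import re

SPACE_OUTSIDE_QUOTES = re.compile(r'''
    [ ]            # match a space
    (?=            # only if an even number of quotes follows (lookahead):
      (?:          # a pair of quotes, made of
        [^"]*"     # any non-quote characters, then a quote,
        [^"]*"     # and the same thing again,
      )*           # repeated zero or more times,
      [^"]*        # then any remaining non-quote characters
      $            # up to the end of the string
    )              # end of lookahead
''', re.VERBOSE)

s = 'This is "an example text" containing spaces.'
result = SPACE_OUTSIDE_QUOTES.sub('', s)
print(result)  # Thisis"an example text"containingspaces.
```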
There is probably a more elegant solution than this, but:
>>> test = "This is \"an example text\" containing spaces."
>>> '"'.join([x if i % 2 else "".join(x.split())
...           for i, x in enumerate(test.split('"'))])
'Thisis"an example text"containingspaces.'
We split the text on quotes, then iterate through the pieces in a list comprehension. Pieces at even indices are outside the quotes, so we remove their spaces by splitting and rejoining; pieces at odd indices are inside the quotes and are kept unchanged. We then rejoin the whole thing with quotes.
Using re.findall is probably the more easily understood/flexible method:
>>> s = 'This is "an example text" containing spaces.'
>>> ''.join(re.findall(r'(?:".*?")|(?:\S+)', s))
'Thisis"an example text"containingspaces.'
You could (ab)use the csv.reader:
>>> import csv
>>> ''.join(next(csv.reader([s.replace('"', '"""')], delimiter=' ')))
'Thisis"an example text"containingspaces.'
Or using re.split:
>>> ''.join(filter(None, re.split(r'(?:\s*(".*?")\s*)|[ ]', s)))
'Thisis"an example text"containingspaces.'
Use regular expressions!
import cStringIO, re
result = cStringIO.StringIO()
regex = re.compile('("[^"]*")')
text = 'This is "an example text" containing spaces.'
for part in regex.split(text):
    if part and part[0] == '"':
        result.write(part)
    else:
        result.write(part.replace(" ", ""))
print result.getvalue()
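cStringIO only exists in Python 2. A Python 3 sketch of the same split-based idea, collecting the pieces in a list and joining them at the end (io.StringIO would also work); the function name is hypothetical:

```python
import re

# The capturing group makes split() keep the quoted parts in the result
QUOTED = re.compile(r'("[^"]*")')


def strip_spaces_outside_quotes(text):
    parts = []
    for part in QUOTED.split(text):
        if part.startswith('"'):
            parts.append(part)                   # quoted: keep unchanged
        else:
            parts.append(part.replace(' ', ''))  # unquoted: drop spaces
    return ''.join(parts)


print(strip_spaces_outside_quotes('This is "an example text" containing spaces.'))
# Thisis"an example text"containingspaces.
```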
You can do this with csv as well:
import csv
out=[]
for e in csv.reader('This is "an example text" containing spaces. '):
    e=''.join(e)
    if e==' ': continue
    if ' ' in e: out.extend('"'+e+'"')
    else: out.extend(e)
print ''.join(out)
Prints Thisis"an example text"containingspaces.
'"'.join(v if i%2 else v.replace(' ', '') for i, v in enumerate(line.split('"')))
quotation_mark = '"'
space = " "
example = 'foo choo boo "blaee blahhh" didneid ei did '
formated_example = ''
inside_quotes = False
for character in example:
    if inside_quotes:
        # inside quotes: keep every character, spaces included
        formated_example += character
    elif character != space:
        # outside quotes: drop spaces
        formated_example += character
    if character == quotation_mark:
        inside_quotes = not inside_quotes
print(formated_example)