Check special symbols in string endings - python

How can I check whether a word ends with special symbols such as !?,(). ? For example, Hello???, Hello,, and Hello! should return True, but H!??llo and Hel,lo should return False.
I know how to check only the last character of a string, but how do I check whether two or more trailing characters are symbols?

You may have to use regex for this.
import re
def checkword(word):
    m = re.match(r"\w+[!?,().]+$", word)
    if m is not None:
        return True
    return False
That regex is:
\w+        # one or more word characters (letters, digits, and underscore)
[!?,().]+  # one or more of the characters inside the brackets
           # (this is called a character class)
$          # assert end of string
Using re.match forces the match to begin at the beginning of the string, or else we'd have to use ^ before the regular expression.
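As a quick check, the pattern can be run against the examples from the question. This is only a sketch: the helper name ends_with_symbols is made up for illustration, and it uses re.fullmatch (which anchors both ends of the string) in place of re.match plus $.

```python
import re

# Hypothetical helper: word characters followed by one or more trailing symbols.
PATTERN = re.compile(r"\w+[!?,().]+")

def ends_with_symbols(word):
    # fullmatch only succeeds if the pattern covers the entire string
    return PATTERN.fullmatch(word) is not None

for w in ["Hello???", "Hello,,", "Hello!", "H!??llo", "Hel,lo", "Hello"]:
    print(w, ends_with_symbols(w))
```

One difference worth knowing: $ also matches just before a trailing newline, so "Hello!\n" passes with re.match and $ but not with re.fullmatch.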

You can try something like this:
word = "Hello!"
def checkSym(word):
    return word[-1] in "!?,()."
print(checkSym(word))
The result is:
True
Try giving different strings as input and check the results.
To count every symbol at the end of the string, you can use:
def symbolCount(word):
    i = len(word) - 1
    c = 0
    # stop at index 0 so an empty or all-symbol string does not raise IndexError
    while i >= 0 and word[i] in "!?,().":
        c = c + 1
        i = i - 1
    return c
Testing it with word = "Hello!?.":
print(symbolCount(word))
The result is:
3

If you want to get a count of the 'special' characters at the end of a given string, you can walk the string in reverse and stop at the first character that is not special:
special = '!?,().'
s = 'Hello???'
count = 0
for c in s[::-1]:
    if c in special:
        count += 1
    else:
        break
print("Found {} special characters at the end of the string.".format(count))
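The same count can also be obtained without an explicit loop. The sketch below uses str.rstrip, which removes every trailing character found in its argument. The names SPECIAL and count_trailing_special are assumptions chosen for this example.

```python
SPECIAL = "!?,()."

def count_trailing_special(s):
    # rstrip(SPECIAL) strips every trailing character that appears in SPECIAL,
    # so the length difference is the number of trailing symbols
    return len(s) - len(s.rstrip(SPECIAL))

print(count_trailing_special("Hello???"))  # 3
print(count_trailing_special("H!llo"))     # 0
```

Because rstrip never indexes into the string, empty strings and strings made only of symbols need no special handling.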

You can use re.findall:
import re
s = "Hello???"
# note: \W matches any non-word character, not only !?,().
if re.findall(r'\W+$', s):
    pass

You could try this.
string="gffrwr."
print(string[-1] in "!?,().")

Related

python question about loops comparing string to string

I am trying to emulate a situation where I send information and only get True or False in return. I can check each character, and if the answer is True, that character is in the string. I know the string has positions 0 to some number x. I would keep receiving True results until eventually I only receive False, and then I would know the string has been solved. In my situation I would not know the target string.
I am trying to iterate through all the characters and see whether each one matches a character of the string. If it does, I add the character to a list until the list contains all the characters of the string. But for some reason, this isn't working.
import string
hi = list()
swoll = "dkjfksjdfksjdkfjksdjfsjkdfjsjreuvnslei"
characters = string.ascii_lowercase + string.ascii_uppercase + string.digits
for ch in characters:
    print(''.join(hi) + ch)
    for i in swoll:
        if i == ch:
            hi.append(ch)
            print(''.join(hi))
            break
        else:
            continue
results:
a
b
c
d
d
de
de
def
def
defg
defh
defi
defi
defij
defij
defijk
defijk
defijkl
defijkl
defijklm
defijkln
defijkln
defijklno
defijklnp
defijklnq
defijklnr
defijklnr
defijklnrs
defijklnrs
defijklnrst
defijklnrsu
defijklnrsu
defijklnrsuv
defijklnrsuv
defijklnrsuvw
defijklnrsuvx
defijklnrsuvy
defijklnrsuvz
defijklnrsuvA
defijklnrsuvB
defijklnrsuvC
defijklnrsuvD
defijklnrsuvE
defijklnrsuvF
defijklnrsuvG
defijklnrsuvH
defijklnrsuvI
defijklnrsuvJ
defijklnrsuvK
defijklnrsuvL
defijklnrsuvM
As you can see, the result does not match the string.
When I tried the code above, I expected the output to reproduce the target string.
Based on my understanding of the question, I've implemented a function which I believe emulates the interface you are talking to:
spos = 0
def in_swoll(ch):
    global spos
    if spos == len(swoll) or ch != swoll[spos]:
        return False
    spos += 1
    return True
This will return True and increment the counter into swoll when a character matches, otherwise it will return False.
You can then use this function in a loop which iterates until False is returned for all characters in characters. Inside the loop characters is iterated until a match is found, at which point it is added to hi:
hi = []
while True:
    for ch in characters:
        if in_swoll(ch):
            hi.append(ch)
            print(''.join(hi))
            break
    else:
        # no matches, we're done
        break
Output for your sample data:
d
dk
dkj
dkjf
dkjfk
dkjfks
dkjfksj
dkjfksjd
dkjfksjdf
dkjfksjdfk
dkjfksjdfks
dkjfksjdfksj
dkjfksjdfksjd
dkjfksjdfksjdk
dkjfksjdfksjdkf
dkjfksjdfksjdkfj
dkjfksjdfksjdkfjk
dkjfksjdfksjdkfjks
dkjfksjdfksjdkfjksd
dkjfksjdfksjdkfjksdj
dkjfksjdfksjdkfjksdjf
dkjfksjdfksjdkfjksdjfs
dkjfksjdfksjdkfjksdjfsj
dkjfksjdfksjdkfjksdjfsjk
dkjfksjdfksjdkfjksdjfsjkd
dkjfksjdfksjdkfjksdjfsjkdf
dkjfksjdfksjdkfjksdjfsjkdfj
dkjfksjdfksjdkfjksdjfsjkdfjs
dkjfksjdfksjdkfjksdjfsjkdfjsj
dkjfksjdfksjdkfjksdjfsjkdfjsjr
dkjfksjdfksjdkfjksdjfsjkdfjsjre
dkjfksjdfksjdkfjksdjfsjkdfjsjreu
dkjfksjdfksjdkfjksdjfsjkdfjsjreuv
dkjfksjdfksjdkfjksdjfsjkdfjsjreuvn
dkjfksjdfksjdkfjksdjfsjkdfjsjreuvns
dkjfksjdfksjdkfjksdjfsjkdfjsjreuvnsl
dkjfksjdfksjdkfjksdjfsjkdfjsjreuvnsle
dkjfksjdfksjdkfjksdjfsjkdfjsjreuvnslei
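For reference, the same idea can be written without a global counter by wrapping the position in a closure. This is a sketch under the same assumptions as the answer above (the oracle confirms one character at a time, in order). The names make_oracle and recover are hypothetical.

```python
import string

def make_oracle(secret):
    """Return a function that confirms the next character of `secret`, one at a time."""
    pos = 0

    def oracle(ch):
        nonlocal pos
        if pos < len(secret) and secret[pos] == ch:
            pos += 1
            return True
        return False

    return oracle

def recover(oracle, alphabet):
    """Rebuild the hidden string by trying every character at each position."""
    found = []
    while True:
        for ch in alphabet:
            if oracle(ch):
                found.append(ch)
                break
        else:
            # no character was accepted, so the string is complete
            return "".join(found)

alphabet = string.ascii_lowercase + string.ascii_uppercase + string.digits
print(recover(make_oracle("dkjfksjd"), alphabet))
```

If the hidden string contains a character outside the alphabet, recover stops there and returns only the prefix found so far.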

Split string by comma, but ignore commas within brackets

I'm trying to split a string by commas using python:
s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"
But I want to ignore any commas within brackets []. So the result for above would be:
["year:2020", "concepts:[ab553,cd779]", "publisher:elsevier"]
Anybody have advice on how to do this? I tried to use re.split like so:
params = re.split(",(?![\w\d\s])", param)
But it is not working properly.
result = re.split(r",(?!(?:[^,\[\]]+,)*[^,\[\]]+])", subject, 0)
,                # Match the character “,” literally
(?!              # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
   (?:           # Match the regular expression below
      [^,\[\]]   # Match any single character NOT present in the list below
                 #    The literal character “,”
                 #    The literal character “[”
                 #    The literal character “]”
         +       # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
      ,          # Match the character “,” literally
   )
      *          # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   [^,\[\]]      # Match any single character NOT present in the list below
                 #    The literal character “,”
                 #    The literal character “[”
                 #    The literal character “]”
      +          # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   ]             # Match the character “]” literally
)
Updated to support more than 2 items in brackets. E.g.
year:2020,concepts:[ab553,cd779],publisher:elsevier,year:2020,concepts:[ab553,cd779,xx345],publisher:elsevier
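A short script, given here as a sketch, shows how the pattern behaves on that longer example. The variable names pattern, subject and parts are chosen for illustration only.

```python
import re

# Split on commas that are NOT followed by "items, items, ... item]",
# i.e. commas that are not inside a bracketed list.
pattern = re.compile(r",(?!(?:[^,\[\]]+,)*[^,\[\]]+])")

subject = ("year:2020,concepts:[ab553,cd779],publisher:elsevier,"
           "year:2020,concepts:[ab553,cd779,xx345],publisher:elsevier")

parts = pattern.split(subject)
for part in parts:
    print(part)
```

This prints six fields, with both bracketed lists kept intact.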
This regex works on your example:
,(?=[^,]+?:)
Here, we use a positive lookahead to find commas that are followed by one or more non-comma characters and then a colon. This correctly finds the <comma><key>: pattern you are searching for. Of course, if the keys are allowed to contain commas, this would have to be adapted a little further.
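In Python, that lookahead can be passed straight to re.split. The sketch below also includes one made-up input (tricky) to show where the approach stops working: a colon inside the brackets makes the comma before it look like a field separator.

```python
import re

s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"
parts = re.split(r",(?=[^,]+?:)", s)
print(parts)

# Hypothetical input with a colon inside the brackets:
# the comma before "b:" is followed by non-comma characters and a colon,
# so it is treated as a separator as well.
tricky = "k:[a,b:c]"
tricky_parts = re.split(r",(?=[^,]+?:)", tricky)
print(tricky_parts)
```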
You can work this out using a user-defined function instead of split:
s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"
def split_by_commas(s):
    lst = list()
    last_bracket = ''
    word = ""
    for c in s:
        if c == '[' or c == ']':
            last_bracket = c
        if c == ',' and last_bracket == ']':
            lst.append(word)
            word = ""
            continue
        elif c == ',' and last_bracket == '[':
            word += c
            continue
        elif c == ',':
            lst.append(word)
            word = ""
            continue
        word += c
    lst.append(word)
    return lst
main_lst = split_by_commas(s)
print(main_lst)
The result of the run of above code:
['year:2020', 'concepts:[ab553,cd779]', 'publisher:elsevier']
A pattern that uses only a lookahead to assert a character to the right does not check whether there is an accompanying character on the left.
Instead of using split, you could match either one or more repetitions of values between square brackets, or any run of characters that contains no comma.
(?:[^,]*\[[^][]*])+[^,]*|[^,]+
import re
s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"
params = re.findall(r"(?:[^,]*\[[^][]*])+[^,]*|[^,]+", s)
print(params)
Output
['year:2020', 'concepts:[ab553,cd779]', 'publisher:elsevier']
I adapted @Bemwa's solution (which didn't work for my use case) so that it tracks bracket depth:
def split_by_commas(s):
    lst = list()
    brackets = 0
    word = ""
    for c in s:
        if c == "[":
            brackets += 1
        elif c == "]":
            if brackets > 0:
                brackets -= 1
        elif c == "," and not brackets:
            lst.append(word)
            word = ""
            continue
        word += c
    lst.append(word)
    return lst

Regex Python with min a letter, a number and min a non-alphanumeric character

I would like to check whether a string contains at least 12 characters, at least one letter, at least one digit, and at least one non-alphanumeric character.
I am in the process of creating a Regex but it does not meet my expectations.
Here is the Regex:
import re
regex = re.compile(r'([A-Za-z]+[0-9]+\W+){12,}')
def is_valid(string):
    return re.fullmatch(regex, string) is not None
test_string = "abdfjhfl58425!!"
print(is_valid(test_string))
When the string contains numbers after letters, it does not match!
Could you help me? Thank you.
Your regex requires the whole group (letters, then digits, then non-word characters) to repeat at least 12 times, so it cannot work. I found this on another post, which describes a different but very similar scenario.
You can tweak that regex so that it reads like this:
^(.{0,11}|[^a-zA-Z]{1,}|[^\d]{1,}|[^\W]{1,})$|[\s]
This regex matches only when the password is invalid: it is shorter than 12 characters, or it contains no letter, no digit, or no non-word character. So no match means the password is valid, and a match means it is invalid. You will need to alter the code to suit, but the regex above should work for all combinations.
The final working code would then be (with extra tests):
import re
regex = re.compile(r'^(.{0,11}|[^a-zA-Z]{1,}|[^\d]{1,}|[^\W]{1,})$|[\s]')
def is_valid(string):
    return re.fullmatch(regex, string) is None
test_string = "abdfl58425B!!"
print(is_valid(test_string))
test_string = "ABRER58425B!!"
print(is_valid(test_string))
test_string = "eruaso58425!!"
print(is_valid(test_string))
Regex is not really suited to this task as it involves remembering counts of each type of character. You could construct a regex to do it but it would end up being very long and unreadable. Much simpler to write a function to count the number of occurrences of each type of character, something like:
def is_valid(test_string):
    return (len(test_string) >= 12
            and any(c.isalpha() for c in test_string)
            and any(c.isnumeric() for c in test_string)
            and any(not c.isalnum() for c in test_string))
If it helps, here is a function that does the same thing without regex:
def is_strong_password(a_string):
    if len(a_string) >= 12:
        chiffre = 0  # digits
        lettre = 0   # letters
        alnum = 0    # non-alphanumeric characters
        for x in a_string:
            if x.isalpha():
                lettre += 1
            if x.isdigit():
                chiffre += 1
            if not x.isalnum():
                alnum += 1
        # at least one of each kind is required
        if lettre >= 1 and chiffre >= 1 and alnum >= 1:
            return True
        else:
            return False
    else:
        return False
You could use four positive lookaheads:
(?i)(?=.{12})(?=.*[a-z])(?=.*\d)(?=.*[^a-z\d])
(?i) specifies that matches are to be case-insensitive.
The four positive lookaheads are as follows:
(?=.{12}) # assert that the string contains (at least) 12 characters
(?=.*[a-z]) # assert that the string contains a letter
(?=.*\d) # assert that the string contains a digit
(?=.*[^a-z\d]) # assert that the string contains a non-alphanumeric character
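To try the lookahead pattern from Python, something like the following sketch should do. The names STRONG and is_valid are assumptions made for this example, and the test strings other than the first are made up.

```python
import re

# Four zero-width lookaheads, all evaluated from the start of the string.
STRONG = re.compile(r"(?i)(?=.{12})(?=.*[a-z])(?=.*\d)(?=.*[^a-z\d])")

def is_valid(s):
    # match() anchors at the start; the lookaheads consume no characters
    return STRONG.match(s) is not None

print(is_valid("abdfjhfl58425!!"))  # True: 15 characters, letters, digits, symbols
print(is_valid("abc123!"))          # False: too short
print(is_valid("abcdefghijkl1"))    # False: no non-alphanumeric character
```

Note that [^a-z\d] treats whitespace and underscores as non-alphanumeric too, which may or may not be what you want.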

How to check that a string contains only “a-z”, “A-Z” and “0-9” characters [duplicate]

I am importing string and trying to check whether text contains only "a-z", "A-Z", and "0-9".
But the program only asks for input and never prints Success when I enter letters and digits.
import string
text=input("Enter: ")
correct = string.ascii_letters + string.digits
if text in correct:
    print("Success")
You could use regex for this, e.g. check string against following pattern:
import re
pattern = re.compile("[A-Za-z0-9]+")
pattern.fullmatch(string)
Explanation:
[A-Za-z0-9] matches a character in the range of A-Z, a-z and 0-9, so letters and numbers.
+ means to match 1 or more of the preceding token.
The re.fullmatch() method checks whether the whole string matches the regular expression pattern. It returns a match object if the whole string matches, and None otherwise.
All together:
import re
if __name__ == '__main__':
    string = "YourString123"
    pattern = re.compile("[A-Za-z0-9]+")
    # if found match (entire string matches pattern)
    if pattern.fullmatch(string) is not None:
        print("Found match: " + string)
    else:
        # if not found match
        print("No match")
Just use str.isalnum()
>>> '123AbC'.isalnum()
True
>>> '1&A'.isalnum()
False
Referencing the docs:
Return true if all characters in the string are alphanumeric and there
is at least one character, false otherwise. A character c is alphanumeric
if one of the following returns True: c.isalpha(), c.isdecimal(),
c.isdigit(), or c.isnumeric().
If you prefer to spell out the per-character checks yourself, you can combine str.isnumeric() and str.isalpha(). Note that all of these methods also accept non-ASCII letters and digits, so they are broader than a-z, A-Z and 0-9:
>>> all(c.isnumeric() or c.isalpha() for c in '123AbC')
True
>>> all(c.isnumeric() or c.isalpha() for c in '1&A')
False
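Since str.isalnum() also returns True for non-ASCII letters, a stricter check is needed if only a-z, A-Z and 0-9 should pass. One possible sketch combines it with str.isascii(), which requires Python 3.7 or newer. The helper name is_ascii_alnum is made up for this example.

```python
def is_ascii_alnum(text):
    # isascii() rules out characters such as 'é';
    # isalnum() rules out punctuation and the empty string
    return text.isascii() and text.isalnum()

print("café".isalnum())        # True: 'é' counts as a letter
print(is_ascii_alnum("café"))  # False
print(is_ascii_alnum("123AbC"))  # True
```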
You must check each character of the incoming text separately.
import string
text = input("Enter: ")
correct = string.ascii_letters + string.digits
status = True
for char in text:
    if char not in correct:
        status = False
if status:
    print('Correct')
else:
    print('InCorrect')
You are testing whether the entire input is a substring of 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'.
So if the input is 'testInput', you are checking whether 'testInput' appears in that string, which it does not.
You need to check each character individually. A function makes this easy to reuse:
import string
def validCharacters(text):
    correct = string.ascii_letters + string.digits
    for character in text:
        if character not in correct:
            return False
    return True
text = input("Enter: ")
if validCharacters(text):
    print("Correct")
else:
    print("Incorrect")
We can remove all characters that are A-Z, a-z, or 0-9, then check the length of the remaining string. If it's greater than 0, the text contains characters other than those:
import re
text = input("Enter: ")
result = re.sub("[A-Za-z0-9]", '', text)
if len(result) == 0:
    print("Success")
else:
    print("Failure")
Result:
abcDEF123 -> Success
hello! -> Failure
You can do this in many ways. I would use sets, in the following way:
import string
correct = {char for char in string.ascii_letters + string.digits}
def is_correct(text):
    return {char for char in text}.issubset(correct)
print(is_correct('letters123')) # True
print(is_correct('???')) # False
print(is_correct('\n')) # False
I simply check whether the set of characters in the given text is a subset of the set of all legal characters. Keep in mind that this always processes the whole text, which is not ideal for performance (you could stop as soon as you find the first illegal character), but it should not be a problem unless you need to handle very long texts or have very little time.
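If stopping at the first illegal character matters, one possible variant is to keep the set of legal characters but test the text with all(), which short-circuits. This is a sketch only, and the names ALLOWED and is_correct_early are assumptions for this example.

```python
import string

ALLOWED = frozenset(string.ascii_letters + string.digits)

def is_correct_early(text):
    # all() stops iterating at the first character that is not allowed
    return all(ch in ALLOWED for ch in text)

print(is_correct_early("letters123"))  # True
print(is_correct_early("???"))         # False
print(is_correct_early("\n"))          # False
```

Like the set-based version, this returns True for an empty string, so add a separate length check if empty input should be rejected.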
You can use a regex match to check whether the string contains only letters and digits:
import re
text = input("enter:")
regex = r"[0-9a-zA-Z]+"
# fullmatch requires the whole input to match, not just a prefix
match = re.fullmatch(regex, text)
if match is not None:
    print("success")

How to find the largest repeating substring given character in Python?

Given some string say 'aabaaab', how would I go about finding the largest substring of a. So it should return 'aaa'. Any help would be greatly appreciated.
def sub_string(s):
    best_run = 0
    current_run = 0
    for char in s:
        if char == 'a':
            current_run += 1
        else:
            current_letter = char
    return(best_run)
I have something like the code above, but I'm not sure how to fix it.
Not the most efficient, but a straightforward solution:
word = "aasfgaaassaasdsddaaaaaafff"
substr_count = 0
substr_counts = []
character = "f"
for i, letter in enumerate(word):
    if (letter == character):
        substr_count += 1
    else:
        substr_counts.append(substr_count)
        substr_count = 0
    if (i == len(word) - 1):
        substr_counts.append(substr_count)
print(max(substr_counts))
If you want a short method using standard Python tools (and want to avoid writing loops to reconstruct the string as you iterate), you can use a regex to split the string on any non-a character, then take the max() according to len:
import re
test_string = 'aabaaab'
split_string_list = re.split( '[^a]', test_string )
longest_string_subset = max( split_string_list, key=len )
print( longest_string_subset )
The re library is for regex, and '[^a]' is a regex pattern for any non-a character. Basically, 'aabaaab' is split into a list at every match of the pattern, so that it becomes ['aa', 'aaa', '']. Then max() looks for the longest string based on len (that is, length).
You can read more about functions like re.split() in the docs: https://docs.python.org/2/library/re.html
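A related sketch, for any single character rather than a hard-coded 'a', collects the runs directly with re.findall instead of splitting around them. The function name longest_run is hypothetical, and the code assumes ch is exactly one character.

```python
import re

def longest_run(s, ch):
    # re.escape keeps characters such as '.' or '?' literal;
    # the trailing '+' matches one or more consecutive copies of ch
    runs = re.findall(re.escape(ch) + "+", s)
    # default="" covers the case where ch does not occur at all
    return max(runs, key=len, default="")

print(longest_run("aabaaab", "a"))  # aaa
```

If only the length is needed, wrap the result in len().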
