Trying to Iterate through strings to add certain characters to new string - python

I am trying to remove certain characters from a string, and it was suggested that I build a new string and add only the characters that meet my criteria. I use a for loop to iterate through the string, but the characters aren't added to the new string: it prints a blank string. I'll include a screenshot.

You should use the in operator for comparing a value with multiple other values:
if s[i] not in 'AEIOUaeiou' or s[i-1] == ' ':
    # if you prefer lists / tuples / sets of characters, you can use those instead:
    # if s[i] not in ['A', 'E', 'I', 'O', ...]
    answer_string += s[i]

You should use:
answer_string += s[i]
Your current statement, answer_string + s, does not do what you are hoping for: it builds a new string and discards it, so answer_string never changes.
This is what I understood from the given context. It would be better if you could post a code snippet so the issue is easier to understand.
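To see why the + version leaves the new string blank, here is a minimal sketch. The sample value of s is an assumption, since the original code was only posted as a screenshot. Strings are immutable, so a bare + expression produces a new string that is thrown away unless it is assigned.

```python
s = "hello"            # assumed sample input, not from the question
answer_string = ""

answer_string + s[0]   # computes "h", but the result is never stored
after_plus = answer_string

answer_string += s[0]  # same as: answer_string = answer_string + s[0]
after_plus_equals = answer_string

print(repr(after_plus))         # ''
print(repr(after_plus_equals))  # 'h'
```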

You have to compare the character with each value on its own. That would be:
if s[i] != "A" and s[i] != "B" ... :
A more elegant solution would be:
if s[i] not in "ABCD...":
Also, as @avats said, you should be adding the single character, not the whole string:
answer_string += s[i]
To make the check case-insensitive, so you don't have to type out both the uppercase and lowercase letters, use lower():
if s[i].lower() not in "abcd":
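Putting the suggestions together, a minimal sketch might look like the following. The function name drop_vowels and the sample sentence are made up for illustration, and treating index 0 as the start of a word is an assumption (in the snippets above, s[i-1] at i == 0 would wrap around to the last character).

```python
def drop_vowels(s, vowels="AEIOUaeiou"):
    """Return s without vowels, except a vowel that starts a word."""
    answer_string = ""
    for i in range(len(s)):
        # Assumption: the first character counts as the start of a word.
        starts_word = i == 0 or s[i - 1] == " "
        if s[i] not in vowels or starts_word:
            answer_string += s[i]  # add the single character, not the whole string
    return answer_string

print(drop_vowels("an old apple tree"))  # an old appl tr
```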

Related

Regex-python: Match a string in alphabetical order [duplicate]

So I have a challenge I'm working on - find the longest string of alphabetical characters in a string. For example, "abcghiijkyxz" should result in "ghiijk" (Yes the i is doubled).
I've been doing quite a bit with loops to solve the problem - iterating over the entire string, then for each character, starting a second loop using lower and ord. No help needed writing that loop.
However, it was suggested to me that Regex would be great for this sort of thing. My regex is weak (I know how to grab a static set, my look-forwards knowledge extends to knowing they exist). How would I write a Regex to look forward, and check future characters for being next in alphabetical order? Or is the suggestion to use Regex not practical for this type of thing?
Edit: The general consensus seems to be that Regex is indeed terrible for this type of thing.
Just to demonstrate why regex is not practical for this sort of thing, here is a regex that would match ghiijk in your given example of abcghiijkyxz. Note it'll also match abc, y, x, z since they should technically be considered for longest string of alphabetical characters in order. Unfortunately, you can't determine which is the longest with regex alone, but this does give you all the possibilities. Please note that this regex works for PCRE and will not work with Python's re module! Also, note that Python's regex library does not currently support (*ACCEPT). Although I haven't tested it, the pyre2 package (a Python wrapper for Google's re2 using Cython) claims it supports the (*ACCEPT) control verb, so this may currently be possible using Python.
See regex in use here
((?:a+(?(?!b)(*ACCEPT))|b+(?(?!c)(*ACCEPT))|c+(?(?!d)(*ACCEPT))|d+(?(?!e)(*ACCEPT))|e+(?(?!f)(*ACCEPT))|f+(?(?!g)(*ACCEPT))|g+(?(?!h)(*ACCEPT))|h+(?(?!i)(*ACCEPT))|i+(?(?!j)(*ACCEPT))|j+(?(?!k)(*ACCEPT))|k+(?(?!l)(*ACCEPT))|l+(?(?!m)(*ACCEPT))|m+(?(?!n)(*ACCEPT))|n+(?(?!o)(*ACCEPT))|o+(?(?!p)(*ACCEPT))|p+(?(?!q)(*ACCEPT))|q+(?(?!r)(*ACCEPT))|r+(?(?!s)(*ACCEPT))|s+(?(?!t)(*ACCEPT))|t+(?(?!u)(*ACCEPT))|u+(?(?!v)(*ACCEPT))|v+(?(?!w)(*ACCEPT))|w+(?(?!x)(*ACCEPT))|x+(?(?!y)(*ACCEPT))|y+(?(?!z)(*ACCEPT))|z+(?(?!$)(*ACCEPT)))+)
Results in:
abc
ghiijk
y
x
z
Explanation of a single option, i.e. a+(?(?!b)(*ACCEPT)):
a+ Matches a (literally) one or more times. This catches instances where several of the same characters are in sequence such as aa.
(?(?!b)(*ACCEPT)) If clause evaluating the condition.
(?!b) Condition for the if clause. Negative lookahead ensuring what follows is not b. This is because if it's not b, we want the following control verb to take effect.
(*ACCEPT) If the condition (above) is met, we accept the current solution. This control verb causes the regex to end successfully, skipping the rest of the pattern. Since this token is inside a capturing group, only that capturing group is ended successfully at that particular location, while the parent pattern continues to execute.
So what happens if the condition is not met? Well, that means that (?!b) evaluated to false. This means that the following character is, in fact, b and so we allow the matching (rather capturing in this instance) to continue. Note that the entire pattern is wrapped in (?:)+ which allows us to match consecutive options until the (*ACCEPT) control verb or end of line is met.
The only exception to this whole regular expression is that of z. Being that it's the last character in the English alphabet (which I presume is the target of this question), we don't care what comes after, so we can simply put z+(?(?!$)(*ACCEPT)), which will ensure nothing matches after z. If you, instead, want to match za (circular alphabetical order matching - idk if this is the proper terminology, but it sounds right to me) you can use z+(?(?!a)(*ACCEPT)))+ as seen here.
As mentioned, regex is not the best tool for this. Since you are interested in a continuous sequence, you can do this with a single for loop:
def LNDS(s):
    start = 0
    cur_len = 1
    max_len = 1
    for i in range(1,len(s)):
        if ord(s[i]) in (ord(s[i-1]), ord(s[i-1])+1):
            cur_len += 1
        else:
            if cur_len > max_len:
                max_len = cur_len
                start = i - cur_len
            cur_len = 1
    if cur_len > max_len:
        max_len = cur_len
        start = len(s) - cur_len
    return s[start:start+max_len]
>>> LNDS('abcghiijkyxz')
'ghiijk'
We keep a running total of how many non-decreasing characters we have seen, and when the non-decreasing sequence ends we compare it to the longest non-decreasing sequence we saw previously, updating our "best seen so far" if it is longer.
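The same bookkeeping can also be written as a generator that yields every run, which makes it easy to inspect all candidates before picking the longest. This is only a sketch of a variant, not the answer's own code; the name alphabetical_runs is hypothetical.

```python
def alphabetical_runs(s):
    """Yield each maximal run in which every character equals or
    directly follows the previous one."""
    if not s:
        return
    start = 0
    for i in range(1, len(s)):
        # A run ends when the step between neighbours is neither 0 nor 1.
        if ord(s[i]) - ord(s[i - 1]) not in (0, 1):
            yield s[start:i]
            start = i
    yield s[start:]  # the final run is still open when the loop ends

runs = list(alphabetical_runs("abcghiijkyxz"))
print(runs)                # ['abc', 'ghiijk', 'y', 'x', 'z']
print(max(runs, key=len))  # ghiijk
```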
Generate all the regex substrings like ^a+b+c+$ (longest to shortest).
Then match each of those regexs against all the substrings (longest to shortest) of "abcghiijkyxz" and stop at the first match.
import re
def all_substrings(s):
    n = len(s)
    for i in xrange(n, 0, -1):
        for j in xrange(n - i + 1):
            yield s[j:j + i]
def longest_alphabetical_substring(s):
    for t in all_substrings("abcdefghijklmnopqrstuvwxyz"):
        r = re.compile("^" + "".join(map(lambda x: x + "+", t)) + "$")
        for u in all_substrings(s):
            if r.match(u):
                return u
print longest_alphabetical_substring("abcghiijkyxz")
That prints "ghiijk".
Regex: char+ meaning a+b+c+...
Details:
+ Matches between one and unlimited times
Python code:
import re
def LNDS(text):
    array = []
    for y in range(97, 122): # a - z
        st = r"%s+" % chr(y)
        for x in range(y+1, 123): # b - z
            st += r"%s+" % chr(x)
            match = re.findall(st, text)
            if match:
                array.append(max(match, key=len))
            else:
                break
        if array:
            array = [max(array, key=len)]
    return array
Output:
print(LNDS('abababababab abc')) >>> ['abc']
print(LNDS('abcghiijkyxz')) >>> ['ghiijk']
For string abcghiijkyxz regex pattern:
a+b+ i+j+k+l+
a+b+c+ j+k+
a+b+c+d+ j+k+l+
b+c+ k+l+
b+c+d+ l+m+
c+d+ m+n+
d+e+ n+o+
e+f+ o+p+
f+g+ p+q+
g+h+ q+r+
g+h+i+ r+s+
g+h+i+j+ s+t+
g+h+i+j+k+ t+u+
g+h+i+j+k+l+ u+v+
h+i+ v+w+
h+i+j+ w+x+
h+i+j+k+ x+y+
h+i+j+k+l+ y+z+
i+j+
i+j+k+
Code demo
To actually "solve" the problem, you could use
string = 'abcxyzghiijkl'
def sort_longest(string):
    stack = []
    result = []
    for idx, char in enumerate(string):
        c = ord(char)
        if idx == 0:
            # initialize our stack
            stack.append((char, c))
        elif c == stack[-1][1] or c == stack[-1][1] + 1:
            # compare it to the item before (a tuple)
            stack.append((char, c))
        else:
            # append the stack to the overall result
            # and reinitialize the stack
            result.append(stack)
            stack = []
            stack.append((char, c))
    if stack:
        # append the final run, so the last character is not dropped
        result.append(stack)
    return ["".join(item[0]
                    for item in sublst)
            for sublst in sorted(result, key=len, reverse=True)]
print(sort_longest(string))
Which yields
['ghiijkl', 'abc', 'xyz']
in this example.
The idea is to loop over the string and keep track of a stack variable which is filled by your requirements using ord().
It's really easy with regexps!
(Using trailing contexts here)
import re
rexp = re.compile(
    "".join(['(?:(?=.' + chr(ord(x)+1) + ')' + x + ')?'
             for x in "abcdefghijklmnopqrstuvwxyz"])
    + '[a-z]')
a = 'bcabhhjabjjbckjkjabckkjdefghiklmn90'
re.findall(rexp, a)
#Answer: ['bc', 'ab', 'h', 'h', 'j', 'ab', 'j', 'j', 'bc', 'k', 'jk', 'j', 'abc', 'k', 'k', 'j', 'defghi', 'klmn']
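As a rough check of how this pattern behaves on the question's own example, the sketch below rebuilds it and picks the longest match with max. It appears that a doubled letter ends a match, because each optional group consumes its letter only once, so the result is shorter than the ghiijk the question asks for.

```python
import re
import string

# Same pattern as above: one optional group per letter that only matches
# when the next character is the following letter, then one final letter.
rexp = re.compile(
    "".join("(?:(?=." + chr(ord(x) + 1) + ")" + x + ")?"
            for x in string.ascii_lowercase)
    + "[a-z]")

matches = rexp.findall("abcghiijkyxz")
print(matches)                # ['abc', 'ghi', 'ijk', 'y', 'x', 'z']
print(max(matches, key=len))  # abc (the first of the three-letter matches)
```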

How does the loop help iterate in this code

The problem at hand is that given a string S, we can transform every letter individually to be lowercase or uppercase to create another string.
Desired result is a list of all possible strings we could create.
Eg:
Input:
S = "a1b2"
Output:
["a1b2", "a1B2", "A1b2", "A1B2"]
The code below generates the correct result, but I'm a beginner in Python. Can you help me understand how the loops on lines 5 and 7, which assign a value to res, work?
def letterCasePermutation(self, S):
    res = ['']
    for ch in S:
        if ch.isalpha():
            res = [i+j for i in res for j in [ch.upper(), ch.lower()]]
        else:
            res = [i+ch for i in res]
    return res
res is a list of all possible strings up to this point. One iteration of the loop handles the next character.
If the character is a non-letter (line 7), the comprehension simply adds that character to each string in the list.
If the character is a letter, then the new list contains two strings for each one in the input: one with the upper-case version added, one for the lower-case version.
If you're still confused, then I strongly recommend that you make an attempt to understand this with standard debugging techniques. Insert a couple of useful print statements to display the values that confuse you.
def letterCasePermutation(self, S):
    res = ['']
    for ch in S:
        print("char = ", ch)
        if ch.isalpha():
            res = [i+j for i in res for j in [ch.upper(), ch.lower()]]
        else:
            res = [i+ch for i in res]
        print(res)
    return res
letterCasePermutation(None, "a1b2")
Output:
char = a
['A', 'a']
char = 1
['A1', 'a1']
char = b
['A1B', 'A1b', 'a1B', 'a1b']
char = 2
['A1B2', 'A1b2', 'a1B2', 'a1b2']
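If the comprehensions themselves are the confusing part, it may help to see one loop iteration written out with plain for loops. This is a sketch for illustration only; the helper name expand_step is hypothetical and not part of the original code.

```python
def expand_step(res, ch):
    """One iteration of the loop in letterCasePermutation, without comprehensions."""
    new_res = []
    if ch.isalpha():
        for i in res:                           # first "for" in the comprehension
            for j in [ch.upper(), ch.lower()]:  # second "for" in the comprehension
                new_res.append(i + j)
    else:
        for i in res:
            new_res.append(i + ch)
    return new_res

res = ['']
for ch in "a1b2":
    res = expand_step(res, ch)
print(res)  # ['A1B2', 'A1b2', 'a1B2', 'a1b2']
```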
The best way to analyze this code is to include the line:
print(res)
at the end of the outer for loop, as the first answer suggests.
Then run it with the string '123' and the string 'abc', which isolates the two conditionals. This gives the following output:
['1']
['12']
['123']
and
['A', 'a']
['AB', 'Ab', 'aB', 'ab']
['ABC', 'ABc', 'AbC', 'Abc', 'aBC', 'aBc', 'abC', 'abc']
Here we can see the loop is just taking the previously generated list as its input, and if the next string char is not a letter, is simply tagging the number/symbol onto the end of each string in the list, via string concatenation. If the next char in the initial input string is a letter, however, then the list is doubled in length by creating two copies for each item in the list, while simultaneously appending an upper version of the new char to the first copy, and a lower version of the new char to the second copy.
For an interesting result, see how the code fails if this change is made at line 2:
res = []
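To see why the empty seed matters, here is a small sketch. The seed parameter is added purely for illustration and is not part of the original method. Each step builds the new list from the strings already in res, so starting from an empty list leaves nothing to extend.

```python
def letter_case_permutation(S, seed):
    # seed is a hypothetical parameter standing in for the initial value of res
    res = list(seed)
    for ch in S:
        if ch.isalpha():
            res = [i + j for i in res for j in [ch.upper(), ch.lower()]]
        else:
            res = [i + ch for i in res]
    return res

print(letter_case_permutation("a1b2", ['']))  # ['A1B2', 'A1b2', 'a1B2', 'a1b2']
print(letter_case_permutation("a1b2", []))    # []
```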

How do I extract a substring which is composed of certain characters from a string in Python?

For example,
suppose I have a string "beabeefeab".
I want to extract a substring which is composed of only 'b' and 'a'
that is "babab".
I applied brute force by implementing a nested loop and deleting all characters but 'b' and 'a'
You can do it using a simple list comprehension
a = "beabeefeab"
print("".join([i for i in a if (i == 'a' or i =='b')]))
Output:
babab
Not very elegant, but it works.
a = "beabeefeab"
answer = ""
for char in a:
    if char == "a" or char == "b":
        answer += char
print(answer)
Output
babab
Using a set to keep the allowed chars, making this a bit more extensible:
s = "beabeefeab"
allowed = set('ab')
print("".join(x for x in s if x in allowed))
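Another option, not taken from the answers above, is to delete everything outside the allowed set with re.sub. The sketch below assumes a non-empty set of allowed characters; the function name keep_only is hypothetical.

```python
import re

def keep_only(s, allowed):
    """Remove every character of s that is not in allowed (assumed non-empty)."""
    # re.escape guards characters such as ']' or '^' inside the character class.
    pattern = "[^" + re.escape(allowed) + "]"
    return re.sub(pattern, "", s)

print(keep_only("beabeefeab", "ab"))  # babab
```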

Rm duplication in list comprehension

Input is a string, the idea is to count the letters A-z only, and print them alphabetically with the count of appearances.
As usual I kept at this 'till I got a working result, but now seek to optimize it in order to better understand the Python way of doing things.
def string_lower_as_list(string):
    """
    >>> string_lower_as_list('a bC')
    ['a', ' ', 'b', 'c']
    """
    return list(string.lower())
from sys import argv
letters = [letter for letter in string_lower_as_list(argv[1])
           if ord(letter) < 124 and ord(letter) > 96]
uniques = sorted(set(letters))
for let in uniques:
    print let, letters.count(let)
How do I remove the duplication of ord(letter) in the list comprehension?
Would there have been any benefit in using a Dictionary or Tuple in this instance, if so, how?
EDIT
Should have said, Python 2.7 on win32
You can compare letters directly and you actually only need to compare lower case letters
letters = [letter for letter in string_lower_as_list(argv[1])
           if "a" <= letter <= "z"]
But better would be to use a dictionary to count the values. letters.count has to traverse the list every time you call it. But you are already traversing the list to filter out the right characters, so why not count them at the same time?
letters = {}
for letter in string_lower_as_list(argv[1]):
    if "a" <= letter <= "z":
        letters[letter] = letters.get(letter, 0) + 1
for letter in sorted(letters):
    print letter, letters[letter]
Edit: As the others said, you don't have to convert the string to a list. You can iterate over it directly: for letter in argv[1].lower().
How do I remove the duplication of ord(letter) in the list comprehension?
You can use a very Python-specific and somewhat magical idiom that doesn't work in other languages: if 96 < ord(letter) < 124.
Would there have been any benefit in using a Dictionary or Tuple in this instance, if so, how?
You could try using the collections.Counter class added in Python 2.7.
P.S. You don't need to convert the string to a list in order to iterate over it in the list comprehension. Any iterable will work, and strings are iterable.
P.S. 2. To get the property 'this letter is alphabetic', instead of lowercasing and comparing to a range, just use str.isalpha. Unicode objects provide the same method, which allows the same code to Just Work with text in foreign languages, without having to know which characters are "letters". :)
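A minimal sketch of the collections.Counter suggestion might look like this. The function name letter_counts and the sample text are assumptions for illustration, and the print call is written so that it behaves the same on Python 2.7 and Python 3.

```python
from collections import Counter

def letter_counts(text):
    """Count the letters a-z in text, ignoring case and everything else."""
    return Counter(ch for ch in text.lower() if "a" <= ch <= "z")

counts = letter_counts("Hello, World")
for letter in sorted(counts):
    print("%s %d" % (letter, counts[letter]))
# d 1
# e 1
# h 1
# l 3
# o 2
# r 1
# w 1
```

Counter counts in a single pass over the string, which avoids calling letters.count once per unique letter.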
You don't have to convert string to list, string is iterable:
letters = {}
for letter in argv[1].lower():
    if "a" <= letter <= "z":
        letters[letter] = letters.get(letter, 0) + 1
for letter in sorted(letters.keys()):
    print letter, letters[letter]
