First and last occurence of a symbol (python without regex)

First and last occurence of a symbol (python without regex) - python

I am dealing with a string from 'ACGT' alphabet (a genetic sequence) padded by letters 'N' in the beginning and in the end:
NNN...NNACGT...GGCTAANNNN...NNN
I would like to find the positions where the actual sequence begins and ends. It could be easily done by using regular expressions, but I would like to have a simpler solution using basic python string operations. Your suggestions will be appreciated.

To get the remainder (removing padding from left and right) it seems like all you need is:
<YourString>.strip('N')
If you need to find indices maybe refer to lstrip and rstrip instead:
sStart = len(<YourString>)-len(<YourString>.lstrip('N'))+1
sEnd = len(<YourString>.rstrip('N'))

Since you mentioned you wanted to find the 'positions'. The code below will give you the positions where the actual sequence starts and ends in the string.
s = 'NNNNAANNNN'
i, j = s.find(next((x for x in s if x != 'N'), None)), s.rfind(next((x for x in reversed(s) if x != 'N'), None))
print(i, j)
print(s[i:j+1])
#Output
4 5
A A

Use strip()
s = "NNNNNACGTGGCTAANNNNNNN"
s = s.strip('N')
print(s)

Related

How do i make the program print specific letters in this specific format i give to it?

so i need to code a program which, for example if given the input 3[a]2[b], prints "aaabb" or when given 3[ab]2[c],prints "abababcc"(basicly prints that amount of that letter in the given order). i tried to use a for loop to iterate the first given input and then detect "[" letters in it so it'll know that to repeatedly print but i don't know how i can make it also understand where that string ends
also this is where i could get it to,which probably isnt too useful:
string=input()
string=string[::-1]
bulundu=6
for i in string:
if i!="]":
if i!="[":
lst.append(i)
if i=="[":
break

The approach I took is to remove the brackets, split the items into a list, then walk the list, and if the item is a number, add that many repeats of the next item to the result for output:
import re
data = "3[a]2[b]"
# Remove brackets and convert to a list
data = re.sub(r'[\[\]]', ' ', data).split()
result = []
for i, item in enumerate(data):
# If item is a number, print that many of the next item
if item.isdigit():
result.append(data[i+1] * int(item))
print(''.join(result))
# aaabb

A different approach, inspired by Subbu's use of re.findall. This approach finds all 'pairs' of numbers and letters using match groups, then multiplies them to produce the required text:
import re
data = "3[a]2[b]"
matches = re.findall('(\d+)\[([a-zA-Z]+)\]',data)
# [(3, 'a'), (2, 'b')]
for x in matches:
print(x[1] * int(x[0]), end='')
#aaabb

Lenghty and documented version using NO regex but simple string and list manipulation:
first split the input into parts that are numbers and texts
then recombinate them again
I opted to document with inline comments
This could be done like so:
# testcases are tuples of input and correct result
testcases = [ ("3[a]2[b]","aaabb"),
("3[ab]2[c]","abababcc"),
("5[12]6[c]","1212121212cccccc"),
("22[a]","a"*22)]
# now we use our algo for all those testcases
for inp,res in testcases:
split_inp = [] # list that takes the splitted values of the input
num = 0 # accumulator variable for more-then-1-digit numbers
in_text = False # bool that tells us if we are currently collecting letters
# go over all letters : O(n)
for c in inp:
# when a [ is reached our num is complete and we need to store it
# we collect all further letters until next ] in a list that we
# add at the end of your split_inp
if c == "[":
split_inp.append(num) # add the completed number
num = 0 # and reset it to 0
in_text = True # now in text
split_inp.append([]) # add a list to collect letters
# done collecting letters
elif c == "]":
in_text = False # no longer collecting, convert letters
split_inp[-1] = ''.join(split_inp[-1]) # to text
# between [ and ] ... simply add letter to list at end
elif in_text:
split_inp[-1].append(c) # add letter
# currently collecting numbers
else:
num *= 10 # increase current number by factor 10
num += int(c) # add newest number
print(repr(inp), split_inp, sep="\n") # debugging output for parsing part
# now we need to build the string from our parsed data
amount = 0
result = [] # intermediate list to join ['aaa','bb']
# iterate the list, if int remember it, it text, build composite
for part in split_inp:
if isinstance(part, int):
amount = part
else:
result.append(part*amount)
# join the parts
result = ''.join(result)
# check if all worked out
if result == res:
print("CORRECT: ", result + "\n")
else:
print (f"INCORRECT: should be '{res}' but is '{result}'\n")
Result:
'3[a]2[b]'
[3, 'a', 2, 'b']
CORRECT: aaabb
'3[ab]2[c]'
[3, 'ab', 2, 'c']
CORRECT: abababcc
'5[12]6[c]'
[5, '12', 6, 'c']
CORRECT: 1212121212cccccc
'22[a]'
[22, 'a']
CORRECT: aaaaaaaaaaaaaaaaaaaaaa
This will also handle cases of '5[12]' wich some of the other solutions wont.

You can capture both the number of repetitions n and the pattern to repeat v in one go using the described pattern. This essentially matches any sequence of digits - which is the first group we need to capture, reason why \d+ is between brackets (..) - followed by a [, followed by anything - this anything is the second pattern of interest, hence it is between backets (...) - which is then followed by a ].
findall will find all these matches in the passed line, then the first match - the number - will be cast to an int and used as a multiplier for the string pattern. The list of int(n) * v is then joined with an empty space. Malformed patterns may throw exceptions or return nothing.
Anyway, in code:
import re
pattern = re.compile("(\d+)\[(.*?)\]")
def func(x): return "".join([v*int(n) for n,v in pattern.findall(x)])
print(func("3[a]2[b]"))
print(func("3[ab]2[c]"))
OUTPUT
aaabb
abababcc
FOLLOW UP
Another solution which achieves the same result, without using regular expression (ok, not nice at all, I get it...):
def func(s): return "".join([int(x[0])*x[1] for x in map(lambda x:x.split("["), s.split("]")) if len(x) == 2])

I am not much more than a beginner and looking at the other answers, I thought understanding regex might be a challenge for a new contributor such as yourself since I myself haven't really dealt with regex.
The beginner friendly way to do this might be to loop through the input string and use string functions like isnumeric() and isalpha()
data = "3[a]2[b]"
chars = []
nums = []
substrings = []
for i, char in enumerate(data):
if char.isnumeric():
nums.append(char)
if char.isalpha():
chars.append(char)
for i, char in enumerate(chars):
substrings.append(char * int(nums[i]))
string = "".join(substrings)
print(string)
OUTPUT:
aaabb
And on trying different values for data:
data = "0[a]2[b]3[p]"
OUTPUT bbppp
data = "1[a]1[a]2[a]"
OUTPUT aaaa
NOTE: In case you're not familiar with the above functions, they are string functions, which are fairly self-explanatory. They are used as <your_string_here>.isalpha() which returns true if and only if the string is an alphabet (whitespace, numerics, and symbols return false
And, similarly for isnumeric()
For example,
"]".isnumeric() and "]".isalpha() return False
"a".isalpha() returns True
IF YOU NEED ANY CLARIFICATION ON A FUNCTION USED, PLEASE DO NOT HESITATE TO LEAVE A COMMENT

Regex-python: Match a string in alphabetical order [duplicate]

So I have a challenge I'm working on - find the longest string of alphabetical characters in a string. For example, "abcghiijkyxz" should result in "ghiijk" (Yes the i is doubled).
I've been doing quite a bit with loops to solve the problem - iterating over the entire string, then for each character, starting a second loop using lower and ord. No help needed writing that loop.
However, it was suggested to me that Regex would be great for this sort of thing. My regex is weak (I know how to grab a static set, my look-forwards knowledge extends to knowing they exist). How would I write a Regex to look forward, and check future characters for being next in alphabetical order? Or is the suggestion to use Regex not practical for this type of thing?
Edit: The general consensus seems to be that Regex is indeed terrible for this type of thing.

Just to demonstrate why regex is not practical for this sort of thing, here is a regex that would match ghiijk in your given example of abcghiijkyxz. Note it'll also match abc, y, x, z since they should technically be considered for longest string of alphabetical characters in order. Unfortunately, you can't determine which is the longest with regex alone, but this does give you all the possibilities. Please note that this regex works for PCRE and will not work with python's re module! Also, note that python's regex library does not currently support (*ACCEPT). Although I haven't tested, the pyre2 package (python wrapper for Google's re2 pyre2 using Cython) claims it supports the (*ACCEPT) control verb, so this may currently be possible using python.
See regex in use here
((?:a+(?(?!b)(*ACCEPT))|b+(?(?!c)(*ACCEPT))|c+(?(?!d)(*ACCEPT))|d+(?(?!e)(*ACCEPT))|e+(?(?!f)(*ACCEPT))|f+(?(?!g)(*ACCEPT))|g+(?(?!h)(*ACCEPT))|h+(?(?!i)(*ACCEPT))|i+(?(?!j)(*ACCEPT))|j+(?(?!k)(*ACCEPT))|k+(?(?!l)(*ACCEPT))|l+(?(?!m)(*ACCEPT))|m+(?(?!n)(*ACCEPT))|n+(?(?!o)(*ACCEPT))|o+(?(?!p)(*ACCEPT))|p+(?(?!q)(*ACCEPT))|q+(?(?!r)(*ACCEPT))|r+(?(?!s)(*ACCEPT))|s+(?(?!t)(*ACCEPT))|t+(?(?!u)(*ACCEPT))|u+(?(?!v)(*ACCEPT))|v+(?(?!w)(*ACCEPT))|w+(?(?!x)(*ACCEPT))|x+(?(?!y)(*ACCEPT))|y+(?(?!z)(*ACCEPT))|z+(?(?!$)(*ACCEPT)))+)
Results in:
abc
ghiijk
y
x
z
Explanation of a single option, i.e. a+(?(?!b)(*ACCEPT)):
a+ Matches a (literally) one or more times. This catches instances where several of the same characters are in sequence such as aa.
(?(?!b)(*ACCEPT)) If clause evaluating the condition.
(?!b) Condition for the if clause. Negative lookahead ensuring what follows is not b. This is because if it's not b, we want the following control verb to take effect.
(*ACCEPT) If the condition (above) is met, we accept the current solution. This control verb causes the regex to end successfully, skipping the rest of the pattern. Since this token is inside a capturing group, only that capturing group is ended successfully at that particular location, while the parent pattern continues to execute.
So what happens if the condition is not met? Well, that means that (?!b) evaluated to false. This means that the following character is, in fact, b and so we allow the matching (rather capturing in this instance) to continue. Note that the entire pattern is wrapped in (?:)+ which allows us to match consecutive options until the (*ACCEPT) control verb or end of line is met.
The only exception to this whole regular expression is that of z. Being that it's the last character in the English alphabet (which I presume is the target of this question), we don't care what comes after, so we can simply put z+(?(?!$)(*ACCEPT)), which will ensure nothing matches after z. If you, instead, want to match za (circular alphabetical order matching - idk if this is the proper terminology, but it sounds right to me) you can use z+(?(?!a)(*ACCEPT)))+ as seen here.

As mentioned, regex is not the best tool for this. Since you are interested in a continuous sequence, you can do this with a single for loop:
def LNDS(s):
start = 0
cur_len = 1
max_len = 1
for i in range(1,len(s)):
if ord(s[i]) in (ord(s[i-1]), ord(s[i-1])+1):
cur_len += 1
else:
if cur_len > max_len:
max_len = cur_len
start = i - cur_len
cur_len = 1
if cur_len > max_len:
max_len = cur_len
start = len(s) - cur_len
return s[start:start+max_len]
>>> LNDS('abcghiijkyxz')
'ghiijk'
We keep a running total of how many non-decreasing characters we have seen, and when the non-decreasing sequence ends we compare it to the longest non-decreasing sequence we saw previously, updating our "best seen so far" if it is longer.

Generate all the regex substrings like ^a+b+c+$ (longest to shortest).
Then match each of those regexs against all the substrings (longest to shortest) of "abcghiijkyxz" and stop at the first match.
def all_substrings(s):
n = len(s)
for i in xrange(n, 0, -1):
for j in xrange(n - i + 1):
yield s[j:j + i]
def longest_alphabetical_substring(s):
for t in all_substrings("abcdefghijklmnopqrstuvwxyz"):
r = re.compile("^" + "".join(map(lambda x: x + "+", t)) + "$")
for u in all_substrings(s):
if r.match(u):
return u
print longest_alphabetical_substring("abcghiijkyxz")
That prints "ghiijk".

Regex: char+ meaning a+b+c+...
Details:
+ Matches between one and unlimited times
Python code:
import re
def LNDS(text):
array = []
for y in range(97, 122): # a - z
st = r"%s+" % chr(y)
for x in range(y+1, 123): # b - z
st += r"%s+" % chr(x)
match = re.findall(st, text)
if match:
array.append(max(match, key=len))
else:
break
if array:
array = [max(array, key=len)]
return array
Output:
print(LNDS('abababababab abc')) >>> ['abc']
print(LNDS('abcghiijkyxz')) >>> ['ghiijk']
For string abcghiijkyxz regex pattern:
a+b+ i+j+k+l+
a+b+c+ j+k+
a+b+c+d+ j+k+l+
b+c+ k+l+
b+c+d+ l+m+
c+d+ m+n+
d+e+ n+o+
e+f+ o+p+
f+g+ p+q+
g+h+ q+r+
g+h+i+ r+s+
g+h+i+j+ s+t+
g+h+i+j+k+ t+u+
g+h+i+j+k+l+ u+v+
h+i+ v+w+
h+i+j+ w+x+
h+i+j+k+ x+y+
h+i+j+k+l+ y+z+
i+j+
i+j+k+
Code demo

To actually "solve" the problem, you could use
string = 'abcxyzghiijkl'
def sort_longest(string):
stack = []; result = [];
for idx, char in enumerate(string):
c = ord(char)
if idx == 0:
# initialize our stack
stack.append((char, c))
elif idx == len(string) - 1:
result.append(stack)
elif c == stack[-1][1] or c == stack[-1][1] + 1:
# compare it to the item before (a tuple)
stack.append((char, c))
else:
# append the stack to the overall result
# and reinitialize the stack
result.append(stack)
stack = []
stack.append((char, c))
return ["".join(item[0]
for item in sublst)
for sublst in sorted(result, key=len, reverse=True)]
print(sort_longest(string))
Which yields
['ghiijk', 'abc', 'xyz']
in this example.
The idea is to loop over the string and keep track of a stack variable which is filled by your requirements using ord().

It's really easy with regexps!
(Using trailing contexts here)
rexp=re.compile(
"".join(['(?:(?=.' + chr(ord(x)+1) + ')'+ x +')?'
for x in "abcdefghijklmnopqrstuvwxyz"])
+'[a-z]')
a = 'bcabhhjabjjbckjkjabckkjdefghiklmn90'
re.findall(rexp, a)
#Answer: ['bc', 'ab', 'h', 'h', 'j', 'ab', 'j', 'j', 'bc', 'k', 'jk', 'j', 'abc', 'k', 'k', 'j', 'defghi', 'klmn']

How can I use Regex to find a string of characters in alphabetical order using Python?

So I have a challenge I'm working on - find the longest string of alphabetical characters in a string. For example, "abcghiijkyxz" should result in "ghiijk" (Yes the i is doubled).
I've been doing quite a bit with loops to solve the problem - iterating over the entire string, then for each character, starting a second loop using lower and ord. No help needed writing that loop.
However, it was suggested to me that Regex would be great for this sort of thing. My regex is weak (I know how to grab a static set, my look-forwards knowledge extends to knowing they exist). How would I write a Regex to look forward, and check future characters for being next in alphabetical order? Or is the suggestion to use Regex not practical for this type of thing?
Edit: The general consensus seems to be that Regex is indeed terrible for this type of thing.

Just to demonstrate why regex is not practical for this sort of thing, here is a regex that would match ghiijk in your given example of abcghiijkyxz. Note it'll also match abc, y, x, z since they should technically be considered for longest string of alphabetical characters in order. Unfortunately, you can't determine which is the longest with regex alone, but this does give you all the possibilities. Please note that this regex works for PCRE and will not work with python's re module! Also, note that python's regex library does not currently support (*ACCEPT). Although I haven't tested, the pyre2 package (python wrapper for Google's re2 pyre2 using Cython) claims it supports the (*ACCEPT) control verb, so this may currently be possible using python.
See regex in use here
((?:a+(?(?!b)(*ACCEPT))|b+(?(?!c)(*ACCEPT))|c+(?(?!d)(*ACCEPT))|d+(?(?!e)(*ACCEPT))|e+(?(?!f)(*ACCEPT))|f+(?(?!g)(*ACCEPT))|g+(?(?!h)(*ACCEPT))|h+(?(?!i)(*ACCEPT))|i+(?(?!j)(*ACCEPT))|j+(?(?!k)(*ACCEPT))|k+(?(?!l)(*ACCEPT))|l+(?(?!m)(*ACCEPT))|m+(?(?!n)(*ACCEPT))|n+(?(?!o)(*ACCEPT))|o+(?(?!p)(*ACCEPT))|p+(?(?!q)(*ACCEPT))|q+(?(?!r)(*ACCEPT))|r+(?(?!s)(*ACCEPT))|s+(?(?!t)(*ACCEPT))|t+(?(?!u)(*ACCEPT))|u+(?(?!v)(*ACCEPT))|v+(?(?!w)(*ACCEPT))|w+(?(?!x)(*ACCEPT))|x+(?(?!y)(*ACCEPT))|y+(?(?!z)(*ACCEPT))|z+(?(?!$)(*ACCEPT)))+)
Results in:
abc
ghiijk
y
x
z
Explanation of a single option, i.e. a+(?(?!b)(*ACCEPT)):
a+ Matches a (literally) one or more times. This catches instances where several of the same characters are in sequence such as aa.
(?(?!b)(*ACCEPT)) If clause evaluating the condition.
(?!b) Condition for the if clause. Negative lookahead ensuring what follows is not b. This is because if it's not b, we want the following control verb to take effect.
(*ACCEPT) If the condition (above) is met, we accept the current solution. This control verb causes the regex to end successfully, skipping the rest of the pattern. Since this token is inside a capturing group, only that capturing group is ended successfully at that particular location, while the parent pattern continues to execute.
So what happens if the condition is not met? Well, that means that (?!b) evaluated to false. This means that the following character is, in fact, b and so we allow the matching (rather capturing in this instance) to continue. Note that the entire pattern is wrapped in (?:)+ which allows us to match consecutive options until the (*ACCEPT) control verb or end of line is met.
The only exception to this whole regular expression is that of z. Being that it's the last character in the English alphabet (which I presume is the target of this question), we don't care what comes after, so we can simply put z+(?(?!$)(*ACCEPT)), which will ensure nothing matches after z. If you, instead, want to match za (circular alphabetical order matching - idk if this is the proper terminology, but it sounds right to me) you can use z+(?(?!a)(*ACCEPT)))+ as seen here.

As mentioned, regex is not the best tool for this. Since you are interested in a continuous sequence, you can do this with a single for loop:
def LNDS(s):
start = 0
cur_len = 1
max_len = 1
for i in range(1,len(s)):
if ord(s[i]) in (ord(s[i-1]), ord(s[i-1])+1):
cur_len += 1
else:
if cur_len > max_len:
max_len = cur_len
start = i - cur_len
cur_len = 1
if cur_len > max_len:
max_len = cur_len
start = len(s) - cur_len
return s[start:start+max_len]
>>> LNDS('abcghiijkyxz')
'ghiijk'
We keep a running total of how many non-decreasing characters we have seen, and when the non-decreasing sequence ends we compare it to the longest non-decreasing sequence we saw previously, updating our "best seen so far" if it is longer.

Generate all the regex substrings like ^a+b+c+$ (longest to shortest).
Then match each of those regexs against all the substrings (longest to shortest) of "abcghiijkyxz" and stop at the first match.
def all_substrings(s):
n = len(s)
for i in xrange(n, 0, -1):
for j in xrange(n - i + 1):
yield s[j:j + i]
def longest_alphabetical_substring(s):
for t in all_substrings("abcdefghijklmnopqrstuvwxyz"):
r = re.compile("^" + "".join(map(lambda x: x + "+", t)) + "$")
for u in all_substrings(s):
if r.match(u):
return u
print longest_alphabetical_substring("abcghiijkyxz")
That prints "ghiijk".

Regex: char+ meaning a+b+c+...
Details:
+ Matches between one and unlimited times
Python code:
import re
def LNDS(text):
array = []
for y in range(97, 122): # a - z
st = r"%s+" % chr(y)
for x in range(y+1, 123): # b - z
st += r"%s+" % chr(x)
match = re.findall(st, text)
if match:
array.append(max(match, key=len))
else:
break
if array:
array = [max(array, key=len)]
return array
Output:
print(LNDS('abababababab abc')) >>> ['abc']
print(LNDS('abcghiijkyxz')) >>> ['ghiijk']
For string abcghiijkyxz regex pattern:
a+b+ i+j+k+l+
a+b+c+ j+k+
a+b+c+d+ j+k+l+
b+c+ k+l+
b+c+d+ l+m+
c+d+ m+n+
d+e+ n+o+
e+f+ o+p+
f+g+ p+q+
g+h+ q+r+
g+h+i+ r+s+
g+h+i+j+ s+t+
g+h+i+j+k+ t+u+
g+h+i+j+k+l+ u+v+
h+i+ v+w+
h+i+j+ w+x+
h+i+j+k+ x+y+
h+i+j+k+l+ y+z+
i+j+
i+j+k+
Code demo

To actually "solve" the problem, you could use
string = 'abcxyzghiijkl'
def sort_longest(string):
stack = []; result = [];
for idx, char in enumerate(string):
c = ord(char)
if idx == 0:
# initialize our stack
stack.append((char, c))
elif idx == len(string) - 1:
result.append(stack)
elif c == stack[-1][1] or c == stack[-1][1] + 1:
# compare it to the item before (a tuple)
stack.append((char, c))
else:
# append the stack to the overall result
# and reinitialize the stack
result.append(stack)
stack = []
stack.append((char, c))
return ["".join(item[0]
for item in sublst)
for sublst in sorted(result, key=len, reverse=True)]
print(sort_longest(string))
Which yields
['ghiijk', 'abc', 'xyz']
in this example.
The idea is to loop over the string and keep track of a stack variable which is filled by your requirements using ord().

It's really easy with regexps!
(Using trailing contexts here)
rexp=re.compile(
"".join(['(?:(?=.' + chr(ord(x)+1) + ')'+ x +')?'
for x in "abcdefghijklmnopqrstuvwxyz"])
+'[a-z]')
a = 'bcabhhjabjjbckjkjabckkjdefghiklmn90'
re.findall(rexp, a)
#Answer: ['bc', 'ab', 'h', 'h', 'j', 'ab', 'j', 'j', 'bc', 'k', 'jk', 'j', 'abc', 'k', 'k', 'j', 'defghi', 'klmn']

Python: How to check if two inputs A and B are anagrams without all punctuation, and all uppercase letters were lower case letters

The first part of the question is to check if input A and input B are anagrams, which I can do easily enough.
s = input ("Word 1?")
b = sorted(s)
c = ''.join(b)
t = input("Word 2?")
a = sorted(t)
d = ''.join(b)
if d == c:
print("Anagram!")
else:
print("Not Anagram!")
The problem is the second part of the question - I need to check if two words are anagrams if all of the punctuation is removed, the upper case letters turned to lower case, but the question assumes no spaces are used. So, for example, (ACdB;,.Eo,."kl) and (oadcbE,LK) are anagrams. The question also asks for loops to be used.
s = input ("Word 1?")
s = s.lower()
for i in range (0, len(s)):
if ord(s[i]) < 97 or ord(s[i]) >122:
s = s.replace(s[i], '')
b = sorted(s)
c = ''.join(b)
print(c)
Currently, the above code is saying the string index is out of range.

Here's the loop you need to add, in psuedocode:
s = input ("Word 1?")
s_letters = ''
for letter in s:
if it's punctuation: skip it
else if it's uppercase: add the lowercase version to s_letters
else: add it to s_letters
b = sorted(s_letters)
Except of course that you need to add the same thing for t as well. If you've learned about functions, you will want to write this as a function, and call it twice, instead of copying and pasting it with minor changes.

There are three big problems with your loop. You need to solve all three of these, not just one.
First, s = s.replace(s[i], '') doesn't replace the ith character with a space, it replaces the ith character and every other copy of the same character with a space. That's going to screw up the rest of your loop if there are any duplicates. It's also very slow, because you have to search the entire string over and over again.
The right way to replace the character at a specific index is to use slicing: s = s[:i] + s[i+1:].
Or, you could make this a lot simpler by turning the string into a list of characters (s = list(s)), you can mutate it in-place (del s[i]).
Next, we're going through the loop 6 times, checking s[0], s[1], s[2], s[3], s[4], and s[5]. But somewhere along the way, we're going to remove some of the characters (ideally three of them). So some of those indices will be past the end of the string, which will raise an IndexError. I won't explain how to fix this yet, because it ties directly into the next problem.
Modifying a sequence while you loop over it always breaks your loop.* Imagine starting with s = '123abc'. Let's step through the loop.
i = 0, so you check s[0], which is 1, so you remove it, leaving s = '23abc'.
i = 1, so you check s[1], which is 3, so you remove it, leaving s = '2abc'.
i = 2, so you check s[2], which is b, so you leave it, leaving s = '2abc'.
And so on.
The 2 got moved to s[0] by removing the 1. But you're never going to come back to i = 0 once you've passed it. So, you're never going to check the 2. You can solve this in a few different ways—iterating backward, doing a while instead of an if each time through the for, etc.—but most of those solutions will just exacerbate the previous problem.
The easy way to solve both problems is to just not modify the string while you loop over it. You could do this by, e.g., building up a list of indexes to remove as you go along, then applying that in reverse order.
But a much easier way to do it is to just build up the characters you want to keep as you go along. And that also solves the first problem for your automatically.
So:
new_s = []
for i in range (0, len(s)):
if ord(s[i]) < 97 or ord(s[i]) >122:
pass
else:
new_s.append(s[i])
b = sorted(new_s)
And with that relative minor change, your code works.
While we're at it, there are a few ways you're overcomplicating things.
First, you don't need to do ord(s[i]) < 97; you can just do s[i] < 'a'. This makes things a lot more readable.
But, even more simply, you can just use the isalpha or islower method. (Since you've already converted to lower, and you're only dealing with one character at a time, it doesn't really matter which.) Besides being more readable, and harder to get wrong, this has the advantage of working with non-ASCII characters, like é.
Finally, you almost never want to write a loop like this:
for i in range(len(s)):
That forces you to write s[i] all over the place, when you could have just looped over s in the first place:
for ch in s:
So, putting it all together, here's your code, with the two simple fixes, and the cleanup:
s = input ("Word 1?")
s = s.lower()
new_s = []
for ch in s:
if ch.isalpha():
new_s.append(ch)
b = sorted(new_s)
c = ''.join(b)
print(c)
If you know about comprehensions or higher-order functions, you'll recognize this pattern as exactly what a list comprehension does. So, you can turn the whole 4 lines of code that build new_s into either of these one-liners, which are more readable as well as being shorter:
new_s = (ch for ch in s if ch.isalpha)
new_s = filter(str.isalpha, s)
And in fact, the whole thing can become a one-liner:
b = sorted(ch for ch in s.lower() if ch.isalpha)
But your teacher asked you to use a for statement, so you'd better keep it as a for statement.
* This isn't quite true. If you only modify the part of the sequence after the current index, and you make sure the sequence aways has the right length by the time you get to each index even though it may have had a different length before you did (using a while loop instead of a for loop, to reevaluate len(seq) each time, makes this part trivial instead of hard), then it works. But it's easier to just never do it to than learn the rules and carefully analyze your code to see if you're getting away with it this time.

Swapping every second character in a string in Python

I have the following problem: I would like to write a function in Python which, given a string, returns a string where every group of two characters is swapped.
For example given "ABCDEF" it returns "BADCFE".
The length of the string would be guaranteed to be an even number.
Can you help me how to do it in Python?

To add another option:
>>> s = 'abcdefghijkl'
>>> ''.join([c[1] + c[0] for c in zip(s[::2], s[1::2])])
'badcfehgjilk'

import re
print re.sub(r'(.)(.)', r'\2\1', "ABCDEF")

from itertools import chain, izip_longest
''.join(chain.from_iterable(izip_longest(s[1::2], s[::2], fillvalue = '')))
You can also use islices instead of regular slices if you have very large strings or just want to avoid the copying.
Works for odd length strings even though that's not a requirement of the question.

While the above solutions do work, there is a very simple solution shall we say in "layman's" terms. Someone still learning python and string's can use the other answers but they don't really understand how they work or what each part of the code is doing without a full explanation by the poster as opposed to "this works". The following executes the swapping of every second character in a string and is easy for beginners to understand how it works.
It is simply iterating through the string (any length) by two's (starting from 0 and finding every second character) and then creating a new string (swapped_pair) by adding the current index + 1 (second character) and then the actual index (first character), e.g., index 1 is put at index 0 and then index 0 is put at index 1 and this repeats through iteration of string.
Also added code to ensure string is of even length as it only works for even length.
string = "abcdefghijklmnopqrstuvwxyz123"
# use this prior to below iteration if string needs to be even but is possibly odd
if len(string) % 2 != 0:
string = string[:-1]
# iteration to swap every second character in string
swapped_pair = ""
for i in range(0, len(string), 2):
swapped_pair += (string[i + 1] + string[i])
# use this after above iteration for any even or odd length of strings
if len(swapped_pair) % 2 != 0:
swapped_adj += swapped_pair[-1]
print(swapped_pair)
badcfehgjilknmporqtsvuxwzy21 # output if the "needs to be even" code used
badcfehgjilknmporqtsvuxwzy213 # output if the "even or odd" code used

Here's a nifty solution:
def swapem (s):
if len(s) < 2: return s
return "%s%s%s"%(s[1], s[0], swapem (s[2:]))
for str in ("", "a", "ab", "abcdefgh", "abcdefghi"):
print "[%s] -> [%s]"%(str, swapem (str))
though possibly not suitable for large strings :-)
Output is:
[] -> []
[a] -> [a]
[ab] -> [ba]
[abcdefgh] -> [badcfehg]
[abcdefghi] -> [badcfehgi]

If you prefer one-liners:
''.join(reduce(lambda x,y: x+y,[[s[1+(x<<1)],s[x<<1]] for x in range(0,len(s)>>1)]))

Here's a another simple solution:
"".join([(s[i:i+2])[::-1]for i in range(0,len(s),2)])

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

First and last occurence of a symbol (python without regex) - python

To get the remainder (removing padding from left and right) it seems like all you need is: <YourString>.strip('N') If you need to find indices maybe refer to lstrip and rstrip instead: sStart = len(<YourString>)-len(<YourString>.lstrip('N'))+1 sEnd = len(<YourString>.rstrip('N'))

Use strip() s = "NNNNNACGTGGCTAANNNNNNN" s = s.strip('N') print(s)

Related

How do i make the program print specific letters in this specific format i give to it?

Regex-python: Match a string in alphabetical order [duplicate]

How can I use Regex to find a string of characters in alphabetical order using Python?

Python: How to check if two inputs A and B are anagrams without all punctuation, and all uppercase letters were lower case letters

Swapping every second character in a string in Python

Categories

Resources