strip() removes '_' unexpectedly - python

>>> x = 'abc_cde_fgh'
>>> x.strip('abc_cde')
'fgh'
I expected '_fgh'.
How should I understand this result?

strip() removes any of the characters in its argument from either end of the string; it does not remove a leading or trailing word.
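This example demonstrates it nicely:
>>> x.strip('ab_ch')
'de_fg'
Since the characters "a", "b", "c", "h" and "_" are all in the set of characters to remove, the leading "abc_c" and the trailing "h" are removed. Stripping stops at the first character on each side that is not in the set ("d" on the left, "g" on the right), so the "_" between "de" and "fg" is kept.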
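To make the character-set behaviour concrete, here is a rough re-implementation of strip(chars) in plain Python. strip_chars is a hypothetical helper written only for illustration (it is not how CPython implements strip, and it ignores the no-argument whitespace case); it peels characters off each end while they belong to the set.

```python
def strip_chars(s, chars):
    """Rough emulation of s.strip(chars), for illustration only."""
    remove = set(chars)          # the argument is treated as a set of characters
    start, end = 0, len(s)
    # Advance from the left while the current character is in the set
    while start < end and s[start] in remove:
        start += 1
    # Move back from the right while the current character is in the set
    while end > start and s[end - 1] in remove:
        end -= 1
    return s[start:end]

x = 'abc_cde_fgh'
print(strip_chars(x, 'ab_ch'))    # de_fg
print(strip_chars(x, 'abc_cde'))  # fgh
```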
If you would like to remove a leading or trailing word, I would recommend using re or startswith/endswith.
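def rstrip_word(s, word):
    # Remove one trailing occurrence of `word`; the `word and` guard stops
    # an empty word from turning s[:-0] into ''
    if word and s.endswith(word):
        return s[:-len(word)]
    return s

def lstrip_word(s, word):
    # Remove one leading occurrence of `word`
    if s.startswith(word):
        return s[len(word):]
    return s

def strip_word(s, word):
    return rstrip_word(lstrip_word(s, word), word)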
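Since re was also suggested, here is a sketch of the single-word version using regular expressions instead. strip_word_re is a hypothetical name; re.escape() makes characters such as '.' or '*' in the word match literally, and \A / \Z anchor the pattern to the very start and end of the string.

```python
import re

def strip_word_re(s, word):
    """Remove `word` once from the start and once from the end of `s`."""
    pattern = re.escape(word)                      # treat `word` literally
    s = re.sub(r"\A" + pattern, "", s, count=1)    # leading occurrence
    s = re.sub(pattern + r"\Z", "", s, count=1)    # trailing occurrence
    return s

print(strip_word_re('abc_cde_fgh', 'abc'))  # _cde_fgh
print(strip_word_re('abc_cde_fgh', 'fgh'))  # abc_cde_
```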
Removing Multiple Words
A very simple implementation (a greedy one) to remove multiple words from a string can be done as follows:
def rstrip_word(s, *words):
    for word in words:
        # Skip empty words: s[:-0] would wipe out the whole string
        if word and s.endswith(word):
            return s[:-len(word)]
    return s

def lstrip_word(s, *words):
    for word in words:
        if word and s.startswith(word):
            return s[len(word):]
    return s

def strip_word(s, *words):
    return rstrip_word(lstrip_word(s, *words), *words)
Please note that this algorithm is greedy: it strips the first word in the list that matches and then returns, so it may not behave as you expect. Finding the longest match (although not too tricky) is a bit more involved.
For example, "abc" is tried before "abc_", so the underscore is left behind:
>>> strip_word(x, "abc", "abc_")
'_cde_fgh'
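The longest-match variant mentioned above could be sketched like this. The helper names are hypothetical; among the words that match at a given end, it removes the longest one rather than the first one listed.

```python
def lstrip_longest(s, *words):
    # Collect every non-empty word that s starts with
    matches = [w for w in words if w and s.startswith(w)]
    if not matches:
        return s
    # Remove the longest of them
    return s[len(max(matches, key=len)):]

def rstrip_longest(s, *words):
    # Same idea at the end of the string
    matches = [w for w in words if w and s.endswith(w)]
    if not matches:
        return s
    return s[:len(s) - len(max(matches, key=len))]

def strip_longest(s, *words):
    return rstrip_longest(lstrip_longest(s, *words), *words)

x = 'abc_cde_fgh'
print(strip_longest(x, "abc", "abc_"))  # cde_fgh
```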

strip() removes characters, not a substring. For example:
>>> x.strip('abcde_')
'fgh'

The documentation of str.strip() says: "The chars argument is a string specifying the set of characters to be removed." That is why every character except "fgh" is removed (including both underscores).
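A short sketch of both points: the order and repetition of characters in the argument make no difference, and on Python 3.9+ str.removeprefix() and str.removesuffix() remove an exact substring instead, which gives the '_fgh' the question expected.

```python
x = 'abc_cde_fgh'

# strip() treats its argument as an unordered set of characters,
# so reordering or repeating characters gives the same result
print(x.strip('abc_cde'))   # fgh
print(x.strip('_edcba'))    # fgh

# Python 3.9+: remove an exact leading/trailing substring instead
print(x.removeprefix('abc_cde'))  # _fgh
print(x.removesuffix('_fgh'))     # abc_cde
```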

Swap last two characters in a string, make it lowercase, and add a space

I'm trying to take the last two letters of a string, swap them, make them lowercase, and leave a space in the middle. For some reason the output gives me white space before the word.
For example, if the input is APPLE, the output should be e l
It would also be nice to ignore non-letter characters, so if the word were App3e the output would be e p
def last_Letters(word):
    last_two = word[-2:]
    swap = last_two[-1:] + last_two[:1]
    for i in swap:
        if i.isupper():
            swap = swap.lower()
    return swap[0]+ " " +swap[1]

word = input(" ")
print(last_Letters(word))
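For what it's worth, the function itself does not seem to add a leading space. A likely source of the extra whitespace is input(" "), which prints its argument (a single space) as a prompt before the text you type. A minimal check, with the function condensed (lowercasing the swapped pair directly is equivalent to the loop above) and the prompt removed:

```python
def last_Letters(word):
    # Take the last two characters, swap them and lowercase them
    last_two = word[-2:]
    swap = (last_two[-1:] + last_two[:1]).lower()
    return swap[0] + " " + swap[1]

print(repr(last_Letters("APPLE")))  # 'e l' (no leading space from the function)

# Calling input() without a prompt avoids printing " " before the typed word:
# word = input()
# print(last_Letters(word))
```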
You can try the following function:
import re

def last_Letters(word):
    letters = re.sub(r'\d', '', word)
    if len(letters) > 1:
        return letters[-1].lower() + ' ' + letters[-2].lower()
    return None
It follows these steps:
- removes all the digits
- if at least two characters remain:
  - lowercases the last two characters
  - builds the required string by concatenating the last character, a space and the second-to-last character
  - returns that string
- otherwise, returns None
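Note that re.sub(r'\d', '', word) only removes digits, so other non-letters such as '-' or '!' would survive. A hedged variant that keeps only alphabetic characters (the function name is reused from the answer; treating "non string characters" as "anything str.isalpha() rejects" is an assumption):

```python
def last_Letters(word):
    # Keep only alphabetic characters instead of just dropping digits
    letters = [ch for ch in word if ch.isalpha()]
    if len(letters) > 1:
        # Last letter, a space, then the second-to-last letter, lowercased
        return letters[-1].lower() + ' ' + letters[-2].lower()
    return None

print(last_Letters('APPLE'))  # e l
print(last_Letters('App3e'))  # e p
print(last_Letters('Ap-e'))   # e p
```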
Here is a simpler way; this is what I would write:
text = input()
result = ' '.join(reversed([ch.lower() for ch in text if ch.isalpha()][-2:]))
print(result)
How this works:
[ch.lower() for ch in text] creates a list of lowercase characters from the iterable text
adding if ch.isalpha() filters out anything that isn't an alphabetic character
adding [-2:] selects the last two items of the resulting list
reversed() takes that sequence and returns an iterator over its elements in reverse order
' '.join(some_iterable) joins the characters together with a space between them
So result holds the last two alphabetic characters of text, lowercased, in reverse order and separated by a space.
Part of what makes Python so powerful and popular is that once you learn to read the syntax, the code tells you quite naturally what it is doing. If you read the statement out loud, it is self-describing.
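Wrapped in a function so it can be tried without input(), the same one-liner behaves as follows (the name last_two_reversed is hypothetical). Unlike the re-based answer, it returns a shorter string rather than None when fewer than two letters are present.

```python
def last_two_reversed(text):
    # Lowercase letters only, keep the last two, reverse them, join with a space
    return ' '.join(reversed([ch.lower() for ch in text if ch.isalpha()][-2:]))

print(last_two_reversed('APPLE'))     # e l
print(last_two_reversed('App3e'))     # e p
print(repr(last_two_reversed('a1')))  # 'a'
```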

Print a string without any other characters except letters, and replace the space with an underscore

I need to print a string using these rules:
The first letter should be uppercase and all other letters lowercase. Only the characters a-z and A-Z are allowed in the name; any other characters must be deleted (spaces and tabs are not allowed, and underscores are used instead), and the string must not be longer than 80 characters.
It seems to me that it is possible to do it somehow like this:
name = "hello2 sjsjs- skskskSkD"
string = name[0].upper() + name[1:].lower()
lenght = len(string) - 1
answer = ""
for letter in string:
    x = letter.isalpha()
    if x == False:
        answer = string.replace(letter,"")
........
return answer
I think it's better to use a for loop or isalpha() here, but I can't think of a good way to do it. Can someone tell me how to do this?
For one-to-one and one-to-None mappings of characters, you can use the .translate() method of strings. The string module provides constants (strings) for various classes of characters, including string.ascii_letters for all upper- and lowercase letters, but you could also use your own constant string such as 'abcdef....xyzABC...XYZ'.
import string

def cleanLetters(S):
    nonLetters = S.translate(str.maketrans('','',' '+string.ascii_letters))
    return S.translate(str.maketrans(' ','_',nonLetters))
Output:
cleanLetters("hello2 sjsjs- skskskSkD")
'hello_sjsjs_skskskSkD'
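translate() also accepts a dict that maps code points to replacement strings or to None (delete). A sketch that applies all of the question's rules this way (tabs become underscores too, the first letter is capitalised and the rest lowercased, and the result is cut to 80 characters); clean_name is a hypothetical name:

```python
import string

def clean_name(name):
    # Delete every character that is not an ASCII letter...
    table = {ord(ch): None for ch in set(name) if ch not in string.ascii_letters}
    # ...except spaces and tabs, which become underscores
    table[ord(' ')] = '_'
    table[ord('\t')] = '_'
    # capitalize() uppercases the first character and lowercases the rest
    return name.translate(table).capitalize()[:80]

print(clean_name("hello2 sjsjs- skskskSkD"))  # Hello_sjsjs_skskskskd
```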
One way to accomplish this is to use regular expressions (regex) via the built-in re module. This lets you capture only the valid characters and ignore the rest.
Then use basic string methods for the replacement and capitalisation, and finish with a slice to enforce the 80-character limit.
For example:
>>> import re
>>> name = 'hello2 sjsjs- skskskSkD'
>>> trans = str.maketrans({' ': '_', '\t': '_'})
>>> ''.join(re.findall(r'[a-zA-Z \t]', name)).translate(trans).capitalize()[:80]
'Hello_sjsjs_skskskskd'
Strings are immutable, so every call to string.replace() scans the entire string for characters to replace and creates a new string. (Your loop also calls replace() on the original string each time, so answer only reflects the last replacement.) Instead, you can iterate over the string once, collect the valid characters in a list, and then use str.join() to combine them.
answer_l = []
for letter in string:
    if letter == " " or letter == "\t":
        answer_l.append("_")  # Replace spaces or tabs with _
    elif letter.isalpha():
        answer_l.append(letter)  # Use alphabet characters as-is
    # else do nothing
answer = "".join(answer_l)
With string = 'hello2 sjsjs- skskskSkD', this gives answer = 'hello_sjsjs_skskskSkD'.
You could also write this with a generator expression instead of building the whole list and then joining it. First, define a function that returns "_" or the letter itself for the first two conditions, and an empty string otherwise:
def translate(letter):
    if letter == " " or letter == "\t":
        return "_"
    elif letter.isalpha():
        return letter
    else:
        return ""
Then,
answer = "".join(
    translate(letter) for letter in string
)
To enforce the 80-character limit, just take answer[:80]. Because of the way slices work in Python, this won't raise an error even when answer is shorter than 80 characters.
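Putting the pieces together with the question's remaining rules (capital first letter, lowercase rest, 80-character limit) might look like the sketch below. One assumption worth flagging: str.isalpha() also accepts non-ASCII letters such as 'é', so this version adds str.isascii() (Python 3.7+) to keep only a-z and A-Z. The function names translate_char and format_name are hypothetical.

```python
def translate_char(letter):
    # Spaces and tabs become underscores
    if letter == " " or letter == "\t":
        return "_"
    # Only ASCII letters (a-z, A-Z) are kept
    elif letter.isascii() and letter.isalpha():
        return letter
    # Everything else is dropped
    else:
        return ""

def format_name(name, max_len=80):
    cleaned = "".join(translate_char(letter) for letter in name)
    # Capitalise the first character, lowercase the rest, then enforce the limit
    return cleaned.capitalize()[:max_len]

print(format_name("hello2 sjsjs- skskskSkD"))  # Hello_sjsjs_skskskskd
print(format_name("café au lait"))             # Caf_au_lait
```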

Excluding ASCII characters

I'm creating a palindrome checker, and it works; however, I need a way to replace or remove punctuation in the given input. I'm trying to loop over chr(i) for i in range(32, 47) and substitute each of those characters with ''. The characters I need excluded are 32-47. I've tried using the string module, but I can only get it to exclude either spaces or punctuation, not both at the same time.
def is_palindrome_stack(string):
    s = ArrayStack()
    for character in string:
        s.push(character)
    reversed_string = ''
    while not s.is_empty():
        reversed_string = reversed_string + s.pop()
    if string == reversed_string:
        return True
    else:
        return False

def remove_punctuation(text):
    return text.replace(" ",'')
    exclude = set(string.punctuation)
    return ''.join(ch for ch in text if ch not in exclude)
That is because your function returns on its very first line: return text.replace(" ",'') exits immediately, so the punctuation filter below it never runs. Change that line to text = text.replace(" ", "") and it should work fine.
Also, the indentation is probably messed up in your post, maybe from copy-pasting.
Full method snippet:
import string

def remove_punctuation(text):
    text = text.replace(" ",'')
    exclude = set(string.punctuation)
    return ''.join(ch for ch in text if ch not in exclude)
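With the fixed helper, the palindrome check itself could be sketched as below. ArrayStack from the question is not shown, so this version swaps in slicing (s[::-1]) to reverse the string; is_palindrome is a hypothetical name, and the comparison is case-sensitive, as in the original.

```python
import string

def remove_punctuation(text):
    text = text.replace(" ", '')
    exclude = set(string.punctuation)
    return ''.join(ch for ch in text if ch not in exclude)

def is_palindrome(text):
    # Clean first, then compare the string with its reverse
    cleaned = remove_punctuation(text)
    return cleaned == cleaned[::-1]

print(is_palindrome("taco cat!"))           # True
print(is_palindrome("never odd, or even"))  # True
print(is_palindrome("no, melon"))           # False
```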
You could use str methods to get rid of unwanted characters in the following way:
import string

tr = ''.maketrans('','',' '+string.punctuation)

def remove_punctuation(text):
    return text.translate(tr)
txt = 'Point.Space Question?'
output = remove_punctuation(txt)
print(output)
Output:
PointSpaceQuestion
maketrans creates a translation table. It accepts three strings: the first and second must have equal length, and the n-th character of the first is replaced with the n-th character of the second; the third string lists the characters to remove. You only need to remove (not replace) characters, so the first two arguments are empty strings.
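If you specifically want the code points the question lists (32 through 47, i.e. the space followed by the punctuation marks from ! to /), the deletion string can be built from them directly. Note that range() excludes its end value, so range(32, 48) is needed to include 47. Characters such as '?' (63) fall outside this range and are kept, which is one reason string.punctuation is usually the better fit. A sketch, with remove_32_47 as a hypothetical name:

```python
# chr(32) .. chr(47) are the space and !"#$%&'()*+,-./
remove_32_47 = str.maketrans('', '', ''.join(chr(i) for i in range(32, 48)))

def remove_punctuation(text):
    return text.translate(remove_32_47)

print(remove_punctuation('Point.Space Question?'))  # PointSpaceQuestion?
```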

Removing Words that contain non-ascii characters using Python

I am using the following functions to strip out non-ASCII characters:
def removeNonAscii(s):
    return "".join(filter(lambda x: ord(x)<128, s))

def removeNonAscii1(s):
    return "".join(i for i in s if ord(i)<128)
I would now like to remove the entire word if it contains any non-ascii characters. I thought of measuring the length pre and post function application but I am confident that there is a more efficient way. Any ideas?
If you define the word based on spaces, something like this might work:
def containsNonAscii(s):
    return any(ord(i)>127 for i in s)

words = sentence.split()
cleaned_words = [word for word in words if not containsNonAscii(word)]
cleaned_sentence = ' '.join(cleaned_words)
Note that this will collapse repeated whitespace into just one space.
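On Python 3.7+, str.isascii() performs the same check as containsNonAscii (negated) without a helper function. A sketch with a made-up example sentence:

```python
sentence = "café costs 3 euros über alles"  # hypothetical example input

# Keep only the whitespace-separated words made entirely of ASCII characters
cleaned_sentence = ' '.join(word for word in sentence.split() if word.isascii())
print(cleaned_sentence)  # costs 3 euros alles
```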
The cleanest (but not necessarily the most efficient) way is to encode the word to bytes and attempt to decode those bytes as ASCII. If the attempt fails, the word contains non-ASCII characters:
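def is_ascii(w):
    try:
        # encode() produces UTF-8 bytes; decoding them as ASCII fails for any
        # byte >= 0x80, which raises UnicodeDecodeError (not UnicodeEncodeError)
        w.encode().decode("us-ascii")
        return True
    except UnicodeDecodeError:
        return False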
I came up with the following function. It removes all words that contain any printable ASCII character (\x20 to \x7E); the range can be adjusted as needed.
import re

def removeWordsWithASCII(s):
    return " ".join(filter(lambda x: not re.search(r'[\x20-\x7E]', x), s.split(' ')))

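For the question as asked (drop words that contain non-ASCII characters), the same filter approach works with the character class inverted to [^\x00-\x7F], i.e. anything outside the ASCII range. A hedged sketch; the function name is hypothetical:

```python
import re

def remove_words_with_non_ascii(s):
    # A word is dropped if any of its characters falls outside \x00-\x7F
    return " ".join(filter(lambda w: not re.search(r'[^\x00-\x7F]', w), s.split(' ')))

print(remove_words_with_non_ascii("café costs 3 euros"))  # costs 3 euros
```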