Excluding ASCII characters

I'm creating a palindrome checker. It works, but I need a way to remove spaces and punctuation from the input before checking. The characters I need to exclude are ASCII codes 32 to 47, so my idea was to build chr(i) for i in range(32, 48) and replace each of those with ''. I've also tried the string module, but I can only get it to remove either spaces or punctuation, never both at the same time.
def is_palindrome_stack(string):
    s = ArrayStack()
    for character in string:
        s.push(character)
    reversed_string = ''
    while not s.is_empty():
        reversed_string = reversed_string + s.pop()
    if string == reversed_string:
        return True
    else:
        return False

def remove_punctuation(text):
    return text.replace(" ",'')
    exclude = set(string.punctuation)
    return ''.join(ch for ch in text if ch not in exclude)

That is because you return from the function on its very first line, return text.replace(" ",''), so the lines after it never run. Change it to text = text.replace(" ", "") and it should work fine.
Also, the indentation in your post is probably messed up, maybe from copy-pasting.
Full method snippet:
import string

def remove_punctuation(text):
    text = text.replace(" ",'')
    exclude = set(string.punctuation)
    return ''.join(ch for ch in text if ch not in exclude)

You can use str methods to get rid of unwanted characters in the following way:
import string
tr = ''.maketrans('','',' '+string.punctuation)
def remove_punctuation(text):
    return text.translate(tr)
txt = 'Point.Space Question?'
output = remove_punctuation(txt)
print(output)
Output:
PointSpaceQuestion
maketrans creates a translation table. It accepts three strings: the first and second must be of equal length, and the n-th character of the first is replaced with the n-th character of the second; the third lists the characters to remove. Here you only need to remove (not replace) characters, so the first two arguments are empty strings.
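The question's original idea, excluding exactly the code points 32 to 47, can be sketched with the same translation-table approach. The names EXCLUDED, TABLE, strip_excluded and is_palindrome are hypothetical, the lowercasing step is an assumption about what the checker should ignore, and slicing with [::-1] stands in for the ArrayStack class, which is not shown in the question.

```python
# Hypothetical names; the 32-47 range comes from the question.
EXCLUDED = ''.join(chr(i) for i in range(32, 48))  # space through '/'
TABLE = str.maketrans('', '', EXCLUDED)

def strip_excluded(text):
    """Delete every character whose code point is in 32..47."""
    return text.translate(TABLE)

def is_palindrome(text):
    # Assumption: case should be ignored as well as the excluded characters.
    cleaned = strip_excluded(text).lower()
    return cleaned == cleaned[::-1]

print(strip_excluded("Point.Space Question?"))  # PointSpaceQuestion?
print(is_palindrome("Was it a car, or a cat I saw"))  # True
```

Note that '?' survives because its code point is 63, outside the 32 to 47 range; string.punctuation covers the remaining ASCII punctuation.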

Related

Python return if statement

Unclear on how to frame the following function correctly:
I'm creating a function that takes a string and returns it in camel case without spaces (or Pascal case if the first letter was already capital), removing special characters:
text = "This-is_my_test_string,to-capitalize"
def to_camel_case(text):
    # Return 1st letter of text + all letters after
    return text[:1] + text.title()[1:].replace(i" ") if not i.isdigit()
# Output should be "ThisIsMyTestStringToCapitalize"
The "if" statement at the end isn't working. I wrote this somewhat experimentally, but with a syntax fix, could the logic work?

Provided the input string does not contain any spaces, you could do this:
from re import sub
def to_camel_case(text, pascal=False):
    r = sub(r'[^a-zA-Z0-9]', ' ', text).title().replace(' ', '')
    return r if pascal else r[0].lower() + r[1:]
ts = 'This-is_my_test_string,to-capitalize'
print(to_camel_case(ts, pascal=True))
print(to_camel_case(ts))
Output:
ThisIsMyTestStringToCapitalize
thisIsMyTestStringToCapitalize

Here is a short solution using regex. First it uses title() as you did, then the regex finds non-alphanumeric characters and removes them, and finally the first character of the original string is put back to handle Pascal / camel case.
import re
def to_camel_case(s):
    s1 = re.sub('[^a-zA-Z0-9]+', '', s.title())
    return s[0] + s1[1:]
text = "this-is2_my_test_string,to-capitalize"
print(to_camel_case(text)) # thisIs2MyTestStringToCapitalize

The code below should work for your example.
It splits the string on anything that isn't alphanumeric or whitespace, capitalizes each word, and returns the re-joined string.
import re
def to_camel_case(text):
    words = re.split(r'[^a-zA-Z0-9\s]', text)
    return "".join([word.capitalize() for word in words])
text_to_camelcase = "This-is_my_test_string,to-capitalize"
print(to_camel_case(text_to_camelcase))

Use re.split to split on anything that is not a letter, digit, or whitespace, and .capitalize() to capitalize each word:
import re
text_to_camelcase = "This-is_my_test_string,to-capitalize"
def to_camel_case(text):
    split_text = re.split(r'[^a-zA-Z0-9\s]', text)
    cap_string = ''
    for word in split_text:
        cap_word = word.capitalize()
        cap_string += cap_word
    return cap_string
print(to_camel_case(text_to_camelcase))
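As for the question's one-liner: it fails partly because a conditional expression needs an else branch and the name i is never bound. A minimal sketch of the stated goal (camel case, or Pascal case when the input already starts with a capital) might look like the following. It assumes that "special characters" means anything other than ASCII letters and digits, and that the first word should be left exactly as written.

```python
import re

def to_camel_case(text):
    # Split on runs of characters that are not ASCII letters or digits,
    # dropping empty pieces from leading or trailing separators.
    words = [w for w in re.split(r'[^a-zA-Z0-9]+', text) if w]
    if not words:
        return ''
    # Leave the first word untouched so its original leading case decides
    # between camel case and Pascal case.
    first, rest = words[0], words[1:]
    return first + ''.join(w[:1].upper() + w[1:] for w in rest)

print(to_camel_case("This-is_my_test_string,to-capitalize"))  # ThisIsMyTestStringToCapitalize
print(to_camel_case("this-is_my_test_string,to-capitalize"))  # thisIsMyTestStringToCapitalize
```

Unlike capitalize(), the w[:1].upper() + w[1:] form does not lowercase the rest of each word, so existing inner capitals are kept.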

Swap last two characters in a string, make it lowercase, and add a space

I'm trying to take the last two letters of a string, swap them, make them lowercase, and leave a space in the middle. For some reason the output gives me white space before the word.
For example, if the input was APPLE then the output should be e l
It would also be nice to ignore non-letter characters, so if the word was App3e then the output would be e p
def last_Letters(word):
    last_two = word[-2:]
    swap = last_two[-1:] + last_two[:1]
    for i in swap:
        if i.isupper():
            swap = swap.lower()
    return swap[0]+ " " +swap[1]
word = input(" ")
print(last_Letters(word))

You can try with the following function:
import re
def last_Letters(word):
    letters = re.sub(r'\d', '', word)
    if len(letters) > 1:
        return letters[-1].lower() + ' ' + letters[-2].lower()
    return None
It follows these steps:
- removes all the digits
- if there are at least two characters left:
  - lowercases the last two characters
  - builds the required string by concatenating the last letter, a space, and the second-to-last letter
  - returns the string
- otherwise returns None

Since I said there was a simpler way, here's what I would write:
text = input()
result = ' '.join(reversed([ch.lower() for ch in text if ch.isalpha()][-2:]))
print(result)
How this works:
- [ch.lower() for ch in text] creates a list of lowercase characters from some iterable text
- adding if ch.isalpha() filters out anything that isn't an alphabetical character
- adding [-2:] selects the last two from the preceding sequence
- reversed() takes the sequence and returns an iterable with the elements in reverse
- ' '.join(some_iterable) joins the characters in the iterable together with spaces in between
So, result is set to be the last two characters of all of the alphabetical characters in text, in reverse order, separated by a space.
Part of what makes Python so powerful and popular, is that once you learn to read the syntax, the code very naturally tells you exactly what it is doing. If you read out the statement, it is self-describing.
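For reuse, the same idea can be wrapped in a function. This is only a sketch: the name last_letters is hypothetical, and returning None when fewer than two letters remain is an assumption borrowed from the other answer, since the question does not say what should happen in that case.

```python
def last_letters(word):
    # Keep only alphabetic characters, lowercased.
    letters = [ch.lower() for ch in word if ch.isalpha()]
    if len(letters) < 2:
        return None  # assumption: nothing sensible to return
    # Last letter first, then the second-to-last, separated by a space.
    return letters[-1] + ' ' + letters[-2]

print(last_letters("APPLE"))  # e l
print(last_letters("App3e"))  # e p
```

Calling input() with no prompt, rather than input(" "), also avoids the leading space the question mentions.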

Print a string without any other characters except letters, and replace the space with an underscore

I need to print a string using these rules:
The first letter should be a capital and all other letters lowercase. Only the characters a-z and A-Z are allowed in the name; any other characters have to be deleted (spaces and tabs are not allowed, and underscores are used instead), and the string cannot be longer than 80 characters.
It seems to me that it is possible to do it somehow like this:
name = "hello2 sjsjs- skskskSkD"
string = name[0].upper() + name[1:].lower()
lenght = len(string) - 1
answer = ""
for letter in string:
    x = letter.isalpha()
    if x == False:
        answer = string.replace(letter,"")
........
return answer
I think it's better to use a for loop or isalpha() here, but I can't think of a better way to do it. Can someone tell me how to do this?

For one-to-one and one-to-None mappings of characters, you can use the .translate() method of strings. The string module provides lists (strings) of the various types of characters including one for all letters in upper and lowercase (string.ascii_letters) but you could also use your own constant string such as 'abcdef....xyzABC...XYZ'.
import string
def cleanLetters(S):
    nonLetters = S.translate(str.maketrans('','',' '+string.ascii_letters))
    return S.translate(str.maketrans(' ','_',nonLetters))
Output:
>>> cleanLetters("hello2 sjsjs- skskskSkD")
'hello_sjsjs_skskskSkD'

One method to accomplish this is to use regular expressions (regex) via the built-in re library. This enables the capturing of only the valid characters, and ignoring the rest.
Basic string tools then handle the replacement and capitalisation, with a slice at the end for the length limit.
For example:
import re
name = 'hello2 sjsjs- skskskSkD'
trans = str.maketrans({' ': '_', '\t': '_'})
print(''.join(re.findall(r'[a-zA-Z\s\t]', name)).translate(trans).capitalize()[:80])
Output:
Hello_sjsjs_skskskskd

Strings are immutable, so every time you do string.replace() it needs to iterate over the entire string to find characters to replace, and a new string is created. Instead of doing this, you could simply iterate over the current string and create a new list of characters that are valid. When you're done iterating over the string, use str.join() to join them all.
answer_l = []
for letter in string:
    if letter == " " or letter == "\t":
        answer_l.append("_") # Replace spaces or tabs with _
    elif letter.isalpha():
        answer_l.append(letter) # Use alphabet characters as-is
    # else do nothing
answer = "".join(answer_l)
With string = 'hello2 sjsjs- skskskSkD', we have answer = 'hello_sjsjs_skskskSkD'.
Now you could also write this using a generator expression instead of creating the entire list and then joining it. First, we define a function that returns the letter or "_" for our first two conditions, and an empty string for the else condition:
def translate(letter):
    if letter == " " or letter == "\t":
        return "_"
    elif letter.isalpha():
        return letter
    else:
        return ""
Then,
answer = "".join(
    translate(letter) for letter in string
)
To enforce the 80-character limit, just take answer[:80]. Because of the way slices work in Python, this won't throw an error even when the length of answer is less than 80.
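Putting the pieces together, a minimal sketch that applies all of the question's rules in one place could look like this. The name clean_name and the max_len parameter are hypothetical. It tests membership in string.ascii_letters rather than calling isalpha(), on the assumption that "a-z A-Z" in the question excludes accented and other non-ASCII letters.

```python
import string

def clean_name(name, max_len=80):
    kept = []
    for ch in name:
        if ch in " \t":
            kept.append("_")          # spaces and tabs become underscores
        elif ch in string.ascii_letters:
            kept.append(ch)           # plain a-z / A-Z are kept
        # anything else is dropped
    # capitalize() uppercases the first character and lowercases the rest;
    # the slice enforces the length limit without raising on short input.
    return "".join(kept).capitalize()[:max_len]

print(clean_name("hello2 sjsjs- skskskSkD"))  # Hello_sjsjs_skskskskd
```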

Cut a substring from the end until the first occurrence of a certain character

I have a string, let's say something like the one below:
abc$defg..hij/klmn
How can I get the substring that runs from the last character back until we encounter the $ sign? Note that $ could be any special character, and there could be other special characters in the string.
The output should be:
defg..hij/klmn
I am using Python 2.7 and above.

Here is an alternative method. It checks each character from the end until the special character is met.
text = "abc$defg..hij/klmn"
newstring = text[::-1]
output = ""
for character in newstring:
    if character != "$":
        output += character
    else:
        break
print(output[::-1])

You could use the split function:
your_string = "abc$defg..hij/klmn"
split_char = "$"
substring = your_string.split(split_char)[-1]

You'll need to first find the index of the first occurrence of that character and then slice from that index plus 1:
testStr = "abc$defg..hij/klmn"
try:
    index = testStr.index("$")
    start = index + 1
    print(testStr[start:])
except ValueError:
    print("Not in string")
Note: This returns a single string from after the first $ to the end. If you want multiple strings enclosed within $, the accepted answer works well.
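Since the question asks to walk back from the end until the separator is met, str.rpartition is another option worth sketching: it splits at the last occurrence in one call. The function name tail_after is hypothetical.

```python
def tail_after(text, sep="$"):
    # rpartition splits at the LAST occurrence of sep and returns
    # (head, sep, tail); if sep is absent it returns ('', '', text).
    head, found, tail = text.rpartition(sep)
    return tail

print(tail_after("abc$defg..hij/klmn"))  # defg..hij/klmn
print(tail_after("a$b$c"))               # c
print(tail_after("abc"))                 # abc
```

With a single $ in the string this gives the same result as the answers above; the difference only shows when the separator occurs more than once or not at all.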

Stripping Hex code from a plain text file in Python [duplicate]

I have a string. How do I remove all text after a certain character? (In this case ...)
The text after the ... will change, which is why I want to remove all characters after a certain one.

Split on your separator at most once, and take the first piece:
sep = '...'
stripped = text.split(sep, 1)[0]
You didn't say what should happen if the separator isn't present. Both this and Alex's solution will return the entire string in that case.

Assuming your separator is '...', but it can be any string.
text = 'some string... this part will be removed.'
head, sep, tail = text.partition('...')
>>> print(head)
some string
If the separator is not found, head will contain all of the original string.
The partition function was added in Python 2.5.
S.partition(sep) -> (head, sep, tail)
Searches for the separator sep in S, and returns the part before it,
the separator itself, and the part after it. If the separator is not
found, returns S and two empty strings.
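Because partition also returns the separator it found (or an empty string), it can tell you whether the separator was present at all, which covers the unspecified case mentioned in the split answer. A small sketch, with the hypothetical name before:

```python
def before(text, sep="..."):
    # The middle element is sep when it was found and '' when it was not.
    head, found, _tail = text.partition(sep)
    return head, bool(found)

print(before("some string... this part will be removed."))  # ('some string', True)
print(before("no separator here"))                           # ('no separator here', False)
```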

If you want to remove everything after the last occurrence of separator in a string I find this works well:
<separator>.join(string_to_split.split(<separator>)[:-1])
For example, if string_to_split is a path like root/location/child/too_far.exe and you only want the folder path, you can use "/".join(string_to_split.split("/")[:-1]) and you'll get
root/location/child
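A similar result can be sketched with str.rsplit, which splits from the right and so avoids re-joining the pieces. The name parent_path is hypothetical. One behavioural difference: when the separator is absent, the join version above yields an empty string, while this version returns the input unchanged.

```python
def parent_path(path, sep="/"):
    # maxsplit=1 means only the last separator is used as a split point.
    return path.rsplit(sep, 1)[0]

print(parent_path("root/location/child/too_far.exe"))  # root/location/child
print(parent_path("no_separator"))                      # no_separator
```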

Without a regular expression (which I assume is what you want):
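def remafterellipsis(text):
    where_ellipsis = text.find('...')
    if where_ellipsis == -1:
        return text
    # The + 3 keeps the '...' itself; drop it to cut the ellipsis as well.
    return text[:where_ellipsis + 3]
or, with a regular expression:
import re
def remwithre(text, there=re.compile(re.escape('...')+'.*')):
    # Unlike the version above, this also removes the '...' itself.
    return there.sub('', text)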

import re
test = "This is a test...we should not be able to see this"
res = re.sub(r'\.\.\..*',"",test)
print(res)
Output: "This is a test"

The index method returns the position of a substring in a string (and raises ValueError if it is not found). Then, if you want to remove everything from that character onward, do this:
>>> mystring = "123⋯567"
>>> mystring[0 : mystring.index("⋯")]
'123'
If you want to keep the character, add 1 to the position.

From a file:
sep = '...'
with open("requirements.txt") as file_in:
    for line in file_in:
        # Drop the trailing newline, then keep only the part before sep.
        res = line.rstrip("\n").split(sep, 1)[0]
        print(res)

This works for me in Python 3.7.
In my case I needed to remove everything after the dot in my variable fees. Since fees is a number, it has to be converted to a string before it can be split:
fees = 45.05
split_string = str(fees).split(".", 1)
substring = split_string[0]
print(substring)

Yet another way to remove all characters after the last occurrence of a character in a string (assume that you want to remove all characters after the final '/').
path = 'I/only/want/the/containing/directory/not/the/file.txt'
# Stop on an empty string too, so a path without '/' cannot raise IndexError.
while path and path[-1] != '/':
    path = path[:-1]

Another easy way, using re:
import re
text = 'some string... this part will be removed.'
text = re.search(r'(\A.*)\.\.\..+', text, re.DOTALL|re.IGNORECASE).group(1)
# text is now 'some string'
