Repeatedly remove characters from string - python

>>> split=['((((a','b','+b']
>>> [ (w[1:] if w.startswith((' ','!', '#', '#', '$', '%', '^', '&', '*', "(", ")", '-', '_', '+', '=', '~', ':', "'", ';', ',', '.', '?', '|', '\\', '/', '<', '>', '{', '}', '[', ']', '"')) else w) for w in split]
['(((a','b','b']
I wanted ['a', 'b', 'b'] instead.
I want to repeat the expression so that every leading '(' is removed from each word, however many there are and however long split is. I can't use replace because it would also change a '(' in the middle of a word.
E.g. if the '(' is in the middle of a word like 'aa(aa', I don't want to change this.

There is no need to repeat your expression; you are just not using the right tool. You are looking for the str.lstrip() method:
[w.lstrip(' !##$%^&*()-_+=~:\';,.?|\\/<>{}[]"') for w in split]
The method treats its string argument as a set of characters and does exactly what you tried to do in your code: it repeatedly removes the left-most character for as long as that character is in the set.
There is a corresponding str.rstrip() for removing characters from the end, and str.strip() to remove them from both ends.
Demo:
>>> split=['((((a', 'b', '+b']
>>> [w.lstrip(' !##$%^&*()-_+=~:\';,.?|\\/<>{}[]"') for w in split]
['a', 'b', 'b']
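To illustrate the three methods side by side, here is a minimal sketch. The word list and the short character set `chars` are made up for this example; they are not from the question. Note that a '(' in the middle of a word is never touched.

```python
# Hypothetical sample data: words with punctuation at the start, middle, and end
words = ['((((a', 'aa(aa', '+b)', '(c)']
chars = '()+'  # treated as a set of characters, not as a prefix or suffix

left = [w.lstrip(chars) for w in words]   # strip from the left only
right = [w.rstrip(chars) for w in words]  # strip from the right only
both = [w.strip(chars) for w in words]    # strip from both ends

print(left)   # ['a', 'aa(aa', 'b)', 'c)']
print(right)  # ['((((a', 'aa(aa', '+b', '(c']
print(both)   # ['a', 'aa(aa', 'b', 'c']
```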
If you really needed to repeat an expression, you could just create a new function for that task:
def strip_left(w):
    while w.startswith((' ','!', '#', '#', '$', '%', '^', '&', '*', "(", ")", '-', '_', '+', '=', '~', ':', "'", ';', ',', '.', '?', '|', '\\', '/', '<', '>', '{', '}', '[', ']', '"')):
        w = w[1:]
    return w

[strip_left(w) for w in split]

Related

How to split strings with multiple delimiters while keep the delimiters | python

For example, I have a string section 213(d)-456(c)
How can I split it to get a list of strings:
['section', '213', '(', 'd', ')', '-', '456', '(', 'c', ')'].
Thank you!
You can do so using Regex.
import re
text = "section 213(d)-456(c)"
output = re.split(r"(\W)", text)
Output: ['section', ' ', '213', '(', 'd', ')', '', '-', '456', '(', 'c', ')', '']
Here \W matches any non-word character. Note that the result still contains the space and some empty strings.
You can come close with
re.split(r'([-\s()])', 'section 213(d)-456(c)')
When the pattern contains a capture group, the result also includes the captured delimiters.
However, this will also include the space delimiter and some empty strings in the result:
['section', ' ', '213', '(', 'd', ')', '', '-', '456', '(', 'c', ')', '']
You can easily remove these afterward.
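One way to do that cleanup, as a minimal sketch: filter out every piece that is empty or whitespace-only. The variable names `parts` and `tokens` are just illustrative.

```python
import re

text = "section 213(d)-456(c)"

# The capture group keeps the delimiters in the result
parts = re.split(r"([-\s()])", text)

# Drop empty strings and whitespace-only pieces; str.strip() returns ''
# (which is falsy) for both
tokens = [p for p in parts if p.strip()]

print(tokens)
# ['section', '213', '(', 'd', ')', '-', '456', '(', 'c', ')']
```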

remove all the special chars from a list [duplicate]

I have a list of strings in which some items are special characters. What would be the approach to exclude them from the resulting list?
list = ['ben','kenny',',','=','Sean',100,'tag242']
expected output = ['ben','kenny','Sean',100,'tag242']
Please guide me on how to achieve this. Thanks
The string module provides string.punctuation, a string of the ASCII punctuation characters, which you can use to filter your list of words:
import string
punctuations = list(string.punctuation)
input_list = ['ben','kenny',',','=','Sean',100,'tag242']
output = [x for x in input_list if x not in punctuations]
print(output)
Output:
['ben', 'kenny', 'Sean', 100, 'tag242']
This list of punctuation marks includes the following characters:
['!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', ':', ';', '<', '=', '>', '?', '@', '[', '\\', ']', '^', '_', '`', '{', '|', '}', '~']
This can be done with the built-in str.isalnum() method. It returns True if the string is non-empty and contains only letters and digits, and False otherwise. No import is needed because it is a method of str.
code:
items = ['ben','kenny',',','=','Sean',100,'tag242']
olist = []
for a in items:
    # str() lets non-string items such as 100 be tested as well
    if str(a).isalnum():
        olist.append(a)
print(olist)
output:
['ben', 'kenny', 'Sean', 100, 'tag242']
my_list = ['ben', 'kenny', ',' ,'=' ,'Sean', 100, 'tag242']
stop_words = [',', '=']
filtered_output = [i for i in my_list if i not in stop_words]
The list with stop words can be expanded if you need to remove other characters.
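The approaches above differ on edge cases. A membership test against single punctuation characters misses items like '!!', and isalnum() also drops items such as 'a-b' that merely contain punctuation. A minimal sketch of a stricter filter follows; the helper name `is_punctuation_only` and the extended sample list are assumptions made for this example.

```python
import string

PUNCT = set(string.punctuation)

def is_punctuation_only(item):
    """Return True only for non-empty strings made up entirely of punctuation."""
    return (
        isinstance(item, str)
        and item != ""
        and all(ch in PUNCT for ch in item)
    )

# Hypothetical input: the list from the question plus two edge cases
items = ['ben', 'kenny', ',', '=', 'Sean', 100, 'tag242', '!!', 'a-b']
kept = [x for x in items if not is_punctuation_only(x)]

print(kept)
# ['ben', 'kenny', 'Sean', 100, 'tag242', 'a-b']
```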

Write metacharacters to a list

I am trying to put the regex metacharacters into a list:
In [1]: mc = ['^', '$', '[', ']', '{', '}', '-', '?', '*', '+', '(', ')', '|', '\']
When I press Enter, I get an error:
SyntaxError: EOL while scanning string literal
How to resolve the problem?
The problem is the backslash, which is an escape character. The correct representation of a single backslash would be '\\' or "\\".
While all the answers above seem to work, for readability it might be better to write
mc = list("^$[]{}-?*+()|\\")
This makes it much easier to see which characters are being used, reducing visual clutter at very little cost.
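A quick check, as a sketch, that the escaped backslash really is a single character in the resulting list. The character-class pattern built with re.escape is an extra illustration, not part of the original answers.

```python
import re

mc = list("^$[]{}-?*+()|\\")

print(len(mc))      # 14
print(mc[-1])       # \
print(len(mc[-1]))  # 1

# Illustrative use: build a character class that matches any metacharacter.
# re.escape() makes each character safe to place inside the [...] class.
pattern = "[" + "".join(re.escape(c) for c in mc) + "]"
found = re.findall(pattern, "a+b*(c)")
print(found)        # ['+', '*', '(', ')']
```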
It should be:
mc = ['^', '$', '[', ']', '{', '}', '-', '?', '*', '+', '(', ')', '|', '\\']
You need to escape the final backslash \ with another one, as in the list above \\.
You need to escape the final backslash:
mc = ['^', '$', '[', ']', '{', '}', '-', '?', '*', '+', '(', ')', '|', '\\']
In your example, the backslash is escaping the last quote, so it's not valid Python.
A backslash placed directly before the closing ' forms the escape sequence \', so the string literal is never closed. Escape the backslash instead:
In [1]: mc = ['^', '$', '[', ']', '{', '}', '-', '?', '*', '+', '(', ')', '|', '\\']

Python Telegram Bot Markdown symbol '[' or ']'

How can I get the symbols '[' and ']' through send_message when I use parse_mode = 'Markdown'? Right now it replaces the characters with a space.
In Markdown, special characters can be escaped with a backslash:
\[\]
In Telegram's MarkdownV2 parse mode, the list of characters you must escape is ('_', '*', '[', ']', '(', ')', '~', '`', '>', '#', '+', '-', '=', '|', '{', '}', '.', '!')
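As a minimal sketch using only the standard library, the characters in that list can be escaped with a single re.sub call. The helper name `escape_special` is hypothetical and not part of any Telegram library; it also assumes the text contains no backslashes of its own and no intentional formatting that should stay unescaped.

```python
import re

# The characters from the list above, as one string
SPECIAL_CHARS = "_*[]()~`>#+-=|{}.!"

# Character class matching any one of them; re.escape() keeps ']' and '-'
# from being read as class syntax
_SPECIAL_RE = re.compile("([" + re.escape(SPECIAL_CHARS) + "])")

def escape_special(text):
    """Prefix every listed special character with a backslash (hypothetical helper)."""
    return _SPECIAL_RE.sub(r"\\\1", text)

print(escape_special("[text to escape]"))  # \[text to escape\]
print(escape_special("version 1.0!"))      # version 1\.0\!
```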
For Telegram-specific escaping, you can use PlainText from the telegram-text package:
from telegram_text import PlainText
element = PlainText("[text to escape]")
escaped_text = element.to_markdown()
escaped_text
'\\[text to escape\\]'

Multiple symbols replace not working

I need to check a string for some symbols and replace them with a whitespace. My code:
string = 'so\bad'
symbols = ['•', '!', '"', '#', '$', '%', '&', '\'', '(', ')', '*', '+', ',', '-', '.', '/', ':', ';', '<', '>', '=', '?', '#', '[', ']', '\\', '^', '_', '`', '{', '}', '~', '|', '"', '⌐', '¬', '«', '»', '£', '$', '°', '§', '–', '—']
for symbol in symbols:
    string = string.replace(symbol, ' ')
print(string)
>> sad
Why does it replace a\b with nothing?
This is because \b is the ASCII backspace character:
>>> string = 'so\bad'
>>> print(string)
sad
You can find it and all the other escape sequences in the Python Reference Manual.
To get the behavior you expect, escape the backslash character or use a raw string:
# After the replacement loop, both result in 'so bad'
string = 'so\\bad'
string = r'so\bad'
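To see the difference without relying on how a terminal renders the backspace, compare the lengths and repr() of the three literals. This is a small sketch; the variable names are made up for the example.

```python
plain = 'so\bad'     # \b is ONE character: ASCII backspace (0x08)
escaped = 'so\\bad'  # a literal backslash followed by 'b'
raw = r'so\bad'      # same content as `escaped`

print(len(plain))    # 5
print(repr(plain))   # 'so\x08ad'
print(len(escaped))  # 6
print(escaped == raw)  # True

# Replacing the backslash now works as the question intended
print(escaped.replace('\\', ' '))  # so bad
```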
The issue you are facing is that \ is an escape character.
\b is a special character (backspace).
Use a string literal with the prefix r.
With the r prefix, backslashes \ are treated as literal characters:
string = r'so\bad'
You are not replacing anything. "\b" is a backspace, which moves your cursor one step to the left.
Note that even if you remove the symbols list and the for symbol in symbols: loop, you will still get "sad" when you print string. This is because the two characters \b in the literal are interpreted together as a single ASCII backspace character.
See this Stack Overflow answer for a way to work around the issue: How can I print out the string "\b" in Python
