Python Telegram Bot Markdown symbol '[' or ']' - python

How do I send the symbols '[' and ']' with send_message when I use parse_mode = 'Markdown'? Currently the characters are replaced with a space.

In markdown, special characters can be escaped with a backslash
\[\]

The full list of characters you must escape in Telegram's MarkdownV2 mode is ('_', '*', '[', ']', '(', ')', '~', '`', '>', '#', '+', '-', '=', '|', '{', '}', '.', '!')
To apply Telegram-specific escaping, you can use PlainText from telegram-text:
from telegram_text import PlainText
element = PlainText("[text to escape]")
escaped_text = element.to_markdown()
escaped_text
'\\[text to escape\\]'
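If you would rather not add a dependency, the same escaping can be sketched with the standard library. This is only a minimal sketch, not part of any Telegram library: the helper name escape_markdown_v2 is made up for illustration, and it assumes the character list given above is the complete set for your parse mode.

```python
import re

# Characters from the list above that need a backslash in front of them.
SPECIAL_CHARS = "_*[]()~`>#+-=|{}.!"

# re.escape makes every character safe to use inside a regex character class.
_ESCAPE_PATTERN = re.compile("[" + re.escape(SPECIAL_CHARS) + "]")


def escape_markdown_v2(text):
    """Hypothetical helper: prefix each special character with a backslash."""
    return _ESCAPE_PATTERN.sub(lambda match: "\\" + match.group(0), text)


print(escape_markdown_v2("[text to escape]"))
```

The result is the string you would pass as the message text; anything you actually want rendered as formatting has to be left out of the escaped part.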

Related

How to remove my punctuation array from original text

I have a punctuation array like this
punctuation_data = [ '=' '+' '_' '-' ')' '(' '*' '&' '^' '%'
'SSSS' 'AAAA' 'wwww' '!' '~' '،']
and I have a text from which I want to remove this punctuation. I use this, but it's not working
list = [''.join(c for c in original_data if c not in punctuation_data)
for s in list]
Edit: Original post did not delete longer substrings. I included a function that loops through the punctuation data and deletes the substrings.
You need to separate the items in your list with commas. Also, don't use built-in names like list as variable names.
This will work:
punctuation_data = [ '=', '+', '_', '-', ')', '(', '*', '&', '^', '%',
'SSSS', 'AAAA', 'wwww', '!', '~', '،']
orig_string = ['3+5=8']
def delete_substrings(orig_sub_string, punctuation_data):
    # Remove every listed substring, including multi-character ones such as 'SSSS'.
    for element_to_delete in punctuation_data:
        orig_sub_string = orig_sub_string.replace(element_to_delete, "")
    return orig_sub_string
lst = [delete_substrings(orig_sub_string, punctuation_data) for orig_sub_string in orig_string]
print(lst)  # ['358']
Since you're trying to match a number of strings of varying lengths, it's best to use regex instead. Escape the strings with re.escape first so that they don't get interpreted as special characters in regex:
import re
punctuation_data = [ '=', '+', '_', '-', ')', '(', '*', '&', '^', '%', 'SSSS', 'AAAA', 'wwww', '!', '~', '،']
print(re.sub('|'.join(map(re.escape, punctuation_data)), '', 'abc*xyzAAAA123'))
This outputs:
abcxyz123
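One thing to watch with this approach: Python's re tries the alternatives from left to right and takes the first one that matches, not the longest. If one entry is a prefix of another, the order of the list changes the result. Here is a small sketch that sorts the entries longest-first before joining them; the function name remove_substrings and the sample data are only illustrative.

```python
import re


def remove_substrings(text, substrings):
    """Remove every listed substring, preferring the longest match."""
    # Longest entries first, so 'abc' is tried before its prefix 'ab'.
    ordered = sorted(substrings, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, ordered)))
    return pattern.sub("", text)


print(remove_substrings("abcd", ["ab", "abc"]))
print(remove_substrings("abc*xyzAAAA123", ["*", "AAAA"]))
```

For the punctuation_data list in the question the order happens not to matter, because none of the entries is a prefix of another.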
This worked for me:
original_data = 'What is hello'
punctuation_data = ['=', '+', '_', '-', ')', '(', '*', '&', '^', '%',
                    'SSSS', 'AAAA', 'wwww', '!', '~', '،']
# Drops only whole whitespace-separated words whose lowercased form matches an entry.
original_data = original_data.split()
resultwords = [word for word in original_data if
               word.lower() not in punctuation_data]
result = ' '.join(resultwords)
print(result)

Write metacharacters to a list

I am trying to put the regex metacharacters into a list:
In [1]: mc = ['^', '$', '[', ']', '{', '}', '-', '?', '*', '+', '(', ')', '|', '\']
When I press Enter, I get an error:
SyntaxError: EOL while scanning string literal
How to resolve the problem?
The problem is the backslash, which is an escape character. The correct representation of a single backslash would be '\\' or "\\".
While all the answers above seem to work, for readability it might be better to write
mc = list("^$[]{}-?*+()|\\")
This makes it much easier to see which characters are being used, reducing visual clutter at very little cost.
It should be:
mc = ['^', '$', '[', ']', '{', '}', '-', '?', '*', '+', '(', ')', '|', '\\']
You need to escape the final backslash \ with another one, as in the list above \\.
You need to escape the final backslash:
mc = ['^', '$', '[', ']', '{', '}', '-', '?', '*', '+', '(', ')', '|', '\\']
In your example, the backslash escapes the last quote, so it's not valid Python.
A backslash directly before a ' forms an escape sequence, so the quote no longer ends the string:
In [1]: mc = ['^', '$', '[', ']', '{', '}', '-', '?', '*', '+', '(', ')', '|', '\\']
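To double-check the fix, and to show one thing such a list is commonly used for, here is a short sketch. The variable names are my own, and the character-class use is only an assumption about what the list is for. Note that a raw string does not help for the last element: r'\' is also a syntax error, because a raw string literal cannot end in a single backslash.

```python
import re

# '\\' in source code is one backslash character in the resulting string.
mc = list("^$[]{}-?*+()|\\")

# Every element is exactly one character, including the final backslash.
assert all(len(ch) == 1 for ch in mc)

# re.escape makes the characters literal, so they can form a character class.
pattern = re.compile("[" + re.escape("".join(mc)) + "]")

cleaned = pattern.sub("", "a^b$c\\d")
print(cleaned)  # abcd
```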

python remove weird apostrophe and other weird characters not in string.punctuation [duplicate]

This is my string:
mystring = "How’s it going?"
This is what I did:
import string
exclude = set(string.punctuation)
def strip_punctuations(mystring):
    new_string = ''.join(ch for ch in mystring if ch not in exclude)
    new_string = new_string.replace("\xe2\x80\x99", "")
    new_string = new_string.replace("\xc2\xa0\xc2\xa0", "")
    return new_string
OUTPUT:
If I leave out the line that replaces "\xe2\x80\x99", this is the output:
'How\xe2\x80\x99s it going'
I realized that exclude does not contain that weird-looking apostrophe:
print set(exclude)
set(['!', '#', '"', '%', '$', "'", '&', ')', '(', '+', '*', '-', ',', '/', '.', ';', ':', '=', '<', '?', '>', '@', '[', ']', '\\', '_', '^', '`', '{', '}', '|', '~'])
How do I ensure all such characters are removed, instead of replacing them manually in the future?
If you are working with long texts such as news articles or scraped web pages, you can use the "goose" or "NLTK" Python libraries. Neither is pre-installed.
Their documentation explains how to use them.
OR
If you don't want to use these libraries, you can build your own "exclude" list manually:
import re
toReplace = "how's it going?"
# Raw string, so the backslashes reach the regex engine unchanged.
regex = re.compile(r"""[!#%$"&)'(+*,\-./;:=<?>\[\]_^`{}|~\\]""")
newVal = regex.sub('', toReplace)
print(newVal)
The regex matches every character in the set and replaces it with an empty string.
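A hand-written list will always miss something like the curly apostrophe from the question. Another option, sketched here for Python 3 where strings are Unicode, is to ask unicodedata for each character's category and drop everything whose category starts with "P" (punctuation). The function name is made up for illustration, and this swaps in a different technique from the regex above.

```python
import unicodedata


def strip_unicode_punctuation(text):
    """Drop every character that Unicode classifies as punctuation (P*)."""
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith("P")
    )


# "\u2019" is the curly apostrophe; "\xe2\x80\x99" is its UTF-8 encoding.
print(strip_unicode_punctuation("How\u2019s it going?"))  # Hows it going
```

Keep in mind that characters such as $, +, < and ~ are classified as symbols (categories starting with "S"), not punctuation, and the non-breaking space U+00A0 is a separator, so none of these are removed by this sketch. If you need them gone, extend the category check or handle them separately.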

Multiple symbols replace not working

I need to check a string for some symbols and replace them with a whitespace. My code:
string = 'so\bad'
symbols = ['•', '!', '"', '#', '$', '%', '&', '\'', '(', ')', '*', '+', ',', '-', '.', '/', ':', ';', '<', '>', '=', '?', '#', '[', ']', '\\', '^', '_', '`', '{', '}', '~', '|', '"', '⌐', '¬', '«', '»', '£', '$', '°', '§', '–', '—']
for symbol in symbols:
    string = string.replace(symbol, ' ')
print(string)
>> sad
Why does it replace a\b with nothing?
This is because \b is the ASCII backspace character:
>>> string = 'so\bad'
>>> print(string)
sad
You can find it, along with all the other escape sequences, in the Python Reference Manual.
To get the behavior you expect, escape the backslash character or use a raw string:
# Both keep a literal backslash, so the replacement loop yields 'so bad'
string = 'so\\bad'
string = r'so\bad'
The issue you are facing is that \ is used as an escape character.
\b is a special character (backspace).
Use a string literal with the prefix r.
With the r prefix, backslashes \ are treated as literal characters:
string = r'so\bad'
You are not replacing anything. "\b" is a backspace, which moves the cursor one step to the left.
Note that even if you omit the symbols list and the for symbol in symbols: loop, you will always get "sad" when you print string. This is because \b is a single ASCII control character: the backslash and the b are interpreted together as one escape sequence.
For a way to work around this, see the Stack Overflow answer to "How can I print out the string "\b" in Python".
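A quick way to see what is really stored in the string is to look at repr() and len() instead of the printed output. The variable names below are only for illustration.

```python
plain = 'so\bad'      # \b is ONE character: backspace (0x08)
escaped = 'so\\bad'   # backslash followed by the letter b
raw = r'so\bad'       # same content as escaped

print(repr(plain))                         # 'so\x08ad'
print(len(plain), len(escaped), len(raw))  # 5 6 6
print(escaped == raw)                      # True

# Only the escaped/raw versions contain a backslash to replace.
print(escaped.replace('\\', ' '))          # so bad
```

So the replace loop never sees a backslash in the original string; the terminal simply moves back over the "o" and overwrites it with the "a" when the backspace is printed.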

Repeatedly remove characters from string

>>> split=['((((a','b','+b']
>>> [ (w[1:] if w.startswith((' ','!', '#', '#', '$', '%', '^', '&', '*', "(", ")", '-', '_', '+', '=', '~', ':', "'", ';', ',', '.', '?', '|', '\\', '/', '<', '>', '{', '}', '[', ']', '"')) else w) for w in split]
['(((a','b','b']
I wanted ['a', 'b', 'b'] instead.
I want to write a function that repeats this operation, so that all the '(' characters are cleared from the start of each word. If my split list is longer, I want to clear every leading ((( from each word. I don't use replace because it would also change a '(' in the middle of a word.
E.g. if the '(' is in the middle of a word like 'aa(aa', I don't want to change this.
There is no need to repeat your expression; you are just not using the right tool. You are looking for the str.lstrip() method:
[w.lstrip(' !##$%^&*()-_+=~:\';,.?|\\/<>{}[]"') for w in split]
The method treats the string argument as a set of characters and does exactly what you tried to do in your code; repeatedly remove the left-most character if it is part of that set.
There is a corresponding str.rstrip() for removing characters from the end, and str.strip() to remove them from both ends.
Demo:
>>> split=['((((a', 'b', '+b']
>>> [w.lstrip(' !##$%^&*()-_+=~:\';,.?|\\/<>{}[]"') for w in split]
['a', 'b', 'b']
If you really needed to repeat an expression, you could just create a new function for that task:
def strip_left(w):
    while w.startswith((' ','!', '#', '#', '$', '%', '^', '&', '*', "(", ")", '-', '_', '+', '=', '~', ':', "'", ';', ',', '.', '?', '|', '\\', '/', '<', '>', '{', '}', '[', ']', '"')):
        w = w[1:]
    return w
[strip_left(w) for w in split]
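If typing out the character set by hand feels error-prone, string.punctuation from the standard library can stand in for it. This is a sketch under the assumption that you want all ASCII punctuation plus the space stripped, which is a slightly larger set than the one in the question; the names LEADING_CHARS and strip_leading are made up.

```python
import string

# All ASCII punctuation plus the space character.
LEADING_CHARS = string.punctuation + " "


def strip_leading(words, chars=LEADING_CHARS):
    """Strip the given characters from the start of each word only."""
    return [w.lstrip(chars) for w in words]


words = ['((((a', 'b', '+b', 'aa(aa', '(x)']
print(strip_leading(words))  # ['a', 'b', 'b', 'aa(aa', 'x)']
```

As the last two items show, a '(' in the middle of a word and punctuation at the end are left alone; only the leading run is removed.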
