How to get formatted text with entities?

I need some help with Telethon:
I have the text of a Telethon message. Example: 'some text message'
And I have an entity. Example: {"_": "MessageEntityBold", "offset": 5, "length": 4}
I need a method or tip to get formatted text like this: '''some <b>text</b> message'''

Message.text returns the text formatted using the client's current parse mode. By default, this is Telegram's markdown, so the following code would print some **text** message:
print(message.text)
Note that because this currently relies on client.parse_mode, you cannot use the .text property on messages returned by the raw API, since those results are not modified. Instead, the message must be fetched with a friendly method or through events.
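If only the raw text and entity dictionaries are available, as in the question, the tags can also be inserted by hand. The following is a minimal sketch, not Telethon's own implementation: the TAGS mapping and the apply_entities helper are hypothetical names, and it assumes the entities do not overlap. Telegram counts entity offsets in UTF-16 code units, so the sketch slices the UTF-16 encoding of the text rather than the Python string.

```python
import html

# Hypothetical mapping from entity type names to HTML tags (illustration only).
TAGS = {
    "MessageEntityBold": "b",
    "MessageEntityItalic": "i",
    "MessageEntityCode": "code",
}


def apply_entities(text, entities):
    """Wrap the spans described by non-overlapping entities in HTML tags."""
    # Offsets and lengths are in UTF-16 code units: 2 bytes each in utf-16-le.
    raw = text.encode("utf-16-le")
    out = []
    pos = 0
    for ent in sorted(entities, key=lambda e: e["offset"]):
        tag = TAGS.get(ent["_"])
        if tag is None:
            continue  # entity types missing from the mapping are left as plain text
        start = ent["offset"] * 2
        end = start + ent["length"] * 2
        # Escape the plain text before the entity, then the wrapped span.
        out.append(html.escape(raw[pos:start].decode("utf-16-le")))
        inner = html.escape(raw[start:end].decode("utf-16-le"))
        out.append("<{0}>{1}</{0}>".format(tag, inner))
        pos = end
    out.append(html.escape(raw[pos:].decode("utf-16-le")))
    return "".join(out)


entity = {"_": "MessageEntityBold", "offset": 5, "length": 4}
print(apply_entities("some text message", [entity]))  # some <b>text</b> message
```

Inside Telethon, switching the client's parse mode to HTML is probably the simpler route; the sketch is only for data that has already left the client.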

Related

How to send a direct message with Twython?

I know this is a beginner question, but can someone please provide some sample code for sending a Twitter direct message (just text) with Twython? I can't seem to find much specific documentation on this (I know it's briefly covered in the official docs, but they aren't very clear to me). Thank you!

Solution
twitter.send_direct_message(event={
    "type": "message_create",
    "message_create": {
        "target": {"recipient_id": "ID goes here"},  # replace with the recipient's user ID
        "message_data": {"text": "Hello World!"},
    },
})
Explanation
In short, you take the raw JSON data that you would send as a POST request to Twitter and pass it as a parameter to the twitter.send_direct_message() function. To use the JSON as a parameter in Python, we must express it as a dictionary: the parent object's key becomes the keyword argument name, and what follows becomes its value. So, in my case, the JSON:
{"event": {
    "type": "message_create",
    "message_create": {
        "target": {"recipient_id": "ID goes here"},
        "message_data": {"text": "Hello World!"}
    }
}}
becomes:
event = {
    "type": "message_create",
    "message_create": {
        "target": {"recipient_id": "ID goes here"},
        "message_data": {"text": "Hello World!"},
    },
}
More information about what JSON data to send to Twitter for specific direct message requests can be found in Twitter's API documentation.
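The JSON-to-keyword-argument mapping described above can be sketched without touching the network. Here send_direct_message is a hypothetical stand-in that only echoes its keyword arguments; it is not Twython's method, and the recipient ID is a placeholder.

```python
import json

# The raw JSON body, as it would be sent in the POST request.
raw_json = """
{"event": {"type": "message_create",
           "message_create": {"target": {"recipient_id": "ID goes here"},
                              "message_data": {"text": "Hello World!"}}}}
"""


def send_direct_message(**params):
    """Hypothetical stand-in for the real call: returns what it received."""
    return params


payload = json.loads(raw_json)

# The single top-level key ("event") becomes the keyword argument name,
# and the nested object becomes its value.
received = send_direct_message(**payload)
print(list(received))  # ['event']
print(received["event"]["message_create"]["message_data"]["text"])  # Hello World!
```

Unpacking with ** keeps the Python call in step with the JSON structure, so the dictionary never has to be retyped by hand.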

How should I use parse_mode='HTML' in telegram python bot?

I'm trying to send a message in a channel with a bot, using Telegram API's send_photo() method. It takes a caption parameter (type String) but I can't format it through parse_mode='HTML' parameter...
If I use something like this:
send_photo(chat_id, photo, caption="<b>Some text</b>", parse_mode='HTML')
it sends the message but without any kind of formatting. Does anybody know why? Thanks

First, you need to import ParseMode from telegram like this:
from telegram import ParseMode
Then, all you need is to specify parse_mode=ParseMode.HTML. Here's a working example:
def jordan(bot, update):
    chat_id = update.message.chat.id
    with open('JordanPeterson.jpg', 'rb') as jordan_picture:
        caption = "<a href='https://twitter.com/jordanbpeterson'>Jordan B. Peterson</a>"
        bot.send_photo(
            chat_id,
            photo=jordan_picture,
            caption=caption,
            parse_mode=ParseMode.HTML
        )
And it works: the caption is sent as a clickable link.
Update: Actually, both parse_mode='html' (as suggested by @slackmart) and parse_mode='HTML', which you used yourself, work for me!
Another Update (as per your comment): You can use multiple tags. Here's an example of one, with hyperlink, bold, and italic:
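A caption that combines several tags might be built as in the sketch below. build_caption is a hypothetical helper and the bold and italic texts are made up; the resulting string would be passed as caption together with parse_mode=ParseMode.HTML. The tags are kept side by side rather than nested, since nesting support is not confirmed here.

```python
import html


def build_caption(name, url, bold_text, italic_text):
    """Return an HTML caption with a hyperlink, a bold part and an italic part."""
    return '<a href="{}">{}</a> <b>{}</b> <i>{}</i>'.format(
        html.escape(url),          # escape so the attribute value stays well-formed
        html.escape(name),
        html.escape(bold_text),
        html.escape(italic_text),
    )


caption = build_caption(
    "Jordan B. Peterson",
    "https://twitter.com/jordanbpeterson",
    "bold text",
    "italic text",
)
print(caption)
```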
Yet Another Update: Regarding your comment:
...do I have any limitations on HTML tags? I can't use something like <img> or <br> to draw a line
Honestly, that's what I did!
You're trying to format the caption of an image using HTML, which means you're formatting text, so you can't use "something like <img>." It has to be a text-formatting tag (plus <a>). And not even all of them! I believe you can only use these: <a>, <b>, <strong>, <i> and <em>.
If you try to use a text-formatting tag like <del>, it will give you this error:
Can't parse entities: unsupported start tag "del" at byte offset 148
Which is a shame! I'd love to be able to use formatting like that in image captions.

It works for me! Here's the code I'm using:
>>> from telegram import Bot
>>> tkn = '88888:199939393'; chid = '-31828'
>>> bot = Bot(tkn)
>>> with open('ye.jpeg', 'rb') as fme:
...     bot.send_photo(chid, fme, caption='<b>Hallo</b>', parse_mode='html')
...
<telegram.message.Message object at 0x7f6301b44d10>
Of course, you must use your own Telegram token and channel ID. Also note that I'm using lowercase parse_mode='html'.

Parsing an email message body

I'm using the Gmail API to parse the bodies of my Gmail messages. It works except when the body is HTML. Does anyone know how I can extract just the text of the email? If not, how can I simply ignore emails that contain HTML?
Eventually I want to use this for personal/professional emails, which likely won't contain HTML.
import base64
import email

def message_converter(message_id):
    # `service` is an authorized Gmail API client created elsewhere.
    message = service.users().messages().get(userId='me', id=message_id, format='raw').execute()
    msg_str = str(base64.urlsafe_b64decode(message['raw'].encode('ASCII')), 'UTF-8')
    mime_msg = email.message_from_string(msg_str)
    if mime_msg.is_multipart():
        for payload in mime_msg.get_payload():
            # if payload.is_multipart(): ...
            print(payload.get_payload())
    else:
        print(mime_msg.get_payload())

html2text does a pretty good job - it converts HTML into ASCII text.
You may need to do additional parsing/formatting after the fact, however.
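Another option, sketched here with the standard library only, is to prefer the text/plain part of the message and fall back to a crude tag-stripping pass when only HTML is present. extract_text and html_to_text are hypothetical helper names, and the tag stripping is far simpler than what html2text does.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(markup):
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks).strip()


def extract_text(mime_msg):
    """Return the text/plain body if there is one, else the HTML body without tags."""
    html_body = None
    for part in mime_msg.walk():
        ctype = part.get_content_type()
        if ctype not in ("text/plain", "text/html"):
            continue  # skips multipart containers and attachments
        charset = part.get_content_charset() or "utf-8"
        body = part.get_payload(decode=True).decode(charset, errors="replace")
        if ctype == "text/plain":
            return body.strip()
        if html_body is None:
            html_body = body
    return html_to_text(html_body) if html_body is not None else ""


# Build a small HTML-only message to try it out.
msg = MIMEMultipart("alternative")
msg.attach(MIMEText("<p>Hello <b>world</b></p>", "html"))
html_only = message_from_string(msg.as_string())
print(extract_text(html_only))  # Hello world
```

With the question's code, the mime_msg object returned by email.message_from_string could be passed to extract_text directly.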

I don't know if this helps, but the Gmail API has the same structure in every language. In C#, you can find the message text in three places, depending on the mail server:
msg.Payload.Parts[1].Body.Data; // the message text without HTML tags
msg.Payload.Parts[0].Body.Data; // the message text with HTML tags
msg.Payload.Body.Data; // here there is no choice: the text includes the HTML tags

This answer may help with what you are trying to do. As I understand it, you want to pull certain text out of the body of an email. You can use regular expressions for that. I made a video explaining how to get data out of a Gmail email body, though it uses Google Apps Script (JavaScript):
https://youtu.be/nI1OH3pAz6s?t=8
You can download the code from this GitHub link:
https://gist.github.com/MoayadAbuRmilah/5835369fdebbecf980029f7339e4d769

Why are Boolean values in request.POST in Unicode format?

(1) I designed an API application. Some of its arguments are expected to receive Boolean data.
Example:
def hello(request):
    # request.POST.viewitems()
    # {u'is_logined': u'False', u'user': u'hello'}
    user_name = request.POST.get("user", "")  # "hello"
    is_logined = request.POST.get("is_logined", "")  # "False"
This is how I send the request:
import requests
url = "http://127.0.0.1:8000/test"
aaa = {"user": "hello",
       "is_logined": False}
res = requests.post(url, data=aaa)
I expected the argument to arrive as a Boolean, but it arrives as a Unicode string.
Does anyone know why it is in Unicode format?
(2) I have another question. If a Java program accesses my API, I know Booleans in Java are written false and true.
When my API receives that Boolean data, will it still be a Unicode string, false or true?

With the POST method, the browser bundles up the form data, encodes it for transmission, sends it to the server, and then receives the response.
I guess you are sending the data as a form. Form data only supports text and file values, which is why what you receive is text. If you know the app called Postman, you can try it there.
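The effect can be reproduced with the standard library alone. When a dictionary is form-encoded, every value is converted to its string form, so False travels as the text False and comes back as a string. The sketch below uses urllib.parse to stand in for both sides of the request; parse_bool and TRUE_STRINGS are hypothetical names, and the set of accepted spellings is an assumption to adapt to your API.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical set of spellings treated as true (compared in lower case).
TRUE_STRINGS = {"true", "1", "yes", "on"}


def parse_bool(value):
    """Interpret a form field string as a Boolean, ignoring case."""
    return value.strip().lower() in TRUE_STRINGS


# Client side: form encoding turns every value into text.
body = urlencode({"user": "hello", "is_logined": False})
print(body)  # user=hello&is_logined=False

# Server side: the decoded fields are all strings.
fields = {key: values[0] for key, values in parse_qs(body).items()}
print(repr(fields["is_logined"]))  # 'False'

print(parse_bool(fields["is_logined"]))  # False
print(parse_bool("true"))  # True
```

Comparing in lower case means a client that sends true or false, as a Java program typically would, is handled the same way as one that sends True or False.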

Not a straightforward answer to your question, but this is one of the reasons why you should use forms to process posted data. Through the forms API you get the "right" Python object type.
The other reasons are mainly about security.

Extract JSON data from a POST request body with Python

Is there a way to easily extract the json data portion in the body of a POST request?
For example, if someone posts to www.example.com/post with the body of the form with json data, my GAE server will receive the request by calling:
jsonstr = self.request.body
However, when I look at the jsonstr, I get something like :
str: \r\n----------------------------8cf1c255b3bd7f2\r\nContent-Disposition: form-data; name="Actigraphy"\r\n Content-Type: application/octet-stream\r\n\r\n{"Data":"AfgCIwHGAkAB4wFYAZkBKgHwAebQBaAD.....
I just want to be able to call a function to extract the json part of the body which starts at the {"Data":...... section.
Is there an easy function I can call to do this?

There is a misunderstanding: the string you show us is not JSON data; it looks like a multipart POST body. You have to parse the body with something like cgi.parse_multipart.
Then you can parse the JSON as in aschmid00's answer, but parse only the extracted data instead of the whole body.
cgi.FieldStorage can also be used to parse the POST body.
This question has also been answered elsewhere.
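As a sketch of that two-step approach, the multipart body can be split into parts first and only the matching part handed to json.loads. This version swaps in the standard library's email parser instead of the cgi module, and it assumes the request's Content-Type header (which carries the boundary) is available. json_from_multipart, the boundary string and the sample payload are all made up for illustration.

```python
import json
from email.parser import BytesParser
from email.policy import default


def json_from_multipart(body, content_type, field):
    """Parse a multipart/form-data body and decode the named field as JSON."""
    # Prepend the Content-Type header so the parser sees a complete MIME document.
    header = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    message = BytesParser(policy=default).parsebytes(header + body)
    for part in message.iter_parts():
        # The field name lives in the part's Content-Disposition header.
        if part.get_param("name", header="content-disposition") == field:
            return json.loads(part.get_payload(decode=True))
    raise KeyError(field)


# Made-up boundary and payload that mimic the shape of the body in the question.
boundary = "boundary8cf1c255b3bd7f2"
body = (
    b"--" + boundary.encode("ascii") + b"\r\n"
    b'Content-Disposition: form-data; name="Actigraphy"\r\n'
    b"Content-Type: application/octet-stream\r\n"
    b"\r\n"
    b'{"Data": "AfgCIwHGAkAB"}\r\n'
    b"--" + boundary.encode("ascii") + b"--\r\n"
)

result = json_from_multipart(body, "multipart/form-data; boundary=" + boundary, "Actigraphy")
print(result["Data"])  # AfgCIwHGAkAB
```

If a web framework is in use, its own form parsing is usually the better choice; the sketch only shows what that parsing has to do.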

It depends on how it was encoded on the browser side before submitting, but normally you would get the POST data like this:
jsonstr = self.request.POST["Data"]
If that's not working you might want to give us some info on how "Data" was encoded into the POST data on the client side.

You can try:
import json
values = 'random stuff .... \r\n {"data":{"values":[1,2,3]}} more random things'
# Slice from the first '{' to the last '}' and parse that as JSON.
json_value = json.loads(values[values.index('{'):values.rindex('}') + 1])
print(json_value['data'])  # {'values': [1, 2, 3]}
print(json_value['data']['values'])  # [1, 2, 3]
But this is dangerous and makes a fair number of assumptions. I'm not sure which framework you are using (Bottle, Flask, there are many); please use the framework's own call
for retrieving POST values, if you are indeed using one.
If you are using GAE by itself, I think you mean to do this: self.request.get("Data")
https://developers.google.com/appengine/docs/python/tools/webapp/requestclass#Request_get
https://developers.google.com/appengine/docs/python/tools/webapp/requestclass#Request_get_all
