simplejson dumps and multi lines - python

I have a small question.
I use simplejson to dump a string.
The string contains newline characters (\n),
so when I print it on the server side, I get something like this:
toto
tata
titi
I want it to display the same way on the client side (HTML).
So I simply did:
return json.dumps(data.replace('\n','<br />'))
It works, but I don't think it's the right way to do it.
Is there another method?
Thanks.

I don't know the specifics of your situation, so maybe this is fine, but in general I'd recommend that you replace \n in the client, not on the server side. If someone wants to use your JSON API from a non-HTML client, having <br> in the data will be pretty annoying, and they'll just have to parse it back out. The server should convey the actual data, and the client should be responsible for turning that into information relevant to its user, including changing the formatting or markup if necessary.
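A minimal sketch of that split, using the standard-library json module (simplejson exposes the same dumps/loads interface). The render_as_html helper is hypothetical and only stands in for whatever an HTML client would do on its side; the server sends the string untouched.

```python
import html
import json

def serialize(data):
    """Server side: send the text as-is; JSON escapes the newlines itself."""
    return json.dumps(data)

def render_as_html(payload):
    """Hypothetical HTML client: decode, escape markup, then add <br /> tags."""
    text = json.loads(payload)
    return html.escape(text).replace("\n", "<br />")

payload = serialize("toto\ntata\ntiti")
print(payload)                  # the newlines travel as \n escapes inside the JSON string
print(render_as_html(payload))
```

A non-HTML client would call json.loads and get the original newlines back, with nothing to strip.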


Getting additional input from SlackClient using python

One of the issues I'm running into is getting additional data from a command in Slack. I don't want to use slash commands because I can't expose my localhost to the world.
Example:
#mybot do
Will return, let's say, "I'm doing something". However, I want to be able to do something like
#mybot do 2
where 2 is a parameter in the back end. Basically, what I'm trying to do is let the user say #mybot do 2 and have it get data from the database where the ID is 2. You could make it a 3, 4, 5, etc., and the command will pull the matching information from the database. I have found how to make it match the exact "do" command, but I can't get it to read the data that follows. I was following this tutorial. Any help would be great.
You need to use regular expressions to extract arguments from text. I hope this will help.
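import re

# EXAMPLE_COMMAND and slack_client are defined elsewhere in the bot.
def handle_command(command, channel):
    # Default reply when the command is not recognised.
    response = ("Not sure what you mean. Use the *" + EXAMPLE_COMMAND +
                "* command with numbers, delimited by spaces.")
    # Capture the first whitespace-delimited token after "do".
    match = re.match(r"do (?P<arg>\S+)", command)
    if match:
        # group('arg') returns the captured string; groupdict() would return a dict.
        arg = match.group('arg')
        response = "Wow! My argument is: " + arg
    slack_client.api_call("chat.postMessage", channel=channel,
                          text=response, as_user=True)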
How to get "additional information"
You will get the complete input string in the text property, e.g. "do 2". All you need to do is split the string into words. I am not a Python developer, but apparently split() will do the job.
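For what it's worth, here is a small sketch of the split() approach in Python. The parse_command helper and the record_id name are made up for illustration; the only assumption is that the text property arrives as a plain string such as "do 2".

```python
def parse_command(text):
    """Split a message such as "do 2" into a command name and its arguments."""
    words = text.split()  # splits on any run of whitespace
    if not words:
        return None, []
    return words[0], words[1:]

name, args = parse_command("do 2")
# Only treat the first argument as a database ID if it is all digits.
record_id = int(args[0]) if args and args[0].isdigit() else None
print(name, record_id)
```

Checking isdigit() before calling int() keeps a message like "do something" from raising an exception.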
Exposing your localhost
I would strongly recommend going ahead and exposing your localhost through a secure tunnel. It makes development so much easier. You can use ngrok to securely expose your localhost to Slack.
"Don't want to use slash commands"
You will always need an app (e.g. a Python script) on an exposed host for any custom functionality to work with Slack. Actually, slash commands are easier to implement than the Events API and RTM, so I would recommend them for your case.

Py3 imaplib: get only immediate body (no reply) of email [duplicate]

There are two pre-existing questions on the site.
One for Python, one for Java.
Java How to remove the quoted text from an email and only show the new text
Python Reliable way to only get the email text, excluding previous emails
I want to be able to do pretty much exactly the same (in PHP). I've created a mail proxy, where two people can correspond with each other by emailing a unique email address.
The problem I am finding, however, is that when a person receives the email and hits reply, I am struggling to accurately capture the text that they have written and discard the quoted text from previous correspondence.
I'm trying to find a solution that will work for both HTML emails and plaintext emails, because I am sending both.
I also have the ability, if it helps, to insert a <*****RESPOND ABOVE HERE*******> tag in the emails if necessary, meaning that I can discard everything below it.
What would you recommend I do? Always add that tag to the HTML copy and the plaintext copy then grab everything above it?
I would still then be left with the scenario of knowing how each mail client creates the response. Because for example Gmail would do this:
On Wed, Nov 2, 2011 at 10:34 AM, Message Platform <35227817-7cfa-46af-a190-390fa8d64a23#dev.example.com> wrote:
## In replies all text above this line is added to your message conversation ##
Any suggestions or recommendations of best practices?
Or should I just grab the 50 most popular mail clients and start creating a custom regex for each? Then, for each of these clients, there are also a bazillion different locale settings, since I'm guessing the user's locale will also influence what is added.
Or should I just always remove the preceding line if it contains a date, etc.?
Unfortunately, you're in for a world of hurt if you want to try to clean up emails meticulously (removing everything that's not part of the actual reply email itself). The ideal way would be to, as you suggest, write up regex for each popular email client/service, but that's a pretty ridiculous amount of work, and I recommend being lazy and dumb about it.
Interestingly enough, even Facebook engineers have trouble with this problem, and Google has a patent on a method for "Detecting quoted text".
There are three solutions you might find acceptable:
Leave It Alone
The first solution is to just leave everything in the message. Most email clients do this, and nobody seems to complain. Of course, online message systems (like Facebook's 'Messages') look pretty odd if they have inception-style replies. One sneaky way to make this work okay is to render the message with any quoted lines collapsed, and include a little link to 'expand quoted text'.
Separate the Reply from the Older Message
The second solution, as you mention, is to put a delineating message at the top of your messages, like --------- please reply above this line ----------, and then strip that line and anything below when processing the replies. Many systems do this, and it's not the worst thing in the world... but it does make your email look more 'automated' and less personal (in my opinion).
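A rough Python sketch of the "reply above this line" approach, for illustration only. The delimiter text is the one from the paragraph above; the helper name and the sample reply are made up.

```python
DELIMITER = "--------- please reply above this line ----------"

def text_above_delimiter(body, delimiter=DELIMITER):
    """Return the part of `body` that precedes the first line containing the delimiter."""
    kept = []
    for line in body.splitlines():
        if delimiter in line:
            break  # everything from here down is the older message
        kept.append(line)
    return "\n".join(kept).strip()

reply = (
    "Sounds good, see you Friday.\n"
    "\n"
    "> --------- please reply above this line ----------\n"
    "> Original message text\n"
)
print(text_above_delimiter(reply))
```

Matching with `in` instead of equality lets the delimiter be found even when the mail client has prefixed it with ">". An attribution line such as "On ... wrote:" that the client inserts above the delimiter would still survive and needs separate handling.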
Strip Out Quoted Text
The last solution is to simply strip out any new line beginning with a >, which is, presumably, a quoted line from the reply email. Most email clients use this method of indicating quoted text. Here's some regex (in PHP) that would do just that:
$clean_text = preg_replace('/(^\w.+:\n)?(^>.*(\n|$))+/mi', '', $message_body);
There are some problems using this simpler method:
Many email clients also allow people to quote earlier emails, and preface those quote lines with > as well, so you'll be stripping out quotes.
Usually, there's a line above the quoted email with something like On [date], [person] said. This line is hard to remove, because it's not formatted the same among different email clients, and it may be one or two lines above the quoted text you removed. I've implemented this detection method, with moderate success, in my PHP Imap library.
Of course, testing is key, and the tradeoffs might be worth it for your particular system. YMMV.
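For comparison, a simple Python line filter in the same spirit. This is not a port of the PHP regex above: it only drops lines that start with ">", and the sample message is made up.

```python
import re

# A quoted line starts with ">" (optionally preceded by whitespace).
QUOTED_LINE = re.compile(r"^\s*>")

def strip_quoted_lines(body):
    """Drop every line that looks like quoted text and trim the result."""
    kept = [line for line in body.splitlines() if not QUOTED_LINE.match(line)]
    return "\n".join(kept).strip()

message = (
    "Thanks, that works for me.\n"
    "\n"
    "On Wed, Nov 2, 2011 at 10:34 AM, Someone wrote:\n"
    "> Does Friday work?\n"
    ">> Earlier message\n"
)
cleaned = strip_quoted_lines(message)
print(cleaned)
```

The "On ... wrote:" line is still present in the output, which is exactly the second problem described above.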
There are many libraries out there that can help you extract the reply/signature from a message:
Ruby: https://github.com/github/email_reply_parser
Python: https://github.com/zapier/email-reply-parser or https://github.com/mailgun/talon
JavaScript: https://github.com/turt2live/node-email-reply-parser
Java: https://github.com/Driftt/EmailReplyParser
PHP: https://github.com/willdurand/EmailReplyParser
I've also read that Mailgun has a service to parse inbound email and POST its content to a URL of your choice. It will automatically strip quoted text from your emails: https://www.mailgun.com/blog/handle-incoming-emails-like-a-pro-mailgun-api-2-0/
Hope this helps!
Possibly helpful: quotequail is a Python library that helps identify quoted text in emails
AFAIK, (standard) emails should quote the whole text by adding a ">" in front of every line, which you could strip by using strstr(). Otherwise, did you try to port that Java example to PHP? It's nothing more than regex.
Even pages like Github and Facebook do have this problem.
Just an idea: You have the text which was originally sent, so you can look for it and remove it and additional surrounding noise from the reply. It is not trivial, because additional line breaks, HTML elements, ">" characters are added by the mail client application.
The regex is definitely better if it works, because it is simple and it perfectly cuts the original text, but if you find that it frequently does not work then this can be a fallback method.
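A very rough sketch of that fallback in Python, assuming you still have the plain text you originally sent. The function names and sample strings are hypothetical, and the comparison is deliberately naive: it only ignores leading ">" markers and surrounding whitespace.

```python
def _normalize(line):
    """Strip quote markers and surrounding whitespace so lines can be compared."""
    return line.lstrip("> \t").strip()

def remove_original(reply, original):
    """Drop reply lines whose normalized text appears in the original message."""
    sent = {_normalize(line) for line in original.splitlines() if _normalize(line)}
    kept = [line for line in reply.splitlines() if _normalize(line) not in sent]
    return "\n".join(kept).strip()

original = "Does Friday work?\nLet me know."
reply = "Yes, Friday is fine.\n\n> Does Friday work?\n> Let me know.\n"
print(remove_original(reply, original))
```

A new line that happens to repeat a line of the original word for word would be dropped too, and re-wrapped lines or HTML markup added by the mail client would not match, so this is only a starting point.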
I agree that the quoted text or reply is just text, so there's no fully accurate way to extract it. Still, you can use a regex replace like this:
$filteringMessage = preg_replace('/.*\n\n((^>+\s{1}.*$)+\n?)+/mi', '', $message);
Test
https://regex101.com/r/xO8nI1/2

GAE unicode character gets encoded to utf-8 bytes

In my app I'm accepting text from user inputs, where users often paste text from Microsoft Word.
A good example is the apostrophe ’, which for some reason gets converted to =E2=80=99 when posted to my handler in Google App Engine. I've tried a number of confused ways to prevent this, and I'm quite happy to simply remove these characters. Some of these methods work in plain Python but not in App Engine.
Here's some of what I've tried:
problem_string = re.sub(r'[^\x00-\x7F]+','', problem_string)# trying to remove it
problem_string = problem_string.encode( "utf-8" )# desperation...
problem_string = "".join((c if ord(c) < 128 else '' for c in problem_string))# trying to just remove the thing
problem_string = unicode(problem_string, "utf8")# probably fails since its already unicode
... where I'm trying to capture the string including ’ and then later save it to the ndb datastore as a StringProperty(). Except for the last option, the apostrophe example gets converted to =E2=80=99.
If I could save the apostrophe type character and display it again that would be great, but simply removing it would also serve my needs.
*Edit - the following:
experience = re.sub(r'[^\x00-\x7F]+',' ', experience)
seems to work fine on the dev server, and successfully removes the offending apostrophe.
Another possible issue is that the POST fields are going through the blobstore, i.e. blobstore_handlers.BlobstoreUploadHandler, which I think may be causing some problems.
I've really been bumping my head against this and I would really really appreciate an explanation from some clever stack-overflower...
Ok, I think I've vaguely stumbled upon a solution.
It had something to do with the blobstore upload handler, I guess it was encoding/decoding unicode appropriately to account for weird file characters. So I modified the handler so that the image file is uploaded via google cloud storage instead of the blobstore and it seems to work fine, i.e. the ’ gets to the datastore as ’ instead of =E2=80=99
I won't accept my own answer for the next few days, maybe someone can clarify things better for future confused individuals.
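For future readers: =E2=80=99 is the quoted-printable form of the three UTF-8 bytes of ’ (U+2019). If a handler still receives values in that form, one option is to decode them with the standard-library quopri module. This is only a sketch and was not tested against the blobstore handler; the helper name is made up, and it is written in Python 3 syntax.

```python
import quopri

def decode_quoted_printable(value, charset="utf-8"):
    """Decode a quoted-printable str or bytes value into text."""
    raw = value.encode("ascii") if isinstance(value, str) else value
    return quopri.decodestring(raw).decode(charset)

decoded = decode_quoted_printable("It=E2=80=99s fine")
print(decoded)
```

The charset default is an assumption; if the form was submitted in another encoding, pass that encoding instead.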

How to get unparsed XML from a suds response, and best django model field to use for storage

I am using suds to request data from a 3rd party using a wsdl. I am only saving some of the data returned for now, but I am paying for the data that I get so I would like to keep all of it. I have decided that the best way to save this data is by capturing the raw xml response into a database field both for future use should I decide that I want to start using different parts of the data and as a paper trail in the event of discrepancies.
So I have a two part question:
Is there a simple way to output the raw received XML from the suds.client object? In my searches I have learned this can be done through logging, but I was hoping not to have to dig that information back out of the logs to put it into the database field. I have also looked into the MessagePlugin.received() hook, but could not figure out how to access the raw XML after it has been parsed; I can only override that method and see the raw XML as it is being parsed, which is before I have decided whether it is worth saving. I have also explored the retxml option, but I would like to use the parsed version as well, and making two separate calls (one with retxml, one parsed) would cost me twice. I was hoping for a simple function built into the suds client (like response.as_xml() or something equally simple) but have not found anything like that yet. The option bubbling around in my head is to extend the client object using the received() plugin hook so that it saves the XML as an object attribute before it is parsed, to be referenced later. The execution of that seems a little tricky to me right now, and I have a hard time believing that the suds client doesn't already have this built in somewhere, so I thought I would ask first.
The other part of my question is: what type of Django model field would be best suited to hold up to ~100 KB of text data as raw XML? I was going to use a CharField with a stupidly long max_length, but that feels wrong.
Thanks in advance.
I solved this by using the flag retxml on client initialization:
client = Client(settings.WSDL_ADDRESS, retxml=True)
raw_reply = client.service.PersonSearch(soapified_search_object)
I was then able to save raw_reply as the raw XML into a Django models.TextField()
and then inject the raw XML to get a suds-parsed result without having to resubmit my search, like so:
parsed_result = client.service.PersonSearch(__inject={'reply': raw_reply})
I suppose if I had wanted to strip off the suds envelope stuff from raw reply I could have used a python xml library for further usage of the reply, but as my existing code was already taking the information I wanted from the suds client result I just used that.
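In case it helps, here is roughly what stripping the envelope could look like with the standard-library ElementTree. It assumes a SOAP 1.1 reply; the sample XML and the PersonSearchResponse element name are invented for illustration and are not what the real service returns.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace; a SOAP 1.2 service uses a different URI.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

def soap_body_children(raw_xml):
    """Return the elements directly inside the SOAP Body of a raw reply."""
    root = ET.fromstring(raw_xml)
    body = root.find("{%s}Body" % SOAP_ENV)
    return list(body) if body is not None else []

# Hypothetical reply, standing in for raw_reply from the snippet above.
sample_reply = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">'
    b'<soapenv:Body><PersonSearchResponse><name>toto</name></PersonSearchResponse>'
    b'</soapenv:Body></soapenv:Envelope>'
)
payload = soap_body_children(sample_reply)
print(payload[0].tag, payload[0].find("name").text)
```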
Hope this helps someone else.
I have used kyrayzk's solution for a while, but have always found it a bit hackish, as I had to create a separate dummy client just for when I needed to process the raw XML.
So I more or less reimplemented the .last_received() and .last_sent() methods, which were (mistakenly, IMHO) removed in suds-jurko 0.4.1, through a MessagePlugin.
Hope it helps someone:
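from suds.plugin import MessagePlugin

class MyPlugin(MessagePlugin):
    def __init__(self):
        self.last_sent_raw = None
        self.last_received_raw = None

    def sending(self, context):
        # context.envelope holds the outgoing SOAP envelope
        self.last_sent_raw = str(context.envelope)

    def received(self, context):
        # context.reply holds the raw reply before suds parses it
        self.last_received_raw = str(context.reply)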
Usage:
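from suds.client import Client

plugin = MyPlugin()
# TRTH_WSDL_URL is the WSDL address, defined elsewhere
client = Client(TRTH_WSDL_URL, plugins=[plugin])
client.service.SendSomeRequest()
# Raw XML of the last request and reply, captured by the plugin
print(plugin.last_sent_raw)
print(plugin.last_received_raw)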
And as an extra, if you want a nicely indented XML, try this:
from lxml import etree

def xmlpprint(xml):
    return etree.tostring(etree.fromstring(xml), pretty_print=True)

Extra Tabs in IMAP HTML Text

I'm using Python and imaplib to fetch emails from an IMAP server (it needs to support all kinds of IMAP servers: Gmail, etc.).
My problem is that when I use the IMAP BODY[INDEX] command to fetch a specific body part, the HTML comes with extra tabs, as in:
(...)</a>\t\t\t\t\t\t\t\t<a>(...)
When showing the HTML the tabs are obviously extra:
(The screenshot is in Portuguese, but I believe that is not relevant.)
I have searched IMAP documentation but found nothing that helps. I am guessing these \t are always following tag closes (such as \t\t\t\t\t), so I could just find all tabs that come after a tag close and delete them, but I don't know if that would be a reliable method at all.
Thank you
I found a solution (for the time being at least).
When receiving data from an IMAP call response, there are \\r\\n characters delimiting the lines. I remove these.
However, I discovered that besides these there are also \\t characters coupled with these in some instances. For example:
\\r\\n\\t\\t\\t\t
If I remove the \\t together with the \\r\\n, the HTML is presented perfectly.
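A short sketch of that cleanup, assuming the fetched body part has already been decoded to a str containing real CR, LF and tab characters. If your data holds literal backslash sequences instead, the pattern would need to match those. The function name and sample input are made up.

```python
import re

# A line break followed by any number of tabs.
LINE_BREAK_WITH_TABS = re.compile(r"\r\n\t*")

def strip_line_breaks(html_part):
    """Remove CRLF line breaks together with the tabs that follow them."""
    return LINE_BREAK_WITH_TABS.sub("", html_part)

raw = "<a>one</a>\r\n\t\t\t<a>two</a>"
print(strip_line_breaks(raw))
```

Tabs that do not follow a line break are left alone, so intentional tabs inside the markup are not touched.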
