Unable to extract the body of the email file in python

I am reading an email file stored on my machine. I am able to extract the headers of the email, but unable to extract the body.
# The following part is working: opening a file and reading the header.
import email
from email.parser import HeaderParser
with open(passedArgument1+filename,"r",encoding="ISO-8859-1") as f:
    msg=email.message_from_file(f)
print('message',msg.as_string())
parser = HeaderParser()
h = parser.parsestr(msg.as_string())
print (h.keys())
# The following snippet gives an error
msgBody=msg.get_body('text/plain')
Is there a proper way to extract only the body of the message? I am stuck at this point.
For reference the email file can be downloaded from
https://drive.google.com/file/d/0B3XlF206d5UrOW5xZ3FmV3M3Rzg/view

In Python 3.6, the email library defaults to an API that is compatible with Python 3.2, and that is what is causing this problem.
Note the default policy in this declaration from the docs:
email.message_from_file(fp, _class=None, *, policy=policy.compat32)
If you want to use the "new" API described in the 3.6 docs, you have to create the message with a different policy:
import email
from email import policy
...
msg=email.message_from_file(f, policy=policy.default)
This gives you the new API shown in the docs, which includes the very useful get_body().
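As a minimal sketch of the difference between the two policies, the snippet below parses the same text twice. The sample message and the variable names are made up for illustration; get_body() is called with a preferencelist tuple of subtypes, which is the form the 3.6 docs describe.

```python
import email
from email import policy

# Hypothetical sample message; any RFC 5322 text would do here.
RAW_MESSAGE = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Test\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "\n"
    "Hello, world.\n"
)

# Default policy (compat32): returns a legacy email.message.Message.
legacy_msg = email.message_from_string(RAW_MESSAGE)
# policy.default: returns an email.message.EmailMessage.
modern_msg = email.message_from_string(RAW_MESSAGE, policy=policy.default)

legacy_has_get_body = hasattr(legacy_msg, "get_body")
modern_has_get_body = hasattr(modern_msg, "get_body")

# Ask for the plain-text body; get_body() returns None if nothing matches.
body_part = modern_msg.get_body(preferencelist=("plain",))
body_text = body_part.get_content() if body_part is not None else None

print(legacy_has_get_body, modern_has_get_body)
print(body_text)
```

The same policy argument can be passed to email.message_from_file(), as in the question.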

Update
If you get AttributeError: 'Message' object has no attribute 'get_body', read on.
In my tests (July 2017), a message created without policy=policy.default is a legacy email.message.Message object, and that class does not have the get_body() method shown in the docs for EmailMessage.
On such an object, get_payload() seems to do what you want to achieve:
The conceptual model provided by an EmailMessage object is that of an
ordered dictionary of headers coupled with a payload that represents
the RFC 5322 body of the message, which might be a list of
sub-EmailMessage objects
get_payload() is not listed in the EmailMessage documentation (as of July 2017) because it belongs to the legacy Message API, but help() says the following:
get_payload(i=None, decode=False) method of email.message.Message instance
    Return a reference to the payload.

    The payload will either be a list object or a string. If you mutate
    the list object, you modify the message's payload in place. Optional
    i returns that index into the payload.

    Optional decode is a flag indicating whether the payload should be
    decoded or not, according to the Content-Transfer-Encoding header
    (default is False).

    When True and the message is not a multipart, the payload will be
    decoded if this header's value is 'quoted-printable' or 'base64'. If
    some other encoding is used, or the header is missing, or if the
    payload has bogus data (i.e. bogus base64 or uuencoded data), the
    payload is returned as-is.

    If the message is a multipart and the decode flag is True, then None
    is returned.
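Because get_payload(decode=True) returns None for a multipart message, one way to use it is to walk the parts and decode the first text/plain one. The following is only a sketch under that assumption; the sample message and the helper name first_plain_text are invented for illustration.

```python
import base64
import email

# Hypothetical multipart message with a base64-encoded plain-text part.
encoded_text = base64.b64encode("Hello, world.".encode("utf-8")).decode("ascii")
RAW_MULTIPART = "\n".join([
    "From: alice@example.com",
    "Subject: Test",
    "MIME-Version: 1.0",
    'Content-Type: multipart/alternative; boundary="BOUND"',
    "",
    "--BOUND",
    'Content-Type: text/plain; charset="utf-8"',
    "Content-Transfer-Encoding: base64",
    "",
    encoded_text,
    "--BOUND",
    'Content-Type: text/html; charset="utf-8"',
    "",
    "<p>Hello, world.</p>",
    "--BOUND--",
    "",
])


def first_plain_text(message):
    """Return the decoded text of the first text/plain part, or None."""
    for part in message.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # bytes, transfer-decoded
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return None


msg = email.message_from_string(RAW_MULTIPART)  # legacy Message (compat32)
multipart_payload = msg.get_payload(decode=True)  # None for a multipart
plain_text = first_plain_text(msg)
print(plain_text)
```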

Related

Consume web service having Byte64 array as parameter with python Zeep

I'm trying to consume a web service with Python Zeep that has a parameter of type xsd:base64Binary; the technical document specifies the type as Byte[].
Errors are:
urllib3.exceptions.HeaderParsingError: [StartBoundaryNotFoundDefect(), MultipartInvariantViolationDefect()], unparsed data: ''
and in the reply I get a generic error: "Data at the root level is invalid."
I can't find the correct way to do it.
My code is:
content=open(fileName,"r").read()
encodedContent = base64.b64encode(content.encode('ascii'))
myParameter=dict(param=dict(XMLFile=encodedContent))
client.service.SendFile(**myParameter)
thanks everyone for the comments.
Mike

This is what the built-in Base64Binary type looks like in zeep:
class Base64Binary(BuiltinType):
    accepted_types = [str]
    _default_qname = xsd_ns("base64Binary")

    @check_no_collection
    def xmlvalue(self, value):
        return base64.b64encode(value)

    def pythonvalue(self, value):
        return base64.b64decode(value)
As you can see, it does the encoding and decoding by itself. You don't need to encode the file content; send it as it is, and zeep will encode it before putting it on the wire.
Most likely this double encoding is causing the issue. When the message element is decoded on the server, an array of bytes is expected, but another base64 string is found there instead.
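The effect of encoding twice can be sketched without zeep at all. In the snippet below, wire_value is a stand-in that only mimics what xmlvalue does in the class above, and the XML content is invented; it is an illustration of the symptom, not zeep's actual code path.

```python
import base64

# Hypothetical file content, as bytes (what open(fileName, "rb").read() returns).
xml_bytes = b"<root><item>1</item></root>"


def wire_value(value):
    """Stand-in for Base64Binary.xmlvalue: encode once before sending."""
    return base64.b64encode(value)


# What the question does: encode manually, then the library encodes again.
double_encoded = wire_value(base64.b64encode(xml_bytes))
# What the answer suggests: hand over the raw bytes and let the library encode.
single_encoded = wire_value(xml_bytes)

# The receiver decodes exactly once.
received_from_double = base64.b64decode(double_encoded)
received_from_single = base64.b64decode(single_encoded)

print(received_from_single)   # the original XML
print(received_from_double)   # still base64 text, not XML
```

With the original client code, this would presumably mean reading the file in binary mode and passing the raw bytes as XMLFile, without calling base64.b64encode() first.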

Escaping string in json dictionary python request

So I've got a python application that is using requests.post to make a post request with json headers, body info, etc.
The problem is that in the dictionary that gets sent as headers, I have a variable that often contains character groups like "%25" or "%2F", etc. I've seen this cause problems before when sent in body data, but that can be fixed by sending the body data as a string rather than a dictionary. I haven't figured out how to make this work with the headers though, as you can't simply delimit the parameters with an ampersand as in body data.
How do I make sure that my cookie value is not altered in the process of the post request?
For instance, headers :
Host : blahblah.com
Connection : Keep-Alive
Cookie : My sensitive string with special characters
etc.
Note : Nothing server-side can be changed. The python application is being used for hired pentesting services.

A common technique for sending data that gets mangled in transit is to encode it, typically as base64.
Sender:
import base64
...
# b64encode() takes and returns bytes; decode to str before formatting (Python 3)
encoded_data = "base64:{}".format(base64.b64encode(data).decode("ascii"))
Receiver:
import base64
...
if encoded_data.startswith("base64:"):
    data = base64.b64decode(encoded_data.split(':', 1)[1])
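A runnable round trip of the same idea might look like the sketch below. The helper names, the "base64:" prefix constant and the sample value are illustrative choices, and the approach only helps when the receiving side can be made to decode the value.

```python
import base64

PREFIX = "base64:"  # marker so the receiver can tell encoded values apart


def encode_value(data: bytes) -> str:
    """Wrap raw bytes as an ASCII-safe, prefixed base64 string."""
    return PREFIX + base64.b64encode(data).decode("ascii")


def decode_value(value: str) -> bytes:
    """Undo encode_value(); values without the prefix are returned as bytes."""
    if value.startswith(PREFIX):
        return base64.b64decode(value[len(PREFIX):])
    return value.encode("utf-8")


# Hypothetical cookie value containing percent-escapes and quotes.
original = b'session=abc%25"def"%2Fxyz'
encoded = encode_value(original)
decoded = decode_value(encoded)
print(encoded)
print(decoded)
```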

data type of message in volttron pubsub

What is the data type for "message" in pubsub used by volttron? I have checked the documentation but there is nothing mentioned about this. When checking the source, I found this function comment:
param headers: header info for the message,
type headers: None or dict,
param message: actual message,
type message: None or any
Is the above information correct? Does that "any" type refer to typing.Any?

The message can be any Python object that can be serialized into JSON. Typically this will be something specifically defined by the Agent publishing the message that aligns with the purpose of the message. Usually this will be a dictionary or list, but occasionally messages will be numbers or strings. VOLTTRON does not place any restrictions on the structure of the data as long as it can be serialized.
It is up to agents to define the datatype of the message and document it for use by other agents.
Nested data structures are allowed as they are in JSON.
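One way to check whether a candidate message meets the "serializable into JSON" requirement is to try serializing it with the standard json module. This is a sketch only: it assumes the standard library encoder is a reasonable proxy for what VOLTTRON does internally, and the helper name and sample messages are made up.

```python
import json


def is_json_serializable(message):
    """Return True if json.dumps() accepts the message, False otherwise."""
    try:
        json.dumps(message)
    except (TypeError, ValueError):
        return False
    return True


# Nested dictionaries and lists of numbers/strings serialize fine.
nested_message = {"device": "ahu1", "points": [{"name": "temp", "value": 21.5}]}
# Sets (and most custom objects) are rejected by the default encoder.
set_message = {"tags": {"a", "b"}}

print(is_json_serializable(nested_message))
print(is_json_serializable(set_message))
```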

Parsing MIME body parts in Python

I'm having trouble parsing specific body parts of MIME messages.
I have an email client web interface. I want to allow the user to download the attachments of an email. In the past, each time I wanted to download an attachment I would make a call to the IMAP server with the argument RFC822 to obtain the whole message, that I could easily parse with Python.
However, this is not efficient and I need a way to obtain just the required attachment. I'm using the alternative of making a call to the IMAP server with the BODY[1], BODY[2], etc index of the specific bodypart.
When I make this IMAP call I obtain back the correct body part (when I make a call to BODYSTRUCTURE, the number of bytes in the part I'm looking for adds up, so I'm definitely obtaining the correct part).
However, I cannot parse this body part into something useable, or save it for that matter.
A specific example: I make a call to obtain the BODY[1] of an email and obtain back
('4 (UID 26776 BODY[2] {5318}', '/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAoHBwkHBgoJCAkLCwoMDxkQDw4ODx4WFxIZJCAmJSMg\r\nIyIoLTkwKCo2KyIjMkQyNjs9QEBAJjBGS0U+Sjk/QD3/2wBDAQsLCw8NDx0QEB09KSMpPT09PT09\r\nPT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT3/wAARCABRAQIDASIA\r\nAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQA\r\nAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3\r\nODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWm\r\np6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEA\r\nAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSEx\r\nBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElK\r\nU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3\r\nuLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD2aiii\r\ngAooooAKKKKACorm4jtLaS4mbbFEpdzjOAOtS1neIf8AkXdR/wCvaT/0E1UVeSRM3yxbRbtbuC9t\r\n1ntZUliboyHIqauA+F+f+JkMnH7vj/vqu/rSvS9lUcE9jLDVvbUlNq1wooorE3CiiigAooooAKKK\r\nKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKzvEP8AyLuo/wDX\r\ntJ/6Cabruu23h/T2urlJZAPuxxLuZv8A63vXlWo+M/EXjC9FnpMUsMRORBb8sR6u3p+QrehSlJ83\r\nRGFerGKcerOp+F/XUv8Atn/7NXfE4GT0FcN4fx4Ss5jqTwS6jPt3w2vRcZxuPQHnnH5VDe6vqOuS\r\n+RGG2t0hi7/X1/HiniZqpVckThKbpUVCW534IYAggg8gilqG0QxWcKMMMsaqR6YFTVznSFFFFABR\r\nRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFITgZPSuen8daJDqLW\r\na3Xmug3SPENyRjIHJ+p7ZqowlLSKuTKcYq8nY6Kio1uIngEyyIYmGQ4PBH1qCLVLSa5Nuky+bjIU\r\n8bh7etSUcn8TGaOz05kYqwlYgg4I4qbRiLXw7ZG3SOFrmLzJmjQKZGyeSRUHxO/48dP/AOurfyqb\r\nTf8AkXNK/wCvf+prtn/usPVnn0/98n6Ikh8KzXt9LPdP5UDOWAXlmH9K6Wy0+20+Ly7WJUHc9z9T\r\n3pXdotOZ0OGWLcPqBXF6V4m1S4k0xpLvzPtTESRvaeWijBPyv0Y8dBXPClKabXQ6qlaNNpPqd5RX\r\nKWuvX8ug6HdPIhlvLsRTHYMFSW6Dt0FM0HxBqF/faZHcSIyXEM7yAIBkq5A+nFU8PNJvt/wf8iVi\r\nYNpd7fjb/M66iuZvrzVZ/Ed3ZWV7HbRQWqzDdAH3E5461lr4w1E2clyRESumLcBNvHmGTZn1x7UR\r\nw8pK6/q4SxMIuzT/AOGO6orjL7XdW0QzR3FzDds1ibmNvJ2
bGBAxgHkc1saXJeJewx3+rxTySwea\r\nLcW4Q445yD0HSlKi4q9xxxCk+VJ/h/mbdFcodS1G6uNVnOqw2FnY3Bh+a3D8DHJJPqa09B1G4v7j\r\nVEuHVlt7too8Lj5QBSlSaVxxrxk7W/r+kbFFcW+v6t9in1ZbiEW0V79n+y+T1XcFzvznPNVF8T6q\r\nS8gvBuF55Iia0xGV345l6A4q1hpvqZvFwW6Z39FcRdeJr+P7VeJfWqrBeGBbEoNzoGC7s5znv0rf\r\n0HUbi/l1MXDBhb3jwx4XGFGMD3qZUJRjzMuGIhOXKv6/qxsUVzWva9d6XqsqQ7Gij057gIy9XDYH\r\nPpVOTXNV0lomu7iG8W4sZLlR5Pl+WyqGA4PI5ojQk0muopYmEW0+h2NFcjBrOq2U+nteXMN1HfWs\r\nk+wQ7PLKpvABB5Haq665rMFnpt5LdQzLqKviEQBfKO0kYOeenen9Xl3X9X/yF9aj2f8AVv8ANHbU\r\nVysXiG6e18PyCVHa7jd7gBR821CT9ORVTT/EmoPJpc0t/azi/Yq9skYDQcEjkHPbvR9Xn/Xz/wAh\r\n/Woaf12/zO1orkdI8R6tLpVpdXsFmYZiVEzT7Gc5PATHXjpntUdnrmriHSL+4uYZINSm8s24h2+X\r\nnOCGzk9O9H1eSuhLFQaTSev9fqdlRXB6Z4m1K4Gnyf2lbXM9xcCOSyWEB0XJy2Qc9Bn8as6T4gvp\r\n9Rgj1K+kt5JJin2Y2JCnk4USU5Yaav5Cji4Stbr6f5/8E7OikornOoWoLyWWC0llt4DcSopKxBgp\r\nc+mTwKnrP1LWrPTFInkzJ2jTlj/h+NAHkPiLxL4m8Rak2lyW9xbEnH2GFCGP+8erfyq7F4FvdA8P\r\n3ep6jIiSsixrbpztBdeWPTPHQV1g8Q3OoavDsVYIySMIPmIwTgt6e3SpPFF5JdeD71ZcEoY/m9cs\r\nK7qWIbnGEVZXRwV6CVOc5O7syj4SYnwlgkkLdsACeg2iofFhGn2cF8gkmldSBCi8gKfvZ9Km8I/8\r\nim3/AF+N/wCgiqXjpikGkMpKsElIIOCPmFDpqpinF92TGq6WDjNdkcoPEGu+IZ0SQrc28fAjkHyJ\r\n77uuffOa9JtvIXSbKG1mEqwQ7GI7H0NadnpFnqHh6xWeFQTAjb0G1gSAScj3qnZ+EDb3xka8byl+\r\n6EGGb2NY1qvN7iVkjpo0lH327tm5cyJFpEskgLIkBZgvUgLziuUtLPTrWSEhbyQWsgFtbTXWfnLB\r\nQQmOB82c+hrspII5rd4ZFDRupRl9QRjFVZNHs5m3Sxs5Awu6Rjs5B+Xn5eg6Y6VjGco6JmsqcZWc\r\nkc+ulWulXsDyWtz5UKyXSRG63xQ7cbiq+vzcVAunaUmlwXJkurcwRyGAQ3XzupO5hkDrk11v2C3I\r\nUOhfajRguxY7WxkEnrnA601dOt1tHtdrtA67SjyM3GMY5PAqvbT7k+wp/wApy0thpxa2nM2qCS5z\r\nbl1ucMwWQJ8x78t+VW9T0TSLEW8EkM4iuo1sSUfiNAdwJz/tAc+9bDaJYOWLQH5ju4dhtJYMSvPy\r\n/MAeMVLNplrcWwgnjMsYVlxIxY4YEHknPQmj20+4ewp/ynM3Fxp+rySzGxuJRFF9kIaUIGRpNoP4\r\nkA59DUxt7Tw7qMM5a6muBauQJ7oEKgK5Vc9TzwK310myQyFbdR5hUtgnnacj8jUzWsL3KzvGrSqp\r\nRWPYEg/zApe0la1x+yhfmtqcZfQ6ZdiS4eO+hh1Bt+xbkIkpDqhLA/d5INTQ2dvNfLNajUIzdXD7\r\n/KvNqF1zk8dRgf0rpRotgH3fZwfm3AFiQp3BuBnA+YA8VLHp9tE6tHCqlXaRcZ4Zup/Gn7WdrXF7\r\nCne9jlpLLTJLi2eOC8YXpS6S18/bCXbJyR+GahmsdMgeRZR
eNAswklhW8BQykb+F7jpz/hXULodg\r\ni4WAjGNpEjZTGcBTn5RyeBjrSLoGmLHsFnGRkHJyWzjHXr0o9tU7h7Cn/KZd1oWkDSWmltmIuZ0m\r\naTI8xWd1/i7AE9PTNU47SyfXrmO1lv4pPPMswW72ITk5IUDn7p49q6ePTreO0e2Cs0DrtKSOzjGM\r\nY5Jpkek2cRjMUbJsQINkjDKgk4bB+bknrnqaPaz7h7Gn2OfvZbHVLuKS7tLsSXlqIYxE4OYny2cd\r\nj8v61JfRaZdxFpYrkrZxC0G1wCVkAB/EVuPo9k6xgw48pFjjKsVKKvTBByOtMGh6eCuLcAKANoZs\r\nHGcEjOCRk8nml7SS2ZTpwd7o55b/AE+efS1FuzSQQbYB9oUpsZMEOR/FgdPerMOj2Wm6rAsUV1N9\r\nkj84JLcEx26kkfKO54NbJ0Wx3xusGx40CI0bshVRkAZBB7mppNPt5Z0mdW8xF2bg7DK9cHB5H1zR\r\n7Se1xeyhvY5nT4dPtbxmtrCYXF9GDCjSgqscgZjt/ufdOR9Kgt00m3h0xl05oHiKSQO0qKXDBgPM\r\nb8Dx7iupg0iyt3R44cNGRsJZm24BAAyeAAx46c0DSbIGEi3T9yAsecnaBnH8zR7WfcPYw7HJ2Mel\r\nB5LiGC6eCxDzfZ5LnKIwBJ2J0I54Oe9T/ZNK0u8siIdSfEp8i3eQlYWJAyEJ/wBr+ddO+mWryyO0\r\nZzKNsih2CuMbeVzg8cdKi/sLTyOYCzZyHZ2LA8YO4nPG0Y9MU/az7iVGmvsmEtppkQstNtrSdpYJ\r\nVmhLOFbdlyQzdcDaePpTLRLK4vrW7dtSnjSdNpnudyxyuMj5fbIGa6EaJYgDbCVYHO9XYPnLHO7O\r\nc/M3PfNPj0qziQJHAqqHWQAZ4ZQAp/AAUvaz7j9jDsW6KKKg0AjIIPQ1zV34NjluxJBcskbHLq/z\r\nEfQ/4101FAGbDpFpptjMttF85QgueWbj1rlNYu7ebSriwDl3mKZZOQu056967zqOawtX8LW17DM9\r\noqwXLL8pyQmfcD+lVB2kmiZpSi01c5HToZLO0aZLlLOyhY75pXwgOORj+I9OKx/EXijTddntrW2e\r\nRFtlZEmkTCyliM8clenGf0pn/CBeJ9W1U2t6iQQRHPmlv3IB7oB1P6+pr0Pw34G0rw4Fkij+0Xg6\r\n3EoBYf7o6L+HPvXYpQoy5+bmkccoSrx9ny2ibGkRtFo9lHIpV0gRWB7EKKuUUVxN3dztSsrBRRRS\r\nGFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQA\r\nneloooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKA\r\nCiiigD//2Q==\r\n').
This specific response corresponds to a JPEG image attachment.
I tried extracting the string representing the body part (so, I'm talking about the string starting in '/9j' and ending in '2Q==\r\n') and saving that to a file as a .jpg, but it's not a valid file.
I then thought that, as there are multiple instances of \r\n in that string, the string might be split by newlines/carriage returns, so I split the string, stripped the \r\n, joined the substrings and tried to save the result to a file. It is still not a valid JPEG file.
What can I do to try and parse this response?
Thank you.

You need to parse the BODYSTRUCTURE response to see what format the data is encoded in; see the IMAP RFC 3501, section 7.4.2. The sixth field is the content encoding:
['IMAGE', 'JPEG', ['NAME', 'image001.jpg'], '<image001.jpg@01CDE914.6E62F850>', None, 'BASE64', 5318, None, None, None]
The fields are, in order, the type and subtype (so image/jpeg in this case), body parameters (such as character set, format-flowed, or the filename in this case), the attachment id, description, encoding, size, MD5 signature (if any), disposition and language.
In this case the data is base64-encoded (Python 2 session):
>>> imagedata = datastring.decode('base64')
>>> imagedata[:10]
'\xff\xd8\xff\xe0\x00\x10JFIF'
which looks like JPEG data to me.
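In Python 3, str has no decode('base64'), so the same step would go through the base64 module. The sketch below labels the BODYSTRUCTURE fields in the order described above and decodes a part accordingly; the BodyPart field names, the decode_part helper and the sample payload are illustrative and not part of imaplib.

```python
import base64
from collections import namedtuple

# Illustrative labels for the ten fields, in the order listed above.
BodyPart = namedtuple(
    "BodyPart",
    "maintype subtype params content_id description "
    "encoding size md5 disposition language",
)


def decode_part(encoding, payload):
    """Decode a fetched body part according to its declared encoding."""
    if encoding and encoding.upper() == "BASE64":
        # b64decode() skips characters outside the base64 alphabet by
        # default, so the embedded \r\n line breaks need no special handling.
        return base64.b64decode(payload)
    return payload  # other encodings are left untouched in this sketch


# Hypothetical payload: JPEG-like bytes, base64-encoded with \r\n line breaks.
jpeg_head = b"\xff\xd8\xff\xe0\x00\x10JFIF"
sample_payload = (
    base64.encodebytes(jpeg_head * 10).decode("ascii").replace("\n", "\r\n")
)
structure = BodyPart(
    "IMAGE", "JPEG", ["NAME", "image001.jpg"], None, None,
    "BASE64", len(sample_payload), None, None, None,
)

image_bytes = decode_part(structure.encoding, sample_payload)
print(image_bytes[:10])
```

The decoded bytes would then be written with the file opened in binary mode ("wb"); saving the base64 text itself, with or without the line breaks, does not produce a valid JPEG.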

Send mail with python using bcc

I'm working with Django and need to send a mail to many addresses. I want to do this with a high-level library like python-mailer, but I need to use the bcc field. Any suggestions?

You should look at Django's EmailMessage class, which supports bcc.
Complete docs are available here:
http://docs.djangoproject.com/en/dev/topics/email/#the-emailmessage-class
Quick overview:
The EmailMessage class is initialized with the following parameters (in the given order, if positional arguments are used). All parameters are optional and can be set at any time prior to calling the send() method.
subject: The subject line of the e-mail.
body: The body text. This should be a plain text message.
from_email: The sender's address. Both fred@example.com and Fred <fred@example.com> forms are legal. If omitted, the DEFAULT_FROM_EMAIL setting is used.
to: A list or tuple of recipient addresses.
bcc: A list or tuple of addresses used in the "Bcc" header when sending the e-mail.
connection: An e-mail backend instance. Use this parameter if you want to use the same connection for multiple messages. If omitted, a new connection is created when send() is called.
attachments: A list of attachments to put on the message. These can be either email.MIMEBase.MIMEBase instances, or (filename, content, mimetype) triples.
headers: A dictionary of extra headers to put on the message. The keys are the header name, values are the header values. It's up to the caller to ensure header names and values are in the correct format for an e-mail message.
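For comparison, what bcc means at the mail level can be sketched with the standard library alone (this is not Django's EmailMessage, which has the same class name but a different API). Bcc addresses are given to the SMTP envelope but left out of the headers that recipients see. The function name and addresses below are hypothetical.

```python
from email.message import EmailMessage


def build_message_with_bcc(subject, body, from_email, to, bcc):
    """Build a message plus the full envelope recipient list.

    Bcc addresses are added to the envelope recipients only; no Bcc
    header is written, so the other recipients cannot see them.
    """
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = from_email
    msg["To"] = ", ".join(to)
    msg.set_content(body)
    recipients = list(to) + list(bcc)
    return msg, recipients


msg, recipients = build_message_with_bcc(
    subject="Hello",
    body="Plain text body.",
    from_email="fred@example.com",
    to=["alice@example.com"],
    bcc=["bob@example.com", "carol@example.com"],
)
print(recipients)

# Sending would need a reachable SMTP server, for example:
#   with smtplib.SMTP("localhost") as smtp:
#       smtp.send_message(msg, from_addr="fred@example.com", to_addrs=recipients)
```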
