Parsing an email with Python

I feel like this is a simple question but nonetheless I cannot find a straightforward answer.
I have an email (an .eml file) that I need to parse. This email has a data table in the body that I need to export to my database. I have successfully parsed data out of plain-text emails and attached PDF files, so I understand concepts such as locating where the data is stored and using regular expressions, but I can't seem to figure out these .eml files.
In my code below I have three blocks of code essentially trying to do the same thing (two of them are comments now). I am simply attempting to capture any, or all, of the data in the email. Each block of code produces the same error though:
TypeError: initial_value must be str or None, not _io.TextIOWrapper
I have read that this error is most likely due to Python expecting a string but receiving bytes instead, or vice versa. So I followed up those attempts by trying io.StringIO and io.BytesIO, but neither worked. I would like to be able to recognize and parse specific data out of the email.
Thank you for any help, as well as any criticism of how I asked the question.
My code:
import email
#import io
import os
import re
path = 'Z:\\folderwithemlfile'
for filename in os.listdir(path):
    file_path = os.path.join(path, filename)
    if os.path.isfile(file_path):
        with open(file_path, 'r', encoding="utf-8") as f:
            b = email.message_from_string(f)
            if b.is_multipart():
                for payload in b.get_payload():
                    print(payload.get_payload())
            else:
                print(b.get_payload())
            #b = email.message_from_string(f)
            #bbb = b['from']
            #ccc = b['to']
            #print(f)
            #msg = email.message_from_string(f)
            #msg['from']
            #msg['to']
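For reference, the TypeError arises because email.message_from_string expects a str, while f is an open file object. Below is a minimal sketch of one way around this, assuming Python 3.6 or later and only the standard library. The helper names parse_eml_bytes and parse_eml_file are hypothetical, not part of the email package, and the sketch only retrieves the sender, recipient, and body text; pulling the table out of the body would still need its own parsing step.

```python
import email
from email import policy


def parse_eml_bytes(raw_bytes):
    """Parse raw .eml bytes and return (sender, recipient, body_text)."""
    # policy.default enables the modern API (get_body, get_content).
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    # get_body() returns the best matching body part, or None if there is none.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body_text = body_part.get_content() if body_part is not None else ""
    return str(msg.get("from", "")), str(msg.get("to", "")), body_text


def parse_eml_file(file_path):
    """Read an .eml file in binary mode and parse it."""
    # Binary mode lets the email package handle character decoding itself.
    with open(file_path, "rb") as f:
        return parse_eml_bytes(f.read())
```

Calling parse_eml_file(file_path) inside the os.listdir loop above would replace the email.message_from_string(f) call.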

Related

How can I input the image as byte data instead of string?

I'm new to Python and was playing around with changing my Instagram profile picture. The part I just can't get past is how to pass my image to the program. This is my code:
from instagram_private_api import Client, ClientCompatPatch
user_name = 'my_username'
password = 'my_password'
api = Client(user_name, password)
api.change_profile_picture('image.png')
Now, from what I read on the API Documentation, I can't just put in an image. It needs to be byte data. On the API documentation, the parameter is described like this:
photo_data – byte string of image
I converted the image on an encoding website and now I have the file image.txt with the byte data of the image. So I changed the last line to this:
api.change_profile_picture('image.txt')
But this still doesn't work. The program doesn't read it as byte data. I get the following error:
Exception has occurred: TypeError
a bytes-like object is required, not 'str'
What is the right way to put in the picture?

The error is telling you that "image.txt" (or "image.png") is a string, and it's always going to say that as long as you pass in a filename, because filenames are always strings. It doesn't matter what's in the file, because the API doesn't read the file.
It doesn't want the filename of the image, it wants the actual image data that's in that file. That's why the parameter is named photo_data and not photo_filename. So read it (in binary mode, so you get bytes rather than text) and pass that instead.
with open("image.png", "rb") as imgfile:
    api.change_profile_picture(imgfile.read())
The with statement ensures that the file is closed after you're done with it.

If you have a .png, .jpeg, or similar image file, use this:
with open("image.png", "rb") as f:
    api.change_profile_picture(f.read())
And if you have a .txt file that holds the raw image bytes, use this:
with open("image.txt", "rb") as f:
    api.change_profile_picture(f.read())
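One caveat: if image.txt came from an online encoder, it may hold base64 text rather than the raw image bytes. The question does not say which encoding the website used, so this is only an assumption. Under that assumption, a minimal sketch that covers both cases (load_image_bytes is a hypothetical helper name):

```python
import base64


def load_image_bytes(path, base64_encoded=False):
    """Return raw image bytes from a file, optionally decoding base64 text."""
    with open(path, "rb") as f:
        data = f.read()
    if base64_encoded:
        # By default b64decode ignores characters outside the base64
        # alphabet, so line breaks in the text file are not a problem.
        data = base64.b64decode(data)
    return data
```

The result could then be passed on as api.change_profile_picture(load_image_bytes("image.txt", base64_encoded=True)).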

'application/octet-stream' instead of application/csv?

I am quite new to Python. I want to confirm that the type of the dataset (URL in the code below) is indeed a csv file. However, when checking via the headers I get 'application/octet-stream' instead of 'application/csv'.
I assume that I defined something in the wrong way when reading in the data, but I don't know what.
import requests
url = "https://opendata.ecdc.europa.eu/covid19/casedistribution/csv/data.csv"
d1 = requests.get(url)
filePath = 'data/data_notebook-1_covid-new.csv'
with open(filePath, "wb") as f:
    f.write(d1.content)
## data type via headers #PROBLEM
headerDict = d1.headers
#accessing content-type header
if "Content-Type" in headerDict:
    print("Content-Type:")
    print(headerDict['Content-Type'])

"I assume that I defined something in the wrong way when reading in the data"
No, you didn't. The Content-Type header is supposed to indicate what the response body is, but there is nothing you can do to force the server to set that to a value you expect. Some servers are just badly configured and don't play along.
application/octet-stream is the most generic content type of them all - it gives you no more info than "it's a bunch of bytes, have fun".
What's more, there isn't necessarily One True Type for each kind of content, only more-or-less widely agreed-upon conventions. For CSV, a common one would be text/csv.
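To see what a given Content-Type value actually claims, the header can be split into its media type and optional charset. A minimal sketch using only the standard library; parse_content_type is a hypothetical helper name, and reusing the email package to parse an HTTP header is a convenience chosen here, not something the server or requests requires:

```python
from email.message import Message


def parse_content_type(header_value):
    """Split a Content-Type header value into (media_type, charset)."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_type(), msg.get_content_charset()
```

With the response from the question, this could be called as parse_content_type(d1.headers["Content-Type"]).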
So if you're sure what the content is, feel free to ignore the Content-Type header.
import requests
url = "https://opendata.ecdc.europa.eu/covid19/casedistribution/csv/data.csv"
response = requests.get(url)
filePath = 'data/data_notebook-1_covid-new.csv'
with open(filePath, "wb") as f:
    f.write(response.content)
Writing to file in binary mode is a good idea in the absence of any further information, because this will retain the original bytes exactly as they were.
In order to convert that to string, it needs to be decoded using a certain encoding. Since the Content-Type did not give any indication here (it could have said Content-Type: text/csv; charset=XYZ), the best first assumption for data from the Internet would be UTF-8:
import csv
filePath = 'data/data_notebook-1_covid-new.csv'
with open(filePath, encoding='utf-8') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        print(row)
Should that turn out to be wrong (i.e. there are decoding errors or garbled characters), you can try a different encoding until you find one that works. That would not be possible if you had written the file in text mode in the beginning, as any data corruption from wrong decoding would have made it into the file.
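Trying encodings one after another can be sketched as follows. The candidate list is only an assumption for illustration, and decode_with_fallback is a hypothetical helper name. Note that latin-1 can decode any byte sequence, so it only makes sense as a last resort, and that this approach catches decoding errors but not text that decodes cleanly into garbled characters.

```python
def decode_with_fallback(raw_bytes, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first encoding that decodes cleanly."""
    for encoding in encodings:
        try:
            return raw_bytes.decode(encoding), encoding
        except UnicodeDecodeError:
            # This candidate cannot represent the bytes; try the next one.
            continue
    raise ValueError("none of the candidate encodings could decode the data")
```

Because the file was written in binary mode, the original bytes are still available: open(filePath, "rb").read() can be passed straight to this function.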

Errno 22 invalid mode w+ or filename

IOError: [Errno 22] invalid mode ('w+') or filename: 'hodor_2017-05-09_14:03:38.txt'
I am having trouble creating a file whose name has the form "name" [delimiter] "datetime" .txt.
I was looking up different bits of code such as:
Turn a string into a valid filename?
python: how to convert a string to utf-8
https://github.com/django/django/blob/master/django/utils/safestring.py
and it still seems to not work for me.
My concept is simple: given a name and content write a file with that name and that content.
My code is:
def create_json_file(name, contents):
    filename = u"%s_%s.json" % (name, datetime.datetime.now().strftime("%Y/%m/%d_%H:%M:%S"))
    print "%s" % filename
    filename = slugify(filename)
    f = open(filename, "w+")
    f.write(contents)
    f.close()
As you can see, I have been tweaking it. I looked at how Django does this, which is with slugify.
My original did not have that line. Maybe there is a better way to name the file, too. I think name plus datetime is pretty normal, but I wasn't sure which delimiters to use between the name and the datetime.
For the record, I am not currently using Django because I don't need the framework. I am just trying to test a way to pass in a string and a JSON map and turn them into what is essentially a config.json file.
Eventually, I will want to leverage an AJAX request from a website to do this, but that is outside the scope of this question.

Use a different separator in your filename mask:
filename = u"%s_%s.json" % (name, datetime.datetime.now().strftime("%Y_%m_%d_%H%M%S"))
The OS is trying to open 2005/04/01_5:45:04.json. Slashes aren't allowed in file/directory names, and colons aren't allowed in Windows filenames either.
Edit: Removed colons in response to comments.
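The same idea as a small Python 3 sketch. timestamped_filename is a hypothetical helper name, and the optional when parameter is an addition here that makes the result reproducible for testing:

```python
import datetime


def timestamped_filename(name, when=None, extension="json"):
    """Build 'name_YYYY_MM_DD_HHMMSS.ext' without slashes or colons."""
    if when is None:
        when = datetime.datetime.now()
    # Only digits and underscores, so the stamp is safe in a filename.
    stamp = when.strftime("%Y_%m_%d_%H%M%S")
    return "%s_%s.%s" % (name, stamp, extension)
```

This sketch does not sanitize name itself; if name can contain slashes or colons, it would need the same treatment.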

Python suds file

I'm new to both Python and programming, and my English is poor...
I have a question. I'm using suds to call methods from a WSDL, and a call sometimes returns type instance and sometimes type text. When it returns an instance I can manipulate the object like a list, but I can't do that with text, so I tried to parse it. It is very large, though, and the text contains a lot of "\n", so I thought I could maybe read it like a .txt file and get one list element per "\n". But I have no idea how to turn a string or "Text" into a .txt file.
Can you help me?
my python.py:
#!/usr/bin/python
from suds.client import Client
import xml.etree.ElementTree as ET
url = 'https://gpadev.servicedesk.net.br/dataservices/application/clients/clients.asmx?WSDL'
d = dict(http='******', https='********')
client = Client(url, proxy = d, username= '******', password = '********')
method = client.service.Export('*******')
type (method)
It returns:
type text
If I print it, I get something like:
CLIENT,FULLNAME,SEX,NICKNAME,BOSS,TITLE,MANAGER,INACTIVE,NETID,EMAILID,EMAILALT,NOTIFYMAIL,PAGERNUMBER,NOTIFYPAGER,PHONELBL1,PHONE1,PHONELBL2,PHONE2,PHONELBL3,PHONE3,ADDRESS,ADDRESS2,ZIP,CITY,STATE,DIVISION,REGION,LOCATION,ORGUNIT,CHARGE,SLEVEL,SKILL,LANGID,TIMEZONE,NOTES,CLIENT_LIST_MANAGELEVEL,ANALYST_LIST_PROFILE **\n** CLIENT,FULLNAME,SEX,NICKNAME,BOSS,TITLE,MANAGER,INACTIVE,NETID,EMAILID,EMAILALT,NOTIFYMAIL,PAGERNUMBER,NOTIFYPAGER,PHONELBL1,PHONE1,PHONELBL2,PHONE2,PHONELBL3,PHONE3,ADDRESS,ADDRESS2,ZIP,CITY,STATE,DIVISION,REGION,LOCATION,ORGUNIT,CHARGE,SLEVEL,SKILL,LANGID,TIMEZONE,NOTES,CLIENT_LIST_MANAGELEVEL,ANALYST_LIST_PROFILE **\n** .......**\n** .......**\n** .......**\n**
Thanks for helping me.

There are at least two things in your question:
how to split a string into a list of lines
how to save a string into an ASCII file (.txt)
For the first thing: it's as easy as calling lines=method.split('\n'), then you can iterate through the returned lines list.
For the second thing:
with open("path to save the file + filename.txt", "w") as f:
    f.write(method)
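Since the printed text looks like comma-separated data, the lines can also be split into fields in one pass. A minimal sketch, assuming the value returned by suds can be converted with str(); text_to_rows and save_text are hypothetical helper names:

```python
import csv
import io


def text_to_rows(text):
    """Split CSV-like text into a list of rows, each a list of fields."""
    # newline="" lets the csv module handle the line endings itself.
    reader = csv.reader(io.StringIO(str(text), newline=""))
    # Skip the empty rows produced by blank lines.
    return [row for row in reader if row]


def save_text(text, path):
    """Write the text to a file, one way to keep a copy on disk."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(str(text))
```

For example, text_to_rows(method) would give one list per line of the export, with the column names in the first row.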

Save an email.Message object into a file

I am trying to modify emails stored as text files. I first import a message like this:
import email
f = open('filename')
msg = email.message_from_file(f)
Then, I make all the modifications I want, using the features of the email module.
The last step is to save the Message object (msg) to a file. What piece of code does this? There does not seem to be a simple function like "message_to_file()"...
Many thanks.

The Message.as_string method should give you a flattened version of the message that you can write out just as you would any other string:
msg.as_string()
If this doesn't provide exactly the format you want, consider trying the email.generator module. If I read things correctly, you should be able to do something like this:
import email.generator
generator = email.generator.Generator(out_file)
generator.flatten(msg)
This assumes out_file is an open, writable file and msg is your message.
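Putting the pieces together, a minimal sketch of a save helper. save_message is a hypothetical name, and the UTF-8 output encoding is an assumption; messages carrying raw 8-bit data may be better served by email.generator.BytesGenerator and a file opened in binary mode.

```python
import email
from email.generator import Generator


def save_message(msg, path):
    """Flatten a Message object and write it to the given path."""
    with open(path, "w", encoding="utf-8") as out_file:
        # flatten() writes the headers and the payload(s) of the message.
        Generator(out_file).flatten(msg)
```

After modifying msg (for example with msg.replace_header("Subject", "new subject")), calling save_message(msg, "filename") writes the result back out.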
