Save email attachment (python3, pop3_ssl, gmail)

I'm trying to save an email attachment from a Google Mail account.
AFAIK, it can be done by 'walking' the message and getting the payload of each part,
    for part in message.walk():
        # getting payload, saving attachment etc.
but it does not work.
See the whole example below:
    def test_save_attach(self):
        self.connection = poplib.POP3_SSL('pop.gmail.com', 995)
        self.connection.set_debuglevel(1)
        self.connection.user(USERNAME)
        self.connection.pass_(PASS)
        emails, total_bytes = self.connection.stat()
        print("{0} emails in the inbox, {1} bytes total".format(emails, total_bytes))
        # return in format: (response, ['mesg_num octets', ...], octets)
        msg_list = self.connection.list()
        print(msg_list)
        # messages processing
        for i in range(emails):
            response = self.connection.retr(i+1)
            # return in format: (response, ['line', ...], octets)
            lines = response[1]
            str_message = email.message_from_bytes(b''.join(lines))
            print(str_message)
            # save attach
            for part in str_message.walk():
                print(part.get_content_type())
                if part.get_content_maintype() == 'multipart':
                    continue
                if part.get('Content-Disposition') is None:
                    continue
                filename = part.get_filename()
                if not(filename): continue
                fp = open(os.path.join(self.savedir, filename), 'wb')
                fp.write(part.get_payload(decode=1))
                fp.close
        self.connection.quit()
Script output is:
*cmd* 'USER **********'
*cmd* 'PASS **********'
*cmd* 'STAT'
*stat* [b'+OK', b'1', b'5301']
1 emails in the inbox, 5301 bytes total
*cmd* 'LIST'
(b'+OK 1 messages (5301 bytes)', [b'1 5301'], 8)
*cmd* 'RETR 1'
[<message headers and body>]
text/plain
*cmd* 'QUIT'
As we can see, the only part of the message has the 'text/plain' content type and carries no attachment information, although the message body definitely contains an attachment, as can be seen in the debug output.

    response = self.connection.retr(i+1)
    raw_message = response[1]
raw_message is not a string. retr returns the message as a list of single lines, with the line endings stripped. You are trying to convert the list into a string with str(raw_message), and that doesn't work.
Instead, join these lines together with newlines, e.g. replace
    str_message = email.message_from_string(str(raw_message))
with:
Python 2:
    str_message = email.message_from_string("\n".join(raw_message))
Python 3:
    str_message = email.message_from_bytes(b'\n'.join(raw_message))
Edit: adding my full working source and output to help debug the problem.
    import poplib
    import email
    import os
    import sys

    class GmailTest(object):
        def __init__(self):
            self.savedir = "/tmp"

        def test_save_attach(self):
            self.connection = poplib.POP3_SSL('pop.gmail.com', 995)
            self.connection.set_debuglevel(1)
            self.connection.user("<munged>")
            self.connection.pass_("<munged>")
            emails, total_bytes = self.connection.stat()
            print("{0} emails in the inbox, {1} bytes total".format(emails, total_bytes))
            # return in format: (response, ['mesg_num octets', ...], octets)
            msg_list = self.connection.list()
            print(msg_list)
            # messages processing
            for i in range(emails):
                # return in format: (response, ['line', ...], octets)
                response = self.connection.retr(i+1)
                raw_message = response[1]
                str_message = email.message_from_bytes(b'\n'.join(raw_message))
                # save attach
                for part in str_message.walk():
                    print(part.get_content_type())
                    if part.get_content_maintype() == 'multipart':
                        continue
                    if part.get('Content-Disposition') is None:
                        print("no content dispo")
                        continue
                    filename = part.get_filename()
                    if not(filename): filename = "test.txt"
                    print(filename)
                    fp = open(os.path.join(self.savedir, filename), 'wb')
                    fp.write(part.get_payload(decode=1))
                    fp.close()
            # I exit here instead of calling poplib's quit to make sure the message doesn't get removed in gmail
            sys.exit(0)

    d = GmailTest()
    d.test_save_attach()
output:
python3 thetest.py
*cmd* 'USER <munged>'
*cmd* 'PASS <munged>'
*cmd* 'STAT'
*stat* [b'+OK', b'2', b'152928']
2 emails in the inbox, 152928 bytes total
*cmd* 'LIST'
(b'+OK 2 messages (152928 bytes)', [b'1 76469', b'2 76459'], 18)
*cmd* 'RETR 1'
multipart/mixed
text/plain
test.txt
application/pdf
ADDFILE_0.pdf
*cmd* 'RETR 2'
multipart/mixed
text/plain
test.txt
application/pdf
ADDFILE_0.pdf
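
The difference between the two joins can be reproduced without a mail server. The following is a minimal sketch, assuming a hand-written message in the shape poplib's retr returns it (a list of bytes lines without line endings); the message content and the helper name attachment_names are made up for illustration:

```python
import email

# Lines in the shape poplib's retr() returns them: bytes, line endings stripped.
lines = [
    b"Subject: report",
    b"MIME-Version: 1.0",
    b'Content-Type: multipart/mixed; boundary="XYZ"',
    b"",
    b"--XYZ",
    b"Content-Type: text/plain",
    b"",
    b"See attached.",
    b"--XYZ",
    b"Content-Type: application/octet-stream",
    b'Content-Disposition: attachment; filename="data.bin"',
    b"Content-Transfer-Encoding: base64",
    b"",
    b"aGVsbG8=",
    b"--XYZ--",
]


def attachment_names(raw):
    """Parse raw message bytes and list the filenames of all parts that have one."""
    msg = email.message_from_bytes(raw)
    return [part.get_filename() for part in msg.walk() if part.get_filename()]


# Joining without a separator fuses every line into one long header line,
# so the parser sees a single text/plain message with no parts.
broken = attachment_names(b"".join(lines))

# Joining with newlines restores the header/body structure.
fixed = attachment_names(b"\n".join(lines))

print(broken, fixed)
```

With the empty separator the parser never sees the multipart boundary, which matches the single text/plain part in the question's output.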

Related

'Latin-1' codec can't encode characters in position 1011-1013: ordinal not in range(256)

I am a newbie in Python, so my way of making code work is to refer to other people's code and modify it until it solves my problem.
I wrote code to download the PDF attachment with a particular name from an email. It worked well on my Windows laptop, but the laptop cannot run 24 hours a day, so I planned to move the code to a Raspberry Pi 4.
I had to make some adjustments to get the code working on the Raspberry Pi, and it worked for some time. But now, when I run the code from the terminal on the Raspberry Pi, it always shows an error: 'latin-1' codec can't encode characters in position 1011-1013: ordinal not in range(256)
What is going on here? Why did the exact same code work last week but not today?
Below is my code:
    import imaplib
    import email
    from email.header import decode_header
    import os
    import sys
    import webbrowser

    org_email = "@yahoo.com"
    username = "test123" + org_email
    password = "xxxxxxx"
    smtp_server = "imap.gmail.com"
    smtp_port = 993

    def create(text): #clean text for creating a folder
        if "CCI Daily" in text:
            foldername = "CCI Daily"
        elif "ICT" in text:
            foldername = "Platts ICT"
        elif "Argus Coal Daily International" in text:
            foldername = "Argus"
        elif "Fenwei Index Price Comparion" in text:
            foldername = "Fenwei Index Price Comparisons"
        else:
            foldername = "Spam"
        return foldername

    #Create Connection
    mail = imaplib.IMAP4_SSL(smtp_server)
    mail.login(username,password)
    #Which Gmail Folder to Select
    mail.select("inbox")
    type, data = mail.search(None,"ALL")
    mail_ids = data[0]
    id_list = mail_ids.split()
    first_email_id = int(id_list[0])
    last_email_id = int(id_list[-1])
    print("\nThere are", last_email_id, "emails detected")
    for i in range(first_email_id, last_email_id+1):
        a = last_email_id + 1 - i #a = latest email index
        print("\n%s th email:" %a)
        res, msg = mail.fetch(str(a), "(RFC822)")
        for response in msg:
            if isinstance (response, tuple): #parse a bytes email into a message object
                msg = email.message_from_bytes(response[1])
                #decode the email subject
                subject, encoding = decode_header(msg["Subject"])[0]
                if isinstance (subject, bytes):
                    subject = subject.decode(encoding)
                #decode the email sender
                From, encoding = decode_header(msg.get("From"))[0]
                if isinstance (From, bytes):
                    From = From.decode(encoding)
                print("Subject: ", subject)
                print("===============================================")
                print("From: ", From)
                #if the email message is multipart
                if msg.is_multipart():
                    #iterate over email parts
                    for part in msg.walk():
                        content_type = part.get_content_type()
                        content_disposition = str(part.get("Content-Disposition"))
                        print(content_type)
                        if content_disposition != "None":
                            print(content_disposition)
                        try:
                            #get the email body and print the email body
                            body = part.get_payload(decode=True).decode()
                        except:
                            pass
                        if content_type == "text/plain" and "attachment" not in content_disposition:
                            #print text/plain emails and skip attachments
                            print(body)
                        elif "attachment" in content_disposition:
                            #download attachment
                            filename = part.get_filename()
                            if "ICT" in filename or "CCI" in filename:
                                folder_name = create(filename) #create specific folder for specific filename
                                print("Foldername:", folder_name)
                                if not os.path.isdir(folder_name):
                                    #make a folder for this email
                                    os.mkdir(folder_name)
                                filepath = os.path.join(folder_name,filename)
                                open(filepath, "wb").write(part.get_payload(decode=True))
                                exit()
                            else:
                                print("We do not download this attachment")

Since you actually have an encoding name, chances are you have malformed messages that, though they specify the "latin1" encoding, contain characters it can't handle. Pass the extra named argument errors="replace" in your calls to "decode": out-of-range characters will be replaced with a "�", but the app won't stop.

If it's a Unicode text file, don't use open(filepath, "wb"); use open(filepath, "w", encoding="utf-8") instead. You can also use a try/except block, depending on the situation:
    try:
        open(filepath, "wb").write(body)
    except UnicodeEncodeError:
        open(filepath, "w", encoding="utf-8").write(body)
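
The errors="replace" advice from the first answer can be sketched for header decoding as follows. This is an illustration under assumptions, not a confirmed fix for the error in the question: the helper name decode_mime_header and the UTF-8 fallback are choices made here, not part of the original code.

```python
from email.header import decode_header


def decode_mime_header(value, fallback="utf-8"):
    """Decode every chunk of a MIME header without raising on bad bytes."""
    chunks = []
    for text, charset in decode_header(value):
        if isinstance(text, bytes):
            try:
                # Undecodable bytes become U+FFFD instead of raising.
                text = text.decode(charset or fallback, errors="replace")
            except LookupError:
                # The header names a charset Python does not know.
                text = text.decode(fallback, errors="replace")
        chunks.append(text)
    return "".join(chunks)


print(decode_mime_header("=?utf-8?b?w6k=?="))
```

Unlike taking only element [0] of decode_header's result, this joins all chunks, so headers that mix several encoded words are not truncated.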

How to deal with error when mails are read

At first the script does its job:
find the message with specific subject
send a message to the sender
copy the message to folder 'answered'
delete the message
sleep for 1min and repeat
    mail.select()
    status, messages = mail.select("INBOX")
    n = int(str(messages[0], 'utf-8'))
    messages = int(messages[0])
    for i in range(messages, messages-n,-1):
        res, msg = mail.fetch(str(i), "(RFC822)")
        for response in msg:
            if isinstance(response, tuple):
                # parse a bytes email into a message object
                msg = email.message_from_bytes(response[1])
                Sub, encoding = decode_header(msg.get("Subject"))[0] # Error is about this line
                Sub=Sub.decode((encoding))
                if Sub == pat:
                    fro, encoding = decode_header(msg.get("From"))[0]
                    if isinstance(fro, bytes):
                        fro = fro.decode(encoding)
                    # if s == 0:
                    # time.sleep(60)
                    # mai_load(1)
                    print("From:", fro)
                    send_mail(fro)
                    mail.copy(str(i), 'praca')
                    mail.store(str(i), '+FLAGS', '\\Deleted')
                    print("=" * 100)
    time.sleep(60)
    mai_load(0)
And here is the problem: the messages are marked as read, and when the script connects again I receive an error:
line 99, in mai_load
Sub, encoding = decode_header(msg.get("Subject"))[0]
File "/usr/lib/python3.8/email/header.py", line 80, in decode_header
if not ecre.search(header):
TypeError: expected string or bytes-like object

Try my library: https://github.com/ikvk/imap_tools
    from imap_tools import MailBox, AND, MailMessageFlags
    with MailBox('imap.mail.com').login('test@mail.com', 'pwd', 'INBOX') as mailbox:
        # get list of email senders from INBOX folder
        senders = [msg.from_ for msg in mailbox.fetch()]
        # FLAG unseen messages in current folder (INBOX) as Flagged
        mailbox.flag(mailbox.fetch(AND(seen=False)), [MailMessageFlags.FLAGGED], True)
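
Independently of imap_tools, the traceback in the question points at decode_header receiving None, which is what msg.get("Subject") returns when the parsed message has no Subject header. A minimal sketch of a guard using only the standard library follows; the helper name subject_of is hypothetical, and why the fetched data lacks a subject on the second run is not established by the question.

```python
import email
from email.header import decode_header


def subject_of(raw_bytes):
    """Return the decoded subject of a raw message, or None if it has none."""
    msg = email.message_from_bytes(raw_bytes)
    raw_subject = msg.get("Subject")
    if raw_subject is None:
        # No Subject header: decode_header(None) would raise TypeError.
        return None
    text, charset = decode_header(raw_subject)[0]
    if isinstance(text, bytes):
        text = text.decode(charset or "utf-8", errors="replace")
    return text


print(subject_of(b"Subject: ping\n\nbody"))
print(subject_of(b"From: someone@example.com\n\nbody"))
```

In the loop from the question, a None result would simply mean "skip this message" instead of crashing the script.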

Reply to specific email and delete

I want the script to respond to an email with a specific subject and then delete that email.
The script does the first job, responding to the email, but I'm struggling with deleting that email, and I'm not sure I'm even doing it correctly: mail.store(response[1], '+FLAGS', '\Deleted') gives an error: imaplib.error: STORE command error: BAD [b'Could not parse command']
    mail.select()
    status, messages = mail.select("INBOX")
    n = int(str(messages[0], 'utf-8'))
    messages = int(messages[0])
    for i in range(messages, messages-n,-1):
        res, msg = mail.fetch(str(i), "(RFC822)")
        for response in msg:
            if isinstance(response, tuple):
                # parse a bytes email into a message object
                msg = email.message_from_bytes(response[1])
                Sub, encoding = decode_header(msg.get("Subject"))[0]
                if isinstance(Sub, bytes): # check the subject
                    Sub=Sub.decode((encoding))
                if Sub == pat:
                    fro, encoding = decode_header(msg.get("From"))[0]
                    if isinstance(fro, bytes):
                        fro = fro.decode(encoding)
                    if s == 0:
                        time.sleep(180)
                        mai_load(1)
                    print("From:", fro)
                    send_mail(fro)
                    mail.store(response[1], '+FLAGS', '\\Deleted')
                    print("=" * 100)

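Solution:
I had to change one line, because mail.store expects the message number as its first argument, not the message content in response[1]:
    mail.store(response[1], '+FLAGS', '\\Deleted')
to
    mail.store(str(i), '+FLAGS', '\\Deleted')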
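
Note that setting the \Deleted flag only marks the message; in IMAP it is removed from the mailbox once the mailbox is expunged (or closed). A minimal sketch of a helper that does both steps, assuming mail is a logged-in imaplib connection with a mailbox selected; the function name delete_message is hypothetical:

```python
def delete_message(mail, seq_num):
    """Flag one message as deleted, then expunge the selected mailbox.

    `mail` is assumed to be a logged-in imaplib.IMAP4 or IMAP4_SSL object
    with a mailbox already selected.
    """
    # store() takes the message number as a string, not the message body.
    status, _ = mail.store(str(seq_num), "+FLAGS", "\\Deleted")
    if status != "OK":
        raise RuntimeError("STORE failed for message %s" % seq_num)
    # Permanently remove all messages flagged as deleted.
    mail.expunge()
```

Expunging renumbers the messages that come after the removed one. The loop in the question counts downward from the highest number, so the lower numbers it has yet to visit stay valid.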

Python MIME email attachment sending method sends jpg files as "noname.eml" instead

When I use MIME and smtplib to send my attachment "random.jpg", the recipient gets "noname.eml" instead, which cannot be opened or viewed, so I am unable to send attachments correctly. What causes this and how can I solve it?
I tried changing the extension from "png" to "jpg", but the issue persists.
    fromaddr1 = ""
    toaddr1 = ""
    accpass1 = ""
    msg1 = MIMEMultipart()
    msg1['From'] = fromaddr1
    msg1['To'] = toaddr1
    msg1['Subject'] = "YOUR COMPUTER HAS BEEN ACCESSED"
    body1 = "Someone has gained access to your computer"
    msg1.attach(MIMEText(body1, 'plain'))
    filename1 = "random.jpg"
    attachment1 = open("random.jpg","rb")
    part1 = MIMEBase('application', 'octet-stream')
    part1.set_payload((attachment1).read())
    encoders.encode_base64(part1)
    part1.add_header('Content-Disposition', "attachment1; filename1= %s" % filename1)
    msg1.attach(part1)
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.starttls()
    server.login(fromaddr1, accpass1)
    text = msg1.as_string()
    server.sendmail(fromaddr1, toaddr1, text)
    server.quit()

Add this import at the top:
    from email.mime.application import MIMEApplication
Then attach the file like this instead:
    with open(PATH_TO_ZIP_FILE, 'rb') as file:
        msg.attach(MIMEApplication(file.read(), Name='filename.zip'))
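
A likely cause in the question's code is the header value "attachment1; filename1= %s": the Content-Disposition type must be the literal word attachment and the parameter must be named filename, so the renamed versions are not recognized by mail clients. A minimal sketch that keeps the MIMEBase approach, with placeholder addresses and bytes (the function name build_message is hypothetical):

```python
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(sender, recipient, filename, data):
    """Build a multipart message with one binary attachment."""
    msg = MIMEMultipart()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Attachment test"
    msg.attach(MIMEText("See the attached file.", "plain"))

    part = MIMEBase("application", "octet-stream")
    part.set_payload(data)
    encoders.encode_base64(part)
    # Passing filename as a keyword lets the library quote it correctly.
    part.add_header("Content-Disposition", "attachment", filename=filename)
    msg.attach(part)
    return msg


# Placeholder addresses and payload, for illustration only.
msg = build_message("a@example.com", "b@example.com", "random.jpg", b"\xff\xd8\xff")
attachment = msg.get_payload()[1]
print(attachment["Content-Disposition"])
```

The resulting msg.as_string() can be passed to server.sendmail exactly as in the question.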

How do I download only unread attachments from a specific gmail label?

I have a Python script adapted from "Downloading MMS emails sent to Gmail using Python":
    import email, getpass, imaplib, os
    detach_dir = '.' # directory where to save attachments (default: current)
    user = raw_input("Enter your GMail username:")
    pwd = getpass.getpass("Enter your password: ")
    # connecting to the gmail imap server
    m = imaplib.IMAP4_SSL("imap.gmail.com")
    m.login(user,pwd)
    m.select("[Gmail]/All Mail") # here you can choose a mail box like INBOX instead
    # use m.list() to get all the mailboxes
    resp, items = m.search(None, 'FROM', '"Impact Stats Script"') # you could filter using the IMAP rules here (check http://www.example-code.com/csharp/imap-search-critera.asp)
    items = items[0].split() # getting the mails id
    for emailid in items:
        resp, data = m.fetch(emailid, "(RFC822)") # fetching the mail, "`(RFC822)`" means "get the whole stuff", but you can ask for headers only, etc
        email_body = data[0][1] # getting the mail content
        mail = email.message_from_string(email_body) # parsing the mail content to get a mail object
        #Check if any attachments at all
        if mail.get_content_maintype() != 'multipart':
            continue
        print "["+mail["From"]+"] :" + mail["Subject"]
        # we use walk to create a generator so we can iterate on the parts and forget about the recursive headache
        for part in mail.walk():
            # multipart are just containers, so we skip them
            if part.get_content_maintype() == 'multipart':
                continue
            # is this part an attachment ?
            if part.get('Content-Disposition') is None:
                continue
            filename = part.get_filename()
            counter = 1
            # if there is no filename, we create one with a counter to avoid duplicates
            if not filename:
                filename = 'part-%03d%s' % (counter, 'bin')
                counter += 1
            att_path = os.path.join(detach_dir, filename)
            #Check if its already there
            if not os.path.isfile(att_path) :
                # finally write the stuff
                fp = open(att_path, 'wb')
                fp.write(part.get_payload(decode=True))
                fp.close()
I am filtering messages by subject and getting the attachments, but now I need to only get attachments from new emails. Can I modify the m.search() somehow to return only unread emails?

Try modifying this line:
    resp, items = m.search(None, 'FROM', '"Impact Stats Script"')
to:
    resp, items = m.search(None, 'UNSEEN', 'FROM', '"Impact Stats Script"')
The Python imaplib documentation shows that you can simply add more search criteria, and the IMAP specification defines the UNSEEN search criterion:
    UNSEEN
        Messages that do not have the \Seen flag set.
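
The combined search can be wrapped in a small helper. This is a sketch in Python 3 (the question's script is Python 2), assuming conn is a logged-in imaplib connection with a mailbox selected; the function name unseen_ids_from is hypothetical:

```python
def unseen_ids_from(conn, sender):
    """Return the ids of unread messages from `sender` in the selected mailbox.

    `conn` is assumed to be a logged-in imaplib.IMAP4 or IMAP4_SSL object.
    """
    # Multiple criteria passed to search() are combined with AND.
    status, data = conn.search(None, "UNSEEN", "FROM", '"%s"' % sender)
    if status != "OK" or not data or not data[0]:
        return []
    return data[0].split()
```

Keep in mind that fetching a message with "(RFC822)" sets its \Seen flag on the server, so each message matches UNSEEN only until the script has fetched it once. If the messages should stay unread, "(BODY.PEEK[])" is the IMAP fetch item that retrieves the full message without setting the flag.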
