First, I should say that I don't really know much about computer programming, but I find Python fairly easy to use for automating simple tasks, thanks to Al Sweigart's book "Automate the Boring Stuff with Python".
I want to collect the text bodies of emails; I'm trying to move homework to email to save paper. I thought I could do that by getting the numbers of the unseen mails and just looping through them. When I try that, the IDLE3 shell just becomes unresponsive; Ctrl-C does nothing, and I have to restart the shell.
Question: why can't I just use a loop variable in server.fetch()?
for msgNum in unseenMessages:
    rawMessage = server.fetch([msgNum], ['BODY[]', 'FLAGS'])
It seems you need an actual number in there, like 57, not msgNum, or it won't work.
After looking at various questions and answers here on SO, the following works for me. I suppose it collects all the email bodies in one swoop.
import pyzmail
import pprint
from imapclient import IMAPClient

server = IMAPClient('imap.qq.com', use_uid=True, ssl=True)
server.login('myEmail@foxmail.com', 'myIMAPpassword')
select_info = server.select_folder('Inbox')
unseenMessages = server.search(['UNSEEN'])
rawMessage = server.fetch(unseenMessages, ['BODY[]', 'FLAGS'])
for msgNum in unseenMessages:
    message = pyzmail.PyzMessage.factory(rawMessage[msgNum][b'BODY[]'])
    text = message.text_part.get_payload().decode(message.text_part.charset)
    print('Text' + str(msgNum) + ' = ')
    print(text)
I've found this gist with nice, clean code, and a page with many helpful examples.
The main difference between the APIs of imaplib and pyzmail is that pyzmail is an all-in-one package covering both parsing and client-server communication, while in the standard library those concerns are split across different packages. Basically, they both provide almost the same functionality through much the same methods.
As an additional important note, pyzmail looks quite abandoned.
To preserve the useful code from that gist, I copy it here as-is, with very small modifications such as extracting a main function (note: it's for Python 3):
#!/usr/bin/env python
#
# Very basic example of using Python 3 and IMAP to iterate over emails in a
# gmail folder/label. This code is released into the public domain.
#
# This script is example code from this blog post:
# http://www.voidynullness.net/blog/2013/07/25/gmail-email-with-python-via-imap/
#
# This is an updated version of the original -- modified to work with Python 3.4.
#
import sys
import imaplib
import getpass
import email
import email.header
import email.utils
import datetime

EMAIL_ACCOUNT = "notatallawhistleblowerIswear@gmail.com"

# Use 'INBOX' to read inbox. Note that whatever folder is specified,
# after successfully running this script all emails in that folder
# will be marked as read.
EMAIL_FOLDER = "Top Secret/PRISM Documents"


def process_mailbox(M):
    """
    Do something with emails messages in the folder.
    For the sake of this example, print some headers.
    """
    rv, data = M.search(None, "ALL")
    if rv != 'OK':
        print("No messages found!")
        return

    for num in data[0].split():
        rv, data = M.fetch(num, '(RFC822)')
        if rv != 'OK':
            print("ERROR getting message", num)
            return

        msg = email.message_from_bytes(data[0][1])
        hdr = email.header.make_header(email.header.decode_header(msg['Subject']))
        subject = str(hdr)
        print('Message %s: %s' % (num, subject))
        print('Raw Date:', msg['Date'])
        # Now convert to local date-time
        date_tuple = email.utils.parsedate_tz(msg['Date'])
        if date_tuple:
            local_date = datetime.datetime.fromtimestamp(
                email.utils.mktime_tz(date_tuple))
            print("Local Date:",
                  local_date.strftime("%a, %d %b %Y %H:%M:%S"))


def main(host, login, folder):
    with imaplib.IMAP4_SSL(host) as M:
        rv, data = M.login(login, getpass.getpass())
        print(rv, data)

        rv, mailboxes = M.list()
        if rv == 'OK':
            print("Mailboxes:")
            print(mailboxes)

        rv, data = M.select(folder)
        if rv == 'OK':
            print("Processing mailbox...\n")
            process_mailbox(M)
        else:
            print("ERROR: Unable to open mailbox ", rv)


if __name__ == '__main__':
    try:
        main('imap.gmail.com', EMAIL_ACCOUNT, EMAIL_FOLDER)
    except imaplib.IMAP4.error as e:
        print('Error while processing mailbox:', e)
        sys.exit(1)
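Since the original question was about unread mail: with imaplib you can restrict the search to unread messages by swapping the "ALL" criterion for the standard IMAP "UNSEEN" search key, for example:
# inside process_mailbox(): iterate over unread messages only;
# note that fetching them with RFC822 will then mark them as read
rv, data = M.search(None, "UNSEEN")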
I've been struggling with this problem for a bit. I am trying to create a program that will build a datetime object based on the current date and time, create a second such object from our file data, find the difference between the two, and, if it is greater than 10 minutes, search for a "handshake file" (a file we receive back when our file has successfully loaded). If we don't find that file, I want to kick out an error email.
My problem lies in capturing the result of my ls command in a meaningful way, where I would be able to parse through it and see whether the correct file exists. Here is my code:
"""
This module will check the handshake files sent by Pivot based on the following conventions:
- First handshake file (loaded to the CFL, *auditv2*): Check every half-hour
- Second handshake file (proofs are loaded and available, *handshake*): Check every 2 hours
"""
import smtplib
from email.mime.text import MIMEText
from datetime import datetime, timedelta
from csv import DictReader
from subprocess import *
from os import chdir
from glob import glob
def main():
audit_in = '/prod/bcs/lgnp/clientapp/csvbill/audit_process/lgnp.smr.csv0000.audit.qty'
with open(audit_in, 'rbU') as audit_qty:
my_audit_reader = DictReader(audit_qty, delimiter=';', restkey='ignored')
my_audit_reader.fieldnames = ("Property Code",
"Pivot ID",
"Inwork File",
"Billing Manager E-mail",
"Total Records",
"Number of E-Bills",
"Printed Records",
"File Date",
"Hour",
"Minute",
"Status")
# Get current time to reconcile against
now = datetime.now()
# Change internal directory to location of handshakes
chdir('/prod/bcs/lgnp/input')
for line in my_audit_reader:
piv_id = line['Pivot ID']
status = line['Status']
file_date = datetime(int(line['File Date'][:4]),
int(line['File Date'][4:6]),
int(line['File Date'][6:8]),
int(line['Hour']),
int(line['Minute']))
# print(file_date)
if status == 's':
diff = now - file_date
print diff
print piv_id
if 10 < (diff.seconds / 60) < 30:
proc = Popen('ls -lh *{0}*'.format(status),
shell=True) # figure out how to get output
print proc
def send_email(recipient_list):
msg = MIMEText('Insert message here')
msg['Subject'] = 'Alert!! Handshake files missing!'
msg['From'] = r'xxx#xxx.com'
msg['To'] = recipient_list
s = smtplib.SMTP(r'xxx.xxx.xxx')
s.sendmail(msg['From'], msg['To'], msg.as_string())
s.quit()
if __name__ == '__main__':
main()
Parsing ls output is not the best solution here. You could surely do it by parsing the subprocess.check_output result, or in some other way, but let me give you a piece of advice.
It is a good sign that something is going wrong if you find yourself parsing a command's output or logs to solve a standard problem; please consider other solutions, like those offered below.
If the only thing you want is to see the contents of a directory, use os.listdir:
my_home_files = os.listdir(os.path.expanduser('~/my_dir'))  # surely it's cross-platform
Now you have a list of file names in your my_home_files variable.
You can filter them in whatever way you want, or use glob.glob to match with metacharacters, like this:
glob.glob("/home/me/handshake-*.txt") # will output everything matching the expression
# (say you have ids in your filenames).
After that, you may want to check some stats of a file (like the date of last access, etc.); consider using os.stat:
os.stat(my_home_files[0])  # outputs stats of the first file
# posix.stat_result(st_mode=33104, st_ino=140378115, st_dev=3306L, st_nlink=1, st_uid=23449, st_gid=59216, st_size=1442, st_atime=1421834474, st_mtime=1441831745, st_ctime=1441234474)
# see the os.stat documentation to understand how to parse it
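Putting those pieces together for the problem above, here is a minimal sketch of the handshake check (the filename pattern containing the Pivot ID is an assumption; adjust it to your real naming convention):
import glob
import os
from datetime import datetime, timedelta

def handshake_missing(piv_id, max_age_minutes=30):
    """Return True if no sufficiently recent handshake file exists."""
    # Assumption: handshake filenames contain the Pivot ID somewhere.
    matches = glob.glob('/prod/bcs/lgnp/input/*{0}*'.format(piv_id))
    if not matches:
        return True
    # os.stat gives the modification time of the newest match
    newest = max(matches, key=lambda f: os.stat(f).st_mtime)
    age = datetime.now() - datetime.fromtimestamp(os.stat(newest).st_mtime)
    return age > timedelta(minutes=max_age_minutes)

# e.g. in main(): if handshake_missing(piv_id): send_email(recipient_list)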
The Grooveshark music streaming service has been shut down without prior notice. I had many playlists that I would like to recover (playlists I made over several years).
Is there any way I can recover them? A script or something automated would be awesome.
Update [2018-05-11]
Three years have passed since this answer was posted and it seems this script no longer works. If you are still in need to recover your old Grooveshark playlists, it might not be possible anymore. Good luck and, if you find a way to do it, share it here! I will be happy to accept your answer instead. :-)
I made a script that will try to find all the playlists made by the user and download them in an output directory as CSV files. It is made in Python.
You just have to pass your username as a parameter to the script (e.g. python pysharkbackup.py "my_user_name"). Your email address should work as well (the one you used to register with Grooveshark).
The output directory is set by default to ./pysharkbackup_$USERNAME.
Here is the script:
#!/bin/python

import os
import sys
import csv
import argparse
import requests

URI = 'http://playlist.fish/api'

description = 'Download your Grooveshark playlists as CSV.'
parser = argparse.ArgumentParser(description=description)
parser.add_argument('USER', type=str, help='Grooveshark user name')
args = parser.parse_args()
user = args.USER

with requests.Session() as session:
    # Login as user
    data = {'method': 'login', 'params': {'username': user}}
    response = session.post(URI, json=data).json()
    if not response['success']:
        print('Could not login as user "%s"! (%s)' %
              (user, response['result']))
        sys.exit()

    # Get user playlists
    data = {'method': 'getplaylists'}
    response = session.post(URI, json=data).json()
    if not response['success']:
        print('Could not get "%s" playlists! (%s)' %
              (user, response['result']))
        sys.exit()

    # Save to CSV
    playlists = response['result']
    if not playlists:
        print('No playlists found for user %s!' % user)
        sys.exit()

    path = './pysharkbackup_%s' % user
    if not os.path.exists(path):
        os.makedirs(path)

    for p in playlists:
        plid = p['id']
        name = p['n']
        data = {'method': 'getPlaylistSongs', 'params': {'playlistID': plid}}
        response = session.post(URI, json=data).json()
        if not response['success']:
            print('Could not get "%s" songs! (%s)' %
                  (name, response['result']))
            continue

        playlist = response['result']
        f = csv.writer(open(path + '/%s.csv' % name, 'w'))
        f.writerow(['Artist', 'Album', 'Name'])
        for song in playlist:
            f.writerow([song['Artist'], song['Album'], song['Name']])
You can access some of the information left in your browser by checking the localStorage variable:
1. Go to grooveshark.com
2. Open the dev tools (right click -> Inspect Element)
3. Go to Resources -> LocalStorage -> grooveshark.com
4. Look for the library variables: recentListens, library and storedQueue
5. Parse those variables to extract your songs
This might not give you your playlists, but it can help in retrieving some of your collection.
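For the parsing step, if you copy one of those values out of the dev tools into a file, a small sketch like this could turn it into CSV (assumption: the value is a JSON array of song objects; inspect yours to confirm the actual structure and field names):
import json
import csv

# recentListens.json is a hypothetical file holding the pasted value
with open('recentListens.json') as f:
    songs = json.load(f)

with open('recentListens.csv', 'w') as out:
    writer = csv.writer(out)
    if songs:
        keys = sorted(songs[0].keys())  # write whatever fields are present
        writer.writerow(keys)
        for song in songs:
            writer.writerow([song.get(k, '') for k in keys])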
I have found a script on this website: http://wammu.eu/docs/manual/smsd/run.html
#!/usr/bin/python

import os
import sys

numparts = int(os.environ['DECODED_PARTS'])

# Are there any decoded parts?
if numparts == 0:
    print('No decoded parts!')
    sys.exit(1)

# Get all text parts
text = ''
for i in range(1, numparts + 1):
    varname = 'DECODED_%d_TEXT' % i
    if varname in os.environ:
        text = text + os.environ[varname]

# Do something with the text
f = open('/home/pi/output.txt', 'w')
f.write('Number %s have sent text: %s' % (os.environ['SMS_1_NUMBER'], text))
f.close()  # close explicitly so the text is flushed to disk
And I know that my gammu-smsd is working fine, because I can turn off my LED lamp on the Raspberry Pi by sending an SMS to it. But my question is: why is this script failing? Nothing happens, and when I try to run the script by itself it also fails.
What I would like to do is simply receive the SMS, read its content, and save the content and the phone number that sent it to a file.
I hope you understand my issue.
Thank you in advance, all the best.
In the gammu-smsd config file, you can use the file backend which does this for you automatically.
See this example from the gammu documentation
http://wammu.eu/docs/manual/smsd/config.html#files-service
[smsd]
Service = files
PIN = 1234
LogFile = syslog
InboxPath = /var/spool/sms/inbox/
OutboxPath = /var/spool/sms/outbox/
SentSMSPath = /var/spool/sms/sent/
ErrorSMSPath = /var/spool/sms/error/
Also see options for the file backend to tailor to your needs.
http://wammu.eu/docs/manual/smsd/config.html#files-backend-options
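With the files service configured like this, incoming messages simply appear as text files under InboxPath, so saving the number and content becomes ordinary file handling. A minimal sketch (the IN...-style filename layout follows the gammu documentation, but check your own inbox files to confirm it):
import glob
import os

INBOX = '/var/spool/sms/inbox/'

for path in sorted(glob.glob(os.path.join(INBOX, 'IN*'))):
    # Filenames look roughly like IN20150910_123456_00_+4512345678_00.txt,
    # so the sender's number is the fourth underscore-separated field.
    parts = os.path.basename(path).split('_')
    number = parts[3] if len(parts) > 3 else 'unknown'
    with open(path) as f:
        text = f.read()
    with open('/home/pi/output.txt', 'a') as out:
        out.write('Number %s has sent text: %s\n' % (number, text))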
Hope this helps :)
So I have a simple reddit bot set up which I wrote using the praw framework. The code is as follows:
import praw
import time
import numpy
import pickle

r = praw.Reddit(user_agent="Gets the Daily General Thread from subreddit.")
print("Logging in...")
r.login()

words_to_match = ['sdfghm']
cache = []


def run_bot():
    print("Grabbing subreddit...")
    subreddit = r.get_subreddit("test")
    print("Grabbing thread titles...")
    threads = subreddit.get_hot(limit=10)
    for submission in threads:
        thread_title = submission.title.lower()
        isMatch = any(string in thread_title for string in words_to_match)
        if submission.id not in cache and isMatch:
            print("Match found! Thread ID is " + submission.id)
            r.send_message('FlameDraBot', 'DGT has been posted!', 'You are awesome!')
            print("Message sent!")
            cache.append(submission.id)
    print("Comment loop finished. Restarting...")

# Run the script
while True:
    run_bot()
    time.sleep(20)
I want to create a file (a text file, XML, or something else) in which the user can change the fields for the various pieces of information being queried. For example, I want a file with lines such as:
Words to Search for = sdfghm
Subreddit to Search in = text
Send message to = FlameDraBot
I want the info to be read field by field, so that the script takes the value after Words to Search for = instead of the whole line. After the information has been entered into the file and saved, I want my script to pull it from the file, store it in variables, and use those variables in the appropriate functions, such as:
words_to_match = ['sdfghm']
subreddit = r.get_subreddit("test")
r.send_message('FlameDraBot'....
So, basically, a config file for the script. How do I go about making my script take input from a .txt (or another appropriate) file and use it in my code?
Yes, that's just a plain old Python config, which you can implement in an ASCII file, or else in YAML or JSON.
Create a subdirectory ./config and put your settings in ./config/__init__.py.
Then import config.
Using PEP 8-compliant names, the file ./config/__init__.py would look like:
search_string = ['sdfghm']
subreddit_to_search = 'text'
notify = ['FlameDraBot']
If you want a more complicated config, just read the many other posts on that topic.
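Alternatively, if you would rather keep the plain key = value text file you sketched in the question, the standard library's configparser (ConfigParser in Python 2) reads exactly that format. A minimal sketch, with bot.ini as a hypothetical filename:
# bot.ini, the file your users edit:
# [bot]
# words_to_match = sdfghm
# subreddit = test
# send_message_to = FlameDraBot

import configparser  # 'import ConfigParser' on Python 2

config = configparser.ConfigParser()
config.read('bot.ini')

words_to_match = [w.strip() for w in config.get('bot', 'words_to_match').split(',')]
subreddit_name = config.get('bot', 'subreddit')
recipient = config.get('bot', 'send_message_to')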
I have found a program called "Best Email Extractor" (http://www.emailextractor.net/). The website says it is written in Python. I tried to write a similar program, but while the above program extracts about 300-1000 emails per minute, mine extracts about 30-100 emails per hour. Could someone give me tips on how to improve my program's performance? I wrote the following:
import sqlite3 as sql
import urllib2
import re
import lxml.html as lxml
import time
import threading


def getUrls(start):
    urls = []
    try:
        dom = lxml.parse(start).getroot()
        dom.make_links_absolute()
        for url in dom.iterlinks():
            if not '.jpg' in url[2]:
                if not '.JPG' in url[2]:
                    if not '.ico' in url[2]:
                        if not '.png' in url[2]:
                            if not '.jpeg' in url[2]:
                                if not '.gif' in url[2]:
                                    if not 'youtube.com' in url[2]:
                                        urls.append(url[2])
    except:
        pass
    return urls


def getURLContent(urlAdresse):
    try:
        url = urllib2.urlopen(urlAdresse)
        text = url.read()
        url.close()
        return text
    except:
        return '<html></html>'


def harvestEmail(url):
    text = getURLContent(url)
    emails = re.findall('[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}', text)
    if emails:
        if saveEmail(emails[0]) == 1:
            print emails[0]


def saveUrl(url):
    connection = sql.connect('url.db')
    url = (url, )
    with connection:
        cursor = connection.cursor()
        cursor.execute('SELECT COUNT(*) FROM urladressen WHERE adresse = ?', url)
        data = cursor.fetchone()
        if(data[0] == 0):
            cursor.execute('INSERT INTO urladressen VALUES(NULL, ?)', url)
            return 1
    return 0


def saveEmail(email):
    connection = sql.connect('emails.db')
    email = (email, )
    with connection:
        cursor = connection.cursor()
        cursor.execute('SELECT COUNT(*) FROM addresse WHERE email = ?', email)
        data = cursor.fetchone()
        if(data[0] == 0):
            cursor.execute('INSERT INTO addresse VALUES(NULL, ?)', email)
            return 1
    return 0


def searchrun(urls):
    for url in urls:
        if saveUrl(url) == 1:
            #time.sleep(0.6)
            harvestEmail(url)
            print url
        urls.remove(url)
        urls = urls + getUrls(url)

urls1 = getUrls('http://www.google.de/#hl=de&tbo=d&output=search&sclient=psy-ab&q=DVD')
urls2 = getUrls('http://www.google.de/#hl=de&tbo=d&output=search&sclient=psy-ab&q=Jolie')
urls3 = getUrls('http://www.finanzen.net')
urls4 = getUrls('http://www.google.de/#hl=de&tbo=d&output=search&sclient=psy-ab&q=Party')
urls5 = getUrls('http://www.google.de/#hl=de&tbo=d&output=search&sclient=psy-ab&q=Games')
urls6 = getUrls('http://www.spiegel.de')
urls7 = getUrls('http://www.kicker.de/')
urls8 = getUrls('http://www.chessbase.com')
urls9 = getUrls('http://www.nba.com')
urls10 = getUrls('http://www.nfl.com')

try:
    threads = []
    urls = (urls1, urls2, urls3, urls4, urls5, urls6, urls7, urls8, urls9, urls10)
    for urlList in urls:
        thread = threading.Thread(target=searchrun, args=(urlList, )).start()
        threads.append(thread)
        print threading.activeCount()
    for thread in threads:
        thread.join()
except RuntimeError:
    print RuntimeError
I don't think many people are going to help you harvest emails. It's a generally detested activity.
Regarding the performance bottlenecks in your code, you need to find out where the time is going by profiling. At the lowest level, replace each of your functions with a dummy that does no processing but returns valid output; for example, the email collector could return a list of the same address 100 times (or however many are in those URL results). That will show you which function is costing you time.
Things that stick out:
Fetch the files behind the URLs from the server beforehand; if you spam Google every time you run the script, they could well block you. Reading from disk is faster than requesting files over the internet, and it can be done separately and concurrently.
The database code creates a new connection for each call to saveEmail etc., which spends most of its time doing handshaking and authentication. It is better to have an object that keeps the connection alive between calls, or better yet to insert multiple records at once (see the sketch after this list).
Once the network and database issues are dealt with, the regex could probably do with \b around it so that the matching does less backtracking.
A series of if not 'foo' in s: followed by if not 'blah' in s: ... is poor coding. Extract the final segment once and check it against multiple values by creating a set (or even frozenset) of all the non-permitted values, like ignoredExtensions = set(['.jpg', '.png', '.gif']), and comparing against it with if extension not in ignoredExtensions. Note also that lower-casing the extension first means less checking, and it works whether the file says jpg or JPG.
Finally, consider running the same script without threading from multiple command lines. There is no real need to have threading inside the script except for coordinating the different URL lists; frankly, it would be far simpler to keep a set of URL lists in files and start a separate script for each. Let the OS do the multitasking; it is better at it.
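To illustrate the database point above, here is a minimal sketch of keeping one connection alive instead of reconnecting on every call (table and column names follow the original code; note that sqlite3 connections are not thread-safe by default, so each thread should create its own store):
import sqlite3 as sql

class EmailStore(object):
    def __init__(self, path='emails.db'):
        # one connection, reused for every save
        self.connection = sql.connect(path)

    def save(self, email):
        with self.connection:  # commits (or rolls back) automatically
            cursor = self.connection.cursor()
            cursor.execute('SELECT COUNT(*) FROM addresse WHERE email = ?',
                           (email,))
            if cursor.fetchone()[0] == 0:
                cursor.execute('INSERT INTO addresse VALUES(NULL, ?)',
                               (email,))
                return 1
        return 0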