Capturing STDOUT to then reconcile on the file names - python

I've been struggling with this problem for a bit, I am trying to create a program that will create a datetime object based on the current date and time, create a second such object from our file data, find the difference between the two, and if it is greater than 10 minutes search for a "handshake file", which is a file we receive back when our file has successfully loaded. If we don't find that file, I want to kick out an error email.
My problem lies in being able to capture the result of my ls command in a meaningful way where I would be able to parse through it and see if the correct file exists. Here is my code:
"""
This module will check the handshake files sent by Pivot based on the following conventions:
- First handshake file (loaded to the CFL, *auditv2*): Check every half-hour
- Second handshake file (proofs are loaded and available, *handshake*): Check every 2 hours
"""
import smtplib
from email.mime.text import MIMEText
from datetime import datetime, timedelta
from csv import DictReader
from subprocess import *
from os import chdir
from glob import glob
def main():
audit_in = '/prod/bcs/lgnp/clientapp/csvbill/audit_process/lgnp.smr.csv0000.audit.qty'
with open(audit_in, 'rbU') as audit_qty:
my_audit_reader = DictReader(audit_qty, delimiter=';', restkey='ignored')
my_audit_reader.fieldnames = ("Property Code",
"Pivot ID",
"Inwork File",
"Billing Manager E-mail",
"Total Records",
"Number of E-Bills",
"Printed Records",
"File Date",
"Hour",
"Minute",
"Status")
# Get current time to reconcile against
now = datetime.now()
# Change internal directory to location of handshakes
chdir('/prod/bcs/lgnp/input')
for line in my_audit_reader:
piv_id = line['Pivot ID']
status = line['Status']
file_date = datetime(int(line['File Date'][:4]),
int(line['File Date'][4:6]),
int(line['File Date'][6:8]),
int(line['Hour']),
int(line['Minute']))
# print(file_date)
if status == 's':
diff = now - file_date
print diff
print piv_id
if 10 < (diff.seconds / 60) < 30:
proc = Popen('ls -lh *{0}*'.format(status),
shell=True) # figure out how to get output
print proc
def send_email(recipient_list):
msg = MIMEText('Insert message here')
msg['Subject'] = 'Alert!! Handshake files missing!'
msg['From'] = r'xxx#xxx.com'
msg['To'] = recipient_list
s = smtplib.SMTP(r'xxx.xxx.xxx')
s.sendmail(msg['From'], msg['To'], msg.as_string())
s.quit()
if __name__ == '__main__':
main()

To parse ls output is not the best solution here. You can surely do that parsing subprocess.check_output result or in any other way, but let me give you an advice.
It is a good criterion of something going wrong if you find yourself parsing someone's output or logs to solve a standard problem, please consider other solutions, like offered below:
If the only thing you want is to see the contents of the directory use os.listdir like:
my_home_files = os.listdir(os.path.expanduser('~/my_dir')) # surely it's cross-platform
now you have a list of files in your my_home_files variable.
You can filter them in the way you want or use glob.glob to use metacharacters like that:
glob.glob("/home/me/handshake-*.txt") # will output everything matching the expression
# (say you have ids in your filenames).
After that you may want to check some stats of the file (like the date of last access etc.)
consider using os.stat:
os.stat(my_home_files[0]) # outputs stats of the first
# posix.stat_result(st_mode=33104, st_ino=140378115, st_dev=3306L, st_nlink=1, st_uid=23449, st_gid=59216, st_size=1442, st_atime=1421834474, st_mtime=1441831745, st_ctime=1441234474)
# see os.stat linked above to understand how to parse it

Related

Changing output of speedtest.py and speedtest-cli to include IP address in output .csv file

I added a line in the python code “speedtest.py” that I found at pimylifeup.com. I hoped it would allow me to track the internet provider and IP address along with all the other speed information his code provides. But when I execute it, the code only grabs the next word after the find all call. I would also like it to return the IP address that appears after the provider. I have attached the code below. Can you help me modify it to return what I am looking for.
Here is an example what is returned by speedtest-cli
$ speedtest-cli
Retrieving speedtest.net configuration...
Testing from Biglobe (111.111.111.111)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by GLBB Japan (Naha) [51.24 km]: 118.566 ms
Testing download speed................................................................................
Download: 4.00 Mbit/s
Testing upload speed......................................................................................................
Upload: 13.19 Mbit/s
$
And this is an example of what it is being returned by speediest.py to my .csv file
Date,Time,Ping,Download (Mbit/s),Upload(Mbit/s),myip
05/30/20,12:47,76.391,12.28,19.43,Biglobe
This is what I want it to return.
Date,Time,Ping,Download (Mbit/s),Upload (Mbit/s),myip
05/30/20,12:31,75.158,14.29,19.54,Biglobe 111.111.111.111
Or may be,
05/30/20,12:31,75.158,14.29,19.54,Biglobe,111.111.111.111
Here is the code that I am using. And thank you for any help you can provide.
import os
import re
import subprocess
import time
response = subprocess.Popen(‘/usr/local/bin/speedtest-cli’, shell=True, stdout=subprocess.PIPE).stdout.read().decode(‘utf-8’)
ping = re.findall(‘km]:\s(.*?)\s’, response, re.MULTILINE)
download = re.findall(‘Download:\s(.*?)\s’, response, re.MULTILINE)
upload = re.findall(‘Upload:\s(.*?)\s’, response, re.MULTILINE)
myip = re.findall(‘from\s(.*?)\s’, response, re.MULTILINE)
ping = ping[0].replace(‘,’, ‘.’)
download = download[0].replace(‘,’, ‘.’)
upload = upload[0].replace(‘,’, ‘.’)
myip = myip[0]
try:
f = open(‘/home/pi/speedtest/speedtestz.csv’, ‘a+’)
if os.stat(‘/home/pi/speedtest/speedtestz.csv’).st_size == 0:
f.write(‘Date,Time,Ping,Download (Mbit/s),Upload (Mbit/s),myip\r\n’)
except:
pass
f.write(‘{},{},{},{},{},{}\r\n’.format(time.strftime(‘%m/%d/%y’), time.strftime(‘%H:%M’), ping, download, upload, myip))
Let me know if this works for you, it should do everything you're looking for
#!/usr/local/env python
import os
import csv
import time
import subprocess
from decimal import *
file_path = '/home/pi/speedtest/speedtestz.csv'
def format_speed(bits_string):
""" changes string bit/s to megabits/s and rounds to two decimal places """
return (Decimal(bits_string) / 1000000).quantize(Decimal('.01'), rounding=ROUND_UP)
def write_csv(row):
""" writes a header row if one does not exist and test result row """
# straight from csv man page
# see: https://docs.python.org/3/library/csv.html
with open(file_path, 'a+', newline='') as csvfile:
writer = csv.writer(csvfile, delimiter=',', quotechar='"')
if os.stat(file_path).st_size == 0:
writer.writerow(['Date','Time','Ping','Download (Mbit/s)','Upload (Mbit/s)','myip'])
writer.writerow(row)
response = subprocess.run(['/usr/local/bin/speedtest-cli', '--csv'], capture_output=True, encoding='utf-8')
# if speedtest-cli exited with no errors / ran successfully
if response.returncode == 0:
# from the csv man page
# "And while the module doesn’t directly support parsing strings, it can easily be done"
# this will remove quotes and spaces vs doing a string split on ','
# csv.reader returns an iterator, so we turn that into a list
cols = list(csv.reader([response.stdout]))[0]
# turns 13.45 ping to 13
ping = Decimal(cols[5]).quantize(Decimal('1.'))
# speedtest-cli --csv returns speed in bits/s, convert to bytes
download = format_speed(cols[6])
upload = format_speed(cols[7])
ip = cols[9]
date = time.strftime('%m/%d/%y')
time = time.strftime('%H:%M')
write_csv([date,time,ping,download,upload,ip])
else:
print('speedtest-cli returned error: %s' % response.stderr)
$/usr/local/bin/speedtest-cli --csv-header > speedtestz.csv
$/usr/local/bin/speedtest-cli --csv >> speedtestz.csv
output:
Server ID,Sponsor,Server Name,Timestamp,Distance,Ping,Download,Upload,Share,IP Address
Does that not get you what you're looking for? Run the first command once to create the csv with header row. Then subsequent runs are done with the append '>>` operator, and that'll add a test result row each time you run it
Doing all of those regexs will bite you if they or a library that they depend on decides to change their debugging output format
Plenty of ways to do it though. Hope this helps

pyzmail: use a loop variable to get a single raw message

First, I should say, I don't really know much about computer programming, but I find Python fairly easy to use for automating simple tasks, thanks to Al Sweigart's book, "Automate the boring stuff."
I want to collect email text bodies. I'm trying to move homework to email to save paper. I thought I could do that by getting the numbers of the unseen mails and just looping through that. If I try that, the IDLE3 shell just becomes unresponsive, ctrl c does nothing, I have to restart the shell.
Question: Why can't I just use a loop variable in server.fetch()??
for msgNum in unseenMessages:
rawMessage = server.fetch([msgNum], ['BODY[]', 'FLAGS'])
It seems you need an actual number like 57, not msgNum, in there, or it won't work.
After looking at various questions and answers here on SO, the following works for me. I suppose it collects all the email bodies in one swoop.
import pyzmail
import pprint
from imapclient import IMAPClient
server = IMAPClient('imap.qq.com', use_uid=True, ssl=True)
server.login('myEmail#foxmail.com', 'myIMAPpassword')
select_info = server.select_folder('Inbox')
unseenMessages = server.search(['UNSEEN'])
rawMessage = server.fetch(unseenMessages, ['BODY[]', 'FLAGS'])
for msgNum in unseenMessages:
message = pyzmail.PyzMessage.factory(rawMessage[msgNum][b'BODY[]'])
text = message.text_part.get_payload().decode(message.text_part.charset)
print('Text' + str(msgNum) + ' = ')
print(text)
I've found this gist with nice and clean code and a page with many helping examples
The main difference between API of imaplib and pyzmail is pyzmail is all-in-one package with parsing and all client-server communication. But these packages are splitted into different packages in standard library. Basically, they both provided almost the same functionality and with the same methods.
As an additional important note here, pyzmail looks quite abandoned.
To save a useful code from that gist, I copy it here as it is with very small modifications like extracting main function (note, it's for Python 3):
#!/usr/bin/env python
#
# Very basic example of using Python 3 and IMAP to iterate over emails in a
# gmail folder/label. This code is released into the public domain.
#
# This script is example code from this blog post:
# http://www.voidynullness.net/blog/2013/07/25/gmail-email-with-python-via-imap/
#
# This is an updated version of the original -- modified to work with Python 3.4.
#
import sys
import imaplib
import getpass
import email
import email.header
import datetime
EMAIL_ACCOUNT = "notatallawhistleblowerIswear#gmail.com"
# Use 'INBOX' to read inbox. Note that whatever folder is specified,
# after successfully running this script all emails in that folder
# will be marked as read.
EMAIL_FOLDER = "Top Secret/PRISM Documents"
def process_mailbox(M):
"""
Do something with emails messages in the folder.
For the sake of this example, print some headers.
"""
rv, data = M.search(None, "ALL")
if rv != 'OK':
print("No messages found!")
return
for num in data[0].split():
rv, data = M.fetch(num, '(RFC822)')
if rv != 'OK':
print("ERROR getting message", num)
return
msg = email.message_from_bytes(data[0][1])
hdr = email.header.make_header(email.header.decode_header(msg['Subject']))
subject = str(hdr)
print('Message %s: %s' % (num, subject))
print('Raw Date:', msg['Date'])
# Now convert to local date-time
date_tuple = email.utils.parsedate_tz(msg['Date'])
if date_tuple:
local_date = datetime.datetime.fromtimestamp(
email.utils.mktime_tz(date_tuple))
print ("Local Date:", \
local_date.strftime("%a, %d %b %Y %H:%M:%S"))
def main(host, login, folder):
with imaplib.IMAP4_SSL(host) as M:
rv, data = M.login(login, getpass.getpass())
print(rv, data)
rv, mailboxes = M.list()
if rv == 'OK':
print("Mailboxes:")
print(mailboxes)
rv, data = M.select(folder)
if rv == 'OK':
print("Processing mailbox...\n")
process_mailbox(M)
else:
print("ERROR: Unable to open mailbox ", rv)
if __name__ == '__main__':
try:
main('imap.gmail.com', EMAIL_ACCOUNT, EMAIL_FOLDER)
except imaplib.IMAP4.error as e:
print('Error while processing mailbox:', e)
sys.exit(1)

How do I run python '__main__' program file from bash prompt in Windows10?

I am trying to run a python3 program file and am getting some unexpected behaviors.
I'll start off first with my PATH and env setup configuration. When I run:
which Python
I get:
/c/Program Files/Python36/python
From there, I cd into the directory where my python program is located to prepare to run the program.
Roughly speaking this is how my python program is set up:
import modulesNeeded
print('1st debug statement to show program execution')
# variables declared as needed
def aFunctionNeeded():
print('2nd debug statement to show fxn exe, never prints')
... function logic...
if __name__ == '__main__':
aFunctionNeeded() # Never gets called
Here is a link to the repository with the code I am working with in case you would like more details as to the implementation. Keep in mind that API keys are not published, but API keys are in local file correctly:
https://github.com/lopezdp/API.Mashups
My question revolves around why my 1st debug statements inside the files are printing to the terminal, but not the 2nd debug statements inside the functions?
This is happening in both of the findRestaurant.py file and the geocode.py file.
I know I have written my if __name__ == '__main__': program entry point correctly as this is the same exact way I have done it for other programs, but in this case I may be missing something that I am not noticing.
If this is my output when I run my program in my bash terminal:
$ python findRestaurant.py
inside geo
inside find
then, why does it appear that my aFunctionNeeded() method shown in my pseudo code is not being called from the main?
Why do both programs seem to fail immediately after the first debug statements are printed to the terminal?
findRestaurant.py File that can also be found in link above
from geocode import getGeocodeLocation
import json
import httplib2
import sys
import codecs
print('inside find')
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)
foursquare_client_id = "..."
foursquare_client_secret = "..."
def findARestaurant(mealType,location):
print('inside findFxn')
#1. Use getGeocodeLocation to get the latitude and longitude coordinates of the location string.
latitude, longitude = getGeocodeLocation(location)
#2. Use foursquare API to find a nearby restaurant with the latitude, longitude, and mealType strings.
#HINT: format for url will be something like https://api.foursquare.com/v2/venues/search?client_id=CLIENT_ID&client_secret=CLIENT_SECRET&v=20130815&ll=40.7,-74&query=sushi
url = ('https://api.foursquare.com/v2/venues/search?client_id=%s&client_secret=%s&v=20130815&ll=%s,%s&query=%s' % (foursquare_client_id, foursquare_client_secret,latitude,longitude,mealType))
h = httplib2.Http()
result = json.loads(h.request(url,'GET')[1])
if result['response']['venues']:
#3. Grab the first restaurant
restaurant = result['response']['venues'][0]
venue_id = restaurant['id']
restaurant_name = restaurant['name']
restaurant_address = restaurant['location']['formattedAddress']
address = ""
for i in restaurant_address:
address += i + " "
restaurant_address = address
#4. Get a 300x300 picture of the restaurant using the venue_id (you can change this by altering the 300x300 value in the URL or replacing it with 'orginal' to get the original picture
url = ('https://api.foursquare.com/v2/venues/%s/photos?client_id=%s&v=20150603&client_secret=%s' % ((venue_id,foursquare_client_id,foursquare_client_secret)))
result = json.loads(h.request(url, 'GET')[1])
#5. Grab the first image
if result['response']['photos']['items']:
firstpic = result['response']['photos']['items'][0]
prefix = firstpic['prefix']
suffix = firstpic['suffix']
imageURL = prefix + "300x300" + suffix
else:
#6. if no image available, insert default image url
imageURL = "http://pixabay.com/get/8926af5eb597ca51ca4c/1433440765/cheeseburger-34314_1280.png?direct"
#7. return a dictionary containing the restaurant name, address, and image url
restaurantInfo = {'name':restaurant_name, 'address':restaurant_address, 'image':imageURL}
print ("Restaurant Name: %s" % restaurantInfo['name'])
print ("Restaurant Address: %s" % restaurantInfo['address'])
print ("Image: %s \n" % restaurantInfo['image'])
return restaurantInfo
else:
print ("No Restaurants Found for %s" % location)
return "No Restaurants Found"
if __name__ == '__main__':
findARestaurant("Pizza", "Tokyo, Japan")
geocode.py File that can also be found in link above
import httplib2
import json
print('inside geo')
def getGeocodeLocation(inputString):
print('inside of geoFxn')
# Use Google Maps to convert a location into Latitute/Longitute coordinates
# FORMAT: https://maps.googleapis.com/maps/api/geocode/json?address=1600+Amphitheatre+Parkway,+Mountain+View,+CA&key=API_KEY
google_api_key = "..."
locationString = inputString.replace(" ", "+")
url = ('https://maps.googleapis.com/maps/api/geocode/json?address=%s&key=%s' % (locationString, google_api_key))
h = httplib2.Http()
result = json.loads(h.request(url,'GET')[1])
latitude = result['results'][0]['geometry']['location']['lat']
longitude = result['results'][0]['geometry']['location']['lng']
return (latitude,longitude)
The reason you're not seeing the output of the later parts of your code is that you've rebound the standard output and error streams with these lines:
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)
I'm not exactly sure why those lines are breaking things for you, perhaps your console does not expect utf8 encoded output... But because they don't work as intended, you're not seeing anything from the rest of your code, including error messages, since you rebound the stderr stream along with the stdout stream.

Getting file input into Python script for praw script

So I have a simple reddit bot set up which I wrote using the praw framework. The code is as follows:
import praw
import time
import numpy
import pickle
r = praw.Reddit(user_agent = "Gets the Daily General Thread from subreddit.")
print("Logging in...")
r.login()
words_to_match = ['sdfghm']
cache = []
def run_bot():
print("Grabbing subreddit...")
subreddit = r.get_subreddit("test")
print("Grabbing thread titles...")
threads = subreddit.get_hot(limit=10)
for submission in threads:
thread_title = submission.title.lower()
isMatch = any(string in thread_title for string in words_to_match)
if submission.id not in cache and isMatch:
print("Match found! Thread ID is " + submission.id)
r.send_message('FlameDraBot', 'DGT has been posted!', 'You are awesome!')
print("Message sent!")
cache.append(submission.id)
print("Comment loop finished. Restarting...")
# Run the script
while True:
run_bot()
time.sleep(20)
I want to create a file (text file or xml, or something else) using which the user can change the fields for the various information being queried. For example I want a file with lines such as :
Words to Search for = sdfghm
Subreddit to Search in = text
Send message to = FlameDraBot
I want the info to be input from fields, so that it takes the value after Words to Search for = instead of the whole line. After the information has been input into the file and it has been saved. I want my script to pull the information from the file, store it in a variable, and use that variable in the appropriate functions, such as:
words_to_match = ['sdfghm']
subreddit = r.get_subreddit("test")
r.send_message('FlameDraBot'....
So basically like a config file for the script. How do I go about making it so that my script can take input from a .txt or another appropriate file and implement it into my code?
Yes, that's just a plain old Python config, which you can implement in an ASCII file, or else YAML or JSON.
Create a subdirectory ./config, put your settings in ./config/__init__.py
Then import config.
Using PEP-18 compliant names, the file ./config/__init__.py would look like:
search_string = ['sdfghm']
subreddit_to_search = 'text'
notify = ['FlameDraBot']
If you want more complicated config, just read the many other posts on that.

expected string or buffer when processing text file with python

I'm processing a large (120mb) text file from my thunderbird imap directory and attempting to extract to/from info from the headers using mbox and regex. the process runs for a while until I eventually get an exception: "TypeError: expected string or buffer".
The exception references the fifth line of this code:
PAT_EMAIL = re.compile(r"[0-9A-Za-z._-]+\#[0-9A-Za-z._-]+")
temp_list = []
mymbox = mbox("data.txt")
for email in mymbox.values():
from_address = PAT_EMAIL.findall(email["from"])
to_address = PAT_EMAIL.findall(email["to"])
for item in from_address:
temp_list.append(item) #items are added to a temporary list where they are sorted then written to file
I've run the code on other (smaller) files, so I'm guessing the issue is my file. The file appears to be just a bunch of text. Can someone point me in the write direction for debugging this?
There can only be one from address (I think!):
In the following:
from_address = PAT_EMAIL.findall(email["from"])
I have a feeling you're trying to duplicate the work of email.message_from_file and email.utils.parseaddr
from email.utils import parseaddr
>>> s = "Jon Clements <jon#example.com>"
>>> from email.utils import parseaddr
>>> parseaddr(s)
('Jon Clements', 'jon#example.com')
So you can use parseaddr(email['from'])[1] to get the email address and use that.
Similarly, you may wish to look at email.utils.getaddresses to handle to and cc addresses...
Well, I didn't solve the issue but have worked around it for my own purposes. I inserted a try statement so that the iteration just continues past any TypeError. For every thousand email addresses I'm getting about 8 failures, which will suffice. Thanks for your input!
PAT_EMAIL = re.compile(r"[0-9A-Za-z._-]+\#[0-9A-Za-z._-]+")
temp_list = []
mymbox = mbox("data.txt")
for email in mymbox.values():
try:
from_address = PAT_EMAIL.findall(email["from"])
except(TypeError):
print "TypeError!"
try:
to_address = PAT_EMAIL.findall(email["to"])
except(TypeError):
print "TypeError!"
for item in from_address:
temp_list.append(item) #items are added to a temporary list where they are sorted then written to file

Categories

Resources