Python - Loop through log files with some logic

Good Morning,
I have taken on a small project, solely for the purposes of learning python.
So far the script will ssh to another server, to obtain a list of nodes which are down. I want to change this though, to have it store a list of down nodes in a tmp file, and the next day compare one to the other and only work with nodes which are down that weren't down yesterday. But that part can wait...
The issue at the moment is searching for various strings in a number of log files; if the line count for a particular node exceeds a certain number, then rather than the lines being sent to the terminal, a message should be printed instead saying "too many log entries; entries save to /tmp/..
Here's what I have so far, but it doesn't really do what I want.
Also, if you have any other advice for my script, I would be infinitely grateful. I am learning, but it's sinking in slowly! :)
#!/usr/bin/python
#
from subprocess import *
import sys
from glob import glob
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-f', metavar='logname', help='logfile to check')
args = parser.parse_args()

ssh = Popen(["ssh", "root@srv1", 'check_nodes'],
            shell=False,
            stdout=PIPE,
            stderr=PIPE)
result = ssh.stdout.readlines()

down_nodes = []
status_list = ["down", "admindown"]

if result == []:
    error = ssh.stderr.readlines()
    print >>sys.stderr, "ERROR: %s" % error
else:
    for line in result[1:]:
        columns = line.split()
        status = columns[3]
        nodes = columns[2]
        if status in status_list:
            down_nodes.append(nodes)

if args.f:
    logs = args.f
else:
    try:
        logs = glob("/var/log/*")
    except IndexError:
        print "Something is wrong. Logs not available."
        sys.exit(1)

valid_errors = ["error", "MCE", "CATERR"]

for log in logs:
    with open(log, "r") as tmp_log:
        open_log = tmp_log.readlines()
        for line in open_log:
            for down_nodes in open_log:
                if valid_errors in open_log:
                    print valid_errors
What I have so far sort of works in testing, but it just finds the errors in valid_errors and doesn't find lines that have both a down_node and a valid_error at the same time. Eventually I'd also like to filter by date: something like lines in a log that contain a down_node, a valid_error, and a date string less than 3 days old.
As of Friday I hadn't used Python for anything! I've worked only with shell scripts and always found that a bash script is perfect for what I need. So I am a beginner... :)
Thanks
Jon

I've broken down my specific question in case it makes it clearer. Essentially, at the moment I'm just trying to find any line in a bunch of log files that contains any of the down_nodes AND any of the valid_errors. My code:
for node in down_nodes:
    for log in logs:
        with open(log) as log1:
            open_log = log1.readlines()
            for line in open_log:
                if node + valid_errors in line:
                    print line
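
Not part of the original post, but here is a minimal sketch of the check being described, assuming down_nodes, valid_errors and logs are built as in the script above. It treats a line as a match only when it mentions any down node and any of the error strings:

for log in logs:
    with open(log) as fh:
        for line in fh:
            # a line matches only if it contains some down node AND some error string
            if any(node in line for node in down_nodes) and \
               any(err in line for err in valid_errors):
                print(line.rstrip())

The expression node + valid_errors in line fails because it tries to concatenate a string with a list; checking the two collections separately with any() avoids that.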

Related

Record Error from CMD in output with Python

I am trying to run the code below. It runs the commands in a file line by line and writes the results from the cmd into a new file. The commands look something like 'ping (host name)', with many hosts and a line for each host.
Some hosts fail in the cmd, as in it cannot get a response. Usually when that happens the code breaks, which is why I have the try and except below, but I am struggling to make the except section record the failed items in the same document (if possible).
So, for example, if ping (host name3) failed, I want it to record that message and store it in the file.
If you have a better way of doing all of this please let me know!
command_path = pathlib.Path(r"path to the file with commands")
command_file = command_path.joinpath('command file.txt')
commands = command_file.read_text().splitlines()
#print(commands)
try:
    for command in commands:
        #Args = command.split()
        #print(f"/Running: {Args[0]}")
        outputfile = subprocess.check_output(command)
        print(outputfile.decode("utf-8"))
        results_path = command_path.joinpath(f"Passed_Results.txt")
        results = open(results_path, "a")
        results.write('\n' + outputfile.decode("utf-8"))
        results.close()
except:
    #this is where I need help.
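
Not part of the original question, but one way to record the failures is to move the try/except inside the loop and catch subprocess.CalledProcessError, so one failing command doesn't stop the rest. Failed_Results.txt and the paths below are placeholders:

import pathlib
import subprocess

command_path = pathlib.Path(r"path to the file with commands")   # placeholder path, as in the question
commands = command_path.joinpath('command file.txt').read_text().splitlines()
failed_path = command_path.joinpath("Failed_Results.txt")         # hypothetical output file for failures

for command in commands:
    try:
        # stderr=subprocess.STDOUT folds error text into the captured output
        output = subprocess.check_output(command, stderr=subprocess.STDOUT)
        print(output.decode("utf-8"))
    except subprocess.CalledProcessError as exc:
        # the command exited with a non-zero status; record it instead of crashing
        with open(failed_path, "a", encoding="utf-8") as failed:
            failed.write(f"\n# failed: {command}\n{exc.output.decode('utf-8', errors='replace')}")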
I got a response on a different question that I was able to adapt into this. I essentially broke my entire code down and rewrote it as follows. This worked for me; however, if you are able to provide insight on a faster processing time for this, please let me know.
cmds_file = pathlib.Path(r"C:\Users\path to file here").joinpath("Newfile.txt")
output_file = pathlib.Path(r"C:\Users\path to file here").joinpath("All_Results.txt")

with open(cmds_file, encoding="utf-8") as commands, open(output_file, "w", encoding="utf-8") as output:
    for command in commands:
        command = shlex.split(command)
        output.write(f"\n# {shlex.join(command)}\n")
        output.flush()
        subprocess.run(command, stdout=output, encoding="utf-8")
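
As a small addition that is not in the original answer: passing stderr=subprocess.STDOUT to subprocess.run should send a failed command's error output into the same results file, which is what the original question was after:

# inside the same loop, assuming `command` and `output` as above
subprocess.run(command, stdout=output, stderr=subprocess.STDOUT, encoding="utf-8")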

python checking file changes without reading the full file

I have a web app (in the backend) where I am using pysondb (https://github.com/pysonDB/pysonDB) to upload some tasks which will be executed by another program (sniffer).
The sniffer program (a completely separate program) now checks the database for any new unfinished uploaded tasks in an infinite loop and executes them and updates the database.
I don't want to read the database repeatedly, instead want to look for any file changes in the database file (db.json), then read the database only. I have looked into watchdog but was looking for something lightweight and modern to suit my needs.
# infinite loop
import pysondb
import time
from datetime import datetime
# calling aligner with os.system
import os
import subprocess
from pathlib import Path

while True:
    # always alive
    time.sleep(2)
    try:
        # process files
        db = pysondb.getDb("../tasks_db.json")
        tasks = db.getBy({"task_status": "uploaded"})
        for task in tasks:
            try:
                task_path = task["task_path"]
                cost = task["cost"]
                corpus_folder = task_path
                get_output = subprocess.Popen(f"mfa validate {corpus_folder} english english", shell=True, stdout=subprocess.PIPE).stdout
                res = get_output.read().decode("utf-8")
                # print(type(res))
                if "ERROR - There was an error in the run, please see the log." in res:
                    # log errors
                    f = open("sniffer_log.error", "a+")
                    f.write(f"{datetime.now()} :: {str(res)}\n")
                    f.close()
                else:
                    align_folder = f"{corpus_folder}_aligned"
                    Path(align_folder).mkdir(parents=True, exist_ok=True)
                    o = subprocess.Popen(f"mfa align {corpus_folder} english english {align_folder}", shell=True, stdout=subprocess.PIPE).stdout.read().decode("utf-8")
                    # success
            except subprocess.CalledProcessError:
                # mfa align ~/mfa_data/my_corpus english english ~/mfa_data/my_corpus_aligned
                # log errors
                f = open("sniffer_log.error", "a+")
                f.write(f"{datetime.now()} :: Files not in right format\n")
                f.close()
    except Exception as e:
        # log errors
        f = open("sniffer_log.error", "a+")
        f.write(f"{datetime.now()} :: {e}\n")
        f.close()
Using python-rq would be a much more efficient way of doing this and wouldn't need a database. It has no requirements other than a Redis install. From there, you could just move all of that into a function:
def task(task_path, cost):
    corpus_folder = task_path
    get_output = subprocess.Popen(f"mfa validate {corpus_folder} english english", shell=True, stdout=subprocess.PIPE).stdout
    res = get_output.read().decode("utf-8")
    # print(type(res))
    if "ERROR - There was an error in the run, please see the log." in res:
        # log errors
        f = open("sniffer_log.error", "a+")
        f.write(f"{datetime.now()} :: {str(res)}\n")
        ... # etc
Obviously you would rename that function and put the try-except statement back, but then you could just call that through RQ:
# ... where you want to call the function
from wherever.you.put.your.task.function import task

result = your_redis_queue.enqueue(task, "whatever", "arguments")
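
For context, and not part of the original answer: a minimal sketch of creating that queue with the usual redis/rq API. The module path and arguments are illustrative, and the jobs are executed by a separate worker started with the `rq worker` command:

from redis import Redis
from rq import Queue

from wherever.you.put.your.task.function import task  # hypothetical module path from the answer

# connect to a local Redis instance and create the queue
your_redis_queue = Queue(connection=Redis())

# enqueue the job; a separate `rq worker` process picks it up and runs task(...)
result = your_redis_queue.enqueue(task, "/path/to/corpus", 0)  # illustrative arguments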

Get local DNS settings in Python

Is there any elegant and cross platform (Python) way to get the local DNS settings?
It could probably work with a complex combination of modules such as platform and subprocess, but maybe there is already a good module, such as netifaces, which can retrieve it at a low level and save some "reinventing the wheel" effort.
Less ideally, one could probably query something like dig, but I find that "noisy", because it would run an extra request instead of just retrieving something which already exists locally.
Any ideas?
Using subprocess you could do something like this on a macOS or Linux system:
import subprocess

process = subprocess.Popen(['cat', '/etc/resolv.conf'],
                           stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE)
stdout, stderr = process.communicate()
print(stdout, stderr)
or do something like this
import subprocess

with open('dns.txt', 'w') as f:
    process = subprocess.Popen(['cat', '/etc/resolv.conf'], stdout=f)
The first output will go to stdout and the second to a file
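As an aside that is not part of the original answer: on Unix-like systems the same information can be read without spawning cat at all, which avoids the extra process:

# read /etc/resolv.conf directly instead of shelling out to cat
with open('/etc/resolv.conf') as f:
    print(f.read())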
Maybe this one will solve your problem
import subprocess

def get_local_dns(cmd_):
    with open('dns1.txt', 'w+') as f:
        with open('dns_log1.txt', 'w+') as flog:
            try:
                process = subprocess.Popen(cmd_, stdout=f, stderr=flog)
            except FileNotFoundError as e:
                flog.write(f"Error while executing this command {str(e)}")

linux_cmd = ['cat', '/etc/resolv.conf']
windows_cmd = ['windows_command', 'parameters']
commands = [linux_cmd, windows_cmd]

if __name__ == "__main__":
    for cmd in commands:
        get_local_dns(cmd)
Thanks @MasterOfTheHouse.
I ended up writing my own function. It's not so elegant, but it does the job for now. There's plenty of room for improvement, but well...
import os
import subprocess

def get_dns_settings() -> dict:
    # Initialize the output variables
    dns_ns, dns_search = [], ''
    # For Unix based OSs
    if os.path.isfile('/etc/resolv.conf'):
        for line in open('/etc/resolv.conf', 'r'):
            if line.strip().startswith('nameserver'):
                nameserver = line.split()[1].strip()
                dns_ns.append(nameserver)
            elif line.strip().startswith('search'):
                search = line.split()[1].strip()
                dns_search = search
    # If it is not a Unix based OS, try "the Windows way"
    elif os.name == 'nt':
        cmd = 'ipconfig /all'
        raw_ipconfig = subprocess.check_output(cmd)
        # Convert the bytes into a string
        ipconfig_str = raw_ipconfig.decode('cp850')
        # Convert the string into a list of lines
        ipconfig_lines = ipconfig_str.split('\n')
        for n in range(len(ipconfig_lines)):
            line = ipconfig_lines[n]
            # Parse nameserver in current line and next ones
            if line.strip().startswith('DNS-Server'):
                nameserver = ':'.join(line.split(':')[1:]).strip()
                dns_ns.append(nameserver)
                next_line = ipconfig_lines[n+1]
                # If there's too much blank at the beginning, assume we have
                # another nameserver on the next line
                if len(next_line) - len(next_line.strip()) > 10:
                    dns_ns.append(next_line.strip())
                    next_next_line = ipconfig_lines[n+2]
                    if len(next_next_line) - len(next_next_line.strip()) > 10:
                        dns_ns.append(next_next_line.strip())
            elif line.strip().startswith('DNS-Suffix'):
                dns_search = line.split(':')[1].strip()
    return {'nameservers': dns_ns, 'search': dns_search}

print(get_dns_settings())
By the way... how did you manage to write two answers with the same account?

Python Command Line Arguments Try/Except

I want to create a program that will take two command line arguments. The first being the name of a file to open for parsing and the second the flag -s. If the user provides the wrong number of arguments or the other argument is not -s then it will print the message "Usage: [-s] file_name" and terminate the program using exit.
Next, I want my program to attempt to open the file for reading. The program should open the file, read each line, and return a count of every float, every integer, and every other kind of string that is not an int or a float. However, if opening the file fails it should raise an exception, print "Unable to open [filename]", and quit using exit.
I've been looking up lots of stuff on the internet about command lines in Python but I've ended up more confused. So here's my attempt at it so far from what I've researched.
from optparse import OptionParser

def command_line():
    parser = OptionParser()
    parser.add_option("--file", "-s")
    options, args = parser.parse_args()
    if options.a and obtions.b:
        parser.error("Usage: [-s] file_name")
        exit

def read_file():
    #Try:
        #Open input file
    #Except:
        #print "Unable to open [filename]"
        #Exit
from optparse import OptionParser
import sys, os

def command_line():
    parser = OptionParser("%prog [-s] file_name")
    parser.add_option("-s", dest="filename",
                      metavar="file_name", help="my help message")
    options, args = parser.parse_args()
    if not options.filename:
        parser.print_help()
        sys.exit()
    return options.filename

def read_file(fn):
    if os.path.isfile(fn):
        typecount = {}
        with open(fn) as f:
            for line in f:
                for i in line.split():
                    try:
                        t = type(eval(i))
                    except NameError:
                        t = type(i)
                    if t in typecount:
                        typecount[t] += 1
                    else:
                        typecount[t] = 1
    else:
        print("Unable to open {}".format(fn))
        sys.exit()
    print(typecount)

read_file(command_line())
So step by step:
options.a is not defined unless you define an option --a or (preferably) set dest="a".
Using the built-in parser.print_help() is better than making your own; you get -h/--help for free then.
You never called your function command_line, therefore you never got any errors, as the syntax was correct. I set the command line to pass only the filename as a return value, but there are better ways of doing this when you have more options/arguments.
When it comes to read_file, instead of using try-except for the file I recommend using os.path.isfile, which checks whether the file exists (a try/except variant is sketched after this list). This does not check that the file has the right format, though.
We then create a dictionary of types, then loop over all lines and evaluate objects which are separated by whitespace (space, newline, tab). If your values are separated by e.g. a comma, you need to use line.split(',').
If you want to use the counts later in your script, you might want to return typecount instead of printing it.
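
If you do want the exact behaviour the assignment describes (catching the failure and printing "Unable to open [filename]"), a minimal try/except variant of read_file might look like the sketch below. This is a sketch mirroring the eval-based counting from the answer above, not part of the original answer:

def read_file(fn):
    typecount = {}
    try:
        f = open(fn)
    except IOError:
        # opening failed: report and quit, as the assignment requires
        print("Unable to open {}".format(fn))
        sys.exit()
    with f:
        for line in f:
            for token in line.split():
                try:
                    t = type(eval(token))
                except (NameError, SyntaxError):
                    # eval of free text is fragile; fall back to treating it as a string
                    t = type(token)
                typecount[t] = typecount.get(t, 0) + 1
    print(typecount)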

tail multiple logfiles in python

This is probably a bit of a silly exercise for me, but it raises a bunch of interesting questions. I have a directory of logfiles from my chat client, and I want to be notified using notify-osd every time one of them changes.
The script that I wrote basically uses os.popen to run the linux tail command on every one of the files to get the last line, and then check each line against a dictionary of what the lines were the last time it ran. If the line changed, it used pynotify to send me a notification.
This script actually worked perfectly, except for the fact that it used a huge amount of cpu (probably because it was running tail about 16 times every time the loop ran, on files that were mounted over sshfs.)
It seems like something like this would be a great solution, but I don't see how to implement that for more than one file.
Here is the script that I wrote. Pardon my lack of comments and poor style.
Edit: To clarify, this is all linux on a desktop.
Not even looking at your source code, there are two ways you could easily do this more efficiently and handle multiple files.
Don't bother running tail unless you have to. Simply os.stat all of the files and record the last modified time. If the last modified time is different, then raise a notification (a minimal sketch of this approach follows this list).
Use pyinotify to call out to Linux's inotify facility; this will have the kernel do option 1 for you and call back to you when any files in your directory change. Then translate the callback into your osd notification.
Now, there might be some trickiness depending on how many notifications you want when there are multiple messages and whether you care about missing a notification for a message.
An approach that preserves the use of tail would be to instead use tail -f. Open all of the files with tail -f and then use the select module to have the OS tell you when there's additional input on one of the file descriptors open for tail -f. Your main loop would call select and then iterate over each of the readable descriptors to generate notifications. (You could probably do this without using tail and just calling readline() when it's readable.)
Other areas of improvement in your script:
Use os.listdir and native Python filtering (say, using list comprehensions) instead of a popen with a bunch of grep filters.
Update the list of buffers to scan periodically instead of only doing it at program boot.
Use subprocess.Popen instead of os.popen.
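A minimal sketch of the first approach above (polling os.stat and notifying when the modification time changes); the paths are illustrative and the notification call mirrors the pynotify usage shown later on this page:

import os
import time
import pynotify

pynotify.init('logwatcher')
paths = ['/path/to/chatlog1.log', '/path/to/chatlog2.log']  # illustrative paths
last_mtime = dict((p, os.stat(p).st_mtime) for p in paths)

while True:
    for p in paths:
        mtime = os.stat(p).st_mtime
        if mtime != last_mtime[p]:
            # the file changed since the last pass; notify without running tail
            last_mtime[p] = mtime
            pynotify.Notification('%s changed' % p, 'modified at %s' % time.ctime(mtime)).show()
    time.sleep(1)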
If you're already using the pyinotify module, it's easy to do this in pure Python (i.e. no need to spawn a separate process to tail each file).
Here is an example that is event-driven by inotify, and should use very little cpu. When IN_MODIFY occurs for a given path we read all available data from the file handle and output any complete lines found, buffering the incomplete line until more data is available:
import os
import select
import sys

import pynotify
import pyinotify

class Watcher(pyinotify.ProcessEvent):
    def __init__(self, paths):
        self._manager = pyinotify.WatchManager()
        self._notify = pyinotify.Notifier(self._manager, self)
        self._paths = {}
        for path in paths:
            self._manager.add_watch(path, pyinotify.IN_MODIFY)
            fh = open(path, 'rb')
            fh.seek(0, os.SEEK_END)
            self._paths[os.path.realpath(path)] = [fh, '']

    def run(self):
        while True:
            self._notify.process_events()
            if self._notify.check_events():
                self._notify.read_events()

    def process_default(self, evt):
        path = evt.pathname
        fh, buf = self._paths[path]
        data = fh.read()
        lines = data.split('\n')
        # output previous incomplete line.
        if buf:
            lines[0] = buf + lines[0]
        # only output the last line if it was complete.
        if lines[-1]:
            buf = lines[-1]
            lines.pop()
        # display a notification
        notice = pynotify.Notification('%s changed' % path, '\n'.join(lines))
        notice.show()
        # and output to stdout
        for line in lines:
            sys.stdout.write(path + ': ' + line + '\n')
        sys.stdout.flush()
        self._paths[path][1] = buf

pynotify.init('watcher')
paths = sys.argv[1:]
Watcher(paths).run()
Usage:
% python watcher.py [path1 path2 ... pathN]
Simple pure Python solution (not the best, but it doesn't fork, spits out 4 empty lines after an idle period, and marks the source of each chunk every time it changes):
#!/usr/bin/env python
from __future__ import with_statement
'''
Implement multi-file tail
'''

import os
import sys
import time

def print_file_from(filename, pos):
    with open(filename, 'rb') as fh:
        fh.seek(pos)
        while True:
            chunk = fh.read(8192)
            if not chunk:
                break
            sys.stdout.write(chunk)

def _fstat(filename):
    st_results = os.stat(filename)
    return (st_results[6], st_results[8])

def _print_if_needed(filename, last_stats, no_fn, last_fn):
    changed = False
    # Find the size of the file and move to the end
    tup = _fstat(filename)
    # print tup
    if last_stats[filename] != tup:
        changed = True
        if not no_fn and last_fn != filename:
            print '\n<%s>' % filename
        print_file_from(filename, last_stats[filename][0])
        last_stats[filename] = tup
    return changed

def multi_tail(filenames, stdout=sys.stdout, interval=1, idle=10, no_fn=False):
    S = lambda (st_size, st_mtime): (max(0, st_size - 124), st_mtime)
    last_stats = dict((fn, S(_fstat(fn))) for fn in filenames)
    last_fn = None
    last_print = 0
    while 1:
        # print last_stats
        changed = False
        for filename in filenames:
            if _print_if_needed(filename, last_stats, no_fn, last_fn):
                changed = True
                last_fn = filename
        if changed:
            if idle > 0:
                last_print = time.time()
        else:
            if idle > 0 and last_print is not None:
                if time.time() - last_print >= idle:
                    last_print = None
                    print '\n' * 4
        time.sleep(interval)

if '__main__' == __name__:
    from optparse import OptionParser
    op = OptionParser()
    op.add_option('-F', '--no-fn', help="don't print filename when changes",
                  default=False, action='store_true')
    op.add_option('-i', '--idle', help='idle time, in seconds (0 turns off)',
                  type='int', default=10)
    op.add_option('--interval', help='check interval, in seconds', type='int',
                  default=1)
    opts, args = op.parse_args()
    try:
        multi_tail(args, interval=opts.interval, idle=opts.idle,
                   no_fn=opts.no_fn)
    except KeyboardInterrupt:
        pass
