Python: Corrupted output from script

I have the following Python script which I use to ping a list of IPs passed on the command line.
#! /usr/bin/python
import sys, time
from threading import Thread
import subprocess
from Queue import Queue

num_threads = 255
queue = Queue()
p = 0
f = 0
t = 0

def timestamp():
    lt = time.localtime(time.time())
    return "%02d.%02d.%04d %02d:%02d:%02d" % (lt[2], lt[1], lt[0], lt[3], lt[4], lt[5])

def pinger(i, q):
    global p
    global f
    while True:
        ip = q.get()
        ret = subprocess.call("ping -c 2 %s" % ip,
                              shell=True,
                              stdout=open('/dev/null', 'w'),
                              stderr=subprocess.STDOUT)
        if ret == 0:
            print(ip+"\tpassed")
            time.sleep(0.1)
            p += 1
        else:
            print(ip+"\tfailed")
            time.sleep(0.1)
            f += 1
        q.task_done()

for i in range(num_threads):
    worker = Thread(target=pinger, args=(i, queue))
    worker.setDaemon(True)
    worker.start()

print("\nStarted at "+timestamp())
for ip in open(sys.argv[1]).readlines():
    ip = ip.strip()
    queue.put(ip)
    t += 1
queue.join()
print(str(t) + " IPs pinged, " + str(p) + " passed, " + str(f) + " failed\n")
print("\nFinished at "+timestamp())
For the most part the output is perfect like this, tab-separated and ready for Excel:
192.168.188.1 failed
192.168.199.107 passed
192.168.7.2 passed
192.168.199.108 failed
But every time, and at random, I get problems with the output, for example an odd indent on a line:
192.168.164.173 failed
        192.168.164.190 failed
Or an odd indent in one field; here the IP is indented but not the message:
        172.29.19.132 failed
Or concatenation followed by a blank line:
172.29.9.37 passed172.29.19.133 passed

172.29.9.39 passed
And combinations of these, not always with the same IPs.
I've ensured the IP list is clean, and I've tried reducing the thread count, introducing delays before printing, and flushing stdout, all to no avail. Any ideas please?
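A likely cause, as a hedged guess: the print call here writes the message and the trailing newline as two separate operations, so two threads can interleave between them. A minimal sketch of a fix is to funnel all output through a lock:

from threading import Lock

print_lock = Lock()

def report(ip, status):
    # Only one thread at a time may write, so lines can't interleave.
    with print_lock:
        print(ip + "\t" + status)

In pinger(), the two print calls would become report(ip, "passed") and report(ip, "failed"); the same lock could also guard the unsynchronized p += 1 and f += 1 updates.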


Use multiprocessing or threading for this ping script (Python)

I want to use multiprocessing or threading for this code, and I would like to control the number of workers, for example threads = 50: if I choose 50 threads, 50 workers should be opened.
Please help me. Here is the code:
import subprocess
import csv

def ping(hostname):
    p = subprocess.Popen(["ping", "-n", "1", "-w", "1000", hostname], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    pingStatus = 'ok'
    for line in p.stdout:
        output = line.rstrip().decode('UTF-8')
        if (output.endswith('unreachable.')):
            # No route from the local system. Packets sent were never put on the wire.
            pingStatus = 'unreacheable'
            break
        elif (output.startswith('Ping request could not find host')):
            pingStatus = 'host_not_found'
            break
        if (output.startswith('Request timed out.')):
            # No Echo Reply messages were received within the default time of 1 second.
            pingStatus = 'timed_out'
            break
        #end if
    #endFor
    return pingStatus
#endDef

def printPingResult(hostname):
    statusOfPing = ping(hostname)
    if (statusOfPing == 'host_not_found'):
        writeToFile('!server-not-found.txt', hostname)
    elif (statusOfPing == 'unreacheable'):
        writeToFile('!unreachable.txt', hostname)
    elif (statusOfPing == 'timed_out'):
        writeToFile('!timed_out.txt', hostname)
    elif (statusOfPing == 'ok'):
        writeToFile('!ok.txt', hostname)
    #endIf
#endPing

def writeToFile(filename, data):
    with open(filename, 'a') as output:
        output.write(data + '\n')
    #endWith
#endDef

'''
servers.txt example
vm8558
host2
server873
google.com
'''
file = open('hosts.txt')
try:
    reader = csv.reader(file)
    for item in reader:
        printPingResult(item[0].strip())
    #endFor
finally:
    file.close()
#endTry
You may want to use the Pool object from Python's multiprocessing module.
from https://docs.python.org/2/library/multiprocessing.html
Pool object which offers a convenient means of parallelizing the execution of a function across multiple input values, distributing the input data across processes (data parallelism). The following example demonstrates the common practice of defining such functions in a module so that child processes can successfully import that module.
Here is a simple example that demonstrates multiprocessing for ping calls.
Hope this helps.
EDIT1
host.txt
google.com
yahoo.com
microsoft.com
cnn.com
stackoverflow.com
github.com
CODE
from multiprocessing import Pool
import subprocess

def ping(hostname):
    return hostname, subprocess.call(['ping', '-c', '3', '-w', '1000', hostname])

if __name__ == '__main__':
    HOSTFILE = 'server.txt'
    POOLCOUNT = 5

    # read host name file and load to list
    hostfile = open(HOSTFILE, 'r')
    hosts = [line.strip() for line in hostfile.readlines()]

    # Create pool
    p = Pool(POOLCOUNT)

    # multiprocess and map ping function to host list
    print(p.map(ping, hosts))
Result
Status: 1 = success, 0 = failure
>>>
[('google.com', 1), ('yahoo.com', 1), ('microsoft.com', 1), ('cnn.com', 1), ('stackoverflow.com', 1), ('github.com', 1)]
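If you specifically want threads rather than processes, the same Pool API exists as a thread pool in multiprocessing.dummy; a minimal sketch, assuming the ping() function and hosts list from the code above:

from multiprocessing.dummy import Pool as ThreadPool  # thread-backed Pool, same API

THREADCOUNT = 50  # the "threads = 50" knob from the question

pool = ThreadPool(THREADCOUNT)
print(pool.map(ping, hosts))  # runs ping() across 50 worker threads
pool.close()
pool.join()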

Display Popen.communicate() in real time [duplicate]

I have a python subprocess that I'm trying to read output and error streams from. Currently I have it working, but I'm only able to read from stderr after I've finished reading from stdout. Here's what it looks like:
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout_iterator = iter(process.stdout.readline, b"")
stderr_iterator = iter(process.stderr.readline, b"")

for line in stdout_iterator:
    # Do stuff with line
    print line

for line in stderr_iterator:
    # Do stuff with line
    print line
As you can see, the stderr for loop can't start until the stdout loop completes. How can I modify this to be able to read from both in the correct order the lines come in?
To clarify: I still need to be able to tell whether a line came from stdout or stderr because they will be treated differently in my code.
The code in your question may deadlock if the child process produces enough output on stderr (~100KB on my Linux machine).
There is a communicate() method that allows you to read from both stdout and stderr separately:
from subprocess import Popen, PIPE
process = Popen(command, stdout=PIPE, stderr=PIPE)
output, err = process.communicate()
If you need to read the streams while the child process is still running then the portable solution is to use threads (not tested):
from subprocess import Popen, PIPE
from threading import Thread
from Queue import Queue  # Python 2

def reader(pipe, queue):
    try:
        with pipe:
            for line in iter(pipe.readline, b''):
                queue.put((pipe, line))
    finally:
        queue.put(None)

process = Popen(command, stdout=PIPE, stderr=PIPE, bufsize=1)
q = Queue()
Thread(target=reader, args=[process.stdout, q]).start()
Thread(target=reader, args=[process.stderr, q]).start()
for _ in range(2):
    for source, line in iter(q.get, None):
        print "%s: %s" % (source, line),
See:
Python: read streaming input from subprocess.communicate()
Non-blocking read on a subprocess.PIPE in python
Python subprocess get children's output to file and terminal?
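On Python 3 the same threaded approach needs only minor changes (the Queue module was renamed to queue, and print is a function); a minimal sketch, untested like the original:

from subprocess import Popen, PIPE
from threading import Thread
from queue import Queue  # Python 3 name of the module

def reader(pipe, q):
    # Tag each line with the pipe it came from, then signal EOF with None.
    try:
        with pipe:
            for line in iter(pipe.readline, b''):
                q.put((pipe, line))
    finally:
        q.put(None)

process = Popen(command, stdout=PIPE, stderr=PIPE)
q = Queue()
Thread(target=reader, args=[process.stdout, q]).start()
Thread(target=reader, args=[process.stderr, q]).start()
for _ in range(2):  # wait for both readers to signal EOF
    for source, line in iter(q.get, None):
        print("%s: %s" % (source, line.decode()), end="")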
Here's a solution based on selectors, but one that preserves order, and streams variable-length characters (even single chars).
The trick is to use read1(), instead of read().
import selectors
import subprocess
import sys

p = subprocess.Popen(
    ["python", "random_out.py"], stdout=subprocess.PIPE, stderr=subprocess.PIPE
)

sel = selectors.DefaultSelector()
sel.register(p.stdout, selectors.EVENT_READ)
sel.register(p.stderr, selectors.EVENT_READ)

while True:
    for key, _ in sel.select():
        data = key.fileobj.read1().decode()
        if not data:
            exit()
        if key.fileobj is p.stdout:
            print(data, end="")
        else:
            print(data, end="", file=sys.stderr)
If you want a test program, use this.
import sys
from time import sleep

for i in range(10):
    print(f" x{i} ", file=sys.stderr, end="")
    sleep(0.1)
    print(f" y{i} ", end="")
    sleep(0.1)
The order in which a process writes data to different pipes is lost after write.
There is no way you can tell if stdout has been written before stderr.
You can try to read data simultaneously from multiple file descriptors in a non-blocking way
as soon as data is available, but this would only minimize the probability that the order is incorrect.
This program should demonstrate this:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import select
import subprocess

testapps = {
    'slow': '''
import os
import time
os.write(1, 'aaa')
time.sleep(0.01)
os.write(2, 'bbb')
time.sleep(0.01)
os.write(1, 'ccc')
''',
    'fast': '''
import os
os.write(1, 'aaa')
os.write(2, 'bbb')
os.write(1, 'ccc')
''',
    'fast2': '''
import os
os.write(1, 'aaa')
os.write(2, 'bbbbbbbbbbbbbbb')
os.write(1, 'ccc')
'''
}

def readfds(fds, maxread):
    while True:
        fdsin, _, _ = select.select(fds, [], [])
        for fd in fdsin:
            s = os.read(fd, maxread)
            if len(s) == 0:
                fds.remove(fd)
                continue
            yield fd, s
        if fds == []:
            break

def readfromapp(app, rounds=10, maxread=1024):
    f = open('testapp.py', 'w')
    f.write(testapps[app])
    f.close()

    results = {}
    for i in range(0, rounds):
        p = subprocess.Popen(['python', 'testapp.py'], stdout=subprocess.PIPE
                                                     , stderr=subprocess.PIPE)
        data = ''
        for (fd, s) in readfds([p.stdout.fileno(), p.stderr.fileno()], maxread):
            data = data + s
        results[data] = results[data] + 1 if data in results else 1

    print 'running %i rounds %s with maxread=%i' % (rounds, app, maxread)
    results = sorted(results.items(), key=lambda (k,v): k, reverse=False)
    for data, count in results:
        print '%03i x %s' % (count, data)

print
print "=> if output is produced slowly this should work as wished"
print "   and should return: aaabbbccc"
readfromapp('slow', rounds=100, maxread=1024)

print
print "=> now mostly aaacccbbb is returned, not as it should be"
readfromapp('fast', rounds=100, maxread=1024)

print
print "=> you could try to read data one by one, and return"
print "   e.g. a whole line only when LF is read"
print "   (b's should be finished before c's)"
readfromapp('fast', rounds=100, maxread=1)

print
print "=> but even this won't work ..."
readfromapp('fast2', rounds=100, maxread=1)
and outputs something like this:
=> if output is produced slowly this should work as wished
   and should return: aaabbbccc
running 100 rounds slow with maxread=1024
100 x aaabbbccc

=> now mostly aaacccbbb is returned, not as it should be
running 100 rounds fast with maxread=1024
006 x aaabbbccc
094 x aaacccbbb

=> you could try to read data one by one, and return
   e.g. a whole line only when LF is read
   (b's should be finished before c's)
running 100 rounds fast with maxread=1
003 x aaabbbccc
003 x aababcbcc
094 x abababccc

=> but even this won't work ...
running 100 rounds fast2 with maxread=1
003 x aaabbbbbbbbbbbbbbbccc
001 x aaacbcbcbbbbbbbbbbbbb
008 x aababcbcbcbbbbbbbbbbb
088 x abababcbcbcbbbbbbbbbb
This works for Python 3 (3.6+):
import selectors
import subprocess
import sys

p = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE, universal_newlines=True)

# Read both stdout and stderr simultaneously
sel = selectors.DefaultSelector()
sel.register(p.stdout, selectors.EVENT_READ)
sel.register(p.stderr, selectors.EVENT_READ)
ok = True
while ok:
    for key, val1 in sel.select():
        line = key.fileobj.readline()
        if not line:
            ok = False
            break
        if key.fileobj is p.stdout:
            print(f"STDOUT: {line}", end="")
        else:
            print(f"STDERR: {line}", end="", file=sys.stderr)
from https://docs.python.org/3/library/subprocess.html#using-the-subprocess-module
If you wish to capture and combine both streams into one, use
stdout=PIPE and stderr=STDOUT instead of capture_output.
so the easiest solution would be:
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
stdout_iterator = iter(process.stdout.readline, b"")

for line in stdout_iterator:
    # Do stuff with line
    print line
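On Python 3.7+, the equivalent combined-stream capture can be written with subprocess.run; a minimal sketch, assuming command is a list as above:

import subprocess

# Merge stderr into stdout and capture the combined stream as text.
result = subprocess.run(command, stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT, text=True)
for line in result.stdout.splitlines():
    print(line)  # do stuff with line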
I know this question is very old, but this answer may help others who stumble upon this page in researching a solution for a similar situation, so I'm posting it anyway.
I've built a simple python snippet that will merge any number of pipes into a single one. Of course, as stated above, the order cannot be guaranteed, but this is as close as I think you can get in Python.
It spawns a thread for each of the pipes, reads them line by line and puts them into a Queue (which is FIFO). The main thread loops through the queue, yielding each line.
import threading, queue

def merge_pipes(**named_pipes):
    r'''
    Merges multiple pipes from subprocess.Popen (maybe other sources as well).
    The keyword argument keys will be used in the output to identify the source
    of the line.

    Example:
    p = subprocess.Popen(['some', 'call'],
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    outputs = {'out': log.info, 'err': log.warn}
    for name, line in merge_pipes(out=p.stdout, err=p.stderr):
        outputs[name](line)

    This will output stdout to the info logger, and stderr to the warning logger
    '''

    # Constants. Could also be placed outside of the method. I just put them here
    # so the method is fully self-contained
    PIPE_OPENED = 1
    PIPE_OUTPUT = 2
    PIPE_CLOSED = 3

    # Create a queue where the pipes will be read into
    output = queue.Queue()

    # This method is the run body for the threads that are instantiated below.
    # This could be easily rewritten to be outside of the merge_pipes method,
    # but to make it fully self-contained I put it here
    def pipe_reader(name, pipe):
        r"""
        reads a single pipe into the queue
        """
        output.put((PIPE_OPENED, name,))
        try:
            for line in iter(pipe.readline, ''):
                output.put((PIPE_OUTPUT, name, line.rstrip(),))
        finally:
            output.put((PIPE_CLOSED, name,))

    # Start a reader for each pipe
    for name, pipe in named_pipes.items():
        t = threading.Thread(target=pipe_reader, args=(name, pipe,))
        t.daemon = True
        t.start()

    # Use a counter to determine how many pipes are left open.
    # If all are closed, we can return
    pipe_count = 0

    # Read the queue in order, blocking if there's no data
    for data in iter(output.get, ''):
        code = data[0]
        if code == PIPE_OPENED:
            pipe_count += 1
        elif code == PIPE_CLOSED:
            pipe_count -= 1
        elif code == PIPE_OUTPUT:
            yield data[1:]
        if pipe_count == 0:
            return
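A minimal usage sketch (note, my reading: the pipes must be text-mode, since pipe_reader uses the empty string '' as its readline sentinel):

import subprocess

# ['some', 'call'] is the placeholder command from the docstring above.
p = subprocess.Popen(['some', 'call'],
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     universal_newlines=True)  # text mode: readline() returns '' at EOF
for name, line in merge_pipes(out=p.stdout, err=p.stderr):
    print(name, line)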
This works for me (on Windows):
https://github.com/waszil/subpiper
from subpiper import subpiper

def my_stdout_callback(line: str):
    print(f'STDOUT: {line}')

def my_stderr_callback(line: str):
    print(f'STDERR: {line}')

my_additional_path_list = [r'c:\important_location']

retcode = subpiper(cmd='echo magic',
                   stdout_callback=my_stdout_callback,
                   stderr_callback=my_stderr_callback,
                   add_path_list=my_additional_path_list)

How to add non-blocking stderr capture to threaded Popen calls

I've got a Python 3 script I use to back up and encrypt mysqldump files, and I'm having a particular issue with one database that is 67 GB after encryption and compression.
The mysqldump is exiting with error code 3, so I'd like to catch the actual error message, as this could mean a couple of things.
The odd thing is that the backup file is the right size, so I'm not sure what the error means. It worked once on this database...
The code looks like the below, and I'd really appreciate some help on how to add non-blocking capture of stderr when the return code is anything but 0, for both p1 and p2.
Also, if I'm doing anything glaringly obvious wrong, please do let me know, as I'd like to make sure this is a reliable process. It has been working fine on my databases under 15 GB compressed.
import configparser
import datetime
import os
import queue
import subprocess
import syslog
import threading

def dbbackup():
    while True:
        item = q.get()
        #build up folder structure, daily, weekly, monthy & project
        genfile = config[item]['DBName'] + '-' + dateyymmdd + '-'
        genfile += config[item]['PubKey'] + '.sql.gpg'
        if os.path.isfile(genfile):
            syslog.syslog(item + ' ' + genfile + ' exists, removing')
            os.remove(genfile)
        syslog.syslog(item + ' will be backed up as ' + genfile)
        args = ['mysqldump', '-u', config[item]['UserNm'],
                '-p' + config[item]['Passwd'], '-P', config[item]['Portnu'],
                '-h', config[item]['Server']]
        args.extend(config[item]['MyParm'].split())
        args.append(config[item]['DBName'])
        p1 = subprocess.Popen(args, stdout=subprocess.PIPE)
        p2 = subprocess.Popen(['gpg', '-o', genfile, '-r',
                               config[item]['PubKey'], '-z', '9', '--encrypt'], stdin=p1.stdout)
        p2.wait()
        if p2.returncode == 0:
            syslog.syslog(item + ' encryption successful')
        else:
            syslog.syslog(syslog.LOG_CRIT, item + ' encryption failed '+str(p2.returncode))
        p1.terminate()
        p1.wait()
        if p1.returncode == 0:
            pass  #does some uploads of the file etc..
        else:
            syslog.syslog(syslog.LOG_CRIT, item + ' extract failed '+str(p1.returncode))
        q.task_done()

def main():
    db2backup = []
    for settingtest in config:
        db2backup.append(settingtest)
    if len(db2backup) >= 1:
        syslog.syslog('Backups started')
        for database in db2backup:
            q.put(database)
            syslog.syslog(database + ' added to backup queue')
        q.join()
        syslog.syslog('Backups finished')

q = queue.Queue()
config = configparser.ConfigParser()
config.read('backup.cfg')
backuptype = 'daily'
dateyymmdd = datetime.datetime.now().strftime('%Y%m%d')

for i in range(2):
    t = threading.Thread(target=dbbackup)
    t.daemon = True
    t.start()

if __name__ == '__main__':
    main()
Simplify your code:
avoid unnecessary globals; pass parameters to the corresponding functions instead
avoid reimplementing a thread pool (it hurts readability and it misses convenience features accumulated over the years)
The simplest way to capture stderr is to use stderr=PIPE and .communicate() (a blocking call):
#!/usr/bin/env python3
from configparser import ConfigParser
from datetime import datetime
from multiprocessing.dummy import Pool
from subprocess import Popen, PIPE

def backup_db(item, conf):  # config[item] == conf
    """Run `mysqldump ... | gpg ...` command."""
    genfile = '{conf[DBName]}-{now:%Y%m%d}-{conf[PubKey]}.sql.gpg'.format(
        conf=conf, now=datetime.now())
    # ...
    args = ['mysqldump', '-u', conf['UserNm'], ...]
    with Popen(['gpg', ...], stdin=PIPE) as gpg, \
         Popen(args, stdout=gpg.stdin, stderr=PIPE) as db_dump:
        gpg.communicate()
        error = db_dump.communicate()[1]
    if gpg.returncode or db_dump.returncode:
        pass  # log/handle `error` here

def main():
    config = ConfigParser()
    with open('backup.cfg') as file:  # raise exception if config is unavailable
        config.read_file(file)
    with Pool(2) as pool:
        pool.starmap(backup_db, config.items())

if __name__ == "__main__":
    main()
NOTE: there is no need to call db_dump.terminate() if gpg dies prematurely: mysqldump dies when it tries to write something to the closed gpg.stdin.
If there are a huge number of items in the config then you could use pool.imap() instead of pool.starmap() (the call would need to be modified slightly).
For robustness, wrap the backup_db() function to catch and log all exceptions.
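A minimal sketch of such a wrapper (my naming; logging via syslog as elsewhere in the question):

import syslog

def safe_backup_db(item, conf):
    """Catch and log anything backup_db() raises, so one bad
    database doesn't take down the whole pool."""
    try:
        backup_db(item, conf)
    except Exception as exc:
        syslog.syslog(syslog.LOG_CRIT, '%s backup raised %r' % (item, exc))

Then pool.starmap(safe_backup_db, config.items()) replaces the original pool.starmap() call.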

Processes stop working while queue is not empty

I'm trying to write a script in Python to convert URLs into their corresponding IPs. Since the URL file is huge (nearly 10 GB), I'm trying to use the multiprocessing library.
I create one process to write output to a file and a set of processes to convert URLs.
Here is my code:
import multiprocessing as mp
import socket
import time

num_processes = mp.cpu_count()
sentinel = None

def url2ip(inqueue, output):
    v_url = inqueue.get()
    print 'v_url '+v_url
    try:
        v_ip = socket.gethostbyname(v_url)
        output_string = v_url+'|||'+v_ip+'\n'
    except:
        output_string = v_url+'|||-1'+'\n'
    print 'output_string '+output_string
    output.put(output_string)
    print output.full()

def handle_output(output):
    f_ip = open("outputfile", "a")
    while True:
        output_v = output.get()
        if output_v:
            print 'output_v '+output_v
            f_ip.write(output_v)
        else:
            break
    f_ip.close()

if __name__ == '__main__':
    output = mp.Queue()
    inqueue = mp.Queue()
    jobs = []
    proc = mp.Process(target=handle_output, args=(output, ))
    proc.start()
    print 'run in %d processes' % num_processes
    for i in range(num_processes):
        p = mp.Process(target=url2ip, args=(inqueue, output))
        jobs.append(p)
        p.start()
    for line in open('inputfile','r'):
        print 'ori '+line.strip()
        inqueue.put(line.strip())
    for i in range(num_processes):
        # Send the sentinel to tell the workers to end
        inqueue.put(sentinel)
    for p in jobs:
        p.join()
    output.put(None)
    proc.join()
However, it did not work. It produced several outputs (4 out of the 10 URLs in the test file) but then it just suddenly stops while the queues are not empty (I did check queue.empty()).
Could anyone suggest what's wrong? Thanks.
Your workers exit after processing a single URL each; they need to loop internally until they get the sentinel. However, you should probably just look at multiprocessing.Pool instead, as that does the bookkeeping for you.
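A minimal sketch of that loop, assuming the None sentinel from the question (the rest of the worker stays as posted):

def url2ip(inqueue, output):
    # Keep consuming URLs until the sentinel arrives.
    while True:
        v_url = inqueue.get()
        if v_url is sentinel:  # sentinel = None in the question
            break
        try:
            v_ip = socket.gethostbyname(v_url)
            output.put(v_url + '|||' + v_ip + '\n')
        except socket.error:
            output.put(v_url + '|||-1' + '\n')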

Confused by subprocess.Popen

This problem has me confused.
I just want to run one command on 18 different input files, so I wrote it like this:
while filenames or running:
    while filenames and len(running) < N_CORES:
        filename = filenames.pop(0)
        print 'Submitting process for %s' % filename
        cmd = COMMAND % dict(filename=filename, localdir=localdir)
        p = subprocess.Popen(cmd, shell=True)
        print 'running:', cmd
        running.append((cmd, p))
    i = 0
    while i < len(running):
        (cmd, p) = running[i]
        ret = p.poll()
        if ret is not None:
            rep = open('Crux.report.%d' % (report_number), 'w')
            rep.write('Command: %s' % cmd)
            print localdir
            print 'done!'
            report_number += 1
            running.remove((cmd, p))
        else:
            i += 1
    time.sleep(1)
But when I ran it, after 3 hours all of the processes went into sleep mode.
Yet if I run the command manually from the terminal (for each of the different files), all of them finish OK.
Any help would be appreciated.
I assume you want to run 18 processes (one process per file) with no more than N_CORES processes in parallel.
The simplest way could be to use multiprocessing.Pool here:
import multiprocessing as mp
import subprocess

def process_file(filename):
    try:
        return filename, subprocess.call([cmd, filename], cwd=localdir)
    except OSError:
        return filename, None  # failed to start subprocess

if __name__ == "__main__":
    pool = mp.Pool()  # defaults to cpu_count() workers; pass N_CORES to cap it
    for result in pool.imap_unordered(process_file, filenames):
        pass  # report result here
Without knowing what your subprocesses are supposed to do and how long they are supposed to run, it's hard to give an accurate answer here.
Some problems I see with your program:
you check for i < len(running) while both incrementing i and removing from running.
Either use a counter or check whether the list still contains elements, but don't do both at the same time; this way you will break out of the loop halfway.
you increment i each time a process has not finished; you probably want to increment it when a process has finished.
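For reference, a minimal sketch (my rewrite, not the original) of a polling pass that avoids the index bookkeeping entirely by rebuilding the list of live processes each iteration:

# Replaces the inner "while i < len(running)" loop from the question.
still_running = []
for cmd, p in running:
    if p.poll() is None:
        still_running.append((cmd, p))  # process not finished yet
    else:
        print('done! %s' % cmd)         # write the Crux report here
running = still_running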
