Improve performance when running the same function several times - python

I'm currently writing a Python app that runs the same function several times, but never with the same arguments.
The goal of the app is to send PowerShell commands to a server via the pywinrm module. Here's a simplified code snippet which updates permissions on a Windows folder:
server = input('Please provide a server: ')
path = input('Provide the path of the folder: ')
permissions = ['Read', 'ReadAndExecute', 'List', 'Write', 'Modify']
run_ps_cmd(server, path, permissions[0])
run_ps_cmd(server, path, permissions[1])
run_ps_cmd(server, path, permissions[2])
run_ps_cmd(server, path, permissions[3])
run_ps_cmd(server, path, permissions[4])
For now, it takes 20 seconds to run and I would like to improve the performance but I don't know how. Which direction should I take? Parallelism, threading, multiprocessing, etc...?
Thanks a lot.

To get them to just run in parallel, I would simply add them all to a multiprocessing pool.
from multiprocessing import Pool
from functools import partial
server = input('Please provide a server: ')
path = input('Provide the path of the folder: ')
permissions = ['Read', 'ReadAndExecute', 'List', 'Write', 'Modify']
run_ps = partial(run_ps_cmd, server, path)
# partial is required because Pool.map only takes an iterable
# and passes each item in it as the single argument to the function.
p = Pool(processes=5)
results = p.map(run_ps, permissions)
p.close()
p.join()
In Python 3, Pool also supports being used as a context manager:
with Pool(processes=5) as p:
    results = p.map(run_ps, permissions)
I would caution against running in parallel without actually knowing what run_ps_cmd does, or whether a thread pool would suffice because it is I/O bound (in which case swap Pool for ThreadPool, imported with from multiprocessing.pool import ThreadPool).
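If run_ps_cmd is indeed I/O bound, a minimal sketch of the thread-pool variant might look like this (reusing server, path and permissions from the question):
from multiprocessing.pool import ThreadPool
from functools import partial

run_ps = partial(run_ps_cmd, server, path)
# Threads share memory and avoid the cost of spawning processes,
# which is usually enough when the work is mostly waiting on the network.
with ThreadPool(processes=5) as pool:
    results = pool.map(run_ps, permissions)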
Someone's random article with more information http://chriskiehl.com/article/parallelism-in-one-line/
Hope this helps!

Related

Running into a Multithreading issue connecting to multiple devices at the same time

I am defining the main function with def get_info(). This function doesn't take any arguments. The program uses ArgumentParser to parse command-line arguments; the argument provided is a CSV file passed with the --csv option. It picks up the CSV file from the current directory, reads the lines (each containing an IP address), logs into the devices serially, runs a few commands, returns the output, and appends it to a text file. When the code runs it removes the old text file from the directory and creates a new output text file.
Problem: I want to achieve this using the threading module so that it handles 5 devices in parallel and writes the output to a file. The problem I am running into is lock issues, as the same object is being used by several processes at the same time. Here is the sample code I have written. The threading concept is very new to me, so please bear with me.
import getpass
import csv
import time
import os
import netmiko
import paramiko
from argparse import ArgumentParser
from multiprocessing import Process, Queue

def get_ip(device_ip, output_q):
    try:
        ssh_session = netmiko.ConnectHandler(device_type='cisco_ios', ip=device_row['device_ip'],
                                             username=ssh_username, password=ssh_password)
        time.sleep(2)
        ssh_session.clear_buffer()
    except (netmiko.ssh_exception.NetMikoTimeoutException,
            netmiko.ssh_exception.NetMikoAuthenticationException,
            paramiko.ssh_exception.SSHException) as s_error:
        print(s_error)

def main():
    show_vlanfile = "pool.txt"
    if os.path.isfile(show_vlanfile):
        try:
            os.remove(show_vlanfile)
        except OSError as e:
            print("Error: %s - %s." % (e.filename, e.strerror))
    parser = ArgumentParser(description='Arguments for running oneLiner.py')
    parser.add_argument('-c', '--csv', required=True, action='store', help='Location of CSV file')
    args = parser.parse_args()
    ssh_username = input("SSH username: ")
    ssh_password = getpass.getpass('SSH Password: ')
    with open(args.csv, "r") as file:
        reader = csv.DictReader(file)
        output_q = Queue(maxsize=5)
        procs = []
        for device_row in reader:
            # print("+++++ {0} +++++".format(device_row['device_ip']))
            my_proc = Process(target=show_version_queue, args=(device_row, output_q))
            my_proc.start()
            procs.append(my_proc)
        # Make sure all processes have finished
        for a_proc in procs:
            a_proc.join()
    commands = ["terminal length 0", "terminal width 511", "show run | inc hostname",
                "show ip int brief | ex una", "show vlan brief", "terminal length 70"]
    output = ''
    for cmd in commands:
        output += "\n"
        output += ssh_session.send_command(cmd)
        output += "\n"
    with open("pool.txt", 'a') as outputfile:
        while not output_q.empty():
            output_queue = output_q.get()
            for x in output_queue:
                outputfile.write(x)

if __name__ == "__main__":
    main()
Somewhat different take...
I effectively run a main task, and then just fire up a (limited) number of worker threads; they communicate via two data queues - basically "requests" and "responses".
Main task
dumps the requests into the request queue.
fires up a number (e.g. 10 or so) of worker tasks.
sits on the "response" queue waiting for results. The results can be simple user info messages about status, error messages, or DATA responses to be written out to files.
When all the threads finish, the program shuts down.
Workers basically:
get a request. If none, just shut down
connect to the device
send a log message to the response queue that it's started.
does what it has to do.
puts the result as DATA to the response queue
closes the connection to the device
loop back to the start
This way you don't inadvertently flood the processing host, as you have a limited number of concurrent threads going, all doing exactly the same thing in their own sweet time, until there's nothing left to do.
Note that you DON'T do any screen/file IO in the threads, as it will get jumbled with the different tasks running at the same time. Each essentially only sees inputQ, outputQ, and the Netmiko sessions that get cycled through.
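A rough sketch of that two-queue pattern with Netmiko (the helper names, worker count, and command list here are illustrative, not taken from your code):
import getpass
import queue
import threading

import netmiko

NUM_WORKERS = 10   # illustrative: caps how many devices are hit at once
COMMANDS = ["show run | inc hostname", "show ip int brief | ex una", "show vlan brief"]

def run_commands(device_ip, username, password):
    # Connect to one device, run the commands, return the combined output.
    session = netmiko.ConnectHandler(device_type='cisco_ios', ip=device_ip,
                                     username=username, password=password)
    try:
        return "\n".join(session.send_command(cmd) for cmd in COMMANDS)
    finally:
        session.disconnect()

def worker(request_q, response_q, username, password):
    while True:
        try:
            device_ip = request_q.get_nowait()   # get a request; if none, shut down
        except queue.Empty:
            return
        try:
            response_q.put(('DATA', device_ip, run_commands(device_ip, username, password)))
        except Exception as exc:
            response_q.put(('ERROR', device_ip, str(exc)))

def main(ip_addresses):
    username = input("SSH username: ")
    password = getpass.getpass("SSH Password: ")
    request_q = queue.Queue()
    response_q = queue.Queue()
    for ip in ip_addresses:              # the main task dumps the requests
        request_q.put(ip)
    threads = [threading.Thread(target=worker,
                                args=(request_q, response_q, username, password))
               for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Only the main thread touches the output file, so nothing gets jumbled.
    with open('pool.txt', 'a') as outputfile:
        while not response_q.empty():
            tag, ip, text = response_q.get()
            outputfile.write("===== %s (%s) =====\n%s\n" % (ip, tag, text))
The CSV parsing from your code would feed ip_addresses, and the fixed NUM_WORKERS is what keeps you from flooding the host.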
It looks like you have code that is from a Django example:
def main():
    '''
    Use threads and Netmiko to connect to each of the devices in the database.
    '''
    devices = NetworkDevice.objects.all()
You should move the argument parsing into the main thread. You should read the CSV file in the main thread. You should have each child thread be the Netmiko-SSH connection.
I say this because your current solution has all of the SSH connections happening in one thread, which is not what you intended.
At a high level, main() should handle argument parsing, delete the old output file, obtain the username/password (assuming these are the same for all the devices), and loop over the CSV file obtaining the IP address for each device.
Once you have an IP address, you create a thread; the thread uses Netmiko-SSH to connect to the device and retrieve your output. I would then use a Queue to pass the output from each device back to the main thread.
Then back in the main thread, you would write all of the output to a single file.
It would look a bit like this:
https://github.com/ktbyers/netmiko/blob/develop/examples/use_cases/case16_concurrency/threads_netmiko.py
Here is an example using a queue (with multiprocessing) though you can probably adapt this using a thread-Queue pretty easily.
https://github.com/ktbyers/netmiko/blob/develop/examples/use_cases/case16_concurrency/processes_netmiko_queue.py
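For reference, a compact sketch of that structure (one thread per device, output passed back through a Queue; the names here are assumptions, not the linked example's code):
import threading
import queue

import netmiko

def ssh_worker(device_ip, username, password, output_q):
    # Runs in a child thread: one Netmiko SSH session per device.
    try:
        conn = netmiko.ConnectHandler(device_type='cisco_ios', ip=device_ip,
                                      username=username, password=password)
        output = conn.send_command('show ip int brief | ex una')
        conn.disconnect()
        output_q.put((device_ip, output))
    except Exception as exc:
        output_q.put((device_ip, 'ERROR: %s' % exc))

def run_all(ip_addresses, username, password):
    output_q = queue.Queue()
    threads = []
    for ip in ip_addresses:              # IPs come from the CSV, read in main()
        t = threading.Thread(target=ssh_worker,
                             args=(ip, username, password, output_q))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    # Back in the main thread: write everything to a single file.
    with open('pool.txt', 'a') as outputfile:
        while not output_q.empty():
            ip, output = output_q.get()
            outputfile.write('===== %s =====\n%s\n' % (ip, output))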

Python subprocess in .exe

I'm creating a Python script that will copy files and folders over the network. It's cross-platform, so I make an .exe file using cx_freeze.
I used the Popen method of the subprocess module.
If I run the .py file it runs as expected, but when I create the .exe the subprocess is not created on the system.
I've gone through all the documentation of the subprocess module but I didn't find any solution.
Everything else (I am using Tkinter, and that also works fine) works in the .exe except subprocess.
Any idea how I can call a subprocess in the .exe file?
This file is calling another .py file:
def start_scheduler_action(self, scheduler_id, scheduler_name, list_index):
    scheduler_detail = db.get_scheduler_detail_using_id(scheduler_id)
    for detail in scheduler_detail:
        source_path = detail[2]
        if not os.path.exists(source_path):
            showerror("Invalid Path", "Please select valid path", parent=self.new_frame)
            return
    self.forms.new_scheduler.start_scheduler_button.destroy()
    # Create stop scheduler button
    if getattr(self.forms.new_scheduler, "stop_scheduler_button", None) == None:
        self.forms.new_scheduler.stop_scheduler_button = tk.Button(self.new_frame, text='Stop scheduler', width=10, command=lambda: self.stop_scheduler_action(scheduler_id, scheduler_name, list_index))
        self.forms.new_scheduler.stop_scheduler_button.grid(row=11, column=1, sticky=E, pady=10, padx=1)
    scheduler_id = str(scheduler_id)
    # Get python paths
    if sys.platform == "win32":
        proc = subprocess.Popen(['where', "python"], env=None, stdout=subprocess.PIPE)
    else:
        proc = subprocess.Popen(['which', "python"], env=None, stdout=subprocess.PIPE)
    out, err = proc.communicate()
    if err or not out:
        showerror("", "Python not found", parent=self.new_frame)
    else:
        try:
            paths = out.split(os.pathsep)
            # Create python path
            python_path = (paths[len(paths) - 1]).split('\n')[0]
            cmd = os.path.realpath('scheduler.py')
            # cmd = 'scheduler.py'
            if sys.platform == "win32":
                python_path = python_path.splitlines()
            else:
                python_path = python_path
            # Run the scheduler file using scheduler id
            proc = subprocess.Popen([python_path, cmd, scheduler_id], env=None, stdout=subprocess.PIPE)
            message = "Started the scheduler : %s" % (scheduler_name)
            showinfo("", message, parent=self.new_frame)
            # Add process id to scheduler table
            process_id = proc.pid
            # showinfo("pid", process_id, parent=self.new_frame)

            def get_process_id(name):
                child = subprocess.Popen(['pgrep', '-f', name], stdout=subprocess.PIPE, shell=False)
                response = child.communicate()[0]
                return [int(pid) for pid in response.split()]

            print(get_process_id(scheduler_name))
            # Add the process id in database
            self.db.add_process_id(scheduler_id, process_id)
            # Add the is_running status in database
            self.db.add_status(scheduler_id)
        except Exception as e:
            showerror("", e)
And this file is called:
def scheduler_copy():
    date = strftime("%m-%d-%Y %H %M %S", localtime())
    logFile = scheduler_name + "_" + scheduler_id + "_" + date + ".log"
    # file_obj = open(logFile, 'w')
    # Call __init__ method of xcopy file
    xcopy = XCopy(connection_ip, username, password, client_name, server_name, domain_name)
    check = xcopy.connect()
    # Create a log file for the scheduler
    file_obj = open(logFile, 'w')
    if check is False:
        file_obj.write("Problem in connection..Please check connection..!!")
        return
    scheduler_next_run = schedule.next_run()
    scheduler_next_run = "Next run at: " + str(scheduler_next_run)
    # If checkbox_value is selected, copy all the files to a new directory
    if checkbox_value == 1:
        new_destination_path = xcopy.create_backup_directory(share_folder, destination_path, date)
    else:
        new_destination_path = destination_path
    # Call backup method for copying data from source to destination
    try:
        xcopy.backup(share_folder, source_path, new_destination_path, file_obj, exclude)
        file_obj.write("Scheduler completed successfully..\n")
    except Exception as e:
        # Write the error message of the scheduler to the log file
        file_obj.write("Scheduler failed to copy all data..\nProblem in connection..Please check connection..!!\n")
        # file_obj.write("Error while scheduling")
        # return
    # Write the details of the scheduler to the log file
    file_obj.write("Total skipped unmodified file:")
    file_obj.write(str(xcopy.skipped_unmodified_count))
    file_obj.write("\n")
    file_obj.write("Total skipped file:")
    file_obj.write(str(xcopy.skipped_file))
    file_obj.write("\n")
    file_obj.write("Total copied file:")
    file_obj.write(str(xcopy.copy_count))
    file_obj.write("\n")
    file_obj.write("Total skipped folder:")
    file_obj.write(str(xcopy.skipped_folder))
    file_obj.write("\n")
    # file_obj.write(scheduler_next_run)
    file_obj.close()
There is some awkwardness in your source code, but I won't spend time on that. For instance, if you want to find the source_path, it's better to use a for loop with break/else:
for detail in scheduler_detail:
    source_path = detail[2]
    break  # found
else:
    # not found: raise an exception
    ...
Some advice:
Try to separate the user interface code and the sub-processing, avoid mixing the two.
Use exceptions and exception handlers.
If you want portable code, avoid system calls (there is no pgrep on Windows).
Since your application is packaged in a virtualenv (I assume cx_freeze does this kind of thing), you have no access to the system-wide Python. You may not even have one on Windows. So you need to use the packaged Python (this is a best practice anyway).
If you want to call a Python script as a subprocess, that means you have two packaged applications: you need to create an exe for the main application and one for the scheduler.py script. But then it's not easy to communicate with it.
Another solution is to use multiprocessing to spawn a new Python process. Since you don't want to wait for the end of processing (which may be long), you need to create daemon processes. The way to do that is explained in the multiprocessing module.
Basically:
import time
from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.daemon = True
    p.start()
    # let it live and die, don't call: `p.join()`
    time.sleep(1)
Of course, we need to adapt that with your problem.
Here is how I would do that (I removed UI-related code for clarity):
import scheduler

class SchedulerError(Exception):
    pass

class YourClass(object):
    def start_scheduler_action(self, scheduler_id, scheduler_name, list_index):
        scheduler_detail = db.get_scheduler_detail_using_id(scheduler_id)
        for detail in scheduler_detail:
            source_path = detail[2]
            break
        else:
            raise SchedulerError("Invalid Path", "Missing source path")
        if not os.path.exists(source_path):
            raise SchedulerError("Invalid Path", "Please select valid path")
        p = Process(target=scheduler.scheduler_copy, args=(source_path,))
        p.daemon = True
        p.start()
        self.db.add_process_id(scheduler_id, p.pid)
To check if your process is still running, I recommend you use psutil. It's really a great tool!
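For instance, a small sketch of such a check (psutil.pid_exists and psutil.Process are standard psutil calls; the PID would be the one you stored in the database):
import psutil

def is_scheduler_running(pid):
    # True if the process with this PID is still alive and not a zombie.
    if not psutil.pid_exists(pid):
        return False
    return psutil.Process(pid).status() != psutil.STATUS_ZOMBIE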
You can define your scheduler.py script like that:
def scheduler_copy(source_path):
    ...
Multiprocessing vs Threading Python
Quoting this answer: https://stackoverflow.com/a/3044626/1513933
The threading module uses threads, the multiprocessing module uses processes. The difference is that threads run in the same memory space, while processes have separate memory. This makes it a bit harder to share objects between processes with multiprocessing. Since threads use the same memory, precautions have to be taken or two threads will write to the same memory at the same time. This is what the global interpreter lock is for.
Here, the advantage of multiprocessing over multithreading is that you can kill (or terminate) a process; you can't kill a thread. You may need psutil for that.
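As a tiny illustration of that difference (a generic sketch, not tied to the scheduler code above): a Process can be terminated from the outside, while threading.Thread has no equivalent kill.
import time
from multiprocessing import Process

def long_task():
    while True:
        time.sleep(1)

if __name__ == '__main__':
    p = Process(target=long_task)
    p.start()
    time.sleep(3)
    p.terminate()   # possible for processes; there is no Thread.terminate()
    p.join()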
This is not the exact solution you are looking for, but the following suggestions should be preferred, for two reasons:
They are the more Pythonic way.
subprocess is slightly expensive.
Suggestions you can consider:
Don't use subprocess for fetching the system path. Check os.getenv('PATH') to get the environment variable and see whether Python is in it (see the sketch after this list). On Windows, one has to add the Python path manually, or else you can check directly in Program Files, I guess.
For checking process IDs you can try psutil. A wonderful answer is provided here: How do I get the process list in Python?
Calling another script from a Python script does not look cool. It's not bad, but I would not prefer it at all.
In the above code, the line if sys.platform == "win32": leads to the same value in the if and else branches, so you don't need a conditional statement here.
Having said that, you wrote pretty fine working code. Keep coding!
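A minimal sketch of that first suggestion, using the standard library instead of spawning where/which (shutil.which is available from Python 3.3):
import os
import shutil

# shutil.which searches os.environ['PATH'] the same way the shell would.
python_path = shutil.which('python')
if python_path is None:
    print(os.getenv('PATH'))   # fall back to inspecting PATH directly
    raise RuntimeError('Python not found on PATH')
print('Found Python at', python_path)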
If you want to run a subprocess in an exe file, then you can use
import subprocess

program = 'example'
arguments = '/command'
subprocess.call([program, arguments])

Django not inserting data into database when I execute subprocess.Popen

I am using Django, inserting data into a database and downloading images. When I call the function directly it works fine, but it blocks the main thread. To execute the process in the background I'm using:
views.py:
class get_all_data(View):
    def post(self, request, *args, **kwargs):
        subprocess.Popen(
            ['python get_images.py --data=data'],
            close_fds=True,
            shell=True
        )
But when I call:
python get_images.py(data="data")
it works fine but it runs on the main thread.
How can I fix it? BTW I'm using Python 2.7.
Note: Please don't suggest Celery. I'm looking for something which runs the task asynchronously, or any other alternative.
I'm using the following code to insert into my db:
get_images.py:
from models import Test
def get_image_all():
#get data from server
insert_to_Test(data="data")
if __name__ == "__main__":
import argparse
from .models import Test
import django
django.setup()
parser = argparse.ArgumentParser()
parser.add_argument('--from_date')
parser.add_argument('--data')
parser.add_argument('--execute', type=bool, default=False)
args = parser.parse_args()
print "args", args
get_image_all(**vars(args))
When I call subprocess.Popen it does not insert data into my db, but if I execute it by calling the function it inserts the data into the db. Why does that happen?
First, import and use Django models after django.setup().
Second, make sure the environment the spawned process runs in is correct: mainly, make sure you use the right Python interpreter in the right virtualenv (if you use one), and make sure the process runs in the correct working directory.
Sample code:
if __name__ == "__main__":
    import argparse
    import django
    django.setup()
    from .models import Test
    # Parse args
    # Use Django models
Next, you need to make sure the subprocess runs in the correct working directory and activates your Python virtualenv (if any), roughly like below:
subprocess.Popen(
    '/path/to/virtualenv/bin/python get_images.py --data=data',
    cwd='/path/to/project/working_directory/',
    shell=True
)
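If the Django process itself already runs inside the right virtualenv, another option is to reuse the current interpreter via sys.executable (a sketch; the project path is a placeholder):
import os
import sys
import subprocess

project_dir = '/path/to/project/working_directory/'
subprocess.Popen(
    [sys.executable, os.path.join(project_dir, 'get_images.py'), '--data=data'],
    cwd=project_dir,
    close_fds=True
)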
Shouldn't it be
subprocess.Popen(
    ['python get_images.py(data=data)'],
    close_fds=True,
    shell=True
)

How can Python wait for a batch SGE script to finish execution?

I have a problem I'd like you to help me to solve.
I am working in Python and I want to do the following:
call an SGE batch script on a server
see if it works correctly
do something
What I do now is approx the following:
import subprocess
try:
    tmp = subprocess.call(qsub ....)
    if tmp != 0:
        error_handler_1()
    else:
        correct_routine()
except:
    error_handler_2()
My problem is that once the script is sent to SGE, my Python script interprets it as a success and keeps working as if the job had finished.
Do you have any suggestion about how could I make the python code wait for the actual processing result of the SGE script ?
Ah, btw I tried using qrsh but I don't have permission to use it on the SGE
Thanks!
From your code, it sounds like you want the program to wait for the job to finish and return its exit code, right? If so, the qsub -sync y option is likely what you want:
http://gridscheduler.sourceforge.net/htmlman/htmlman1/qsub.html
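A minimal sketch of that approach, reusing the handlers from your snippet (with -sync y, qsub blocks until the job completes and its exit status reflects the job's outcome; 'myjob.sh' is a placeholder):
import subprocess

ret = subprocess.call(['qsub', '-sync', 'y', 'myjob.sh'])
if ret != 0:
    error_handler_1()
else:
    correct_routine()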
An additional answer for easier processing:
Use the Python drmaa module (link), which allows more complete interaction with SGE.
A working example from the documentation is below (provided you put a sleeper.sh script in the same directory).
Please note that the -b n option is needed to execute a .sh script; otherwise qsub expects a binary by default, as explained here.
import drmaa
import os

def main():
    """Submit a job.

    Note, need file called sleeper.sh in current directory.
    """
    s = drmaa.Session()
    s.initialize()
    print('Creating job template')
    jt = s.createJobTemplate()
    jt.remoteCommand = os.getcwd() + '/sleeper.sh'
    jt.args = ['42', 'Simon says:']
    jt.joinFiles = False
    jt.nativeSpecification = "-m abe -M mymail -q so-el6 -b n"
    jobid = s.runJob(jt)
    print('Your job has been submitted with id ' + jobid)
    retval = s.wait(jobid, drmaa.Session.TIMEOUT_WAIT_FOREVER)
    print('Job: {0} finished with status {1}'.format(retval.jobId, retval.hasExited))
    print('Cleaning up')
    s.deleteJobTemplate(jt)
    s.exit()

if __name__ == '__main__':
    main()

Python, How to break out of multiple threads

I am following one of the examples in a book I am reading ("Violent Python"). It is to create a zip file password cracker from a dictionary. I have two questions about it. First, it says to thread it as I have written in the code to increase performance, but when I timed it (I know time.time() is not great for timing) there was about a twelve second difference in favor of not threading. Is this because it is taking longer to start the threads? Second, if I do it without the threads I can break as soon as the correct value is found, by printing the result and then calling exit(0). Is there a way to get the same result using threading, so that if I find the result I am looking for I can end all the other threads simultaneously?
import zipfile
from threading import Thread
import time

def extractFile(z, password, starttime):
    try:
        z.extractall(pwd=password)
    except:
        pass
    else:
        z.close()
        print('PWD IS ' + password)
        print(str(time.time() - starttime))

def main():
    start = time.time()
    z = zipfile.ZipFile('test.zip')
    pwdfile = open('words.txt')
    pwds = pwdfile.read()
    pwdfile.close()
    for pwd in pwds.splitlines():
        t = Thread(target=extractFile, args=(z, pwd, start))
        t.start()
        #extractFile(z, pwd, start)
    print(str(time.time() - start))

if __name__ == '__main__':
    main()
In CPython, the Global Interpreter Lock ("GIL") enforces the restriction that only one thread at a time can execute Python bytecode.
So in this application, it is probably better to use the map method of a multiprocessing.Pool, since every try is independent of the others:
import multiprocessing
import zipfile

def tryfile(password):
    rv = password
    with zipfile.ZipFile('test.zip') as z:
        try:
            z.extractall(pwd=password)
        except:
            rv = None
    return rv

with open('words.txt') as pwdfile:
    data = pwdfile.read()
pwds = data.split()
p = multiprocessing.Pool()
results = p.map(tryfile, pwds)
results = [r for r in results if r is not None]
This will start (by default) as many processes as your computer has cores. It will keep running tryfile() with different passwords in these processes until the list pwds is exhausted, then gather the results and return them. The last list comprehension discards the None results.
Note that this code could be improved so that the map is shut down once the password is found. You'd probably have to use map_async and a shared variable in that case. It would also be nice to load the zipfile only once and share that.
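One way to sketch that early exit (an illustration, not the only approach) is to use imap_unordered instead of map, so results arrive as workers finish and the pool can be terminated on the first hit; this assumes Python 3, where the pwd argument must be bytes:
import multiprocessing
import zipfile

def tryfile(password):
    with zipfile.ZipFile('test.zip') as z:
        try:
            z.extractall(pwd=password.encode())   # pwd must be bytes in Python 3
        except Exception:
            return None
    return password

if __name__ == '__main__':
    with open('words.txt') as pwdfile:
        pwds = pwdfile.read().split()
    pool = multiprocessing.Pool()
    found = None
    # imap_unordered yields results as they complete, so we can stop early.
    for result in pool.imap_unordered(tryfile, pwds):
        if result is not None:
            found = result
            break
    pool.terminate()   # stop any workers still trying passwords
    pool.join()
    print(found)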
This code is slow because Python has a Global Interpreter Lock, which means only one thread can execute bytecode at a time. This causes CPU-bound multithreaded code to run slower than serial code in Python. If you want truly parallel execution, you'd have to use the multiprocessing module.
To break out of the threads and get the return value, you can use os._exit(1). First, import the os module at the top of your file:
import os
Then, change your extractFile function to use os._exit(1):
def extractFile(z, password, starttime):
    try:
        z.extractall(pwd=password)
    except:
        pass
    else:
        z.close()
        print('PWD IS ' + password)
        print(str(time.time() - starttime))
        os._exit(1)
