Python: Parallel execution of pysphere commands

My current for loop removes snapshots from my 16 VMs one by one:
for vmName in vmList:
    snapshots = vmServer.get_vm_by_name(vmName).get_snapshots()
    for i in range(len(snapshots) - 3):
        snapshotName = snapshots[i].get_name()
        print "Deleting snapshot " + snapshotName + " of " + vmName
        vmServer.get_vm_by_name(vmName).delete_named_snapshot(snapshotName)
I need to run it in parallel (so it doesn't wait for the previous job to finish before starting the next one).
I was trying to apply "multiprocessing"; here's the full code:
import argparse
from pysphere import VIServer  # Tested with vCenter Server 5.5.0 and pysphere package 0.1.7
from CONFIG import *  # Contains username and password for vCenter connection, list of VM names to take snapshot
from multiprocessing.pool import ThreadPool as Pool

def purgeSnapshotStage(vmList):
    # Connect to vCenter
    vmServer = VIServer()
    vmServer.connect("VM_ADDRESS", username, password)
    snapshots = vmServer.get_vm_by_name(vmName).get_snapshots()
    for i in range(len(snapshots) - 3):
        snapshotName = snapshots[i].get_name()
        print "Deleting snapshot " + snapshotName + " of VM: " + vmName
        vmServer.get_vm_by_name(vmName).delete_named_snapshot(snapshotName)
    vmServer.disconnect()

# Get the environment to delete snapshot from command line
parser = argparse.ArgumentParser(description="Take snapshot of VMs for stage or stage2")
parser.add_argument('env', choices=("stage", "stage2", "stage3"), help="Valid value stage or stage2 or stage3")
env = parser.parse_args().env
vmList = globals()[env + "VmList"]

pool_size = 5  # your "parallelness"
pool = Pool(pool_size)
for vmName in vmList:
    pool.apply_async(purgeSnapshotStage, (vmList,))
pool.close()
pool.join()
But there is a mistake: it tries to execute the "remove" command only on the last VM.
I haven't found a good guide to multiprocessing and can't work out how to debug this.
I need help finding the mistake.

You have an error here:
for vmName in vmList:
    pool.apply_async(purgeSnapshotStage, (vmList,))
It should be:
for vmName in vmList:
    pool.apply_async(purgeSnapshotStage, (vmName,))
And then your function header needs to match:
def purgeSnapshotStage(vmName):
Then, there might be other errors in your code.
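To make those easier to find, here is a minimal sketch (assuming the rest of the script stays as above) that collects the AsyncResult objects so exceptions raised inside the workers are re-raised instead of being silently swallowed:

pool = Pool(pool_size)
results = [pool.apply_async(purgeSnapshotStage, (vmName,)) for vmName in vmList]
pool.close()
pool.join()
for r in results:
    r.get()  # re-raises any exception that occurred in the worker, which helps debugging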
Generally, I doubt that parallelizing this will give you any performance benefit. Your bottleneck will be the VMware server; it will not get faster when you start many delete jobs at the same time.

Related

pxssh does not work between compute nodes in a slurm cluster

I'm using the following script to connect two compute nodes in a Slurm cluster.
from getpass import getuser
from socket import gethostname
from pexpect import pxssh
import sys

python = sys.executable
worker_command = "%s -m worker" % python + " %i " + server_socket
pid = 0
children = []
for node, ntasks in node_list.items():
    if node == gethostname():
        continue
    if node != gethostname():
        pid_range = range(pid, pid + ntasks)
        pid += ntasks
        ssh = pxssh.pxssh()
        ssh.login(node, getuser())
        for worker in pid_range:
            ssh.sendline(worker_command % worker + '&')
        children.append(ssh)
node_list is a dictionary like {'cn000': 28, 'cn001': 28}. worker is a Python file placed in the working directory.
I expected ssh.sendline to behave the same as pexpect.spawn. However, nothing happened after I ran the script.
Although an ssh session was established by ssh.login(node, getuser()), the line ssh.sendline(worker_command % worker) seems to have no effect, because the script that worker_command should run is never executed.
How can I fix this? Or should I try something else?
How can I create one socket on one compute node and connect it with a socket on another compute node?
The content of worker_command is missing a '%s': it contains something like "/usr/bin/python3 -m worker", so worker_command % worker should result in an error.
If it does not (which is possible, since this looks like a short excerpt of the original program), then add ">> workerprocess.log 2>&1" before the '&', run your program again, and take a look at workerprocess.log on the server. If your $HOME is writable on the server, you should find the error message(s) in it.
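For example, a minimal sketch of that suggestion (the log file name is only illustrative):

# Redirect each remote worker's output to a log file so any error becomes visible.
ssh.sendline(worker_command % worker + ' >> workerprocess.log 2>&1 &')
ssh.prompt()  # wait for the shell prompt so the command is actually dispatched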

Improve performance when running the same function several times

I'm currently coding a Python app that runs the same function several times, but never with the same arguments.
The goal of the app is to send PowerShell commands to a server via the pywinrm module. Here is a simplified code snippet which updates permissions on a Windows folder:
server = input('Please provide a server: ')
path = input('Provide the path of the folder: ')
permissions = ['Read', 'ReadAndExecute', 'List', 'Write', 'Modify']
run_ps_cmd(server, path, permissions[0])
run_ps_cmd(server, path, permissions[1])
run_ps_cmd(server, path, permissions[2])
run_ps_cmd(server, path, permissions[3])
run_ps_cmd(server, path, permissions[4])
For now it takes 20 seconds to run, and I would like to improve the performance but I don't know how. Which direction should I take: parallelism, threading, multiprocessing, etc.?
Thanks a lot.
To get them to just run in parallel, I would simply add them all to a multiprocessing pool.
from multiprocessing import Pool
from functools import partial

server = input('Please provide a server: ')
path = input('Provide the path of the folder: ')
permissions = ['Read', 'ReadAndExecute', 'List', 'Write', 'Modify']

run_ps = partial(run_ps_cmd, server, path)
# partial is required because Pool.map only takes an iterable
# and passes each item in it to the function.
p = Pool(processes=5)
results = p.map(run_ps, permissions)
p.terminate()
p.join()
In Python 3 it supports being run as a context manager:
with Pool(processes=5) as p:
    results = p.map(run_ps, permissions)
I would caution against running this in parallel without actually knowing the code of run_ps_cmd, and against using processes when a thread pool would suffice if the work is I/O bound (in which case switch out Pool for ThreadPool, imported via from multiprocessing.pool import ThreadPool).
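For instance, a thread-based variant could look like this sketch (assuming run_ps_cmd, server, path and permissions are defined as above and the calls are I/O bound):

from functools import partial
from multiprocessing.pool import ThreadPool

run_ps = partial(run_ps_cmd, server, path)
with ThreadPool(processes=5) as p:  # threads instead of processes for I/O-bound WinRM calls
    results = p.map(run_ps, permissions)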
Someone's random article with more information http://chriskiehl.com/article/parallelism-in-one-line/
Hope this helps!

How do I count the number of lines in an FTP file without downloading it locally, using Python

So I need to be able to read and count the number of lines of a file on an FTP server WITHOUT downloading it to my local machine, using Python.
I know the code to connect to the server:
ftp = ftplib.FTP('example.com')  # Object ftp set to server address
ftp.login('username', 'password')  # Login info
ftp.retrlines('LIST')  # List file directories
ftp.cwd('/parent folder/another folder/file/')  # Change file directory
I also know the basic code to count the number of lines if the file is already downloaded/stored locally:
with open('file') as f:
    count = sum(1 for line in f)
    print(count)
I just need to know how to connect these 2 pieces of code without having to download the file to my local system.
Any help is appreciated.
Thank You
As far as I know, FTP doesn't provide any kind of functionality to read a file's content without actually downloading it. However, you could try something like Is it possible to read FTP files without writing them using Python?
(You haven't specified which Python you are using.)
#!/usr/bin/env python
from ftplib import FTP

def countLines(s):
    print len(s.split('\n'))

ftp = FTP('ftp.kernel.org')
ftp.login()
ftp.retrbinary('RETR /pub/README_ABOUT_BZ2_FILES', countLines)
Please take this code as a reference only.
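Note that retrbinary may call the callback once per chunk, so countLines above prints a figure per chunk rather than a grand total. A hedged variant (Python 3 syntax, same server path as above) that accumulates a total across chunks:

from ftplib import FTP

total = 0

def count_lines(chunk):
    # Accumulate the newline count across every chunk passed to the callback.
    global total
    total += chunk.count(b'\n')

ftp = FTP('ftp.kernel.org')
ftp.login()
ftp.retrbinary('RETR /pub/README_ABOUT_BZ2_FILES', count_lines)
print(total)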
There is a way: I adapted a piece of code that I created for processing CSV files "on the fly". It is implemented with a producer-consumer approach. Applying this pattern lets us assign each task to a thread (or process) and show partial results for huge remote files. You can adapt it for FTP requests.
The download stream is saved in a queue and consumed "on the fly". No extra HDD space is needed and it is memory efficient. Tested with Python 3.5.2 (vanilla) on Fedora Core 25 x86_64.
This is the source, adapted for an HTTP (rather than FTP) retrieve:
from threading import Thread, Event
from queue import Queue, Empty
import urllib.request, sys, csv, io, os, time
import argparse

FILE_URL = 'http://cdiac.ornl.gov/ftp/ndp030/CSV-FILES/nation.1751_2010.csv'

def download_task(url, chunk_queue, event):
    CHUNK = 1 * 1024
    response = urllib.request.urlopen(url)
    event.clear()
    print('%% - Starting Download - %%')
    print('%% - ------------------ - %%')
    '''VT100 control codes.'''
    CURSOR_UP_ONE = '\x1b[1A'
    ERASE_LINE = '\x1b[2K'
    while True:
        chunk = response.read(CHUNK)
        if not chunk:
            print('%% - Download completed - %%')
            event.set()
            break
        chunk_queue.put(chunk)

def count_task(chunk_queue, event):
    part = False
    time.sleep(5)  # give the producer some time
    M = 0
    contador = 0
    '''VT100 control codes.'''
    CURSOR_UP_ONE = '\x1b[1A'
    ERASE_LINE = '\x1b[2K'
    while True:
        try:
            # Queue.get() normally blocks when the queue is empty.
            # Here block=False is used: when the queue is empty, get() raises
            # queue.Empty, which is used to show a partial result of the process.
            chunk = chunk_queue.get(block=False)
            for line in chunk.splitlines(True):
                if line.endswith(b'\n'):
                    if part:  # join with the leftover partial line from the previous chunk
                        line = linepart + line
                        part = False
                    M += 1
                else:
                    # If the line does not end with '\n' it is the last line of the chunk:
                    # a partial line that is completed in the next iteration over the next chunk.
                    part = True
                    linepart = line
        except Empty:
            # QUEUE EMPTY
            print(CURSOR_UP_ONE + ERASE_LINE + CURSOR_UP_ONE)
            print(CURSOR_UP_ONE + ERASE_LINE + CURSOR_UP_ONE)
            print('Downloading records ...')
            if M > 0:
                print('Partial result: Lines: %d ' % M)  # M-1 because M includes the header
            if event.is_set():  # THE END: no elements in the queue and the download has finished (event is set)
                print(CURSOR_UP_ONE + ERASE_LINE + CURSOR_UP_ONE)
                print(CURSOR_UP_ONE + ERASE_LINE + CURSOR_UP_ONE)
                print(CURSOR_UP_ONE + ERASE_LINE + CURSOR_UP_ONE)
                print('The consumer has waited %s times' % str(contador))
                print('RECORDS = ', M)
                break
            contador += 1
            time.sleep(1)  # give some time for loading more records

def main():
    chunk_queue = Queue()
    event = Event()
    args = parse_args()
    url = args.url
    p1 = Thread(target=download_task, args=(url, chunk_queue, event))
    p1.start()
    p2 = Thread(target=count_task, args=(chunk_queue, event))
    p2.start()
    p1.join()
    p2.join()

# The user of this module can customize one parameter:
# + URL where the remote file can be found.
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('-u', '--url', default=FILE_URL,
                        help='remote-csv-file URL')
    return parser.parse_args()

if __name__ == '__main__':
    main()
Usage
$ python ftp-data.py -u <ftp-file>
Example:
python ftp-data-ol.py -u 'http://cdiac.ornl.gov/ftp/ndp030/CSV-FILES/nation.1751_2010.csv'
The consumer has waited 0 times
RECORDS = 16327
CSV version on GitHub: https://github.com/AALVAREZG/csv-data-onthefly

Is there a quick way to know the status of another computer?

I need to know the status of ten computers.
Using "ping", I get the info in ten seconds.
I want a quicker way to get this info on Windows 7 64-bit.
code:
from platform import system as system_name  # Returns the system/OS name
from os import system as system_call  # Execute a shell command

def ping(host):
    # Ping parameters as function of OS
    parameters = "-n 1" if system_name().lower() == "windows" else "-c 1"
    # Pinging
    return system_call("ping " + parameters + " " + host) == 0
Thanks!
Try with subprocess
import subprocess
from platform import system as system_name  # Returns the system/OS name

def ping(host):
    # Ping parameters as function of OS
    parameters = "-n" if system_name().lower() == "windows" else "-c"
    # Pinging
    return subprocess.Popen(["ping", host, parameters, '1'], stdout=subprocess.PIPE).stdout.read()
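To check all ten computers at roughly the same time instead of one after another, one option is to run the pings in a thread pool. A minimal sketch, assuming the ping function above (the host addresses are only illustrative):

from multiprocessing.pool import ThreadPool

hosts = ['192.168.0.%d' % i for i in range(1, 11)]  # illustrative addresses
pool = ThreadPool(processes=10)
results = pool.map(ping, hosts)  # the pings run concurrently, so total time is roughly that of one ping
pool.close()
pool.join()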

How can Python wait for a batch SGE script to finish execution?

I have a problem I'd like you to help me to solve.
I am working in Python and I want to do the following:
call an SGE batch script on a server
see if it works correctly
do something
What I do now is approximately the following:
import subprocess

try:
    tmp = subprocess.call(qsub ....)
    if tmp != 0:
        error_handler_1()
    else:
        correct_routine()
except:
    error_handler_2()
My problem is that once the script is sent to SGE, my Python script interprets it as a success and keeps working as if it had finished.
Do you have any suggestions on how I could make the Python code wait for the actual processing result of the SGE script?
Ah, by the way, I tried using qrsh, but I don't have permission to use it on the SGE.
Thanks!
From your code it looks like you want the program to wait for the job to finish and return its exit code, right? If so, the qsub sync option (-sync y) is likely what you want:
http://gridscheduler.sourceforge.net/htmlman/htmlman1/qsub.html
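For example, a rough sketch of how the existing call could be adapted (the script name is a placeholder):

import subprocess

# -sync y makes qsub block until the job completes and propagate the job's exit status,
# so the existing error handling can stay as it is.
tmp = subprocess.call(['qsub', '-sync', 'y', 'myscript.sh'])  # 'myscript.sh' is illustrative
if tmp != 0:
    error_handler_1()
else:
    correct_routine()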
An additional answer, for easier processing: use the Python drmaa module (link), which allows more complete interaction with SGE.
Working code from the documentation is below (provided you put a sleeper.sh script in the same directory).
Please note that the -b n option is needed to execute a .sh script; otherwise it expects a binary by default, as explained here.
import drmaa
import os

def main():
    """Submit a job.
    Note, need file called sleeper.sh in current directory.
    """
    s = drmaa.Session()
    s.initialize()
    print 'Creating job template'
    jt = s.createJobTemplate()
    jt.remoteCommand = os.getcwd() + '/sleeper.sh'
    jt.args = ['42', 'Simon says:']
    jt.joinFiles = False
    jt.nativeSpecification = "-m abe -M mymail -q so-el6 -b n"
    jobid = s.runJob(jt)
    print 'Your job has been submitted with id ' + jobid
    retval = s.wait(jobid, drmaa.Session.TIMEOUT_WAIT_FOREVER)
    print('Job: {0} finished with status {1}'.format(retval.jobId, retval.hasExited))
    print 'Cleaning up'
    s.deleteJobTemplate(jt)
    s.exit()

if __name__ == '__main__':
    main()
