measure data transfer rate with 200 simultaneous users - python

I have linux (Ubuntu) machine as a client. I want to measure data transfer rate when 200 users try to download files from my server at the same time.
Is there some python or linux tool for this? Or can you recommend an approach?
I saw this speedcheck code, and I can wrap it in threads, but I don't understand why the code there is so "complicated" and the block size changes all the time.

I used Mult-Mechanize recently to run some performance tests. It was fairly easy and worked pretty well.

Not sure if you're talking about an actual dedicated server. For traffic graphs and so on I prefer to use Munin. It is a pretty complete monitoring application which builds you nice graphs using rrdtool. Examples are linked on the munin site: full setup, eth0 traffic graph.
The new munin 2 is even more flashy, but I did not use it yet as it's not in my repos and I don't like to mess with perl applications.

Maybe 'ab' from apache ?
ab -n 1000 -c 200 [http[s]://]hostname[:port]/path
-n Number of requests to perform
-c Number of multiple requests to make at a time
It has many options, http://httpd.apache.org/docs/2.2/programs/ab.html
or man ab

import threading
import time
import urllib2
block_sz = 8192
num_threads = 1
url = "http://192.168.1.1/bigfile2"
secDownload = 30
class DownloadFileThread(threading.Thread):
def __init__(self):
threading.Thread.__init__(self)
self.u = urllib2.urlopen(url)
self.file_size_dl = 0
def run(self):
while True:
buffer = self.u.read(block_sz)
if not buffer:
raise 'There is nothing to read. You should have bigger file or smaller time'
self.file_size_dl += len(buffer)
if __name__ == "__main__":
print 'Download from url ' + url + ' use in ' + str(num_threads) + ' to download. test time ' + str(secDownload)
threads = []
for i in range(num_threads):
downloadThread = DownloadFileThread()
downloadThread.daemon = True
threads.append(downloadThread)
for i in range(num_threads):
threads[i].start()
time.sleep(secDownload)
sumBytes=0
for i in range(num_threads):
sumBytes = sumBytes + threads[i].file_size_dl
print sumBytes
print str(sumBytes/(secDownload *1000000)) + 'MBps'

Related

How to make a python module with function to run command line tools in parallel (w/o using if __name__ == '__main__': so it's importable)?

I wanted to make a python module with a convenience function for running commands in parallel using Python 3.7 on Windows. (for az cli commands)
I wanted a to make a function that:
Was easy to use: Just pass a list of commands as strings, and have them execute in parallel.
Let me see the output generated by the commands.
Used build in python libraries
Worked equally well on Windows and Linux (Python Multiprocessing uses fork(), and Windows doesn't have fork(), so sometimes Multiprocessing code will work on Linux but not Windows.)
Could be made into an importable module for greater convenience.
This was surprisingly difficult, I think maybe it used to not be possible in older versions of python? (I saw several 2-8 year old Q&As that said you had to use if __name__==__main__: to pull off parallel processing, but I discovered that didn't work in a consistently predictable way when it came to making a importable module.
def removeExtraLinesFromString(inputstring):
stringtoreturn = ""
for line in inputstring.split("\n"):
if len(line.strip()) > 0: #Only add non empty lines to the stringtoreturn
stringtoreturn = stringtoreturn + line
return stringtoreturn
def runCmd(cmd): #string of a command passed in here
from subprocess import run, PIPE
stringtoreturn = str( run(cmd, shell=True, stdout=PIPE).stdout.decode('utf-8') )
stringtoreturn = removeExtraLinesFromString(stringtoreturn)
return stringtoreturn
def exampleOfParrallelCommands():
if __name__ == '__main__': #I don't like this method, because it doesn't work when imported, refractoring attempts lead to infinite loops and unexpected behavior.
from multiprocessing import Pool
cmd = "python -c \"import time;time.sleep(5);print('5 seconds have passed')\""
cmds = []
for i in range(12): #If this were running in series it'd take at least a minute to sleep 5 seconds 12 times
cmds.append(cmd)
with Pool(processes=len(cmds)) as pool:
results = pool.map(runCmd, cmds) #results is a list of cmd output
print(results[0])
print(results[1])
return results
When I tried importing this as a module it didn't work (makes since because of the if statement), so I tried rewriting the code to move the if statement around, I think I removed it once which caused my computer to go into a loop until I shut the program. Another time I was able to import the module into another python program, but to make that work I had to add __name__ == '__main__' and that's very intuitive.
I almost gave up, but after 2 days of searching though tons of python websites and SO posts I finally figured out how to do it after seeing user jfs's code in this Q&A (Python: execute cat subprocess in parallel) I modified his code so it'd better fit into an answer to my question.
toolbox.py
def removeExtraLinesFromString(inputstring):
stringtoreturn = ""
for line in inputstring.split("\n"):
if len(line.strip()) > 0: #Only add non empty lines to the stringtoreturn
stringtoreturn = stringtoreturn + line
return stringtoreturn
def runCmd(cmd): #string of a command passed in here
from subprocess import run, PIPE
stringtoreturn = str( run(cmd, shell=True, stdout=PIPE).stdout.decode('utf-8') )
stringtoreturn = removeExtraLinesFromString(stringtoreturn)
return stringtoreturn
def runParallelCmds(listofcommands):
from multiprocessing.dummy import Pool #thread pool
from subprocess import Popen, PIPE, STDOUT
listofprocesses = [Popen(listofcommands[i], shell=True,stdin=PIPE, stdout=PIPE, stderr=STDOUT, close_fds=True) for i in range(len(listofcommands))]
#Python calls this list comprehension, it's a way of making a list
def get_outputs(process): #MultiProcess Thread Pooling require you to map to a function, thus defining a function.
return process.communicate()[0] #process is object of type subprocess.Popen
outputs = Pool(len(listofcommands)).map(get_outputs, listofprocesses) #outputs is a list of bytes (which is a type of string)
listofoutputstrings = []
for i in range( len(listofcommands) ):
outputasstring = removeExtraLinesFromString( outputs[i].decode('utf-8') ) #.decode('utf-8') converts bytes to string
listofoutputstrings.append( outputasstring )
return listofoutputstrings
main.py
from toolbox import runCmd #(cmd)
from toolbox import runParallelCmds #(listofcommands)
listofcommands = []
cmd = "ping -n 2 localhost"
listofcommands.append(cmd)
cmd = "python -c \"import time;time.sleep(5);print('5 seconds have passed')\""
for i in range(12):
listofcommands.append(cmd) # If 12 processes each sleep 5 seconds, this taking less than 1 minute proves parrallel processing
outputs = runParallelCmds(listofcommands)
print(outputs[0])
print(outputs[1])
output:
Pinging neokylesPC [::1] with 32 bytes of data:
Reply from ::1: time<1ms Reply from ::1: time<1ms Ping statistics
for ::1:
Packets: Sent = 2, Received = 2, Lost = 0 (0% loss), Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 0ms, Average = 0ms
5 seconds have passed

youtbe-dl multiple downloads at the same time

I have a python app, where I have a variable that contains multiple urls.
At this moment I use something like this:
for v in arr:
cmd = 'youtube-dl -u ' + email + ' -p ' + password + ' -o "' + v['path'] + '" ' + v['url']
os.system(cmd)
But this way I download just one video after another. How can I download, let's say 3 videos at the same time ? (Is not from youtube so no playlist or channels)
I not necessary need multi threading in python, but to call the youtube-dl multiple times, splitting the array. So from a python perspective can be on thread.
Use a Pool:
import multiprocessing.dummy
import subprocess
arr = [
{'vpath': 'example/%(title)s.%(ext)s', 'url': 'https://www.youtube.com/watch?v=BaW_jenozKc'},
{'vpath': 'example/%(title)s.%(ext)s', 'url': 'http://vimeo.com/56015672'},
{'vpath': '%(playlist_title)s/%(title)s-%(id)s.%(ext)s',
'url': 'https://www.youtube.com/playlist?list=PLLe-WjSmNEm-UnVV8e4qI9xQyI0906hNp'},
]
email = 'my-email#example.com'
password = '123456'
def download(v):
subprocess.check_call([
'echo', 'youtube-dl',
'-u', email, '-p', password,
'-o', v['vpath'], '--', v['url']])
p = multiprocessing.dummy.Pool(concurrent)
p.map(download, arr)
multiprocessing.dummy.Pool is a lightweight thread-based version of a Pool, which is more suitable here because the work tasks are just starting subprocesses.
Note that instead of os.system, subprocess.check_call, which prevents the command injection vulnerability in your previous code.
Also note that youtube-dl output templates are really powerful. In most cases, you don't actually need to define and manage file names yourself.
I achieved the same thing using threading library, which is considered a lighter way to spawn new processes.
Assumption:
Each task will download videos to a different directory.
import os
import threading
import youtube_dl
COOKIE_JAR = "path_to_my_cookie_jar"
def download_task(videos, output_dir):
if not os.path.isdir(output_dir):
os.makedirs(output_dir)
if not os.path.isfile(COOKIE_JAR):
raise FileNotFoundError("Cookie Jar not found\n")
ydl_opts = {
'cookiefile': COOKIE_JAR,
'outtmpl': f'{output_dir}/%(title)s.%(ext)s'
}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
ydl.download(videos)
if __name__ == "__main__":
output_dir = "./root_dir"
threads = []
for playlist in many_playlists:
output_dir = f"{output_dir}/playlist.name"
thread = threading.Thread(target=download_task, args=(playlist, output_dir))
threads.append(thread)
# Actually start downloading
for thread in threads:
thread.start()
# Wait for all the downloads to complete
for thread in threads:
thread.join()

How do I count the number of line in a FTP file without downloading it locally while using Python

So I need to be able to read and count the number of lines from a FTP server WITHOUT downloading it to my local machine while using Python.
I know the code to connect to the server:
ftp = ftplib.FTP('example.com') //Object ftp set as server address
ftp.login ('username' , 'password') // Login info
ftp.retrlines('LIST') // List file directories
ftp.cwd ('/parent folder/another folder/file/') //Change file directory
I also know the basic code to count the number of line If it is already downloaded/stored locally :
with open('file') as f:
... count = sum(1 for line in f)
... print (count)
I just need to know how to connect these 2 pieces of code without having to download the file to my local system.
Any help is appreciated.
Thank You
As far as i know FTP doesn't provide any kind of functionality to read the file content without actually downloading it. However you could try using something like Is it possible to read FTP files without writing them using Python?
(You haven't specified what python you are using)
#!/usr/bin/env python
from ftplib import FTP
def countLines(s):
print len(s.split('\n'))
ftp = FTP('ftp.kernel.org')
ftp.login()
ftp.retrbinary('RETR /pub/README_ABOUT_BZ2_FILES', countLines)
Please take this code as a reference only
There is a way: I adapted a piece of code that I created for processes csv files "on the fly". Is implement by producer-consumer problem approach. Apply this pattern allows us to assign each task to a thread (or process) and show partial results for huge remote files. You can adapt it for ftp requests.
Download stream is saved in queue and is consumed "on the fly". No HDD extra space is needed and memory efficient. Tested in Python 3.5.2 (vanilla) on Fedora Core 25 x86_64.
This is the source adapted for ftp (over http) retrieve:
from threading import Thread, Event
from queue import Queue, Empty
import urllib.request,sys,csv,io,os,time;
import argparse
FILE_URL = 'http://cdiac.ornl.gov/ftp/ndp030/CSV-FILES/nation.1751_2010.csv'
def download_task(url,chunk_queue,event):
CHUNK = 1*1024
response = urllib.request.urlopen(url)
event.clear()
print('%% - Starting Download - %%')
print('%% - ------------------ - %%')
'''VT100 control codes.'''
CURSOR_UP_ONE = '\x1b[1A'
ERASE_LINE = '\x1b[2K'
while True:
chunk = response.read(CHUNK)
if not chunk:
print('%% - Download completed - %%')
event.set()
break
chunk_queue.put(chunk)
def count_task(chunk_queue, event):
part = False
time.sleep(5) #give some time to producer
M=0
contador = 0
'''VT100 control codes.'''
CURSOR_UP_ONE = '\x1b[1A'
ERASE_LINE = '\x1b[2K'
while True:
try:
#Default behavior of queue allows getting elements from it and block if queue is Empty.
#In this case I set argument block=False. When queue.get() and queue Empty ocurrs not block and throws a
#queue.Empty exception that I use for show partial result of process.
chunk = chunk_queue.get(block=False)
for line in chunk.splitlines(True):
if line.endswith(b'\n'):
if part: ##for treat last line of chunk (normally is a part of line)
line = linepart + line
part = False
M += 1
else:
##if line not contains '\n' is last line of chunk.
##a part of line which is completed in next interation over next chunk
part = True
linepart = line
except Empty:
# QUEUE EMPTY
print(CURSOR_UP_ONE + ERASE_LINE + CURSOR_UP_ONE)
print(CURSOR_UP_ONE + ERASE_LINE + CURSOR_UP_ONE)
print('Downloading records ...')
if M>0:
print('Partial result: Lines: %d ' % M) #M-1 because M contains header
if (event.is_set()): #'THE END: no elements in queue and download finished (even is set)'
print(CURSOR_UP_ONE + ERASE_LINE+ CURSOR_UP_ONE)
print(CURSOR_UP_ONE + ERASE_LINE+ CURSOR_UP_ONE)
print(CURSOR_UP_ONE + ERASE_LINE+ CURSOR_UP_ONE)
print('The consumer has waited %s times' % str(contador))
print('RECORDS = ', M)
break
contador += 1
time.sleep(1) #(give some time for loading more records)
def main():
chunk_queue = Queue()
event = Event()
args = parse_args()
url = args.url
p1 = Thread(target=download_task, args=(url,chunk_queue,event,))
p1.start()
p2 = Thread(target=count_task, args=(chunk_queue,event,))
p2.start()
p1.join()
p2.join()
# The user of this module can customized one parameter:
# + URL where the remote file can be found.
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument('-u', '--url', default=FILE_URL,
help='remote-csv-file URL')
return parser.parse_args()
if __name__ == '__main__':
main()
Usage
$ python ftp-data.py -u <ftp-file>
Example:
python ftp-data-ol.py -u 'http://cdiac.ornl.gov/ftp/ndp030/CSV-FILES/nation.1751_2010.csv'
The consumer has waited 0 times
RECORDS = 16327
Csv version on Github: https://github.com/AALVAREZG/csv-data-onthefly

Run server alongside infinite loop in Python

I have the following code:
#!/usr/bin/python
import StringIO
import subprocess
import os
import time
from datetime import datetime
from PIL import Image
# Original code written by brainflakes and modified to exit
# image scanning for loop as soon as the sensitivity value is exceeded.
# this can speed taking of larger photo if motion detected early in scan
# Motion detection settings:
# need future changes to read values dynamically via command line parameter or xml file
# --------------------------
# Threshold - (how much a pixel has to change by to be marked as "changed")
# Sensitivity - (how many changed pixels before capturing an image) needs to be higher if noisy view
# ForceCapture - (whether to force an image to be captured every forceCaptureTime seconds)
# filepath - location of folder to save photos
# filenamePrefix - string that prefixes the file name for easier identification of files.
threshold = 10
sensitivity = 180
forceCapture = True
forceCaptureTime = 60 * 60 # Once an hour
filepath = "/home/pi/camera"
filenamePrefix = "capture"
# File photo size settings
saveWidth = 1280
saveHeight = 960
diskSpaceToReserve = 40 * 1024 * 1024 # Keep 40 mb free on disk
# Capture a small test image (for motion detection)
def captureTestImage():
command = "raspistill -w %s -h %s -t 500 -e bmp -o -" % (100, 75)
imageData = StringIO.StringIO()
imageData.write(subprocess.check_output(command, shell=True))
imageData.seek(0)
im = Image.open(imageData)
buffer = im.load()
imageData.close()
return im, buffer
# Save a full size image to disk
def saveImage(width, height, diskSpaceToReserve):
keepDiskSpaceFree(diskSpaceToReserve)
time = datetime.now()
filename = filepath + "/" + filenamePrefix + "-%04d%02d%02d-%02d%02d%02d.jpg" % ( time.year, time.month, time.day, time.hour, time.minute, time.second)
subprocess.call("raspistill -w 1296 -h 972 -t 1000 -e jpg -q 15 -o %s" % filename, shell=True)
print "Captured %s" % filename
# Keep free space above given level
def keepDiskSpaceFree(bytesToReserve):
if (getFreeSpace() < bytesToReserve):
for filename in sorted(os.listdir(filepath + "/")):
if filename.startswith(filenamePrefix) and filename.endswith(".jpg"):
os.remove(filepath + "/" + filename)
print "Deleted %s to avoid filling disk" % filename
if (getFreeSpace() > bytesToReserve):
return
# Get available disk space
def getFreeSpace():
st = os.statvfs(".")
du = st.f_bavail * st.f_frsize
return du
# Get first image
image1, buffer1 = captureTestImage()
# Reset last capture time
lastCapture = time.time()
# added this to give visual feedback of camera motion capture activity. Can be removed as required
os.system('clear')
print " Motion Detection Started"
print " ------------------------"
print "Pixel Threshold (How much) = " + str(threshold)
print "Sensitivity (changed Pixels) = " + str(sensitivity)
print "File Path for Image Save = " + filepath
print "---------- Motion Capture File Activity --------------"
while (True):
# Get comparison image
image2, buffer2 = captureTestImage()
# Count changed pixels
changedPixels = 0
for x in xrange(0, 100):
# Scan one line of image then check sensitivity for movement
for y in xrange(0, 75):
# Just check green channel as it's the highest quality channel
pixdiff = abs(buffer1[x,y][1] - buffer2[x,y][1])
if pixdiff > threshold:
changedPixels += 1
# Changed logic - If movement sensitivity exceeded then
# Save image and Exit before full image scan complete
if changedPixels > sensitivity:
lastCapture = time.time()
saveImage(saveWidth, saveHeight, diskSpaceToReserve)
break
continue
# Check force capture
if forceCapture:
if time.time() - lastCapture > forceCaptureTime:
changedPixels = sensitivity + 1
# Swap comparison buffers
image1 = image2
buffer1 = buffer2
This code takes a picture once movement is detected, and keeps doing so until I manually stop it. (I should mention that the code is for use with the Raspberry Pi computer)
I also have the following code courtesy of Nathan Jhaveri here on Stackoverflow:
import SocketServer
from BaseHTTPServer import BaseHTTPRequestHandler
def some_function():
print "some_function got called"
class MyHandler(BaseHTTPRequestHandler):
def do_GET(self):
if self.path == '/captureImage':
saveImage(saveWidth, saveHeight, diskSpaceToReserve)
self.send_response(200)
httpd = SocketServer.TCPServer(("", 8080), MyHandler)
httpd.serve_forever()
This code runs a simple server that would execute the
saveImage(saveWidth, saveHeight, diskSpaceToReserve)
function when the url /captureImage is visited on the server. I have run into a problem with this though. Since the two pieces of code are both infinite loops, they cannot run side by side. I would assume I need to do some kind of multi-threading, but that is something I have never experimented with in Python before. I would appreciate if anyone could help me get back on track with this.
This isn't a small question. Your best bet is to work through some python threading tutorials such as this one: http://www.tutorialspoint.com/python/python_multithreading.htm (found via google)
Try taking the webserver, and running it on a background thread so that calling "serve_forever()" does not block the main thread's "while True:" loop. Replace the existing call to httpd.serve_forever() with something like:
import threading
class WebThread(threading.Thread):
def run(self):
httpd = SocketServer.TCPServer(("", 8080), MyHandler)
httpd.serve_forever()
WebThread().start()
Make sure that chunk of code runs before the "while (True):" loop, and you should have both the webserver loop and the main thread loop running.
Keep in mind that having multiple threads can get complicated. What happens when two threads access the same resource at the same time? As Velox mentioned in another answer, it is worthwhile to learn more about threading.
I can illustrate a simple example using multi-threading.
from http.server import BaseHTTPRequestHandler, HTTPServer
import concurrent.futures
import logging
import time
hostName = "localhost"
serverPort = 5001
class MyServer(BaseHTTPRequestHandler):
def do_GET(self):
self.send_response(200)
self.send_header("Content-type", "text/html")
self.end_headers()
self.wfile.write(bytes("<html><head><title>python3 http server</title><body>", "utf-8"))
def serverThread():
webServer = HTTPServer((hostName, serverPort), MyServer)
logging.info("Server started http://%s:%s" % (hostName, serverPort))
try:
webServer.serve_forever()
except :
pass
webServer.server_close()
logging.info("Server stopped.")
def logThread():
while True:
time.sleep(2.0)
logging.info('hi from log thread')
if __name__ == "__main__":
logging.basicConfig(level=logging.INFO)
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
# Run Server
executor.submit(serverThread)
# Run A Parallel Thread
executor.submit(logThread)
Here we have two threads : A server and Another parallel Thread which logs a line every 2 seconds.
You have to define code corresponding to each thread in separate functions, and submit it to the concurrent.futures Thread Pool.
Btw, I have not done study of how efficient it is to run a server this way.

In Django, how to call a subprocess with a slow start-up time

Suppose you're running Django on Linux, and you've got a view, and you want that view to return the data from a subprocess called cmd that operates on a file that the view creates, for example likeso:
def call_subprocess(request):
response = HttpResponse()
with tempfile.NamedTemporaryFile("W") as f:
f.write(request.GET['data']) # i.e. some data
# cmd operates on fname and returns output
p = subprocess.Popen(["cmd", f.name],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
out, err = p.communicate()
response.write(p.out) # would be text/plain...
return response
Now, suppose cmd has a very slow start-up time, but a very fast operating time, and it does not natively have a daemon mode. I would like to improve the response-time of this view.
I would like to make the whole system would run much faster by starting up a number of instances of cmd in a worker-pool, have them wait for input, and having call_process ask one of those worker pool processes handle the data.
This is really 2 parts:
Part 1. A function that calls cmd and cmd waits for input. This could be done with pipes, i.e.
def _run_subcmd():
p = subprocess.Popen(["cmd", fname],
stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()
# write 'out' to a tmp file
o = open("out.txt", "W")
o.write(out)
o.close()
p.close()
exit()
def _run_cmd(data):
f = tempfile.NamedTemporaryFile("W")
pipe = os.mkfifo(f.name)
if os.fork() == 0:
_run_subcmd(fname)
else:
f.write(data)
r = open("out.txt", "r")
out = r.read()
# read 'out' from a tmp file
return out
def call_process(request):
response = HttpResponse()
out = _run_cmd(request.GET['data'])
response.write(out) # would be text/plain...
return response
Part 2. A set of workers running in the background that are waiting on the data. i.e. We want to extend the above so that the subprocess is already running, e.g. when the Django instance initializes, or this call_process is first called, a set of these workers is created
WORKER_COUNT = 6
WORKERS = []
class Worker(object):
def __init__(index):
self.tmp_file = tempfile.NamedTemporaryFile("W") # get a tmp file name
os.mkfifo(self.tmp_file.name)
self.p = subprocess.Popen(["cmd", self.tmp_file],
stdout=subprocess.PIPE, stderr=subprocess.PIPE)
self.index = index
def run(out_filename, data):
WORKERS[self.index] = Null # qua-mutex??
self.tmp_file.write(data)
if (os.fork() == 0): # does the child have access to self.p??
out, err = self.p.communicate()
o = open(out_filename, "w")
o.write(out)
exit()
self.p.close()
self.o.close()
self.tmp_file.close()
WORKERS[self.index] = Worker(index) # replace this one
return out_file
#classmethod
def get_worker() # get the next worker
# ... static, incrementing index
There should be some initialization of workers somewhere, like this:
def init_workers(): # create WORKERS_COUNT workers
for i in xrange(0, WORKERS_COUNT):
tmp_file = tempfile.NamedTemporaryFile()
WORKERS.push(Worker(i))
Now, what I have above becomes something likeso:
def _run_cmd(data):
Worker.get_worker() # this needs to be atomic & lock worker at Worker.index
fifo = open(tempfile.NamedTemporaryFile("r")) # this stores output of cmd
Worker.run(fifo.name, data)
# please ignore the fact that everything will be
# appended to out.txt ... these will be tmp files, too, but named elsewhere.
out = fifo.read()
# read 'out' from a tmp file
return out
def call_process(request):
response = HttpResponse()
out = _run_cmd(request.GET['data'])
response.write(out) # would be text/plain...
return response
Now, the questions:
Will this work? (I've just typed this off the top of my head into StackOverflow, so I'm sure there are problems, but conceptually, will it work)
What are the problems to look for?
Are there better alternatives to this? e.g. Could threads work just as well (it's Debian Lenny Linux)? Are there any libraries that handle parallel process worker-pools like this?
Are there interactions with Django that I ought to be conscious of?
Thanks for reading! I hope you find this as interesting a problem as I do.
Brian
It may seem like i am punting this product as this is the second time i have responded with a recommendation of this.
But it seems like you need a Message Queing service, in particular a distributed message queue.
ere is how it will work:
Your Django App requests CMD
CMD gets added to a queue
CMD gets pushed to several works
It is executed and results returned upstream
Most of this code exists, and you dont have to go about building your own system.
Have a look at Celery which was initially built with Django.
http://www.celeryq.org/
http://robertpogorzelski.com/blog/2009/09/10/rabbitmq-celery-and-django/
Issy already mentioned Celery, but since comments doesn't work well
with code samples, I'll reply as an answer instead.
You should try to use Celery synchronously with the AMQP result store.
You could distribute the actual execution to another process or even another machine. Executing synchronously in celery is easy, e.g.:
>>> from celery.task import Task
>>> from celery.registry import tasks
>>> class MyTask(Task):
...
... def run(self, x, y):
... return x * y
>>> tasks.register(MyTask)
>>> async_result = MyTask.delay(2, 2)
>>> retval = async_result.get() # Now synchronous
>>> retval 4
The AMQP result store makes sending back the result very fast,
but it's only available in the current development version (in code-freeze to become
0.8.0)
How about "daemonizing" the subprocess call using python-daemon or its successor, grizzled.

Categories

Resources