How to run Locust with multiprocessing on a single machine - python

I want Locust to use all cores on my PC.
I have many Locust classes and I want to use Locust as a library.
Example of my code:
import gevent
from locust.env import Environment
from locust.stats import stats_printer
from locust.log import setup_logging
import time
from locust import HttpUser, TaskSet, task, between
def index(l):
    l.client.get("/")

def stats(l):
    l.client.get("/stats/requests")

class UserTasks(TaskSet):
    # one can specify tasks like this
    tasks = [index, stats]

    # but it might be convenient to use the @task decorator
    @task
    def page404(self):
        self.client.get("/does_not_exist")

class WebsiteUser(HttpUser):
    """
    User class that does requests to the locust web server running on localhost
    """
    host = "http://127.0.0.1:8089"
    wait_time = between(2, 5)
    tasks = [UserTasks]
def worker():
    env2 = Environment(user_classes=[WebsiteUser])
    env2.create_worker_runner(master_host="127.0.0.1", master_port=50013)
    # env2.runner.start(10, hatch_rate=1)
    env2.runner.greenlet.join()

def master():
    env1 = Environment(user_classes=[WebsiteUser])
    env1.create_master_runner(master_bind_host="127.0.0.1", master_bind_port=50013)
    env1.create_web_ui("127.0.0.1", 8089)
    env1.runner.start(20, hatch_rate=4)
    env1.runner.greenlet.join()

import multiprocessing
from multiprocessing import Process
import time

procs = []

proc = Process(target=master)
procs.append(proc)
proc.start()

time.sleep(5)

for i in range(multiprocessing.cpu_count()):
    proc = Process(target=worker)  # instantiating without any argument
    procs.append(proc)
    proc.start()

for process in procs:
    process.join()
This code doesn't work correctly.
(env) ➜ test_locust python main3.py
You are running in distributed mode but have no worker servers connected. Please connect workers prior to swarming.
Traceback (most recent call last):
File "src/gevent/greenlet.py", line 854, in gevent._gevent_cgreenlet.Greenlet.run
File "/home/alex/projects/performance/env/lib/python3.6/site-packages/locust/runners.py", line 532, in client_listener
client_id, msg = self.server.recv_from_client()
File "/home/alex/projects/performance/env/lib/python3.6/site-packages/locust/rpc/zmqrpc.py", line 44, in recv_from_client
msg = Message.unserialize(data[1])
File "/home/alex/projects/performance/env/lib/python3.6/site-packages/locust/rpc/protocol.py", line 18, in unserialize
msg = cls(*msgpack.loads(data, raw=False, strict_map_key=False))
File "msgpack/_unpacker.pyx", line 161, in msgpack._unpacker.unpackb
TypeError: unpackb() got an unexpected keyword argument 'strict_map_key'
2020-08-13T11:21:10Z <Greenlet at 0x7f8cf300c848: <bound method MasterRunner.client_listener of <locust.runners.MasterRunner object at 0x7f8cf2f531d0>>> failed with TypeError
Unhandled exception in greenlet: <Greenlet at 0x7f8cf300c848: <bound method MasterRunner.client_listener of <locust.runners.MasterRunner object at 0x7f8cf2f531d0>>>
Traceback (most recent call last):
File "src/gevent/greenlet.py", line 854, in gevent._gevent_cgreenlet.Greenlet.run
File "/home/alex/projects/performance/env/lib/python3.6/site-packages/locust/runners.py", line 532, in client_listener
client_id, msg = self.server.recv_from_client()
File "/home/alex/projects/performance/env/lib/python3.6/site-packages/locust/rpc/zmqrpc.py", line 44, in recv_from_client
msg = Message.unserialize(data[1])
File "/home/alex/projects/performance/env/lib/python3.6/site-packages/locust/rpc/protocol.py", line 18, in unserialize
msg = cls(*msgpack.loads(data, raw=False, strict_map_key=False))
File "msgpack/_unpacker.pyx", line 161, in msgpack._unpacker.unpackb
TypeError: unpackb() got an unexpected keyword argument 'strict_map_key'
ACTUAL RESULT: workers do not connect to the master and run users without a master
EXPECTED RESULT: workers run only with the master.
What is wrong?

You cannot use multiprocessing together with Locust/gevent (or at least it is known to cause issues).
Please spawn separate processes using subprocess or something completely external to locust. Perhaps you could modify locust-swarm (https://github.com/SvenskaSpel/locust-swarm) to make it able to run worker processes on the same machine.
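For what it's worth, a minimal sketch of that approach (not from the original answer): the master and the workers are started as separate OS processes via subprocess, using the Locust 1.x command-line flags; the locustfile.py path and the 127.0.0.1 master host are assumptions.
import multiprocessing
import subprocess
import sys

# Assumed: a locustfile.py defining WebsiteUser exists in the current directory.
worker_procs = [
    subprocess.Popen([
        sys.executable, "-m", "locust",
        "-f", "locustfile.py",
        "--worker", "--master-host", "127.0.0.1",
    ])
    for _ in range(multiprocessing.cpu_count())
]

# Run the master in the foreground; it blocks until the run finishes.
subprocess.run([
    sys.executable, "-m", "locust",
    "-f", "locustfile.py",
    "--master", "--expect-workers", str(multiprocessing.cpu_count()),
])

for p in worker_procs:
    p.terminate()
Because the workers connect to the master over ZMQ, it does not matter that they are started before the master process is up; they will connect once it is listening.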

I faced the same issue today, and since I didn't find a better option I added something like the following:
import subprocess
import sys

import configargparse
from locust import events

@events.init_command_line_parser.add_listener
def add_processes_arguments(parser: configargparse.ArgumentParser):
    processes = parser.add_argument_group("start multiple worker processes")
    processes.add_argument(
        "--processes",
        "-p",
        action="store_true",
        help="start worker processes",
        env_var="LOCUST_PROCESSES",
        default=False,
    )

@events.init.add_listener
def on_locust_init(environment, **kwargs):  # pylint: disable=unused-argument
    if (
        environment.parsed_options.processes
        and environment.parsed_options.master
        and environment.parsed_options.expect_workers
    ):
        environment.worker_processes = []
        master_args = [*sys.argv]
        worker_args = [sys.argv[0]]
        if "-f" in master_args:
            i = master_args.index("-f")
            worker_args += [master_args.pop(i), master_args.pop(i)]
        if "--locustfile" in master_args:
            i = master_args.index("--locustfile")
            worker_args += [master_args.pop(i), master_args.pop(i)]
        worker_args += ["--worker"]
        for _ in range(environment.parsed_options.expect_workers):
            p = subprocess.Popen(  # pylint: disable=consider-using-with
                worker_args, start_new_session=True
            )
            environment.worker_processes.append(p)
You can see the rest of the code here:
https://github.com/fruch/hydra-locust/blob/master/common.py#L27
and run it from the command line like this:
locust -f locustfile.py --host 172.17.0.2 --headless --users 1000 -t 1m -r 100 --master --expect-workers 2 --csv=example --processes

Related

python luigi : requires() can not return Target objects

I'm really new to Luigi and I would like to set up Luigi to execute my API calls.
I'm working with MockFiles since the JSON objects that I retrieve through the API are light and I want to avoid using an external database.
This is my code:
import luigi
from luigi import Task, run as runLuigi, mock as LuigiMock
import yaml

class getAllCountries(Task):
    task_complete = False

    def requires(self):
        return LuigiMock.MockFile("allCountries")

    def run(self):
        sync = Sync()
        # Get list of all countries
        countries = sync.getAllCountries()
        if(countries is None or len(countries) == 0):
            Logger.error("Sync terminated. The country array is null")
        object_to_send = yaml.dump(countries)
        _out = self.output().open('r')
        _out.write(object_to_send)
        _out.close()
        task_complete = True

    def complete(self):
        return self.task_complete

class getActiveCountries(Task):
    task_complete = False

    def requires(self):
        return getAllCountries()

    def run(self):
        _in = self.input().read('r')
        serialised = _in.read()
        countries = yaml.load(serialised)
        doSync = DoSync()
        activeCountries = doSync.getActiveCountries(countries)
        if(activeCountries is None or len(activeCountries) == 0):
            Logger.error("Sync terminated. The active country account array is null")
        task_complete = True

    def complete(self):
        return self.task_complete

if __name__ == "__main__":
    runLuigi()
I'm running the project with the following command:
PYTHONPATH='.' luigi --module app getActiveCountries --workers 2 --local-scheduler
And it fails; this is the stack trace that I got:
DEBUG: Checking if getActiveCountries() is complete
DEBUG: Checking if getAllCountries() is complete
INFO: Informed scheduler that task getActiveCountries__99914b932b has status PENDING
ERROR: Luigi unexpected framework error while scheduling getActiveCountries()
Traceback (most recent call last):
File "/Users/thibaultlr/anaconda3/envs/testThib/lib/python3.6/site-packages/luigi/worker.py", line 763, in add
for next in self._add(item, is_complete):
File "/Users/thibaultlr/anaconda3/envs/testThib/lib/python3.6/site-packages/luigi/worker.py", line 861, in _add
self._validate_dependency(d)
File "/Users/thibaultlr/anaconda3/envs/testThib/lib/python3.6/site-packages/luigi/worker.py", line 886, in _validate_dependency
raise Exception('requires() can not return Target objects. Wrap it in an ExternalTask class')
Exception: requires() can not return Target objects. Wrap it in an ExternalTask class
INFO: Worker Worker(salt=797067816, workers=2, host=xxx, pid=85795) was stopped. Shutting down Keep-Alive thread
ERROR: Uncaught exception in luigi
Traceback (most recent call last):
File "/Users/thibaultlr/anaconda3/envs/testThib/lib/python3.6/site-packages/luigi/retcodes.py", line 75, in run_with_retcodes
worker = luigi.interface._run(argv).worker
File "/Users/thibaultlr/anaconda3/envs/testThib/lib/python3.6/site-packages/luigi/interface.py", line 211, in _run
return _schedule_and_run([cp.get_task_obj()], worker_scheduler_factory)
File "/Users/thibaultlr/anaconda3/envs/testThib/lib/python3.6/site-packages/luigi/interface.py", line 171, in _schedule_and_run
success &= worker.add(t, env_params.parallel_scheduling, env_params.parallel_scheduling_processes)
File "/Users/thibaultlr/anaconda3/envs/testThib/lib/python3.6/site-packages/luigi/worker.py", line 763, in add
for next in self._add(item, is_complete):
File "/Users/thibaultlr/anaconda3/envs/testThib/lib/python3.6/site-packages/luigi/worker.py", line 861, in _add
self._validate_dependency(d)
File "/Users/thibaultlr/anaconda3/envs/testThib/lib/python3.6/site-packages/luigi/worker.py", line 886, in _validate_dependency
raise Exception('requires() can not return Target objects. Wrap it in an ExternalTask class')
Exception: requires() can not return Target objects. Wrap it in an ExternalTask class
Also, I'm running luigid in the background and I do not see any tasks running on it, nor whether they failed or not.
Any ideas?
Firstly, you are not seeing anything happen within the luigi daemon because your command specifies --local-scheduler. This disregards the daemon entirely and just runs the scheduler in the local process.
Second, in the getAllCountries task you are specifying a Target as a requirement, when it should be in your output function. Once you've changed it from:
def requires(self):
    return LuigiMock.MockFile("allCountries")
to:
def output(self):
    return LuigiMock.MockFile("allCountries")
you won't need to redefine the complete function or set task_complete to True, because luigi will determine the task is complete by the presence of the output file. To find out more about targets take a look at: https://luigi.readthedocs.io/en/stable/workflows.html#target
Side note: You can make this section:
_out = self.output().open('r')
_out.write(object_to_send)
_out.close()
a lot easier and less prone to bugs by just using Python's with statement:
with self.output().open('r') as _out:
    _out.write(object_to_send)
Python will automatically close the file when exiting the with scope, including on error.
Second side note: Don't use luigi's run. It is deprecated. Use luigi.build instead.
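For illustration, a minimal sketch of that luigi.build variant (the worker count and local scheduler mirror the original command line):
import luigi

if __name__ == "__main__":
    # getActiveCountries is the task class defined above.
    luigi.build([getActiveCountries()], workers=2, local_scheduler=True)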

RobotFramework with Python's asyncio

I'm trying to run RobotFramework with Python3.6's asyncio.
The relevant Python code looks as follows:
""" SampleProtTest.py """
import asyncio
import threading
class SubscriberClientProtocol(asyncio.Protocol):
"""
Generic, Asynchronous protocol that allows sending using a synchronous accessible queue
Based on http://stackoverflow.com/a/30940625/4150378
"""
def __init__(self, loop):
self.loop = loop
""" Functions follow for reading... """
class PropHost:
def __init__(self, ip: str, port: int = 50505) -> None:
self.loop = asyncio.get_event_loop()
self.__coro = self.loop.create_connection(lambda: SubscriberClientProtocol(self.loop), ip, port)
_, self.__proto = self.loop.run_until_complete(self.__coro)
# run the asyncio-loop in background thread
threading.Thread(target=self.runfunc).start()
def runfunc(self) -> None:
self.loop.run_forever()
def dosomething(self):
print("I'm doing something")
class SampleProtTest(object):
def __init__(self, ip='127.0.0.1', port=8000):
self._myhost = PropHost(ip, port)
def do_something(self):
self._myhost.dosomething()
if __name__=="__main__":
tester = SampleProtTest()
tester.do_something()
If I run this file in python, it prints, as expected:
I'm doing something
To run the code in Robot-Framework, I wrote the following .robot file:
*** Settings ***
Documentation     Just A Sample
Library           SampleProtTest.py

*** Test Cases ***
Do anything
    do_something
But if I run this .robot-file, I get the following error:
Initializing test library 'SampleProtTest' with no arguments failed: This event loop is already running
Traceback (most recent call last):
File "SampleProtTest.py", line 34, in __init__
self._myhost = PropHost(ip, port)
File "SampleProtTest.py", line 21, in __init__
_, self.__proto = self.loop.run_until_complete(self.__coro)
File "appdata\local\programs\python\python36\lib\asyncio\base_events.py", line 454, in run_until_complete
self.run_forever()
File "appdata\local\programs\python\python36\lib\asyncio\base_events.py", line 408, in run_forever
raise RuntimeError('This event loop is already running')
Can someone explain to me why or how I can get around this?
Thank you very much!
EDIT
Thanks to @Dandekar I added some debug outputs (see code above) and get the following output from robot:
- Loop until complete...
- Starting Thread...
- Running in thread...
==============================================================================
Sample :: Just A Sample
==============================================================================
Do anything - Loop until complete...
| FAIL |
Initializing test library 'SampleProtTest' with no arguments failed: This event loop is already running
Traceback (most recent call last):
File "C:\share\TestAutomation\SampleProtTest.py", line 42, in __init__
self._myhost = PropHost(ip, port)
File "C:\share\TestAutomation\SampleProtTest.py", line 24, in __init__
_, self.__proto = self.loop.run_until_complete(self.__coro)
File "c:\users\muechr\appdata\local\programs\python\python36\lib\asyncio\base_events.py", line 454, in run_until_complete
self.run_forever()
File "c:\users\muechr\appdata\local\programs\python\python36\lib\asyncio\base_events.py", line 408, in run_forever
raise RuntimeError('This event loop is already running')
------------------------------------------------------------------------------
Sample :: Just A Sample | FAIL |
1 critical test, 0 passed, 1 failed
1 test total, 0 passed, 1 failed
==============================================================================
Output: C:\share\TestAutomation\results\output.xml
Log: C:\share\TestAutomation\results\log.html
Report: C:\share\TestAutomation\results\report.html
As I see it, the problem is that the thread is already started BEFORE the test case. Oddly, if I remove the line
_, self.__proto = self.loop.run_until_complete(self.__coro)
it seems to run through, but I can't explain why. This is not a practical solution anyway, as I'm not able to access __proto like this...
Edit: Comment out the part where your code runs at start
# if __name__ == "__main__":
#     tester = SampleProtTest()
#     tester.do_something()
That piece gets run when you import your script in robot framework (causing the port to be occupied).
Also: If you are simply trying to run keywords asynchronously, there is a library that does that (although I have not tried it myself).
robotframework-async
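Not from the original answer, but for reference: a minimal sketch of how PropHost could avoid run_until_complete altogether by running a dedicated loop in a background thread and scheduling the connection onto it. SubscriberClientProtocol is the class from the question; the 10-second timeout is an arbitrary choice.
import asyncio
import threading

class PropHost:
    def __init__(self, ip: str, port: int = 50505) -> None:
        # Dedicated loop, driven by a background thread instead of run_until_complete().
        self.loop = asyncio.new_event_loop()
        threading.Thread(target=self.loop.run_forever, daemon=True).start()
        # Schedule the connection on the running loop and wait for the result here.
        coro = self.loop.create_connection(
            lambda: SubscriberClientProtocol(self.loop), ip, port)
        _, self.proto = asyncio.run_coroutine_threadsafe(coro, self.loop).result(timeout=10)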

Python3 filling a dictionary concurrently

I want to fill a dictionary in a loop. Iterations in the loop are independent from each other. I want to perform this on a cluster with thousands of processors. Here is a simplified version of what I tried and need to do.
import multiprocessing

class Worker(multiprocessing.Process):
    def setName(self, name):
        self.name = name

    def run(self):
        print('In %s' % self.name)
        return

if __name__ == '__main__':
    jobs = []
    names = dict()
    for i in range(10000):
        p = Worker()
        p.setName(str(i))
        names[str(i)] = i
        jobs.append(p)
        p.start()
    for j in jobs:
        j.join()
I tried this in Python 3 on my own computer and received the following error:
..
In 249
Traceback (most recent call last):
File "test.py", line 16, in <module>
p.start()
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/process.py", line 105, in start
In 250
self._popen = self._Popen(self)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/context.py", line 212, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
return Popen(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/popen_fork.py", line 66, in _launch
parent_r, child_w = os.pipe()
OSError: [Errno 24] Too many open files
Is there any better way to do this?
multiprocessing talks to its subprocesses via pipes. Each subprocess requires two open file descriptors, one for reading and one for writing. If you launch 10000 workers, you'll end up opening 20000 file descriptors, which exceeds the default limit on OS X (which your paths indicate you're using).
You can fix the issue by raising the limit. See https://superuser.com/questions/433746/is-there-a-fix-for-the-too-many-open-files-in-system-error-on-os-x-10-7-1 for details; basically, it amounts to setting two sysctl knobs and upping your shell's ulimit setting.
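(For reference, a hedged sketch, not from the original answer: the soft limit can also be raised from inside the script via the resource module before the workers are spawned; the 25000 figure is an arbitrary choice.)
import resource

# Raise the soft open-file limit towards the hard limit before spawning workers.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
target = 25000 if hard == resource.RLIM_INFINITY else min(25000, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))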
You are spawning 10000 processes at once at the moment. That really isn't a good idea.
The error you see is most definitely raised because the multiprocessing module (seems to) use pipes for inter-process communication, and there is a limit on open pipes/file descriptors.
I suggest using a Python interpreter without a Global Interpreter Lock, like Jython or IronPython, and just replacing the multiprocessing module with the threading one.
If you still want to use the multiprocessing module, you could use a process Pool like this to collect the return values:
from multiprocessing import Pool

def worker(params):
    name, someArg = params
    print('In %s' % name)
    # do something with someArg here
    return (name, someArg)

if __name__ == '__main__':
    jobs = []
    names = dict()
    # Spawn 100 worker processes
    pool = Pool(processes=100)
    # Fill with real data
    task_dict = dict(('name_{}'.format(i), i) for i in range(1000))
    # Process every task via our pool
    results = pool.map(worker, task_dict.items())
    # And convert the result to a dict
    results = dict(results)
    print(results)
This should work with minimal changes for the threading module, too.
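As a concrete illustration of that last point, one way to get a thread-backed version with minimal changes is multiprocessing.dummy, which wraps the threading module behind the same Pool interface (worker and task_dict are as defined above):
from multiprocessing.dummy import Pool  # thread pool with the same interface

pool = Pool(100)
results = dict(pool.map(worker, task_dict.items()))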

Use cx_Oracle and multiprocessing to query data concurrently

All,
I am trying to access and process a large chunk of data from an Oracle database, so I used the multiprocessing module to spawn 50 processes to access the database. To avoid opening 50 physical connections, I tried to use session pooling from cx_Oracle. The code looks like below. However, I always get an unpickling error. I know cx_Oracle has pickling issues, but I thought I could get around that by using a global variable. Could anyone help?
import sys
import cx_Oracle
import os
from multiprocessing import Pool

# Read a list of ids from the input file
def ReadList(inputFile):
    ............

def GetText(applId):
    global sPool
    connection = sPool.acquire()
    cur = connection.cursor()
    cur.prepare('Some Query')
    cur.execute(None, appl_id=applId)
    result = cur.fetchone()
    title = result[0]
    abstract = result[2].read()
    sa = result[3].read()
    cur.close()
    sPool.release(connection)
    return (title, abstract, sa)

if __name__ == '__main__':
    inputFile = sys.argv[1]
    ids = ReadList(inputFile)
    dsn = cx_Oracle.makedsn('xxx', ...)
    sPool = cx_Oracle.SessionPool(....., min=1, max=10, increment=1)
    pool = Pool(10)
    results = pool.map(GetText, ids)
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib/python2.6/threading.py", line 525, in __bootstrap_inner
self.run()
File "/usr/lib/python2.6/threading.py", line 477, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/lib/python2.6/multiprocessing/pool.py", line 282, in _handle_results
task = get()
UnpicklingError: NEWOBJ class argument has NULL tp_new
How are you expecting 50 processes to use the same, intra-process-managed DB connection (pool)?!
First of all, your code results in the error "NameError: global name 'sPool' is not defined", therefore sPool = cx_Oracle.SessionPool(....., min=1, max=10, increment=1) must be placed above def GetText(applId):.
For me, this code started working properly after changing from multiprocessing import Pool to from multiprocessing.dummy import Pool and adding the parameter threaded=True to the call of cx_Oracle.SessionPool, as in sPool = cx_Oracle.SessionPool(....., min=1, max=10, increment=1, threaded=True).
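Putting those two changes together, a hedged sketch of the resulting layout (the connection details are placeholders, and ReadList is the helper from the question):
import sys
import cx_Oracle
from multiprocessing.dummy import Pool  # threads instead of processes, so nothing gets pickled

# The session pool is created at module level, above GetText, with threaded=True.
dsn = cx_Oracle.makedsn('host', 1521, 'SID')  # placeholder connection details
sPool = cx_Oracle.SessionPool('user', 'password', dsn,
                              min=1, max=10, increment=1, threaded=True)

def GetText(applId):
    connection = sPool.acquire()
    try:
        cur = connection.cursor()
        cur.prepare('Some Query')
        cur.execute(None, appl_id=applId)
        result = cur.fetchone()
        return (result[0], result[2].read(), result[3].read())
    finally:
        sPool.release(connection)

if __name__ == '__main__':
    ids = ReadList(sys.argv[1])  # ReadList as defined in the question
    results = Pool(10).map(GetText, ids)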

Listening Twisted TCP connection with python-daemon gives bad file descriptor

I'm trying to create a program which:
- forks at start using multiprocessing
- has the forked process use python-daemon to fork again into the background
- opens a Twisted listening TCP port in the resulting background process
The reason I need to fork the process before launching python-daemon is that I want the starting process to stay alive (by default, python-daemon kills the parent process).
So far my code is:
from twisted.web import xmlrpc, server
from twisted.internet import reactor
from daemon import daemon
import multiprocessing
import os
import logging

class RemotePart(object):
    def setup(self):
        self.commands = CommandPart()
        reactor.listenTCP(9091, server.Site(self.commands))

class CommandPart(xmlrpc.XMLRPC, object):
    def __init__(self):
        super(CommandPart, self).__init__()

    def xmlrpc_stop(self):
        return True

class ServerPart(object):
    def __init__(self):
        self.logger = logging.getLogger("server")
        self.logger.info("ServerPart.__init__()")

    def start_second_daemon(self):
        self.logger.info("start_second_daemon()")
        daemon_context = daemon.DaemonContext(detach_process=True)
        daemon_context.stdout = open(
            name="log.txt",
            mode='w+',
            buffering=0
        )
        daemon_context.stderr = open(
            name="log.txt",
            mode='w+',
            buffering=0
        )
        daemon_context.working_directory = os.getcwd()
        daemon_context.open()
        self.inside_daemon()

    def inside_daemon(self):
        self.logger.setLevel(0)
        self.logger.info("inside daemon")
        self.remote = RemotePart()
        self.remote.setup()
        reactor.run()

class ClientPart(object):
    def __init__(self):
        logging.basicConfig(level=0)
        self.logger = logging.getLogger("client")
        self.logger.info("ClientPart.__init__()")

    def start_daemon(self):
        self.logger.info("start_daemon()")
        start_second_daemon()

    def launch_daemon(self):
        self.logger.info("launch_daemon()")
        server = ServerPart()
        p = multiprocessing.Process(target=server.start_second_daemon())
        p.start()
        p.join()

if __name__ == '__main__':
    client = ClientPart()
    client.launch_daemon()
Starting the process seems to work:
INFO:client:ClientPart.__init__()
INFO:client:launch_daemon()
INFO:server:ServerPart.__init__()
INFO:server:start_second_daemon()
But looking at the log file of the background process, Twisted cannot open the TCP port:
INFO:server:inside daemon
Traceback (most recent call last):
File "forking_test.py", line 74, in <module>
client.launch_daemon()
File "forking_test.py", line 68, in launch_daemon
p = multiprocessing.Process(target=server.start_second_daemon())
File "forking_test.py", line 45, in start_second_daemon
self.inside_daemon()
File "forking_test.py", line 51, in inside_daemon
self.remote.setup()
File "forking_test.py", line 12, in setup
reactor.listenTCP(9091, server.Site(self.commands))
File "/usr/lib/python2.7/site-packages/twisted/internet/posixbase.py", line 482, in listenTCP
p.startListening()
File "/usr/lib/python2.7/site-packages/twisted/internet/tcp.py", line 1004, in startListening
self.startReading()
File "/usr/lib/python2.7/site-packages/twisted/internet/abstract.py", line 429, in startReading
self.reactor.addReader(self)
File "/usr/lib/python2.7/site-packages/twisted/internet/epollreactor.py", line 247, in addReader
EPOLLIN, EPOLLOUT)
File "/usr/lib/python2.7/site-packages/twisted/internet/epollreactor.py", line 233, in _add
self._poller.register(fd, flags)
IOError: [Errno 9] Bad file descriptor
Any idea? It seems python-daemon closes all the file descriptors of the background process when it starts; could this behavior be the cause?
There are lots of reasons why calling fork and then running some arbitrary library code doesn't work. It would be hard to list them all here, but generally it's not a good idea. My guess as to what's specifically happening here is that something within multiprocessing is closing the "waker" file descriptor that lets Twisted communicate with its thread pool, but I can't be completely sure.
If you were to re-write this to:
- use spawnProcess instead of multiprocessing
- use twistd instead of python-daemon
the interactions would be far less surprising, because you'd be using process-spawning and daemonization code specifically designed to work with Twisted, instead of two things with lots of accidental platform interactions (calling fork, serializing things over pipes with pickle, calling setsid and setuid, and changing the controlling terminal and session leader at various times).
(And actually I would recommend integrating with your platform's daemon management tools, like upstart or launchd or systemd or a cross-platform one like runit rather than depending on any daemonization code, including that in twistd, but I would need to know more about your application to know what to recommend.)
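To make the twistd suggestion concrete, a minimal sketch of a .tac file (the file name service.tac is assumed) exposing the question's XML-RPC resource; daemonization is then left to twistd, e.g. twistd -y service.tac:
from twisted.application import internet, service
from twisted.web import server, xmlrpc

class CommandPart(xmlrpc.XMLRPC):
    def xmlrpc_stop(self):
        return True

# twistd looks for a module-level variable named "application".
application = service.Application("commands")
internet.TCPServer(9091, server.Site(CommandPart())).setServiceParent(application)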
