The gRPC object creates extra processes that don't close - python

I developed an application with a gRPC servicer. The flow of the application is:
The gRPC servicer (class DexFxServicer in the code below) has a Transmit method which is called by an external gRPC client.
The Transmit method creates multiple channels and stubs, one per host in hostList.
The application then creates a process pool and launches it.
Each child process calls the gRPC method SendHostListAndGetMetrics on its own stub and receives a response iterator.
This code works: the application handles the Transmit call and receives all the needed results from the process pool. But I noticed that when the external gRPC client calls Transmit multiple times, some of the child processes are never closed, so unclosed processes keep piling up, as htop shows.
When I try to close the gRPC channels with channel.close(), the extra processes are created even more intensively.
Python 2.7.12
grpcio==1.16.1
grpcio-tools==1.16.1
Ubuntu 16.04.6 LTS 4.4.0-143-generic
from concurrent import futures
from time import sleep
import grpc
import sys
import cascade_pb2
import cascade_pb2_grpc
import metrics_pb2
import metrics_pb2_grpc
from multiprocessing import Pool
class DexFxServicer(cascade_pb2_grpc.DexFxServicer):
    def __init__(self, args):
        self.args = args

    def Transmit(self, request, context):
        entrypoint = request.sender.host_address  # entrypoint is a string
        hostList = []  # hostList is a list of strings
        for rec in request.sender.receiver:
            hostList.append(rec.host_address)

        channels = {}
        stubs = {}
        for host in hostList:
            try:
                channels[host] = grpc.insecure_channel('%s:%d' % (host, self.args.cascadePort))
            except Exception as e:
                print(e)
                sys.exit(0)
            else:
                stubs[host] = metrics_pb2_grpc.MetricsStub(channels[host])

        def collect_metrics(host):
            mtrx = []
            hosts = (metrics_pb2.Host(hostname=i) for i in hostList + [entrypoint])
            for i in stubs[host].SendHostListAndGetMetrics(hosts):
                mtrx.append(i.mtrx)
            return mtrx

        pool = Pool(len(hostList))
        results = pool.map(collect_metrics, hostList)
        pool.close()
        pool.terminate()
        pool.join()
        # Return the iterator of the results
I expect code that doesn't leave extra unclosed processes behind. Please suggest what to do in this case.

The problem was solved by updating grpcio to version 1.23.0. See the related gRPC issue.
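For anyone still on an older grpcio: forking a process pool after the gRPC channels have been created is a known source of leftover child processes, which is likely what the upgrade fixed. A minimal sketch of a workaround, reusing the names from the question's code (hostList, collect_metrics, channels), is to switch to a thread pool so nothing forks after channel creation and to clean up explicitly:
# Sketch only (my suggestion, not the poster's fix): threads share the
# already-created channels without fork(), and cleanup is close()/join()
# on the pool followed by closing every channel.
from multiprocessing.pool import ThreadPool

pool = ThreadPool(len(hostList))
try:
    results = pool.map(collect_metrics, hostList)
finally:
    pool.close()
    pool.join()
    for channel in channels.values():
        channel.close()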

Related

Python: how to create a server to supervise a thread pool?

I have a thread pool that handles some tasks concurrently. Now I'd like the tasks (multiply_by_2 here) to print something before exit.
Originally, I created a lock and passed the lock to each worker thread. If a thread wants to print something, it first acquires the lock, prints its message to stdout, then releases the lock.
Now, I want to have a dedicated event-driven server thread handle the printing. If a thread wants to print something, it just sends its message to that server via a Unix domain socket (AF_UNIX). I hope that this way each thread's blocking time can be reduced (no need to wait for the lock) and I don't need to share a lock among the worker threads. The server thread just prints whatever messages it gets from the clients (i.e. the worker threads), in order.
I tried for some time with Python's asyncio module (requiring Python 3.7+) but couldn't figure it out. How should I do it?
A cleaned-up template is:
# Python 3.7+
import asyncio
import multiprocessing.dummy as mp  # Threading wrapped using multiprocessing API.
import os
import socket
import sys
import threading
import time

server_address = './uds_socket'  # UNIX domain socket

def run_multiple_clients_until_complete(input_list):
    pool = mp.Pool(8)
    result_list = pool.map(multiply_by_2, input_list)
    return result_list

def multiply_by_2(n):
    time.sleep(0.2)  # Simulates some blocking call.
    message_str = "client: n = %d" % n
    # TODO send message_str.encode() to server
    return n * 2

# Server's callback when it gets a client connection
# If you want to change it, please do..
def client_connected_cb(
        stream_reader: asyncio.StreamReader,
        stream_writer: asyncio.StreamWriter) -> None:
    # NOTE: StreamReader.read() is a coroutine; an awaitable version is sketched below.
    message_str = stream_reader.read().decode()
    print(message_str)

def create_server_thread():
    pass  # TODO

# Let the server finish handling all connections it got, then
# stop the server and join the thread
def stop_server_and_wait_thread(thread):
    pass  # TODO

def work(input_list):
    thread = create_server_thread()
    result_list = run_multiple_clients_until_complete(input_list)
    stop_server_and_wait_thread(thread)
    return result_list

def main():
    input_list = list(range(20))
    result_list = work(input_list)
    print(result_list)

if __name__ == "__main__":
    sys.exit(main())
Some extra requirements:
Don't make these functions async: run_multiple_clients_until_complete(), multiply_by_2(), main().
It would be nicer to use SOCK_DGRAM (datagram) instead of SOCK_STREAM (stream), but it's not required.
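For reference, here is one way the two TODOs could be filled in. This is only a sketch under the constraints above (Python 3.7+, SOCK_STREAM, and the server_address path from the template), not an authoritative answer:
# Sketch: the server thread runs its own asyncio event loop with a Unix-socket
# server; worker threads send each message over a short-lived blocking socket.
import asyncio
import contextlib
import os
import socket
import threading

server_address = './uds_socket'  # same path as in the template above

def send_to_server(message_bytes):
    # Called from worker threads (e.g. inside multiply_by_2).
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(server_address)
        sock.sendall(message_bytes)

async def handle_client(reader, writer):
    data = await reader.read()  # read until the worker closes its end
    print(data.decode())
    writer.close()

def create_server_thread():
    with contextlib.suppress(FileNotFoundError):
        os.unlink(server_address)  # remove a stale socket file from a previous run
    loop = asyncio.new_event_loop()

    def serve():
        asyncio.set_event_loop(loop)
        loop.run_until_complete(
            asyncio.start_unix_server(handle_client, path=server_address))
        loop.run_forever()

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return loop, thread  # real code should also wait until the socket is bound

def stop_server_and_wait_thread(loop_and_thread):
    # Stopping the loop here is abrupt; a fully graceful shutdown would close
    # the server and wait for pending handlers before stopping the loop.
    loop, thread = loop_and_thread
    loop.call_soon_threadsafe(loop.stop)
    thread.join()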

Python - How to use FastAPI and uvicorn.run without blocking the thread?

I'm looking for a way to use uvicorn.run() with a FastAPI app without uvicorn.run() blocking the thread. I have already tried processes, subprocesses and threads, but nothing worked.
My problem is that I want to start the server from another process that should go on with other tasks after starting the server. Additionally, I have problems closing a server started like this from another process.
Does anyone have an idea how to run uvicorn.run() without blocking and how to stop it from another process?
The approach given by @HadiAlqattan will not work because uvicorn.run expects to be run in the main thread. Errors such as "signal only works in main thread" will be raised.
The correct approach is:
import contextlib
import time
import threading
import uvicorn

class Server(uvicorn.Server):
    def install_signal_handlers(self):
        pass

    @contextlib.contextmanager
    def run_in_thread(self):
        thread = threading.Thread(target=self.run)
        thread.start()
        try:
            while not self.started:
                time.sleep(1e-3)
            yield
        finally:
            self.should_exit = True
            thread.join()

config = uvicorn.Config("example:app", host="127.0.0.1", port=5000, log_level="info")
server = Server(config=config)

with server.run_in_thread():
    # Server is started.
    ...
    # Server will be stopped once code put here is completed
    ...

# Server stopped.
Very handy to run a live test server locally using a pytest fixture:
# conftest.py
import pytest

@pytest.fixture(scope="session")
def server():
    server = ...
    with server.run_in_thread():
        yield
Credits: uvicorn#742 by florimondmanca
This is an alternate version which works and was inspired by Aponace uvicorn#1103. The uvicorn maintainers want more community engagement with this issue, so if you are experiencing it, please join the conversation.
Example conftest.py file.
import pytest
from fastapi.testclient import TestClient
from app.main import app
import multiprocessing
from uvicorn import Config, Server

class UvicornServer(multiprocessing.Process):
    def __init__(self, config: Config):
        super().__init__()
        self.server = Server(config=config)
        self.config = config

    def stop(self):
        self.terminate()

    def run(self, *args, **kwargs):
        self.server.run()

@pytest.fixture(scope="session")
def server():
    config = Config("app.main:app", host="127.0.0.1", port=5000, log_level="debug")
    instance = UvicornServer(config=config)
    instance.start()
    yield instance
    instance.stop()

@pytest.fixture(scope="module")
def mock_app(server):
    client = TestClient(app)
    yield client
Example test_app.py file.
def test_root(mock_app):
    response = mock_app.get("")
    assert response.status_code == 200
When I set reload to False, FastAPI will start a multi-process web service. If it is True, there will only be one process for the web service.
import uvicorn
from fastapi import FastAPI, APIRouter
from multiprocessing import cpu_count
import os

router = APIRouter()
app = FastAPI()

@router.post("/test")
async def detect_img():
    print("pid:{}".format(os.getpid()))
    return os.getpid()

if __name__ == '__main__':
    app.include_router(router)
    print("cpu count:{}".format(cpu_count()))
    workers = 2 * cpu_count() + 1
    print("workers:{}".format(workers))
    reload = False
    #reload = True
    uvicorn.run("__main__:app", host="0.0.0.0", port=8082, reload=reload, workers=workers,
                timeout_keep_alive=5, limit_concurrency=100)
According to the Uvicorn documentation there is no programmatic way to stop the server; officially, you can only stop it by pressing Ctrl+C.
But I have a trick to solve this problem programmatically using the multiprocessing standard lib with these three simple functions:
A run function to run the server.
A start function to start a new process (start the server).
A stop function to join the process (stop the server).
from multiprocessing import Process
import uvicorn

# global process variable
proc = None

def run():
    """
    This function runs the configured uvicorn server.
    """
    uvicorn.run(app=app, host=host, port=port)

def start():
    """
    This function starts a new process (starts the server).
    """
    global proc
    # create a process instance and set the target to the run function.
    # use daemon mode so the process stops whenever the program stops.
    proc = Process(target=run, args=(), daemon=True)
    proc.start()

def stop():
    """
    This function joins (stops) the process (stops the server).
    """
    global proc
    # check if the process is not None
    if proc:
        # join (stop) the process with a timeout set to 0.25 seconds.
        # using the timeout (the optional arg) is important in order to
        # force the server to stop.
        proc.join(0.25)
With the same idea you can:
use the threading standard lib instead of the multiprocessing standard lib.
refactor these functions into a class.
Example of usage:
from time import sleep

if __name__ == "__main__":
    # to start the server, call the start function.
    start()
    # run some code ....
    # to stop the server, call the stop function.
    stop()
You can read more about:
the Uvicorn server.
the multiprocessing standard lib.
the threading standard lib.
Concurrency, to learn more about multiprocessing and threading in Python.

How to create nntplib objects using Multiprocessing

I've been trying for two days to get multiprocessing to work when creating connections to an NNTP server. Goal: make a bunch of connections (like 50) as fast as possible. Making connections in a for loop can be slow (up to 10 seconds), so I want to make them all 'at once' using multiprocessing. After creation, the connections remain open, as 10,000+ requests will be made in a later multiprocessing part that relies on a similar principle.
Some simplified part of the code:
#!/usr/bin/env python3
import sys
import ssl
from nntplib import NNTP_SSL
from multiprocessing import Pool

def MakeCon(i, host, port):
    context = ssl.SSLContext(ssl.PROTOCOL_TLS)
    s = NNTP_SSL(host, port=port, ssl_context=context, readermode=True)
    print('created connection', i)  # print to see progress
    sys.stdout.flush()
    return s

def Main():
    host = 'reader.xsnews.nl'
    port = 563
    num_con = 4
    y = MakeCon(1, host, port).getwelcome()  # request some message from NNTP host to see if it works
    print(y)

    # the actual part that has the issue:
    if __name__ == '__main__':
        cons = range(num_con)
        s = [None] * num_con
        pool = Pool()
        for con in cons:
            s[con] = pool.apply_async(MakeCon, args=(con, host, port))
        pool.close
        print(s[1])
        for con in cons:
            t = s[con].getwelcome()  # request some message from NNTP host to see if it works
            print(t)
        print('end')

Main()
This shows that the connection to the NNTP server works, but I fail at the part where I extract the connections into objects I can use with the nntplib methods. I'm not that experienced with Python, especially not with multiprocessing.
There are a few different issues with your approach. The biggest is that it won't work to create the connections in different processes and then send them to the main process. This is because each connection opens a socket, and sockets are not serializable (picklable), so they cannot be sent between processes.
And even if that had worked, the usage of .apply_async() is not the right way to go. It is better to use .map(), which returns the output of the function call directly (as opposed to .apply_async(), which returns an object from which the return value has to be extracted).
However, in the current situation the program is I/O bound rather than CPU bound, and in such situations threading works just as well as multiprocessing, since the GIL won't hold back execution. Thus, changing from multiprocessing to threads and from .apply_async() to .map() gives the following solution:
#!/usr/bin/env python3
import sys
import ssl
from nntplib import NNTP_SSL
from multiprocessing.pool import ThreadPool

def MakeCon(i, host, port):
    context = ssl.SSLContext(ssl.PROTOCOL_TLS)
    s = NNTP_SSL(host, port=port, ssl_context=context, readermode=True)
    print('created connection', i)  # print to see progress
    sys.stdout.flush()
    return s

def Main():
    host = 'reader.xsnews.nl'
    port = 563
    num_con = 4
    y = MakeCon(1, host, port).getwelcome()  # request some message from NNTP host to see if it works
    print(y)

    cons = range(num_con)
    pool = ThreadPool()
    s = pool.map(lambda con: MakeCon(con, host, port), cons)
    pool.close()
    return s

if __name__ == "__main__":
    Main()
A small word of advice, though: be careful with creating too many connections, since the server may not look kindly upon it; you are draining its resources.
Also, if you use the different connections to fetch articles, those calls should probably be done in different threads as well (see the sketch below).
As a final comment, you can get the same effect as threads by using asyncio. That, however, is something you probably need to study a while before you feel comfortable using it.
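To illustrate the second point, here is a sketch (mine, not part of the answer above) of reusing the already-open connections to fetch articles in parallel. The message ids are hypothetical, and each connection is kept to a single thread because NNTP connections are not safe to share between threads:
# Sketch: give each open connection its own slice of (hypothetical) message
# ids and fetch the bodies with one thread per connection (Python 3.3+).
from multiprocessing.pool import ThreadPool

def fetch_chunk(conn, ids):
    out = []
    for message_id in ids:
        resp, info = conn.body(message_id)  # nntplib returns (response, ArticleInfo)
        out.append(info.lines)
    return out

def fetch_all(connections, message_ids):
    n = len(connections)
    chunks = [message_ids[i::n] for i in range(n)]
    pool = ThreadPool(n)
    try:
        return pool.starmap(fetch_chunk, zip(connections, chunks))
    finally:
        pool.close()
        pool.join()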

Worker connecting to server but executing on client with multiprocessing package (python 2.7)

First post here, hello everyone.
I have a problem with the multiprocessing package with python 2.7.
I wish to have some processes run in parallel on a server; they do connect but they are executed locally instead.
This is the code I use on the server (Ubuntu 14.04):
from multiprocessing import Process
from multiprocessing.managers import BaseManager
from multiprocessing import cpu_count

class MyManager(BaseManager):
    pass

def server():
    mgr = MyManager(address=("", 2288), authkey="12345")
    mgr.get_server().serve_forever()

if __name__ == "__main__":
    print "number of cpus/cores:", cpu_count()
    server = Process(target=server)
    server.start()
    print "server started"
    server.join()
    server.terminate()
while this is the code that runs on the client (Mac OS 10.11):
from multiprocessing import Manager
from multiprocessing import Process
from multiprocessing import current_process
from multiprocessing.managers import BaseManager
from math import sqrt

class MyManager(BaseManager):
    pass

def worker(address, port, authkey):
    mgr = MyManager(address=(address, port), authkey=authkey)
    try:
        mgr.connect()
        print "- {} connected to {}:{}".format(current_process().name, address, port)
    except:
        print "- {} could not connect to server ({}:{})".format(current_process().name, address, port)
    current_process().authkey = authkey
    for k in range(1000000000):
        sqrt(k * k)

if __name__ == "__main__":
    # create processes
    p = [Process(target=worker, args=("xx.xx.xx.xx", 2288, "12345")) for _ in range(4)]
    # start processes
    for each in p:
        each.start()
    # join the processes
    for each in p:
        each.join()
The for loop
    for k in range(1000000000):
        sqrt(k * k)
inside the worker function is just there to make the workers do a lot of work, so I can monitor their activity in Activity Monitor or with top.
The problem is that the processes connect (in fact, if I put in a wrong address they do not), but they are executed on the local machine: I see the server CPUs staying idle while the local CPUs all go toward 100%.
Am I getting something wrong?
You are starting your Processes locally on your client: p and for each in p: each.start() are executed on your client, which is where the workers run.
While each Process "connects" to the Manager via mgr.connect(), it never interacts with it. The local Processes don't magically transfer to your server just because you opened a connection. Furthermore, a Manager isn't meant to run workers; it is meant to share data.
You'd have to start workers on the server and then send work to them, for example through a shared queue (see the sketch below).
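A minimal sketch of that pattern (mine, not part of the answer; the worker count and the amount of work are made up, Python 2.7 to match the question): the server registers a shared queue and runs its own worker processes, while the client only puts work onto the queue.
# server.py -- run on the server machine
from multiprocessing import Process, Queue
from multiprocessing.managers import BaseManager
from math import sqrt

task_queue = Queue()

class QueueManager(BaseManager):
    pass

def worker(tasks):
    while True:
        k = tasks.get()      # blocks until a client sends work
        sqrt(k * k)          # the heavy work now runs on the server

if __name__ == "__main__":
    for _ in range(4):
        Process(target=worker, args=(task_queue,)).start()
    QueueManager.register("get_tasks", callable=lambda: task_queue)
    mgr = QueueManager(address=("", 2288), authkey="12345")
    mgr.get_server().serve_forever()

# client.py -- run on the client machine
from multiprocessing.managers import BaseManager

class QueueManager(BaseManager):
    pass

if __name__ == "__main__":
    QueueManager.register("get_tasks")
    mgr = QueueManager(address=("xx.xx.xx.xx", 2288), authkey="12345")
    mgr.connect()
    tasks = mgr.get_tasks()
    for k in range(1000000):
        tasks.put(k)         # only the work items travel over the network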

Python Tornado - How to create background process?

Is it possible to have Python Tornado run a long background process while it concurrently serves all of its handlers?
I have a Tornado web app that serves some web pages. But I also have a message queue, and I want Tornado to poll the message queue as a subscriber. Can this be done in Tornado?
I've searched the user guide, and there seems to be something called PeriodicCallback that I can use within the IOLoop. It sounds like I can use a callback function that reads the message queue. However, is there a way to create a coroutine that never stops?
Any help is appreciated, thanks!
To read from ZeroMQ:
Install the ZeroMQ Python library (pyzmq).
Install the IOLoop before application.listen().
Use an executor (for Python 2, you can install the backported concurrent.futures library) to run a message queue listener, which sets Tornado up to listen to the message queue and then uses callbacks when it receives data.
Example (main.py):
# Import tornado libraries
import tornado.ioloop
import tornado.web
# Import URL mappings
from url import application
# Import zeroMQ libraries
from zmq.eventloop import ioloop
# Import zeroMQ.py functions
from zeroMQ import startListenToMessageQueue
# Import zeroMQ settings
import zeroMQ_settings
# Import our executor
import executors
# Import our db_settings
import db_settings

# main.py is the main access point of the tornado app, to run the application, just run "python main.py"
# What this will do is listen to port 8888, and then we can access the app using
# http://localhost:8888 on any browser, or using python requests library

if __name__ == "__main__":
    # Install PyZMQ's IOLoop
    ioloop.install()
    # Set the application to listen to port 8888
    application.listen(8888)
    # Get the current IOLoop
    currentIOLoop = tornado.ioloop.IOLoop.current()
    # Execute ZeroMQ Subscriber for our topics
    executors.executor.submit(startListenToMessageQueue(zeroMQ_settings.server_subscriber_ports,
                                                        zeroMQ_settings.server_subscriber_IP,
                                                        zeroMQ_settings.server_subscribe_list))
    # Test if the connection to our database is successful before we start the IOLoop
    db_settings.testForDatabase(db_settings.database)
    # Start the IOLoop
    currentIOLoop.start()
Example (zeroMQ.py):
# Import our executor
import executors
# Import zeroMQ libraries
import zmq
from zmq.eventloop import ioloop, zmqstream
# Import db functions to process the message
import db

# zeroMQ.py deals with the communication between a zero message queue

def startListenToMessageQueue(subscribe_ports, subscribe_IP, subscribe_topic):
    # Usage:
    #   This function starts the subscriber for our application that will listen to the
    #   address and ports specified in zeroMQ_settings.py; it will spawn a callback when we
    #   receive anything relevant to our topics.
    # Arguments:
    #   subscribe_ports, subscribe_IP, subscribe_topic (taken from zeroMQ_settings.py)
    # Return:
    #   None
    # Get zmq context
    context = zmq.Context()
    # Get the context socket
    socket_sub = context.socket(zmq.SUB)
    # Connect to multiple subscriber ports
    for ports in subscribe_ports:
        socket_sub.connect("tcp://" + str(subscribe_IP) + ":" + str(ports))
    # Subscribe to our relevant topics
    for topic in subscribe_topic:
        socket_sub.setsockopt(zmq.SUBSCRIBE, topic)
    # Setup ZMQ Stream with our socket
    stream_sub = zmqstream.ZMQStream(socket_sub)
    # When we receive our data, we will process the data by using a callback
    stream_sub.on_recv(processMessage)
    # Print the Information to Console
    print "Connected to publisher with IP:" + \
          str(subscribe_IP) + ", Port" + str(subscribe_ports) + ", Topic:" + str(subscribe_topic)

def processMessage(message):
    # Usage:
    #   This function processes the data using a callback format. on_recv will call this function
    #   and populate the message variable with the data that we received through the message queue
    # Arguments:
    #   message: a string containing the data that we received from the message queue
    # Return:
    #   None
    # Process the message with an executor, and use the addData function in our db to process the message
    executors.executor.submit(db.addData, message)
Example (executors.py):
# Import futures library
from concurrent import futures

# executors.py will create our threadpools, and these can be shared around different python files,
# which will not re-create 10 threadpools every time we call it.
# We can use a handful of executors for running synchronous tasks.

# Create a 10 thread threadpool that we can use to call any synchronous/blocking functions
executor = futures.ThreadPoolExecutor(10)
Example (zeroMQ_settings.py):
# zeroMQ_settings.py keep the settings for zeroMQ, for example port, IP, and topics that
# we need to subscribe
# Set the subscriber ports to 5556 and 5558
server_subscriber_ports = ["5556", "5558"]
# Set IP to localhost
server_subscriber_IP = "localhost"
# Set Message to Subscribe: metrics.dat
server_subscriber_topic_metrics = "metrics.dat"
# Set Message to Subscribe: test-010
server_subscribe_topics_test_010 = "test-010"
# List of Subscriptions
server_subscribe_list = [server_subscriber_topic_metrics, server_subscribe_topics_test_010]
Extra thanks to @dano
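As for the PeriodicCallback and never-ending coroutine ideas mentioned in the question, a minimal sketch of both (my addition, assuming Tornado 4.1+, not part of the answer above) looks like this:
# Sketch: poll a message queue either on a fixed interval (PeriodicCallback)
# or in a coroutine that loops forever on the IOLoop.
import tornado.gen
import tornado.ioloop

def poll_queue():
    # Called every 500 ms on the IOLoop; check the message queue here.
    pass

@tornado.gen.coroutine
def poll_forever():
    while True:
        # Check the message queue, then yield control back to the IOLoop.
        yield tornado.gen.sleep(0.5)

if __name__ == "__main__":
    io_loop = tornado.ioloop.IOLoop.current()
    tornado.ioloop.PeriodicCallback(poll_queue, 500).start()
    io_loop.spawn_callback(poll_forever)
    io_loop.start()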
