How do you safely launch a Go script from a Django app?
I wrote a Go script which is self-contained. I would like to be able to launch a job from a Django web app (I use Celery to run the job in the background). What would be the proper/safer way of achieving this? Maybe a way to isolate this process?
I feel that running...
os.system(f"./goscript -o {option1} -b {optiom2}")
...is quite unsafe.
As a bonus, I'd like to be able to get the output to see if the script crashes, etc., but that is a bonus question.
Something like this should help, IMHO:
import logging
import shlex
import subprocess

def get_output(command, working_folder=None):
    logging.debug("Executing %s in %s", command, working_folder)
    try:
        # check_output raises CalledProcessError on a non-zero exit code,
        # so a crash of the script surfaces as an exception (the bonus question).
        output = subprocess.check_output(shlex.split(command), cwd=working_folder)
        return output.decode("utf-8")
    except (OSError, subprocess.CalledProcessError):
        logging.error("Command being executed: %s", command)
        raise
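Since the job is launched from Celery anyway, here is a hedged sketch of wiring the helper above into a background task; the task name run_goscript and its module are assumptions, and shlex.quote guards the user-supplied options before they reach the command line:

from celery import shared_task
import shlex

@shared_task
def run_goscript(option1, option2):
    # Quote the user-supplied values so they cannot inject extra arguments,
    # then hand the command to get_output() defined above.
    command = "./goscript -o {} -b {}".format(shlex.quote(option1), shlex.quote(option2))
    return get_output(command)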
Thanks to the people who've answered. I've actually just remembered that a much better solution would be to use an asynchronous messaging queue library. The one I'm familiar with, and which is very easy to adapt, is ZMQ (https://zeromq.org/). It's dead easy to do a server/client setup with the Go script listening as a server and the Django app acting as a client requesting a job.
As a proof of concept, here are snippets from the documentation of the two libraries.
Server in Go
This script is the server, written in Go. I believe it can be set up as a service to run continuously, waiting for Django to send it a job.
// source: https://github.com/pebbe/zmq4/blob/master/examples/hwserver.go
//
// Hello World server.
// Binds REP socket to tcp://*:5555
// Expects "Hello" from client, replies with "World"
//
package main

import (
	zmq "github.com/pebbe/zmq4"

	"fmt"
	"time"
)

func main() {
	// Socket to talk to clients
	responder, _ := zmq.NewSocket(zmq.REP)
	defer responder.Close()
	responder.Bind("tcp://*:5555")

	for {
		// Wait for next request from client
		msg, _ := responder.Recv(0)
		fmt.Println("Received ", msg)

		// Do some 'work', this can take a while
		time.Sleep(time.Second)

		// Send reply back to client
		reply := "World"
		responder.Send(reply, 0)
		fmt.Println("Sent ", reply)
	}
}
Client in Python
Here is the Python code that can easily be called within any HTTP request. It creates the ZMQ context and starts sending work to the Go server.
# source: http://zguide.zeromq.org/py:hwclient
#
# Hello World client in Python
# Connects REQ socket to tcp://localhost:5555
# Sends "Hello" to server, expects "World" back
#
import zmq
context = zmq.Context()
# Socket to talk to server
print("Connecting to hello world server…")
socket = context.socket(zmq.REQ)
socket.connect("tcp://localhost:5555")
# Do 10 requests, waiting each time for a response
for request in range(10):
    print("Sending request %s …" % request)
    socket.send(b"Hello")

    # Get the reply.
    message = socket.recv()
    print("Received reply %s [ %s ]" % (request, message))

# Gracefully closing the sockets
socket.close()
context.term()

# Back to normal Django stuff
What's great with this approach is that the client can dynamically create and shut down the ZMQ context. Furthermore, you don't even have to have the Go script running on the same server; you could communicate with any IP address, provided you take care to at least encrypt the packets, or look at the security features ZMQ provides.
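On the encryption point, here is a minimal sketch of the Python/client side of ZMQ's built-in CURVE security. Key handling is simplified for illustration: in practice the server key pair lives with the Go server (pebbe/zmq4 exposes the matching socket options) and only its public half is distributed to clients.

import zmq

# For the sketch we generate both key pairs in one place; in a real deployment the
# server pair stays with the Go server and only server_public is shared with clients.
server_public, server_secret = zmq.curve_keypair()
client_public, client_secret = zmq.curve_keypair()

context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.curve_secretkey = client_secret
socket.curve_publickey = client_public
socket.curve_serverkey = server_public      # authenticates and encrypts traffic to the server
socket.connect("tcp://localhost:5555")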
--
Note: I know I'm answering my own question, but it's like phoning IT support: the problem gets solved right when they pick up the phone.
I am trying to adapt the ZeroMQ asynchronous client-server pattern described here to Python multiprocessing. There is a brief description in the ZeroMQ guide.
It's DEALER/ROUTER for the client-to-server frontend communication and DEALER/DEALER for the server backend-to-worker communication. The server frontend and backend are connected using a zmq.proxy() instance.
Instead of using threads, I want to use multiprocessing on the server. But requests from the client do not reach the server workers. They do reach the server frontend, and also the backend, but the backend is not able to connect to the server workers.
How do we generally debug these issues in pyzmq? How to turn on verbose logging for the sockets?
The Python code snippets I am using:
server.py
import zmq
import time
from multiprocessing import Process

def run(context, worker_id):
    socket = context.socket(zmq.DEALER)
    socket.connect("ipc://backend.ipc")
    print(f"Worker {worker_id} started")
    try:
        while True:
            ident, msg = socket.recv_multipart()
            print("Worker received %s from %s" % (msg, ident))
            time.sleep(5)
            socket.send_multipart([ident, msg])
            print("Worker sent %s from %s" % (msg, ident))
    except:
        socket.close()

if __name__ == "__main__":
    context = zmq.Context()

    frontend = context.socket(zmq.ROUTER)
    frontend.bind("tcp://*:5570")

    backend = context.socket(zmq.DEALER)
    backend.bind("ipc://backend.ipc")

    N_WORKERS = 7
    jobs = []
    try:
        for worker_id in range(N_WORKERS):
            job = Process(target=run, args=(context, worker_id,))
            jobs.append(job)
            job.start()

        zmq.proxy(frontend, backend)

        for job in jobs:
            job.join()
    except:
        frontend.close()
        backend.close()
        context.term()
client.py
import json
import zmq
from uuid import uuid4

if __name__ == "__main__":
    context = zmq.Context()
    socket = context.socket(zmq.DEALER)

    identity = str(uuid4())
    socket.identity = identity.encode("ascii")
    socket.connect("tcp://localhost:5570")

    poll = zmq.Poller()
    poll.register(socket, zmq.POLLIN)

    request = {
        "body": "Some request body.",
    }
    socket.send_string(json.dumps(request))

    while True:
        for i in range(5):
            sockets = dict(poll.poll(10))
            if socket in sockets:
                msg = socket.recv()
                print(msg)
Q : "How to turn on verbose logging for the sockets?"
Start using the published native API socket_monitor() to get all relevant details, reported as events arriving from the socket-(instance)-under-monitoring.
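In pyzmq that translates to something like the following hedged sketch, attached here to the DEALER socket from client.py above:

from zmq.utils.monitor import recv_monitor_message

monitor = socket.get_monitor_socket()      # a PAIR socket that delivers monitor events
while monitor.poll(1000):                  # wait up to 1 s for the next event
    evt = recv_monitor_message(monitor)    # dict with 'event', 'value', 'endpoint'
    print("monitor event %s on %s" % (evt["event"], evt["endpoint"]))
socket.disable_monitor()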
Q : "How do we generally debug these issues in pyzmq?"
There is no general strategy for doing this. Having entered the domain of distributed computing, you will almost always create your own, project-specific tools for "collecting" and "viewing/interpreting" a time-ordered flow of (principally) distributed events.
Last but not least: avoid trying to share a Context()-instance, much less "among" 8 processes.
The Art of the Zen of Zero strongly advocates avoiding any shape and form of sharing. Here, one and the very same Context()-instance is referenced ("shared") via the multiprocessing.Process process-instantiation call signature, which does not make the inter-process "sharing" work.
One may let each spawned process-instance create its own Context()-instance and use it from inside its private space during its own life-cycle.
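A minimal sketch of that change to the run() worker from server.py above; the Process call then becomes Process(target=run, args=(worker_id,)) because the parent's instance is no longer passed in:

def run(worker_id):
    # Each child process builds its own Context after it has been spawned,
    # instead of receiving the parent's instance through the call signature.
    context = zmq.Context()
    socket = context.socket(zmq.DEALER)
    socket.connect("ipc://backend.ipc")
    print(f"Worker {worker_id} started")
    # ... the receive/send loop stays exactly as in the original run() ...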
Btw, your code ignores the return codes, documented in the native API, that help you handle ( or, in worse cases, debug post-mortem ) what goes on inside the distributed computing. The try: ... except: ... finally: scaffolding also helps a lot here.
Anyway, the sooner you learn to stop using the blocking forms of the { .send() | .recv() | .poll() } methods, the better your code starts to re-use the actual powers of ZeroMQ.
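A hedged sketch of that non-blocking style, applied to the worker loop from server.py above (the 100 ms timeout is an arbitrary choice):

poller = zmq.Poller()
poller.register(socket, zmq.POLLIN)
while True:
    events = dict(poller.poll(timeout=100))                    # returns after at most 100 ms
    if socket in events:
        ident, msg = socket.recv_multipart(flags=zmq.NOBLOCK)  # will not park the worker
        socket.send_multipart([ident, msg], flags=zmq.NOBLOCK)
    # the loop stays responsive here: check shutdown flags, timers, housekeeping, ...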
I have been having some issues with timeouts while sending messages to EventHub.
import sys
import logging
import datetime
import time
import os
from azure.eventhub import EventHubClient, Sender, EventData

logger = logging.getLogger("azure")

ADDRESS = "xxx"
USER = "xxx"
KEY = "xxx"
ENDPOINT = "xxx"

try:
    if not ADDRESS:
        raise ValueError("No EventHubs URL supplied.")

    # Create Event Hubs client
    client = EventHubClient(ADDRESS, username=USER, password=KEY, debug=True)
    sender = client.add_sender(partition="0", send_timeout=300, keep_alive=10)
    client.run()
    try:
        start_time = time.time()
        for i in range(10000):
            print("Sending message: {}".format(i))
            message = "Message {}".format(i)
            sender.send(EventData(message))
    except:
        raise
    finally:
        end_time = time.time()
        client.stop()
        run_time = end_time - start_time
        logger.info("Runtime: {} seconds".format(run_time))
except KeyboardInterrupt:
    pass
My context is as follows: I am able to send messages without problems from my personal development computer, from a virtual machine in Azure, and from on-premises server1, but when trying to send messages from on-premises server2 I receive the error:
azure.eventhub.common.EventHubError: Send failed: Message send failed with result: MessageSendResult.Timeout
I have tried modifying the send_timeout and the keep_alive (even though I don't believe these settings are to blame) but with no success. My personal guess is that there is something on my on-premises server2 that is blocking or interfering with the communication. Firstly, am I changing the timeout value correctly? I have checked the source code of the class here: link, and it seems I am doing it right, but I actually believe that property refers to how long a message may sit in the send queue rather than how long we wait for the response to the event. Secondly, is there a way I can validate that the problem lies in the environment of my on-premises server2, for example by exploring the network path with traceroute or dig? The system is CentOS. Could it be related to recent upgrades in the Python SDK? I just saw this other question showing that my method for uploading events was changed as recently as 01/08/2020; maybe it is something related to those upgrades (I doubt it)?
Anyhow, any clues would be greatly appreciated. For now I will be testing on other servers and checking whether I can move my implementation to the newer version to see if that solves the issue.
It sounds like a networking issue. Try pinging the TCP endpoint of your namespace on port 9354 from server2. If a firewall is blocking the outbound connection to the endpoint, then you either need to fix it or try enabling WebSockets, which can go through 443.
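A hedged way to check that from server2 without the SDK is a plain TCP connect to the ports involved; <namespace>.servicebus.windows.net is a placeholder for your Event Hubs namespace host, and 5671 is the standard AMQP-over-TLS port alongside the 9354 and 443 mentioned above:

import socket

HOST = "<namespace>.servicebus.windows.net"   # placeholder: your Event Hubs namespace

for port in (5671, 9354, 443):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(5)
    try:
        s.connect((HOST, port))
        print("port %d reachable" % port)
    except OSError as exc:
        print("port %d blocked or unreachable: %s" % (port, exc))
    finally:
        s.close()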
I'm working on a relatively simple Python / ZeroMQ based work distribution system, using REQ/ROUTER sockets. The system is distributed and worker nodes are geographically distributed on different continents.
The ROUTER, responsible for distributing work, .bind()-s a ROUTER socket. Workers .connect() to it over TCP using a REQ socket.
In the process of setting up a new worker node, I've noticed that while smaller messages (up to 1 kB) make the trip with no issues, replies of ~2 kB and up sent by the ROUTER end are never received by the worker's REQ socket: when I call recv(), the socket just hangs.
The worker code runs inside Docker containers, and I was able to work around the issue when running the same image with --net=host - it seems to not happen if Docker is using the host network.
I'm wondering if this is something in the network stack configuration on the host machine or in Docker, or maybe something that can be prevented in my code?
Here is a simplified version of my code that reproduces this issue:
Worker
import sys
import zmq
import logging
import time

READY = 'R'

def worker(connect_to):
    ctx = zmq.Context()
    socket = ctx.socket(zmq.REQ)
    socket.connect(connect_to)
    log = logging.getLogger(__name__)
    while True:
        socket.send_string(READY)
        log.debug("Sent READY message, waiting for reply")
        message = socket.recv()
        log.debug("Got reply of %d bytes", len(message))
        time.sleep(5)

if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG)
    worker(sys.argv[1])
Router
import sys
import zmq
import logging

REPLY_SIZE = 1024 * 8

def router(bind_to):
    ctx = zmq.Context()
    socket = ctx.socket(zmq.ROUTER)
    socket.bind(bind_to)
    poller = zmq.Poller()
    poller.register(socket, zmq.POLLIN)
    log = logging.getLogger(__name__)
    while True:
        socks = dict(poller.poll(5000))
        if socks.get(socket) == zmq.POLLIN:
            message = socket.recv_multipart()
            log.debug("Received message of %d parts", len(message))
            identity, _ = message[:2]
            res = handle_message(message[2:])
            log.debug("Sending %d bytes back in response on socket", len(res))
            socket.send_multipart([identity, '', res])

def handle_message(parts):
    log = logging.getLogger(__name__)
    log.debug("Got message: %s", parts)
    return 'A' * REPLY_SIZE

if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG)
    router(sys.argv[1])
FWIW I was able to reproduce this on Ubuntu 16.04 (both router and worker) with Docker 17.09.0-ce, libzmq 4.1.5 and PyZMQ 15.4.0.
No, sir, the socket does not hang at all:
Why?
The issue is that you have instructed the Socket()-instance to enter an infinitely blocking state by calling the .recv() method without specifying the zmq.NOBLOCK flag ( the ZMQ_DONTWAIT flag in the original ZeroMQ API ).
This is what, under the circumstances reported, moves the code into infinite blocking: there seem to be other issues that prevent the Docker container from properly delivering any first message into the hands of the Worker's Docker-embedded ZeroMQ Context() I/O-engine and on to the REQ access point, while the REQ archetype uses a strict two-step finite-state automaton, strictly striding ( .send() -> .recv() -> .send() -> ... ad infinitum ).
This cause->effect reversal is wrong and misleading: the issue of "socket just hangs" is un-decidable from the issue of Docker not delivering a single message ( so as to allow .recv() to return ).
Next steps:
You may use .poll() on the REQ side to sniff, without blocking, for any already-arrived message in the Worker (see the sketch below).
If there are none, focus on Docker first; next, you may benefit from the ZeroMQ Context() I/O-engine performance and link-level tweaking configuration options.
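A hedged sketch of that first step, applied to the worker() loop above (the 5 s timeout is arbitrary):

socket.send_string(READY)
log.debug("Sent READY message, polling for reply")
if socket.poll(5000):                      # returns 0 if nothing arrives within 5 s
    message = socket.recv()                # safe: poll() reported a queued message
    log.debug("Got reply of %d bytes", len(message))
else:
    # Nothing arrived: suspect the Docker network path rather than the socket.
    # Note that a plain REQ socket cannot .send() again until a reply is consumed;
    # a real fix re-creates the socket or sets zmq.REQ_RELAXED.
    log.warning("No reply within 5 s")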
I have a small asynchronous server implemented using bottle and gevent.wsgi. There is a routine used to implement long poll that looks pretty much like the "Event Callbacks" example in the bottle documentation:
def worker(body):
    msg = msgbus.recv()
    body.put(msg)
    body.put(StopIteration)

@route('/poll')
def poll():
    body = gevent.queue.Queue()
    gevent.spawn(worker, body)
    return body
Here, msgbus is a ZMQ sub socket.
This all works fine, but if a client breaks the connection while
worker is blocked on msgbus.recv(), that greenlet task will hang
around "forever" (well, until a message is received), and will only
find out about the disconnected client when it attempts to send a
response.
I can use msgbus.poll(timeout=something) if I don't want to block
forever waiting for ipc messages, but I still can't detect a client
disconnect.
What I want to do is get something like a reference to the client
socket so that I can use it in some kind of select or poll loop,
or get some sort of asynchronous notification inside my greenlet, but
I'm not sure how to accomplish either of these things with these
frameworks (bottle and gevent).
Is there a way to get notified of client disconnects?
Aha! The wsgi.input variable, at least under gevent.wsgi, has an rfile member that is a file-like object. This doesn't appear to be required by the WSGI spec, so it might not work with other servers.
With this I was able to modify my code to look something like:
def worker(body, rfile):
    poll = zmq.Poller()
    poll.register(msgbus)
    poll.register(rfile, zmq.POLLIN)

    while True:
        events = dict(poll.poll())

        if rfile.fileno() in events:
            # client disconnect!
            break

        if msgbus in events:
            msg = msgbus.recv()
            body.put(msg)
            break

    body.put(StopIteration)

@route('/poll')
def poll():
    rfile = bottle.request.environ['wsgi.input'].rfile
    body = gevent.queue.Queue()
    gevent.spawn(worker, body, rfile)
    return body
And this works great...
...except on OpenShift, where you will have to use the
alternate frontend on port 8000 with websockets support.
I am writing a tool in Python (platform is Linux); one of the tasks is to capture a live TCP stream and apply a function to each line. Currently I'm using
import subprocess
proc = subprocess.Popen(['sudo', 'tcpflow', '-C', '-i', interface, '-p', 'src', 'host', ip], stdout=subprocess.PIPE)
for line in iter(proc.stdout.readline, ''):
    do_something(line)
This works quite well (with the appropriate entry in /etc/sudoers), but I would like to avoid calling an external program.
So far I have looked into the following possibilities:
flowgrep: a Python tool which looks just like what I need, BUT it uses pynids internally, which is 7 years old and seems pretty much abandoned. There is no pynids package for my Gentoo system, and it ships with a patched version of libnids which I couldn't compile without further tweaking.
scapy: this is a packet manipulation program/library for Python; I'm not sure if TCP stream reassembly is supported (see the sketch after this list).
pypcap or pylibpcap as wrappers for libpcap. Again, libpcap is for packet
capturing, where I need stream reassembly which is not possible according
to this question.
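For the scapy route, here is a hedged, packet-level sketch with sniff(); it hands you raw per-packet TCP payloads, not reassembled lines (which is exactly the open question above). The interface name and IP are placeholders, and do_something is the function from the snippet at the top:

from scapy.all import Raw, sniff

def handle_packet(pkt):
    # Called once per captured packet; payload is per-packet bytes, not a reassembled stream.
    if pkt.haslayer(Raw):
        do_something(pkt[Raw].load)

sniff(iface="eth0", filter="tcp and src host 10.0.0.1", prn=handle_packet, store=False)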
Before I dive deeper into any of these libraries I would like to know if maybe someone
has a working code snippet (this seems like a rather common problem). I'm also grateful if
someone can give advice about the right way to go.
Thanks
Jon Oberheide has led efforts to maintain pynids, which is fairly up to date at:
http://jon.oberheide.org/pynids/
So, this might permit you to further explore flowgrep. Pynids itself handles stream reconstruction rather elegantly. See http://monkey.org/~jose/presentations/pysniff04.d/ for some good examples.
Just as a follow-up: I abandoned the idea of monitoring the stream on the TCP layer. Instead I wrote a proxy in Python and let the connection I want to monitor (an HTTP session) connect through this proxy. The result is more stable and does not need root privileges to run. This solution depends on pymiproxy.
This goes into a standalone program, e.g. helper_proxy.py
from multiprocessing.connection import Listener
import StringIO
from httplib import HTTPResponse
import threading
import time
from miproxy.proxy import RequestInterceptorPlugin, ResponseInterceptorPlugin, AsyncMitmProxy

class FakeSocket(StringIO.StringIO):
    def makefile(self, *args, **kw):
        return self

class Interceptor(RequestInterceptorPlugin, ResponseInterceptorPlugin):
    conn = None

    def do_request(self, data):
        # do whatever you need to send data here, I'm only interested in responses
        return data

    def do_response(self, data):
        if Interceptor.conn:  # if the listener is connected, send the response to it
            response = HTTPResponse(FakeSocket(data))
            response.begin()
            Interceptor.conn.send(response.read())
        return data

def main():
    proxy = AsyncMitmProxy()
    proxy.register_interceptor(Interceptor)
    ProxyThread = threading.Thread(target=proxy.serve_forever)
    ProxyThread.daemon = True
    ProxyThread.start()
    print "Proxy started."

    address = ('localhost', 6000)  # family is deduced to be 'AF_INET'
    listener = Listener(address, authkey='some_secret_password')
    while True:
        Interceptor.conn = listener.accept()
        print "Accepted Connection from", listener.last_accepted
        try:
            Interceptor.conn.recv()
        except:
            time.sleep(1)
        finally:
            Interceptor.conn.close()

if __name__ == '__main__':
    main()
Start it with python helper_proxy.py. This creates a proxy listening for HTTP connections on port 8080 and listening for another Python program on port 6000. Once the other Python program has connected on that port, the helper proxy sends all HTTP replies to it. This way the helper proxy can keep running, holding the HTTP connection open, while the listener can be restarted for debugging.
Here is how the listener works, e.g. listener.py:
from multiprocessing.connection import Client

def main():
    address = ('localhost', 6000)
    conn = Client(address, authkey='some_secret_password')
    while True:
        print conn.recv()

if __name__ == '__main__':
    main()
This will just print all the replies. Now point your browser at the proxy running on port 8080 and establish the HTTP connection you want to monitor.
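If you would rather drive the monitored connection from code instead of a browser, here is a hedged usage sketch with the requests package (an assumption, not part of the original setup):

import requests

proxies = {"http": "http://localhost:8080"}             # the helper proxy from above
response = requests.get("http://example.com/", proxies=proxies)
print(response.status_code)                             # the reply body also shows up in listener.py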