I have the nginx upload module handling site uploads, but I still need to transfer files (let's say 3-20 MB each) to our CDN, and would rather not delegate that to a background job.
What is the best way to do this with Tornado without blocking other requests? Can I do this in an async callback?
You may find it useful in the overall architecture of your site to add a message queuing service such as RabbitMQ.
This would let you complete the upload via the nginx module, then in the Tornado handler, post a message containing the uploaded file path and exit. A separate process would watch for these messages and handle the transfer to your CDN. This type of service would be useful for many other tasks that could be handled offline (sending emails, etc.). As your system grows, this also provides you a mechanism to scale by moving queue processing to separate machines.
I am using an architecture very similar to this. Just make sure to add your message consumer process to supervisord or whatever you are using to manage your processes.
In terms of implementation, if you are on Ubuntu, installing RabbitMQ is as simple as:
sudo apt-get install rabbitmq-server
On CentOS w/EPEL repositories:
yum install rabbitmq-server
There are a number of Python bindings to RabbitMQ. Pika is one of them; it was created by an employee of LShift, the company responsible for RabbitMQ.
Below is a bit of sample code from the Pika repo. You can easily imagine how the handle_delivery method would accept a message containing a filepath and push it to your CDN.
import sys
import pika
import asyncore

conn = pika.AsyncoreConnection(pika.ConnectionParameters(
        sys.argv[1] if len(sys.argv) > 1 else '127.0.0.1',
        credentials=pika.PlainCredentials('guest', 'guest')))

print 'Connected to %r' % (conn.server_properties,)

ch = conn.channel()
ch.queue_declare(queue="test", durable=True, exclusive=False, auto_delete=False)

should_quit = False

def handle_delivery(ch, method, header, body):
    print "method=%r" % (method,)
    print "header=%r" % (header,)
    print "  body=%r" % (body,)
    ch.basic_ack(delivery_tag=method.delivery_tag)
    global should_quit
    should_quit = True

tag = ch.basic_consume(handle_delivery, queue='test')

while conn.is_alive() and not should_quit:
    asyncore.loop(count=1)

if conn.is_alive():
    ch.basic_cancel(tag)

conn.close()
print conn.connection_close
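For the producing side, the Tornado handler only needs to publish the uploaded file path. A rough sketch follows; the handler class, request argument and queue name are placeholders, and pika's BlockingConnection is used here purely for brevity.

import pika
import tornado.web

class UploadHandler(tornado.web.RequestHandler):
    def post(self):
        # path already written to disk by the nginx upload module (argument name is an assumption)
        file_path = self.get_argument("file_path")

        # publish the path and return immediately; the consumer does the CDN transfer
        conn = pika.BlockingConnection(pika.ConnectionParameters(
            '127.0.0.1', credentials=pika.PlainCredentials('guest', 'guest')))
        ch = conn.channel()
        ch.queue_declare(queue="test", durable=True, exclusive=False, auto_delete=False)
        ch.basic_publish(exchange='',
                         routing_key='test',
                         body=file_path,
                         properties=pika.BasicProperties(delivery_mode=2))
        conn.close()
        self.write("queued")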
Advice on the Tornado Google group points to using an async callback (documented at http://www.tornadoweb.org/documentation#non-blocking-asynchronous-requests) to move the file to the CDN.
The nginx upload module writes the file to disk and then passes parameters describing the upload(s) back to the view. The file therefore isn't in memory, and the time it takes to read it back from disk (which blocks that request's process, but not other Tornado processes, AFAIK) is negligible.
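For reference, the non-blocking callback pattern from that documentation looks roughly like this. This is only a sketch: the handler name, argument name and CDN endpoint are placeholders, and it assumes the old-style @tornado.web.asynchronous decorator with AsyncHTTPClient.

import tornado.web
import tornado.httpclient

class CdnPushHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def post(self):
        # path written to disk by the nginx upload module (argument name is an assumption)
        file_path = self.get_argument("file_path")
        with open(file_path, "rb") as f:   # the quick disk read mentioned above
            body = f.read()
        client = tornado.httpclient.AsyncHTTPClient()
        # the IOLoop keeps serving other requests while the PUT is in flight
        client.fetch("http://cdn.example.com/upload",
                     method="PUT",
                     body=body,
                     callback=self.on_cdn_response)

    def on_cdn_response(self, response):
        self.write("error" if response.error else "ok")
        self.finish()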
That said, anything that doesn't need to be processed online shouldn't be, and should be deferred to a task queue like celeryd or similar.
Related
So I'm using Flask-Sockets to try to implement a websocket on Flask. Using this I hope to notify all connected clients whenever a piece of data has changed. Here's a simplification of my routes/index.py. The issue I have is that when a websocket connection is opened, it stays in the notify_change loop until the socket is closed, and in the meantime other routes like /users can't be accessed.
import json

from flask import Flask
from flask_sockets import Sockets

app = Flask(__name__)  # added so the snippet stands on its own
sockets = Sockets(app)

@app.route('/users', methods=['GET'])
def users():
    return json.dumps(get_users())

data = "Some value"  # the piece of data to push
is_dirty = False     # a flag which is set by the code that changes the data

@sockets.route("/notifyChange")
def notify_change(ws):
    global is_dirty, data
    while not ws.closed:
        if is_dirty:
            is_dirty = False
            ws.send(data)
This seems like a normal consequence of what is essentially a while True loop; however, I've been looking online for a way to get around this while still using flask_sockets and haven't had any luck. I'm running the server on Gunicorn:
flask/bin/gunicorn -b '0.0.0.0:8000' -k flask_sockets.worker app:app
I tried adding threads with --threads 12, but no luck.
Note: I'm only serving up to 4 users at a time, so scaling is not a requirement, and the solution doesn't need to be ultra-optimized.
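One thing that may be worth checking (a sketch only, not a verified fix): flask_sockets.worker is gevent-based, and greenlets are cooperative, so a loop that never blocks on I/O never yields control to other greenlets. Adding a short gevent.sleep inside the loop gives routes like /users a chance to run; the 0.1 second interval here is an arbitrary choice.

import gevent

@sockets.route("/notifyChange")
def notify_change(ws):
    global is_dirty, data
    while not ws.closed:
        if is_dirty:
            is_dirty = False
            ws.send(data)
        # yield to the gevent hub so other requests can be served
        gevent.sleep(0.1)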
Edit:
The main issue is that the 3rd party RabbitMQ machine seems to kill idle connections every now and then. That's when I start getting "Broken Pipe" exceptions. The only way to get comms back to normal is for me to kill the processes and restart them. I assume there's a better way?
--
I'm a little lost here. I am connecting to a 3rd party RabbitMQ server to push messages to. Every now and then all the sockets on their machine get dropped and I end up getting a "Broken Pipe" exception.
I've been told to implement a heartbeat check in my code but I'm not sure how exactly. I've found some info here: http://kombu.readthedocs.org/en/latest/changelog.html#version-2-3-0 but no real example code.
Do I only need to add "?heartbeat=x" to the connection string? Does Kombu do the rest? I see I need to call "Connection.heartbeat_check()" at "x/2". Should I create a periodic task to call this? How does the connection get re-established?
I'm using:
celery==3.0.12
kombu==2.5.4
My code looks like this right now. A simple Celery task gets called to send the message through to the 3rd party RabbitMQ server (logging and comments removed to keep it short):
class SendMessageTask(Task):
    name = "campaign.backends.send"
    routing_key = "campaign.backends.send"
    ignore_result = True
    default_retry_delay = 60  # 1 minute.
    max_retries = 5

    def run(self, send_to, message, **kwargs):
        payload = "Testing message"
        try:
            conn = BrokerConnection(
                hostname=HOSTNAME,
                port=PORT,
                userid=USER_ID,
                password=PASSWORD,
                virtual_host=VHOST
            )
            with producers[conn].acquire(block=True) as producer:
                publish = conn.ensure(producer, producer.publish,
                                      errback=sending_errback, max_retries=3)
                publish(
                    body=payload,
                    routing_key=OUT_ROUTING_KEY,
                    delivery_mode=2,
                    exchange=EXCHANGE,
                    serializer=None,
                    content_type='text/xml',
                    content_encoding='utf-8'
                )
        except Exception, ex:
            print ex
Thanks for any and all help.
While you certainly can add heartbeat support to a producer, it makes more sense for consumer processes.
Enabling heartbeats means that you have to send heartbeats regularly; for example, if the heartbeat interval is set to 1 second, then you have to send a heartbeat at least every second, or the remote end will close the connection.
This means that you have to use a separate thread or async I/O to reliably send heartbeats in time, and since a connection cannot be shared between threads, this leaves us with async I/O.
The good news is that you probably won't get much benefit from adding heartbeats to a produce-only connection.
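If you do add heartbeats on a consumer, a rough kombu sketch could look like the following. The AMQP URL, queue name and heartbeat interval are placeholders, and kombu >= 2.3 is assumed for the heartbeat argument and Connection.heartbeat_check().

import socket
from kombu import Connection, Exchange, Queue

queue = Queue("campaign.backends.send", Exchange("default"),
              routing_key="campaign.backends.send")

def on_message(body, message):
    print body
    message.ack()

# heartbeat=10 asks the broker for a 10 second heartbeat interval
with Connection("amqp://guest:guest@localhost:5672//", heartbeat=10) as conn:
    with conn.Consumer(queue, callbacks=[on_message]):
        while True:
            try:
                # wake up regularly so heartbeats can be sent even when idle
                conn.drain_events(timeout=5)
            except socket.timeout:
                conn.heartbeat_check()  # the docs suggest calling this at roughly heartbeat / 2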
I am new to Python and Tornado WebServer.
I am trying to figure out the number of requests and the number of requests per second in my server-side code. I am using Tornadio2 to implement websockets.
Kindly take a look at the following code and let me know what modifications can be made to it.
I am using RequestHandler.prepare() to intercept all the requests, and a list (which is mutable) to store the count.
Assume all the necessary modules are imported.
count = [0]

class IndexHandler(tornado.web.RequestHandler):
    """Regular HTTP handler to serve the chatroom page"""
    def prepare(self):
        count[0] = count[0] + 1

    def get(self):
        self.render('index1.html')

class SocketIOHandler(tornado.web.RequestHandler):
    def get(self):
        self.render('../socket.io.js')

partQue = Queue.Queue()

class ChatConnection(tornadio2.conn.SocketConnection):
    participants = set()

    def on_open(self, info):
        self.send("Welcome from the server.")
        self.participants.add(self)

    def on_message(self, message):
        partQue.put(message)
        time.sleep(10)
        self.qmes = partQue.get()
        for p in self.participants:
            p.send(self.qmes + " " + str(count[0]))
        partQue.task_done()

    def on_close(self):
        self.participants.remove(self)
        partQue.join()

# Create tornadio server
ChatRouter = tornadio2.router.TornadioRouter(ChatConnection)

# Create socket application
sock_app = tornado.web.Application(
    ChatRouter.urls,
    flash_policy_port = 843,
    flash_policy_file = op.join(ROOT, 'flashpolicy.xml'),
    socket_io_port = 8002)

# Create HTTP application
http_app = tornado.web.Application(
    [(r"/", IndexHandler), (r"/socket.io.js", SocketIOHandler)])

if __name__ == "__main__":
    import logging
    logging.getLogger().setLevel(logging.DEBUG)

    # Create http server on port 8001
    http_server = tornado.httpserver.HTTPServer(http_app)
    http_server.listen(8001)

    # Create tornadio server on port 8002, but don't start it yet
    tornadio2.server.SocketServer(sock_app, auto_start=False)

    # Start both servers
    tornado.ioloop.IOLoop.instance().start()
Also, I am confused about WebSocket messages. Does each WebSocket event go to the server in the form of an HTTP request, or a Socket.IO request?
I use Siege, an excellent tool for testing requests if you're running on Linux. Example:
siege http://localhost:8000/?q=yourquery -c10 -t10s
-c10 = 10 concurrent users
-t10s = 10 seconds
Tornadio2 has a built-in statistics module, which includes incoming connections/s and other counters.
Check the following example: https://github.com/MrJoes/tornadio2/tree/master/examples/stats
When testing applications, always approach performance testing with a healthy appreciation for the uncertainty principle.
If you want to test a server, hook up two PCs to a hub where you can monitor traffic from one going to the other. Then bang the hell out of the server. There are a variety of tools for doing this; just look for web load testing tools.
Normal HTTP requests in Tornado create a new RequestHandler instance for each request, which is discarded once the request has been handled.
WebSockets use persistent connections. One WebSocketHandler instance is created per connection, and each message sent by the browser to the server calls the on_message method.
From what I understand, Socket.IO/Tornad.IO will use WebSockets if supported by the browser, falling back to long polling.
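To make the difference concrete, here is a minimal sketch using plain tornado.websocket rather than TornadIO2; the handler name, port and counter are placeholders for illustration only.

import tornado.ioloop
import tornado.web
import tornado.websocket

class EchoSocket(tornado.websocket.WebSocketHandler):
    message_count = 0  # class attribute, shared across all connections

    def open(self):
        # one EchoSocket instance is created here and reused for the whole connection
        self.write_message("connected")

    def on_message(self, message):
        # called once per message from the browser, on the same instance
        EchoSocket.message_count += 1
        self.write_message("%d messages so far" % EchoSocket.message_count)

if __name__ == "__main__":
    app = tornado.web.Application([(r"/ws", EchoSocket)])
    app.listen(8001)
    tornado.ioloop.IOLoop.instance().start()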
I am using Python Celery + RabbitMQ. I can't find a way to get the task count for a particular queue.
Something like this:
celery.queue('myqueue').count()
Is it possible to get the task count from a certain queue?
One solution is to run an external command from my Python script:
"rabbitmqctl list_queues -p my_vhost"
and parse the results; is that a good way to do this?
I suppose that using the rabbitmqctl command is not a good solution, especially on my Ubuntu server, where rabbitmqctl can only be executed with root privileges.
By playing with pika objects I found a working solution:
import pika
from django.conf import settings

def tasks_count(queue_name):
    ''' Connects to the message queue using django settings and returns
    the count of messages in the queue named queue_name. '''
    credentials = pika.PlainCredentials(settings.BROKER_USER, settings.BROKER_PASSWORD)
    parameters = pika.ConnectionParameters(credentials=credentials,
                                           host=settings.BROKER_HOST,
                                           port=settings.BROKER_PORT,
                                           virtual_host=settings.BROKER_VHOST)
    connection = pika.BlockingConnection(parameters=parameters)
    channel = connection.channel()
    queue = channel.queue_declare(queue=queue_name, durable=True)
    message_count = queue.method.message_count
    connection.close()  # close the connection so it is not leaked
    return message_count
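A call then looks like this (the queue name is just an example; use whatever queue your tasks are routed to):

print tasks_count('celery')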
I did not find documentation about inspecting the AMQP queue with pika, so I am not sure about the correctness of this solution.
I need to have a Python client that can discover queues on a restarted RabbitMQ server exchange, and then start up clients to resume consuming messages from each queue. How can I discover queues from some RabbitMQ-compatible Python API/library?
There does not seem to be a direct AMQP way to manage the server, but there is a way you can do it from Python. I would recommend using the subprocess module combined with the rabbitmqctl command to check the status of the queues.
I am assuming that you are running this on Linux. From a command line, running:
rabbitmqctl list_queues
will result in:
Listing queues ...
pings 0
receptions 0
shoveled 0
test1 55199
...done.
(well, it did in my case due to my specific queues)
In your code, use this to get the output of rabbitmqctl:
import subprocess
proc = subprocess.Popen("/usr/sbin/rabbitmqctl list_queues", shell=True, stdout=subprocess.PIPE)
stdout_value = proc.communicate()[0]
print stdout_value
Then, just come up with your own code to parse stdout_value for your own use.
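For example, a rough way to turn that output into a dict mapping queue name to message count (assuming the default two-column output shown above):

queues = {}
for line in stdout_value.splitlines():
    parts = line.split()
    # skip the "Listing queues ..." and "...done." banner lines
    if len(parts) == 2 and parts[1].isdigit():
        queues[parts[0]] = int(parts[1])

print queues  # e.g. {'pings': 0, 'test1': 55199, ...}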
As far as I know, there isn't any way of doing this. That's nothing to do with Python, but because AMQP doesn't define any method of queue discovery.
In any case, in AMQP it's clients (consumers) that declare queues: publishers publish messages to an exchange with a routing key, and consumers determine which queues those routing keys go to. So it does not make sense to talk about queues in the absence of consumers.
You can enable the rabbitmq_management plugin:
sudo /usr/lib/rabbitmq/bin/rabbitmq-plugins enable rabbitmq_management
sudo service rabbitmq-server restart
Then use the REST API:
import requests

def rest_queue_list(user='guest', password='guest', host='localhost', port=15672, virtual_host=None):
    url = 'http://%s:%s/api/queues/%s' % (host, port, virtual_host or '')
    response = requests.get(url, auth=(user, password))
    queues = [q['name'] for q in response.json()]
    return queues
I'm using the requests library in this example, but that is not essential.
Also, I found a library that does this for us: pyrabbit
from pyrabbit.api import Client
cl = Client('localhost:15672', 'guest', 'guest')
queues = [q['name'] for q in cl.get_queues()]
Since I am a RabbitMQ beginner, take this with a grain of salt, but there's an interesting Management Plugin, which exposes an HTTP interface: "From here you can manage exchanges, queues, bindings, virtual hosts, users and permissions. Hopefully the UI is fairly self-explanatory."
http://www.rabbitmq.com/blog/2010/09/07/management-plugin-preview-release/
I use https://github.com/bkjones/pyrabbit. It talks directly to the API of RabbitMQ's management plugin, and is very handy for interrogating RabbitMQ.
Management features are due in a future version of AMQP, so for now you will have to wait for a new version that comes with that functionality.
I found this works for me, /els being my demo vhost name:
rabbitmqctl list_queues --vhost /els
pyrabbit didn't work so well for me; however, the Management Plugin itself has its own command-line script that you can download from your own admin GUI and use later on (for example, I downloaded mine from http://localhost:15672/cli/ for local use).
I would simply use this:
Just replace the user (default: guest), passwd (default: guest) and port with your values.
import requests
import json

def call_rabbitmq_api(host, port, user, passwd):
    url = 'http://%s:%s/api/queues' % (host, port)
    r = requests.get(url, auth=(user, passwd))
    return r

def get_queue_name(json_list):
    res = []
    for item in json_list:  # renamed to avoid shadowing the json module
        res.append(item["name"])
    return res

if __name__ == '__main__':
    host = 'rabbitmq_host'
    port = 55672
    user = 'guest'
    passwd = 'guest'
    res = call_rabbitmq_api(host, port, user, passwd)
    print("--- dump json ---")
    print(json.dumps(res.json(), indent=4))
    print("--- get queue name ---")
    q_name = get_queue_name(res.json())
    print(q_name)
Referred from here: https://gist.github.com/hiroakis/5088513#file-example_rabbitmq_api-py-L2