speech to text processing - python

I have a project in which the user will send an audio file from Android/web to the server.
I need to perform speech-to-text processing on the server and return some files back to the user on Android/web. However, the server side has to be done in Python.
Please guide me on how this could be done.

Alongside your web application, you can have a queue of tasks that need to be run and worker process(es) to run and track those tasks. This is a popular pattern when web requests need to either start tasks in the background, check in on tasks, or get the result of a task. An introduction to this pattern can be found in the Task Queues section of the Full Stack Python open book. Celery and RQ are two popular projects that supply task queue management and can plug into an existing Python web application, such as one built with Django or Flask.
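For example, a task for the speech-to-text job could be defined with Celery roughly like the sketch below; the broker URL, the task name, and the placeholder transcription helper are assumptions for illustration, not part of any particular library:

# tasks.py -- minimal Celery sketch; broker URL and helper names are assumptions
from celery import Celery

app = Celery('stt', broker='redis://localhost:6379/0', backend='redis://localhost:6379/0')

def run_speech_to_text(audio_path):
    # Placeholder: call whatever speech-to-text engine you actually use here.
    return 'transcript of %s' % audio_path

@app.task
def transcribe(audio_path):
    return run_speech_to_text(audio_path)

Your upload view would then enqueue the work with transcribe.delay(path_to_uploaded_file) and hand the resulting task id back to the client.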
Once you have task management, you'll have to decide how to keep the user up to date on the status of a task. If you're stuck with having to use RPC-style web service calls only, then you can have clients (e.g. Android or browser) poll for the status by making a call to a web service you've created that checks on the task via your task queue manager's API.
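With Celery, for instance, such a status endpoint can be a few lines of Flask; the route path and module name below are assumptions that match the sketch above:

# status.py -- minimal polling endpoint sketch; route and module names are assumptions
from flask import Flask, jsonify
from tasks import app as celery_app  # the Celery app from the previous sketch

flask_app = Flask(__name__)

@flask_app.route('/tasks/<task_id>')
def task_status(task_id):
    result = celery_app.AsyncResult(task_id)
    payload = {'state': result.state}
    if result.successful():
        payload['transcript'] = result.result
    return jsonify(payload)

The client keeps the task id it received when uploading the audio and polls this endpoint until the state becomes SUCCESS (or FAILURE).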
If you want the user to be informed faster or want to reduce wasteful overhead from constant polling, consider supplying a websocket instead. Through a websocket connection, clients could subscribe to notifications of events such as the completion of a speech-to-text job. The Autobahn|Python library provides server code for implementing websockets as well as support for a protocol on top called WAMP that can be used to communicate subscriptions and messages or call upon services. If you need to stick with Django, consider something like django-websocket-redis instead.

Related

Update single database value on a website with many users

For this question, I'm particularly struggling with how to structure this:
User accesses website
User clicks button
Value x in database increments
My issue is that multiple people could potentially be on the website at the same time and click the button. I want to make sure each user is able to click the button, update the value, and read the incremented value too, but I don't know how to avoid synchronisation/concurrency issues.
I'm using Flask to run my website backend, and I'm thinking of using MongoDB or Redis to store the single value that needs to be updated.
Please comment if there is any lack of clarity in my question, but this is a problem I've really been struggling with how to solve.
Thanks :)
Redis: you can use the HINCRBY command, or create a distributed lock to make sure there is only one writer at a time and only the lock-holding writer can make the update from your Flask app. Make sure you release the lock after a certain period of time, or once the writer is done with it.
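For example, with redis-py the increment is a single atomic server-side call; the key and field names below are only examples:

# Atomic counter sketch using redis-py; key/field names are examples only.
import redis

r = redis.Redis(host='localhost', port=6379)

def handle_click():
    # HINCRBY is executed atomically by the Redis server,
    # so concurrent clicks cannot lose updates.
    return r.hincrby('counters', 'button_clicks', 1)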
MySQL: you can start a transaction, make the update, and commit the change to make sure the data stays consistent.
To solve this problem I would suggest you follow a microservice architecture.
A service called worker would handle the Flask route that's called when the user clicks the link/button on the website. It would generate a message to be sent to another service called queue manager, which maintains a queue of increment/decrement messages from the worker service.
There can be multiple worker service instances running concurrently, but the queue manager is a singleton service that takes the messages from each worker and adds them to the queue. If the queue manager is busy, the worker service will either time out and retry or return a failure message to the user. If the queue is full, a response is sent back to the worker to retry up to n times, counting n down on each attempt.
A third service called storage manager runs whenever the queue is not empty. This service sends the messages to the storage solution (whether Mongo, Redis, or good ol' SQL) and ensures the increment/decrement messages are handled in the order they were received in the queue. You could also include a timestamp from the worker service in the message if you wanted to use that to sort the queue.
Generally, whatever hosting environment you use for Flask will run Gunicorn as the production web server and support multiple concurrent worker instances to handle the HTTP requests, and these would naturally be your worker service.
How you build and coordinate the queue manager and storage manager is down to implementation preference. For instance, you could use something like Google Cloud's Pub/Sub system to send messages between the deployed services, but that's just off the top of my head. There are a load of different ways to do it, and you're in the best position to decide.
Without knowing more details about what you're trying to achieve and what the requirements for concurrent traffic are, I can't go into greater detail, but that's roughly how I've approached this type of problem in the past. If you need to handle more concurrent users on the website, you can pick a hosting solution with more concurrent workers. If you need the queue to be longer, you can pick a host with more memory, or else write the queue to intermediate storage. This will slow it down but will make recovering from a crash easier.
You also need to consider handling when messages fail between different services, how to recover from a service crashing or the queue filling up.
EDIT: Been thinking about this over the weekend, and a much simpler solution is to just create a new record in a table directly from the Flask route that handles user clicks. Then, to get your total, you just get a count from that table. Your bottlenecks are going to be how many concurrent workers your Flask hosting environment supports and how many concurrent connections your storage supports. Both of these can be solved by throwing more resources at them.
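A rough sketch of that simpler approach with Flask and Flask-SQLAlchemy; the model, route, and database names are placeholders:

# One-row-per-click sketch; model, route and DB names are placeholders.
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///clicks.db'
db = SQLAlchemy(app)

class Click(db.Model):
    id = db.Column(db.Integer, primary_key=True)

with app.app_context():
    db.create_all()

@app.route('/click', methods=['POST'])
def click():
    db.session.add(Click())  # insert a row instead of read-modify-write, so no race
    db.session.commit()
    return {'total': Click.query.count()}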

How to run a background control task inside a Flask app?

Context
I am working in an escape game company.
We currently have a Windows app that controls the games:
It runs a big loop that checks the state of all the sensors (via queries to the serial port of the PC), makes decisions and sends commands to that same serial port.
It has a GUI where the game master can monitor the status of the game and send manual commands to bypass some game logic when needed.
It works very well, but for stability reasons, update nightmares, etc., we want to move away from Windows for that specific application. We want to run all this on Linux.
The project
The ideal thing would be a system where the PC that runs the game is headless and the escape room software is remotely controlled using a web interface. This is better than the current situation, where the operators have to take remote control of the game PC using Windows Remote Desktop.
I would like to have some kind of RESTful API that can be queried by some JS webpages to display the state of the system and send commands to it.
I have the constraint that the server part must be done in Python.
But I don't know how to approach that system.
On one hand, I will have software that controls real-world things and will, obviously, manage only one single game at a given time. Basically a big, non-blocking, always-running loop.
On the other hand, I will have a REST API to send commands to the running game.
If I look at web frameworks, such as Flask, it provides RESTful API but it is designed to handle multiple connections at the same time and have them basically isolated from each other.
I don't see how I would make that web part discuss with the game system part.
As you can guess, I am not an expert at all, and I would like to keep the system as simple as possible to keep it manageable and understandable.
What would be the best approach here, in terms of simplicity?
I thought of having two apps, one that runs the game and one for the web server, exchanging commands and status through some sort of inter-process communication. But it looks complicated.
The dream would be to have a sort of background task within the Flask framework that runs the game, sending the serial port requests and following the game scripts. At the same time, when REST requests are received, the callback function of the request would have access to the variables of the background task to gather the status of the game and reply accordingly.
But I have no idea how to do that. I don't even know what keywords to Google to get an idea of how to do it. Is there a pattern here common enough to be supported by basic frameworks? Or am I reinventing the wheel?
To run a permanent background task in the same process as a Flask application, use a threading.Thread running a function with an infinite loop. Communicate through a queue.Queue which is thread-safe.
Note: if scaling past a single process, this would create multiple, separate control tasks which probably isn't desired. Scaling requires an external database or queue and a task framework such as Celery.
Example (based on Flask quickstart and basic thread usage):
from flask import Flask
from queue import Queue, Empty
from threading import Thread
from time import sleep

app = Flask(__name__)
commands = Queue()  # thread-safe channel between the web handlers and the game loop

def game_loop():
    # Permanent background task: drain any pending commands, then poll the hardware.
    while True:
        try:
            command = commands.get_nowait()
            print(command)
        except Empty:
            pass
        sleep(5)  # TODO poll the serial port / sensors here

Thread(target=game_loop, daemon=True).start()

# Literally the Flask quickstart but pushing to the queue
@app.route("/")
def hello_world():
    commands.put_nowait({'action': 'something'})
    return "<p>Hello, World!</p>"

Best Way to Handle user triggered task (like import data) in Django

I need your opinion on a challenge that I'm facing. I'm building a website that uses Django as the backend, PostgreSQL as my DB, GraphQL as my API layer and React as my frontend framework. The website is hosted on Heroku. I wrote a Python script that logs me in to my Gmail account and parses a few emails, based on pre-defined conditions, and stores the parsed data in a Google Sheet. Now, I want the script to be part of my website: the user will specify what exactly needs to be parsed (i.e. filters), and then the parsed data is displayed in a table to review the accuracy of the parsing.
The part that I need some help with is how to architect such a workflow. Below are a few ideas that I managed to come up with after some googling:
Generate a GraphQL mutation that stores a 'task' into a task model. Once a new task entry is stored, a Django signal will trigger the script. I'm not sure yet if a signal can run custom Python functions, but from what I've read so far, it seems doable.
Use Celery to run this task asynchronously. But I'm not sure if asynchronous tasks are what I'm after here, as I need the task to run immediately after the user triggers the feature from the frontend. But I might be wrong. I'm also not sure if I need Redis to store the task details or whether I can do that in PostgreSQL.
What is the best practice for implementing this feature? The task can be anything, not necessarily parsing emails; it could also be importing data from Excel. Any task that is user-triggered rather than scheduled or repeated.
I'm sorry in advance if this question seems trivial to some of you. I'm not a professional developer and the above project is a way for me to sharpen my technical skills and learn new techniques.
Looking forward to learning from your experiences.
You can dissect your problem into the following steps:
User specifies task parameters
System executes task
System displays result to the User
You can either do all of these:
Sequentially and synchronously in one swoop; or
Step by step asynchronously.
Synchronously
You can run your script when generating a response, but it will come with the following downsides:
The process in the server processing your request will block until the script is finished. This may or may not affect the processing of other requests by that same server (this will depend on the number of simultaneous requests being processed, workload of the script, etc.)
The client (e.g. your browser) and even the server might time out if the script takes too long. You can fix this to some extent by configuring your server appropriately.
The beauty of this approach, however, is its simplicity. For you to do this, you can just pass the parameters through the request; the server parses them, runs the script, and returns the result.
No setting up of a message queue, task scheduler, or anything else is needed.
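As a sketch of that synchronous flow with a plain Django view (the parse_emails helper, its filters parameter, and the URL are all hypothetical):

# views.py -- synchronous sketch; parse_emails and the filters parameter are hypothetical
from django.http import JsonResponse

def parse_emails(filters):
    # Placeholder for the actual Gmail-parsing script.
    return [{'subject': 'example', 'matched_filters': filters}]

def run_import(request):
    filters = request.GET.get('filters', '')
    rows = parse_emails(filters)  # blocks this request until the script finishes
    return JsonResponse({'rows': rows})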
Asynchronously
Ideally though, for long-running tasks, it is best to have this executed outside of the usual request-response loop for the following advantages:
The server responding to the requests can actually serve other requests.
Some scripts can take a while; for some, you don't even know if they're going to finish.
The script is no longer dependent on the reliability of the network (imagine running an expensive task, then your internet connection skips or is just plain intermittent; you won't be able to do anything).
The downside of this is now you have to set more things up, which increases the project's complexity and points of failure.
Producer-Consumer
Whatever you choose, it's usually best to follow the producer-consumer pattern:
Producer creates tasks and puts them in a queue
Consumer takes a task from the queue and executes it
The producer is basically you, the user. You specify the task and the parameters involved in that task.
This queue could be any datastore: an in-memory datastore like Redis; a messaging queue like RabbitMQ; or a relational database management system like PostgreSQL.
The consumer is your script executing these tasks. There are multiple ways of running the consumer/script: via Celery, like you mentioned, which runs multiple workers to execute the tasks passed through the queue; via a simple time-based job scheduler like crontab; or even you manually triggering the script.
The question is actually not trivial, as the solution depends on what task you are actually trying to do. It is best to evaluate the constraints, parameters, and actual tasks to decide which approach you will choose.
But just to give you a more relevant guideline:
Just keep it simple: unless you have a compelling reason to get fancy (e.g. the server is being bogged down, or the internet connection is not reliable in practice), there's really no reason to be.
The more blocking the task is, the longer it takes, or the more dependent it is on third-party APIs over the network, the more it makes sense to push it to a background process to add reliability and resiliency.
For your email import script, I'd most likely push that to the background:
Have a page where you can add a task to the database
In the task details page, display the task details, and the result below if it exists or "Processing..." otherwise
Have a script that executes tasks (import emails from gmail given the task parameters) and save the results to the database
Schedule this script to run every few minutes via crontab
Yes, the above has side effects, like crontab running the script multiple times concurrently and such, but I won't go into detail without knowing more about the specifics of the task.
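One hedged sketch of that task-table-plus-cron flow in Django; the app name, model fields, command name, and import_emails helper are all assumptions:

# models.py (in a hypothetical app called "imports") -- field names are assumptions
from django.db import models

class ImportTask(models.Model):
    filters = models.TextField()
    result = models.TextField(blank=True, default='')
    done = models.BooleanField(default=False)
    created_at = models.DateTimeField(auto_now_add=True)


# imports/management/commands/run_import_tasks.py -- run from crontab every few minutes
from django.core.management.base import BaseCommand
from imports.models import ImportTask

def import_emails(filters):
    # Placeholder for the actual Gmail-parsing script.
    return 'parsed with filters: %s' % filters

class Command(BaseCommand):
    help = 'Execute pending import tasks'

    def handle(self, *args, **options):
        for task in ImportTask.objects.filter(done=False):
            task.result = import_emails(task.filters)
            task.done = True
            task.save()

The task detail page then only has to check the done flag to decide between showing "Processing..." and the stored result.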

Django with python ayncio to perform background task

I have two servers: a primary server that provides a REST API to accept data from users and maintain a product details list. This server is also responsible for sharing the product list (a subset of the product data) with a secondary server as soon as a product is updated/created.
Also note that the secondary URL depends on the product details; it is not a fixed server.
The primary server is written in Django. I have used Django model DB signals for the product update, create and delete events.
Now the problem is that I don't want to block my primary server's REST call until it has pushed the details to the secondary server. I need some scheduler stuff to do that, i.e. create a task to populate the data in the background without blocking my current thread.
I found that the Python asyncio module comes with a function 'run_in_executor', and it's working so far, but I don't know the side effects of this when Django runs in a WSGI server. Can anyone explain, or suggest an alternative?
I found Django Channels, but it needs extra stuff, like running a worker thread separately and a Redis cache.
You should use Celery with Django for running tasks asynchronously or in the background.
Celery is a task queue with batteries included. It’s easy to use so that you can get started without learning the full complexities of the problem it solves.
You can get more information on celery from http://docs.celeryproject.org/en/latest/getting-started/first-steps-with-celery.html#first-steps
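As a rough, hedged sketch of how that could look for the product-sync case in the question (the app, model, and task names, and the secondary_url field, are assumptions):

# tasks.py -- assumes a Celery app is already configured for the Django project
import requests
from celery import shared_task

@shared_task
def push_product(product_id):
    from products.models import Product  # hypothetical app/model
    product = Product.objects.get(pk=product_id)
    # The secondary URL depends on the product, as described in the question.
    requests.post(product.secondary_url, json={'name': product.name})


# signals.py -- enqueue the push so the REST call is not blocked
from django.db.models.signals import post_save
from django.dispatch import receiver
from products.models import Product

@receiver(post_save, sender=Product)
def on_product_saved(sender, instance, **kwargs):
    push_product.delay(instance.pk)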

Coding an AMQP-listening and user-facing daemon in Python

Edit: I posted this to python-list and tutor-list with no responses. Any advice would be much appreciated.
What is the best approach to writing a concurrent daemon that can execute callbacks for different types of events (AMQP messages, parsed output of a subprocess, HTTP requests)?
I am considering twisted, the built-in threading module, and greenlet. I must admit that I am very unfamiliar with concurrent programming and Python programming in general (formerly a data analysis driven procedural programmer). Any resources on threaded/concurrent programming (specifically daemons...not just multi-threading a single task) would be much appreciated.
Thanks!
Details:
1) Listens to AMQP messaging queues and executes callbacks when messages arrive.
Example: Immediately after startup, the daemon continuously listens to the Openstack Notifications messaging queue. When a virtual machine is launched, a notification is generated by Openstack with the hostname, IP address, etc. The daemon should read this message and write some info to a log (or POST the info to a server, or notify the user...something simple).
2) Parse the output of a subprocess and execute callbacks based on the output.
Example: Every 30 seconds, a system command "qstat" is run to query a job resource manager (e.g. TORQUE). Similar callbacks to 1).
3) Receive requests from a user and process them. I think this will be via WSGI HTTP.
Example: User submits an XML template with virtual machine templates. The daemon does some simple XML parsing and writes a job script for the job resource manager. The job is submitted to the resource manager and the daemon continually checks for the status of the job with "qstat" and for messages from AMQP. It should return "live" feedback to the user and write to a log.
You may want to look at the OpenStack Oslo project.
Start here:
https://wiki.openstack.org/wiki/Oslo
Oslo is basically a shared resource for all OpenStack applications. The focus here is providing re-usable code, and standardizing on methods that many applications create or use.
Messaging, being a fundamental component of OpenStack, has some breakouts of its own. Also, since OpenStack supports many messaging protocols, maybe doing direct AMQP isn't the right answer for you.
Anyway, check this...
Messaging specifically lives here:
https://github.com/openstack/oslo.messaging
I'd go dig into that repository and play with some of the methods made available there.
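For a rough idea of what consuming notifications with oslo.messaging looks like, here is a hedged sketch; the transport URL, topic, and executor are assumptions, and the exact API varies between releases, so check the docs for your version:

# Hedged oslo.messaging sketch; URL, topic and executor are assumptions.
import oslo_messaging
from oslo_config import cfg

transport = oslo_messaging.get_notification_transport(
    cfg.CONF, url='rabbit://guest:guest@localhost:5672/')
targets = [oslo_messaging.Target(topic='notifications')]

class NotificationEndpoint(object):
    def info(self, ctxt, publisher_id, event_type, payload, metadata):
        # e.g. react to 'compute.instance.create.end' and log the new VM's details
        print(event_type, payload)

listener = oslo_messaging.get_notification_listener(
    transport, targets, [NotificationEndpoint()], executor='threading')
listener.start()
listener.wait()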
