I've been building a website for some time, and I'm still stuck on one thing:
I store some small videos (roughly 400 MB at most) in a dbm database, and I'd like to stream them on my website.
I'm building the request handlers by hand with the Tornado Python framework, and I'm wondering how to structure my handler. I've never learned how media streaming works and couldn't find many topics on the web.
The end result I'd like is a web player on my website where I can request specific videos and play them without having to load the entire file into memory or send it in a single request.
These two links appear to be the answers you are looking for (and guess what, so am I!):
One for Tornado only: it appears to use special decorators.
One for Flask: although it's a motion-JPEG example, it shows how you can return a generator function that runs a "while" loop as the response.
Note that both use the "yield" keyword in Python. It's unclear to me whether the "coroutine" and "asynchronous" decorators are necessary in the Flask example (in other words, it's unclear whether the example given in the link is complete... although he literally wrote the book about it, so I suspect it is).
Beware: tests show that tornado.web holds on to the ENTIRE file during download, even if you stream it (i.e. read, write, flush, read...). The reason for this is unclear, and I have yet to find a way around it.
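For reference, here's a minimal sketch of a chunked Tornado handler along those lines. It assumes the video bytes have already been fetched from the dbm store (get_video_bytes is a hypothetical helper, and the route and content type are placeholders):

import tornado.web

CHUNK_SIZE = 64 * 1024  # send the response in 64 KB pieces

class VideoHandler(tornado.web.RequestHandler):
    async def get(self, video_id):
        data = get_video_bytes(video_id)  # hypothetical dbm lookup; dbm hands back the whole value as bytes
        self.set_header("Content-Type", "video/mp4")
        for start in range(0, len(data), CHUNK_SIZE):
            self.write(data[start:start + CHUNK_SIZE])
            await self.flush()  # push this chunk to the client before writing the next one

app = tornado.web.Application([(r"/video/(.*)", VideoHandler)])

Note that dbm returns the whole value as bytes, so the lookup itself is in memory; the chunking only changes how the response is sent to the client.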
I have a Python script that regularly checks an API for data updates. Since it runs without supervision, I would like to be able to monitor what the script does to make sure it works properly.
My initial thought is just to write every communication attempt with the API to a text file, with the date, time and whether data was pulled or not, a new line for every input. My question to you is whether you would recommend doing it another way. Writing to Excel, for example, to be able to sort the columns? Or are there any other options worth considering?
I would say it really depends on two factors:
How often you update
How much interaction you want with the monitoring data (e.g. notifications, reporting, etc.)
I have had projects where we've updated Google Sheets (using the API) to be able to collaboratively extract reports from the update data.
However, note that this means a web call at every update, so if your updates are close together, this will affect performance. Also, if your app is interactive, there may be a delay while the data gets updated.
The upside is you can build things like graphs and timelines really easily (and collaboratively) where needed.
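As a rough illustration of that approach, assuming the gspread library and a service-account credentials file (the file name, sheet name and row contents are all placeholders):

import gspread

gc = gspread.service_account(filename="credentials.json")  # placeholder credentials file
sheet = gc.open("API monitor").sheet1                      # placeholder spreadsheet name

# Append one row per update attempt: timestamp, outcome, note.
sheet.append_row(["2024-01-01 12:00:00", "success", "data pulled"])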
Also, yes: definitely the logging module, as answered below. For some reason I sort of assumed you were already using the logging module for the local file!
Take a look at the logging documentation.
A new line for every input is a good start. You can configure the logging module to print date and time automatically.
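For example, a minimal configuration along those lines (the file name and messages are placeholders):

import logging

logging.basicConfig(
    filename="api_monitor.log",                      # placeholder log file
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",  # asctime adds date and time automatically
)

logging.info("API check succeeded, data pulled")
logging.warning("API check failed, no data pulled")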
I'm trying to create a listener in Python that automatically retrieves changes from a Cloudant database as they occur. When a change occurs, I want to call a specific function.
I have read through the documentation and the API-specifications but couldn't find anything.
Is there a way to do this?
Here's a basic streaming changes feed reader (disclaimer: I wrote it):
https://github.com/xpqz/pylon/blob/master/pylon.py#L165
The official Cloudant Python client library also contains a changes feed follower:
https://python-cloudant.readthedocs.io/en/latest/feed.html
It's pretty easy to get a basic changes feed reader going, as the _changes endpoint with a feed=continuous parameter does quite a lot for you off the bat, including passing the results back as self-contained JSON objects, one per line. The hard part is dealing with a rather non-obvious set of failure conditions.
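A bare-bones sketch of that kind of reader using the requests library (the account, database, credentials and callback are placeholders, and none of the failure handling mentioned above is included):

import json
import requests

CHANGES_URL = "https://ACCOUNT.cloudant.com/DBNAME/_changes"  # placeholder account and database

def follow_changes(on_change):
    # feed=continuous keeps the connection open; the server streams one JSON object per line.
    params = {"feed": "continuous", "include_docs": "true", "heartbeat": "30000"}
    with requests.get(CHANGES_URL, params=params, stream=True,
                      auth=("USER", "PASSWORD"), timeout=90) as resp:
        for line in resp.iter_lines():
            if not line:  # empty heartbeat lines just keep the connection alive
                continue
            on_change(json.loads(line))

follow_changes(lambda change: print(change["id"]))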
I'm new to Python (4 months). I got through the first steps of basic programming skills, I believe (having passed edX MIT 6.00.1x and 6.00.2x), but I'm having big problems in the world of new libraries...
Here's an example:
r = requests.get('URL', timeout=x)
This works well with certain URLs, but keeps waiting with some others and I get
HTTPSConnectionPool(host='URL', port=443): Read timed out. (read timeout=x)
and without the timeout parameter, the Jupyter notebook just keeps spinning its hourglass.
I am not trying to handle the exception but to get it to work.
Is there a simple way out, or is requests.get too limited for this kind of task?
And a more general question, if you have the time: learning from the official docs (especially for larger and more complex modules) is getting too abstract for me, to the point where it makes me feel hopeless. Diving straight in produces problems like this one, where I can't even figure out the simplest things.
What would be an efficient way to deal with state-of-the-art libraries? How did/do you go about it?
Check the robots.txt file of the website whose content you're trying to scrape (just open something like www.example.com/robots.txt). It's plausible that the website simply does not allow robots to use it. If this is the case, you may try to get around it by passing a custom User-Agent header:
import requests

headers = {'user-agent': 'Chrome/41.0.2228.0'}
url = '...'
r = requests.get(url, headers=headers, timeout=x)
But, of course, if you make thousands of queries to a website which does not allow robots, you'll still be blocked after a while.
My web app asks users 3 questions and simply writes the answers to a file: a1, a2, a3. I also have a real-time visualization of the average of the data (it reads from the file in real time).
Must I use a database to ensure that no (or minimal) information is lost? Is it possible to produce a queue of reads/writes? (Since the files are small, I am not too worried about the execution time of each call.) Does Python/Flask already take care of this?
I am quite experienced with Python itself, but not in this area (with Flask).
I see a few solutions:
read /dev/urandom a few times, compute the SHA-256 of the bytes and use it as a file name; a collision is extremely improbable
use Redis and a command like LPUSH, which is very easy to use from Python; then RPOP from the right end of the list and there's your queue (see the sketch below)
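A rough sketch of that Redis approach with the redis-py client (the key name and payload are placeholders, and a local Redis server on the default port is assumed):

import json
import redis

r = redis.Redis()  # assumes a local Redis server on the default port

# Producer (e.g. the Flask view): push each submission onto the left end of the list.
r.lpush("answers", json.dumps({"a1": "yes", "a2": "no", "a3": "maybe"}))

# Consumer (e.g. a background worker): pop from the right end, so items come out in order.
raw = r.rpop("answers")
if raw is not None:
    answers = json.loads(raw)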
I understand that letting any anonymous user upload any sort of file can be dangerous in general, especially if it's code. However, I have an idea to let users upload custom AI scripts to my website. I would provide the template, so the user could compete with other AIs in an online web game I wrote in Python. I need either a solution to ensure a user couldn't compromise any other files or inject malicious code via their uploaded script, or a solution for client-side execution of the game. Any suggestions? (I'm looking for a solution that will work with my Python scripts.)
I am in no way associated with this site, and I'm only linking it because it tries to achieve what you are after: jailing of Python. The site is codepad.
According to the about page, it is run under geordi and traps all syscalls with ptrace. In addition to being chroot'ed, they run on a virtual machine with firewalls in place to disallow outbound connections.
Consider it a starting point but I do have to chime in on the whole danger thing. Gotta CYA myself. :)
Using PyPy you can create a Python sandbox. The sandbox is a separate and supposedly secure Python environment where you can execute their scripts. More info here:
http://codespeak.net/pypy/dist/pypy/doc/sandbox.html
"In theory it's impossible to do anything bad or read a random file on the machine from this prompt."
"This is safe to do even if script.py comes from some random untrusted source, e.g. if it is done by an HTTP server."
Along with other safeguards, you can also incorporate human review of the code. Assuming part of the experience is reviewing other members' solutions, and everyone is a python developer, don't allow new code to be activated until a certain number of members vote for it. Your users aren't going to approve malicious code.
Yes.
Allow them to script their client, not your server.
PyPy is probably a decent bet on the server side, as suggested, but I'd look into having your Python backend provide well-defined APIs and data formats, and have the users implement the AI and logic in JavaScript so it can run in their browser. The interaction would look like this: for each match/turn/etc., pass data to the browser in a well-defined format, provide a JavaScript template that receives the data and can implement the logic, and provide web APIs that the client (browser) can invoke to take the desired actions. That way you don't have to worry about security or server power.
Have an extensive API for the users and strip all other calls upon upload (such as import statements). Also, strip everything that has anything to do with file I/O; a rough sketch of such a check follows below.
(You might want to do multiple passes to ensure that you didn't miss anything.)
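As an illustration only, here's a sketch of that kind of upload check using Python's ast module. The banned-name list is purely illustrative, and a check like this is easy to bypass, so treat it as one layer rather than a sandbox:

import ast

BANNED_CALLS = {"open", "exec", "eval", "__import__"}  # illustrative, not exhaustive

def check_script(source):
    """Return a list of reasons to reject an uploaded script."""
    problems = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            problems.append("import statement at line %d" % node.lineno)
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name) and func.id in BANNED_CALLS:
                problems.append("call to %s() at line %d" % (func.id, node.lineno))
    return problems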