I would like to know the most elegant way to scrape live webcam (traffic) data, ideally using Python. The webcam feed is exposed through an API, with each GET request yielding an image of the currently available feed from the webcam. The feed in question has a 2-3 second delay, so there are ~30 images per minute that can be requested.
My current (trivial) solution simply queries the API in a loop (perhaps with a sleep timer) and then cleans up any duplicated images. However, this seems quite dirty and I was wondering if there is a cleaner/more elegant solution available.
In principle I would like the solution to (if at all possible) avoid:
downloading duplicated images
sleep timers
looping
Is something like this possible?
To avoid sleep timers in your own code, you can write a process that is triggered by a scheduler such as cron. Cron will handle running your script at defined intervals; note, though, that standard cron has one-minute granularity, so for a ~2-second interval (60 s / 30 images per minute) you would need something finer-grained, such as a systemd timer.
An example process might call the API using requests. Assuming an image is passed back, the following example code might work. If a JSON string is passed back then you will need to parse it and extract the image URL.
import requests

r = requests.get('https://traffic-cam-site.com/cam', auth=('user', 'pass'))
if r.status_code == 200:
    image = r.content
To avoid downloading duplicate images, you would need to know when a new image is present on the API, so you will need to check the cam site periodically. Store a hash of each collected image in a database (or text file). Then hash the image currently served by the cam site, and if the hash matches one you already have, discard it instead of saving it. (If the API exposes ETag or Last-Modified headers, you may even be able to skip the download entirely.)
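As a minimal sketch of that duplicate check, assuming a plain text file stands in for the database (the filename is just an illustration):

```python
import hashlib


def image_hash(data: bytes) -> str:
    # Stable fingerprint for an image payload.
    return hashlib.sha256(data).hexdigest()


def is_new_image(data: bytes, seen_path: str = "seen_hashes.txt") -> bool:
    """Return True and record the hash if this image has not been seen
    before; return False for a duplicate."""
    h = image_hash(data)
    try:
        with open(seen_path) as f:
            seen = set(f.read().split())
    except FileNotFoundError:
        seen = set()
    if h in seen:
        return False
    with open(seen_path, "a") as f:
        f.write(h + "\n")
    return True
```

Your cron-triggered script would call `is_new_image(r.content)` and only write the file to disk when it returns True.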
Alternatively, if the cam site API does push notifications then you may be notified when a new image is present.
I want to get info via Python from an API that updates continuously (it is live, for example live video or live monitoring). I want to stop the GET request after an interval (for example 1 second), process the information received, and then repeat the cycle.
Any ideas? (I am currently using the requests module, but I do not know how to stop receiving data and then process it.)
I might be off here, but if you hit an endpoint at a specific time, it should return the JSON at that particular moment. You could then store it and use it in whatever process you have created.
If you want to hit it again, you would just use requests to hit the endpoint.
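A minimal sketch of that poll-then-process cycle, using only the standard library (the same idea works with requests); the URL and `process()` below are placeholders:

```python
import json
import time  # used in the loop sketch at the bottom
from urllib.request import urlopen


def poll_once(url: str, timeout: float = 1.0):
    """Fetch whatever the endpoint reports at this moment; the timeout
    aborts the request if the server keeps the connection open."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode())


# Hypothetical usage: fetch, process, then repeat each second.
# while True:
#     data = poll_once("https://example.com/live")
#     process(data)
#     time.sleep(1)
```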
How to get the properties as shown on the image (Blocked, DNS resolution, Connecting ...) after sending the request?
From firefox, the waiting time = ~650ms
From python, requests.Response.elapsed.total_seconds() = ~750ms
Since the results differ, I would like a more detailed breakdown like the one shown in Firefox developer mode.
You can only get the total time of the request, because the response object itself doesn't know any more than that.
The per-phase details are only recorded by the component that actually handles the request and starts/stops a timer around each step.
So you need to track the times yourself inside your connection framework, or have a look at the Firefox APIs for "timings" (there are several related APIs, so you may find something usable for your case). The main point is that you can't get this directly from your script alone: the request and response are fired and caught by lower-level components, and the timing happens in between.
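As a sketch of tracking the phases yourself with the standard library: the steps Firefox reports (DNS resolution, connecting, TLS setup) can be timed individually by performing each one explicitly instead of letting requests do it all at once. This measures connection setup only, not the full request:

```python
import socket
import ssl
import time


def phase_timings(host: str, port: int = 443, use_tls: bool = True) -> dict:
    """Measure DNS, TCP-connect and TLS-handshake times separately,
    roughly mirroring the phases shown in browser dev tools."""
    timings = {}

    # DNS resolution
    t0 = time.perf_counter()
    family, _, _, _, addr = socket.getaddrinfo(
        host, port, proto=socket.IPPROTO_TCP)[0]
    timings["dns"] = time.perf_counter() - t0

    # TCP connect
    t1 = time.perf_counter()
    sock = socket.create_connection(addr[:2], timeout=10)
    timings["connect"] = time.perf_counter() - t1

    if use_tls:
        # TLS handshake
        t2 = time.perf_counter()
        ctx = ssl.create_default_context()
        sock = ctx.wrap_socket(sock, server_hostname=host)
        timings["tls"] = time.perf_counter() - t2

    sock.close()
    return timings
```

Lower-level clients such as pycurl also expose per-phase timings directly if you would rather not do the bookkeeping yourself.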
My frontend web app is calling my python Flask API on an endpoint that is cached and returns a JSON that is about 80,000 lines long and 1.7 megabytes.
It takes my UI about 7.5 seconds to download all of it.
It takes Chrome when calling the path directly about 6.5 seconds.
I know that I can split up this endpoint for performance gains, but out of curiosity, what are some other great options to improve the download speed of all this content?
Options I can think of so far:
1) compressing the content. But then I would have to decompress it on the frontend
2) Use something like gRPC
Further info:
My Flask server is using WSGIServer from gevent, and the endpoint code is below. PROJECT_DATA_CACHE is the already-JSONified data that is returned:
@blueprint_2.route("/projects")
def getInitialProjectsData():
    global PROJECT_DATA_CACHE
    if PROJECT_DATA_CACHE:
        return PROJECT_DATA_CACHE
    else:
        LOGGER.debug('No cache available for GET /projects')
        updateProjectsCache()
        return PROJECT_DATA_CACHE
Maybe you could stream the file? I cannot see any way to transfer a file 80,000 lines long without some kind of download or wait.
This would be an opportunity to compress and decompress it, like you suggested. Definitely make sure that the JSON is minified.
One way to minify a JSON: https://www.npmjs.com/package/json-minify
Streaming a file:
https://blog.al4.co.nz/2016/01/streaming-json-with-flask/
It also really depends on the project, maybe you could get the users to download it completely?
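On option 1 from the question: with HTTP content negotiation the frontend does not have to decompress anything by hand. If the response carries a `Content-Encoding: gzip` header, the browser decompresses it transparently before your JavaScript ever sees the body. A minimal sketch (the Flask wiring in the note below is an assumption about your setup):

```python
import gzip
import json


def gzip_json(payload) -> bytes:
    """Serialize to minified JSON, then gzip it. Served with
    Content-Encoding: gzip, the browser decompresses it transparently."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)
```

In Flask, something like `Response(gzip_json(data), mimetype="application/json", headers={"Content-Encoding": "gzip"})` would serve it; repetitive JSON of this kind typically shrinks to a small fraction of its original size.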
The best way to do this is to break your JSON into chunks and stream it by passing a generator to the Response. You can then render the data as you receive it or show a progress bar displaying the percentage that is done. I have an example of how to stream data as a file is being downloaded from AWS s3 here. That should point you in the right direction.
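The generator approach can be sketched without any framework specifics; `stream_json` below is a hypothetical helper that emits a JSON array in pieces:

```python
import json


def stream_json(items):
    """Yield a JSON array piece by piece, so the client can start
    receiving (and rendering) before serialization is finished."""
    yield "["
    for i, item in enumerate(items):
        if i:
            yield ","
        yield json.dumps(item, separators=(",", ":"))
    yield "]"
```

In Flask you would return `Response(stream_json(PROJECT_DATA), mimetype="application/json")`; Flask consumes the generator and sends the output as a chunked response.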
I am working on a Telegram bot that displays images from several webcams upon request. I fetch the images from URLs and then send them to the user (using bot.sendPhoto()). My problem is that for any given webcam the filename does not change, and it seems the photo is served from Telegram's cache, so it displays the image from the first time that image was requested.
I have thought about downloading the image from the URL, saving it under a variable name (such as one with a timestamp in it), and then sending it to the chat, but this seems like an inelegant solution and I was hoping for something better, like forcing the image not to be cached on the Telegram server.
I am using the python-telegram-bot wrapper, but I am not sure that it's specific to that.
Any ideas? I have tried searching but so far am turning up little.
Thanks in advance.
I had the same problem too, but I've found a simple solution.
When you request the image, add a timestamp parameter to the image link.
Example:
http://www.example.com/img/img.jpg?a=TIMESTAMP
Where TIMESTAMP is the timestamp function based on the language you are using.
Simple but tricky ;)
I think the best way is to do the same as in React, where repeated calls to the same URL are also checked against a cache first.
If you are using Python, a straightforward way is:
import datetime

timestamp = datetime.datetime.now().isoformat()
# The statement above returns something like: '2013-11-18T08:18:31.809000'
pic_url = '{0}?a={1}'.format(img_url, timestamp)
Hope that helps!
I had the same problem. I wanted to create a bot which sends an image taken by a webcam on a ski slope (webcam.example.com/image.jpg). Unfortunately, the filename, and so the URL, never updates, and Telegram always sends the cached image. So I decided to alter the URL passed to the API.
In order to achieve this, I wrote a simple PHP page (example.com/photo.php) which redirects to the original URL of the photo. After that, I created a folder (example.com/getphoto/) on my webspace with a .htaccess file inside. The .htaccess redirects all requests in this folder to the photo.php page, which in turn redirects to the image (webcam.example.com/image.jpg). So you can append anything to the URL of the folder and still get the picture (e.g. example.com/getphoto/42 or example.com/getphoto/hrte8437g).
The Telegram API seems to cache photos by URL, so if you always add a different ending to the URL passed to the API, Telegram doesn't use the cached version and sends the current image instead. The easiest way to always change the URL is to append the current date.
example.com/photo.php
<?php
header("Location: http://webcam.example.com/image.jpg");
die();
?>
example.com/getphoto/.htaccess
RewriteEngine on
RewriteRule ^(.*)$ http://example.com/photo.php
in Python:
from time import gmtime, strftime

bot.sendPhoto(chat_id, 'http://example.com/getphoto/' + strftime("%Y-%m-%d_%H-%M-%S", gmtime()))
This workaround should also work in other languages like java or php. You just need to change the way to get the current date.
Hello, I would like to do a simple bandwidth test. The size of the default HTML page is 10 MB. The upload speed of the server is 5 Mbps, so under no circumstances can the 10 MB be transferred within 10 seconds. My plan is to start a timer on the GET request, and 10 seconds later I should be able to read either the percentage or the total number of bytes sent to one particular client. So my question here is: how do I get the percentage or the total byte count?
Approach 1
So the simplest solution would be to use this open-source speed test, which will show in the browser the download speeds clients are getting from your specific server. It will also show the upload speeds to your server. This particular solution uses PHP and Perl on the server side. There is a related question, Python speedtest.net, or equivalent, that didn't have an answer along the lines you are looking for.
Approach 2
You can do the download test using JavaScript and AJAX, which lets you monitor the download progress of a file from the web browser. For the upload portion you can use the same technique of monitoring an AJAX upload.
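Back on the original question of counting bytes: the count is easiest to observe from the client side. A standard-library sketch that downloads for at most a fixed interval and reports the bytes received plus the implied bandwidth (the URL is a placeholder):

```python
import time
from urllib.request import urlopen


def measure_download(url: str, duration: float = 10.0,
                     chunk_size: int = 64 * 1024):
    """Read from `url` for at most `duration` seconds; return the bytes
    received and the implied bandwidth in Mbit/s."""
    received = 0
    start = time.perf_counter()
    with urlopen(url) as resp:
        while time.perf_counter() - start < duration:
            chunk = resp.read(chunk_size)
            if not chunk:  # server finished before the interval elapsed
                break
            received += len(chunk)
    elapsed = time.perf_counter() - start
    mbps = received * 8 / elapsed / 1e6
    return received, mbps
```

Dividing the received count by the page's known size (10 MB here) gives the percentage after the 10-second window.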