Tornado: Websocket connection limit - python

I am developing a web application with Tornado and have encountered the following problem:
I can't run more than 6 instances of my application in one browser, probably because each instance opens a websocket connection to the Tornado server. I use the standard WebSocketHandler class. The connections close properly, i.e. if I close the 6th tab, I am able to open another application tab.
Is there any way to circumvent it? I will provide any additional information if needed.
EDIT: Connection information (I have 6 identical tabs here; the 7th won't load).

Are you sure the limitation is not on the browser? I've seen the same issue (long-polling requests, 7th or 8th won't load), but opening the URL in another browser or location works fine.
Edit: each browser does indeed have a limit on simultaneous persistent connections per server, as well as a global limit. See this question, and especially this response, which has more up-to-date values.
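For what it's worth, the server side can confirm this. Below is a minimal sketch that logs the number of open websocket connections in Tornado (the handler path and port are placeholders, not taken from your application); if the count tops out at 6 while the browser refuses the 7th tab, the ceiling is the browser's per-host limit rather than anything in Tornado:

import tornado.ioloop
import tornado.web
import tornado.websocket

open_connections = set()

class AppSocket(tornado.websocket.WebSocketHandler):
    def open(self):
        open_connections.add(self)
        print("opened, now %d connections" % len(open_connections))

    def on_message(self, message):
        self.write_message(message)  # simple echo, just for the sketch

    def on_close(self):
        open_connections.discard(self)
        print("closed, now %d connections" % len(open_connections))

if __name__ == "__main__":
    app = tornado.web.Application([(r"/ws", AppSocket)])
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()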

Related

How to close a SolrClient connection?

I am using SolrClient for python with Solr 6.6.2. It works as expected but I cannot find anything in the documentation for closing the connection after opening it.
from SolrClient import SolrClient

def getdocbyid(docidlist):
    for doc_id in docidlist:
        solr = SolrClient('http://localhost:8983/solr', auth=("solradmin", "Admin098"))
        doc = solr.get('Collection_Test', doc_id=doc_id)
        print(doc)
I do not know if the client closes it automatically or not. If it doesn't, wouldn't it be a problem if several connections are left open? I just want to know if there is any way to close the connection. Here is the link to the documentation:
https://solrclient.readthedocs.io/en/latest/
The connections are not kept around indefinitely. The standard timeout for any persistent http connection in Jetty is five seconds as far as I remember, so you do not have to worry about the number of connections being kept alive exploding.
The Jetty server will also just drop the connection if required, as it's not required to keep it around as a guarantee for the client. SolrClient uses a requests session internally, so it reuses the underlying connection (keep-alive) for subsequent queries. If you run into issues with this, you can keep a set of clients available as a pool in your application instead, and request an available client rather than creating a new one each time.
I'm however pretty sure you won't run into any issues with the default settings.
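If you still want to be conservative about the number of connections, a simple option is to reuse one client (and therefore one underlying requests session) for all lookups. A sketch of your function rewritten that way, using only the calls already shown above:

from SolrClient import SolrClient

def getdocbyid(docidlist):
    # One client for all lookups, so the underlying session is reused.
    solr = SolrClient('http://localhost:8983/solr',
                      auth=("solradmin", "Admin098"))
    for doc_id in docidlist:
        doc = solr.get('Collection_Test', doc_id=doc_id)
        print(doc)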

Scraping a lot of pages with multiple machines (with different IPs)

I have to scrape information from several web pages and am using BeautifulSoup + requests + threading. I create many workers; each one grabs a URL from the queue, downloads the page, scrapes data from the HTML and puts the result into a result list. Here is my code; I thought it was too long to paste it here directly.
But I ran into the following problem: this site probably limits the number of requests per minute from one IP, so scraping is not as fast as it could be. However, I have a server with a different IP, so I thought I could make use of it.
I thought of creating a script for the server that would listen on some port (with sockets), accept URLs, process them, and then send the result back to my main machine.
But I'm not sure whether a ready-made solution already exists; the problem seems common to me. If there is one, what should I use?
Most web servers use rate limiting to save resources and protect themselves from DoS attacks; it's a common security measure.
Now, looking into your problem, these are the things you could do.
Put some sleep between requests (it will bring down the requests-per-second count, and the server may not treat your code as a robot); see the sketch after this list.
If you are using an internet connection on your home computer and it does not have a static IP address, you may try rebooting your router (for example through its simple telnet interface) every time your requests get denied, so that you get a new IP.
If you are using a cloud server/VPS, you can buy multiple IP addresses and keep switching your requests through different network interfaces; this also helps lower the requests per second per IP.
You will also need to find the real cause of the denial from the server you are pulling webpages from; it is too general a topic to write a definitive answer, but here are some things you can do to find out what is causing your requests to be denied, so you can choose one of the methods above to fix the problem.
Decrease the requests-per-second count and see how the web server behaves.
Set the HTTP request headers to simulate a web browser and see whether it still blocks you.
The bandwidth of your internet connection or the connection limit of your machine could also be a problem; use netstat to monitor the number of active connections before and after your requests start being blocked.
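A rough sketch of a worker that combines the first suggestion (sleep between requests) with browser-like headers; the URLs, delay, and header values are placeholders, not anything from your original code:

import time
import threading
import queue
import requests
from bs4 import BeautifulSoup

# Placeholder values; tune the delay and headers for the target site.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
DELAY_SECONDS = 2.0

url_queue = queue.Queue()
results = []

def worker():
    while True:
        try:
            url = url_queue.get_nowait()
        except queue.Empty:
            return
        resp = requests.get(url, headers=HEADERS, timeout=30)
        soup = BeautifulSoup(resp.text, "html.parser")
        results.append(soup.title.string if soup.title else None)
        time.sleep(DELAY_SECONDS)  # rate-limit this worker's requests

for u in ["http://example.com/page1", "http://example.com/page2"]:
    url_queue.put(u)

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)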

How to properly forward requests through proxies with MITMProxy?

We are trying to use MITMProxy to do custom forwarding of requests made from the Firefox browser, so that they go through one of several proxies selected at runtime. It is performing too slowly for our purposes. Please bear in mind we are running this in Python 2.7.
The process is as follows:
Firefox sends request to configured MITMProxy.
MITMProxy takes the request from Firefox, rebuilds it with the requests library, and gets the data from the target server through a given proxy (the proxies are not controlled by us and require authentication).
The response from the proxy-forwarded request gets converted into a response for the browser.
MITMProxy returns the data to the browser.
The situation seems to be that this process is too slow, which I believe could be for a number of reasons. It could be that there are settings enabled that bring down performance (too much logging, for example), that the procedure being used is not the right one for the job (totally plausible), or something completely different.
How can we make this run faster?
Thanks very much! Any and all suggestions will be appreciated!
In this particular case, we were using the script feature of MITMProxy, which meant every modified request was executed synchronously (i.e., we could not use proper asynchronous behavior). This naturally became an issue once we started using the scripts with more clients.
As @Puciek mentioned in his comment, this was more a design issue than a problem with the library.
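For reference, here is a rough sketch of the forwarding step described in the question, with a hypothetical proxy URL; reusing a single requests.Session (rather than opening a new connection per request) avoids redoing the TCP and proxy handshake every time, which is a common source of slowness in this kind of setup:

import requests

# Hypothetical authenticated upstream proxy; replace with the real ones.
PROXIES = {
    "http": "http://user:password@proxy.example.com:8080",
    "https": "http://user:password@proxy.example.com:8080",
}

session = requests.Session()  # keeps connections alive across forwarded requests

def forward(method, url, headers, body):
    """Re-issue an intercepted request through the upstream proxy."""
    resp = session.request(method, url, headers=headers, data=body,
                           proxies=PROXIES, timeout=30, allow_redirects=False)
    return resp.status_code, dict(resp.headers), resp.content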

Monitor the Download process in Chrome

I am trying to hack together a Python script to monitor ongoing downloads in Chrome and shut down my PC automatically after the download process finishes. I know little JavaScript and am considering using the PyJs library if required.
1) Is this the best approach? I don't need the app to be portable, just working.
2) How would you identify the download process?
3) How would you monitor the download progress? Apparently the Chrome API doesn't provide a specific function for it.
Nice question, maybe because I can relate to the need to automate the shutdown. ;)
I just googled. There happens to be an experimental API but only for the dev channel as of now. I am not on a dev channel to try that out, so I just hope I am pointing you in the right direction.
One approach would be:
Have a Python HTTP server listening on some port XYZ
In your extension, add permission for the URL http://localhost:XYZ/
In your extension, you could use:
chrome.downloads.search(query, function (arrayOfDownloadItem) { ... })
where query is an instance of DownloadQuery with its state property set to in_progress.
You could probably check for the length of arrayOfDownloadItem.
If it's zero, create a new XMLHttpRequest to your HTTP server endpoint and let the server shut down your machine (a rough sketch of such a server is below).
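A minimal sketch of that shutdown server, assuming a placeholder port "XYZ" and a Windows shutdown command (adjust for your OS):

import os
from http.server import BaseHTTPRequestHandler, HTTPServer

PORT = 8345  # placeholder for port "XYZ"

class ShutdownHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/shutdown':
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'shutting down')
            # Windows; use something like "shutdown -h +1" on Linux.
            os.system('shutdown /s /t 60')
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == '__main__':
    HTTPServer(('localhost', PORT), ShutdownHandler).serve_forever()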
HTH

Python desktop software with web interface

I am building desktop software with a Python backend and a web interface. Currently, I have to start the backend, then open up the interface in the browser. If I accidentally refresh the page, that clears everything! What I'd like to do is start the application and have a fullscreen browser window appear (using Chrome) - that shouldn't be difficult. I have two questions:
Can refresh be disabled?
Is it possible to hook into closing my program when the web UI is closed?
Update:
What I'm looking for is more like geckofx: a way to embed a Chrome webpage view in a desktop app, except I'm using Python rather than C#.
Your first question is a dup of disable f5 and browser refresh using javascript.
For your second question, it depends on what kind of application you're building, but generally, the answer is no.
If you can rely on the JS to catch the close and, e.g., send a "quit" message to the service before closing, that's easy. However, there are plenty of cases where that won't work, because the client has no way to catch the close.
If, on the other hand, you can rely on a continuous connection between the client and the service—e.g., using a WebSocket—you can just quit when the connection goes down. But usually, you can't.
Finally, if you're writing something specifically for one platform, you may be able to use OS-level information to handle the cases that JS can't. (For example, on OS X, you can attach a listener to the default NSDistributedNotificationCenter and be notified whenever Chrome quits.) But generally, you can't rely on this, and even when you can, it may still not cover 100% of the cases.
So, you need to use the same tricks that every "real" web app uses for managing sessions. For example, the client can send a keepalive every 5 minutes, and the server can quit if it doesn't get any requests for 5 minutes (a rough sketch of this is below). You can still have an explicit "quit" command, you just can't rely on always receiving it. If you want more information on ways to do this, there are probably 300 questions on SO about it.
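A minimal sketch of that keepalive watchdog, assuming the backend exposes some /keepalive endpoint for the page to ping (the endpoint name and timeout are placeholders):

import os
import threading
import time

TIMEOUT = 5 * 60          # quit if no keepalive arrives for 5 minutes
last_seen = time.time()

def note_keepalive():
    """Call this from whatever handler serves the /keepalive request."""
    global last_seen
    last_seen = time.time()

def watchdog():
    while True:
        time.sleep(30)
        if time.time() - last_seen > TIMEOUT:
            os._exit(0)   # no client checked in recently; stop the backend

threading.Thread(target=watchdog, daemon=True).start()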
Instead of embedding Chrome, you may embed only WebKit (I don't know about Windows, but on Mac and Linux it is easy); a sketch of one way to do this from Python follows below.
The application logic seems to be on the server side, with the browser used only as the interface. If that is the case, you may put an "onbeforeunload" handler on the body tag and call a JS function that sends an AJAX request telling the server to die.
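One hypothetical way to do the embedding from Python is PyQt's Chromium-based QWebEngineView (pywebview or CEF Python would work too); this is only a sketch, with the backend URL as a placeholder, and it assumes PyQt5 plus PyQtWebEngine are installed:

import sys
from PyQt5.QtCore import QUrl
from PyQt5.QtWidgets import QApplication
from PyQt5.QtWebEngineWidgets import QWebEngineView

app = QApplication(sys.argv)
view = QWebEngineView()
view.load(QUrl("http://localhost:8000"))   # placeholder: your backend's UI
view.showFullScreen()
# When the embedded window closes, the app quits; stop the backend here.
app.aboutToQuit.connect(lambda: print("UI closed; shut down the backend"))
sys.exit(app.exec_())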
