Python Multiprocessing Rate Limit

Suppose you want to call an open API (no key required) that has a rate limit of 10 requests per second. I am calling a function (with multiprocessing.Pool) which requests data from the API and processes it. Is there some way to switch IPs in order not to get blocked, perhaps with a list of IPs/proxies? Can somebody tell me a way to get this done?
Thanks!

You unfortunately can't retrieve data from an API with multiprocessing using different IPs, because the API sees the IP each request actually arrives from. Many services also return HTTP 429 (Too Many Requests), which means you can be timed out for sending too many requests.

There are certainly a lot of hacks you can use to get around rate limiting, but you should take a moment and ask yourself 'should I?' You'd likely be violating the service's terms of service, and in some jurisdictions you could be opening yourself up to legal action if you are caught. Additionally, many services implement smart bot detection which can identify bot behavior from request patterns and block multiple IPs. In extreme cases, I've seen teams block the IP range for an entire country to punish botting.
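Rather than rotating IPs, the simpler (and ToS-friendly) route is to throttle on the client side so you never exceed the limit. Below is a minimal sliding-window sketch; the 10-requests-per-second figure comes from the question, and the actual API call is left as a placeholder. To use it with multiprocessing.Pool you would share one limiter across workers, e.g. via a multiprocessing.Manager:

```python
import time
from collections import deque

class RateLimiter:
    """Sliding-window limiter: allow at most `max_calls` per `period` seconds."""
    def __init__(self, max_calls=10, period=1.0):
        self.max_calls = max_calls
        self.period = period
        self.calls = deque()  # timestamps of recent calls

    def wait(self):
        """Block until a request slot is free, then claim it."""
        while True:
            now = time.monotonic()
            # Discard timestamps that have left the window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Sleep until the oldest timestamp expires.
            time.sleep(self.period - (now - self.calls[0]))

limiter = RateLimiter(max_calls=10, period=1.0)
start = time.monotonic()
for _ in range(25):
    limiter.wait()
    # requests.get(api_url) would go here
elapsed = time.monotonic() - start
print(f"25 calls took {elapsed:.2f}s")  # at 10/s this must span roughly 2+ seconds
```

This stays under the limit no matter how many workers call `wait()` through a shared proxy, at the cost of workers idling when the window is full.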

Related

How many requests can I make to google.com from the same IP without getting blocked, or something like that?

I'm writing a program in Python and want to check the internet connection in a loop. I do this with the requests module and it all works fine, but my question is: how many requests are allowed per day, or per hour? At the moment I check the connection every 2 seconds, so every 2 seconds google.com gets a request from my IP. That makes more than 40,000 requests a day if my software runs for 24 hours.
Is this a problem? I can't use proxies, because I won't have access to or control over the customers' computers or their settings once my software is deployed.
There are rate limits on all Google public and internal APIs.
However, the documentation does not clearly spell out the exact rate limit on google.com.
If you want to check connectivity, it might be better to use a DNS server such as 1.1.1.1.
If you want programmatic access to Google search, you should instead use the Custom Search API: https://developers.google.com/custom-search/v1/overview
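For the connectivity-check use case, a sketch that opens a TCP connection to a public DNS resolver instead of issuing HTTP requests to google.com (1.1.1.1 on port 53 is Cloudflare's public DNS; the timeout value is an arbitrary choice):

```python
import socket

def is_online(host="1.1.1.1", port=53, timeout=3.0):
    """Return True if a TCP connection to a public DNS server succeeds.
    Far cheaper than an HTTP request, and no rate-limit worries."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(is_online())
```

Because this only performs a TCP handshake, it generates a fraction of the traffic of a full HTTP GET, and public DNS resolvers are built to handle this kind of load.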

Load spike protection for Django Channels

Is there anything specific that can be done to make a Django Channels server less susceptible to a light or accidental DDoS attack, or to a general load increase from WebSocket/HTTP clients? Since Channels is not truly asynchronous (there are still workers behind the scenes), I feel it would be quite easy to take down a Channels-based website, even with fairly simple hardware. I'm currently building an application on Django Channels and will run some tests later to see how it holds up.
Is there some form of throttling built into Daphne? Should I implement some application-level throttling? This would still be slow, since a worker still handles the throttled request, but the request could be processed much faster. Is there anything else I can do to attempt to thwart these attacks?
One thought I had was to always ensure there are workers designated for specific channels - that way, if the websocket channel gets overloaded, HTTP will still respond.
Edit: I'm well aware that low-level DDoS protection is the ideal solution, and I understand how DDoS attacks work. What I'm looking for is a solution built into Channels that can help handle an increased load like that. Perhaps the ability for Daphne to scale one channel up and another down to compensate, or a throttling method that reduces the weight of each request after a certain point.
I'm looking for a daphne/channels specific answer - general answers about DDoS or general load handling are not what I'm looking for - there are lots of other questions on SO about that.
I could also control throttling based on who's logged in and who is not - a throttle for users who are not logged in could help.
Edit again: please read the whole question! I am not looking for general DDoS mitigation advice or explanations of low-level approaches. I'm wondering whether Daphne has support for something like:
Throttling
Dynamic worker assignment based on queue size
Middleware to provide priority to authenticated requests
Or something of that nature. I am also going to reach out to the Channels community directly on this as SO might not be the best place for this question.
I've received an answer from Andrew Godwin. He doesn't use StackOverflow so I'm posting it here on his behalf.
Hi Jamie,
At the moment Channels has quite limited support for throttling - it pretty much consists of an adjustable channel size for incoming connections which, when full, will cause the server to return a 503 error. Workers are load-balanced based on availability due to the channels design, so there's no risk of a worker gaining a larger queue than others.
Providing more advanced DoS or DDoS protection is probably not something we can do within the scope of Channels itself, but I'd like to make sure we provide the appropriate hooks. Were there particular things you think we could implement that would help you write some of the things you need?
(It's also worth bearing in mind that right now we're changing the worker/consumer layout substantially as part of a major rewrite, which is going to mean different considerations when scaling, so I don't want to give too precise advice just yet)
Andrew
He's also written about the 2.0 migration in his blog.
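Since Channels itself (at the time) offered little beyond the channel-size/503 behaviour Andrew describes, application-level throttling has to be rolled by hand. A minimal token-bucket sketch, keyed by something like client IP or session; the class and its parameters are illustrative, not a Channels API, and you would call `allow()` from your consumer before doing any real work:

```python
import time
from collections import defaultdict

class ConnectionThrottle:
    """Per-client token bucket (hypothetical helper, not a Channels API):
    call allow() from your connection handler and reject the request when
    it returns False."""
    def __init__(self, rate=1.0, burst=10):
        self.rate = rate    # tokens refilled per second
        self.burst = burst  # bucket capacity
        self.buckets = defaultdict(lambda: (float(burst), time.monotonic()))

    def allow(self, client_id):
        tokens, last = self.buckets[client_id]
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self.buckets[client_id] = (tokens, now)
        return allowed

throttle = ConnectionThrottle(rate=1.0, burst=10)
results = [throttle.allow("anon-1.2.3.4") for _ in range(12)]
print(results)  # the initial burst of 10 is allowed, then rejections begin
```

A stricter bucket (lower `rate`/`burst`) for anonymous clients and a looser one for authenticated users would implement the logged-in/logged-out distinction suggested in the question.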
I am only answering the first question. Basically, it is impossible to be 100% protected from DDoS attacks, because it always comes down to a battle of resources. If the server-side resources are greater than the attacker-side resources, the server will not go down (though performance may be degraded); if not, the server goes down [no reference required]. Why is 100% protection impossible, you may ask? Essentially, your server has "crashed" if people cannot connect to it [https://en.wikipedia.org/wiki/Crash_(computing)#Web_server_crashes --- Web server crashes, sentence 1]. So if you try to protect your server by shutting it down for 5 minutes every time 10,000 connections are made in a second, the DDoS has succeeded: it "crashed" your server. The only DDoS protection I know of that should work is Cloudflare (https://www.cloudflare.com/ddos/). It absorbs the impact of the attack with its 10 Tbps network backbone. But even it does not offer 100% DDoS protection, because once its 10 Tbps is saturated, your server will go down too. So, I hope that helped.
DDoS = Distributed Denial of Service
The 'Distributed' part is the key: you can't know you're being attacked by 'someone' in particular, because requests come from all over the place.
Your server will only accept a certain number of connections. If the attacker manages to create so many connections that nobody else can connect, you're being DDoS'ed.
So, in essence you need to be able to detect that a connection is not legit, or you need to be able to scale up fast to compensate for the limit in number of connections.
Good luck with that!
DDoS protection should really be a service from your cloud provider, at the load balancer level.
Companies like OVH use sophisticated machine learning techniques to detect illegitimate traffic and ban the IPs acting out in quasi-real time.
For you to build such detection machinery would be a huge investment that is probably not worth your time (unless your website is so critical that it will lose millions of dollars if it's down for a bit).
There's not a lot you can do about DDoS; however, there are some neat 'tricks', depending on how many resources you have at your disposal and how badly somebody wants to take you offline.
Are you offering a fully public service that requires a direct connection to the resource you are trying to protect?
If so, you're just going to need to 'soak up' the DDoS with the resources you have, by scaling up and out, or even elastically. Either way, it's going to cost you money!
Or make it harder for the attacker to consume your resources. There are a number of methods to do this.
If your service requires some kind of authentication, separate your authentication services from the resource you are trying to protect.
In many applications, authentication and the 'service' run on the same hardware. That's a DoS waiting to happen.
Only let fully authenticated users access the resources you are trying to protect, using dynamic firewall filtering rules. If you're authenticated, the gate to the resources opens (with a restricted QoS in place). If you're a well-known, long-term trusted user, access the resource at full bore.
Have a way of auditing users' resource behaviour (network, memory, CPU); if you see particular accounts using bizarre amounts, ban them or impose a limit, ultimately escalating to a firewall policy that drops their traffic.
Work with an ISP that has systems in place to drop traffic to your specification at the ISP border. OVH is your best bet. An ISP that exposes filtering and traffic dropping as an API, I wish they existed: basically moving your firewall filtering rules to the AS border. Nice! (fantasy)
It won't stop DDoS, but it will give you a few tools to keep the resources consumed by attackers at a manageable level. Attackers would have to turn to your authentication servers (possible), or compromise many user accounts, and already-authenticated users will still have access :-)
If the DDoS is consuming all your ISP bandwidth, that's a harder problem: move to a larger ISP, or move ISPs! :-) Hide your main resource, allow it to be moved dynamically, keep it on the move! :-)
Break the problem into pieces and apply DDoS controls to the smaller pieces. :-)
I've tried to give a fairly general answer, but there are a lot of 'it depends' here; each DDoS mitigation requires a 'skin, not tin' approach. Really, you need an anti-DDoS ninja on your team. ;-)
Take a look at distributed protocols. They may be the answer to DDoS.
Have fun.
Let's apply some analysis to your question. A DDoS is like a DoS but with friends. If you want to avoid DDoS exploitation, you need to minimize DoS possibilities. Thanks, Captain Obvious.
The first thing to do is make a list of what happens in your system and which resources are affected:
A TCP handshake is performed (SYN cookies are affected)
An SSL handshake comes later (entropy, CPU)
A connection is made to the channel layer...
Then monitor each resource and try to implement a counter-measure:
Protect against SYN floods by configuring your kernel parameters and firewall
Use entropy generators
Configure your firewall to limit the connections opened/closed in a short time (an easy way to minimize SSL handshakes)
...
Separate your big problem (DDoS) into many simple, easy-to-correct tasks. The hard part is getting a detailed list of the steps and resources.
Excuse my poor English.

Scraping a lot of pages with multiple machines (with different IPs)

I have to scrape information from several web pages using BeautifulSoup + requests + threading. I create many workers; each one grabs a URL from the queue, downloads it, scrapes data from the HTML and appends the result to a result list. I thought my code was too long to paste here.
But I ran into the following problem: the site probably limits the number of requests from one IP per minute, so scraping is not as fast as it could be. But I have a server with a different IP, so I thought I could make use of it.
I thought of creating a script for the server that would listen on some port (with sockets), accept URLs, process them, and then send the results back to my main machine.
But I'm not sure whether there is a ready-made solution; the problem seems common to me. If there is, what should I use?
Most web servers use rate limiting to save resources and protect themselves from DoS attacks; it's a common security measure.
Now, looking at your problem, these are the things you could do.
Put some sleep between requests (it will bring down the requests-per-second count, and the server may not treat your code as a robot).
If you are using an internet connection on your home computer and it does not have a static IP address, you could try rebooting your router (e.g. via its telnet interface) every time your requests get denied.
If you are using a cloud server/VPS, you can buy multiple IP addresses and keep switching your requests through different network interfaces; this can also help lower the requests per second.
You will need to check the real cause of the denial from the server you are pulling pages from; it is too general a topic for any definitive answer. Here are some things you can do to find out what is causing your requests to be denied, so you can choose one of the aforementioned methods to fix the problem.
Decrease the requests-per-second count and see how the web server performs.
Set the HTTP request headers to simulate a web browser and see whether it still blocks you.
The bandwidth of your internet connection or the network connection limit of your machine could also be the problem; use netstat to monitor the number of active connections before and after your requests start being blocked.
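The first two suggestions above (sleeping between requests and sending browser-like headers) can be sketched like this, using only the standard library; the header values, delay range, and function name are illustrative. With the `requests` library the equivalent would be `requests.get(url, headers=HEADERS)`:

```python
import random
import time
import urllib.request

# Browser-like headers; the values are illustrative, not magic.
HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/120.0 Safari/537.36"),
    "Accept-Language": "en-US,en;q=0.9",
}

def polite_fetch(url, min_delay=1.0, max_delay=3.0):
    """Sleep a random interval before the request so the timing looks less
    robotic, then fetch the page with browser-like headers."""
    time.sleep(random.uniform(min_delay, max_delay))
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()
```

The randomized delay matters as much as the headers: a perfectly regular interval between requests is itself a strong bot signal.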

Get alerts for upload activity with libtorrent (rasterbar)

I am trying to get alerts for the data that I'm sending peers. My code works great for incoming blocks by looking for libtorrent.block_finished_alert but I want to know when and what I am sending to peers. I can't find an alert that will give me the equivalent for outbound transfers. I need to know the file and offset (the peer request).
Is there an alert for outbound block requests?
I'm using the python bindings but C++ code is fine too.
The closest thing you have to alerts is probably stats_alert. It will tell you the number of payload bytes uploaded, though it won't give you the granularity of a full block being sent.
If you'd like to add an alert, have a look at bt_peer_connection::write_piece.
Patches are welcome!

Multiple chat rooms - Is using ports the only way ? What if there are hundreds of rooms?

Need some direction on this.
I'm writing a chat room browser-application, however there is a subtle difference.
These are collaboration chats where one person types and the other person can see, live, every keystroke the other person enters as they type.
Also, the chat space is not a single line but a textarea, like the one used here (on SO) to enter a question.
All keystrokes, including tabs/spaces/enter, should be visible live to the other person. And only one person can type at a time (I guess the locking should be trivial).
I haven't written a multiple-chatroom application. A simple client/server where both communicate over a port is something I've written.
So here are the questions
1.) How is a multiple-chatroom application written? Is it also port-based?
2.) Showing the other person's every keystroke as they type is, I guess, possible through Ajax. Is there any other mechanism available?
Note: I'm going to use a Python framework (web2py), but I don't think the framework matters here.
Any suggestions are welcome, thanks !
The Wikipedia entry for Comet (programming) has a pretty good overview of different approaches you can take on the client (assuming that your client's a web browser), and those approaches suggest the proper design for the server (assuming that the server's a web server).
One thing that's not mentioned on that page, but that you're almost certainly going to want to think about, is buffering input on the client. I don't think it's premature optimization to consider that a multi-user application in which every user's keystroke hits the server is going to scale poorly. I'd consider having user keystrokes go into a client-side buffer, and only sending them to the server when the user hasn't typed anything for 500 milliseconds or so.
You absolutely don't want to use ports for this. That's putting application-layer information in the transport layer, and it pushes application-level concerns (the application's going to create a new chat room) into transport-level concerns (a new port needs to be opened on the firewall).
Besides, a port's just a 16-bit field in the packet header. You can do the same thing in the design of your application's messages: put a room ID and a user ID at the start of each message, and have the server sort it all out.
The thing that strikes me as a pain about this is figuring out, when a client requests an update, what should be sent. The naive solution is to retain a buffer for each user in a room, and maintain an index into each (other) user's buffer as part of the user state; that way, when user A requests an update, the server can send down everything that users B, C, and D have typed since A's last request. This raises all kinds of issues about memory usage and persistence that don't have obvious simple solutions.
The right answers to the problems I've discussed here are going to depend on your requirements. Make sure those requirements are defined in great detail. You don't want to find yourself asking questions like "should I batch together keystrokes?" while you're building this thing.
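The room-ID/user-ID framing suggested above can be sketched as a plain string protocol; the frame layout here is an assumption for illustration, not an established standard:

```python
def make_frame(room_id, user_id, text):
    """Build the frame a client sends for each (batched) keystroke burst."""
    return f"{room_id} {user_id} {text}"

def parse_frame(raw):
    """Split a 'ROOM USER text...' frame into its parts; the server can then
    route the text to everyone else in ROOM. The text may itself contain
    spaces, tabs, and newlines, so split at most twice."""
    room_id, user_id, text = raw.split(" ", 2)
    return room_id, user_id, text

frame = make_frame("lobby", "alice", "Hello,\tworld")
print(parse_frame(frame))  # ('lobby', 'alice', 'Hello,\tworld')
```

Because the server does the routing, creating a new room is purely an application-level event; no new ports or firewall changes are needed.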
You could try doing something like IRC, where the current "room" is sent from the client to the server "before" the text (/PRIVMSG #room-name Hello World), delimited by a space. For example, you could send ROOMNAME Sample text from the browser to the server.
Using AJAX would be the most reasonable option. I've never used web2py, but I'm guessing you could just use JSON to parse the data between the browser and the server, if you wanted to be fancy.
