How to reject requests from a bot with Flask - python

I recently built a site with Flask and published it successfully on PythonAnywhere. However, I keep getting requests from bots using weird URLs. They are clearly trying to access common WordPress pages, as all the URLs have "wp-" in the string.
I am trying to find a way to block these requests completely so as not to waste server resources. I have found a way to not respond with pages, but I still receive the GET requests. Is there any way to not even receive a GET request if its URL contains certain text?
What I have tried so far is:
@app.before_request
def before_request():
    if "wp-" in request.path:
        print("BOT INTRUDER!!")
        return Response(status=204)
My site no longer reacts to requests with "wp-" in the URL, but looking at the access log, I still receive the GET requests.
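A note on what is achievable here: by the time any application code runs, the server has already received the request, which is why it still shows up in the access log. Keeping such requests away entirely would need something in front of the application (a firewall, reverse proxy, or similar), and whether the hosting platform offers that is not something this page establishes. Inside the application, the closest option is to answer as cheaply as possible. The sketch below does that one layer below Flask, as a plain WSGI middleware. The class name, the helper functions, and the 403 status are illustrative choices, not part of the original code.

```python
class BlockProbePaths:
    """Hypothetical WSGI middleware: short-circuit paths containing a marker."""

    def __init__(self, wsgi_app, markers=("wp-",)):
        self.wsgi_app = wsgi_app
        self.markers = tuple(markers)

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        if any(marker in path for marker in self.markers):
            # Answer immediately; the wrapped application never runs.
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return self.wsgi_app(environ, start_response)


def demo_app(environ, start_response):
    """Stand-in for the real application, used only for this demonstration."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def call(app, path):
    """Invoke a WSGI app for one path and return (status, body)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    environ = {"PATH_INFO": path, "REQUEST_METHOD": "GET"}
    body = b"".join(app(environ, start_response))
    return captured["status"], body


wrapped = BlockProbePaths(demo_app)
print(call(wrapped, "/wp-login.php"))
print(call(wrapped, "/about"))
```

With a Flask application object named app, the middleware would be installed with app.wsgi_app = BlockProbePaths(app.wsgi_app). The request is still logged by the server, but no routing, hooks, or view code run for it.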

Related

How to bypass 'headless' reCaptcha V2?

I'm creating a bot using requests, BeautifulSoup, and possibly Twill. The bot will scrape a large number of forums and gather data from them. However, the forum I am currently working on (https://wearedevs.net/) uses reCaptcha V2 on its login page, so the bot cannot log in. I discovered this when I tried to log in through code: instead of returning a valid response and reloading the page, the site kept returning a 404 error. I thought it was an error in my code, but even Twill could not log in.
I need to be able to log in through the site so I can access features that guest users wouldn't be able to access.
I knew the site had reCaptcha, so I looked into a reCaptcha bypass. The issue is that it's not the visual reCaptcha but the "headless" version, whose badge appears in the bottom right corner of the page.
In other words, it's the reCaptcha that doesn't give you a captcha prompt but instead analyzes your behavior on the site and determines if you're a bot or not.
I suspected that the 404 was caused by reCaptcha determining that the requests came from a bot. So the second thing I attempted was sending a direct POST request from the code to the site's API, which is here:
https://wearedevs.net/api/v1/account/login
Along with the required JSON data, which is in this format:
{"g-recaptcha-response":"recaptcha-response-here", "username": "example_username", "password": "example_password", "token2fa": ""}
I didn't have a valid reCaptcha response to send to the server, so I tried excluding that from the JSON data but, while the request was successful, the server sent back an error saying that the login failed because a reCaptcha response was not present.
So then I tried using BeautifulSoup to send a request to the login page, grab the reCaptcha response, then include that in the JSON data to be sent, but I was unable to grab the reCaptcha response using BeautifulSoup.
I have tried Selenium, but I'm currently working in an environment in which a browser is not present, so Selenium won't work and therefore is not an option.
If anyone has any ways to bypass, or validate, the headless reCaptcha V2, please share and I would be grateful. Thanks!

Posting to ASP.NET URL keeps bringing me back to the same page via Python requests

I am using Python requests to send requests to a site built on the ASP.NET framework. One of the URLs is giving me trouble: the response is the exact same page that I posted to. The browser does not behave this way with the same URL; it refreshes with a new URL (though I do not think it is technically redirecting). What are some ways I can troubleshoot this? I would provide code and links, but the website is secured and requires authentication and a subscription.
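One thing worth checking, assuming the page is an ASP.NET Web Forms page (the question does not say so): such pages usually carry hidden form fields such as __VIEWSTATE and __EVENTVALIDATION, and a POST that leaves them out is commonly answered with the same page again. The sketch below uses only the standard library to collect the hidden fields from a page's HTML so they can be sent back together with the visible form values. The sample markup, field values, and the "query" field are made up for illustration.

```python
from html.parser import HTMLParser


class HiddenFieldCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return a dict of all hidden form fields found in the HTML text."""
    collector = HiddenFieldCollector()
    collector.feed(html)
    return collector.fields


# Made-up page fragment standing in for the response to the initial GET.
sample_page = """
<form method="post" action="./Default.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="query" value="" />
</form>
"""

form_data = hidden_fields(sample_page)
form_data["query"] = "example"  # add the visible fields you want to submit
print(form_data)
```

The resulting dict would be passed as the data argument of the POST, sent through the same session that fetched the page. Comparing it against the form data the browser sends (visible in the Network tab of the dev tools) is a quick way to spot a missing field.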

How do I send and receive a request using the requests library?

I have a task: from a page, get the text of all posts with more than 0 likes. As I understand it, I must first get the tokens and IDs of all possible posts from the page (which was not difficult), then use the requests library to send a request to the server with these IDs and read the response, since in the page source each post appears only as a form without any information about likes. But I don't know much about requests themselves, and I can't figure out how to make such a request and get the HTML of the post. Do I need a token? Tokens are usually used for security and are generated for each user.
[Screenshot: directly finding the assumed token and request]
[Screenshot: number of likes]
To do this, import the requests library and call requests.get(). A more detailed explanation can be found here: https://realpython.com/python-requests/.

Is there any Instagram Web API for the new version of the site?

You can send DMs in the new version, and I thought there would be some simple GET and POST requests for that, without needing access to the official Instagram API.
I don't want to use bots that emulate the app or similar, because I could get banned for that.
I tried looking at the XHR requests in the Network tab of the dev tools (Google Chrome), but I've never done that before and I'm having some trouble with it. I can see the requests, headers, and responses (where the messages are), but I can't work out how to do the same thing with Python, for example.
I'm looking for help with that, or for any ready-made solutions (not necessarily in Python; I think I can port them to Python or just use the language the API was written for).
Edit:
link looks like this (for the inbox page):
https://www.instagram.com/direct_v2/web/inbox/?persistentBadging=true&folder=0&limit=10&thread_message_limit=10
and a ton of headers
Instagram sends requests with a cursor to load the direct message data in chunks.
Each response contains prev_cursor and oldest_cursor.
The oldest_cursor value is the cursor you need to send to get the next chunk of messages.
When prev_cursor becomes MINCURSOR, you have reached the last chunk, that is, the chunk containing the first messages in the chat history.
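The cursor logic described above can be sketched as a small loop. This is only an illustration under assumptions: fetch_chunk is a hypothetical callable that performs the actual HTTP request, the first request is assumed to carry no cursor, and prev_cursor and oldest_cursor are assumed to be top-level keys of the decoded response (the real response may nest them). The fake pages exist only so the sketch runs offline.

```python
def iter_message_chunks(fetch_chunk):
    """Yield response chunks until prev_cursor equals "MINCURSOR".

    fetch_chunk(cursor) is a hypothetical callable that sends the request
    for one chunk and returns the decoded response as a dict.
    """
    cursor = None  # assumption: the first request is sent without a cursor
    while True:
        chunk = fetch_chunk(cursor)
        yield chunk
        if chunk.get("prev_cursor") == "MINCURSOR":
            break  # this was the chunk with the oldest messages
        cursor = chunk["oldest_cursor"]  # becomes the cursor of the next request


# Made-up responses keyed by the cursor that would be sent.
fake_pages = {
    None: {"items": ["m5", "m4"], "prev_cursor": "c2", "oldest_cursor": "c1"},
    "c1": {"items": ["m3", "m2"], "prev_cursor": "c3", "oldest_cursor": "c0"},
    "c0": {"items": ["m1"], "prev_cursor": "MINCURSOR", "oldest_cursor": "c0"},
}

messages = [
    message
    for chunk in iter_message_chunks(lambda cursor: fake_pages[cursor])
    for message in chunk["items"]
]
print(messages)
```

Passing the fetch function in as a parameter keeps the paging logic separate from the request details (URL, headers, cookies), which are the part most likely to change.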
I have been working on a script to unsend all the messages in Instagram DM. To delete messages I first need to fetch them, so I have written a function that returns all the messages.
You can look at the repository: https://github.com/pishangujeniya/instagram-helper
There is no limit on Instagram API requests for getting messages. For delete requests, however, Instagram starts responding with 429 (Too Many Requests) after 83 messages have been deleted in a single session. The workaround is to log out and log back in after some time. But that has its own problem: if you log out and in too many times, Instagram blocks your account from logging in for a period of time. (In my case I was blocked from logging in for 30 minutes while developing the script.)
Update 20 April, 2020
I updated the script to add a delay between requests to avoid the 429 response, and it is working fine as of now.
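A delay between delete requests could look roughly like the sketch below. It is not the code from the repository: delete_one is a hypothetical callable that sends one delete request and returns the HTTP status code, the delay values are guesses rather than documented limits, and the extra back-off after a 429 is an addition of this sketch. The sleep function is a parameter so the example can run without actually waiting.

```python
import time


def delete_with_delay(message_ids, delete_one, delay=2.0, max_retries=3,
                      sleep=time.sleep):
    """Delete messages one by one, pausing between requests.

    delete_one(message_id) is a hypothetical callable returning the HTTP
    status code of the delete request.
    """
    deleted = []
    for message_id in message_ids:
        wait = delay
        for _ in range(max_retries + 1):
            status = delete_one(message_id)
            if status != 429:
                break
            wait *= 2    # wait longer after every 429 before retrying
            sleep(wait)
        if status == 200:
            deleted.append(message_id)
        sleep(delay)     # fixed pause before the next message
    return deleted


# Offline demonstration: message "b" is rate limited once, then succeeds.
responses = {"a": [200], "b": [429, 200], "c": [200]}
pauses = []
result = delete_with_delay(
    ["a", "b", "c"],
    lambda message_id: responses[message_id].pop(0),
    delay=1.0,
    sleep=pauses.append,  # record the pauses instead of sleeping
)
print(result)
print(pauses)
```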

Web login using python3 requests

I am trying to scrape a news article. I am trying to log in to the website with Python so that I can access the whole page. I have looked at many tutorials but still fail.
Here is the code. Can anyone tell me why it fails?
The code runs without errors, but I still cannot see the full text, which means I am still not logged in.
`
import requests

url = 'https://id.wsj.com/access/pages/wsj/us/signin.html?mg=id-wsj&mg=id-wsj'
payload = {'username': 'my_user_name',
           'password': '******'}
session = requests.Session()
session.get(url)
response = session.post(url, data=payload)
print(response.cookies)
r = requests.get('https://www.wsj.com/articles/companies-push-to-repeal-amt-after-senates-last-minute-move-to-keep-it-alive-1512435711')
print(r.text)
`
Try sending your last GET request through the session variable. After all, it is the session that performed the login and holds the cookies (if there are any). Your last request calls requests.get() directly, which does not use that session and so ignores the login you just made.
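As a rough sketch of that advice, the function below makes all three calls on the same session object. The function name and the example.com URLs are illustrative. RecordingSession is only an offline stand-in for requests.Session() so the sketch runs without network access; in real use you would pass requests.Session() instead. Whether a plain form POST is enough to log in to this particular site is not established here; many sites use a script-driven login flow.

```python
def fetch_after_login(session, login_url, payload, article_url):
    """Log in and fetch the article through the same session object.

    `session` is expected to behave like requests.Session(): cookies set by
    the login response are stored on it and sent with later requests.
    """
    session.get(login_url)                 # pick up any initial cookies
    session.post(login_url, data=payload)  # log in; cookies stay on the session
    return session.get(article_url)        # same session, not requests.get()


class RecordingSession:
    """Offline stand-in for requests.Session() that only records calls."""

    def __init__(self):
        self.calls = []

    def get(self, url, **kwargs):
        self.calls.append(("GET", url))
        return f"response for {url}"

    def post(self, url, **kwargs):
        self.calls.append(("POST", url))
        return f"response for {url}"


demo = RecordingSession()
result = fetch_after_login(
    demo,
    "https://example.com/login",
    {"username": "my_user_name", "password": "******"},
    "https://example.com/article",
)
print(demo.calls)
```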
