I have been building my confidence with Python by automating tasks with Selenium. I am trying to automate logging into my Battle.net account, but after I click "Log in" the page prompts me to complete a FunCaptcha.
So far I am able to get a valid Arkose FunCaptcha token from 2captcha and insert it into the HTML of the page (as per the 2captcha documentation); however, I cannot find the callback function. I have spent a considerable amount of time searching for it through the various scripts and in the Network and Console tabs of Chrome's developer tools. The 2captcha solver Chrome extension is able to find the callback, so I also spent time looking through the scripts it uses, but I am unable to determine how it finds the callback function. I have tried asking 2captcha support, but to no avail.
I am after some help in finding it, as I feel I am clearly missing something or looking in the wrong place. Any piece of advice or direction would be greatly appreciated as I feel I have reached a standstill.
Thank you.
Related
If you are signed into a Google account, will you still get the "I'm not a robot" message, or is there no way to avoid it besides paid services? I am using the Selenium library in Python.
I'm not a robot reCAPTCHA
reCAPTCHA is a challenge test that differentiates humans from automated bots based on the response. It is one of the CAPTCHA spam protection services and was bought by Google. Automated bots are a major source of spam and consume server resources that are supposed to be used by real users. To keep automated bots out, Google introduced the No CAPTCHA reCAPTCHA API for website owners to protect their sites. Later, to improve the user experience, Google introduced invisible reCAPTCHA.
Invisible reCAPTCHA helps stop bots without showing the "I'm not a robot" message to human users. But it does not work in many situations, and the message is still shown. For example, the Google search page itself shows the "I'm not a robot" CAPTCHA in certain circumstances when you enter a query and hit the search button. You will be asked to prove you are human by ticking the checkbox or selecting images based on the given hint.
Being interrupted by the "I'm not a robot" message during a real Google search is frustrating. Sometimes a simple click on the checkbox lets you through. Google checks the click position on the checkbox: bots click exactly in the center of the checkbox, while humans click somewhere on the box. This helps Google decide whether the user is a human or a bot. In the worst case, Google will stop you completely by showing the sorry page. The only option you have then is to wait and try later.
Root cause of I'm not a robot reCAPTCHA message
Some of the main reasons for this message are as follows:
Google automatically detects requests coming from your computer network that appear to be in violation of the Terms of Service. The block will expire shortly after those requests stop.
This traffic may have been sent by malicious software, a browser plug-in, or a script that sends automated requests. A different computer using the same IP address may be responsible.
Sometimes you may see this page if you are using advanced terms that robots are known to use, or sending requests very quickly.
Fixing I'm not a robot reCAPTCHA issue
If you are always getting interrupted then a couple of remediation steps are as follows:
Stop using a VPN.
Avoid unknown proxy servers.
Use Google Public DNS.
Stop running illegal queries.
Slow your clicks.
Stop sending automated queries.
Search like a human.
Check for malware and browser extensions.
Most websites have a barrier that prevents automated test software from signing in or even browsing them, and since Selenium is just an automated web testing tool, you are barred from signing in. One way around it is to use pyautogui for just the basic sign-in and then carry on with your Selenium code, or to use the API for the particular Google service you need.
Even if the user is signed in to a Google account, that does not protect them from being shown a CAPTCHA on any site or request, because the whole point is to determine who is a bot and who is a human. The only way to get past CAPTCHAs is to use a service that recognizes and solves them automatically.
In my company intranet, any request to an external website X on the Internet is redirected to an internal page containing a button that I have to click. After that, the external website X is loaded.
I want to write a program that automatically clicks this button for me (so I don't have to click it manually). After that, the program will make the browser redirect to a re-configured website Y (not X) for the purpose of security testing.
I don't have much experience with Python. So I would be really thankful if someone can tell me how I can write such a program.
Many thanks
Unfortunately it can get a little complicated once you're interacting with JavaScript elements like buttons. However, the best way to approach this would be with Selenium. There's a slight learning curve, but thankfully the documentation is good and there are many resources online to show you how to get started.
Python has the Selenium and BS4 libraries to help you out, but if you are not experienced with Python, you might as well pick up Node.js and Puppeteer; it's far superior in my opinion.
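A minimal sketch of the click-then-redirect flow with Selenium, under some assumptions: the function name click_then_open, the selector "button#proceed" and the example URLs are all hypothetical, and the real selector has to be read from the intranet page's HTML. It polls for the button, clicks it, and then points the browser at site Y. The locator string "css selector" is the same value Selenium exposes as By.CSS_SELECTOR.

```python
import time

# Locator strategy string; same value as selenium's By.CSS_SELECTOR.
CSS_SELECTOR = "css selector"


def click_then_open(driver, button_css, target_url, timeout=10.0, poll=0.25):
    """Click the first element matching button_css, then load target_url.

    Returns True if the button appeared and was clicked within `timeout`
    seconds; returns False (without navigating) if it never showed up.
    """
    deadline = time.monotonic() + timeout
    while True:
        matches = driver.find_elements(CSS_SELECTOR, button_css)
        if matches:
            matches[0].click()      # dismiss the intranet interstitial
            driver.get(target_url)  # then go to site Y instead of X
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


# Usage (assumes Selenium and a matching browser driver are installed;
# the selector and URLs below are placeholders):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get("http://x.example")   # gets redirected to the internal page
#   click_then_open(driver, "button#proceed", "http://y.example")
```

Selenium's own WebDriverWait with expected_conditions does the same polling and is the more usual choice; the hand-written loop is only there to keep the sketch short.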
I have been working on a web scraper that scrapes data from Discord, using Selenium with Python. I want it to keep scraping data even when my computer is offline.
After a little research I found that I can use repl.it and uptime bot: repl.it to run the script on the web and uptime bot to ping it every 5 minutes.
But when I ran the script on repl.it, it opened a small Chromium window, which was fine at the time, and in that window it prompted an hCaptcha. That is where the problem began.
I tried hard to find the class name of the hCaptcha checkbox and eventually found it, but then it asked me to select pictures.
There were many solutions for reCAPTCHA but none for hCaptcha.
I searched everywhere but couldn't find a solution to my problem.
Solutions I am looking for:
1. Any other platform or way I can run my script forever (not commercial, paid ones like AWS, Microsoft and so on).
2. A way to solve the hCaptcha problem (wherever I look, the answers are for reCAPTCHA, not hCaptcha).
Links, code, and resources I referred to while making the project and looking for a solution:
1. https://www.youtube.com/watch?v=As-_hfZUyIs (to bypass reCAPTCHA)
2. https://medium.com/analytics-vidhya/how-to-easily-bypass-recaptchav2-with-selenium-7f7a9a44fa9e
3. https://www.browserstack.com/guide/how-to-handle-captcha-in-selenium
4. https://www.reddit.com/r/learnpython/comments/efeaxy/captcha_using_selenium_in_python/
5. https://stackoverflow.com/questions/44187909/python-selenium-and-captcha
6. https://github.com/dessant/buster (reCAPTCHA buster)
Issue 1: You could fit the script into Repl.it (look here) and use a Replit auto pinger, which pings the service every 5 minutes to keep your project alive.
Issue 2: There is a Tampermonkey extension, here. What you could do is create a profile in Chrome or any other browser (I prefer Firefox), install Greasemonkey, install the script, and then check whether the captcha element exists; if it does, wait for 40 seconds. After that the captcha should solve by itself. There's also this repo on GitHub if you're interested in a different method of bypassing hCaptcha.
Good luck!
I tried to use Selenium (chromedriver) for web scraping, but I always get reCAPTCHAs (around 5-8 in a row) which I have to solve.
When I visit the same website manually with Google Chrome, I don't get even one CAPTCHA.
I don't use the headless option...
Is there any way to avoid these CAPTCHAs, or to get at most 1-2 CAPTCHAs per request? Solving CAPTCHAs is not a problem for me, but 5-8 in a row takes too much time.
There are CAPTCHA solvers like 2captcha that solve them in around 15-40 seconds per CAPTCHA. CAPTCHA was made to detect bots in various shapes and forms and, well, that's what it has done. The simple answer is: no, there is no "bypass".
There are some workarounds that avoid the system as a whole, such as using an alternative login, like an app that may use a different API. This can be achieved via Appium, which is similar to Selenium, or by using an HTTP request library.
I ran into the same issue. On the net there are a lot of tips that used to work, like the suggestion in the comment to use specific headers (especially setting the user agent explicitly) or to slow down the actions on the page (like clicking) to mimic real user actions. I found that none of them currently work with the newest reCAPTCHA versions, so I fell back to using non-headless mode and manually solving the CAPTCHA before my script takes over and does its magic.
How can I bypass the Google CAPTCHA using Selenium and Python?
When I try to scrape something, Google gives me a CAPTCHA. Can I bypass the Google CAPTCHA with Selenium and Python?
As an example, it's Google reCAPTCHA. You can see this CAPTCHA via this link: https://www.google.com/recaptcha/api2/demo
To start with, when using Selenium's Python client, you should avoid trying to solve or bypass Google CAPTCHA.
Selenium
Selenium automates browsers. What you achieve with that power is entirely up to you, but primarily it is for automating web applications through browser clients for testing purposes, and of course it is certainly not limited to that.
CAPTCHA
On the other hand, CAPTCHA (the acronym being ...Completely Automated Public Turing test to tell Computers and Humans Apart...) is a type of challenge–response test used in computing to determine if the user is human.
So, Selenium and CAPTCHA serve two completely different purposes and ideally shouldn't be used to achieve any interrelated tasks.
Having said that, reCAPTCHA can easily detect the network traffic and identify your program as a Selenium driven bot.
Generic Solution
However, there are some generic approaches to avoid getting detected while web scraping:
The first and foremost attribute by which a website can identify your script/program is your monitor size, so it is recommended not to use the conventional viewport.
If you need to send multiple requests to a website, keep on changing the User Agent on each request. Here you can find a detailed discussion on Way to change Google Chrome user agent in Selenium?
To simulate humanlike behavior, you may need to slow down script execution even beyond WebDriverWait and expected_conditions by inducing time.sleep(secs). Here you can find a detailed discussion on How to sleep Selenium WebDriver in Python for milliseconds
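As a rough illustration of the last point, a pause of random length between actions can be wrapped in a small helper. The name human_pause and the default bounds are assumptions made for this sketch, not anything taken from the linked discussion; it only uses time.sleep and random.uniform from the standard library.

```python
import random
import time


def human_pause(min_s=0.5, max_s=2.0):
    """Sleep for a random duration in [min_s, max_s] seconds.

    Returns the delay that was used, which is handy for logging.
    """
    if min_s < 0 or max_s < min_s:
        raise ValueError("need 0 <= min_s <= max_s")
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay


# Usage between two Selenium actions (the element names are placeholders):
#   search_box.send_keys("query")
#   human_pause()
#   submit_button.click()
```

Fixed sleeps make a script slower and more brittle, so where the goal is only to wait for an element, WebDriverWait remains the better fit.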
This use case
However, in a couple of use cases we were able to interact with the reCAPTCHA using Selenium and you can find more details in the following discussions:
How to click on the reCAPTCHA using Selenium and Java
CSS selector for reCAPTCHA checkbok using Selenium and VBA Excel
Find the reCAPTCHA element and click on it — Python + Selenium
References
You can find a couple of related discussion in:
How can I make a Selenium script undetectable using GeckoDriver and Firefox through Python?
Is there a version of Selenium WebDriver that is not detectable?
tl; dr
How does reCAPTCHA 3 know I'm using Selenium/chromedriver?
In order to bypass the CAPTCHA when scraping Google, you have to manually solve a CAPTCHA and export the cookies Google gives you. Now, every time you open a Selenium WebDriver, make sure you add the cookies you exported. The GOOGLE_ABUSE_EXEMPTION cookie is the one you're looking for, but I would save all cookies just to be on the safe side.
If you want an additional layer of stability in your scrapes, you should export several cookies and have your script randomly select one of them each time you ping Google.
These cookies have a long expiration date so you wouldn't need to get new cookies every day.
For help on saving and loading cookies in Python and Selenium, you should check out this answer: How to save and load cookies using Python + Selenium WebDriver
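A small sketch of saving and restoring cookies, assuming a Selenium driver object is already open. It relies only on the driver's get_cookies() and add_cookie() methods; the helper names and the use of a JSON file are choices made for this example (the linked answer may use a different storage format).

```python
import json


def save_cookies(driver, path):
    """Write the cookies of the current browser session to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(driver.get_cookies(), fh)


def load_cookies(driver, path):
    """Add cookies from a JSON file to the session; returns how many were added.

    The browser should already be on a page of the cookies' domain,
    otherwise Selenium rejects them.
    """
    with open(path, encoding="utf-8") as fh:
        cookies = json.load(fh)
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)


# Usage (the file name is a placeholder):
#   driver.get("https://www.google.com")
#   save_cookies(driver, "cookies.json")     # after solving the CAPTCHA by hand
#   ...
#   driver.get("https://www.google.com")     # in a later session
#   load_cookies(driver, "cookies.json")
#   driver.refresh()
```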
Clear Browsing History, cached data, cookies and other site data
First, create a Google account in the browser window opened by Selenium.
Sign in to your account
wd.get("https://accounts.google.com/signin/v2/identifier?hl=en&passive=true&continue=https%3A%2F%2Fwww.google.com%2F%3Fgws_rd%3Dssl&ec=GAZAmgQ&flowName=GlifWebSignIn&flowEntry=ServiceLogin");
Thread.sleep(2000);
wd.findElement(By.name("identifier")).sendKeys("Email"+Keys.ENTER);
Thread.sleep(3000);
wd.findElement(By.name("password")).sendKeys("Password"+Keys.ENTER);
Thread.sleep(5000);
Then open any website that uses reCAPTCHA and tick the checkbox using this code:
String framename=wd.findElement(By.tagName("iframe")).getAttribute("name");
wd.switchTo().frame(framename);
wd.findElement(By.xpath("//span[@id='recaptcha-anchor']")).click();
You won't find any Puzzles or anything.
Bypass as in solve it or bypass as in never get it at all?
To solve it:
Sign up with 2captcha, CapMonster Cloud, DeathByCaptcha, etc. and follow their instructions. They will give you a token that you pass with the form.
To never get it at all:
Make sure you have good IP reputation (most important for Cloudflare).
Make sure you have a good browser fingerprint (most important for Distil) - I recommend puppeteer + the stealth plugin.
OK, so there is a simple Python script that solves the CAPTCHA for you.
It basically reads the audio challenge, uses Google Assistant to convert it into text, and pastes the result.
It only works with audio CAPTCHAs, which are offered in most cases with image CAPTCHA v2.
https://github.com/ohyicong/recaptcha_v2_solver
Disclaimer!
I did not write the script. I had the idea of doing this myself, but then found this project and thought sharing it might help others.
The simple solution is to suspend the program for 10 seconds or more. When the automated browser opens, solve the reCAPTCHA yourself; after the pause, the program resumes and executes the rest of its steps, like clicking the submit button.
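Instead of a fixed sleep, the same pause-and-solve-by-hand idea can poll until the page shows that the CAPTCHA is done. This is only a sketch: wait_for_manual_solve is a hypothetical helper, and the check passed to it (here, a URL test in the usage comment) is an assumption that has to be adapted to the site.

```python
import time


def wait_for_manual_solve(is_solved, timeout=120.0, poll=1.0):
    """Block until is_solved() returns True or `timeout` seconds pass.

    is_solved is any zero-argument callable, e.g. a check on the
    driver's current URL. Returns True if the check passed in time.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_solved():
            return True
        time.sleep(poll)
    return is_solved()  # one last check at the deadline


# Usage (driver is an open Selenium session; the URL test is a placeholder):
#   if wait_for_manual_solve(lambda: "login" not in driver.current_url):
#       submit_button.click()
```

Compared with time.sleep(10), this continues as soon as the human is finished and gives up cleanly if nobody solves the challenge.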