Problems with anti-captcha plugin in Selenium Python

I've recently started using Selenium for a project I've been working on for a while that involves automation. One of the roadblocks in the plan was the ReCaptcha system, so I decided to use anti-captcha as the service that would solve the captchas when my bot encountered them. I properly installed the plugin and found some test code with Selenium on their site.
I've followed the instructions and get no errors while the code is running, but after it times out I receive an error message pointing at this line at the very end:
WebDriverWait(browser, 120).until(lambda x: x.find_element_by_css_selector('.antigate_solver.solved'))
I don't know what I'm doing wrong and would appreciate some help figuring out the problem so I can get the service running. Apologies for my formatting and if my question is not very good; I'm new to this.

What is the error message received that indicates there is a problem with the last line?
Does your code include the send-form instruction after that line?
browser.find_element_by_css_selector('input[type=submit]').click()
You may search for more information here:
https://python-anticaptcha.readthedocs.io/en/latest/usage.html#solve-recaptcha

Since you haven't actually added the exact error you are getting, I believe it is a TimeoutException. Go to your browser console and check for the error shown there; it will most likely be something related to the API key or low funds in your account.
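Putting both answers together, a minimal sketch of the expected flow might look like this (the target URL is a placeholder, and this assumes the plugin is installed in the browser profile with a funded API key):

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException

browser = webdriver.Chrome()
browser.get("https://example.com/page-with-recaptcha")  # hypothetical target page

try:
    # The plugin adds the .solved class to its status element once the
    # captcha has been answered.
    WebDriverWait(browser, 120).until(
        lambda x: x.find_element_by_css_selector(".antigate_solver.solved")
    )
    # Only send the form after the captcha is reported solved.
    browser.find_element_by_css_selector("input[type=submit]").click()
except TimeoutException:
    # Landing here means the plugin never solved the captcha; check the
    # browser console for API-key or balance errors.
    print("Captcha was not solved within 120 seconds")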


[Python][Web Scraping] Is there a way to prevent cache clearing when executing my script and the browser opens up?

I'm a newbie, so I will try to explain myself in a way that makes sense.
I produced my first ever Python script to scrape data from a web page I use regularly at work. It just prints out a couple of values in the console that I previously had to look up manually.
My problem is that every time I execute the script and the browser opens up, it seems the cache is cleared and I have to log in to that work webpage using my personal credentials and do the two-factor authentication with my phone.
I'm wondering whether there is a way to keep the cache for that browser (if I previously already logged in to the web page) so I don't need to go through authentication when I launch my script.
I'm using Selenium WebDriver and Chrome, and the options I have configured are these (in the screenshot below). Is there perhaps another option I could add to keep the cache?
[Screenshot: current options for the browser]
I tried to find info on the web but so far nothing. Many sites offer a guide on how to perform the login by adding lines of code with the username and the password, but I would like to avoid that option, as I would still need to use my phone for the two-factor authentication, and also because this script could be used by some other colleagues in the future.
Thanks a lot for any tip or info :)
After days of browsing everywhere, I found this post:
How to save and load cookies using Python + Selenium WebDriver
The second answer is actually the one that saved my life; I just had to add this to my series of options:
chrome_options.add_argument("user-data-dir=selenium")
see the provided link for the complete explanation of the options and imports to use.
With that option added, I ran the script for the first time and still had to do the login manually and go through authentication. But when I ran it a second time I didn't need any manual input; the data was scraped from the web and the result returned with no manual action needed from me.
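For reference, this is roughly the full set-up (a sketch; the profile directory name is whatever folder you want the profile stored in, and the URL is a placeholder):

from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
# Reuse a persistent profile directory so cookies and sessions survive
# between runs; "selenium" here is just a folder name relative to the
# working directory.
chrome_options.add_argument("user-data-dir=selenium")

driver = webdriver.Chrome(options=chrome_options)
driver.get("https://example.com/work-page")  # placeholder for the work page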
If anybody is interested in the topic please ping me.
Thanks!

How can I make a web scraper that keeps scraping data even when the PC is offline, using Selenium Python?

So I have been working on a web scraper that scrapes data from Discord. For this I used Selenium Python, and I want it to keep scraping data even when my computer is offline.
After a little research I found that I can use Repl.it and an uptime bot: Repl.it to run the script on the web, and the uptime bot to ping it every 5 minutes.
But when I ran the script on Repl.it, it opened a small Chromium window, which was fine at the time, but in it the site was prompting the hCaptcha, and here is where the problem began.
I tried hard to find the class name of the hCaptcha checkbox and eventually found it, but then it asked me to select pictures. There were many solutions for reCAPTCHA but none for hCaptcha, so I searched everywhere but couldn't find a solution that solved my problem.
Solutions I am looking for:
1. Any other platform or way I can run my script forever (of course, nothing commercial and paid like AWS, Microsoft and so on...)
2. Any way to sort out the hCaptcha problem (because wherever I look, they have an answer for reCAPTCHA, not hCaptcha)
Links, code, and resources I referred to while making the project and looking for a solution:
1. https://www.youtube.com/watch?v=As-_hfZUyIs (to bypass reCAPTCHA)
2. https://medium.com/analytics-vidhya/how-to-easily-bypass-recaptchav2-with-selenium-7f7a9a44fa9e
3. https://www.browserstack.com/guide/how-to-handle-captcha-in-selenium
4. https://www.reddit.com/r/learnpython/comments/efeaxy/captcha_using_selenium_in_python/
5. https://stackoverflow.com/questions/44187909/python-selenium-and-captcha
6. https://github.com/dessant/buster (reCAPTCHA Buster)
Issue 1: You could fit the script into Repl.it (look here) and use a Repl.it auto-pinger, which pings the service every 5 minutes to keep your project alive.
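The usual keep-alive set-up (a sketch, not the asker's code) exposes a tiny Flask endpoint in a background thread so the uptime monitor has something to ping:

from threading import Thread
from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    # The uptime bot hits this endpoint every 5 minutes.
    return "I'm alive"

def keep_alive():
    # Repl.it routes external traffic to the port your app listens on.
    Thread(target=lambda: app.run(host="0.0.0.0", port=8080)).start()

keep_alive()
# ...start the selenium scraper below...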
Issue 2: There is a Tampermonkey extension, here. What you could do is create a profile in Chrome/any browser (I prefer Firefox, where the equivalent is Greasemonkey), install the script, and then check if the captcha element exists; if it does, wait for 40 seconds. After that the captcha should solve by itself. There's also this repo on GitHub if you're interested in a different method of bypassing hCaptcha.
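The check-and-wait step could look roughly like this (the iframe selector is an assumption about hCaptcha's markup, and this presumes the solver userscript is installed in the profile Selenium launches):

import time
from selenium.common.exceptions import NoSuchElementException

def wait_out_hcaptcha(driver, delay=40):
    try:
        # hCaptcha usually renders inside an iframe served from
        # hcaptcha.com; the selector is an assumption.
        driver.find_element_by_css_selector('iframe[src*="hcaptcha.com"]')
    except NoSuchElementException:
        return  # no captcha present, nothing to wait for
    # Give the solver userscript time to work on the challenge.
    time.sleep(delay)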
Good luck!

How to be undetectable with chrome webdriver?

I've already seen multiple posts on Stack Overflow regarding this. However, some of the answers are outdated (such as using PhantomJS) and others didn't work for me.
I'm using Selenium to scrape a few sports websites for their data. However, every time I try to scrape these sites, a few of them block me because they know I'm using chromedriver. I'm not sending very many requests at all, and I'm also using a VPN. I know the issue is with chromedriver because any time I stop running my code but then try opening these sites in chromedriver, I'm still blocked. However, when I open them in my default web browser, I can access them perfectly fine.
So, I wanted to know if anyone has any suggestions of how to avoid getting blocked from these sites when scraping them in selenium. I've already tried changing the '$cdc...' variable within the chromedriver, but that didn't work. I would greatly appreciate any ideas, thanks!
Obviously they can tell you're not using a common browser. Could it have something to do with the User Agent?
Try it out with something like Postman and see what the responses are. Try messing with the user agent and other request fields. Look at the request headers when you access the site with a regular browser (like Chrome) and try to spoof those.
Edit: just remembered this and realized the page might be performing some checks in JS and whatnot. It's worth looking into what happens when you block JS on the site with a regular browser.
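In Selenium, the user-agent and automation-flag tweaks suggested above look roughly like this (common mitigations, not a guaranteed bypass; the UA string is just an example):

from selenium import webdriver

options = webdriver.ChromeOptions()
# Spoof a regular desktop user agent.
options.add_argument(
    "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"
)
# Drop the automation flag Chrome normally advertises.
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_argument("--disable-blink-features=AutomationControlled")

driver = webdriver.Chrome(options=options)
# Mask navigator.webdriver before any page script can read it.
driver.execute_cdp_cmd(
    "Page.addScriptToEvaluateOnNewDocument",
    {"source": "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"},
)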

How do I debug my appengine site being offline?

Coding n00b here - I have a website, which is a Python site on App Engine:
http://www.7bks.com/
And it's currently down. Completely unavailable. My App Engine dashboard appears normal: no quota denials, no errors, and when I try to visit, no log is generated (because I can't even reach the site).
I also can't reach the application directly via the appspot URL so it's not an issue with my domain name.
The App Engine status page shows everything is groovy:
http://code.google.com/status/appengine/
So what gives? How do I figure out how to get it back online? I'm not even sure where to start debugging. I've not pushed any code for a few months, so everything should just be ticking over?
UPDATE: Hey guys, I never got to the bottom of why this was happening, but I shifted my DNS to Google DNS (http://code.google.com/speed/public-dns/) and it seems to work fine now, so I guess it's solved :)
Check if you have a DNS problem:
Open the application with Google Chrome; when you get an error you will see the error code with an explanation at the bottom.
Check if you have a link issue:
In a terminal, type
traceroute www.7bks.com
and see what it gives (note that traceroute takes a hostname, not a URL).
Either way, it is most likely to be a DNS issue.
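You can also test the DNS hypothesis directly from Python (a quick sketch; any dig/nslookup equivalent works just as well):

import socket

try:
    print(socket.gethostbyname("www.7bks.com"))
except socket.gaierror as err:
    # Failing here points at DNS rather than App Engine itself.
    print("DNS lookup failed:", err)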

LinkedIn API OAuth issue

This is my first time using the LinkedIn API. I am using Python.
I am exactly following the steps listed here: https://developer.linkedin.com/documents/getting-oauth-token-python
Everything goes well until I try to get the PIN for the access token. I type into my browser: "https://www.linkedin.com/uas/oauth/authorize?oauth_token=" + oauth_token, and then I get the error "We were unable to find the authorization token."
I also tried to download the full code linked at the top of the page, but that goes to a "page not found".
Anyone have any insight as to why I can't get the access token PIN? Thank you.
I apologize for not seeing this sooner. The code on that page has been restored and you can download it now without issue.
https://developer.linkedin.com/documents/getting-oauth-token-python
If it's still not working, please let us know (I'll check back here, or you can post in our forums). I use essentially that exact code whenever I test our API, so it should work.
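For reference, the same PIN-based OAuth 1.0a handshake can be sketched with requests-oauthlib instead of the tutorial's library (the library choice is an assumption; the endpoints are the ones from the question). Visiting the authorize URL with a missing or stale oauth_token is one way to end up with the "unable to find the authorization token" error described above:

from requests_oauthlib import OAuth1Session

API_KEY = "your-api-key"        # placeholders -- use your own credentials
API_SECRET = "your-api-secret"

linkedin = OAuth1Session(API_KEY, client_secret=API_SECRET,
                         callback_uri="oob")  # "oob" asks for a PIN
linkedin.fetch_request_token("https://api.linkedin.com/uas/oauth/requestToken")

# The authorize URL must carry the request token obtained above; a
# missing or stale oauth_token is likely what triggers the error.
print(linkedin.authorization_url("https://www.linkedin.com/uas/oauth/authorize"))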
