I currently scrape a website and take a screenshot when a certain case happens.
I want to consume less bandwidth, so I'm trying to do it via Requests.
I can't figure out how I would take screenshots that way, but I thought of a workaround:
Once the certain case happens, it opens Chrome as usual, takes the screenshot, and closes Chrome.
Is there a smarter way I'm not thinking of?
Thanks!
Requests is a library for making HTTP requests. You can't "take a screenshot" with it: it only fetches raw responses and never renders the page, so the idea makes no sense.
Maybe try Selenium instead.
On my company's intranet, any request to an external website X on the Internet is redirected to an internal page containing a button that I have to click. Then the external website X is loaded.
I want to write a program that automatically clicks this button for me (so I don't have to click it manually). After that, the program will make the browser redirect to a pre-configured website Y (not X) for the purpose of security testing.
I don't have much experience with Python, so I would be really thankful if someone could tell me how I can write such a program.
Many thanks
Unfortunately it can get a little complicated once you're interacting with JavaScript elements like buttons. However, the best way to approach this would be with Selenium. There's a slight learning curve, but thankfully the documentation is good and there are many resources online to show you how to get started.
Python has the Selenium and BS4 libraries to help you out, but if you are not experienced with Python, you might as well pick up Node.js and Puppeteer; it's far superior in my opinion.
I tried to use Selenium (chromedriver) for web scraping, but I always get reCaptchas (around 5-8 in a row) which I have to solve.
When I visit the same website manually with Google Chrome, I don't even get one Captcha.
I don't use headless option...
Is there any solution to avoid these captchas, or at least to get a maximum of 1-2 captchas per request? I mean, it's not a problem for me to solve captchas, but 5-8 in a row takes too much time.
There are captcha solvers like 2captcha that solve them in around 15-40 seconds per captcha. Captcha was made to detect bots in various shapes and forms, and well... that's what it does. The simple answer is: no, there is no "bypass".
There are some workarounds to avoid the system as a whole, such as using an alternative login path, like an app that uses a different API. This can be achieved via Appium, which is similar to Selenium, or by using an HTTP request library.
I ran into the same issue. On the net there are a lot of tips that used to work, like the suggestion in the comment of using specific headers (especially setting the user agent explicitly) or slowing down actions on the page (like clicking) to mimic real user behavior. I found none of them working with the newest reCaptcha versions, and fell back to using non-headless mode and manually solving the captcha before my script takes over and does its magic once I've passed it.
I am trying to use Python 3.6 to access the snapshot feature of TradingView to get the URL it creates. I was able to do it with Selenium, but it was taking around 4 seconds to execute. I was hoping to use the Requests library to achieve this.
First off: is this even possible? The URL is https://s.tradingview.com/widgetembed/?symbol=DRYS. There is a little camera icon on the right side. An HTML scraper can't be used, since the snapshot URL is only generated after pressing the icon.
Any pointers can help.
We are testing videos on our website, and in order to play them it should authenticate the user, get the authorization for the device he is playing on, check his entitlements, etc.
We have many varieties of networks and video types to test, and I am in the process of writing a script which checks that one of those calls is working fine for all types of videos.
It is a POST call, and I need to build the param/data to post. There's no direct way of getting one of the param values, and this is how we do it currently: we go to the browser and play the video, open the dev tools (like Firebug), capture the param value from the request headers of that call, and use it in my script for the rest of the 100 different calls to verify programmatically.
Is there a way in Python to do the steps we are doing manually? Like open a URL and capture all the calls which happen in the background, just like Firebug does?
I am trying FirePython, FireLogger, and mechanize to see if they help, but I have invested so much time figuring this out that I thought it was time to ask for some expert advice.
If you haven't looked at the Requests library, it's generally quite pleasant to work with and might make your life easier.
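As a sketch of how the manual dev-tools step could be replaced: once you have copied the param value out of the request headers, you can replay the POST directly with Requests. The endpoint, header name, and parameters here are all hypothetical stand-ins:

```python
import requests


def build_play_request(video_id: str, token: str) -> requests.PreparedRequest:
    # 'token' stands in for the value captured from dev tools / Firebug.
    req = requests.Request(
        "POST",
        "https://example.com/api/play",      # hypothetical endpoint
        data={"videoId": video_id},
        headers={"Authorization": token},    # hypothetical header name
    )
    return req.prepare()


def check_video_call(session: requests.Session, video_id: str, token: str) -> int:
    # Send the prepared request and report the status code for the test script.
    resp = session.send(build_play_request(video_id, token), timeout=10)
    return resp.status_code


if __name__ == "__main__":
    with requests.Session() as s:
        print(check_video_call(s, "demo-video", "captured-token"))
```

Building a `PreparedRequest` separately also makes it easy to inspect exactly what will be sent, which helps when comparing against what the browser's dev tools show.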
My python program basically submits a form by loading a URL. There is a security code that seems to change every second so that you have to actually be on the website to enter the form.
For example,
http://www.locationary.com/prizes/index.jsp?ACTION_TOKEN=index_jsp$JspView$BetAction&inTickets=125000000&inSecureCode=091823021&inCampaignId=3060745
The only solution I can think of is using something like Selenium... I don't know any other way of simulating a web browser without it being as heavy and slow as a real one... any ideas? Or is there a way I can do this without browser automation?
EDIT:
Response to first answer: I DID get the security code using urllib... the problem is that it seems to have already changed by the time I try to load my submission URL... so I'm guessing/assuming that you have to do it in real time...
Yes, you'll need to get the security code programmatically since it changes every time. You can do this manually with urllib, or you can use mechanize or Selenium to make things easier.
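A sketch of the "do it in real time" idea with Requests: fetch the form page and submit within the same session, back to back, so the per-second code has no time to rotate. The extraction regex is hypothetical (inspect the page source to see how the code actually appears), while the URL and parameter names come from the question:

```python
import re
import requests


def extract_secure_code(html: str) -> str:
    # Hypothetical pattern -- check the real page source before relying on it.
    match = re.search(r"inSecureCode=(\d+)", html)
    if not match:
        raise ValueError("secure code not found")
    return match.group(1)


def submit_form(session: requests.Session,
                base: str = "http://www.locationary.com"):
    # Fetch and submit immediately in the same session, so the
    # freshly scraped code is still valid.
    html = session.get(base + "/prizes/index.jsp", timeout=10).text
    params = {
        "ACTION_TOKEN": "index_jsp$JspView$BetAction",
        "inTickets": "125000000",
        "inSecureCode": extract_secure_code(html),
        "inCampaignId": "3060745",
    }
    return session.get(base + "/prizes/index.jsp", params=params, timeout=10)
```

Using one `Session` also carries cookies between the fetch and the submit, which many sites require for the code to be accepted.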