I'm running Python 3.6 and having a lot of trouble logging in to a site, primarily because of a captcha. I only need to look up URLs and retrieve the HTML on each page, but I need to be logged in for certain additional information to appear on those URLs.
I was using urllib to read the URLs, but now I'm looking for a way to log in first and then request the pages. The fully automatic route doesn't seem to work because of the captcha, so I'm looking for a method where I am already logged in in an open browser and Python opens new tabs to load the URLs (the requests can be hidden; they don't have to literally open new tabs). When I open new tabs manually on the site it still shows I'm logged in, so if I can log in manually each time I want to run the script and then work from that session, that would be just fine.
Thanks
Related
I'm a newbie, so I will try to explain myself in a way that makes sense.
I produced my first ever Python script to scrape data from a web page I use regularly at work. It just prints out a couple of values in the console that I previously had to look up manually.
My problem is that every time I execute the script and the browser opens, it seems the cache is cleared and I have to log in to that work webpage using my personal credentials and do the two-factor authentication with my phone.
I'm wondering whether there is a way to keep the cache for that browser (if I have already logged in to the web page) so I don't need to go through authentication when I launch my script.
I'm using Selenium WebDriver and Chrome, and the options I have configured are these (in the screenshot below). Is there perhaps another option I could add to keep the cache?
Current options for browser
I tried to find info on the web but so far found nothing. Many sites offer a guide on how to perform the login by adding lines of code with the username and the password, but I would like to avoid that option, as I would still need to use my phone for the two-factor authentication, and also because this script could be used by some other colleagues in the future.
Thanks a lot for any tip or info :)
After days of browsing everywhere, I found this post:
How to save and load cookies using Python + Selenium WebDriver
The second answer is actually the one that saved my life; I just had to add this to my series of options:
chrome_options.add_argument("user-data-dir=selenium")
See the provided link for the complete explanation of the options and imports to use.
After adding that option, I ran the script for the first time and still had to log in manually and go through authentication. But when I ran it a second time I didn't need any manual input; the data was scraped from the web and the result was returned with no manual action from me.
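For anyone who wants to see the option in context, here is a minimal sketch of how it might be wired up. The function names `profile_argument` and `make_driver` are made up for illustration, and it assumes Selenium and a matching chromedriver are installed. The profile folder is resolved to an absolute path so the same folder is reused no matter where the script is started from.

```python
from pathlib import Path


def profile_argument(profile_dir="selenium"):
    """Build the Chrome switch that points at a persistent profile folder."""
    # An absolute path avoids depending on the script's working directory.
    return f"user-data-dir={Path(profile_dir).resolve()}"


def make_driver(profile_dir="selenium"):
    """Start Chrome with a profile that survives between runs."""
    # Imported here so the helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    chrome_options = Options()
    chrome_options.add_argument(profile_argument(profile_dir))
    return webdriver.Chrome(options=chrome_options)
```

Calling `make_driver()` on the first run still shows the login page; cookies saved in the profile folder are what let later runs skip it, for as long as the site keeps the session valid.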
If anybody is interested in the topic please ping me.
Thanks!
Are there any alternatives to Selenium that don't require a web driver or browser to operate? I recently moved my code over to a Google Cloud VM instance, and when I run it there are multiple errors. I've been trying to get it to work for hours but just can't (no luck with PhantomJS, Chrome and GeckoDriver; I tried re-downloading browsers, editing the sources.list file, etc.).
The page I'm scraping uses JavaScript to load in numbers, which is why I initially chose Selenium. Everything else works perfectly though!
You could simply use the requests library.
https://requests.readthedocs.io/en/master/
https://anaconda.org/anaconda/requests
You would then need to send a GET or POST request to the server.
If you do not know how to generate a proper POST request, simply try to "record" it.
If you have Chrome, go to the page you want to navigate, press F12, open the "Network" section and type method:POST into the filter.
Further info here:
https://stackoverflow.com/a/39661536/11971785
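Once the POST has been recorded, replaying it could look roughly like the sketch below. The function name `login_and_fetch`, the URLs and the form field names are placeholders, not taken from any real site; copy the real ones from the Network tab. The function takes the session as a parameter so it can be tried with `requests.Session()` or with a stand-in object. Note that this will not get past a captcha or a two-factor prompt.

```python
def login_and_fetch(session, login_url, form_data, target_url):
    """Replay a recorded login POST, then GET a page with the same session."""
    # The session object keeps the cookies set by the login response.
    response = session.post(login_url, data=form_data)
    response.raise_for_status()
    page = session.get(target_url)
    page.raise_for_status()
    return page.text


# Usage with the requests library (URLs and field names are placeholders):
#
#   import requests
#   with requests.Session() as session:
#       html = login_and_fetch(
#           session,
#           "https://example.com/login",
#           {"username": "me", "password": "secret"},
#           "https://example.com/report",
#       )
```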
At first it is a bit more confusing than Selenium, but once you understand it, it's way better in my opinion.
Also, the JavaScript values shown on the page can usually be read straight out of the JavaScript code returned by your request.
No web driver or anything required and a lot more stable and customizable.
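To illustrate reading a value out of the returned script, here is a small sketch using only the standard library. It assumes the page embeds its numbers as a JSON-compatible object assigned to a variable; the variable name `stats` and the sample HTML are invented for the example. If the numbers arrive through a separate background request instead, look for that request in the Network tab and fetch its URL directly.

```python
import json
import re


def extract_js_object(html, var_name):
    """Return the JSON-compatible object assigned to `var_name`, or None."""
    # Matches e.g.  var stats = {...};  the value must be valid JSON.
    pattern = (
        r"(?:var|let|const)\s+" + re.escape(var_name) + r"\s*=\s*(\{.*?\})\s*;"
    )
    match = re.search(pattern, html, re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1))


# Made-up page source standing in for the text of a real response.
sample_html = """
<html><body>
<script>
  var stats = {"visitors": 1280, "orders": 37};
</script>
</body></html>
"""

print(extract_js_object(sample_html, "stats"))
```

A regular expression is enough for simple cases like this one; it will not cope with object literals that are not valid JSON (unquoted keys, trailing commas).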
I need to verify data contained in an excel worksheet against a database. The access to the DB is via a web page so I need to login and then manually enter lots of information and submit it one at a time.
So far I have been able to automate the login process with Selenium and Python (easy). My problem is that the web application is written in such a way that, once logged in, the original page is closed and a new one is opened, so my control over the original one is lost (or so I guess). What can I do to gain control over the new page?
I am not a developer so, for those who are helping, please be specific.
Thank you all.
I am trying to write a python script that populates the fields of an html form and then opens that form in a browser WITHOUT submitting it.
I can fill the form and submit it using urllib and urllib2; however, I don't want to submit it. I want the person to check the data and then submit it manually.
I have seen this might be possible with Mechanize or Selenium but I want to try and do this with what comes standard (the script will be run on various computers by people who don't know what python is...)
Does anyone know how I could do this?
Opens the form in the browser? That part is manageable: Python's standard webbrowser module can open the default browser at a URL on the common platforms. If you would rather call the OS directly, you would use the subprocess module with xdg-open on Linux, start on Windows, or open on Mac OS X.
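A rough sketch of the subprocess approach, using only the standard library. The function names are made up for illustration, and the commands are the usual per-platform launchers rather than anything guaranteed to exist on every machine. In practice `webbrowser.open(url)` from the standard library does the same job in one call.

```python
import subprocess
import sys


def browser_command(url, platform=sys.platform):
    """Return the OS command that opens `url` in the default browser."""
    if platform.startswith("win"):
        # `start` is a cmd.exe builtin; the empty string is the window title.
        # Caveat: cmd.exe may misread URLs that contain characters like `&`.
        return ["cmd", "/c", "start", "", url]
    if platform == "darwin":
        return ["open", url]
    # Most Linux desktops ship xdg-open.
    return ["xdg-open", url]


def open_in_browser(url):
    """Launch the default browser without waiting for it to exit."""
    subprocess.Popen(browser_command(url))
```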
As for the filling it out part...you might be able to replicate the page and serve the local, pre-filled copy with a basic webserver, shutting it down when the user submits the form. I don't think this would be the best idea.
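For what it's worth, a stripped-down version of the local-copy idea could look like the sketch below. It swaps the local webserver for a temporary HTML file, which is simpler but is not the same thing. The function names, the form action URL and the field names are all hypothetical, and a real site may reject the submission if it expects cookies or hidden tokens from its own page.

```python
import html
import tempfile
import webbrowser
from pathlib import Path


def build_prefilled_form(action_url, fields):
    """Return an HTML page whose form inputs already hold `fields` values."""
    rows = []
    for name, value in fields.items():
        # Escape names and values so quotes or angle brackets stay literal.
        safe_name = html.escape(name, quote=True)
        safe_value = html.escape(str(value), quote=True)
        rows.append(
            f'<label>{safe_name} '
            f'<input name="{safe_name}" value="{safe_value}"></label><br>'
        )
    return (
        "<!DOCTYPE html>\n<html><body>\n"
        f'<form method="post" action="{html.escape(action_url, quote=True)}">\n'
        + "\n".join(rows)
        + '\n<input type="submit" value="Submit">\n</form>\n</body></html>\n'
    )


def open_prefilled_form(action_url, fields):
    """Write the page to a temporary file and show it in the default browser."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".html", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(build_prefilled_form(action_url, fields))
    webbrowser.open(Path(handle.name).as_uri())
    return handle.name
```

Nothing is sent until the user presses Submit, so they can check and correct the values first.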
An alternative is using Sikuli script to automate everything - open the user's web browser, populate the fields, maybe even move the mouse cursor to the submit button or highlight it without clicking. That sounds more like what you're trying to achieve.
Is it possible for my Python web app to provide an option for the user to automatically send jobs to the locally connected printer? Or will the user always have to use the browser to print everything manually?
If your Python web app is running inside a browser on the client machine, I don't see any way other than the user printing manually.
Some workarounds you might want to investigate:
If your web app is installed on the client machine, you will be able to connect directly to the printer, as you have access to the underlying OS.
You could potentially create a plugin, installed in the browser, that does this for the user, but I have no clue how this works technically.
What is it that you want to print? You could generate a PDF that contains everything the user needs to print in one go.
You can serve to the user's browser a webpage that includes the necessary Javascript code to perform the printing if the user clicks to request it, as shown for example here (a pretty dated article, but the key idea of using Javascript to call window.print has not changed, and the article has some useful suggestions, e.g. on making a printer-friendly page; you can locate lots of other articles mentioning window.print with a web search, if you wish).
Calling window.print (from the Javascript part of the page that your Python server-side code will serve) will actually (in all browsers/OSs I know) bring up a print dialog, so the user gets system-appropriate options (picking a printer if he has several, maybe saving as PDF instead of doing an actual print if his system supports that, etc, etc).
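As a small illustration of that idea, the sketch below serves a page with a button that calls window.print, using only Python's standard http.server module. The page content, the handler name and the port are invented for the example; a real app would emit the same JavaScript from whatever framework it already uses.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

PRINT_PAGE = """<!DOCTYPE html>
<html>
<body>
  <h1>Report</h1>
  <p>Content the user may want on paper.</p>
  <!-- window.print() opens the browser's own print dialog. -->
  <button onclick="window.print()">Print this page</button>
</body>
</html>
"""


class PrintPageHandler(BaseHTTPRequestHandler):
    """Serve the same printable page for every GET request."""

    def do_GET(self):
        body = PRINT_PAGE.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the demo server until interrupted (Ctrl+C)."""
    HTTPServer(("127.0.0.1", port), PrintPageHandler).serve_forever()
```

Calling `serve()` and visiting http://127.0.0.1:8000/ shows the button; the print dialog still only appears after the user clicks it.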