Website automation via Python - python

I need to verify data contained in an Excel worksheet against a database. Access to the DB is via a web page, so I need to log in and then manually enter lots of information and submit it one item at a time.
So far I have been able to automate the login process with Selenium (easy) and Python. My problem is that the web application is written in such a way that once I am logged in, the original page is closed and a new one is opened, so my control over the original one is lost (or so I guess). What can I do to gain control over the new page?
I am not a developer, so for those who are helping, please be specific.
Thank you all.
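In Selenium, a newly opened window or tab gets its own window handle, and the driver keeps controlling the old one until you explicitly switch. Here is a minimal sketch of how that switch might look; the helper function and the login URL are illustrative, not taken from the question:

```python
def pick_new_handle(original_handle, all_handles):
    """Return the first window handle that is not the one we started on."""
    for handle in all_handles:
        if handle != original_handle:
            return handle
    return original_handle  # nothing new was opened

# Usage with Selenium (requires `pip install selenium`):
#
#   from selenium import webdriver
#
#   driver = webdriver.Chrome()
#   driver.get("https://example.com/login")   # hypothetical login URL
#   original = driver.current_window_handle
#   # ... perform the login that closes/opens pages ...
#   new_handle = pick_new_handle(original, driver.window_handles)
#   driver.switch_to.window(new_handle)       # driver now controls the new page
```

After `switch_to.window`, all subsequent `find_element`/`get` calls operate on the new page, so the rest of the data-entry automation can proceed there.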

Related

[Python][Web Scraping] is there a way to prevent cache clearing when executing my script and the browser opens up?

I'm a newbie, so I will try to explain myself in a way that makes sense.
I produced my first ever Python script to scrape data from a web page I use regularly at work. It just prints out a couple of values in the console that I previously had to look up manually.
My problem is that every time I execute the script and the browser opens up, the cache seems to be cleared and I have to log in to that work webpage using my personal credentials and do the two-factor authentication with my phone.
I'm wondering whether there is a way to keep the cache for that browser (if I previously already logged in to the web page) so I don't need to go through authentication when I launch my script.
I'm using Selenium WebDriver and Chrome, and the options I have configured are shown in the screenshot below. Is there perhaps another option I could add to keep the cache?
[Screenshot: current options for the browser]
I tried to find info on the web, but so far nothing. Many sites offer a guide on how to perform the login by adding lines of code with the username and the password, but I would like to avoid that option, as I would still need to use my phone for the two-factor authentication, and also because this script could be used by other colleagues in the future.
Thanks a lot for any tip or info :)
After days of browsing everywhere, I found this post:
How to save and load cookies using Python + Selenium WebDriver
The second answer is actually the one that saved my life; I just had to add this to my series of options:
chrome_options.add_argument("user-data-dir=selenium")
See the linked post for the complete explanation of the options and imports to use.
With that option added, I ran the script for the first time and still had to log in manually and go through authentication. But when I ran it the second time, I didn't need any manual input: the data was scraped from the web, the result was returned, and no manual action was needed from me.
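Putting it together, the option setup might look something like the sketch below. The profile directory name `selenium` is arbitrary (any writable path works); Chrome stores cookies and session data there, which is what lets the login survive between runs:

```python
def build_chrome_arguments(profile_dir="selenium"):
    """Chrome arguments that make the browser reuse a persistent profile,
    so cookies and the logged-in session survive between script runs."""
    return [
        f"user-data-dir={profile_dir}",  # persistent profile directory
    ]

# Usage with Selenium (requires `pip install selenium`):
#
#   from selenium import webdriver
#   from selenium.webdriver.chrome.options import Options
#
#   chrome_options = Options()
#   for arg in build_chrome_arguments():
#       chrome_options.add_argument(arg)
#   driver = webdriver.Chrome(options=chrome_options)
```

Note that the profile directory must not be in use by another running Chrome instance, or the driver will fail to start.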
If anybody is interested in the topic please ping me.
Thanks!

How to get notified when new content is pushed to a webpage

I'm looking for a way to scan a website and instantly detect an update, without having to refresh the page. So when a new post is pushed to the webpage, I'd like to be instantly notified. Is there a way to do that without having to refresh the page constantly?
Cheers
What browser are you using? Chrome has an auto refresh extension. Try doing a Google search for the extension. It's very easy to set up. It's more of a timed refresh that you can program. But it works for situations like what you are asking.
Without knowing a bit more about your task, it's hard to give you a clear answer. Typically you would set up some kind of API to determine if data has been updated, rather than scraping the contents of a website directly. See if an API exists, or if you could create one for your purpose.
Using an API
Write a script that calls the API every minute or so (or more often if necessary). Every time you call the API, save the result. Then compare the previous result to the new result - if they're different then the data has been updated.
Scraping a Website
If you do have to scrape a website, this is possible. If you execute an HTTP GET request against a webpage, the response will contain the page's HTML, which you can parse into a DOM and traverse to determine its contents. Similar to the API example, you can write a script that executes the HTTP request every minute or so, saves the state, and compares it to the previous state. There are numerous libraries out there to help perform HTTP requests and traverse the DOM, but without knowing your tech stack I can't really recommend anything.
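The poll-and-compare loop described above can be sketched with only the standard library. The URL and the 60-second interval below are placeholders; a real page may also need request headers, and comparing a parsed subset of the page (e.g. via BeautifulSoup) avoids false alarms from timestamps or ads:

```python
import hashlib
import time
import urllib.request

def fingerprint(content: bytes) -> str:
    """Stable hash of the page body, used to detect any change."""
    return hashlib.sha256(content).hexdigest()

def fetch(url: str) -> bytes:
    """Download the raw response body of a page."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()

def watch(url, interval=60, fetch_fn=fetch, on_change=print):
    """Poll `url` every `interval` seconds; call `on_change` when the
    fetched content differs from the previous poll."""
    last = None
    while True:
        current = fingerprint(fetch_fn(url))
        if last is not None and current != last:
            on_change(f"Update detected on {url}")
        last = current
        time.sleep(interval)
```

The same `watch` loop works for the API approach: point `fetch_fn` at the API endpoint instead of the page, and the comparison logic stays identical.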

Use Existing Google Chrome Window To Web Scrape Python

I'm running Python 3.6 and having a whole lot of issues logging in to a site, primarily due to a captcha. I really only need to look up URLs and retrieve the HTML of each page, but I need to be logged in for certain additional information to appear on the accessible URLs.
I was using urllib to read the URLs, but now I am looking for a solution to log in and then request information. The automatic route doesn't seem to work due to those issues, so I'm looking for a method where I am already logged in on an open browser and Python opens new tabs to fetch the URLs (the searches can be hidden; they don't have to literally open new tabs). It appears that when I open new tabs manually on the site it still shows I'm logged in, so if I can manually log in each time I want to run the script and then work off that session, it would actually work just fine.
Thanks
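One common way to do this is to start Chrome yourself with remote debugging enabled, log in by hand (solving the captcha manually), and then have Selenium attach to that already-running browser instead of launching a fresh one. A sketch, assuming Chrome was launched with `--remote-debugging-port=9222` (the port number is arbitrary):

```python
def debugger_address(host="127.0.0.1", port=9222):
    """Address string Selenium uses to attach to a running Chrome instance."""
    return f"{host}:{port}"

# First, start Chrome manually with remote debugging enabled, e.g.:
#   chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-profile
# log in by hand, then attach (requires `pip install selenium`):
#
#   from selenium import webdriver
#   from selenium.webdriver.chrome.options import Options
#
#   options = Options()
#   options.add_experimental_option("debuggerAddress", debugger_address())
#   driver = webdriver.Chrome(options=options)   # attaches to the open browser
#   driver.get("https://example.com/some-page")  # hypothetical URL; still logged in
#   html = driver.page_source
```

Because the driver is attached to your own browser session, every page it fetches carries your login cookies, so the extra logged-in-only information appears in `page_source`.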

populate html form using python and open in browser without submitting

I am trying to write a python script that populates the fields of an html form and then opens that form in a browser WITHOUT submitting it.
I can fill in the form and submit it using urllib and urllib2; however, I don't want to submit it - I want the person to check the data and then submit it manually.
I have seen that this might be possible with Mechanize or Selenium, but I want to try to do this with what comes standard (the script will be run on various computers by people who don't know what Python is...).
Does anyone know how I could do this?
Opens the form in the browser? That part is actually straightforward: the standard library's webbrowser module opens a URL in the user's default browser cross-platform; under the hood it falls back to xdg-open on Linux, start on Windows, and open on Mac OS X. If you need more control, you can invoke those commands yourself via the subprocess module.
As for the filling it out part...you might be able to replicate the page and serve the local, pre-filled copy with a basic webserver, shutting it down when the user submits the form. I don't think this would be the best idea.
An alternative is using Sikuli script to automate everything - open the user's web browser, populate the fields, maybe even move the mouse cursor to the submit button or highlight it without clicking. That sounds more like what you're trying to achieve.
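The local-webserver idea above can in fact be done with only the standard library: render the pre-filled form, serve it from a throwaway local server, and open the user's default browser at it. A rough sketch, where the form fields, target endpoint, and port are made up for illustration:

```python
import html
import threading
import webbrowser
from http.server import BaseHTTPRequestHandler, HTTPServer

FORM_TEMPLATE = """<html><body>
<form action="https://example.com/real-endpoint" method="post">
  Name: <input name="name" value="{name}"><br>
  Email: <input name="email" value="{email}"><br>
  <input type="submit" value="Check the data, then submit">
</form>
</body></html>"""

def render_form(values):
    """Fill the template, escaping values so they are safe inside HTML attributes."""
    escaped = {k: html.escape(v, quote=True) for k, v in values.items()}
    return FORM_TEMPLATE.format(**escaped)

class FormHandler(BaseHTTPRequestHandler):
    page = b""  # set before the server starts

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.end_headers()
        self.wfile.write(self.page)

if __name__ == "__main__":
    data = {"name": "Alice", "email": "alice@example.com"}
    FormHandler.page = render_form(data).encode("utf-8")
    server = HTTPServer(("127.0.0.1", 8000), FormHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    webbrowser.open("http://127.0.0.1:8000/")  # user reviews, then submits manually
    input("Press Enter to stop the local server...")
```

Because the form's `action` points at the real endpoint, submitting it in the browser posts directly to the remote site; your script never submits anything itself.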

python web script send job to printer

Is it possible for my Python web app to provide an option for the user to automatically send jobs to a locally connected printer? Or will the user always have to print everything out manually from the browser?
If your Python webapp is running inside a browser on the client machine, I don't see any other way than manually for the user.
Some workarounds you might want to investigate:
if your web app is installed on the client machine, you will be able to connect directly to the printer, as you have access to the underlying OS.
you could potentially create a browser plugin that does this for the user, but I have no clue how this works technically.
what is it that you want to print? You could generate a PDF that contains everything the user needs to print, in one go.
You can serve to the user's browser a webpage that includes the necessary Javascript code to perform the printing if the user clicks to request it, as shown for example here (a pretty dated article, but the key idea of using Javascript to call window.print has not changed, and the article has some useful suggestions, e.g. on making a printer-friendly page; you can locate lots of other articles mentioning window.print with a web search, if you wish).
Calling window.print (from the Javascript part of the page that your Python server-side code will serve) will actually (in all browsers/OSs I know) bring up a print dialog, so the user gets system-appropriate options (picking a printer if he has several, maybe saving as PDF instead of doing an actual print if his system supports that, etc, etc).
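On the server side this can be as simple as emitting a page whose print button calls `window.print()`. A minimal, framework-agnostic sketch of such a page generator; the title and report body are placeholders:

```python
def build_print_page(title, body_html):
    """Return an HTML page with a button that opens the browser's print dialog."""
    return f"""<html>
<head><title>{title}</title></head>
<body>
  {body_html}
  <button onclick="window.print()">Print this page</button>
</body>
</html>"""
```

Your Python view or handler would return this string as the response body; clicking the button triggers the browser's own print dialog, so printer choice (or save-as-PDF) stays with the user.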
