I am attempting to semi-automate my department's workflow, and this project is part of that effort. The part I am currently struggling with is "clicking" a button on a webpage.
The webpage has data in a grid (looks like an Excel sheet) and has multiple pages that I want to parse out and use as data for the automation, and there is a button on the webpage that, when clicked, converts all the pages of that data into an Excel sheet and saves it where I need on my computer. I want to constantly update that Excel sheet with the changes as more data is added to the webpage.
However, I do not want to keep that webpage open all the time with automatic button presses that constantly save it to Excel. Rather, I want it running "in the background" so the only progress I see is the Excel sheet getting updated. Is there a way to "click" that 'Convert to Excel' button without actually going on the webpage to do it?
I was looking at Python libraries like Requests and bs4, but I'm not sure which methods might be applicable for this, if those are even what I should be looking for.
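from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("http://example.com")
# find_element_by_id was removed in Selenium 4; use find_element(By.ID, ...)
button = driver.find_element(By.ID, 'buttonID')
button.click()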
What you want is called a robot and for Python I highly recommend looking into Selenium. Just google for python selenium tutorial or start by reading this: https://selenium-python.readthedocs.io/getting-started.html#simple-usage
I'm using Python to access the SEC's website for 10-K downloadable spreadsheets. I created code that requests user input for a Stock Ticker Symbol, successfully opens Firefox, accesses the Edgar search page at https://www.sec.gov/edgar/searchedgar/companysearch.html, and inputs the correct ticker symbol. The problem is downloading the spreadsheet automatically and saving it.
Right now, I can manually click on "View Excel Spreadsheet", and the spreadsheet automatically downloads. But when I run my Python code, I get a dialog box from Firefox. I've set Firefox to automatically download, I've tried using 'find_element_by_xpath', 'find_element_by_css_selector' and both do not work to simply download the file. Both those methods merely call up the same dialog box. I tried 'find_element_by_link_text' and got an error message about not being able to find "view Excel Spreadsheet". My example ticker symbol was CAT for Caterpillar (NYSE: CAT). My code is below:
import selenium.webdriver.support.ui as ui
from pathlib import Path
import selenium.webdriver as webdriver
import time
ticker = input("please provide a ticker symbol: ")
# can do this other ways, but will create a function to do this
def get_edgar_results(ticker):
    url = "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=" + str(ticker) + "&type=10-k&dateb=20200501&count=20"
    # define variable that opens Firefox via my executable path for geckodriver
    driver = webdriver.Firefox(executable_path=r"C:\Program Files\JetBrains\geckodriver.exe")
    # timers to wait for the webpage to open and display the page itself
    wait = ui.WebDriverWait(driver,40)
    driver.set_page_load_timeout(40)
    driver.get(url)
    # timers to have page wait for the page to load.
    # seemed that the total amount of time was necessary; not sure about these additional lines
    driver.set_page_load_timeout(50)
    wait = ui.WebDriverWait(driver, 50)
    time.sleep(30)
    # actual code to search the resulting page for the button to click and access the excel document for download
    annual_links = driver.find_element_by_xpath('//*[@id="interactiveDataBtn"]')
    annual_links.click()
    # need to download the excel spreadsheet itself in the "financial report"
    driver.set_page_load_timeout(50)
    wait = ui.WebDriverWait(driver, 50)
    excel_sheet = driver.find_element_by_xpath('/html/body/div[5]/table/tbody/tr[1]/td/a[2]')
    excel_sheet.click()
    # i'm setting the resulting dialog box to open and download automatically from now on. if i want to change it back
    # i'll need to use this page: https://support.mozilla.org/en-US/kb/change-firefox-behavior-when-open-file
    # Testing showed that dialog box "open as" probably suits my needs better than 'save'.
    driver.close()
    driver.quit()
get_edgar_results(ticker)
Any help or suggestions are greatly appreciated. Thanks!!
This is not so much a recommendation based on your actual code, or how Selenium works, but more general advice when trying to gather information from the web.
Given the opportunity, accessing a website through its API is far more friendly to programming than attempting the same task through Selenium. When you use Selenium for web scraping, very often websites do not behave the same way they do when accessed through a normal browser. This could be for any number of reasons, not the least of which is that websites may intentionally prevent automated browsers like Selenium from accessing them.
In this case, EDGAR SEC provides an HTTPS access service through which you should be able to get the information you're looking for.
Without digging too deeply into this data, it should not be tremendously difficult to request this information with an HTTP library like requests instead, and save it that way.
import requests
result = requests.get("https://www.sec.gov/Archives/edgar/data/18230/000001823020000214/Financial_Report.xlsx")
with open("file.xlsx", "wb") as excelFile:
    excelFile.write(result.content)
The only difficulty comes with getting the stock ticker's CIK to build the above URL, but that shouldn't be too hard with the same API information.
The EDGAR website fairly transparently exposes you to its data through its URLs. You can bypass all the Selenium weirdness and instead just build the URL, and request the information directly without loading all the JavaScript, etc.
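As a rough sketch of building the URL directly, the helper below assembles a Financial_Report.xlsx link from a CIK and an accession number. The name build_report_url is made up for illustration, and the URL layout is inferred only from the single example URL above, so treat it as an assumption rather than a documented guarantee.

```python
ARCHIVE_ROOT = "https://www.sec.gov/Archives/edgar/data"


def build_report_url(cik, accession):
    """Build a Financial_Report.xlsx URL (layout assumed from one example URL)."""
    # The CIK appears without leading zeros in the example URL.
    cik_part = str(int(cik))
    # The accession number appears without dashes in the example URL.
    accession_part = str(accession).replace("-", "")
    return f"{ARCHIVE_ROOT}/{cik_part}/{accession_part}/Financial_Report.xlsx"


print(build_report_url(18230, "0000018230-20-000214"))
```

The result could then be passed to requests.get() in place of the hard-coded address.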
EDIT: You can also browse through this information in a more programmatic fashion, too. The link above mentions that each directory within edgar/full-index also provides a JSON file, which is easily computer-readable. So you could request https://www.sec.gov/Archives/edgar/full-index/index.json, parse out the year you want, request that year, parse out the quarter you want, request that quarter, then parse out the company you want, and request that company's information, etc.
For instance, to get the CIK number of Caterpillar, you would get the company.gz file from https://www.sec.gov/Archives/edgar/full-index/2020/QTR4.json, parse it into a dataframe, find the line with CATERPILLAR INC on it, find the CIK and accession numbers from the associated .txt file, and then find the right URL to download their Excel file. A bit circuitous, but if you can work out a way to just skip to the CIK number you can cut down on the number of requests needed.
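The last step of that chain might look something like the sketch below. It assumes index entries point at files named like edgar/data/<CIK>/<accession>.txt, with the accession number in dashed 10-2-6 digit form. Both the helper name parse_filing_path and the sample path are illustrative; the sample is simply the example URL from earlier rewritten in that assumed form.

```python
import re

# Assumed shape of a filing path inside an index file.
FILING_PATH = re.compile(r"edgar/data/(\d+)/(\d{10}-\d{2}-\d{6})\.txt$")


def parse_filing_path(path):
    """Return (cik, accession) extracted from an index file path."""
    match = FILING_PATH.search(path)
    if match is None:
        raise ValueError(f"unrecognized filing path: {path!r}")
    cik, accession = match.groups()
    return cik, accession


cik, accession = parse_filing_path("edgar/data/18230/0000018230-20-000214.txt")
# Dropping the dashes gives the folder name used in the archive URL.
print(cik, accession.replace("-", ""))
```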
I'm new at Python and I need expert guidance for the project I'm trying to finish at work, as none of my coworkers are programmers.
I'm making a script that logs into a website and pulls a CSV dataset. Here are the steps that I'd like to automate:
Open chrome, go to a website
Login with username/password
Navigate to another internal site via menu dropdown
Input text into a search tag box or delete search tags, e.g. "Hours", press "Enter" or "Tab" to select (repeat this for 3-4 search tags)
Click "Run data"
Wait until data loads, then click "Download" to get a CSV file with 40-50k rows of data
Repeat this process 3-4 times for different data pulls, different links and different search tags
This process usually takes 30-40 minutes for a total of 4 or 5 data pulls each week so it's like watching paint dry.
I've tried to automate this using the pyautogui module, but it isn't working out for me. It works too fast, or doesn't work at all. I think I'm using it wrong.
This is my code:
import webbrowser
import pyautogui
#pyautogui.position()
#print(pyautogui.position())
#1-2
pyautogui.FAILSAFE = True
chrome_path = r'open -a /Applications/Google\ Chrome.app %s'
#2-12
url = 'http://Google.com/'
webbrowser.get(chrome_path).open(url)
pyautogui.moveTo(185, 87, duration=0.25)
pyautogui.click()
pyautogui.typewrite('www.linkedin.com')
pyautogui.press('enter')
#loginhere? Research
In case pyautogui is not suited for this task, can you recommend an alternative way?
The way you are going about grabbing your data is very error-prone and is not how people generally collect data from websites. What you want is a web scraper, which lets you pull information from web pages. Some companies also provide APIs that give you easier access to the data.
LinkedIn has a built-in API for grabbing information from it. You did mention that you were navigating to another site, though; in that case I would check whether that site has an API, or look into using Scrapy, a web scraping framework that should let you pull the information you need.
Sidenote: You can also look into synchronous and asynchronous programming with Python to make multiple requests faster/easier.
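As a small illustration of that side note, here is a sketch that runs several independent data pulls concurrently using threads from the standard library. The fetch function is a stand-in that only sleeps; in a real script it would be replaced by whatever request or scraping call you end up using, and the pull names are invented.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(name):
    """Stand-in for one slow data pull (hypothetical; only sleeps)."""
    time.sleep(0.1)
    return f"{name}: done"


def run_pulls(names, max_workers=4):
    """Run all pulls in worker threads and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, names))


print(run_pulls(["hours", "sales", "inventory"]))
```

Threads suit this kind of work because the time is spent waiting on the network rather than computing.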
In python you can open a web browser like this...
import webbrowser
webbrowser.open("stackoverflow.com")
This method opens a new tab EVERY time the page is called. I want to create a web page with text boxes, graphic (SVG) devices, etc... then pass variables to it. Basically... use the browser as a display screen.
The HTML page would reside in the same folder with the python code... so this works just fine...
import webbrowser
webbrowser.open("sample.html")
The issue is... if I place this in a timer that updates every second... I get tab after tab... but what I want is for it to open the page ONCE, then just pass data to it as if I had used a SUBMIT button...
My code would generate the appropriate text... URL plus data... then pass it as a long URL.
webbrowser.open("sample.html?alpha=50&beta=100")
The page would pull the variables "alpha" and "beta", then shove the data into some graphic device using javascript. I have had great success manipulating SVG this way... http://askjerry.info/SVG for example.
(Feel free to grab my graphics if you like.)
Is it possible to keep updating a SINGLE page/window instead of a new tab every time??
Thanks,
Jerry
Use the selenium module. The .get() method actually just opens the given url in the same tab and leaves the old url. In fact, I think there's even a .refresh().
From this question: Refresh a local web page using Python
from selenium import webdriver
import time
driver = webdriver.Firefox()
driver.get('URL')
while True:
    time.sleep(20)
    driver.refresh()
driver.quit()
Here you replace 'URL' with your own URL and parameters. If you want to pass data from Python to HTML/JavaScript, though, you will be better off learning Flask or something similar. You can then update your page using Ajax, which will make your graphics look nicer and will stay tractable if you need to pass more data than just alpha and beta.
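For the URL-plus-data approach, a sketch like the following builds the address to hand to driver.get(). The helper name build_page_url is hypothetical; sample.html, alpha and beta come from the question.

```python
from urllib.parse import urlencode


def build_page_url(page, **params):
    """Append the keyword arguments to the page address as a query string."""
    if not params:
        return page
    return f"{page}?{urlencode(params)}"


print(build_page_url("sample.html", alpha=50, beta=100))
```

Selenium expects a full URL, so for a local file you would probably need to prefix the result with a file:// path to the folder that holds the page.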
I want to trigger a button in an HTML file.
There is a web site with a number of options and a button.
After clicking the button, an HTML table is created on the next page based on the selected options.
I want to automate the process, but I don't know how to trigger a button using Python.
Does anyone know how to do this?
You can use windmill, mechanize or selenium RC.
When I try to automatically download a file from some webpage using Python,
I get Webpage Dialog window (I use IE). The window has two buttons, such as 'Continue' and 'Cancel'. I cannot figure out how to click on the Continue Button. The problem is
that I don't know how to control Webpage Dialog with Python. I tried to use
winGuiAuto to find the controls of the window, but it fails to recognize any Button type
controls... Any ideas?
Sasha
A clarification of my question:
My purpose is to download stock data from a certain web site. I need to do this for many stocks, so I need Python to do it for me repetitively. This specific site exports the data by letting me download it as an Excel file by clicking a link. However, after clicking the link I get a Webpage Dialog box asking me if I am sure that I want to download this file. This Webpage Dialog is my problem: it is not an HTML page and it is not a regular Windows dialog box. It is something else, and I cannot figure out how to control it with Python. It has two buttons and I need to click on one of them (i.e. Continue). It seems to be a special kind of window implemented in IE. It is distinguished by its title, which looks like this: Webpage Dialog -- Download blalblabla. If I click Continue manually, it opens a regular Windows dialog box (open, save, cancel), which I know how to handle with the winGuiAuto library. I tried to use this library for the Webpage Dialog window with no luck. I tried to recognize the buttons with the AutoIt Info tool, with no luck either. In fact, maybe these are not buttons but links; however, I cannot see the links and there is no source code visible... What I need is someone to tell me what this Webpage Dialog box is and how to control it with Python. That was my question.
You can't, and you don't want to. When you ask a question, try explaining what you are trying to achieve, and not just the task immediately before you. You are likely barking up the wrong tree. There is some other way of doing what you are trying to do.
The title 'Webpage Dialog' suggests that is a Javascript-generated input box, hence why you can't access it via winGuiAuto. What you're asking directly is unlikely to be possible.
However, making the assumption that what you want to do is just download this data from the site, why are you using the GUI at all? Python provides everything you need to download files from the internet without controlling IE. The process you will want to follow is:
Download the host page
Find the url for your download in the page (if it changes)
Download the file from that url to a local file
In Python this would look something like this:
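import re
import urllib.request
from urllib.parse import urljoin

page_url = 'http://yoursitehere' # Original page where the download button is
with urllib.request.urlopen(page_url) as f:
    html = f.read().decode('utf-8', errors='replace')
m = re.search(r'[\'"]([^\'"]*\.xls)[\'"]', html) # Find quoted link ending .xls in page
if m:
    # Resolve a relative link against the page URL, then retrieve the Excel file
    urllib.request.urlretrieve(urljoin(page_url, m.group(1)), 'local_filename.xls')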
It is better to use selenium Python bindings:
from selenium import webdriver
from selenium.webdriver.common import alert
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException
class AlertsManager:
    def alertsManager(self, url):
        self.url_to_visit = url
        self.driver = webdriver.Ie()
        self.driver.get(self.url_to_visit)
        try:
            # Keep accepting alerts until none appears within the 1-second wait
            while WebDriverWait(self.driver, 1).until(EC.alert_is_present()):
                # switch_to_alert() was removed in Selenium 4; use switch_to.alert
                self.alert = self.driver.switch_to.alert
                self.alert.accept()
        except TimeoutException:
            pass

if __name__ == '__main__':
    AM = AlertsManager()
    url = "http://htmlite.com/JS006.php" # This website has 2 popups
    AM.alertsManager(url)