I want to do a specific set of operations using Python:
1- Access a webpage
2- Click on a page button
3- Clear cache and cookies and any other site data from the browser memory.
4- Do the above in a loop.
I'm a complete novice when it comes to interacting with the web using Python, although I'm at an intermediate level with the language itself.
I want some learning material that I can use to understand the basics of HTTP and to interact with a webpage using Python.
Which libraries, tutorials, or documentation can I use to learn more?
Selenium sounds like your best bet! It's an open-source web-based automation tool.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
browser = webdriver.Firefox()
browser.get('http://www.yahoo.com')
assert 'Yahoo' in browser.title
elem = browser.find_element_by_name('p') # Find the search box
elem.send_keys('seleniumhq' + Keys.RETURN)
browser.quit()
This snippet will remove cache, etc:
from selenium.webdriver.support.ui import WebDriverWait

def get_clear_browsing_button(driver):
    """Find the "CLEAR BROWSING BUTTON" on the Chrome settings page."""
    return driver.find_element_by_css_selector('* /deep/ #clearBrowsingDataConfirm')

def clear_cache(driver, timeout=60):
    """Clear the cookies and cache for the ChromeDriver instance."""
    # navigate to the settings page
    driver.get('chrome://settings/clearBrowserData')
    # wait for the button to appear
    wait = WebDriverWait(driver, timeout)
    wait.until(get_clear_browsing_button)
    # click the button to clear the cache
    get_clear_browsing_button(driver).click()
    # wait for the button to be gone before returning
    wait.until_not(get_clear_browsing_button)
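Putting the four steps from the question together might look like the sketch below. It rests on several assumptions: the URL, the `button#submit` selector, and the function names are placeholders that do not come from the question. `delete_all_cookies()` only removes cookies for the current domain, and the storage calls only cover the current page's origin, so the HTTP cache may still need the settings-page approach above or a fresh browser per iteration.

```python
def visit_click_and_clear(driver, url, locator):
    """One iteration: open the page, click a button, then wipe site data."""
    driver.get(url)
    # locator is a (strategy, value) pair such as (By.CSS_SELECTOR, "button#submit")
    driver.find_element(*locator).click()
    # cookies for the current domain
    driver.delete_all_cookies()
    # web storage for the current origin
    driver.execute_script(
        "window.localStorage.clear(); window.sessionStorage.clear();"
    )


def run_loop(driver, url, locator, repeats):
    """Repeat the visit/click/clear cycle and return how many cycles ran."""
    for _ in range(repeats):
        visit_click_and_clear(driver, url, locator)
    return repeats


def main():
    # Selenium is imported here so the helpers above stay importable without it.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    try:
        # hypothetical page and selector; replace with the real ones
        run_loop(driver, "https://example.com", (By.CSS_SELECTOR, "button#submit"), repeats=3)
    finally:
        driver.quit()
```

Calling `main()` starts Firefox and runs three cycles. Keeping the driver as a parameter means the same helpers work with Chrome or any other WebDriver.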
Related
So I'm trying to learn Selenium for automated testing. I have the Selenium IDE and the WebDrivers for Firefox and Chrome, both are in my PATH, on Windows. I've been able to get basic testing working but this part of the testing is eluding me. I've switched to using Python because the IDE doesn't have enough features, you can't even click the back button.
I'm pretty sure this has been answered elsewhere but none of the recommended links provided an answer that worked for me. I've searched Google and YouTube with no relevant results.
I'm trying to find every link on a page, which I've been able to accomplish; I would think this would be just a default test. I even got it to print the text of each link, but when I try to click the link it doesn't work. I've tried waits of various sorts before clicking, including visibility_of_any_elements_located and time.sleep(5).
After waiting, I've tried self.driver.find_element(By.LINK_TEXT, ("lnktxt")).click() to click the link, but none of these attempts work. That call is not in the code below; the code below does work, printing the link text, the URL, and the link text again from a variable.
I guess I'm not sure how to get a variable into the By.LINK_TEXT or ...by_link_text statement, assuming that would work. I figured that if I got the text into a variable I could use it again. That worked for print but not for click().
I basically want to be able to load a page, list all links, click a link, go back and click the next link, etc.
The only post this site recommended that might be helpful was...
How can I test EVERY link on the WEBSITE with Selenium
But it's Java based and I've been trying to learn Python for the past month so I'm not ready to learn Java just to make this work. The IDE does not seem to have an easy option for this, or from all my searches it's not documented well.
Here is my current Selenium code in Python.
import pytest
import time
import json
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
wait_time_out = 15
class TestPazTestAll2():
    def setup_method(self, method):
        # one browser per test
        self.driver = webdriver.Firefox()
        self.vars = {}

    def teardown_method(self, method):
        # the browser is closed here after each test
        self.driver.quit()

    def test_pazTestAll(self):
        self.driver.get('https://poetaz.com/poems/')
        lnks = self.driver.find_elements_by_tag_name("a")
        print("Total Links", len(lnks))
        # traverse list
        for lnk in lnks:
            # get_attribute() to get the text and href of each link
            print(lnk.get_attribute("text"))
            lnktxt = lnk.get_attribute("text")
            print(lnk.get_attribute("href"))
            print(lnktxt)
Again, I'm sure I missed something in my searches but after hours of searching I'm reaching out.
Any help is appreciated.
I basically want to be able to load a page, list all links, click a link, go back and click the next link, etc.
I don't recommend doing this. Selenium and manipulating the browser is slow and you're not really using the browser for anything where you'd really need a browser.
What I recommend is simply sending requests to those scraped links and asserting response status codes.
import requests
link_elements = self.driver.find_elements_by_tag_name("a")
urls = map(lambda l: l.get_attribute("href"), link_elements)
for url in urls:
    response = requests.get(url)
    assert response.status_code == 200
(You also might need to prepend some base url to those strings found in href attributes.)
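The base-URL step could be handled with the standard library's `urljoin`, as in the sketch below. The helper name `absolute_http_urls` is made up for illustration, and Selenium often returns already-resolved `href` values, in which case `urljoin` leaves them unchanged. The helper also drops empty hrefs, non-HTTP schemes such as `mailto:`, and duplicates, which would otherwise break or slow down the status check.

```python
from urllib.parse import urljoin, urlparse


def absolute_http_urls(page_url, hrefs):
    """Resolve hrefs against page_url, keeping unique http(s) URLs in order."""
    result = []
    for href in hrefs:
        if not href:
            # <a> tags without an href yield None or an empty string
            continue
        url = urljoin(page_url, href)
        if urlparse(url).scheme in ("http", "https") and url not in result:
            result.append(url)
    return result
```

The returned list can replace `urls` in the loop above, for example `absolute_http_urls(self.driver.current_url, urls)`.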
I plan to build a scraper that'll utilize both Selenium and BeautifulSoup.
I'm struggling to click the load more button with selenium. I've managed to detect the button, scroll to it etc. - can't seem to figure out a way to continuously click the button.
Any suggestions on how to pass this hurdle?
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException, NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time, requests
from bs4 import BeautifulSoup
def search_agent(zip):
    location = bot.find_element_by_name('hheroquotezip')
    time.sleep(3)
    location.clear()
    location.send_keys(zip)
    location.submit()

def load_all_agents():
    # click more until no more results to load
    while True:
        try:
            #more_button = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, 'results.length'))).click()
            more_button = wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="searchResults"]/div[3]/button'))).click()
        except TimeoutException:
            break
    # wait for results to load
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.seclection-result .partners-detail')))
    print ("Complete")
    bot.quit()
#define Zip for search query
zip = 20855
bot = webdriver.Safari()
wait = WebDriverWait(bot, 10)
#fetch agents page
bot.get('https://www.erieinsurance.com/find-an-insurance-agent')
search_agent(zip)
load_all_agents()
With the above approach, the console spits out these errors:
[Error] Refused to load https://9203275.fls.doubleclick.net/activityi;src=9203275;type=agent0;cat=agent0;ord=7817740349177;gtm=2wg783;auiddc=373080108.1594822533;~oref=https%3A%2F%2Fwww.erieinsurance.com%2Ffind-an-insurance-agent-results%3Fzipcode%3D20855? because it does not appear in the frame-src directive of the Content Security Policy.
[Error] Refused to connect to https://api.levelaccess.net/analytics/3.0/results because it does not appear in the connect-src directive of the Content Security Policy.
Creating an answer to post a couple of images.
When I ran the attached script in Chrome it worked fine.
When @furas did the same in Firefox he had the same result.
I ran the same script 10 times back to back and I wasn't refused.
What I note based on the error is that the iframe seems browser sensitive:
In Chrome this header contains chromium scripts:
In Firefox it contains no scripts:
Have a look and see what you get manually in your Safari.
A simple answer might be to not use Safari and use Chrome or Firefox instead. Is that an option? (If it MUST be Safari just say and I'll look again.)
Finally, a couple of quick additional notes.
The site is using Angular, so you might want to consider Protractor if you're struggling with synchronisation. (Protractor helps with some script-syncing capabilities.)
Also worth a note: don't feel you have to land on the home page and then navigate as a user. Update your URL to the search results page, feed in the zip code, and save yourself some time:
https://www.erieinsurance.com/find-an-insurance-agent-results?zipcode=20855
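Building that results URL from a zip code could be done as in this sketch. The `zipcode` parameter name is taken from the URL above; the `results_url` helper and the `RESULTS_PAGE` constant are names invented here for illustration.

```python
from urllib.parse import urlencode

# results page taken from the URL shown above
RESULTS_PAGE = "https://www.erieinsurance.com/find-an-insurance-agent-results"


def results_url(zipcode):
    """Return the search-results URL for the given zip code."""
    return RESULTS_PAGE + "?" + urlencode({"zipcode": zipcode})
```

With this, `bot.get(results_url(20855))` would replace both the home-page load and the `search_agent(zip)` call.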
[edit/update]
Is this the same thing? https://github.com/SeleniumHQ/selenium/issues/458
It's a bug closed in 2016 around "Content Security Policies", logged as an Apple thing.
I need to download a massive amount of excel-files (estimated: 500 - 1000) from sellercentral.amazon.de. Manually downloading is not an option, as every download needs several clicks until the excel pops up.
Since amazon cannot provide me a simple xml with its structure, I decided to automate this on my own. The first thing coming to mind was Selenium and Firefox.
The Problem:
A login to sellercentral is required, as well as two-factor authentication (2FA). Once I have logged in, I can open another tab, enter sellercentral.amazon.de, and am instantly logged in.
I can even open another instance of the browser, and be instantly logged in there too. They might be using session-cookies. The target URL to "scrape" is https://sellercentral.amazon.de/listing/download?ref=ag_dnldinv_apvu_newapvu .
But when I open the URL from my Python script with Selenium WebDriver, a new instance of the browser is launched, in which I am not logged in, even though there are instances of Firefox running at the same time in which I am logged in. So I guess the instances launched by Selenium are somewhat different.
What I've tried:
I tried setting a time delay after the first .get() (to open the site), during which I log in manually, and then repeating the .get(). This makes the script go on forever.
from selenium import webdriver
import time
browser = webdriver.Firefox()
# Wait for website to fire onload event
browser.get("https://sellercentral.amazon.de/listing/download?ref=ag_dnldinv_apvu_newapvu")
time.sleep(30000)
browser.get("https://sellercentral.amazon.de/listing/download?ref=ag_dnldinv_apvu_newapvu")
elements = browser.find_elements_by_tag_name("browse-node-component")
print(str(elements))
What am I looking for?
I need a solution that uses the two-factor authentication token from Google Authenticator.
I want Selenium to open a tab in the existing instance of the Firefox browser, where I will already have logged in beforehand. Then no login should be required and the "scraping" and downloading can be done.
If there's no direct way, maybe someone comes up with a workaround?
I know selenium cannot download the files itself, as the popups are no longer part of the browser. I'll fix that when I get there.
Important Side-Notes:
Firefox is not a given! I'll gladly accept a solution for any browser.
Here is code that reads the Google Authenticator token and uses it in the login. JavaScript is used to open the new tab.
Install the pyotp package before running the test code.
pip install pyotp
Test code:
from pyotp import *
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.Firefox()
driver.get("https://sellercentral.amazon.de/listing/download?ref=ag_dnldinv_apvu_newapvu")
wait = WebDriverWait(driver,10)
# enter the email
email = wait.until(EC.presence_of_element_located((By.XPATH, "//input[@name='email']")))
email.send_keys("email goes here")
# enter password
driver.find_element_by_xpath("//input[@name='password']").send_keys("password goes here")
# click on signin button
driver.find_element_by_xpath("//input[@id='signInSubmit']").click()
# wait for the 2FA field to display
authField = wait.until(EC.presence_of_element_located((By.XPATH, "xpath goes here")))
# get the token from google authenticator
totp = TOTP("secret goes here")
token = totp.now()
print (token)
# enter the token in the UI
authField.send_keys(token)
# click on the button to complete 2FA
driver.find_element_by_xpath("xpath of the button goes here").click()
# now open new tab
driver.execute_script("""window.open("https://sellercentral.amazon.de/listing/download?ref=ag_dnldinv_apvu_newapvu")""")
# continue with your logic from here
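For context on what `totp.now()` returns in the code above, here is a standard-library sketch of the time-based one-time password algorithm (RFC 6238 with HMAC-SHA1, 30-second steps, 6 digits, which are the defaults authenticator apps commonly use). It is not pyotp's implementation, and the function names `totp_at` and `totp_now` are made up for illustration. The secret is the base32 string shown when 2FA is set up.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp_at(secret_b32, unix_time, digits=6, step=30):
    """Compute the TOTP code for a base32 secret at a given Unix time."""
    key = base64.b32decode(secret_b32, casefold=True)
    # number of whole time steps since the epoch, as an 8-byte big-endian counter
    counter = int(unix_time) // step
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # dynamic truncation: the low 4 bits of the last byte pick a 4-byte window
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp_now(secret_b32):
    """Compute the code for the current time."""
    return totp_at(secret_b32, time.time())
```

Because the code depends only on the secret and the clock, the script and the phone app produce the same token as long as the machine's time is roughly correct.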
I have written a small Python script with Selenium to search Google and open the first link, but whenever I run this script, it opens a console and a new Chrome window and runs the script in that Chrome window.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import pyautogui
def main():
    setup()

# open Chrome and open Google
def setup():
    driver = webdriver.Chrome(r'C:\\python_programs'+
                              '(Starting_out_python)'+
                              '\\chromedriver.exe')
    driver.get('https://www.google.com')
    assert 'Google' in driver.title
    mySearch(driver)

#Search keyword
def mySearch(driver):
    search = driver.find_element_by_id("lst-ib")
    search.clear()
    search.send_keys("Beautiful Islam")
    search.send_keys(Keys.RETURN)
    first_link(driver)

#click first link
def first_link(driver):
    link = driver.find_elements_by_class_name("r")
    link1 = link[0]
    link1.click()

main()
How can I open this in the same browser I am using?
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
def main():
    setup()

# open Chrome and open Google
def setup():
    driver = webdriver.Chrome()
    driver.get('https://www.google.com')
    assert 'Google' in driver.title
    mySearch(driver)

#Search keyword
def mySearch(driver):
    search = driver.find_element_by_id("lst-ib")
    search.clear()
    search.send_keys("test")
    search.send_keys(Keys.RETURN)
    first_link(driver)

#click first link
def first_link(driver):
    link = driver.find_elements_by_xpath("//a[@href]")
    # uncomment to see each href of the found links
    # for i in link:
    #     print(i.get_attribute("href"))
    first_link = link[0]
    url = first_link.get_attribute("href")
    driver.execute_script("window.open('about:blank', 'tab2');")
    driver.switch_to.window("tab2")
    driver.get(url)
    # Do something else with this new tab now

main()
A few observations: the first link you get might not be the first link you want. In my case, the first link is the login to the Google account. So you might want to do some more validation on it before you open it, like checking its href property, checking its text to see if it matches something, etc.
Another observation is that there are easier ways of crawling Google search results, such as using Google's API directly or a third-party implementation like this: https://pypi.python.org/pypi/google or https://pypi.python.org/pypi/google-search
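The href validation mentioned in the first observation could be as simple as the sketch below. It assumes that "the link you want" means the first http(s) link that does not point back to a Google host; that rule, the `skip_domain` parameter, and the helper name `first_external_link` are illustrative choices, and a real results page may need a stricter check.

```python
from urllib.parse import urlparse


def first_external_link(hrefs, skip_domain="google."):
    """Return the first http(s) href whose host does not contain skip_domain."""
    for href in hrefs:
        if not href:
            continue
        parts = urlparse(href)
        if parts.scheme in ("http", "https") and skip_domain not in parts.netloc:
            return href
    # nothing suitable found
    return None
```

In `first_link(driver)` above, this would replace `link[0]`: collect `[i.get_attribute("href") for i in link]`, pass the list to the helper, and open the result only if it is not `None`.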
To my knowledge, there's no way to attach Selenium to an already-running browser.
More to the point, why do you want to do that? The only thing I can think of is if you're trying to set up something with the browser manually, and then having Selenium do things to it from that manually-set-up state. If you want your tests to run as consistently as possible, you shouldn't be relying on a human setting up the browser in a particular way; the script should do this itself.
I started to learn to scrape websites with Python and Selenium. I chose Selenium because I need to navigate through the website and I also have to log in.
I wrote a script that opens a Firefox window and loads the website www.flashscore.com. With this script I am also able to log in and navigate to the different sports sections (main menu) they have.
The code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
# open website
driver = webdriver.Firefox()
driver.get("http://www.flashscore.com")
# login
driver.find_element_by_id('signIn').click()
username = driver.find_element_by_id("email")
password = driver.find_element_by_id("passwd")
username.send_keys("*****")
password.send_keys("*****")
driver.find_element_by_name("login").click()
# go to the tennis section
link = driver.find_element_by_link_text('Tennis')
link.click()
#go to the live games tab in the tennis section
# ?????????????????????????????'
Then it got more difficult. I also want to navigate to, for example, the "live games" and "finished" tabs within a sport's section. This part doesn't work. I tried many things but I can't get into either of these tabs. When analyzing the website I see that they use some iframes. I also found some code to switch to an iframe. But the problem is that I can't find the name of the iframe containing the tabs I want to click on. Maybe the iframes are not the problem and I'm looking in the wrong direction. (Maybe the problem is caused by some JavaScript?)
Can anybody please help me with this?
No, the iframes are not the problem in this case. The "Live games" element is not inside an iframe. Locate it by link text and click:
live_games_link = driver.find_element_by_link_text("LIVE Games")
live_games_link.click()
You may need to wait for this link to be clickable before actually trying to click it:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
wait = WebDriverWait(driver, 10)
live_games_link = wait.until(EC.element_to_be_clickable((By.LINK_TEXT, "LIVE Games")))
live_games_link.click()