Using Python 3.4 and Selenium
I'm trying to compare the current webpage's URL to a string. For example:
while(webdriver.current_url == "https://www.youtube.com/"):
    print("sleep")
    time.sleep(5)
However, this does not work. I've tried printing out the URL and copying and pasting it into the string portion of my check, but that doesn't work either.
Any help is greatly appreciated.
My guess is that webdriver.current_url does not return a string, but I've wrapped it in str() and that still doesn't work. I've also tried trimming the current_url with current_url[1:-1] and so on; that hasn't helped, so I'm not sure what else to try.
I tried with Python 3.4, and the code you shared works for me:
from selenium import webdriver
import time
driver = webdriver.Firefox()
driver.get("https://www.youtube.com/")
time.sleep(5)
print(driver.current_url, type(driver.current_url), type("https://www.youtube.com/"))
while(driver.current_url == "https://www.youtube.com/"):
    print("sleep")
    time.sleep(5)
Add the following line before the while loop for debugging purposes:
print webdriver.current_url, type(webdriver.current_url)  # prints type as 'unicode' for me, but the code still works fine (tried with Python 2.7)
while(webdriver.current_url == "https://www.youtube.com/"):
    print("sleep")
    time.sleep(5)
I suggest checking what value is actually being returned by webdriver.current_url.
YouTube might be stripping the trailing / or automatically adding query string parameters after you navigate to the URL. Try:
while('https://www.youtube.com' in webdriver.current_url):
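A minimal self-contained sketch of that check, assuming the driver instance is named driver (rather than the webdriver module, as in the question), might look like:
import time
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("https://www.youtube.com/")

# a substring check tolerates a stripped trailing slash or added query parameters
while 'https://www.youtube.com' in driver.current_url:
    print("sleep")
    time.sleep(5)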
webdriver is the name of the python module you should have imported with: from selenium import webdriver. Therefore, it wouldn't even have a current_url property.
Did you mean something like this?
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://www.youtube.com/')
if driver.current_url == "https://www.youtube.com/":
    print('look, they are equal')
(notice I am getting the value of current_url from the webdriver.Chrome instance I create)
Related
I have started using Selenium with Python. I am able to change the message text using find_element_by_id. I want to do the same with find_element_by_xpath, which is not successful because the XPath matches two instances. I want to try this out to learn about XPath.
I want to do web scraping of a page using Python, and I need clarity on using XPath, mainly for going to the next page.
# This code works:
import time
import requests
from selenium import webdriver

driver = webdriver.Chrome()
url = "http://www.seleniumeasy.com/test/basic-first-form-demo.html"
driver.get(url)
eleUserMessage = driver.find_element_by_id("user-message")
eleUserMessage.clear()
eleUserMessage.send_keys("Testing Python")
time.sleep(2)
driver.close()
# This works fine. I wish to do the same with xpath.
# I inspect the input box in Chrome and copy the XPath '//*[@id="user-message"]', which seems to refer to the other box as well.
# I wish to use the xpath method to write text in this box as follows, which does not work.
driver = webdriver.Chrome()
url = "http://www.seleniumeasy.com/test/basic-first-form-demo.html"
driver.get(url)
eleUserMessage = driver.find_elements_by_xpath('//*[@id="user-message"]')
eleUserMessage.clear()
eleUserMessage.send_keys("Test Python")
time.sleep(2)
driver.close()
To elaborate on my comment, you would use the list like this:
eleUserMessage_list = driver.find_elements_by_xpath('//*[@id="user-message"]')
my_desired_element = eleUserMessage_list[0] # or maybe [1]
my_desired_element.clear()
my_desired_element.send_keys("Test Python")
time.sleep(2)
The only real difference between find_elements_by_xpath and find_element_by_xpath is that the first returns a list that needs to be indexed. Once it's indexed, it works the same as if you had used the second!
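If you would rather not index into a list at all, a more specific XPath can work with the singular find_element_by_xpath. This is a sketch that assumes the message box is an input tag:
# assumes the message box is an <input> element; narrowing the XPath by tag
# (or another unique attribute) lets find_element_by_xpath match exactly one node
eleUserMessage = driver.find_element_by_xpath('//input[@id="user-message"]')
eleUserMessage.clear()
eleUserMessage.send_keys("Test Python")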
I am trying to get the page source using Selenium.
My code looks like this:
#!/usr/bin/env python
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('https://python.org')
html_source = browser.page_source
print html_source
When I run the script, it opens the browser but nothing happens. If I wait without doing anything, it throws "Connection refused" after about 15 seconds.
If I enter the address and go to the website manually, nothing happens either.
Why doesn't it work? The script looks fine to me, and it should work.
I'm doing this because I need to get the page source after the JS scripts are executed, and I suspect that Selenium can do that.
Or maybe you know some other way to get the page source after the JavaScript has run?
As per your question, you have invoked the get() method with the argument https://python.org. Instead, pass the argument https://www.python.org/ as follows:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('https://www.python.org/')
html_source = browser.page_source
print (html_source)
Note: Ensure that you are using the latest Selenium-Python v3.8.0 client, the GeckoDriver v0.19.1 binary, and the latest Firefox Quantum v57.x web browser.
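Since you want the source after the JavaScript has run, you could also wait explicitly for a known element before reading page_source. A sketch, assuming that waiting for the body tag is enough for your page:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Firefox()
browser.get('https://www.python.org/')

# wait up to 10 seconds for the page body before reading the source
WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.TAG_NAME, 'body')))
html_source = browser.page_source
print(html_source)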
I am new to Python. I have code in R that I am trying to replace with a Python script. I am running into issues getting Python to select a value from a drop-down menu.
This is the code in R that worked:
remDr$findElement(using = 'xpath', "//select[@id = 'groupby1']/option[@value = 'ReportDate']")$clickElement()
This is the HTML code:
<select style="" class="dropdown" name="groupby1" id="groupby1" accesskey="" waffle_affected_fields="">
  <option value="ReportData">Report Date</option>
</select>
Here are a couple of things I tried after searching for how to do this in Python, and I keep running into errors.
find_element_by_xpath("//select[@id='groupby1']/option[@value='ReportDate']").click()
NameError: name 'find_element_by_xpath' is not defined
Select(driver.find_element_by_css_selector("select#groupby1")).select_by_value('ReportDate').click()
NameError: name 'Select' is not defined
Any help is appreciated!
Select doesn't have click(). Use it like this:
Select(driver.find_element_by_id('groupby1')).select_by_value('ReportDate')
# or by text
Select(driver.find_element_by_id('groupby1')).select_by_visible_text('ReportDate')
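The NameError in your second attempt just means Select was never imported; it lives in selenium.webdriver.support.ui. A minimal sketch (the URL is a hypothetical placeholder for your page):
from selenium import webdriver
from selenium.webdriver.support.ui import Select

driver = webdriver.Firefox()
# hypothetical URL: the page containing the groupby1 dropdown
driver.get("http://example.com/your-report-page")

Select(driver.find_element_by_id('groupby1')).select_by_value('ReportDate')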
These functions are methods on your webdriver instance. You need to do something like this:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("http://www.python.org")
driver.find_element_by_xpath("//select[@id='groupby1']/option[@value='ReportDate']").click()
See the getting started page for examples.
I have a problem with the following task:
Open the Google start page
Type a request in the search form
Choose the result whose URL matches some given URL (for example http://www.theguardian.com)
Currently I have this script:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("https://google.com/")
search_form = driver.find_element_by_xpath("/html/body/div/div[3]/form/div[2]/div[2]/div[1]/div[1]/div[3]/div/div[3]/div/input[1]")
search_form.send_keys("guardian")
search_form.send_keys(Keys.ENTER)
driver.find_element_by_xpath('//a[starts-with(#href,"http://www.theguardian.com")]').click()
It successfully executes the first two subtasks, but the last line throws an exception:
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: {"method":"xpath","selector":"//a[starts-with(#href,\"http://www.theguardian.com\")]"}
I also have this script, which satisfies only the last subtask:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

q = "guardian"
counter = 0  # results page offset (elided from the surrounding loop)
browser = webdriver.Firefox()
body = browser.find_element_by_tag_name("body")
body.send_keys(Keys.CONTROL + 't')
browser.get("https://www.google.com/search?q=" + q + "&start=" + str(counter))
browser.find_element_by_xpath('//a[starts-with(@href,"http://www.theguardian.com")]').click()
It works OK. My question is: why does the first script throw an exception, and how can I modify it so it opens the search result as the second script does?
UPDATE:
As Bart and Shubham mentioned in the comments, the problem was that I was trying to find an element on a page that wasn't yet loaded. So the solution is to use a wait.
Selenium WebDriver provides two types of waits -- explicit and implicit; more on them in the documentation.
For my solution I used an implicit wait. Basically, it tells WebDriver to poll for a certain amount of time when trying to find an element that is not immediately available.
For that I just added one line to the script:
driver.implicitly_wait(5)
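For comparison, the explicit-wait variant of the same fix might look like this (a sketch using the same XPath as the script above):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 5 seconds for the result link to appear, then click it
link = WebDriverWait(driver, 5).until(
    EC.presence_of_element_located(
        (By.XPATH, '//a[starts-with(@href,"http://www.theguardian.com")]')))
link.click()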
Probably this is because the element is not yet on the page when the first version looks for it. If you create a "wait until element is present" kind of loop (I don't know off the top of my head whether one exists), it should work.
The second example does work because browser.get only returns once the page is loaded.
You can do something like below...
Try putting in a wait in the first place.
The code below is in Java, but it is very close to Python; take it as a reference.
You can check every time whether your element is present in the HTML DOM, to prevent your script from erroring out or failing, like below:
if (driver.findElements(By.xpath("//a[starts-with(@href,'http://www.theguardian.com')]")).size() != 0) {
    // YOUR FIRST working code
    System.out.println("element exists");
}
else {
    // your second working code
}
Hope it will help you :)
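For reference, a rough Python sketch of the same check might be:
# find_elements returns an empty list when nothing matches, so the same
# "check before acting" pattern translates directly to Python
links = driver.find_elements_by_xpath('//a[starts-with(@href,"http://www.theguardian.com")]')
if links:
    print("element exists")
    links[0].click()  # your first working code
else:
    # your second working code goes here
    pass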
In this video, you can see how it can be done:
https://www.youtube.com/watch?v=IUBwtLG9hbs
Move to 12:30 to see it in action: Go to google.com, type something in the search box and click on a search result.
I am scraping a website with a lot of JavaScript that is generated when the page is called. As a result, traditional web scraping methods (Beautiful Soup, etc.) are not working for my purposes (at least I have been unsuccessful in getting them to work; all of the important data is in the JavaScript parts), so I have started using Selenium WebDriver. I need to scrape a few hundred pages, each of which has between 10 and 80 data points (each with about 12 fields), so it is important that this script (is that the right terminology?) can run for quite a while without me having to babysit it.
I have the code working for a single page, and I have a controlling section that tells the scraping section which page to scrape. The problem is that the JavaScript portions of the page sometimes load and sometimes don't (about 1 in 7 times). When they don't, a refresh fixes things, but occasionally the refresh will freeze WebDriver, and thus the Python runtime environment as well. Annoyingly, when it freezes like this, the code fails to time out. What is going on?
Here is a stripped down version of my code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.common.exceptions import NoSuchElementException, TimeoutException
import time, re, random, csv
from collections import namedtuple

def main(url_full):
    driver = webdriver.Firefox()
    driver.implicitly_wait(15)
    driver.set_page_load_timeout(30)

    # create HealthPlan namedtuple
    HealthPlan = namedtuple("HealthPlan", ("State, County, FamType, Provider, PlanType, Tier,") +
        (" Premium, Deductible, OoPM, PrimaryCareVisitCoPay, ER, HospitalStay,") +
        (" GenericRx, PreferredPrescription, RxOoPM, MedicalDeduct, BrandDrugDeduct"))

    # check whether the page has loaded and handle page load and time out errors
    pageNotLoaded = True
    while pageNotLoaded:
        try:
            driver.get(url_full)
            time.sleep(6 + abs(random.normalvariate(1.8, 3)))
        except TimeoutException:
            driver.quit()
            time.sleep(3 + abs(random.normalvariate(1.8, 3)))
            driver.get(url_full)
            time.sleep(6 + abs(random.normalvariate(1.8, 3)))
        # Handle page load error by testing presence of showAll,
        # an important feature of the page, which only appears if everything else loads
        try:
            driver.find_element_by_xpath('//*[@id="showAll"]').text
        # catch NoSuchElementException => refresh page
        except NoSuchElementException:
            try:
                driver.refresh()
            # catch TimeoutException => quit and load the page
            # in a new instance of firefox,
            # I don't think the code ever gets here, because it freezes in the refresh
            # and will not throw the timeout exception like I would like
            except TimeoutException:
                driver.quit()
                time.sleep(3 + abs(random.normalvariate(1.8, 3)))
                driver.get(url_full)
                time.sleep(6 + abs(random.normalvariate(1.8, 3)))
        pageNotLoaded = False

    scrapePage()  # this is a dummy function, everything from here down works fine
I have looked extensively for similar problems, and I do not think anyone else has posted about this on SO, or anywhere else that I have looked. I am using Python 2.7 and Selenium 2.39.0, and I am trying to scrape Healthcare.gov's "get premium estimate" pages.
EDIT: (as an example, this page) It may also be worth mentioning that the page fails to load completely more often when the computer has been on/doing this for a while (I'm guessing that the free RAM is getting full, and it glitches while loading). This is kind of beside the point, though, because it should be handled by the try/except.
EDIT2: I should also mention that this is being run on Windows 7 64-bit, with Firefox 17 (which I believe is the newest supported version).
Dude, time.sleep is a fail!
What's this?
time.sleep(3+ abs(random.normalvariate(1.8,3)))
Try this:
import unittest

class TestPy(unittest.TestCase):

    def waits(self):
        self.implicit_wait = 30
Or this:
(self.)driver.implicitly_wait(10)
Or this:
WebDriverWait(driver, 10).until(lambda driver: driver.find_element_by_xpath('some_xpath'))
Or, instead of driver.refresh(), you can trick it:
driver.get(your_url)
Also, you can clear the cookies:
driver.delete_all_cookies()
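Putting those together for your showAll check, a rough sketch (assuming the id from your code, with driver and url_full as in your script) could be:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

driver.get(url_full)
try:
    # wait up to 30 seconds for showAll instead of sleeping random intervals
    WebDriverWait(driver, 30).until(
        EC.presence_of_element_located((By.ID, 'showAll')))
except TimeoutException:
    # reload rather than refresh, as suggested above
    driver.get(url_full)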
And for the scrapePage() part ("this is a dummy function, everything from here down works fine"), have a look at:
http://scrapy.org