I'm trying to scrape Chinese economic data from an official website, but I keep getting an "Element Not Found" exception on the last line here. I've scoured Stack Overflow and have tried adding implicitly_wait and switching the problem line from XPath to ID, but nothing has worked. Any thoughts?
from selenium import webdriver
FAI = []
FAIinfra = []
FAIestate = []
path_to_chromedriver = '/Users/cargillsk/Downloads/chromedriver'
browser = webdriver.Chrome(executable_path = path_to_chromedriver)
browser.implicitly_wait(30)
url = 'http://www.cqdata.gov.cn/easyquery.htm?cn=A0101'
browser.get(url)
browser.find_element_by_id('treeZhiBiao_4').click()
browser.find_element_by_xpath('//*[@id="mySelect_sj"]/div[2]/div[1]').click()
browser.find_element_by_xpath('//*[@id="mySelect_sj"]/div[2]/div[2]/div[3]/input').clear()
browser.find_element_by_xpath('//*[@id="mySelect_sj"]/div[2]/div[2]/div[3]/input').send_keys('last100')
browser.find_element_by_xpath('//*[@id="mySelect_sj"]/div[2]/div[2]/div[3]/div[1]').click()
FAIinitial = browser.find_element_by_xpath('//*[@id="main-container"]/div[2]/div[2]/div[2]/div/div[2]/table/thead/tr/th[2]/strong').text
for i in range(2, 102):
    i = str(i)
    FAI.append(browser.find_element_by_xpath('//*[@id="table_main"]/tbody/tr[1]/td[%s]' % i).text)
    FAIinfra.append(browser.find_element_by_xpath('//*[@id="table_main"]/tbody/tr[4]/td[%s]' % i).text)
    FAIestate.append(browser.find_element_by_xpath('//*[@id="table_main"]/tbody/tr[55]/td[%s]' % i).text)
browser.find_element_by_id("treeZhiBiao_3").click()
browser.find_element_by_id("treeZhiBiao_14").click()
So... the implicit wait is not your issue. Looking through the website's code, I found that there is no "treeZhiBiao_14", so I'm not sure what you're trying to click here. Maybe try using something like this instead, so you know what you're clicking:
browser.find_element_by_xpath("//*[contains(text(), '工业')]").click()
or
browser.find_element_by_xpath("//*[contains(text(), 'industry')]").click()
I'm starting web scraping and followed tutorials. Yet in this code I get "NameError: name 'avail' is not defined". I guess it's really easy, but how could I fix this? (The error is probably in the for loop at line 15, in avail = i.text().)
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome('/Users/victorfichtner/Downloads/Chromedriver')
driver.get('https://www.myntra.com/smart-watches/boat/boat-unisex-black-storm-m-smart-watch/13471916/buy')
a = driver.find_elements_by_xpath("//*[@class='pdp-add-to-bag pdp-button pdp-flex pdp-center']")
for i in a:
    avail = i.text()
driver.quit()
print(avail)
Things to be noted:
find_elements returns a list, whereas find_element returns a single web element.
XPath is brittle.
Use explicit waits for dynamic loading (see the sketch after the sample output below).
It is .text in Python, not .text().
Sample code:
driver = webdriver.Chrome('/Users/victorfichtner/Downloads/Chromedriver')
driver.maximize_window()
driver.implicitly_wait(50)
driver.get('https://www.myntra.com/smart-watches/boat/boat-unisex-black-storm-m-smart-watch/13471916/buy')
a = driver.find_elements_by_xpath("//*[contains(@class,'pdp-add-to-bag pdp-button pdp-flex')]")
avail = ""
for i in a:
    avail = i.text
driver.quit()
print(avail)
Output:
ADD TO BAG
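For the explicit-waits note above, a minimal sketch using WebDriverWait instead of implicitly_wait — the class name in the XPath is taken from the question, so treat it as an assumption about the page:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome('/Users/victorfichtner/Downloads/Chromedriver')
driver.get('https://www.myntra.com/smart-watches/boat/boat-unisex-black-storm-m-smart-watch/13471916/buy')
# Block for up to 20 seconds until the button is present, instead of sleeping blindly
button = WebDriverWait(driver, 20).until(
    EC.presence_of_element_located((By.XPATH, "//*[contains(@class,'pdp-add-to-bag')]"))
)
print(button.text)  # .text is a property, not a method
driver.quit()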
I'm having an issue with my Python code. The intention is to use Selenium to open up the website (Craigslist), search for a text ("Honda"), then scrape three pages of this site. I keep getting the
"StaleElementReferenceException: stale element reference: element is not attached to the page document" exception
when the iteration reaches the second page. I can't exactly tell why it's stopping at the second page and not clicking the "next" button once more to reach the third page, then finally scraping the data and printing it.
This is my code:
import time
from selenium import webdriver
from bs4 import BeautifulSoup
DRIVER_PATH = "/Users/mouradsal/Downloads/DataSets Python/chromedriver"
URL = "https://vancouver.craigslist.org/"
browser = webdriver.Chrome(DRIVER_PATH)
browser.get(URL)
browser.maximize_window()
time.sleep(4)
search = browser.find_element_by_css_selector("#query")
search.send_keys("Honda")
search.send_keys(u'\ue007')
content = browser.find_elements_by_css_selector(".hdrlnk")
button = browser.find_element_by_css_selector(".next")
for i in range(0, 3):
    button.click()
    print("Count: " + str(i))
    time.sleep(10)
print("done loop ")
for e in content:
    start = e.get_attribute("innerHTML")
    soup = BeautifulSoup(start, features=("lxml"))
    print(soup.get_text())
    print("***************************")
Any suggestions would be greatly appreciated!
Thanks
for i in range(0, 3):
    button = browser.find_element_by_css_selector(".next")
    button.click()
    print("Count: " + str(i))
    time.sleep(10)
You need to nest your finding of elements inside the loop, because web elements go stale every time you navigate to a new page.
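For this question specifically, the .hdrlnk elements collected before the first click also go stale, so they need to be re-found inside the loop too. A minimal sketch, assuming the selectors from the question still match the page:
import time
from selenium import webdriver
from bs4 import BeautifulSoup

browser = webdriver.Chrome("/Users/mouradsal/Downloads/DataSets Python/chromedriver")
browser.get("https://vancouver.craigslist.org/")
browser.find_element_by_css_selector("#query").send_keys("Honda", u'\ue007')
time.sleep(4)
for i in range(0, 3):
    # Re-find both the listings and the button after every navigation,
    # so no stale references are reused
    for e in browser.find_elements_by_css_selector(".hdrlnk"):
        soup = BeautifulSoup(e.get_attribute("innerHTML"), features="lxml")
        print(soup.get_text())
    browser.find_element_by_css_selector(".next").click()
    time.sleep(10)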
I am trying to web-scrape indeed.com to search for jobs using Python, with Selenium and BeautifulSoup. I want to click the next page but can't seem to figure out how to do this. I've looked at many threads, but it is unclear to me which element I am supposed to act on. Here is the web page HTML; the code marked in grey comes up when I inspect the next button.
Also, just to mention: I first tried to follow what happens to the URL when mousedown is executed. After reading the addppurlparam function and adding the strings from the function to the URL, I just get thrown back to page one.
Here is my code for the class, using Selenium to click on the button:
from selenium import webdriver
from selenium.webdriver import ActionChains
driver = webdriver.Chrome("C:/Users/alleballe/Downloads/chromedriver.exe")
driver.get("https://se.indeed.com/Internship-jobb")
print(driver.title)
#assert "Python" in driver.title
elem = driver.find_element_by_class_name("pagination-list")
elem = elem.find_element_by_xpath("//li/a[@aria-label='Nästa']")
print(elem)
assert "No results found." not in driver.page_source
assert elem
action = ActionChains(driver).click(elem)
action.perform()
print(elem)
driver.close()
The Indeed site is formatted so that it shows 10 results per page.
Your photo shows the wrong section of HTML; instead, you can see that the links contain start=0 for the first page, start=10 for the second, start=20 for the third, and so on.
You could use this knowledge to write code like this:
i = 0
while True:
    driver.get(f'https://se.indeed.com/jobs?q=Internship&start={i}')
    # code here (break out of the loop once a page returns no results)
    i = i + 10
But, to directly answer your question, you should do:
next_page_link = driver.find_element_by_xpath('/html/head/link[6]').get_attribute('href')
driver.get(next_page_link)
This will find the link element, read its href, and then navigate to it.
This works; it paginates to the next page:
driver.find_element_by_class_name("pagination-list").find_element_by_tag_name('a').click()
I'm new to Python and I'm trying to crawl a whole website recursively with Selenium.
I would like to do this with Selenium because I want to get all the cookies the website uses. I know that other tools can crawl a website more easily and faster, but other tools can't give me all the cookies (first and third party).
Here my code:
from selenium import webdriver
import os, shutil
url = "http://example.com/"
links = set()
def crawl(start_link):
    driver.get(start_link)
    elements = driver.find_elements_by_tag_name("a")
    urls_to_visit = set()
    for el in elements:
        urls_to_visit.add(el.get_attribute('href'))
    for el in urls_to_visit:
        if url in el:
            if el not in links:
                links.add(el)
                crawl(el)
        else:
            return
dir_name = "userdir"
if os.path.isdir(dir_name):
shutil.rmtree(dir_name)
co = webdriver.ChromeOptions()
co.add_argument("--user-data-dir=userdir")
driver = webdriver.Chrome(options = co)
crawl(url)
print(links)
driver.close()
My problem is that the crawl function apparently does not open all pages of the website. On some websites I can navigate by hand to pages that the function never reaches. Why?
One thing I have noticed while using WebDriver is that it needs time to load the page; the elements are not instantly available the way they seem to be in a regular browser.
You may want to add some delays, or a loop that checks for some element such as a footer, to confirm that the page has loaded before you start crawling.
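A minimal sketch of that idea using an explicit wait — the footer tag and the wait_for_page helper name are assumptions for illustration:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def wait_for_page(driver, timeout=10):
    # Block until a footer element exists, as a rough "page has loaded" signal
    WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.TAG_NAME, "footer"))
    )
Calling wait_for_page(driver) right after driver.get(start_link) in crawl() gives the links time to render before they are collected.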
I'm trying to extract data from the link below using selenium via python:
www.oanda.com
But I'm getting an "Unable to Locate an Element" error. In the browser console I tried using this CSS selector:
document.querySelector('div.position.short-position.style-scope.position-ratios-app')
This querySelector returns the data for the short percentage of the first row in the browser console (for this test), but when I use this selector in the Python script below, it gives me an "Unable to Locate element" error, or sometimes an empty string.
Please suggest a solution if there is one. I will be grateful, thanks :)
# All Imports
import time
from selenium import webdriver
# will return a driver
def getDriver():
    driver = webdriver.Chrome()
    time.sleep(3)
    return driver

def getshortPercentages(driver):
    shortPercentages = []
    shortList = driver.find_elements_by_css_selector('div.position.short-position.style-scope.position-ratios-app')
    for elem in shortList:
        shortPercentages.append(elem.text)
    return shortPercentages

def getData(url):
    driver = getDriver()
    driver.get(url)
    time.sleep(5)
    # pagesource = driver.page_source
    # print("Page Source: ", pagesource)
    shortList = getshortPercentages(driver)
    print("Returned source from selector: ", shortList)

if __name__ == '__main__':
    url = "https://www.oanda.com/forex-trading/analysis/open-position-ratios"
    getData(url)
The required data is located inside an iframe, so you need to switch to the iframe before handling elements:
driver.switch_to.frame(driver.find_element_by_class_name('position-ratios-iframe'))
Also note that the data inside the iframe is dynamic, so make sure you're using an implicit/explicit wait (using time.sleep(5) is IMHO not the best solution).
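A minimal sketch putting both pieces together — the iframe class name above plus an explicit wait; the CSS selector is the one from the question, so treat both locators as assumptions about the current page:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.oanda.com/forex-trading/analysis/open-position-ratios")
wait = WebDriverWait(driver, 15)
# Wait for the iframe and switch into it in one step
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CLASS_NAME, 'position-ratios-iframe')))
# Wait until the ratio elements are actually rendered inside the iframe
elements = wait.until(EC.visibility_of_any_elements_located(
    (By.CSS_SELECTOR, 'div.position.short-position.style-scope.position-ratios-app')))
for elem in elements:
    print(elem.text)
driver.switch_to.default_content()  # step back out of the iframe when done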