I've written a Python 3 script that uses Selenium to extract data from a table within an IFrame on Rooster Resource. This table contains the MLB schedule for 2018.
However, when the script is executed I receive the following error:
selenium.common.exceptions.TimeoutException:
when it reaches the line that switches to the iframe in my script. Why is this the case?
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("link above")
wait = WebDriverWait(driver, 10)
wait.until(EC.frame_to_be_available_and_switch_to_it(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "iframe#pageswitcher-content")))))
for items in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "table.waffle tr"))):
    data = [item.text for item in items.find_element_by_css_selector("td")]
    print(data)
driver.quit()
By the way, if you browse the above link you can see the table containing the different colorful logos and text.
FYI, I don't wish to reuse the link within that iframe; rather, I want to switch to it to get the data.
There are two nested iframes on that page that you have to get through to reach the content. Try this instead:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("above link")
wait = WebDriverWait(driver, 10)
wait.until(EC.frame_to_be_available_and_switch_to_it(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "iframe")))))
wait.until(EC.frame_to_be_available_and_switch_to_it(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "iframe#pageswitcher-content")))))
for items in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "table.waffle tr"))):
    data = [item.text for item in items.find_elements_by_css_selector("td")]
    print(data)
driver.quit()
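One thing to keep in mind after scraping inside the nested iframes: the driver stays focused on the inner frame, so if you want to interact with anything on the main page afterwards you have to switch back out first. A minimal sketch (same driver object as above):
# Switch the driver's focus back to the top-level document once you are
# done with the inner frame; otherwise later element lookups will fail.
driver.switch_to.default_content()
# Or step out one level at a time:
# driver.switch_to.parent_frame()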
I am using selenium to try to scrape data from a website (https://www.mergentarchives.com/), and I am attempting to get the innerText from this element:
<div class="x-paging-info" id="ext-gen200">Displaying reports 1 - 15 of 15</div>
This is my code so far:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Firefox()
driver.maximize_window()
search_url = 'https://www.mergentarchives.com/search.php'
driver.get(search_url)
assert 'Mergent' in driver.title
company_name_input = '//*[@id="ext-comp-1009"]'
search_button = '//*[@id="ext-gen287"]'
driver.implicitly_wait(10)
driver.find_element_by_xpath(company_name_input).send_keys('3com corp')
driver.find_element_by_xpath(search_button).click()
driver.implicitly_wait(20)
print(driver.find_element_by_css_selector('#ext-gen200').text)
Basically, I am just filling out a search form, which works, and it takes me to a search results page where the number of results is listed in a div element. When I attempt to print the text of this element, I simply get a blank space; nothing is written and there is no error.
[Finished in 21.1s]
What am I doing wrong?
I think you may need an explicit wait:
wait = WebDriverWait(driver, 10)
info = wait.until(EC.visibility_of_element_located((By.XPATH, "//div[@class = 'x-paging-info' and @id='ext-gen200']"))).get_attribute('innerHTML')
print(info)
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You may need to add a condition that verifies whether the search results have loaded, and once they have loaded you can use the code below:
print(driver.find_element_by_id('ext-gen200').text)
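For example, a sketch of such a condition (assuming the counter text always starts with "Displaying", as in the HTML you posted):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 20)
# Wait until the paging-info div actually contains some text before reading it;
# an empty string just means the results grid has not rendered yet.
wait.until(EC.text_to_be_present_in_element((By.ID, 'ext-gen200'), 'Displaying'))
print(driver.find_element_by_id('ext-gen200').text)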
I'm just learning how to scrape dynamic web pages using Selenium in Python. I'm currently trying to click on a link within the webpage to page forward through the search results.
So far this is the code that I'm using:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome('C:\\Users\\km13\\chromedriver.exe')
driver.get("http://www.congreso.gob.pe/pley-2016-2021")
elem = driver.find_element_by_css_selector("img[src='/Sicr/TraDocEstProc/CLProLey2016.nsf/8eac1ef603908b5105256cdf006c41b1/$Body/0.AB2?OpenElement&FieldElemFormat=gif']")
elem.click()
This is the HTML that corresponds with the element I'd like to click on:
`<img src="/Sicr/TraDocEstProc/CLProLey2016.nsf/8eac1ef603908b5105256cdf006c41b1/$Body/0.AB2?OpenElement&FieldElemFormat=gif" width="81" height="16" border="0">`
From my somewhat limited knowledge of HTML, it seems like the link is actually embedded in the GIF, which is why I tried to use the CSS selector that goes along with that image. But this did not work.
Any guidance would be greatly appreciated!
Update:
I changed my code by adding the following import
from selenium.webdriver.common.by import By
And I changed the following:
elem = driver.find_element(By.CSS_SELECTOR, "img[src='/Sicr/TraDocEstProc/CLProLey2016.nsf/8eac1ef603908b5105256cdf006c41b1/$Body/0.AB2?OpenElement&FieldElemFormat=gif']")
elem.click()
Now I get a "no such element" error.
There is an iframe. You need to switch to the iframe first to access the element. Try the code below; it uses WebDriverWait to handle the dynamic element.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome('C:\\Users\\km13\\chromedriver.exe')
driver.get("http://www.congreso.gob.pe/pley-2016-2021")
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.NAME, 'ventana02')))
elem = WebDriverWait(driver, 10).until(EC.element_to_be_clickable(
    (By.XPATH, "//a[contains(@onclick,'A50')]/img[contains(@src,'Sicr/TraDocEstProc/CLProLey')]")))
elem.click()
EDITED
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome('C:\\Users\\km13\\chromedriver.exe')
driver.get("http://www.congreso.gob.pe/pley-2016-2021")
driver.switch_to.frame(0)
elem = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//a[contains(@onclick,'A50')]/img[contains(@src,'Sicr/TraDocEstProc/CLProLey')]")))
elem.click()
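Note that after the click the driver is still focused inside that frame. If you intend to keep paging forward in a loop, a rough sketch of resetting the context between clicks (assuming the frame document is replaced on every navigation):
# Step back out to the top-level document and re-enter the frame before
# repeating the click for the next results page.
driver.switch_to.default_content()
WebDriverWait(driver, 20).until(
    EC.frame_to_be_available_and_switch_to_it((By.NAME, 'ventana02')))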
I've written a script in Python using Selenium to scrape some text out of a webpage. The text I want to scrape is generated upon filling in an input box. My script can fill it in the right way and populate the value. However, it can't parse the text. How can I do it?
This is what I've tried so far:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get("http://dev.delshlearning.com.au/test.php")
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR,"#AM"))).send_keys("`(2(3^5-sqrt(3)))/2`",Keys.RETURN)
item = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR,"#MQ"))).text
print(item)
driver.quit()
The send_keys() argument is already filled in within the script, for your reference.
The page stores the text in the element's value attribute. This should work:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get("http://dev.delshlearning.com.au/test.php")
# I changed your locator to ID since it's a little more clear
wait.until(EC.visibility_of_element_located((By.ID,"AM"))).send_keys("`(2(3^5-sqrt(3)))/2`",Keys.RETURN)
item = wait.until(EC.visibility_of_element_located((By.ID,"MQ"))).get_attribute('value')
print(item)
driver.quit()
I found it by going to the element's properties in the browser's developer tools, where the value attribute is shown.
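If get_attribute('value') ever comes back empty because the field is updated asynchronously, another option (just a sketch, not specific to this page) is to read the live property through JavaScript:
# Read the element's current "value" property via JavaScript; this reflects
# the live DOM state rather than the original HTML attribute.
field = wait.until(EC.visibility_of_element_located((By.ID, "MQ")))
print(driver.execute_script("return arguments[0].value;", field))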
I've written a script in Python to click on some categories on a webpage. I could manage to click on the first two categories but got stuck when it comes to initiating the final click. I've given links to two images below in which I have marked where to click.
This is the first link, where there is a sign (marked with a pencil) to click on to enter the second portion.
This is the second link, where I get stuck when I try to click on the names (I've marked those names with a pencil).
This is the site link.
The script I've tried so far:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get("replace_with_above_link")
wait.until(EC.element_to_be_clickable((By.CLASS_NAME, "i4ewOd-pzNkMb-ornU0b-b0t70b-Bz112c"))).click()
post = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[role='checkbox']")))[1]
post.click()
for item in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,".HzV7m-pbTTYe-JNdkSc .suEOdc"))):
    item.click()
driver.quit()
My intention is to click through the names one by one. Thanks in advance.
Try the code below to click each item in the list:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get(URL)
wait.until(EC.element_to_be_clickable((By.CLASS_NAME, "i4ewOd-pzNkMb-ornU0b-b0t70b-Bz112c"))).click()
post = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[role='checkbox']")))[1]
post.click()
for item in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,".HzV7m-pbTTYe-JNdkSc .suEOdc")))[1:]:
    item.click()
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".HzV7m-tJHJj-LgbsSe-Bz112c.qqvbed-a4fUwd-LgbsSe-Bz112c"))).click()
    wait.until(EC.invisibility_of_element_located((By.CSS_SELECTOR, ".qqvbed-p83tee")))
driver.quit()
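One caveat: clicking an item and dismissing the overlay can rebuild that part of the DOM, which may raise a StaleElementReferenceException on the next iteration. A sketch of a more defensive loop that re-locates the list by index on each pass (same selectors as above):
items = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".HzV7m-pbTTYe-JNdkSc .suEOdc")))
for index in range(1, len(items)):
    # Re-fetch the list on every pass so we never hold a stale reference.
    item = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".HzV7m-pbTTYe-JNdkSc .suEOdc")))[index]
    item.click()
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".HzV7m-tJHJj-LgbsSe-Bz112c.qqvbed-a4fUwd-LgbsSe-Bz112c"))).click()
    wait.until(EC.invisibility_of_element_located((By.CSS_SELECTOR, ".qqvbed-p83tee")))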
I am using Python and Selenium to scrape a website. What I do is go to the homepage and type in a keyword, such as 1300746-79-5. On the resulting page, I am trying to scrape the data in the "pricing" section. Specifically, I need to get the "SKU-Pack Size" and "Price(USD)" information. But this information is JavaScript encrypted, so I cannot see it in the source code. I am wondering how I can achieve this.
I have written some code that gets me to the page of interest, but I still cannot see the JavaScript-generated information. Here is what I have so far.
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pprint
# Create a new instance of the Chrome driver
driver = webdriver.Chrome('C:\Users\Rei\Desktop\chromedriver.exe')
driver.get("http://www.sigmaaldrich.com/united-states.html")
print driver.title
inputElement = driver.find_element_by_name("Query")
# type in the search
inputElement.send_keys("1300746-79-5")
inputElement.submit()
Everything you have done looks correct to me.
The "SKU-Pack Size" and "Price(USD)" information is not "encrypted", but retrieved after a JavaScript click action. All you need to do is click the product name or the pricing link.
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pprint
driver = webdriver.Chrome()
driver.get("http://www.sigmaaldrich.com/united-states.html")
print driver.title
inputElement = driver.find_element_by_name("Query")
# type in the search
inputElement.send_keys("1300746-79-5")
inputElement.submit()
pricing_link = driver.find_element_by_css_selector("li.priceValue a")
print pricing_link.text
pricing_link.click()
# then deal with the data you want
price_table = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".priceAvailContainer tbody"))
)
print 'price_table.text: ' + price_table.text
driver.quit()
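If you need the individual "SKU-Pack Size" and "Price(USD)" cells rather than one text blob, a rough sketch of splitting the table into rows (assuming a conventional tr/td layout; the exact column order depends on the page):
for row in price_table.find_elements_by_css_selector("tr"):
    # Collect the text of each cell in the row, e.g. SKU-pack size and price.
    cells = [cell.text for cell in row.find_elements_by_css_selector("td")]
    print(cells)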