using pd.read_html to read current page - python

I'm trying to use pd.read_html() to read the current page I'm scraping with Selenium.
The only problem is that the page does not contain a table until a few buttons are clicked via Selenium, after which the table is displayed.
So when I call:
pd.read_html('html_string')
it gives me an error.
Is there a way to read in the current page after the buttons have been clicked, rather than just passing the HTML string as an argument?
I've also looked at the documentation for this and could not find anything to help.
Thanks for reading/answering

I would pass the page source instead of an address once the source has been updated:
from selenium.webdriver.common.by import By

url = ...
button_id = ...
driver.get(url)
button = driver.find_element(By.ID, button_id)
button.click()
... # wait for the table to appear
data = pd.read_html(driver.page_source)
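The reason this works is that pd.read_html accepts raw HTML, not only a URL. A minimal sketch with a static HTML string standing in for driver.page_source (in recent pandas versions a literal string should be wrapped in StringIO):

```python
from io import StringIO

import pandas as pd

# Stand-in for driver.page_source after the buttons have been clicked
html = """
<table>
  <tr><th>name</th><th>price</th></tr>
  <tr><td>foo</td><td>10</td></tr>
  <tr><td>bar</td><td>20</td></tr>
</table>
"""

# read_html returns a list of DataFrames, one per <table> found
tables = pd.read_html(StringIO(html))
df = tables[0]
print(df.shape)  # (2, 2): two data rows, two columns
```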

Related

Selenium can not move to next page

I'm trying to crawl data from a dynamic website using Selenium. It requires an account to log in, and I must click some links to get to the information page. After doing all these steps, I found that the page source has not changed and I cannot get elements that exist on the new page. On the other hand, if I go directly to this page and log in, the source I get is the parent page's. Can you explain why this happens and how to tackle the problem?
How I perform the click action:
element = driver.find_element(By.CLASS_NAME, "class_name")
element2 = element.find_element(By.CSS_SELECTOR, "css_element")
element2.click()
How I get the source code:
page_source = driver.execute_script("return document.body.outerHTML")
with open('a.html', 'w') as f:
    f.write(page_source)

Pagination with selenium and python

I'm trying to scrape a page using Selenium and Python. The page has a JavaScript paginator; when I click the button I can see the content reload, but when I try to read the new table information I get the same old table info. Selenium doesn't notice that the DOM has changed. I'm aware of stale DOM references; I'm just looking for the best way to solve this problem.
for link in source.find_all('div', {'class': 'company-row d-flex'}):
    print(link.a.text, link.small.text, link.find('div', {'class': 'col-2'}).text)

# Next button (I'll make an iterator)
driver.find_element_by_xpath('//a[@href="hrefcurrentpage=2"]').click()

# Tried this and it doesn't work
# time.sleep(5)

# Here the table changes but I get the same old info
for link in source.find_all('div', {'class': 'company-row d-flex'}):
    print(link.a.text, link.small.text, link.find('div', {'class': 'col-2'}).text)
I think you are getting the same data after opening the next page, even with a delay, because you are reading from the old `source` object.
So you should re-read, i.e. reload, the source after clicking the pagination control, possibly with some delay.
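A sketch of the re-read idea, using two static HTML strings as stand-ins for driver.page_source before and after the pagination click (the class name is taken from the question; the driver calls themselves are omitted so the parsing part stands alone):

```python
from bs4 import BeautifulSoup

def company_names(page_source):
    """Parse the row links out of a fresh copy of the page source."""
    soup = BeautifulSoup(page_source, "html.parser")
    return [div.a.text for div in soup.find_all("div", {"class": "company-row"})]

# Stand-ins for driver.page_source before and after clicking "next"
page1 = '<div class="company-row d-flex"><a>Acme</a></div>'
page2 = '<div class="company-row d-flex"><a>Globex</a></div>'

# With Selenium this would be company_names(driver.page_source),
# re-run after every click so the soup is rebuilt from the new DOM.
print(company_names(page1))  # ['Acme']
print(company_names(page2))  # ['Globex']
```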

How to perform data fetch on button click on a built in html page using selenium

I am new to Selenium and I am trying to mimic user actions on a site to fetch data from a built-in HTML page on a button click. I am able to populate all the field details, but the button click is not working; it looks like the JS code is not running.
I tried many options such as adding wait time, ActionChains, etc., but it didn't work. I'm providing the site and the code I have written.
driver = webdriver.Chrome()
driver.get("https://www1.nseindia.com/products/content/derivatives/equities/historical_fo.htm")
driver.implicitly_wait(10)
# assigned values to all the other fields
driver.find_element_by_id('rdDateToDate').click()
Dfrom = driver.find_element_by_id('fromDate')
Dfrom.send_keys("02-Oct-2020")
Dto = driver.find_element_by_id('toDate')
Dto.send_keys("08-Oct-2020")
innerHTML = driver.execute_script("document.ready")
sleep(5)
getdata_btn = driver.find_element_by_id('getButton')
ActionChains(driver).move_to_element(getdata_btn).click().click().perform()
I recommend using a full XPath:
chrome.get("https://www1.nseindia.com/products/content/derivatives/equities/historical_fo.htm")
time.sleep(2)
print("click")
fullxpath = "/html/body/div[2]/div[3]/div[2]/div[1]/div[3]/div/div[1]/div/form/div[19]/p[2]/input"
chrome.find_element_by_xpath(fullxpath).click()
I have tried the button clicking and it worked with the XPath. I thought it was because someone used the same ID twice on the website, but I cannot find a duplicate, so I have no idea what's going wrong there.
Good luck :)

Python selenium select modal

I am trying to get the Google Maps embed URL using Selenium. I am able to click the Share button and the page shows a modal with a share URL and an embed URL. However, I am unable to switch to the dialog box.
Here is my code
browser.get('https://www.google.com/maps/place/%s?hl=en'%(code))
time.sleep(3)
share_class = "ripple-container"
buttons = browser.find_elements_by_class_name(share_class)
for but in buttons:
    x = but.text
    if x == 'SHARE':
        but.click()

modal = browser.switch_to.active_element
share = modal.find_element_by_id("modal-dialog")
print(share.text)
here is the image.
You don't need to switch to the modal dialog; you can access it just like any other HTML on the page. You can simplify your code to:
browser.get('https://www.google.com/maps/place/%s?hl=en'%(code))
browser.find_element_by_xpath("//button/div[.='SHARE']").click()
url = browser.find_element_by_id("last-focusable-in-modal").text
print(url)
But... if you read the dialog, you will see that it states
You can also copy the link from your browser's address bar.
so the URL you are navigating to in the first line is what you would copy from the Share link, so there's really no point. You already have the URL.
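The text-match XPath used in the answer can be checked offline against a minimal snippet (here with lxml rather than Selenium; the button markup is a simplified assumption, not the real Google Maps DOM):

```python
from lxml import html

# Simplified stand-in for the rendered Share button markup
doc = html.fromstring(
    '<div>'
    '<button><div>SHARE</div></button>'
    '<button><div>PRINT</div></button>'
    '</div>'
)

# [.='SHARE'] matches a <div> whose full text content is exactly "SHARE"
matches = doc.xpath("//button/div[.='SHARE']")
print(len(matches))       # 1
print(matches[0].text)    # SHARE
```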

Python splinter can't click on element by CSS on page

I am trying to automate a booking process on a travel site using splinter and am having trouble clicking on a CSS element on the page.
This is my code
import splinter
import time
secret_deals_email = {
    'user[email]': 'adf@sad.com'
}
browser = splinter.Browser()
url = 'http://roomer-qa-1.herokuapp.com'
browser.visit(url)
click_FIND_ROOMS = browser.find_by_css('.blue-btn').first.click()
time.sleep(10)
# click_Book_button = browser.find_by_css('.book-button-row.blue-btn').first.click()
browser.fill_form(secret_deals_email)
click_get_secret_deals = browser.find_by_name('button').first.click()
time.sleep(10)
click_book_first_room_list = browser.find_by_css('.book-button-row-link').first.click()
time.sleep(5)
click_book_button_entry = browser.find_by_css('.entry-white-box.entry_box_no_refund').first.click()
The problem is that whenever the code gets to the page where I need to choose the type of purchase, I can't click any of the options on the page.
I keep getting an error that the element doesn't exist, no matter what I do.
http://roomer-qa-1.herokuapp.com/hotels/atlanta-hotels/ramada-plaza-atlanta-downtown-capitol-park.h30129/44389932?rate_plan_id=1&rate_plan_token=6b5aad6e9b357a3d9ff4b31acb73c620&
This is the link to the page that is causing me trouble, please help :).
You need to wait until the element is present on the website. You can use the is_element_not_present_by_css method with a while loop to do that:
while browser.is_element_not_present_by_css('.entry-white-box.entry_box_no_refund'):
    time.sleep(50)
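As written, that loop never gives up if the element never appears. A bounded version of the same polling pattern, as a library-agnostic sketch (the `condition` callable is a hypothetical stand-in for the splinter presence check):

```python
import time

def wait_for(condition, timeout=30.0, poll=0.5):
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll)
    return False

# With splinter this would be something like:
# wait_for(lambda: browser.is_element_present_by_css(
#     '.entry-white-box.entry_box_no_refund'))
```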
