I am returning HTML with this Python script, but it doesn't return the price history (see screenshot). A non-Selenium browser does return HTML with the prices (even without expanding this section, they are findable by a simple regex); Chrome/Safari/Firefox all do, in incognito as well.
from selenium import webdriver
import time
url = 'https://www.google.com/flights?hl=en#flt=SFO.JFK.2021-06-01*JFK.SFO.2021-06-07'
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(options=options)
driver.get(url)
time.sleep(10)
html = driver.page_source
print(html)
driver.quit()
I can't quite pinpoint whether it's some setting in ChromeDriver. It must be possible, because a third-party scraper currently returns this data.
I tried this to no avail: Can a website detect when you are using Selenium with chromedriver?
Any thoughts appreciated.
After I added chrome_options.add_argument("--disable-blink-features=AutomationControlled"), I started to see this block. I'm not sure why it is not always loaded.
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.chrome.options import Options
url = 'https://www.google.com/flights?hl=en#flt=SFO.JFK.2021-06-01*JFK.SFO.2021-06-07'
chrome_options = Options()
chrome_options.add_argument("start-maximized")
chrome_options.add_argument("--disable-blink-features=AutomationControlled")
driver = webdriver.Chrome(service=Service('/snap/bin/chromium.chromedriver'), options=chrome_options)
driver.get(url)
# wait = WebDriverWait(driver, 20)
# wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".EA71Tc.q7Eewe")))
time.sleep(10)
history = driver.find_element(By.CSS_SELECTOR, ".EA71Tc.q7Eewe").get_attribute("innerHTML")
print(history)
Here the full block is returned, including all tag names. As you can see, I tried an explicit wait, but the block never became visible. Experimenting with another explicit wait may help.
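If the visibility wait keeps timing out, one more thing worth trying is waiting for presence instead: the block can already be in the DOM (and expose its innerHTML) while the browser still reports it as hidden. A minimal sketch, reusing the driver above and assuming the .EA71Tc.q7Eewe class names are still current:
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
# Presence only requires the element to exist in the DOM, not to be visible
history_el = WebDriverWait(driver, 20).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".EA71Tc.q7Eewe")))
print(history_el.get_attribute("innerHTML"))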
I am trying to get some data from a website: https://dexscreener.com/ethereum/0x1a89ae3ba4f9a97b10bac6a77061f00bb956858b
I'm trying to get the element /html/body/div[1]/div/main/div/div[2]/div/div[2]/div/div/div[1]/div[4]/div[2]/div[1]/div[1]/div[2]/span[2], which is basically a number on the webpage representing volume.
I used this code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# 'driver', 'delay' and 'tokenadress' are defined earlier in my script
driver.get('https://dexscreener.com/ethereum/' + str(tokenadress))
try:
    fivemVolume = WebDriverWait(driver, delay).until(EC.presence_of_element_located(
        (By.XPATH, '/html/body/div[1]/div/main/div/div[2]/div/div[2]/div/div/div[1]/div[4]/div[2]/div[1]/div[1]/div[2]/span[2]')))
except:
    pass  # more code
I think it's something to do with the webpage loading into an iframe by default, but when I added this code it didn't help:
driver.switch_to.default_content()
Your locator does not match any element on that page.
The elements you are trying to access are inside an iframe.
So you first need to switch into the iframe.
The following code should work, but I had problems running Selenium on that page since it is blocked by Cloudflare:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("start-maximized")
webdriver_service = Service(r'C:\webdrivers\chromedriver.exe')
driver = webdriver.Chrome(options=options, service=webdriver_service)
wait = WebDriverWait(driver, 30)
url = "https://dexscreener.com/ethereum/0x1a89ae3ba4f9a97b10bac6a77061f00bb956858b"
driver.get(url)
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe[id*='tradingview']")))
value = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "[data-name='legend-source-item'] [class='valueItem-1WIwNaDF'] .valueValue-1WIwNaDF"))).text
print(value)
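One follow-up worth noting: after reading the value inside the iframe, switch back to the top-level document before locating anything else on the page, otherwise later lookups will keep searching inside the frame. A minimal sketch:
# Return to the main document once done inside the iframe
driver.switch_to.default_content()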
I am trying to get data from the menu on this website. I've written this script, which seems to work well in debug mode. In the script, I close Chrome each time after getting the information for a certain city.
In debug mode, the detail information is shown when 'element' is clicked by the script. However, when I run the script normally, it seems to do nothing after the city information is sent. The 'Visualize Results' button on the website is enabled once the city data is entered, but the detail information that is supposed to appear after the script clicks this button never shows up, as if the button were never clicked. Is there something I'm missing?
Thank you in advance.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC
# 'Cities', 'prefs' and 'link' are defined earlier in the script
for city in Cities:
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_experimental_option('prefs', prefs)
    driver = webdriver.Chrome(options=chrome_options)
    driver.get(link)
    driver.find_element(By.ID, "inputText").send_keys(city)
    driver.find_element(By.ID, 'btninputText').click()
    element = wait(driver, 5).until(EC.presence_of_element_located((By.XPATH, '//div[@id="btviewPVGridGraph"]'))).click()
    driver.close()
You need to wait for the clickability of all 3 elements you are accessing here; presence is not enough.
You also need to add a short delay between clicking the search button and clicking the visualization button.
The following code works:
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("start-maximized")
webdriver_service = Service(r'C:\webdrivers\chromedriver.exe')
driver = webdriver.Chrome(options=options, service=webdriver_service)
wait = WebDriverWait(driver, 30)
url = "https://re.jrc.ec.europa.eu/pvg_tools/en/"
driver.get(url)
wait.until(EC.element_to_be_clickable((By.ID, "inputText"))).send_keys("Paris")
wait.until(EC.element_to_be_clickable((By.ID, "btninputText"))).click()
time.sleep(2)
wait.until(EC.element_to_be_clickable((By.ID, "btviewPVGridGraph"))).click()
Try the below XPath:
wait(driver, 5).until(EC.presence_of_element_located((By.XPATH, '//div[@id="btviewPVGridGraph"]'))).click()
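As the other answer notes, presence alone may not be enough before a click; a variant worth trying (a sketch reusing the same wait/EC/By imports) waits for clickability instead:
# Sketch: wait until the button is actually clickable, then click it
wait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, '//div[@id="btviewPVGridGraph"]'))).click()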
I'm having a strange issue with Python Selenium when trying to click the next-page link on Currys PC World's Televisions page. See the next button at https://www.currys.co.uk/tv-and-audio/televisions/tvs.
I am testing in non-headless mode with Chrome, so I can see the outline of the next button being highlighted when the click is actioned, but it is not navigating to the next page URL as expected.
Here is what I have tried on the page above:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
from selenium.common.exceptions import NoSuchElementException
import time
chrome_options = Options()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
driver.set_window_size(1920, 1080)
driver.maximize_window()
driver.get('https://www.currys.co.uk/tv-and-audio/televisions/tvs')
next_page_link = WebDriverWait(driver, 10).until(
ec.element_to_be_clickable((By.CSS_SELECTOR,
'ul.pagination '
'li.pagination-arrrow-next a.page-next')))
next_page_link.click()
time.sleep(10)
I have also tried using execute_script (which does not work either):
driver.execute_script('arguments[0].click()', next_page_link)
I am able to extract the href from the link element (so I know it's the right element):
href = next_page_link.get_attribute('href')
print("href is: {}".format(href))
# href is: https://www.currys.co.uk/tv-and-audio/televisions/tvs?start=20&sz=20
I'm a bit lost as to what the issue might be here; I'd appreciate any help!
I see the following issues here:
You have to close the accept-cookies pop-up.
You need to scroll the page down to the pager elements.
Selenium indeed doesn't make the page go to the next page, BUT even when I click the Next button manually (or any other pagination button there), the page does not change. So the problem is with the web page itself, not with Selenium.
This is the code I used:
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("start-maximized")
webdriver_service = Service(r'C:\webdrivers\chromedriver.exe')
driver = webdriver.Chrome(service=webdriver_service, options=options)
url = 'https://www.currys.co.uk/tv-and-audio/televisions/tvs'
driver.get(url)
wait = WebDriverWait(driver, 20)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
next_page_link = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR,'ul.pagination li.pagination-arrrow-next a.page-next')))
next_page_link.location_once_scrolled_into_view  # accessing this property scrolls the element into view
time.sleep(1)
next_page_link.click()
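If the page's own pagination stays broken even for manual clicks, a practical workaround (a sketch building on the href the question already extracted) is to skip the click and navigate to the next page URL directly:
# Workaround sketch: load the next page by URL instead of clicking
href = next_page_link.get_attribute('href')
driver.get(href)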
I'm trying to get a full-length screenshot and haven't been able to make it work. Here's the code I'm using:
from Screenshot import Screenshot
from selenium import webdriver
import time
ob = Screenshot.Screenshot()
driver = webdriver.Chrome()
driver.maximize_window()
driver.implicitly_wait(10)
url = "https://stackoverflow.com/questions/73298355/how-to-remove-duplicate-values-in-one-column-but-keep-the-rows-pandas"
driver.get(url)
img_url = ob.full_Screenshot(driver, save_path=r'.', image_name='example.png')
print(img_url)
driver.quit()
But this gives us a clipped screenshot:
So as you can see that's just what the driver window is showing, not a full-length screenshot. How can I tweak this code to get what I'm looking for?
Here is an example of how you can take a full <body> screenshot of a page:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time as t
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument('disable-notifications')
chrome_options.add_argument("window-size=1280,720")
webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
url = 'https://stackoverflow.com/questions/7263824/get-html-source-of-webelement-in-selenium-webdriver-using-python?rq=1'
browser.get(url)
required_width = browser.execute_script('return document.body.parentNode.scrollWidth')
required_height = browser.execute_script('return document.body.parentNode.scrollHeight')
browser.set_window_size(required_width, required_height)
t.sleep(5)
browser.execute_script("window.scrollTo(0,document.body.scrollHeight);")
required_width = browser.execute_script('return document.body.parentNode.scrollWidth')
required_height = browser.execute_script('return document.body.parentNode.scrollHeight')
browser.set_window_size(required_width, required_height)
t.sleep(1)
body_el = WebDriverWait(browser,10).until(EC.element_to_be_clickable((By.TAG_NAME, "body")))
body_el.screenshot('full_page_screenshot.png')
print('took full screenshot!')
t.sleep(1)
browser.quit()
The Selenium setup is for Linux, but just note the imports and the part after defining the browser. The code above starts from a small window, then resizes it to fit the full page body; it waits a bit and computes the body size again, to account for scripts kicking in on user input, and then takes the screenshot. Tested and working on a really long page.
To get a full-page screenshot using the Selenium Python client, you can use the GeckoDriver/Firefox-based save_full_page_screenshot() method as follows (the geckodriver path in the snippet is a placeholder):
Code:
from selenium import webdriver
from selenium.webdriver.firefox.service import Service
s = Service('/path/to/geckodriver')  # placeholder: path to your geckodriver binary
options = webdriver.FirefoxOptions()
driver = webdriver.Firefox(service=s, options=options)
driver.get('https://stackoverflow.com/questions/73298355/how-to-remove-duplicate-values-in-one-column-but-keep-the-rows-pandas')
driver.save_full_page_screenshot('fullpage_gecko_firefox.png')
driver.quit()
tl;dr
[py] Adding full page screenshot feature for Firefox
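As a side note, Chromium-based drivers lack save_full_page_screenshot(), but a comparable result is possible through the DevTools protocol. A sketch, assuming a Chrome build recent enough to support the captureBeyondViewport flag:
import base64
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://stackoverflow.com/questions/73298355/how-to-remove-duplicate-values-in-one-column-but-keep-the-rows-pandas')
# captureBeyondViewport makes Chrome render past the viewport, so the
# resulting PNG covers the full page height
shot = driver.execute_cdp_cmd('Page.captureScreenshot', {'captureBeyondViewport': True})
with open('fullpage_cdp_chrome.png', 'wb') as f:
    f.write(base64.b64decode(shot['data']))
driver.quit()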
I am trying to scrape a website where I have to click a link. For this purpose, I am using the Selenium library with ChromeDriver.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
url = 'https://sjobs.brassring.com/TGnewUI/Search/Home/Home?partnerid=25222&siteid=5011&noback=1&fromSM=true#Applications'
browser = webdriver.Chrome()
browser.get(url)
time.sleep(3)
link = browser.find_element(By.LINK_TEXT, "Don't have an account yet?")
link.click()
But it is not working. Any idea why? Is there a workaround?
You can get this done in several ways; here is one of them. I've used the driver.execute_script() command to force the click. You should not go for hardcoded delays, as they are very inconsistent.
Modified script:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC
url = 'https://sjobs.brassring.com/TGnewUI/Search/Home/Home?partnerid=25222&siteid=5011&noback=1&fromSM=true#Applications'
driver = webdriver.Chrome()
driver.get(url)
item = wait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "a[ng-click='newAccntScreen()']")))
driver.execute_script("arguments[0].click();",item)
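If you would rather keep a native click, a variant worth trying first (a sketch using the same imports) waits for clickability instead of mere presence:
# Alternative sketch: wait for clickability, then click natively
item = wait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a[ng-click='newAccntScreen()']")))
item.click()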