I'm trying to take screenshots of different elements (in different sections) of a long page. The first screenshot I take is near the top of the page and comes out complete, not chopped off.
However, once Selenium has to scroll to take a screenshot of an element, the screenshot comes out chopped off.
I'm pretty sure this is happening because Selenium isn't scrolling far enough for the entire element to be exposed, and thus takes a screenshot that comes out incomplete, but I don't know how to solve the problem. Any help would be greatly appreciated. Below is my code so far (screen-shotting the element happens in the last few lines):
from time import sleep
from os import chdir
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
option = webdriver.ChromeOptions()
option.add_experimental_option("excludeSwitches", ["enable-automation"])
option.add_experimental_option('useAutomationExtension', False)
option.add_argument("--disable-infobars")
option.add_argument("start-maximized")
option.add_argument("--disable-extensions")
option.add_experimental_option("detach", True)
option.add_experimental_option("prefs", {
"profile.default_content_setting_values.notifications": 2
})
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()),options=option)
driver.get("https://www.reddit.com/r/AskReddit/")
#Show top reddit posts on the subreddit
topButton = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "/html/body/div[1]/div/div[2]/div[2]/div/div/div/div[2]/div[4]/div[1]/div[1]/div[2]/a[3]")))
topButton.click()
#Topic
topic = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '/html/body/div[1]/div/div[2]/div[2]/div/div/div/div[2]/div[4]/div[1]/div[4]/div[2]')))
topic.click()
sleep(1)
topicText = driver.find_element(By.XPATH, '/html/body/div[1]/div/div[2]/div[3]/div/div/div/div[2]/div[1]/div[2]/div[1]/div/div[3]/div[1]/div/h1').text
topicPicture = driver.find_element(By.XPATH, '/html/body/div[1]/div/div[2]/div[3]/div/div/div/div[2]/div[1]/div[2]/div[1]/div')
chdir(r'C:\Users\jack_l\Documents\PLAYING_WITH_REDDIT\currentVideo')
topicPicture.screenshot('topic.png')
#Comments
comment = ''
verified = 0
counter4 = 0
while verified <= 10:
    try:
        comment = driver.find_element(By.XPATH, '/html/body/div[1]/div/div[2]/div[3]/div/div/div/div[2]/div[1]/div[2]/div[5]/div/div/div/div['+str(counter4)+']')
    except Exception:
        counter4 = counter4 + 1
    else:
        if 'level 1' in comment.text:
            comment.screenshot('comment '+str(verified)+'.png')
            verified = verified + 1
        counter4 = counter4 + 1
Add the line below inside the try block, right after comment is assigned. It scrolls the page so that the comment element sits at the center of the viewport; that way the screenshot captures the full element:
try:
    comment = driver.find_element(By.XPATH, '...')
    driver.execute_script('arguments[0].scrollIntoView({block: "center"});', comment)
except Exception:
    ...
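For reference, here is a minimal, self-contained sketch of the whole pattern (the URL and selector are placeholders, not taken from the question):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")  # placeholder URL

# Locate the element to capture (placeholder selector).
element = driver.find_element(By.CSS_SELECTOR, "div.some-section")

# Center the element in the viewport so no part of it is clipped
# at the window edge or hidden behind sticky headers/footers.
driver.execute_script('arguments[0].scrollIntoView({block: "center"});', element)

# The element-level screenshot now captures the full element.
element.screenshot("element.png")

driver.quit()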
I keep getting an ElementClickInterceptedException in this script I'm writing. It's supposed to click a link that opens a new window, scrape the new window, close it, and move on to the next link, but it just won't work: it throws the error after at most 3 link clicks. I saw a similar question here and tried wait.until(EC.element_to_be_clickable()) and also maximizing my screen, but neither worked for me. Here is the site I am scraping (I'm trying to scrape all the games for each day), and here is a chunk of the code I'm using:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.firefox.options import Options as FirefoxOptions
from selenium.common.exceptions import TimeoutException, NoSuchElementException, ElementNotInteractableException, StaleElementReferenceException
from time import sleep
l = "https://www.flashscore.com/"
options = FirefoxOptions()
#options.add_argument("--headless")
driver = webdriver.Firefox(executable_path="geckodriver.exe",
                           firefox_options=options)
driver.install_addon('C:\\Windows\\adblock_plus-3.10.1-an+fx.xpi')
driver.maximize_window()
driver.get(l)
driver.implicitly_wait(5)
cnt = 0
sleep(5)
wait = WebDriverWait(driver, 20)
a = driver.window_handles[0]
b = driver.window_handles[1]
driver.switch_to.window(a)
# Close Adblock tab
if 'Adblock' in driver.title:
    driver.close()
    driver.switch_to.window(a)
else:
    driver.switch_to.window(b)
    driver.close()
    driver.switch_to.window(a)
var1 = driver.find_elements_by_xpath("//div[@class='leagues--live ']/div/div")
knt = 0
for i in range(len(var1)):
    if (var1[i].get_attribute("id")):
        knt += 1
        #sleep(2)
        #driver.switch_to.window(driver.window_handles)
        var1[i].click()
        sleep(2)
        #var2 = wait.until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@class, 'event__match event__match--last event__match--twoLine')]")))
        print(len(driver.window_handles))
        driver.switch_to.window(driver.window_handles[1])
        try:
            sleep(4)
            driver.close()
            driver.switch_to.window(a)
            #sleep(3)
        except Exception:
            print("Exception caught")
#WebDriverWait(driver, 30).until(EC.visibility_of_element_located((By.CLASS_NAME, "event__match event__match--last event__match--twoLine")))
sleep(10)
driver.close()
Any ideas to help, please?
It looks like the element you are trying to click on is covered by a banner ad or something else like a cookie message.
To fix this you can scroll down to the last element using the following code:
driver.execute_script('\
let items = document.querySelectorAll(\'div[title="Click for match detail!"]\'); \
items[items.length - 1].scrollIntoView();'
)
Add it before clicking on the desired element in the loop.
I tried to make a working example for you, but it works with chromedriver rather than geckodriver:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.ChromeOptions()
# options.add_argument("--headless")
options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
service = Service(executable_path=r'your\path\to\chromedriver.exe')
driver = webdriver.Chrome(service=service, options=options)
wait = WebDriverWait(driver, 5)
url = 'https://www.flashscore.com/'
driver.get(url)
# accept cookies
wait.until(EC.presence_of_element_located((By.ID, 'onetrust-accept-btn-handler'))).click()
matches = driver.find_elements(By.CSS_SELECTOR, 'div[title="Click for match detail!"]')
for match in matches:
    driver.execute_script('\
        let items = document.querySelectorAll(\'div[title="Click for match detail!"]\'); \
        items[items.length - 1].scrollIntoView();'
    )
    match.click()
    driver.switch_to.window(driver.window_handles[1])
    print('get data from open page')
    driver.close()
    driver.switch_to.window(driver.window_handles[0])
driver.quit()
It works in both normal and headless mode.
I am trying to extract the number of YouTube comments and have tried several methods.
My Code:
from selenium import webdriver
import pandas as pd
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time
DRIVER_PATH = <your chromedriver path>
wd = webdriver.Chrome(executable_path=DRIVER_PATH)
url = 'https://www.youtube.com/watch?v=5qzKTbnhyhc'
wd.get(url)
wait = WebDriverWait(wd, 100)
time.sleep(40)
v_title = wd.find_element_by_xpath('//*[@id="container"]/h1/yt-formatted-string').text
print("title Is ")
print(v_title)
comments_xpath = '//h2[@id="count"]/yt-formatted-string/span[1]'
v_comm_cnt = wait.until(EC.visibility_of_element_located((By.XPATH, comments_xpath)))
#wd.find_element_by_xpath(comments_xpath)
print(len(v_comm_cnt))
I get the following error:
selenium.common.exceptions.TimeoutException: Message:
I get the correct value for the title but not for the comment count. Can anyone please tell me what is wrong with my code?
Please note that the comments count path //h2[@id="count"]/yt-formatted-string/span[1] points to the correct place when I search for the value in inspect element.
Updated answer
Well, it was tricky!
There are several issues here:
This page has some scripts on it that keep the Selenium driver.get() method waiting until the page-load timeout even though the page looks fully loaded. To overcome that I used the eager page load strategy (see the sketch after this list for the modern way to set it).
This page has several blocks of markup for the same areas, so sometimes one of them is used (visible) and sometimes the other. This makes working with element locators difficult. So here I wait for visibility of the title element in one of those blocks; if it becomes visible I extract the text from there, otherwise I wait for visibility of the second element (it appears immediately) and extract the text from there.
There are several ways to scroll the page, and not all of them worked here. I found one that works and does not scroll too far.
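As an aside, recent Selenium 4 releases let you set the same strategy directly on the options object instead of going through DesiredCapabilities; a minimal sketch, assuming Selenium 4:

from selenium import webdriver

options = webdriver.ChromeOptions()
# Return from driver.get() once the DOM is ready (DOMContentLoaded),
# without waiting for images, ads and other subresources to finish.
options.page_load_strategy = "eager"

driver = webdriver.Chrome(options=options)
driver.get("https://www.youtube.com/watch?v=5qzKTbnhyhc")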
The code below is working; I ran it several times.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.chrome.service import Service
options = Options()
options.add_argument("--start-maximized")
caps = DesiredCapabilities().CHROME
caps["pageLoadStrategy"] = "eager"
s = Service(r'C:\webdrivers\chromedriver.exe')
driver = webdriver.Chrome(options=options, desired_capabilities=caps, service=s)
url = 'https://www.youtube.com/watch?v=5qzKTbnhyhc'
driver.get(url)
driver.maximize_window()
wait = WebDriverWait(driver, 10)
title_xpath = "//div[@class='style-scope ytd-video-primary-info-renderer']/h1"
alternative_title = "//*[@id='title']/h1"
v_title = ""
try:
    v_title = wait.until(EC.visibility_of_element_located((By.XPATH, title_xpath))).text
except Exception:
    v_title = wait.until(EC.visibility_of_element_located((By.XPATH, alternative_title))).text
print("Title is " + v_title)
comments_xpath = "//div[@id='title']//*[@id='count']//span[1]"
driver.execute_script("window.scrollBy(0, arguments[0]);", 600)
try:
    v_comm_cnt = wait.until(EC.visibility_of_element_located((By.XPATH, comments_xpath)))
except Exception:
    pass
v_comm_cnt = driver.find_element(By.XPATH, comments_xpath).text
print("Video has " + v_comm_cnt + " comments")
The output is:
Title is Music for when you are stressed 🍀 Chil lofi | Music to Relax, Drive, Study, Chill
Video has 834 comments
Process finished with exit code 0
Objective: Find an appointment
Code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import numpy as np
options = Options()
options.add_argument("--start-maximized")
import time
start = time.process_time()
time.sleep(3)
s = Service(path)
driver = webdriver.Chrome(options=options, service=s)
page = 'https://service.berlin.de/terminvereinbarung/termin/tag.php?\
termin=1&dienstleister=122231&anliegen[]=326798&herkunft=1'
driver.get(page)
time.sleep(5)
driver.refresh()
While the code works, I would like to extend it so that it auto-refreshes at 5-second intervals until the element for 9 September 2022 is clickable.
I am thinking of something like
if wait.until(EC.element_to_be_clickable((By.XPATH,
        '//*[@id="layout-grid__area--maincontent"] \
        /div/div/div[2]/div[2]/div/div/div[5]/div/div[2] \
        /div[1]/table/tbody/tr[2]/td[5]'))).click() is False:
    time.sleep(5)
    driver.refresh()
else:
    break
but the second part of the code does not work.
An example of a clickable date is on Nov 4.
Update:
Added a while True loop
i = 0
while True:
    element = driver.find_element(by=By.XPATH, value='//*[@id="layout-grid__area--maincontent"]\
        /div/div/div[2]/div[2]/div/div/div[5]/div/div[2]\
        /div[1]/table/tbody/tr[2]/td[5]')
    i += 1
    if "nichtbuchbar" in element.get_attribute("class"):
        time.sleep(5)
        driver.refresh()
        print(f'The {i}\'th try has failed')
I am not sure whether the code above does the job properly, because every new line seems to be printed in less than 1 second, whereas it is supposed to take a 5-second pause before each refresh.
You can check the class name on that element instead; that gives you the same information as an is_clickable function would, if such a function existed (assuming the red dates are not clickable and the white ones are; if you need a blue one, look at the CSS class on that one and check whether it is in the class name).
if "buchbar" not in element.get_attribute("class"):
    time.sleep(5)
    driver.refresh()
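Putting that together with the refresh loop from the question, a minimal sketch could look like the following (the URL and XPath are the ones from the question; note that it checks for "nichtbuchbar" rather than for the absence of "buchbar", since the string "nichtbuchbar" itself contains "buchbar"):

import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://service.berlin.de/terminvereinbarung/termin/tag.php?'
           'termin=1&dienstleister=122231&anliegen[]=326798&herkunft=1')

date_xpath = ('//*[@id="layout-grid__area--maincontent"]'
              '/div/div/div[2]/div[2]/div/div/div[5]/div/div[2]'
              '/div[1]/table/tbody/tr[2]/td[5]')

attempt = 0
while True:
    attempt += 1
    element = driver.find_element(By.XPATH, date_xpath)
    if "nichtbuchbar" in element.get_attribute("class"):
        # Date is not bookable yet: pause, reload, and try again.
        print(f"Attempt {attempt} failed, refreshing...")
        time.sleep(5)
        driver.refresh()
    else:
        # Date became bookable: click it and stop polling.
        element.click()
        break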
Disclaimer: I'm coming back to scripting after more than a decade (and I was a novice to begin with), so apologies if the question is mundane, but help is much needed and appreciated.
I'm trying to modify a python script to log me into a job portal, search job vacancies based on attributes and then apply to said jobs.
Since the portal opens new vacancies in separate tabs, the code is supposed to go to the next tab and test the criteria against the job description.
The code snippet is as below:
for i in range(1,6):
    driver.get('https://www.naukri.com/'+role+'-jobs-in-'+location+'-'+str(i)+'?ctcFilter='+str(LL)+'to'+str(UL))
    driver.switch_to.window(driver.window_handles[1])
    url = driver.current_url
    driver.get(url)
    try:
        test = driver.find_element_by_xpath('//*[@id="root"]/main/div[2]/div[2]/section[2]')
        if all(word in test.text.lower() for word in Skillset):
            driver.find_element_by_xpath('//*[@id="root"]/main/div[2]/div[2]/section[1]/div[1]/div[3]/div/button[2]').click()
            time.sleep(2)
            driver.close()
            driver.switch_to.window(driver.window_handles[0])
        else:
            driver.close()
            driver.switch_to.window(driver.window_handles[0])
    except:
        driver.close()
        driver.switch_to.window(driver.window_handles[0])
However, when I run the script, it just logs me in to the portal and goes to the correct listings page, but then stays there. Plus, it throws this error:
line 43, in <module> driver.switch_to.window(driver.window_handles[1])
IndexError: list index out of range
I'm not able to understand what list this refers to or how to fix the code. Any help is appreciated.
The complete code for those interested:
import selenium
from selenium import webdriver as wb
import pandas as pd
import time
from time import sleep
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
driver = wb.Chrome("/Users/shashank/Desktop/Naukri Auto Apply/chromedriver")
Skillset = ['Marketing','Communications','Sales']
Skillset = [x.lower() for x in Skillset]
LL = 15 #lower limit of expected CTC
UL = 25 #upper limit of expected CTC
location = 'Bangalore'
location = location.lower().replace(" ","-")
role = 'Marketing Manager'
role = role.lower().replace(" ","-")
driver.get("https://www.naukri.com")
driver.find_element_by_xpath('//*[@id="login_Layer"]/div').click()
time.sleep(5)
driver.find_element_by_xpath('//*[@id="root"]/div[3]/div[2]/div/form/div[2]/input').send_keys("email_address_here")
driver.find_element_by_xpath('//*[@id="root"]/div[3]/div[2]/div/form/div[3]/input').send_keys("password_here")
time.sleep(5)
driver.find_element_by_xpath('//*[@id="root"]/div[3]/div[2]/div/form/div[6]/button').click()
time.sleep(20)
driver.find_element_by_xpath('/html/body/div[3]/div/div[1]/div[1]/div').click()
for i in range(1,6):
    driver.get('https://www.naukri.com/'+role+'-jobs-in-'+location+'-'+str(i)+'?ctcFilter='+str(LL)+'to'+str(UL))
    driver.switch_to.window(driver.window_handles[1])
    url = driver.current_url
    driver.get(url)
    try:
        test = driver.find_element_by_xpath('//*[@id="root"]/main/div[2]/div[2]/section[2]')
        if all(word in test.text.lower() for word in Skillset):
            driver.find_element_by_xpath('//*[@id="root"]/main/div[2]/div[2]/section[1]/div[1]/div[3]/div/button[2]').click()
            time.sleep(2)
            driver.close()
            driver.switch_to.window(driver.window_handles[0])
        else:
            driver.close()
            driver.switch_to.window(driver.window_handles[0])
    except:
        driver.close()
        driver.switch_to.window(driver.window_handles[0])
Thanks in advance for answering! It will really help with the job search!
No need to switch to another window
When you open the URL with those specific details in the for loop, each page loads one after another in the same window. Switch windows only when a new window or tab has actually been opened (see the window-handle sketch after this list).
Choose explicit waits instead of time.sleep(). You have defined a WebDriverWait but never used it.
Try to come up with good locators. Go for relative XPath instead of absolute XPath.
I'm not sure what you are trying to do in the try block; those locators do not match any elements on the page.
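For completeness, when a click really does open a new tab, the usual pattern is the following (a minimal sketch, not specific to this site; the link locator is a placeholder):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)

# Remember the original handle before the click that opens a new tab.
original = driver.current_window_handle
driver.find_element(By.LINK_TEXT, "Some job link").click()  # placeholder locator

# Wait for the second window handle to appear, then switch to it.
wait.until(lambda d: len(d.window_handles) > 1)
new_tab = [h for h in driver.window_handles if h != original][0]
driver.switch_to.window(new_tab)

# ... scrape the new tab here ...

driver.close()
driver.switch_to.window(original)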
Refer below code:
# Imports
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
driver = webdriver.Chrome(service=Service("path to chromedriver.exe"))
driver.maximize_window()
driver.get("https://www.naukri.com")
wait = WebDriverWait(driver, 30)
login_btn = wait.until(EC.element_to_be_clickable((By.XPATH, "//div[text()='Login']")))
login_btn.click()
email = wait.until(EC.element_to_be_clickable((By.XPATH, "//form[@name='login-form']//input[contains(@placeholder,'Email')]")))
email.send_keys("abc@gmail.com")
password = wait.until(EC.element_to_be_clickable((By.XPATH, "//form[@name='login-form']//input[contains(@placeholder,'password')]")))
password.send_keys("password")
password.submit()
role = "marketing-manager"
loc = "bangalore"
llimt = "15"
ulimit = "25"
for i in range(1,6):
    driver.get(f"https://www.naukri.com/{role}-jobs-in-{loc}-{i}?ctcFilter={llimt}to{ulimit}")
    time.sleep(2)
    # Update for Chat bot
    try:
        chatbot_close = wait.until(EC.element_to_be_clickable((By.XPATH, "//div[@class='chatbot_Nav']/div")))
        chatbot_close.click()
    except:
        print("Chat bot did not appear")
I have this code which goes to https://xiaomifirmwareupdater.com/miui/, enters a search query, selects the first result, and downloads the ROM, but it always selects the same element regardless of the query. I first thought the website was returning the same top result, but I checked in my browser and it gives different results. How can I fix this?
My Code :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.chrome.options import Options
import asyncio
GOOGLE_CHROME_BIN = 'path here'
CHROME_DRIVER = 'driver path here'
async def bruh(query="Redmi Note 8 Pro China"):
    url = "https://xiaomifirmwareupdater.com/miui"
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.binary_location = GOOGLE_CHROME_BIN
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-gpu")
    driver = webdriver.Chrome(executable_path=CHROME_DRIVER, options=chrome_options)
    driver.get(url)
    await asyncio.sleep(10)
    w = WebDriverWait(driver, 20)
    search_xpath = '/html/body/div[3]/section/div[2]/div[3]/div[1]/div/div[2]/div[1]/div[2]/div/label/input'
    next_page_url_xpath = '/html/body/div[3]/section/div[2]/div[3]/div[1]/div/div[2]/div[2]/div/table/tbody/tr[1]/td[8]/a'
    version_xpath = '/html/body/div[3]/section/div[2]/div[2]/div[2]/div[1]/div/ul/li[3]/h5'
    name_xpath = '/html/body/div[3]/section/div[2]/div[2]/div[2]/div[1]/div/ul/li[8]/h5/span'
    w.until(expected_conditions.presence_of_element_located((By.XPATH, search_xpath)))
    elem = driver.find_element_by_xpath(search_xpath)
    elem.send_keys(query)
    await asyncio.sleep(20)
    next_page_elem = driver.find_element_by_xpath(next_page_url_xpath)
    nextm = next_page_elem.get_attribute('href')
    driver.get(nextm)
    await asyncio.sleep(10)
    version_elem = driver.find_element_by_xpath(version_xpath).text
    name_elem = driver.find_element_by_xpath(name_xpath).text
    version_elem = version_elem.replace("Version: ", "")
    print(version_elem)
    print(name_elem)
    url = f"https://bigota.d.miui.com/{version_elem}/{name_elem}"
    print(url)
    driver.close()
I want to visit the website, send my query, select the first option, and convert it into a downloadable URL. Can anyone help? Thank you.
You are doing a lot of extra stuff that I don't quite follow.
A few pieces of feedback...
Waiting for presence just means that the element is in the DOM, not that it's visible or interactable. If you are going to use click or send keys, you need to wait until clickable or visible, respectively, or you may get an exception.
You don't need all those sleeps, especially when you are using WebDriverWait. Best practice is to avoid sleeps (they make your script slower and less reliable); use WebDriverWait instead.
The wait for an element will return that element so you don't need to wait, then find, then click... you can just wait.until(...).click().
You were getting the href from a link and then navigating to it... just click the link.
Instead of replace(), I just used split()... either is fine. I think split() is less likely to break, e.g. if they change the labels.
Updating your code based on my feedback above, this should work.
driver.get(url)
wait = WebDriverWait(driver, 20)
wait.until(expected_conditions.visibility_of_element_located((By.CSS_SELECTOR, "input"))).send_keys(query)
wait.until(expected_conditions.element_to_be_clickable((By.CSS_SELECTOR, "#miui td > a"))).click()
version = wait.until(expected_conditions.visibility_of_element_located((By.XPATH, "//h5[./b[text()='Version: ']]"))).text.split()[1]
package_name = wait.until(expected_conditions.visibility_of_element_located((By.ID, "filename"))).text
print(version)
print(package_name)
url = f"https://bigota.d.miui.com/{version}/{package_name}"
print(url)
driver.close()