I have a Python script that goes through the UI and fills out a lot of pages. I take a screenshot, but if the page is longer than the screen, I miss half of it. I tried a longer window height, but with normal pages that is ridiculous and unusable. I tried to use aShot, but can't figure out how to add it to my project in PyCharm.
Any wisdom or other ideas? Thanks.
UPDATE on things tried that didn't work (nothing happened on the page):
actions = ActionChains(driver)
actions.move_to_element(continuebtn)
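# note: an ActionChains sequence only runs once .perform() is called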
Also:
chrome_options.add_argument('--enable-javascript')
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
driver.execute_script("window.scrollTo(0, 1080);")
You could take multiple screenshots while scrolling through the whole page.
A simple example would be:
from selenium import webdriver
import time
d = webdriver.Firefox(executable_path="PATH TO YOUR DRIVER")
d.get("https://en.wikipedia.org/wiki/HTTP_cookie")
time.sleep(1)
scrollHeight = d.execute_script("return window.scrollMaxY")
index = 0
while d.execute_script("return window.pageYOffset") < scrollHeight:
    d.save_screenshot(PICTURE_PATH + "/" + FILENAME + str(index).zfill(2) + ".png")
    d.execute_script("window.scrollByPages(1)")
    # maybe call time.sleep in here somewhere to load lazyloaded images
    time.sleep(0.2)
    index += 1
# one final shot after the loop so the very bottom of the page is captured too
d.save_screenshot(PICTURE_PATH + "/" + FILENAME + str(index).zfill(2) + ".png")
d.quit()
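Note that window.scrollMaxY and window.scrollByPages are Firefox-only JavaScript APIs, so the loop above won't work in Chrome. A minimal cross-browser sketch using only standard DOM properties (PICTURE_PATH and FILENAME are placeholders, as above):
viewport = d.execute_script("return window.innerHeight")
total = d.execute_script("return document.body.scrollHeight")
index = 0
offset = 0
while offset < total:
    d.save_screenshot(PICTURE_PATH + "/" + FILENAME + str(index).zfill(2) + ".png")
    offset += viewport
    # scroll down exactly one viewport; the browser clamps at the bottom
    d.execute_script("window.scrollTo(0, arguments[0]);", offset)
    time.sleep(0.2)
    index += 1
Also worth knowing: with Selenium 4 and Firefox, driver.save_full_page_screenshot("page.png") captures the entire page in a single call, no stitching needed.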
I want to download the image at this site https://imginn.com/p/CXVmwujLqbV/ via the button, but I always fail.
This is the code I use:
driver.find_element_by_xpath('/html/body/div[2]/div[5]/a').click()
Well,
check this post for downloading the resource. The picture's URL is held in the 'src' attribute of its 'img' tag.
Also (though it might just be a simplification for this question), do not hardcode your XPath. Learn to structure your code using the Page Object Pattern.
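For the download itself, a minimal sketch of that idea (the div.media-wrap img selector is a guess and the filename is arbitrary; adapt both to the actual page):
import requests
# locate the preview image and read the direct URL from its 'src' attribute
img = driver.find_element_by_css_selector("div.media-wrap img")
url = img.get_attribute("src")
# fetch the resource outside the browser and save it to disk
resp = requests.get(url)
with open("image.jpg", "wb") as f:
    f.write(resp.content)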
There are several possible problems here:
You need to add a delay / wait before accessing this element.
You have to scroll the page since the element you wish to click is initially out of the view.
You should improve your locator. It's highly NOT recommended to use absolute XPaths.
This should work:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
import time
driver = webdriver.Chrome(executable_path='chromedriver.exe')
driver.set_window_size(1920,1080)
wait = WebDriverWait(driver, 20)
actions = ActionChains(driver)
driver.get("https://imginn.com/p/CXVmwujLqbV/")
button = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "div.downloads a")))
time.sleep(0.5)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(1)
#actions.move_to_element(button).perform()
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.downloads a"))).click()
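As a side note (my addition, not from the original answer): EC.element_to_be_clickable checks visibility and enabledness in one step, so the final wait can also be written as
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.downloads a"))).click()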
from selenium import webdriver
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(executable_path='C:/chromedriver/chromedriver.exe')
driver.get('https://ggl-maxim.com/')
driver.find_element_by_xpath('//*[@id="body"]/div/div[2]/div/div[2]/fieldset/input[1]').send_keys('tnrud3080')
driver.find_element_by_xpath('//*[@id="body"]/div/div[2]/div/div[2]/fieldset/input[2]').send_keys('tnrud3080')
driver.find_element_by_xpath('//*[@id="body"]/div/div[2]/div/div[2]/fieldset/button[1]').click()
time.sleep(2)
driver.get('https://ggl-maxim.com/api/popup/popup_menu.asp?mobile=0&lobby=EVOLUTION')
wait = WebDriverWait(driver, 20)
wait.until(EC.frame_to_be_available_and_switch_to_it("gameIframe"))
wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".svg--1nrnH")))
targets = driver.find_elements_by_css_selector(".svg--1nrnH")
res = []
for el in targets:
    res.append(el.get_attribute('innerHTML'))
print(*res, sep='\n')
This code gets the SVGs (the records of the game) as shown in the picture. However, if I click the button labeled "multi" (pictured at the bottom), I can also see the records at the right of the page, and I found that this panel shows the records faster than the original view. To take advantage of that, I need to get the SVG values only from that div. How can I? Please help me!
The second approach is not faster and is harder to implement, as each container is loaded separately and starts to reload after the first load is done. It looks like a nightmare to automate.
I tried Selenium's explicit waits and time.sleep(); neither approach worked.
The code below clicks the button, switches to the new iframe, and tries to get the containers' content. But the content is almost always empty, for the reasons described above.
from selenium import webdriver
import time
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver')
driver.get('http://ggl-maxim.com/')
driver.find_element_by_xpath('//*[@id="body"]/div/div[2]/div/div[2]/fieldset/input[1]').send_keys('tnrud3080')
driver.find_element_by_xpath('//*[@id="body"]/div/div[2]/div/div[2]/fieldset/input[2]').send_keys('tnrud3080')
driver.find_element_by_xpath('//*[@id="body"]/div/div[2]/div/div[2]/fieldset/button[1]').click()
time.sleep(2)
driver.get('http://ggl-maxim.com/api/popup/popup_menu.asp?mobile=0&lobby=EVOLUTION')
wait = WebDriverWait(driver, 30)
wait.until(EC.frame_to_be_available_and_switch_to_it("gameIframe"))
wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".svg--1nrnH")))
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "span[data-role=button-label]"))).click()
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".sidebar-container>iframe")))
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".sidebar-container")))
iframe2 = driver.find_element_by_css_selector('iframe[src^="https://evo.kplaycasino.com/frontend/evo/r2/"]')
driver.switch_to.frame(iframe2)
# wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".list--BLuiJ .svg--1nrnH")))
time.sleep(20)
targets = driver.find_elements_by_css_selector(".list--BLuiJ .svg--1nrnH")
res = []
for el in targets:
    res.append(el.get_attribute('innerHTML'))
print(*res, sep='\n')
As you can see, even 20 seconds is not enough, because the content keeps reloading on the fly.
I left the explicit wait commented out so you can verify that it does not work either.
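If you want to experiment further, one pattern that sometimes helps with continuously re-rendering widgets (a sketch of mine, not tested against this site) is to wait until the number of matching elements stops changing between two polls:
def count_is_stable(locator, pause=1.0):
    # predicate for WebDriverWait: true once two consecutive counts agree
    def _predicate(driver):
        before = len(driver.find_elements(*locator))
        time.sleep(pause)
        return before > 0 and len(driver.find_elements(*locator)) == before
    return _predicate
wait.until(count_is_stable((By.CSS_SELECTOR, ".list--BLuiJ .svg--1nrnH")))
A stable count still doesn't guarantee stable contents, so this may not be enough here either.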
However, from my answer you can learn how to write a locator that matches an attribute starting with specific text:
iframe2 = driver.find_element_by_css_selector('iframe[src^="https://evo.kplaycasino.com/frontend/evo/r2/"]')
where src^= means that the src attribute starts with the specified text.
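The other CSS attribute operators work the same way (the values here are just illustrative):
driver.find_element_by_css_selector('iframe[src^="https://evo."]')  # src starts with the text
driver.find_element_by_css_selector('iframe[src$=".html"]')         # src ends with the text
driver.find_element_by_css_selector('iframe[src*="kplaycasino"]')   # src contains the text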
I am attempting to scrape the website basketball-reference and am running into an issue I can't seem to solve. I am trying to grab the box score element for each game played. This is something I was able to do easily with urlopen, but because other portions of the site require Selenium, I thought I would rewrite the entire process with Selenium.
The issue seems to be that even if I use WebDriverWait to wait until I see the first element load, when I then move on to grabbing the elements I get nothing returned.
One thing I found interesting: if I did a full site print using my results from urlopen with something like print(uClient.read()), I would get roughly 300 more lines of HTML after beautifying compared to doing the same with print(driver.page_source), even with an implicit wait set to 5 minutes.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome('/usr/local/bin/chromedriver')
driver.wait = WebDriverWait(driver, 10)
driver.get('https://www.basketball-reference.com/boxscores/')
driver.wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="content"]/div[3]/div[1]')))
box = driver.find_elements_by_class_name('game_summary expanded nohover')
print (box)
driver.quit()
Try the code below; it works on my computer. Do let me know if you still face problems.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.wait = WebDriverWait(driver, 60)
driver.get('https://www.basketball-reference.com/boxscores/')
driver.wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="content"]/div[3]/div[1]')))
boxes = driver.wait.until(
    EC.presence_of_all_elements_located((By.XPATH, "//div[@class=\"game_summary expanded nohover\"]")))
print("Number of Elements Located : ", len(boxes))
for box in boxes:
    print(box.text)
    print("-----------")
driver.quit()
If it resolves your problem, please mark it as the answer. Thanks!
Actually, the site doesn't require Selenium at all. All the data is there via a simple request; it's just inside the comments of the HTML, which you would need to parse (a sketch of that follows the pandas example below). Secondly, you can grab the box scores quite easily with pandas:
import pandas as pd
dfs = pd.read_html('https://www.basketball-reference.com/boxscores/')
for idx, table in enumerate(dfs[:-2]):
    print(table)
    if (idx + 1) % 3 == 0:
        print("-----------")
I am working on a project where I am required to fetch data from a site using Selenium.
The website has a "load more" clickable div.
I have managed to make Selenium click the div, and it works; you can see it doing the clicking when it runs in non-headless mode.
However, when I try to get all the items, I don't get the newly loaded items after clicking.
Here is my code snippet:
driver.get('https://jamboshop.com/search/tv')
i = 1
maximum = 4
while i < maximum:
    try:
        i += 1
        el = driver.find_element_by_css_selector("div.showMoreLoaderPanel")
        action = ActionChains(driver)
        action.move_to_element(el).click().perform()
        driver.implicitly_wait(3)
    except:
        break
products = driver.find_elements_by_css_selector("div.col-xs-6.col-sm-4.col-md-4.col-lg-3")
for product in products:
    print({"item_name": product.find_element_by_css_selector("h6.prd-title").text})
This only prints the items that were present before the clicks... how do I get all the items on the page, including the ones loaded after clicking "load more"?
Extra:
# My imports and chrome settings
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--window-size=1420,1080')
#chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome(chrome_options=chrome_options)
I think this is a lazy-loading application, so when you go to the bottom of the page it seems to lose the elements it previously captured, which is why only the elements currently on the page are available.
An alternative way to handle this is to check against a list and capture the data while iterating in the while loop.
Code:
import time
driver.get('https://jamboshop.com/search/tv')
i = 1
maximum = 4
itemlist = []
while i < maximum:
    try:
        products = driver.find_elements_by_css_selector("div.col-xs-6.col-sm-4.col-md-4.col-lg-3")
        for product in products:
            if product.find_element_by_css_selector("h6.prd-title").text in itemlist:
                continue
            else:
                itemlist.append(product.find_element_by_css_selector("h6.prd-title").text)
        i += 1
        el = driver.find_element_by_css_selector("div.showMoreLoaderPanel")
        action = ActionChains(driver)
        action.move_to_element(el).click().perform()
        time.sleep(3)
    except:
        break
print(len(itemlist))
print(itemlist)
Let me know if this works for you. The website is not accessible at my end.
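A small refinement you might consider (my suggestion, not part of the answer above): read each title once and keep the seen titles in a set, which makes the membership check O(1) and halves the find_element calls:
seen = set()  # create once, before the while loop, in place of itemlist
# then, inside the try block:
for product in products:
    title = product.find_element_by_css_selector("h6.prd-title").text
    if title not in seen:
        seen.add(title)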
I've written some code in Python, in combination with Selenium, to parse the different questions from quora.com. My scraper is doing its job at the moment. The thing is, I've used a hardcoded delay for the scraper to work, even though an explicit wait has already been defined. As the page is an infinite-scrolling one, I tried to limit the scrolling process to a fixed number of iterations. Now, I have two questions:
Why is wait.until(EC.staleness_of(page)) not working within my scraper? It is commented out now.
If I use something else instead of page = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "question_link"))), the scraper throws an error: can't focus element.
Btw, I do not wish to go for the page = driver.find_element_by_tag_name('body') option.
Here is what I've written so far:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("https://www.quora.com/topic/C-programming-language")
wait = WebDriverWait(driver, 10)
page = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "question_link")))
for scroll in range(10):
    page.send_keys(Keys.PAGE_DOWN)
    time.sleep(2)
# wait.until(EC.staleness_of(page))
for item in wait.until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "rendered_qtext"))):
    print(item.text)
driver.quit()
You can try the code below to trigger as many XHRs as possible and then parse the page:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
driver = webdriver.Chrome()
driver.get("https://www.quora.com/topic/C-programming-language")
wait = WebDriverWait(driver, 10)
page = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "question_link")))
links_counter = len(wait.until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "question_link"))))
while True:
    page.send_keys(Keys.END)
    try:
        wait.until(lambda driver: len(driver.find_elements_by_class_name("question_link")) > links_counter)
        links_counter = len(driver.find_elements_by_class_name("question_link"))
    except TimeoutException:
        break
for item in wait.until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "rendered_qtext"))):
    print(item.text)
driver.quit()
Here we scroll the page down and wait up to 10 seconds for more links to load, or break out of the while loop if the number of links stays the same.
As for your questions:
wait.until(EC.staleness_of(page)) is not working because when you scroll the page down you don't get a new DOM; you just trigger XHRs that add more links into the existing DOM, so the first link (page) never becomes stale in this case.
(I'm not quite confident about this, but...) I guess you can send keys only to nodes that can be focused (where a user could set focus manually), e.g. links, input fields, textareas, buttons..., but not content divisions (div), paragraphs (p), etc.
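If focus does turn out to be the blocker, you can sidestep send_keys entirely and scroll via JavaScript, which needs no focused element at all (the same idea as earlier in this thread):
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")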