Selenium Python 3.4.3 Automatic Torrenting: Link section - python

I am trying to write a simple program that goes to Torrentz and follows a few links so that I can finally download the file through uTorrent. Below is what I have coded so far. I can't get the variable linkElem to work, and I also can't get linkElem.find_elements_by_xpath to reach the link I need. If you know what is wrong, please help.
Thanks.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
browser = webdriver.Firefox()
browser.get('https://torrentz.eu/')
searchElem = browser.find_element_by_id('thesearchbox')
searchElem.send_keys('Limitless')
searchButton = browser.find_element_by_id('thesearchbutton')
searchButton.click()
linkElem = linkElem.find_elements_by_xpath("//div[@class='results']//a[@href='/9ea0c575520a3065d85b285c9474231192368db7']")
#wait = WebDriverWait(browser, 6)
#linkElem = wait.until(EC.visibility_of_element_located((By.href, "/9ea0c575520a3065d85b285c9474231192368db7")))
#linkElem.clear()
#linkElem = browser.find_element_by_link_text('S01E20 HDTV x264 LOL ettv')
#linkElem.click()
#SignIn = browser.find_elements_by_id('signIn')
#SignIn.click()
#passwordElem.submit()

I don't think you can or should rely on the href attribute value. Instead, get the links from under the dl elements inside the search results container. Also, add a wait:
# wait for search results to appear
wait = WebDriverWait(browser, 6)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.results dl")))
# the question's WebDriver instance is named browser
links = browser.find_elements_by_css_selector("div.results dl dt a")
links[0].click()
Here links contains all of the search result links, and links[0] is the first one.
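The locator idea in this answer (take the a elements under dl/dt inside the results container) can be tried without a browser. The sketch below is not Selenium: it uses the standard library's xml.etree.ElementTree, which supports a small XPath subset, on made-up markup. The SAMPLE string and its href values are assumptions for illustration, not the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for a search results page.
SAMPLE = """
<html><body>
  <div class="results">
    <dl><dt><a href="/abc123">First result</a></dt><dd>1 GB</dd></dl>
    <dl><dt><a href="/def456">Second result</a></dt><dd>2 GB</dd></dl>
  </div>
  <div class="footer"><a href="/about">About</a></div>
</body></html>
"""

root = ET.fromstring(SAMPLE)

# Attribute tests in XPath are written with "@", e.g. [@class='results'].
# Only links under dl/dt inside the results container are selected.
links = root.findall(".//div[@class='results']/dl/dt/a")
hrefs = [a.get("href") for a in links]
titles = [a.text for a in links]

print(hrefs)
print(titles[0])
```

The footer link is skipped because it is not under the results container, which is the same reason the container-based locator is more robust than matching one specific href.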

Related

Python Selenium find element from specific div [duplicate]

from selenium import webdriver
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(executable_path='C:/chromedriver/chromedriver.exe')
driver.get('https://ggl-maxim.com/')
driver.find_element_by_xpath('//*[@id="body"]/div/div[2]/div/div[2]/fieldset/input[1]').send_keys('tnrud3080')
driver.find_element_by_xpath('//*[@id="body"]/div/div[2]/div/div[2]/fieldset/input[2]').send_keys('tnrud3080')
driver.find_element_by_xpath('//*[@id="body"]/div/div[2]/div/div[2]/fieldset/button[1]').click()
time.sleep(2)
driver.get('https://ggl-maxim.com/api/popup/popup_menu.asp?mobile=0&lobby=EVOLUTION')
wait = WebDriverWait(driver, 20)
wait.until(EC.frame_to_be_available_and_switch_to_it("gameIframe"))
wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".svg--1nrnH")))
targets = driver.find_elements_by_css_selector(".svg--1nrnH")
res = []
for el in targets:
    res.append(el.get_attribute('innerHTML'))
print(*res, sep='\n')
This code gets the SVG elements (the game records) shown in the picture. However, if I click the button that I labeled "multi" at the bottom of the picture, records also appear on the right side of the page. I found that this part shows the records faster than before. To use it, I need to get the SVG values only from that div. How can I do that? Please help me!
The second approach is not faster and is harder to implement, because each container is loaded separately and starts to reload after the first load is done. It looks like a nightmare to automate.
I tried Selenium's explicit waits and time.sleep(), but neither approach worked.
The code below clicks the button, switches to the new iframe, and tries to get the containers' content. But the content is almost always empty, for the reasons described above.
from selenium import webdriver
import time
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver')
driver.get('http://ggl-maxim.com/')
driver.find_element_by_xpath('//*[@id="body"]/div/div[2]/div/div[2]/fieldset/input[1]').send_keys('tnrud3080')
driver.find_element_by_xpath('//*[@id="body"]/div/div[2]/div/div[2]/fieldset/input[2]').send_keys('tnrud3080')
driver.find_element_by_xpath('//*[@id="body"]/div/div[2]/div/div[2]/fieldset/button[1]').click()
time.sleep(2)
driver.get('http://ggl-maxim.com/api/popup/popup_menu.asp?mobile=0&lobby=EVOLUTION')
wait = WebDriverWait(driver, 30)
wait.until(EC.frame_to_be_available_and_switch_to_it("gameIframe"))
wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".svg--1nrnH")))
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "span[data-role=button-label]"))).click()
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".sidebar-container>iframe")))
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".sidebar-container")))
iframe2 = driver.find_element_by_css_selector('iframe[src^="https://evo.kplaycasino.com/frontend/evo/r2/"]')
driver.switch_to.frame(iframe2)
# wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".list--BLuiJ .svg--1nrnH")))
time.sleep(20)
targets = driver.find_elements_by_css_selector(".list--BLuiJ .svg--1nrnH")
res = []
for el in targets:
    res.append(el.get_attribute('innerHTML'))
print(*res, sep='\n')
As you can see, even 20 seconds is not enough, because the content is reloading on the fly.
I left the explicit wait commented out so you can confirm that it does not work either.
However, from my answer you can learn how to write a locator that matches an attribute starting with a specific text:
iframe2 = driver.find_element_by_css_selector('iframe[src^="https://evo.kplaycasino.com/frontend/evo/r2/"]')
Here src^= means that src starts with the specified text.

Selenium not able to find particular elements on slow loading page

I am attempting to scrape the website basketball-reference and am running into an issue I can't seem to solve. I am trying to grab the box score element for each game played. I was able to do this easily with urlopen, but because other portions of the site require Selenium, I thought I would rewrite the entire process with Selenium.
The issue seems to be that even if I wait to scrape until I see the first element load using WebDriverWait, I get nothing returned when I then move on to grabbing the elements.
One thing I found interesting: if I print the full page from urlopen with something like print (uClient.read()), I get roughly 300 more lines of HTML after beautifying than I do with print (driver.page_source), even if I set an implicit wait of 5 minutes.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome('/usr/local/bin/chromedriver')
driver.wait = WebDriverWait(driver, 10)
driver.get('https://www.basketball-reference.com/boxscores/')
driver.wait.until(EC.presence_of_element_located((By.XPATH,'//*[@id="content"]/div[3]/div[1]')))
box = driver.find_elements_by_class_name('game_summary expanded nohover')
print (box)
driver.quit()
Try the code below; it works on my computer. Let me know if you still face problems.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.wait = WebDriverWait(driver, 60)
driver.get('https://www.basketball-reference.com/boxscores/')
driver.wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="content"]/div[3]/div[1]')))
boxes = driver.wait.until(
    EC.presence_of_all_elements_located((By.XPATH, "//div[@class=\"game_summary expanded nohover\"]")))
print("Number of Elements Located : ", len(boxes))
for box in boxes:
    print(box.text)
    print("-----------")
driver.quit()
If it resolves your problem then please mark it as answer. Thanks
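One likely reason the question's find_elements_by_class_name('game_summary expanded nohover') returned nothing is that a class-name locator expects a single class, while the string passed there contains three. A common alternative to the exact-match XPath above is a CSS selector that chains the classes. The helper below is a hypothetical convenience (css_for_classes is not part of Selenium) that only builds the selector string:

```python
def css_for_classes(class_attr, tag="*"):
    """Build a CSS selector matching elements that carry every class in class_attr.

    class_attr is a space-separated class string as it appears in the HTML.
    """
    classes = class_attr.split()
    if not classes:
        raise ValueError("expected at least one class name")
    return tag + "".join("." + name for name in classes)


selector = css_for_classes("game_summary expanded nohover", tag="div")
print(selector)  # div.game_summary.expanded.nohover
```

The resulting string can be passed to a CSS lookup such as driver.find_elements(By.CSS_SELECTOR, selector). Unlike the exact-match XPath, it still matches if the page adds another class or reorders the existing ones.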
Actually, the site doesn't require Selenium at all. All the data is available through a simple requests call (it's just inside the comments of the HTML, so you would only need to parse those). Secondly, you can grab the box scores quite easily with pandas:
import pandas as pd
dfs = pd.read_html('https://www.basketball-reference.com/boxscores/')
for idx, table in enumerate(dfs[:-2]):
    print (table)
    if (idx+1)%3 == 0:
        print("-----------")
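The remark about data sitting inside HTML comments can be illustrated with the standard library alone. This is a minimal sketch on a made-up fragment, not the real page: PAGE, the table contents, and the two helper classes are assumptions for illustration. The idea is to collect the comment text first and then parse that text as HTML in a second pass.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the raw text of every HTML comment in a document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


class RowCollector(HTMLParser):
    """Collect table cell text, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self.rows:
            self.rows[-1].append(data.strip())


# Hypothetical page fragment: the table is wrapped in an HTML comment.
PAGE = """
<div id="content">
  <!-- <table><tr><th>Team</th><th>Pts</th></tr>
       <tr><td>Home</td><td>101</td></tr></table> -->
</div>
"""

collector = CommentCollector()
collector.feed(PAGE)

rows = []
for comment in collector.comments:
    if "<table" in comment:          # only comments that hide a table
        parser = RowCollector()
        parser.feed(comment)         # second pass: parse the comment as HTML
        rows.extend(parser.rows)

print(rows)
```

In a real script the page text would come from an HTTP response rather than a literal string, and a fuller parser would likely replace RowCollector.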

Having trouble referencing to a certain element on page with Selenium

I am having a terribly hard time referencing a certain "next page" button on a website that I am trying to scrape links from [https://www.sreality.cz/adresar?strana=2]. If you scroll down, you can see a red right-arrow button that you can click to go to the next page, which makes the website load new dynamic content. Every approach seems to report the exact same error, and I don't know how I am supposed to point to the element without running into it.
This is the code that I currently have :
from selenium import webdriver
chromedriver_path = "/home/user/Dokumenty/iCloud/RealityScraper/chromedriver"
driver = webdriver.Chrome(chromedriver_path)
print("WebDriver Successfully Initialized")
driver.get("https://www.sreality.cz/adresar?strana=2")
links = driver.find_elements_by_css_selector("h2.title a")
nextPage = driver.find_element_by_css_selector("li.paging-item a.btn-paging-pn.icof.icon-arr-right.paging-next")
for link in links:
    print(link.get_attribute("href"))
nextPage.click()
The "nextPage" variable is holding a supposed value to be clicked on once the "links" variable search finishes scraping all the links from the company titles. However when I run this code I get an error :
selenium.common.exceptions.StaleElementReferenceException: Message:
stale element reference: element is not attached to the page document
I have been searching for various fixes online but none of them seemed to resolve the issue. I think that the issue at this point is not caused by the element not loading quickly enough but rather Selenium having trouble finding the element because of wrong reference.
Because of this I have tried using XPath to accurately point to the actual element and so I changed the "nextPage" variable to :
nextPage = driver.find_element_by_xpath("""/html/body/div[2]/div[1]/div[2]/div[2]/div[4]/div/div/div/div[2]/div/div[2]/ul[1]/li[12]/a""")
This returns exactly the same error as stated above. I have been trying to find a solution for hours now and I can't understand where the issue lies. I would be grateful if anyone could explain what I am doing wrong. Thanks to anyone.
To print the ng-href attribute of every link on every page, you can loop until the next-page button can no longer be found. Alternatively, you could look into their API.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from time import sleep
driver.get("https://www.sreality.cz/adresar?strana=2")
wait = WebDriverWait(driver, 10)
while True:
    try:
        # locate the links again on every page so the references are fresh
        links = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "h2.title > a")))
        #print(len(links))
        for link in links:
            print(link.get_attribute("ng-href"))
        nextPage = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a.btn-paging-pn.icof.icon-arr-right.paging-next")))
        nextPage.click()
        sleep(10)  # sleep is imported directly, so call it without the time. prefix
    except Exception as e:
        # any failure, such as a timeout while looking for the next button, ends the loop
        print(e)
        break
First of all, never use an absolute XPath, because it breaks easily. Use a relative XPath instead.
Secondly, I think you are getting the error because clicking the "Next" button for the first time loads a new page. That page has a different DOM structure, so the element you located earlier can no longer be found in it.
Try locating the element again after every page load (that is, each time after clicking the "Next" button).
# imports
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
# initialize
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 20)
action = ActionChains(driver)
# Try the code below and see if it works.
Next_btn = wait.until(EC.presence_of_element_located((By.XPATH, '(//li[@class="paging-item"])[2]')))
action.move_to_element(Next_btn).click().perform()
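The advice to look the element up again after every page load can be sketched as a small retry helper. This is only an illustration of the pattern, runnable without a browser: StaleElementError is a stand-in for selenium.common.exceptions.StaleElementReferenceException, and click_with_relocate, FakeButton, and locate_next_button are hypothetical names made up for the demo.

```python
class StaleElementError(Exception):
    """Stand-in for selenium.common.exceptions.StaleElementReferenceException."""


def click_with_relocate(locate, stale_exc=StaleElementError, attempts=3):
    """Call locate() for a fresh element before every click attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        element = locate()           # look the element up again each time
        try:
            element.click()
            return True
        except stale_exc as err:     # the reference went stale; locate again
            last_error = err
    raise last_error


class FakeButton:
    """Fake element: a stale instance fails as if the DOM had been replaced."""

    clicks = 0

    def __init__(self, stale):
        self.stale = stale

    def click(self):
        if self.stale:
            raise StaleElementError("element is not attached to the page document")
        FakeButton.clicks += 1


lookups = []


def locate_next_button():
    lookups.append(1)
    # Only the first lookup returns a stale element in this demo.
    return FakeButton(stale=(len(lookups) == 1))


result = click_with_relocate(locate_next_button)
print(result, len(lookups), FakeButton.clicks)
```

With Selenium, locate would be a function that runs the find or wait call again, and stale_exc would be the real StaleElementReferenceException class.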

Element Not Clickable - even though it is there

Hoping you can help. I'm relatively new to Python and Selenium. I'm trying to pull together a simple script that will automate news searching on various websites. The primary focus was football and to go and get me the latest Manchester United news from a couple of places and save the list of link titles and URLs for me. I could then look through the links myself and choose anything I wanted to review.
In trying the Independent newspaper (https://www.independent.co.uk/), I seem to have come up against an "element not interactable" problem when using the following approach:
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome('chromedriver')
driver.get('https://www.independent.co.uk')
time.sleep(3)
#accept the cookies/privacy bit
OK = driver.find_element_by_id('qcCmpButtons')
OK.click()
#wait a few seconds, just in case
time.sleep(5)
search_toggle = driver.find_element_by_class_name('icon-search.dropdown-toggle')
search_toggle.click()
This throws the selenium.common.exceptions.ElementNotInteractableException: Message: element not interactable error
I've also tried with XPATH
search_toggle = driver.find_element_by_xpath('//*[@id="quick-search-toggle"]')
and I also tried ID.
I did a lot of reading on here and then also tried using WebDriverWait and execute_script methods:
element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, '//*[@id="quick-search-toggle"]')))
driver.execute_script("arguments[0].click();", element)
This didn't seem to error but the search box never appeared, i.e. the appropriate click didn't happen.
Any help you could give would be fantastic.
Thanks,
Pete
Your locator is //*[@id="quick-search-toggle"], and there are two matching elements on the page. The first is invisible and the second is visible. By default Selenium returns the first match, but the element you want is the second one, so you need a more specific locator. Try this:
search_toggle = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//div[@class="row secondary"]//a[@id="quick-search-toggle"]')))
search_toggle.click()
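When a locator matches several elements and only one of them is visible, another option is to fetch all matches and keep the first one that reports itself as displayed. The sketch below shows that selection step with fake objects so it runs without a browser. first_displayed and FakeElement are hypothetical names; the only element method relied on is is_displayed(), which Selenium's WebElement provides.

```python
def first_displayed(elements):
    """Return the first element whose is_displayed() is true, or None."""
    for element in elements:
        if element.is_displayed():
            return element
    return None


class FakeElement:
    """Minimal stand-in for a WebElement, for demonstration only."""

    def __init__(self, name, visible):
        self.name = name
        self.visible = visible

    def is_displayed(self):
        return self.visible


# Two matches for the same locator: the first hidden, the second visible.
candidates = [FakeElement("hidden toggle", False), FakeElement("visible toggle", True)]
chosen = first_displayed(candidates)
print(chosen.name)
```

With a real driver, the list would come from a call such as driver.find_elements(By.ID, "quick-search-toggle"). A more specific locator, as in the answer above, is usually the cleaner fix when one is available.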
First you need to open search box, then send search keys:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
import os
chrome_options = Options()
chrome_options.add_argument("--start-maximized")
browser = webdriver.Chrome(executable_path=os.path.abspath(os.getcwd()) + "/chromedriver", options=chrome_options)
link = 'https://www.independent.co.uk'
browser.get(link)
# accept privacy
button = browser.find_element_by_xpath('//*[@id="qcCmpButtons"]/button').click()
# open search box
li = browser.find_element_by_xpath('//*[@id="masthead"]/div[3]/nav[2]/ul/li[1]')
search_tab = li.find_element_by_tag_name('a').click()
# send keys to search box
search = browser.find_element_by_xpath('//*[@id="gsc-i-id1"]')
search.send_keys("python")
search.send_keys(Keys.RETURN)
Can you try the steps below?
search_toggle = driver.find_element_by_xpath('//*[@class="row secondary"]/nav[2]/ul/li[1]/a')
search_toggle.click()

Unable to use "explicit wait" in the right way

I've written a script in Python with Selenium to click on some links listed in the sidebar of Google Maps. When an item is clicked, the information attached to that lead shows up in the area on the right. The script works fine. However, I've used a hardcoded delay to do the job. How can I get rid of the hardcoded delay and achieve the same with an explicit wait? Thanks in advance.
Link to the site: website
The script I'm trying with:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
link = "replace_with_above_link"
driver = webdriver.Chrome()
driver.get(link)
wait = WebDriverWait(driver, 10)
for item in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "[id^='rlimg0_']"))):
    item.location
    time.sleep(3) #wish to try with explicit wait but can't find any idea
    item.click()
driver.quit()
I tried wait.until(EC.staleness_of(item)) instead of the hardcoded delay, but no luck.
If you want to wait until new data is displayed after each click, you may try the code below:
for item in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "[id^='rlimg0_']"))):
    div = driver.find_element_by_xpath("//div[@class='xpdopen']")
    item.location
    item.click()
    wait.until(EC.staleness_of(div))
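For readers wondering how an explicit wait differs from a fixed time.sleep(3), here is a sketch of the polling idea behind it. This is not Selenium's implementation: wait_until, panel_replaced, and the timing values are made up for illustration. The condition is checked repeatedly and the call returns as soon as it holds, so the full delay is paid only in the worst case.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Poll condition() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.2f seconds" % timeout)
        time.sleep(poll)


# Demo condition: pretend the old panel is replaced on the third check.
calls = {"n": 0}


def panel_replaced():
    calls["n"] += 1
    return calls["n"] >= 3


result = wait_until(panel_replaced, timeout=2.0, poll=0.01)
print(result, calls["n"])
```

In the answer above, EC.staleness_of(div) plays the role of the condition: it becomes true once the previously located panel is no longer attached to the page.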
