I am trying to scrape basketball-reference and have run into an issue I can't solve. I want to grab the box score element for each game played. I could do this easily with urlopen, but because other parts of the site require Selenium, I decided to rewrite the entire process with Selenium.
The issue seems to be that even when I use WebDriverWait to wait until the first element has loaded, nothing is returned when I then go on to grab the elements.
Interestingly, if I print the whole page from urlopen with something like print(uClient.read()), I get roughly 300 more lines of HTML (after beautifying) than when I do the same with print(driver.page_source). This happens even with an implicit wait (driver.implicitly_wait) set to 5 minutes.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome('/usr/local/bin/chromedriver')
driver.wait = WebDriverWait(driver, 10)
driver.get('https://www.basketball-reference.com/boxscores/')
driver.wait.until(EC.presence_of_element_located((By.XPATH,'//*[@id="content"]/div[3]/div[1]')))
box = driver.find_elements_by_class_name('game_summary expanded nohover')
print (box)
driver.quit()
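A likely cause of the empty result: find_elements_by_class_name expects a single class name, and 'game_summary expanded nohover' is a compound (space-separated) class, which Selenium either rejects or matches against nothing. One common workaround is to chain the classes into a CSS selector. A minimal sketch, assuming the class names need no CSS escaping; class_names_to_css is a hypothetical helper, not part of Selenium:

```python
def class_names_to_css(class_attr: str, tag: str = "") -> str:
    """Chain space-separated class names into one CSS selector.

    Assumes the class names need no CSS escaping (no ':', '.', '[' etc.).
    """
    classes = class_attr.split()
    if not classes:
        raise ValueError("no class names given")
    return tag + "".join("." + name for name in classes)


if __name__ == "__main__":
    selector = class_names_to_css("game_summary expanded nohover", tag="div")
    print(selector)  # div.game_summary.expanded.nohover
    # With Selenium the selector would then go to a CSS lookup, e.g.
    #   boxes = driver.find_elements(By.CSS_SELECTOR, selector)   # Selenium 4
    #   boxes = driver.find_elements_by_css_selector(selector)    # Selenium 3
```

Unlike an exact @class comparison, the chained selector still matches if the site lists the classes in a different order or adds extra ones.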
Try the code below; it works on my computer. Let me know if you still face a problem.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.wait = WebDriverWait(driver, 60)
driver.get('https://www.basketball-reference.com/boxscores/')
driver.wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="content"]/div[3]/div[1]')))
boxes = driver.wait.until(
    EC.presence_of_all_elements_located((By.XPATH, "//div[@class=\"game_summary expanded nohover\"]")))
print("Number of Elements Located : ", len(boxes))
for box in boxes:
    print(box.text)
    print("-----------")
driver.quit()
If it resolves your problem, please mark it as the answer. Thanks
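The XPath above compares @class with the exact string "game_summary expanded nohover", so it stops matching if the site reorders the classes or adds another one. A common XPath 1.0 idiom tests each class as a whitespace-delimited token instead. A minimal sketch; xpath_has_classes and has_class_tokens are hypothetical helpers, the second one only mirrors in plain Python what the XPath checks so the logic can be tried without a browser:

```python
def xpath_has_classes(*class_names: str, tag: str = "*") -> str:
    """Build an XPath matching elements that carry every given class token."""
    if not class_names:
        raise ValueError("no class names given")
    tests = " and ".join(
        f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"
        for name in class_names
    )
    return f"//{tag}[{tests}]"


def has_class_tokens(class_attr: str, *class_names: str) -> bool:
    """Plain-Python equivalent of the XPath test above, for experimenting."""
    padded = " " + " ".join(class_attr.split()) + " "  # like normalize-space + padding
    return all(f" {name} " in padded for name in class_names)


if __name__ == "__main__":
    print(xpath_has_classes("game_summary", tag="div"))
    # //div[contains(concat(' ', normalize-space(@class), ' '), ' game_summary ')]
```

The padding with spaces is what stops "game_summary" from matching a different class such as "game_summary_x".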
Actually, the site doesn't require Selenium at all. All the data is available through a simple request (it's just inside HTML comments, so you would only need to parse those). Secondly, you can grab the box scores quite easily with pandas:
import pandas as pd
dfs = pd.read_html('https://www.basketball-reference.com/boxscores/')
for idx, table in enumerate(dfs[:-2]):
    print(table)
    # each game contributes three tables, so print a separator after every third one
    if (idx+1)%3 == 0:
        print("-----------")
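For tables that are only present inside HTML comments, as the answer describes, the comments can be pulled out with the standard library before handing them to pandas. A minimal sketch, assuming the hidden markup sits in ordinary <!-- ... --> comments that contain a <table>; commented_tables is a hypothetical helper and the fetch/read_html lines are left as comments:

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the text of every <!-- ... --> comment in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.comments: list[str] = []

    def handle_comment(self, data: str) -> None:
        self.comments.append(data)


def commented_tables(html: str) -> list[str]:
    """Return the comments that contain table markup."""
    parser = CommentCollector()
    parser.feed(html)
    parser.close()
    return [c for c in parser.comments if "<table" in c]


# Possible usage (requires network access, requests and pandas):
#   import io, requests, pandas as pd
#   html = requests.get("https://www.basketball-reference.com/boxscores/").text
#   for chunk in commented_tables(html):
#       for df in pd.read_html(io.StringIO(chunk)):
#           print(df)
```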
from selenium import webdriver
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(executable_path='C:/chromedriver/chromedriver.exe')
driver.get('https://ggl-maxim.com/')
driver.find_element_by_xpath('//*[@id="body"]/div/div[2]/div/div[2]/fieldset/input[1]').send_keys('tnrud3080')
driver.find_element_by_xpath('//*[@id="body"]/div/div[2]/div/div[2]/fieldset/input[2]').send_keys('tnrud3080')
driver.find_element_by_xpath('//*[@id="body"]/div/div[2]/div/div[2]/fieldset/button[1]').click()
time.sleep(2)
driver.get('https://ggl-maxim.com/api/popup/popup_menu.asp?mobile=0&lobby=EVOLUTION')
wait = WebDriverWait(driver, 20)
wait.until(EC.frame_to_be_available_and_switch_to_it("gameIframe"))
wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".svg--1nrnH")))
targets = driver.find_elements_by_css_selector(".svg--1nrnH")
res = []
for el in targets:
    res.append(el.get_attribute('innerHTML'))
print(*res, sep='\n')
This code gets the SVGs (the game records), as you can see in the picture. However, if I click the button I labelled "multi" at the bottom of the picture, records also appear on the right side of the page, and I found that this part shows the records faster. To use it, I need to get the SVG values from that div only. How can I do that? Please help me!
The second approach is not faster and is harder to implement, because each container loads separately and starts reloading as soon as its first load is done. It looks like a nightmare to automate.
I tried Selenium's explicit waits and time.sleep(); neither approach worked.
The code below clicks the button, switches to the new iframe and tries to get the containers' content. But the content is almost always empty, for the reasons described above.
from selenium import webdriver
import time
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver')
driver.get('http://ggl-maxim.com/')
driver.find_element_by_xpath('//*[@id="body"]/div/div[2]/div/div[2]/fieldset/input[1]').send_keys('tnrud3080')
driver.find_element_by_xpath('//*[@id="body"]/div/div[2]/div/div[2]/fieldset/input[2]').send_keys('tnrud3080')
driver.find_element_by_xpath('//*[@id="body"]/div/div[2]/div/div[2]/fieldset/button[1]').click()
time.sleep(2)
driver.get('http://ggl-maxim.com/api/popup/popup_menu.asp?mobile=0&lobby=EVOLUTION')
wait = WebDriverWait(driver, 30)
wait.until(EC.frame_to_be_available_and_switch_to_it("gameIframe"))
wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".svg--1nrnH")))
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "span[data-role=button-label]"))).click()
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".sidebar-container>iframe")))
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".sidebar-container")))
iframe2 = driver.find_element_by_css_selector('iframe[src^="https://evo.kplaycasino.com/frontend/evo/r2/"]')
driver.switch_to.frame(iframe2)
# wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".list--BLuiJ .svg--1nrnH")))
time.sleep(20)
targets = driver.find_elements_by_css_selector(".list--BLuiJ .svg--1nrnH")
res = []
for el in targets:
    res.append(el.get_attribute('innerHTML'))
print(*res, sep='\n')
As you can see, even 20 seconds is not enough, because the content reloads on the fly.
I left the explicit wait commented out so you can confirm that it doesn't work either.
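Instead of a fixed time.sleep(20), one alternative is to poll until the scraped value stops changing for a few consecutive reads. A minimal sketch, assuming a polling approach is acceptable; wait_until_stable is a hypothetical helper, and given that the containers keep reloading it may simply time out, and element lookups during a reload can raise StaleElementReferenceException:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until_stable(fetch: Callable[[], T], stable_polls: int = 3,
                      interval: float = 0.5, timeout: float = 30.0) -> T:
    """Call fetch() until it returns the same value stable_polls times in a row."""
    deadline = time.monotonic() + timeout
    last = fetch()
    streak = 1
    while streak < stable_polls:
        if time.monotonic() >= deadline:
            raise TimeoutError("value kept changing until the timeout")
        time.sleep(interval)
        current = fetch()
        if current == last:
            streak += 1
        else:
            last, streak = current, 1  # value changed: restart the count
    return last


# Possible usage with the locator from the answer (hypothetical):
#   fetch = lambda: [el.get_attribute("innerHTML") for el in
#                    driver.find_elements_by_css_selector(".list--BLuiJ .svg--1nrnH")]
#   res = wait_until_stable(fetch, stable_polls=3, interval=1, timeout=60)
```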
However, from my answer you can learn how to find a locator which starts with a specific text:
iframe2 = driver.find_element_by_css_selector('iframe[src^="https://evo.kplaycasino.com/frontend/evo/r2/"]')
Here, src^= means that the src attribute starts with the specified text.
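The prefix operator ^= is one of several CSS attribute-selector operators; $= (ends with) and *= (contains) work the same way. A small plain-Python sketch of their matching rules, case-sensitive and without the i/s flags; attr_matches is a hypothetical helper and the attribute values in the test are made up:

```python
def attr_matches(op: str, actual: str, expected: str) -> bool:
    """Mimic CSS attribute-selector operators on a single attribute value."""
    if op == "=":
        return actual == expected
    if op == "^=":  # starts with
        return bool(expected) and actual.startswith(expected)
    if op == "$=":  # ends with
        return bool(expected) and actual.endswith(expected)
    if op == "*=":  # contains
        return bool(expected) and expected in actual
    if op == "~=":  # one of the whitespace-separated words
        return bool(expected) and expected in actual.split()
    raise ValueError(f"unsupported operator: {op}")
```

Per the CSS spec, ^=, $= and *= with an empty value never match, which is why the helper checks bool(expected).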
I'm new to Selenium, but I've watched enough videos and read enough articles to know something is missing. I'm trying to get values from TradingView, but I simply can't find any of the elements, neither by XPath nor by CSS. I tried a simple element-visibility test, as in the code below, and to my surprise it times out.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
# Stops the UI interface (chrome browser) from popping up
# chrome_options.add_argument("--headless")
driver = webdriver.Chrome(executable_path=r'c:\se\chromedriver.exe', options=chrome_options)
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import time
driver.get("https://www.tradingview.com/chart/")
timeout = 20
try:
    WebDriverWait(driver, timeout).until(EC.visibility_of_element_located((By.XPATH, "/html/body/div[1]")))
    print("Page loaded")
except TimeoutException:
    print("Timed out waiting for page to load")
finally:
    driver.quit()
I also tried clicking one of the chart buttons with the following, and that doesn't work either. I noticed that, unlike on many other websites, TradingView's elements don't have names, and copying an XPath only gives a full (absolute) path, not a relative one.
driver.find_element_by_xpath('/html/body/div[2]/div[5]/div/div[2]/div/div/div/div/div[4]').click()
Any help is greatly appreciated!
I think there must be an issue with your XPath.
When I try to click the AAPL button, it works for me.
The XPath I used is:
(//div[contains(text(),'AAPL')])[1]
If you specify exactly which element should be clicked, I will try it.
Also, get familiar with the concept of frames, because websites like this have a lot of frames.
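The XPath above can be generated for other labels, but XPath 1.0 has no escape character, so a label containing an apostrophe needs different quoting or concat(). A minimal sketch; xpath_literal and nth_div_containing are hypothetical helpers. Note that contains(text(), ...) only checks the first text node of the div; if the label is split across child elements, contains(., ...) is a looser alternative.

```python
def xpath_literal(s: str) -> str:
    """Quote s as an XPath 1.0 string literal (the language has no escapes)."""
    if "'" not in s:
        return f"'{s}'"
    if '"' not in s:
        return f'"{s}"'
    # Both quote kinds present: split on ' and glue the pieces with concat().
    parts = s.split("'")
    pieces = []
    for i, part in enumerate(parts):
        if part:
            pieces.append(f"'{part}'")
        if i < len(parts) - 1:
            pieces.append('"\'"')  # a lone apostrophe, wrapped in double quotes
    return "concat(" + ", ".join(pieces) + ")"


def nth_div_containing(text: str, n: int = 1) -> str:
    """Build an XPath like (//div[contains(text(),'AAPL')])[1]."""
    return f"(//div[contains(text(),{xpath_literal(text)})])[{n}]"


if __name__ == "__main__":
    print(nth_div_containing("AAPL"))  # (//div[contains(text(),'AAPL')])[1]
```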
I'm hoping you can help. I'm relatively new to Python and Selenium. I'm trying to put together a simple script that automates news searching on various websites. The main focus is football: getting the latest Manchester United news from a couple of places and saving the list of link titles and URLs, so I can look through the links myself and choose anything I want to review.
With The Independent (https://www.independent.co.uk/), I seem to have run into an "element not interactable" problem when using the following approaches:
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome('chromedriver')
driver.get('https://www.independent.co.uk')
time.sleep(3)
#accept the cookies/privacy bit
OK = driver.find_element_by_id('qcCmpButtons')
OK.click()
#wait a few seconds, just in case
time.sleep(5)
search_toggle = driver.find_element_by_class_name('icon-search.dropdown-toggle')
search_toggle.click()
This throws the error selenium.common.exceptions.ElementNotInteractableException: Message: element not interactable
I've also tried with XPath:
search_toggle = driver.find_element_by_xpath('//*[@id="quick-search-toggle"]')
and I also tried finding it by ID.
I did a lot of reading on here and then also tried using WebDriverWait and execute_script methods:
element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, '//*[@id="quick-search-toggle"]')))
driver.execute_script("arguments[0].click();", element)
This didn't raise an error, but the search box never appeared, i.e. the intended click didn't happen.
Any help you could give would be fantastic.
Thanks,
Pete
Your locator is //*[@id="quick-search-toggle"], and there are two matching elements on the page: the first is invisible and the second is visible. find_element returns the first match, but the element you mean is the second one, so you need a different, unique locator. Try this:
search_toggle = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//div[@class="row secondary"]//a[@id="quick-search-toggle"]')))
search_toggle.click()
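Another way around duplicate matches is to fetch all of them with find_elements and keep the first one that is actually displayed. A minimal sketch; first_displayed is a hypothetical helper that works with anything exposing is_displayed(), such as Selenium's WebElement. Being displayed does not guarantee the element is clickable (an overlay can still intercept the click), so the explicit wait above remains the safer choice:

```python
from typing import Iterable, Optional, Protocol, TypeVar


class Displayable(Protocol):
    def is_displayed(self) -> bool: ...


E = TypeVar("E", bound=Displayable)


def first_displayed(elements: Iterable[E]) -> Optional[E]:
    """Return the first element whose is_displayed() is true, or None."""
    return next((el for el in elements if el.is_displayed()), None)


# Possible usage:
#   toggle = first_displayed(driver.find_elements(By.ID, "quick-search-toggle"))
#   if toggle is not None:
#       toggle.click()
```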
First you need to open the search box, then send the search keys:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
import os
chrome_options = Options()
chrome_options.add_argument("--start-maximized")
browser = webdriver.Chrome(executable_path=os.path.abspath(os.getcwd()) + "/chromedriver", options=chrome_options)
link = 'https://www.independent.co.uk'
browser.get(link)
# accept privacy
browser.find_element_by_xpath('//*[@id="qcCmpButtons"]/button').click()
# open search box
li = browser.find_element_by_xpath('//*[@id="masthead"]/div[3]/nav[2]/ul/li[1]')
li.find_element_by_tag_name('a').click()
# send keys to search box
search = browser.find_element_by_xpath('//*[@id="gsc-i-id1"]')
search.send_keys("python")
search.send_keys(Keys.RETURN)
Can you try the steps below?
search_toggle = driver.find_element_by_xpath('//*[@class="row secondary"]/nav[2]/ul/li[1]/a')
search_toggle.click()
I am trying to retrieve all reviewers' comments for a particular app (https://play.google.com/store/apps/details?id=com.getsomeheadspace.android&hl=en&showAllReviews=true) using Selenium and BeautifulSoup. I load the link using the following code:
driver = webdriver.Chrome(path)
driver.get('https://play.google.com/store/apps/details?id=com.tudasoft.android.BeMakeup&hl=en&showAllReviews=true')
The above code does not load all the reviewers' comments: it only loads the first 39 and does not load the rest. Is there any way to load all comments in one go?
You can use an infinite loop that keeps scrolling until the Show More element appears, because the page lazy-loads reviews. To slow the loop down I used time.sleep(1). This gives 200 reviews on that page; if you want more, you need to click Show More again.
However, some reviews are in an unsupported format, hence the try..except block. Hope this helps.
import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get('https://play.google.com/store/apps/details?id=com.tudasoft.android.BeMakeup&hl=en&showAllReviews=true')
while True:
    # scroll to the bottom so the page lazy-loads the next batch of reviews
    driver.execute_script("window.scrollTo(0,document.body.scrollHeight)")
    time.sleep(1)
    elements = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'div.UD7Dzf')))
    # stop once the "Show More" button appears (clicking it would load more reviews)
    if len(driver.find_elements_by_xpath("//span[text()='Show More']")) > 0:
        break
print(len(elements))
allreview = []
for review in elements:
    # handle each review separately so one bad entry doesn't stop the rest
    try:
        allreview.append(review.text)
    except Exception:
        allreview.append("format incorrect")
print(allreview)
It looks like you have to scroll down to get all the information on the page.
Try this:
driver.execute_script("window.scrollTo(0,document.body.scrollHeight)")
You may have to do that a couple of times to load all the data
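Rather than guessing how many times to scroll, a common pattern is to keep scrolling until document.body.scrollHeight stops growing. A minimal sketch, assuming the page grows as new reviews load; scroll_to_bottom is a hypothetical helper that takes callables so it can be tried without a browser, and on Google Play the Show More button would still need to be clicked separately:

```python
import time
from typing import Callable


def scroll_to_bottom(get_height: Callable[[], int], scroll: Callable[[], None],
                     pause: float = 1.0, max_rounds: int = 50) -> int:
    """Scroll until the page height stops growing; return the number of scrolls."""
    last_height = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll()
        time.sleep(pause)  # give lazy-loaded content time to arrive
        new_height = get_height()
        if new_height == last_height:
            return rounds
        last_height = new_height
    return max_rounds


# Possible usage with Selenium:
#   scroll_to_bottom(
#       get_height=lambda: driver.execute_script("return document.body.scrollHeight"),
#       scroll=lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight)"),
#   )
```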
I'm trying to automate filling out an Easily Apply job application on Indeed. Here is an example of a job application on Indeed that uses the Easily Apply approach. I've tried every way I can think of to navigate the nested iframes, but I cannot find an approach that works. I found that this question has been asked before; unfortunately, the solution given there does not work for me. Below is my code as it stands now:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get('indeed_url_goes_here')
driver.find_element_by_class_name('indeed-apply-button').click()
driver.switch_to.frame(driver.find_element_by_xpath('/html/body/iframe'))
driver.switch_to.frame(driver.find_element_by_xpath('//*[@id="indeedapply-modal-preload-iframe"]'))
driver.find_element_by_class_name('applicant.name')
Find the parent iframe first, switch to it, and then switch to the nested frame by index.
Complete working code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get("http://www.indeed.com/viewjob?jk=2e3d019aa34a2801&q=bartender&tk=1a9g51n08a3iof6h&from=web&advn=5333586156877432&sjdu=UvkB_mgi5f7NyMagFcTHP0E6zA3mclLGHWb8Kte-0FV3cY2ZuZvj3LUvh8wnnxrqeYWG3HpvTXBK3G4htWfwgfQeMa0N1Tds6VxYb4V3Vlg&pub=4a1b367933fd867b19b072952f68dceb")
driver.find_element_by_class_name('indeed-apply-button').click()
wait = WebDriverWait(driver, 10)
frame = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "iframe[name$=modal-iframe]")))
driver.switch_to.frame(frame)
driver.switch_to.frame(0)
print(driver.find_element_by_css_selector("h1.jobtitle").text)
Prints the job title from the popup: Bartender/Mixologist.
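The same two-step switch can be wrapped in a small helper that always starts from the top-level document, so it can be rerun safely after the driver has already been inside a frame. A minimal sketch; switch_to_frame_path is a hypothetical helper, and each path item is anything driver.switch_to.frame() accepts (an index, a frame name/id, or the iframe WebElement):

```python
from typing import Any, Sequence


def switch_to_frame_path(driver: Any, path: Sequence[Any]) -> None:
    """Switch from the top document into nested frames, one level per path item."""
    driver.switch_to.default_content()  # start from the top-level page
    for ref in path:
        driver.switch_to.frame(ref)  # descend one frame level


# Possible usage mirroring the answer above:
#   frame = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "iframe[name$=modal-iframe]")))
#   switch_to_frame_path(driver, [frame, 0])
```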
Well, first off, the element doesn't have a class name; it has a regular name and an ID, so use either driver.find_element_by_name or driver.find_element_by_id.