Webscrape Flashscore with Python/Selenium

I have started learning to scrape websites with Python and Selenium. I chose Selenium because I need to navigate through the website and I also have to log in.
I wrote a script that opens a Firefox window and loads the website www.flashscore.com. With this script I am also able to log in and navigate to the different sports sections (main menu) they have.
The code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
# open website
driver = webdriver.Firefox()
driver.get("http://www.flashscore.com")
# login
driver.find_element_by_id('signIn').click()
username = driver.find_element_by_id("email")
password = driver.find_element_by_id("passwd")
username.send_keys("*****")
password.send_keys("*****")
driver.find_element_by_name("login").click()
# go to the tennis section
link = driver.find_element_by_link_text('Tennis')
link.click()
#go to the live games tab in the tennis section
# ?????????????????????????????'
Then it got more difficult. I also want to navigate to, for example, the "live games" and "finished" tabs within a sport section. This part doesn't work. I tried many things but I can't get into either of these tabs. When analyzing the website I see that they use some iframes. I also found some code to switch to an iframe. But the problem is, I can't find the name of the iframe that contains the tabs I want to click on. Maybe the iframes are not the problem and I am looking in the wrong direction. (Maybe the problem is caused by some JavaScript?)
Can anybody please help me with this?

No, the iframes are not the problem in this case. The "Live games" element is not inside an iframe. Locate it by link text and click:
live_games_link = driver.find_element_by_link_text("LIVE Games")
live_games_link.click()
You may need to wait for this link to be clickable before actually trying to click it:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
wait = WebDriverWait(driver, 10)
live_games_link = wait.until(EC.element_to_be_clickable((By.LINK_TEXT, "LIVE Games")))
live_games_link.click()
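To illustrate what such an explicit wait does, here is a minimal pure-Python sketch of the polling idea. It is a simplified stand-in, not Selenium's implementation: wait_until and link_ready are hypothetical names, and the fake condition simply becomes ready on the third check so the sketch runs without a browser.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if nothing truthy is returned within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Hypothetical stand-in for "is the link clickable yet?":
# it reports the link as ready only on the third check.
attempts = {"count": 0}


def link_ready():
    attempts["count"] += 1
    return "LIVE Games" if attempts["count"] >= 3 else None


found = wait_until(link_ready, timeout=2.0, poll=0.01)
print(found, "after", attempts["count"], "checks")
```

WebDriverWait follows the same pattern: it keeps calling the expected condition with the driver until the condition returns something truthy or the timeout passes.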

Related

Clicking on LI link using Selenium Webdriver (Python)

New to Selenium here. Thanks for the help in advance! (Solved)
I've had success at clicking on links using the code below, but having a hard time clicking on links under LI. I've referenced a couple of other stackoverflow pages, but have yet to find a solution.
In this case, I am trying to click on the page number "2", and then run my scraper (which I have working for page 1) for all subsequent pages. Note that clicking on page 2 will cause a change in the table (aka, a new set of stock tickers and information gets pulled up), but the website link itself will not change.
Website link: https://www.gurufocus.com/insider/summary
Here is what I am trying to click on: the page number 2 (highlighted in yellow in the screenshot).
[Screenshot: the element as shown in the inspector]
I can click on a different link (titled "Can Aggregated Insider Trading Activities Predict the Market?") on the same page by its link text, but when I use "2" as the link text instead, I get the error message "NoSuchElementException: Message: no such element: Unable to locate element: {"method":"link text","selector":"2"}"
In summary, I would like to "click" on page 2, and bring up more stock information, and then run my scraper through it (then use a for loop for the rest of the pages). I won't have any troubles creating the for loop to scrape multiple pages, but I can't seem to get Selenium to click on the next page for me.
Solved Code - thanks for all the help everyone!
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
PATH = "C:\\Users\\MYUSERNAME\\Webdrivers\\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://www.gurufocus.com/insider/summary")
# Wait until the page-2 pager item is present, then click it
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, "//ul[@class='el-pager']/li[text()='2']"))
    )
    element.click()
except Exception:
    driver.quit()

Your issue is that page "2" is not link text. Notice that "Can Aggregated Insider Trading Activities Predict the Market?" is link text because its tag is "a", whereas the page number is an "li" element. Use this to click on page 2:
driver.find_element_by_xpath("//ul[@class='el-pager']/li[text()='2']").click()
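The difference between link text and a plain list item can be seen without a browser. The sketch below uses Python's built-in xml.etree.ElementTree on a hypothetical, simplified fragment of the page (the real markup is almost certainly more complex, and the class names other than el-pager are assumptions). It shows that no "a" element has the text "2", while the attribute predicate on the pager finds the "li" item.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the relevant part of the page.
PAGE = """
<div>
  <a href="/article">Can Aggregated Insider Trading Activities Predict the Market?</a>
  <ul class="el-pager">
    <li class="number active">1</li>
    <li class="number">2</li>
    <li class="number">3</li>
  </ul>
</div>
"""

root = ET.fromstring(PAGE)

# A link-text lookup only considers <a> elements, and none of them reads "2".
anchors_named_2 = [a for a in root.iter("a") if (a.text or "").strip() == "2"]

# The pager item is an <li>, so select it through its parent's class attribute
# and then filter on the visible text.
pager_items = root.findall(".//ul[@class='el-pager']/li")
page_2 = [li for li in pager_items if (li.text or "").strip() == "2"]

print("anchors with text 2:", len(anchors_named_2))
print("pager items with text 2:", len(page_2))
```

ElementTree supports only a subset of XPath, which is why the text comparison is done in Python here; Selenium hands the full expression, including text()='2', to the browser.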

HTML page seems to add iframe element when opening chrome by selenium in python

I am new to selenium and web automation tasks, and I am trying to code an application for automating papers search on PubMed, using chromedriver.
My aim is to click the top right "Sign in" button in the PubMed main page, https://www.ncbi.nlm.nih.gov/pubmed. So the problem is:
When I open the PubMed main page manually, there are no iframe tags in the HTML source, so the "Sign in" element should be simply accessible by its XPath "//*[@id="sign_in"]".
When the same page is opened by Selenium instead, I cannot find that "Sign in" element by its XPath, and if I try to inspect it, the HTML source seems to have embedded it in an <iframe> tag, so that it cannot be found unless the driver.switch_to.frame method is called. But if I open the HTML source with Ctrl+U and search for an <iframe> element, there is still none.
[Screenshot: "Sign in" element inspection]
I already got round this problem by:
bot = PubMedBot()
bot.driver.get('https://www.ncbi.nlm.nih.gov/pubmed')
sleep(2)
frames = bot.driver.find_elements_by_tag_name('iframe')
bot.driver.switch_to.frame(frames[0])
btn = bot.driver.find_element_by_xpath('/html/body/a')
btn.click()
But all I would like to understand is why the inspection code is different from the html source code, whether this <iframe> element is really appearing from nowhere, and if so why.
Thank you in advance.

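You are facing this issue because of synchronization: the iframe may not be present yet when you look for it. The solution below waits for the iframe, switches to it, and then clicks the link once it is clickable: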
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium import webdriver
driver = webdriver.Chrome(executable_path=r"path of chromedriver.exe")
driver.maximize_window()
wait = WebDriverWait(driver, 10)
driver.get("https://www.ncbi.nlm.nih.gov/pubmed")
iframe = wait.until(EC.presence_of_element_located((By.TAG_NAME, "iframe")))
driver.switch_to.frame(iframe)
wait.until(EC.element_to_be_clickable((By.XPATH, "//a[contains(text(), 'Sign in to NCBI')]"))).click()

How can I "click" download button on selenium in python

I want to download user data from Google Analytics using a crawler, so I wrote some code using Selenium. However, I cannot click the "Export" button. It always shows the error "no such element". I tried find_element_by_xpath, by_name and by_id.
I have attached a screenshot of the GA page in the inspector.
I TRIED:
driver.find_element_by_xpath("//*[@class='download-link']").click()
driver.find_element_by_xpath('//*[@id="ID-activity-userActivityTable"]/div/div[2]/span[6]/button')
driver.find_element_by_xpath("//*[@class='_GAD.W_DECORATE_ELEMENT.C_USER_ACTIVITY_TABLE_CONTROL_ITEM_DOWNLOAD']")
Python Code:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.Chrome('/Users/parkjunhong/Downloads/chromedriver')
driver.implicitly_wait(3)
usrid = '1021'
url = 'https://analytics.google.com/analytics/web/#/report/app-visitors-user-activity/a113876882w169675624p197020837/_u.date00=20190703&_u.date01=20190906&_r.userId='+usrid+'&_r.userListReportStates=%3F_u.date00=20190703%2526_u.date01=20190906%2526explorer-table.plotKeys=%5B%5D%2526explorer-table.rowStart=0%2526explorer-table.rowCount=1000&_r.userListReportId=app-visitors-user-id'
driver.get(url)
driver.find_element_by_name('identifier').send_keys('ID')
idlogin = driver.find_element_by_xpath('//*[@id="identifierNext"]/span/span')
idlogin.click()
driver.find_element_by_name('password').send_keys('PASSWD')
element = driver.find_element_by_id('passwordNext')
driver.execute_script("arguments[0].click();", element)
#login
driver.find_element_by_xpath("//*[@class='download-link']").click()
#click the download button
ERROR:
Message: no such element: Unable to locate element
[Screenshot: inspection of the GA page]

The element you want to click is inside an iframe (iframe id="galaxyIframe" ...). Therefore, you need to tell the driver to switch from the main page to that iframe. If you add this line of code after your #login it should work:
driver.switch_to.frame("galaxyIframe")
(If the frame did not have a name or id, you would use: iframe = driver.find_element_by_xpath("xpath-to-frame") and then driver.switch_to.frame(iframe).)
To get back to your default frame, use:
driver.switch_to.default_content()
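The switch into the frame and back out can be paired so the second step is never forgotten. Below is a minimal sketch using a context manager; inside_frame is a hypothetical helper, not part of Selenium, and RecordingDriver is only a stand-in that logs the calls so the example runs without a browser. With a real driver you would pass the webdriver instance instead.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch into a frame for the duration of a with-block, then switch back."""
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        # Runs even if the code inside the with-block raises.
        driver.switch_to.default_content()


class _RecordingSwitchTo:
    """Stand-in for driver.switch_to that only records what was called."""

    def __init__(self, log):
        self.log = log

    def frame(self, reference):
        self.log.append(("frame", reference))

    def default_content(self):
        self.log.append(("default_content",))


class RecordingDriver:
    """Stand-in for a webdriver instance, used only for this demonstration."""

    def __init__(self):
        self.log = []
        self.switch_to = _RecordingSwitchTo(self.log)


demo_driver = RecordingDriver()
with inside_frame(demo_driver, "galaxyIframe"):
    # With a real driver, this is where you would locate and click the button.
    demo_driver.log.append(("work inside frame",))

print(demo_driver.log)
```

The try/finally means the driver returns to the main document even when a lookup inside the frame fails, which keeps later lookups from silently searching the wrong document.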
Crawling GA is generally a pain. Not just because you have these iFrames everywhere.
Apart from that, I would recommend looking into puppeteer, the new kid on the crawler block. Even though the prospect of switching to javascript from python may be daunting, it is worth it! Once you get into it, selenium will have felt super clunky.

You can try locating the button by its text.
If you want to click on "Export", use this XPath:
//button[contains(text(),'Export')]

How to force a link that is invoked via javascript to open in a new tab

I am trying to scrape a webpage that opens up hyperlinks via javascript as shown below. I am using Selenium with Python.
<a href="javascript:openlink('120000020846')">
<subtitle>Blah blah blah</subtitle>
</a>
Using XPATH, I was able to open up the hyperlink using the following Python code.
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
xpath = '//a/subtitle[contains(text(),"Blah blah blah")]'
link_to_open = driver.find_element(By.XPATH, xpath)
link_to_open.click()
However, the link opens in the same tab. This is not what I want, as I want the links to open in a new tab, so that I can retain the information of the current page and continue processing the rest of the links.
Would greatly appreciate if anyone can give me some pointers if this can be done?
Thank you so much! :)

You can force a link to open in a new tab with ActionChains as follows:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
# driver is the webdriver instance created in the question
action = ActionChains(driver)
link_to_open = driver.find_element(By.XPATH, "//a/subtitle[contains(.,'Blah blah blah')]")
# Hold Ctrl while clicking so the browser opens the link in a new tab
action.key_down(Keys.CONTROL).click(link_to_open).key_up(Keys.CONTROL).perform()
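Opening the tab does not move the driver to it; the driver keeps working on the original tab until it is told to switch. A minimal sketch of that follow-up step is below, assuming the Ctrl+click really did open a new tab. switch_to_new_tab is a hypothetical helper, and StubDriver only imitates the window_handles and switch_to.window members of a real driver so the example runs without a browser.

```python
def switch_to_new_tab(driver, handles_before):
    """Switch to the first window handle that was not in handles_before."""
    new_handles = [h for h in driver.window_handles if h not in handles_before]
    if not new_handles:
        raise RuntimeError("no new tab or window was opened")
    driver.switch_to.window(new_handles[0])
    return new_handles[0]


class _StubSwitchTo:
    """Stand-in for driver.switch_to that tracks the current window."""

    def __init__(self, owner):
        self.owner = owner

    def window(self, handle):
        self.owner.current_window_handle = handle


class StubDriver:
    """Stand-in for a webdriver instance, used only for this demonstration."""

    def __init__(self):
        self.window_handles = ["tab-1"]
        self.current_window_handle = "tab-1"
        self.switch_to = _StubSwitchTo(self)


stub = StubDriver()
before = list(stub.window_handles)      # remember the tabs before the click
stub.window_handles.append("tab-2")     # imitates the Ctrl+click opening a tab
opened = switch_to_new_tab(stub, before)
print(opened, stub.current_window_handle)
```

With a real driver, you would record driver.window_handles before performing the action chain, call the helper afterwards, scrape the new page, and then use driver.close() followed by driver.switch_to.window(before[0]) to return to the original tab.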

python selenium webscraping - cannot obtain data

I came across a website that I am hoping to scrape some data from. But the site seems to be unscrapable with my limited Python knowledge. When using driver.find_element_by_xpath, I usually run into timeout exceptions.
Using the code below, I hope to click on the first result and go to a new page. On the new page, I want to scrape the product title and pack sizes. But no matter how I try, I cannot even get Python to click the right thing for me, let alone scrape the data. Can someone help out?
My desired output is:
Tris(triphenylphosphine)rhodium(I) chloride, 98%
190420010
1 GR 87.60
5 GR 367.50
This is the code I have so far:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = "http://www.acros.com/"
cas = "14694-95-2" # need to select for the appropriate one
driver = webdriver.Firefox()
driver.get(url)
country = driver.find_element_by_name("ddlLand")
for option in country.find_elements_by_tag_name("option"):
    if option.text == "United States":
        option.click()
driver.find_element_by_css_selector("input[type = submit]").click()
choice = driver.find_element_by_name("_ctl1:DesktopThreePanes1:ThreePanes:_ctl4:ddlType")
for option in choice.find_elements_by_tag_name("option"):
    if option.text == "CAS registry number":
        option.click()
inputElement = driver.find_element_by_id("_ctl1_DesktopThreePanes1_ThreePanes__ctl4_tbSearchString")
inputElement.send_keys(cas)
driver.find_element_by_id("_ctl1_DesktopThreePanes1_ThreePanes__ctl4_btnGo").click()

Your code as presented works fine for me, in that it directs the instance of Firefox to http://www.acros.com/DesktopModules/Acros_Search_Results/Acros_Search_Results.aspx?search_type=CAS&SearchString=14694-95-2 which shows the search results.
If you locate the iframe element on that page:
<iframe id="searchAllFrame" allowtransparency="" background-color="transparent" frameborder="0" width="1577" height="3000" scrolling="auto" src="http://newsearch.chemexper.com/misc/hosted/acrosPlugin/center.shtml?query=14694-95-2&searchType=cas&currency=&country=NULL&language=EN&forGroupNames=AcrosOrganics,FisherSci,MaybridgeBB,BioReagents,FisherLCMS&server=www.acros.com"></iframe>
and use driver.switch_to.frame to switch to that frame, then I think the data you want should be scrapable from there, for example:
driver.switch_to.frame(driver.find_element_by_xpath("//iframe[@id='searchAllFrame']"))
You can then carry on using the driver as usual to find elements within that iframe. (I think switch_to_frame works similarly, but is deprecated.)
(I can't seem to find a decent link to the docs for switch_to; the one I found isn't all that helpful.)
