Webscrape data from webpage - Python/Selenium

My code runs and reaches the page I want to scrape. Once there, I'm having a hard time printing any elements, in this case just the names.
The script logs in to the page itself, so you can replace the example username with any email or throwaway account if you are skeptical.
Here is the code:
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
productlinks=[]
test1=[]
options = Options()
driver = webdriver.Chrome(ChromeDriverManager().install())
url = "https://www.linkedin.com/uas/login?session_redirect=https%3A%2F%2Fwww%2Elinkedin%2Ecom%2Fsearch%2Fresults%2Fpeople%2F%3FcurrentCompany%3D%255B%25221252860%2522%255D%26geoUrn%3D%255B%2522103644278%2522%255D%26keywords%3Dsales%26origin%3DFACETED_SEARCH%26page%3D2&fromSignIn=true&trk=cold_join_sign_in"
driver.get(url)
time.sleep(2)
username = driver.find_element_by_id('username')
username.send_keys('Example@gmail.com')
password = driver.find_element_by_id('password')
password.send_keys('ExamplePassword')
password.submit()
element1 = driver.find_elements_by_class_name("name actor-name")
title=[t.text for t in element1]
print(title)

find_elements_by_class_name() does not accept multiple class names. Use a CSS selector instead.
To avoid synchronization issues, use WebDriverWait() and wait for visibility_of_all_elements_located() with the following CSS selector.
element1 =WebDriverWait(driver,10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,".name.actor-name")))
title=[t.text for t in element1]
print(title)
You need the following imports:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
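The conversion from a space-separated class attribute to a compound CSS selector can be sketched in plain Python. The helper name classes_to_css is hypothetical and is not part of Selenium; it only builds the selector string that you would then pass to By.CSS_SELECTOR.

```python
def classes_to_css(class_attr: str, tag: str = "") -> str:
    """Turn an HTML class attribute value into a compound CSS selector.

    "name actor-name" becomes ".name.actor-name", which matches elements
    that carry both classes.
    """
    names = class_attr.split()  # split on any run of whitespace
    if not names:
        raise ValueError("class attribute is empty")
    return tag + "".join("." + name for name in names)


print(classes_to_css("name actor-name"))
print(classes_to_css("name actor-name", tag="span"))
```

The first call prints .name.actor-name, the selector used in the answer above. The optional tag argument is only an illustration of narrowing the match to one element type.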

Related

Selecting jquery dropdown list using XPATH

I am working through the exercises at https://demo.seleniumeasy.com/jquery-dropdown-search-demo.html, but I can't find any element on this page using XPath. For example, I want to find "Select Country" using driver.find_element and XPath:
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("https://demo.seleniumeasy.com/jquery-dropdown-search-demo.html")
jquery_drop_list = driver.find_element(by=By.XPATH, value="//span[@class='select2-selection select2-selection--single']")
#jquery_drop_list = driver.find_element(by=By.XPATH, value="//span[@class='select2 select2-container select2-container--default select2-container--above select2-container--focus']")
#jquery_drop_list = driver.find_element(by=By.XPATH, value="//span[@class='select2-hidden-accessible']")
print(jquery_drop_list)
But none of the above searches works.
Could you advise me on what a proper selector should look like for similar problems? Maybe XPATH selector is not a good choice here?
The page contains a select element.
You can use Selenium's Select object to work with it.
This code selects Denmark:
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.select import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("start-maximized")
webdriver_service = Service(r'C:\webdrivers\chromedriver.exe')  # raw string so the backslashes are not treated as escapes
driver = webdriver.Chrome(options=options, service=webdriver_service)
wait = WebDriverWait(driver, 20)
actions = ActionChains(driver)
url = "https://demo.seleniumeasy.com/jquery-dropdown-search-demo.html"
driver.get(url)
select_country = Select(wait.until(EC.element_to_be_clickable((By.ID, 'country'))))
select_country.select_by_value("Denmark")
If you still want to open that dropdown with a regular click, that is possible too. This XPath works:
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.select import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("start-maximized")
webdriver_service = Service(r'C:\webdrivers\chromedriver.exe')  # raw string so the backslashes are not treated as escapes
driver = webdriver.Chrome(options=options, service=webdriver_service)
wait = WebDriverWait(driver, 20)
actions = ActionChains(driver)
url = "https://demo.seleniumeasy.com/jquery-dropdown-search-demo.html"
driver.get(url)
wait.until(EC.element_to_be_clickable((By.XPATH, "//span[@aria-labelledby='select2-country-container']"))).click()
Generally, XPath is the most powerful way to select web elements with Selenium.
Some people are just not familiar with it :)
Occasionally some XPath expressions are not properly supported by some webdrivers, but with ChromeDriver you should see no problems.
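How an XPath attribute predicate such as [@aria-labelledby='...'] behaves can be tried without a browser. The sketch below uses Python's standard xml.etree.ElementTree, which supports only a subset of XPath, and the markup is a simplified stand-in for the dropdown, not the page's real DOM.

```python
import xml.etree.ElementTree as ET

# Simplified, assumed markup; the real page is more deeply nested.
SNIPPET = """
<div>
  <span class="select2-selection select2-selection--single"
        aria-labelledby="select2-country-container">Select Country</span>
</div>
"""

root = ET.fromstring(SNIPPET)

# The predicate uses @ to refer to an attribute of the element.
by_label = root.findall(".//span[@aria-labelledby='select2-country-container']")

# @class='...' compares the whole attribute string, so a single class
# name does not match an element that carries two classes.
by_partial_class = root.findall(".//span[@class='select2-selection']")
by_full_class = root.findall(
    ".//span[@class='select2-selection select2-selection--single']"
)

print(len(by_label), len(by_partial_class), len(by_full_class))
print(by_label[0].text)
```

The exact-match behaviour of @class is one reason a single stable attribute such as aria-labelledby tends to make a more robust locator than a full class string.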

My Selenium webdriver doesn’t work on cTrader

Why is my Selenium webdriver not working?
I would like to log in automatically on https://ct.spotware.com/. But Selenium can't find the HTML class for the login box.
For this, I wrote this little script:
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
driver = webdriver.Chrome("./chromedriver")
driver.get("https://ct.spotware.com/")
time.sleep(10)
Login = driver.find_element(By.CLASS_NAME,"_a _b _gc _gw _dq _dx _gd _cw _em _cy _gx _fu _gy _fv _fy _fw _fx _db _ge _gf _gz _gg _gh _gi _gj _gk _gl _gm _gn")
The error message is:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"._a _b _gc _gw _dq _dx _gd _cw _em _cy _gx _fu _gy _fv _fy _fw _fx _db _ge _gf _gz _gg _gh _gi _gj _gk _gl _gm _gn"}
Somehow the whole site doesn't work with Selenium. On other sites, like Wikipedia, my script works perfectly. Just not on cTrader.
Is there a solution?
There are several issues here:
The values _a _b _gc _gw _dq _dx _gd _cw _em _cy _gx _fu _gy _fv _fy _fw _fx _db _ge _gf _gz _gg _gh _gi _gj _gk _gl _gm _gn are multiple separate class names. By.CLASS_NAME accepts only one class name; to match on several you need a CSS selector or XPath.
Such a long sequence of class names looks fragile. Prefer a more stable and more readable locator.
Instead of a hardcoded sleep, use WebDriverWait explicit waits.
You need to close the cookies banner.
You also need to enter the user name and password.
The code below only clicks the login button:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("--start-maximized")
s = Service(r'C:\webdrivers\chromedriver.exe')  # raw string so the backslashes are not treated as escapes
driver = webdriver.Chrome(options=options, service=s)
wait = WebDriverWait(driver, 20)
driver.get("https://ct.spotware.com/")
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[type='submit']"))).click()
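The advice above to prefer an explicit wait over a hardcoded sleep can be illustrated with a small polling loop. This is a simplified sketch of the idea only, not Selenium's implementation; the names wait_until and ready_on_third_call are hypothetical.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value as soon as it appears, or raises TimeoutError
    once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Stand-in for "is the element there yet?": succeeds on the third check.
calls = {"n": 0}


def ready_on_third_call():
    calls["n"] += 1
    return "ready" if calls["n"] >= 3 else None


print(wait_until(ready_on_third_call, timeout=2, poll=0.01))
```

Unlike time.sleep(10), the loop returns as soon as the condition holds and fails with a clear error when it never does, which is the behaviour WebDriverWait(...).until(...) offers for page elements.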
By.CLASS_NAME does not handle the spaces between your class names. Joining the names with dots in a CSS selector may help:
Login = driver.find_element(By.CSS_SELECTOR, "._a._b._gc._gw._dq._dx._gd._cw._em._cy._gx._fu._gy._fv._fy._fw._fx._db._ge._gf._gz._gg._gh._gi._gj._gk._gl._gm._gn")
However, upon examining your site, I'd recommend using a CSS selector such as this:
'input[placeholder="Enter your email or cTrader ID"]'
This is one way to correctly select the elements and login:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time as t
import pandas as pd
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
# chrome_options.add_argument("--headless")
chrome_options.add_argument('disable-notifications')
chrome_options.add_argument("window-size=1920,1080")
webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
actions = ActionChains(browser)
wait = WebDriverWait(browser, 20)
url = 'https://ct.spotware.com/'
browser.get(url)
login_field = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'input[placeholder="Enter your email or cTrader ID"]')))
pass_field = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'input[placeholder="Enter your password"]')))
submit_button = wait.until(EC.element_to_be_clickable((By.XPATH, '//button[text() = "Log In"]')))
login_field.send_keys('username')
pass_field.send_keys('bad_pass')
submit_button.click()
print('clicked')
Selenium documentation can be found at https://www.selenium.dev/documentation/

python using selenium webdriver mouser

I'm trying to open the Mouser website and use the search bar to send some data. Here's an example of the code but I can't get the right CSS selector. Thank you.
import time
from openpyxl import load_workbook
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.Chrome(executable_path='C:/Users/amuri/AppData/Local/Microsoft/WindowsApps/PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0/site-packages/chromedriver.exe')
driver.implicitly_wait(1)
url ='https://www.mouser.com/'
driver.get(url)
print(driver.title)
wait = WebDriverWait(driver, timeout=1)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#as-input-066 .form-control")))
elem = driver.find_element_by_css_selector("#as-input-066 .form-control")
elem.click()
elem.send_keys("myString")
Try the following CSS selector:
.form-control.headerSearchBox.search-input.js-search-autosuggest.as-input
The XPath is even shorter:
//input[contains(@id,'as-input')]
Explanation: it matches an input element whose id contains as-input.
One more suggestion:
Change
wait = WebDriverWait(driver, timeout=1)
to
wait = WebDriverWait(driver, timeout=15)
A 1-second timeout is too short. It should be at least 10 seconds.

Search on YouTube and return all links in Python

On YouTube, I want to search for certain videos (e.g. videos on Python) and then return all the videos that the search finds. Right now, when I try this, Python returns the videos from the start page, not those from the page shown after the search.
Current code:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get("http://youtube.com")
driver.find_element_by_name("search_query").send_keys("Python")
driver.find_element_by_id("search-icon-legacy").click()
links = driver.find_elements_by_id("video-title")
for x in links:
    print(x.get_attribute("href"))
What goes wrong here?
It is better to use an explicit wait for this:
links = ui.WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.ID, "video-title")))
Hope it helps you!
As per the discussion with @Mark:
It seems that the elements of the first page of YouTube are still in the DOM...
The only fix I see is to go to the search URL:
driver.get("http://youtube.com/results?search_query=Python")
# driver.find_element_by_name("search_query").send_keys("Python")
# driver.find_element_by_id("search-icon-legacy").click()
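If you go straight to the search URL, the query string is best built with urllib.parse so that spaces and special characters are encoded. A minimal sketch follows; the helper name youtube_search_url is hypothetical, and the search_query parameter is taken from the URL above.

```python
from urllib.parse import urlencode


def youtube_search_url(query: str) -> str:
    """Build a results-page URL for the given search text."""
    return "https://www.youtube.com/results?" + urlencode({"search_query": query})


print(youtube_search_url("Python"))
print(youtube_search_url("python selenium"))
```

The result can be passed directly to driver.get(), which skips typing into the search box and clicking the search button.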
You should use WebDriverWait not sleep:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.chrome.options import Options
opt = Options()
opt.add_argument("--incognito")
driver = webdriver.Chrome(executable_path=r'C:\path\to\chromedriver.exe', chrome_options=opt)
driver.get("http://youtube.com")
driver.find_element_by_name("search_query").send_keys("Python")
driver.find_element_by_id("search-icon-legacy").click()
WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.ID, "video-title")))
links = driver.find_elements_by_id("video-title")
for x in links:
    print(x.get_attribute("href"))
The output:
https://www.youtube.com/watch?v=rfscVS0vtbw
https://www.youtube.com/watch?v=f79MRyMsjrQ
https://www.youtube.com/watch?v=kLZuut1fYzQ
https://www.youtube.com/watch?v=N4mEzFDjqtA
https://www.youtube.com/watch?v=Z1Yd7upQsXY
https://www.youtube.com/watch?v=hnDU1G9hWqU
https://www.youtube.com/watch?v=3cZsjOclmoM
https://www.youtube.com/watch?v=f3EbDbm8XqY
https://www.youtube.com/watch?v=2uCXIbkbDSE
https://www.youtube.com/watch?v=HXV3zeQKqGY
https://www.youtube.com/watch?v=JJmcL1N2KQs
https://www.youtube.com/watch?v=qiSCMNBIP2g
https://www.youtube.com/watch?v=7lmCu8wz8ro
https://www.youtube.com/watch?v=25ovCm9jKfA
https://www.youtube.com/watch?v=q6Mc_sAPZ2Y
https://www.youtube.com/watch?v=yE9v9rt6ziw
https://www.youtube.com/watch?v=Y8Tko2YC5hA
https://www.youtube.com/watch?v=G0rQ7AEl5LA
https://www.youtube.com/watch?v=CtbckFw0pJs
https://www.youtube.com/watch?v=sugvnHA7ElY
To return all videos from a search for the keyword Python, you need to:
Maximize the window so that all the resulting video links are rendered in the HTML DOM.
Use WebDriverWait to wait for the desired elements to be visible before extracting the href attributes.
You can use the following solution.
Code Block:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
driver=webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get("https://www.youtube.com/")
WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input#search"))).send_keys("Python")
driver.find_element_by_css_selector("button.style-scope.ytd-searchbox#search-icon-legacy").click()
print([my_href.get_attribute("href") for my_href in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.yt-simple-endpoint.style-scope.ytd-video-renderer#video-title")))])
Console Output:
['https://www.youtube.com/watch?v=rfscVS0vtbw', 'https://www.youtube.com/watch?v=7UeRnuGo-pg', 'https://www.youtube.com/watch?v=3cZsjOclmoM', 'https://www.youtube.com/watch?v=f79MRyMsjrQ', 'https://www.youtube.com/watch?v=CtbckFw0pJs', 'https://www.youtube.com/watch?v=Z1Yd7upQsXY', 'https://www.youtube.com/watch?v=kLZuut1fYzQ', 'https://www.youtube.com/watch?v=IZ0IM_T4aio', 'https://www.youtube.com/watch?v=qiSCMNBIP2g', 'https://www.youtube.com/watch?v=N0lxfilGfak', 'https://www.youtube.com/watch?v=N4mEzFDjqtA', 'https://www.youtube.com/watch?v=s3Ejdx6cIho', 'https://www.youtube.com/watch?v=Y8Tko2YC5hA', 'https://www.youtube.com/watch?v=c3FXQU3TyCU', 'https://www.youtube.com/watch?v=yE9v9rt6ziw', 'https://www.youtube.com/watch?v=yvHrNlAF0Y0', 'https://www.youtube.com/watch?v=ZDa-Z5JzLYM']

Python, Selenium, and Beautiful Soup for URL

I am trying to write a script using Selenium to access Pastebin, run a search, and print the result URLs as text. I need the visible URL results and nothing else.
<div class="gs-bidi-start-align gs-visibleUrl gs-visibleUrl-long" dir="ltr" style="word-break:break-all;">pastebin.com/VYQTSbzY</div>
Current script is:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
browser = webdriver.Firefox()
browser.get('http://www.pastebin.com')
search = browser.find_element_by_name('q')
search.send_keys("test")
search.send_keys(Keys.RETURN)
soup = BeautifulSoup(browser.page_source, "html.parser")
for link in soup.find_all('a'):
    print(link.get('href', None), link.get_text())
You don't actually need BeautifulSoup. Selenium itself is very powerful at locating elements:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
browser = webdriver.Firefox()
browser.get('http://www.pastebin.com')
search = browser.find_element_by_name('q')
search.send_keys("test")
search.send_keys(Keys.RETURN)
# wait for results to appear
wait = WebDriverWait(browser, 10)
results = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.gsc-resultsbox-visible")))
# grab results
for link in results.find_elements_by_css_selector("a.gs-title"):
    print(link.get_attribute("href"))
browser.close()
Prints:
http://pastebin.com/VYQTSbzY
http://pastebin.com/VYQTSbzY
http://pastebin.com/VAAQCjkj
...
http://pastebin.com/fVUejyRK
http://pastebin.com/fVUejyRK
Note the use of an Explicit Wait which helps to wait for the search results to appear.
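The printed results above contain each URL twice. If only distinct URLs are wanted, the collected hrefs can be de-duplicated while keeping their order. This is a sketch under that assumption; the helper name unique_in_order is hypothetical and the sample list reuses URLs from the output above.

```python
def unique_in_order(urls):
    """Drop empty values and repeated URLs, keeping first-seen order."""
    # dict keys are unique and preserve insertion order
    return list(dict.fromkeys(u for u in urls if u))


hrefs = [
    "http://pastebin.com/VYQTSbzY",
    "http://pastebin.com/VYQTSbzY",
    "http://pastebin.com/VAAQCjkj",
    None,  # get_attribute("href") returns None when the attribute is missing
    "http://pastebin.com/fVUejyRK",
    "http://pastebin.com/fVUejyRK",
]

for url in unique_in_order(hrefs):
    print(url)
```

In the script above, hrefs would be the list of link.get_attribute("href") values gathered in the loop.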
