Elements from Selenium do not show a specific #document - Python

I'm not very familiar with interacting with page source. I was trying to get the table on a stock-screening website, but I just can't seem to get the table inside the #document (visible in Inspect Element) to show up in any of my attempts.
(screenshot: the table I'm trying to reach)
(screenshot: the Inspect Element view)
(screenshot: it seems to be just the table)
Then I tried with Selenium:
from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By

username = 'yemingsang'
password = MY_PASSWORD  # placeholder

url = 'https://www.esignal.com/members/login'

options = Options()
options.add_experimental_option("detach", True)  # keep the browser open after the script exits
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

driver.get(url)
sleep(1)

# log in
driver.find_element(By.NAME, 'user').send_keys(username)
driver.find_element(By.NAME, 'password').send_keys(password)
driver.find_element(By.CSS_SELECTOR, 'input[type="SUBMIT" i]').click()
sleep(1)

page_source = driver.page_source
print(page_source)
The page source printed by Selenium does not contain the #document element either.
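In DevTools, a nested #document almost always marks the content of an <iframe>, and driver.page_source only returns the HTML of the frame Selenium currently targets, which is why the table never shows up. A minimal sketch of switching into the frame, assuming the table does live in an iframe (the CSS selector is a placeholder to replace with the real one from Inspect Element):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait for the iframe that owns the #document, then switch into it
# (the selector is a placeholder - copy the real one from DevTools)
wait = WebDriverWait(driver, 20)
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe")))

# page_source now returns the frame's document, where the table should be
print(driver.page_source)

# switch back to the top-level document when done
driver.switch_to.default_content()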

Related

How do I access an internal link with Selenium that seems restricted

I am trying to fetch data from nj.58.com using Selenium. I can access the homepage and some internal links, but while navigating I noticed that the website flags me as a web crawler when I visit one specific URL, even though I interact with the links like a human.
I have built my Selenium script up to a point, but I'm stuck because the site throws an antibot response back at me.
Here is what I've done:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support.ui import Select
import undetected_chromedriver as uc
import time
import pandas as pd

driver = uc.Chrome()
website = 'https://nj.58.com/'
driver.get(website)
driver.implicitly_wait(4)
wait = WebDriverWait(driver, 10)
driver.maximize_window()

# switch the site to Nanjing ('南京')
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#commonTopbar_ipconfig > a"))).click()
city_location = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#selector-search-input')))
city_location.clear()
city_location.send_keys('南京' + Keys.RETURN)

# search for '废纸回收' (waste-paper recycling)
keyword = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#keyword')))
keyword.clear()
keyword.send_keys('"废纸回收"')
time.sleep(2)
search_btn = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#searchbtn')))
search_btn.click()
When I click search_btn, I expect to see a list of the items I'm interested in. Instead, the site flags me as a web crawler at that point (search_btn); it did so even before I used Selenium.
How can I bypass this antibot detection at the point where I click search_btn?
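There is no guaranteed bypass, and 58.com's detection rules aren't public, but the usual mitigations are to avoid headless mode, reuse a single undetected_chromedriver session, and pace every action with randomized delays. A minimal sketch under those assumptions (human_pause is our own helper, not a library function):
import random
import time
import undetected_chromedriver as uc

def human_pause(low=1.5, high=4.0):
    """Sleep a random interval so actions are not perfectly machine-timed."""
    time.sleep(random.uniform(low, high))

options = uc.ChromeOptions()
options.add_argument("--start-maximized")
driver = uc.Chrome(options=options)  # keep a visible window; headless is easier to fingerprint

driver.get('https://nj.58.com/')
human_pause()
# ...repeat the same clicks and typing as above, calling human_pause() between steps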

Why is this simple Selenium code not working?

It only opens YouTube but doesn't type anything or search.
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://youtube.com')
searchArea = driver.find_element_by_xpath('//*[@id="search"]')
searchArea.send_keys('Sujeet Gund')
searchButton = driver.find_element_by_xpath('//*[@id="search-icon-legacy"]')
searchButton.click()
Try //input[@id="search"] instead.
------------------ updated below ------------------
Well, I guess you copied the wrong XPath. You may have copied the wrapper (or container) rather than the exact input field.
Using the specific input tag makes it easier for Selenium to find the element, but be careful not to use this approach when more than one element matches the same XPath.
Also, I recommend writing it this way,
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

SERVICE = Service(ChromeDriverManager().install())
OPTIONS = webdriver.ChromeOptions()
driver = webdriver.Chrome(service=SERVICE, options=OPTIONS)

driver.get('https://youtube.com')
searchArea = driver.find_element(by=By.XPATH, value='//input[@id="search"]')
searchArea.send_keys('Sujeet Gund')
because some functions in Selenium are now deprecated.
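For completeness, clicking the search icon works the same way with the By API; a small sketch (an explicit wait is a safe bet here, since YouTube renders its header with JavaScript):
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait until the search icon is clickable, then click it
wait = WebDriverWait(driver, 10)
wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="search-icon-legacy"]'))).click()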

Do I have to use pyautogui to fill in a text box with the requested text? Or can I just use Selenium? Every time I try, though, my XPath isn't correct. Code below.

Code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

s = Service("/usr/local/bin/chromedriver")
driver = webdriver.Chrome(service=s)
driver.implicitly_wait(0.5)
driver.maximize_window()
driver.get("https://greenhillsschool.myschoolapp.com/app#login")
driver.find_element(By.XPATH, '//*[@id="Username"]').send_keys("rpatel@greenhillsschool.org")
'''
usernamebox.send_keys("email")
next = driver.find_element_by_xpath("Next")
next.click()
'''
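You don't need pyautogui for an ordinary DOM text box; Selenium can type into it once the element is located reliably. A minimal sketch, assuming the field really has id="Username" as in the snippet above (the Next-button locator is a guess to verify against the page):
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(service=Service("/usr/local/bin/chromedriver"))
driver.maximize_window()
driver.get("https://greenhillsschool.myschoolapp.com/app#login")

# explicit wait: the login form is rendered by JavaScript, so poll until the field is ready
wait = WebDriverWait(driver, 10)
wait.until(EC.element_to_be_clickable((By.ID, "Username"))).send_keys("rpatel@greenhillsschool.org")

# the locator below is a guess - inspect the page for the real Next button
wait.until(EC.element_to_be_clickable((By.XPATH, '//button[normalize-space()="Next"]'))).click()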

Scraping a webpage with tabs that do not change the URL

I am trying to scrape a Nasdaq webpage and have some trouble locating elements.
My code:
from selenium import webdriver
import time
import pandas as pd

driver = webdriver.Chrome()  # driver setup was missing from the original snippet
driver.get('http://www.nasdaqomxnordic.com/shares/microsite?Instrument=CSE32679&symbol=ALK%20B&name=ALK-Abell%C3%B3%20B')
time.sleep(5)
btn_overview = driver.find_element_by_xpath('//*[@id="tabarea"]/section/nav/ul/li[2]/a')
btn_overview.click()
time.sleep(5)
employees = driver.find_element_by_xpath('//*[@id="CompanyProfile"]/div[6]')
After the last call, I receive the following error:
NoSuchElementException: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="CompanyProfile"]/div[6]"}
Normally the problem would be a wrong XPath, but I tried several variants, including locating by id. I suspect it has something to do with the tabs (in my case, navigating to "Overview"). Visually the webpage changes, but if, for example, I scrape the table, it still comes from the first page:
table_test = pd.read_html(driver.page_source)[0]
What am I missing or doing wrong?
The overview page is inside an iframe:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

option = webdriver.ChromeOptions()
option.add_argument("start-maximized")
option.add_experimental_option("detach", True)  # keep Chrome open after the script ends
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=option)

driver.get('http://www.nasdaqomxnordic.com/shares/microsite?Instrument=CSE32679&symbol=ALK%20B&name=ALK-Abell%C3%B3%20B')

# open the Overview tab and accept the cookie banner
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="tabarea"]/section/nav/ul/li[2]/a'))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="cookieConsentOK"]'))).click()

# the Overview content is served inside this iframe, so switch into it first
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe#MorningstarIFrame")))
employees = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, '//*[@id="CompanyProfile"]/div[6]'))).text.split()[1]
print(employees)
Output:
2,537
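One follow-up worth noting: after frame_to_be_available_and_switch_to_it, the driver keeps targeting the iframe. If you need elements on the main page again afterwards (the other tabs, for example), switch back out first:
driver.switch_to.default_content()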
Are you sure you need Selenium?
import requests
from bs4 import BeautifulSoup

# the URL the Morningstar iframe loads (visible in DevTools)
url = 'http://lt.morningstar.com/gj8uge2g9k/stockreport/default.aspx'
payload = {
    'SecurityToken': '0P0000A5LL]3]1]E0EXG$XCSE_3060'}

response = requests.get(url, params=payload)
soup = BeautifulSoup(response.text, 'html.parser')
employees = soup.find('h3', text='Employees').next_sibling.text
print(employees)
Output:
2,537

AttributeError: 'NoneType' object has no attribute 'find' when web scraping in Python

I am working on an office project where I need to check the active status on different websites, but when I try to get the data it sometimes returns None and sometimes raises this AttributeError. I followed the steps from YouTube videos but still get the error. Help, please.
Python code:
from bs4 import BeautifulSoup
import requests
html_text = requests.get(
    "https://www.mintscan.io/cosmos/validators/cosmosvaloper1we6knm8qartmmh2r0qfpsz6pq0s7emv3e0meuw").text
soup = BeautifulSoup(html_text, 'lxml')
status = soup.find('div', {'class': "ValidatorInfo_statusBadge__PBIGr"})
para = status.find('p').text
print(para)
The URL is dynamic, meaning the data is populated by JavaScript, so you need an automation tool such as Selenium.
from bs4 import BeautifulSoup
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

url = 'https://www.mintscan.io/cosmos/validators/cosmosvaloper1we6knm8qartmmh2r0qfpsz6pq0s7emv3e0meuw'
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()
driver.get(url)
time.sleep(10)  # crude wait for the JavaScript-rendered content

soup = BeautifulSoup(driver.page_source, 'lxml')
#driver.close()
status = soup.find('div', {'class': "ValidatorInfo_statusBadge__PBIGr"})
para = status.find('p').text
print(para)
Output:
Active
You have the most common problem: modern pages use JavaScript to add elements, but requests/BeautifulSoup can't run JavaScript.
So soup.find('div', ...) returns None instead of the expected element, and later that causes the error with None.find('p').
You can use Selenium to control a real web browser, which can run JavaScript.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.firefox.service import Service
from webdriver_manager.firefox import GeckoDriverManager
#from webdriver_manager.chrome import ChromeDriverManager  # Chrome works just as well

url = "https://www.mintscan.io/cosmos/validators/cosmosvaloper1we6knm8qartmmh2r0qfpsz6pq0s7emv3e0meuw"
driver = webdriver.Firefox(service=Service(GeckoDriverManager().install()))
driver.get(url)

# wait until JavaScript has rendered the status badge
wait = WebDriverWait(driver, 10)
status = wait.until(EC.visibility_of_element_located((By.XPATH, '//div[@class="ValidatorInfo_statusBadge__PBIGr"]')))
print(status.text)
Eventually you should check whether the page offers an API for the data.
You can also use DevTools (Network tab) to see whether the JavaScript reads its data from some URL, and then try that URL with requests. That can be much faster than Selenium, but the server may detect the script/bot and block it.
JavaScript usually receives the data as JSON, so you may not even need to scrape HTML with BeautifulSoup.
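A minimal sketch of that approach; the endpoint below is hypothetical, so substitute whatever request URL DevTools shows under the Network tab:
import requests

# hypothetical endpoint - replace with the XHR URL copied from DevTools (Network tab)
api_url = "https://api.example.com/v1/validators/<validator-address>"
headers = {"User-Agent": "Mozilla/5.0"}  # some servers reject the default python-requests agent

response = requests.get(api_url, headers=headers)
response.raise_for_status()
data = response.json()  # the JSON arrives ready to use - no HTML parsing needed
print(data)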
