I'm trying to do a simple Python Selenium automation on a website that is blocked by a dialog; you have to scroll down through the whole paragraph in the dialog before you can pass into the website.
I tried the code below to scroll the paragraph, but it was unsuccessful.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome('chromedriver')
driver.maximize_window()
driver.implicitly_wait(30)
driver.get('https://www.fidelity.com.hk/en/our-funds/mpf')
wait = WebDriverWait(driver, 20)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'div[data-action="button68"]'))).click()
time.sleep(1)
ele = driver.find_element_by_css_selector('.content-scrolling-behavior')
driver.execute_script("return arguments[0].scrollIntoView(true);", ele)
(screenshot of the captured HTML omitted)
I would appreciate any feedback on how to consistently select an option from the dropdown noted in the code provided. Here is the website I'm looking at: https://www.fidelity.com.hk/en/our-funds/mpf
You can scroll using ActionChains like this:
Also, in that div there are 27 li tags, so I am indexing with XPath and then moving the driver focus to each li one by one.
Sample code:
driver.implicitly_wait(30)
driver.maximize_window()
driver.get("https://www.fidelity.com.hk/en/our-funds/mpf")
wait = WebDriverWait(driver, 20)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'div[data-action="button68"]'))).click()
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.container")))
list_size = len(wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//ul[@class='list']/li"))))
print(list_size)
j = 1
for i in range(list_size):
    ActionChains(driver).move_to_element(wait.until(EC.visibility_of_element_located((By.XPATH, f"(//ul[@class='list']/li)[{j}]")))).perform()
    j = j + 1
    time.sleep(1)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div[class$='btn-confirm']"))).click()
This should work
ele = driver.find_element_by_css_selector('div.content-scrolling-behavior')
driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", ele)
UPD:
Try this instead:
ele = driver.find_element_by_css_selector('div.container')
driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", ele)
Related
I am using the code below to try to scrape product data from 90 pages; however, the data from the first and last pages are missing from the list object when it completes. Due to the nature of the website I cannot use Scrapy or Beautiful Soup, so I am trying to navigate page by page with the Selenium web driver. I have tried adjusting number_of_pages to the actual number of pages + 1, which still skipped the first and last pages. I have also tried setting page_to_start_clicking to 0, which produces a timeout error. Unfortunately I cannot share more about the source because of the authentication. Thank you in advance for the help!
wait = WebDriverWait(driver, 20)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#ResultsPerPageBottom > nav > span.next'))).click() # next button
number_of_pages = 90 # PROBLEM 1st & last pages missed
page_to_start_clicking = 1 # error if 0
# range set from 0; skips 1st and last page
for i in range(0, 90):
    time.sleep(2)
    for ele in driver.find_elements(By.CSS_SELECTOR, 'div.srp-item-body'):
        driver.execute_script("arguments[0].scrollIntoView(true);", ele)
        print(ele.text)
    wait.until(EC.element_to_be_clickable((By.LINK_TEXT, f"{page_to_start_clicking}"))).click()
    page_to_start_clicking = page_to_start_clicking + 1
This is the code from the solution described in the comments: scrape each page first and then click Next, so the first and last pages are both captured.
# Scrape & pagination
wait = WebDriverWait(driver, 20)
number_of_pages = 91
listings = []
for i in range(0, 91):
    time.sleep(2)
    for ele in driver.find_elements(By.CSS_SELECTOR, 'div.srp-item-body'):
        driver.execute_script("arguments[0].scrollIntoView(true);", ele)
        listings.append(ele.text)
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#ResultsPerPageBottom > nav > span.next'))).click()
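A hedged variant of the same loop (same selectors as above): rather than hard-coding 91 iterations, scrape each page and stop when no clickable Next control remains on the last page:

from selenium.common.exceptions import TimeoutException

listings = []
while True:
    time.sleep(2)
    for ele in driver.find_elements(By.CSS_SELECTOR, 'div.srp-item-body'):
        driver.execute_script("arguments[0].scrollIntoView(true);", ele)
        listings.append(ele.text)
    try:
        wait.until(EC.element_to_be_clickable(
            (By.CSS_SELECTOR, '#ResultsPerPageBottom > nav > span.next'))).click()
    except TimeoutException:
        break  # no clickable Next button: last page reached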
I have a problem clicking every button on a LinkedIn page. On some profiles that contain a lot of information about job experience, schools, and licenses, we have to expand this information by clicking the 'Show more' button.
Sample profile 1
Sample profile 2
I tried many things, like searching for elements by XPath and then looping over them to click every button on the page, but it didn't work, because every button's class is the same as that of other elements we can find using Selenium. I figured out that the first 'Show more' button is always for the experience section, and this code does the job of clicking it:
self.driver.execute_script("arguments[0].click();", WebDriverWait(
    self.driver, 3).until(EC.element_to_be_clickable((By.XPATH, "//li-icon[@class='pv-profile"
                                                                "-section__toggle-detail-icon']"))))
Then we have the education section and the license and certification section, and these give me trouble. A temporary solution is to click the element that contains a given string ('Pokaż więcej' is Polish for 'Show more', 'Pokaż 1 uczelnię więcej' for 'Show 1 more school'):
self.driver.find_element_by_xpath("//*[contains(text(), 'Pokaż więcej')]").click()
OR
self.driver.find_element_by_xpath("//*[contains(text(), 'Pokaż 1 uczelnię więcej')]").click()
I know that sooner rather than later this code will run into its limitations. Does anyone have a better idea of how to solve this problem?
Solution
containers = self.driver.find_elements_by_xpath("//li-icon[@class='pv-profile-section__toggle-detail-icon']")
for button in containers:
    self.driver.execute_script('arguments[0].click()', button)
I tested the page with my own code, and it seems you can get all the buttons with:
items = driver.find_elements(By.XPATH, '//div[@class="profile-detail"]//button')
for item in items:
    driver.execute_script("arguments[0].click();", item)
But there can be another problem. The page uses lazy loading, so it may be necessary to run JavaScript that scrolls down to load all the components (the full code below does this).
Here is my full code with some ideas in the comments.
I also tried selecting the buttons section by section, but not all of those methods work.
Maybe it will be useful for other ideas.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException, TimeoutException
import time
USERNAME = 'XXXXX'
PASSWORD = 'YYYYY'
url = 'https://www.linkedin.com/in/jakub-bialoskorski/?miniProfileUrn=urn%3Ali%3Afs_miniProfile%3AACoAABp5UJ8BDpi5ZwNGebljqDlYx7OXIKgxH80'
driver = webdriver.Firefox()
# -------------------------------------
driver.get(url)
time.sleep(5)
#wait = WebDriverWait(driver, 10)
cookies = driver.find_element(By.XPATH, '//button[@action-type="ACCEPT"]')
cookies.click()
time.sleep(1)
link = driver.find_element(By.XPATH, '//p[@class="authwall-join-form__subtitle"]/button')
link.click()
time.sleep(1)
login_form = driver.find_element(By.XPATH, '//div[@class="authwall-sign-in-form"]')
time.sleep(1)
username = login_form.find_element(By.XPATH, '//input[@id="session_key"]')
username.send_keys(USERNAME)
password = login_form.find_element(By.XPATH, '//input[@id="session_password"]')
password.send_keys(PASSWORD)
time.sleep(1)
#button = login_form.find_element(By.XPATH, '//button[@type="submit"]')
button = login_form.find_element(By.XPATH, '//button[contains(text(), "Zaloguj się")]')
button.click()
time.sleep(5)
# -------------------------------------
url = 'https://www.linkedin.com/in/jakub-bialoskorski/?miniProfileUrn=urn%3Ali%3Afs_miniProfile%3AACoAABp5UJ8BDpi5ZwNGebljqDlYx7OXIKgxH80'
#from selenium.webdriver.common.action_chains import ActionChains
driver.get(url)
time.sleep(5)
# -----------
print('... scrolling for lazy loading ...')
last_height = 0
while True:
    driver.execute_script("window.scrollTo(0, window.scrollY + window.innerHeight);")
    time.sleep(2)
    new_height = driver.execute_script("return window.scrollY")
    if new_height == last_height:
        break
    last_height = new_height
# -----------
def click_items(items):
    for item in items:
        print('text:', item.text)
        #print(item.get_attribute('innerHTML'))
        #print('... scrolling ...')
        #ActionChains(driver).move_to_element(item).perform()
        print('... scrolling ...')
        driver.execute_script("arguments[0].scrollIntoView(true);", item)
        #print('... clicking ...')
        #item.click()
        #time.sleep(1)
        print('... clicking ...')
        driver.execute_script("arguments[0].click();", item)
        time.sleep(1)
        print('----')
print('\n>>> Pokaż <<<\n')
#items = driver.find_elements(By.XPATH, '//button[contains(text(), "Pokaż")]')
#click_items(items)
print('\n>>> Doświadczenie - Pokaż więcej <<<\n')
#section = driver.find_elements(By.XPATH, '//section[@id="experience-section"]')
#items = driver.find_elements(By.XPATH, '//button[contains(text(), "zobacz wi")]')
items = driver.find_elements(By.XPATH, '//button[contains(@class, "inline-show-more-text__button")]')
click_items(items)
print('\n>>> Umiejętności i potwierdzenia - Pokaż więcej <<<\n')
#section = driver.find_elements(By.XPATH, '//section[@id="experience-section"]')
items = driver.find_elements(By.XPATH, '//button[@data-control-name="skill_details"]')
click_items(items)
print('\n>>> Wyświetl <<<\n')
items = driver.find_elements(By.XPATH, '//button[contains(text(), "Wyświetl")]')
click_items(items)
print('\n>>> Rekomendacje <<<\n')
items = driver.find_elements(By.XPATH, '//button[#aria-controls="recommendation-list"]')
click_items(items)
print('\n>>> Osiągnięcia <<<\n')
print('--- projects ---')
items = driver.find_elements(By.XPATH, '//button[#aria-controls="projects-expandable-content"]')
click_items(items)
print('--- languages ---')
items = driver.find_elements(By.XPATH, '//button[#aria-controls="languages-expandable-content"]')
click_items(items)
# --- all buttons ---
#items = driver.find_elements(By.XPATH, '//div[@class="profile-detail"]//button')
#click_items(items)
I need to extract information from a website. The website has the information inside the following path:
<div class="accordion-block__question">
<div class="accordion-block__text">Server</div></div>
...
<div class="block__col"><b>Country</b></div>
Running
try:
    # Country
    c = driver.find_element_by_xpath("//div[contains(@class,'block__col') and contains(text(),'Country')]").get_attribute('textContent')
    country.append(c)
except:
    country.append("Error")
I get a df full of "Error" values. I'm interested in all the fields (though for fixing this issue just one would be great), including the Trustscore (a number), but I don't know if it's possible to get it. I'm using Selenium with the Chrome web driver.
The website is https://www.scamadviser.com/check-website.
CODE
This is the entire code:
def scam(df):
    chrome_options = webdriver.ChromeOptions()
    trust = []
    country = []
    isp_country = []
    query = df['URL'].unique().tolist()
    driver = webdriver.Chrome('mypath', chrome_options=chrome_options)
    for x in query:
        wait = WebDriverWait(driver, 10)
        response = driver.get('https://www.scamadviser.com/check-website/' + x)
        try:
            wait = WebDriverWait(driver, 30)
            # missing trustscore
            # Country
            c = driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", driver.find_element_by_xpath("//div[contains(@class,'block__col') and contains(text(),'Country')]")).get_attribute('innerText')
            country.append(c)
            # ISP country
            ic = driver.find_element_by_xpath("//div[contains(@class,'block__col') and contains(text(),'ISP')]").get_attribute('innerText')
            isp_country.append(ic)
        except:
            # missing trustscore
            country.append("Error")
            isp_country.append("Error")
    # Create dataframe
    dict = {'URL': query, 'Trustscore': trust, 'Country': country, 'ISP': isp_country}
    df = pd.DataFrame(dict)
    driver.quit()
    return df
You can try it, for example, with df['URL'] equal to:
stackoverflow.com
gitHub.com
You are looking for innerText, not textContent: innerText returns the rendered text as the user sees it, while textContent returns the raw text of all nodes, including hidden ones.
Code:
try:
    # Country
    c = driver.find_element_by_xpath("//div[contains(@class,'block__col') and contains(text(),'Country')]").get_attribute('innerText')
    print(c)
    country.append(c)
except:
    country.append("Error")
Updated 1:
In case the locator you already used is correct:
driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", driver.find_element_by_xpath("//div[contains(#class,'block__col') and contains(text(),'Country')]"))
Or maybe try both options with this XPath:
//div[contains(@class,'block__col')]/b[text()='Country']
Updated 2:
try:
    wait = WebDriverWait(driver, 30)
    # missing trustscore
    # Country
    time.sleep(2)
    ele = driver.find_element_by_xpath("//div[contains(@class,'block__col')]/b[text()='Country']")
    driver.execute_script("arguments[0].scrollIntoView(true);", ele)
    country.append(ele.get_attribute('innerText'))
    time.sleep(2)
    # ISP country
    ic = driver.find_element_by_xpath("//div[contains(@class,'block__col')]/b[text()='ISP']")
    driver.execute_script("arguments[0].scrollIntoView(true);", ic)  # scroll the ISP element, not the Country one
    isp_country.append(ic.get_attribute('innerText'))
except:
    # same handling as in the question
    country.append("Error")
    isp_country.append("Error")
Updated 3:
To get the Country name under the Company data section, use this XPath:
//div[text()='Company data']/../following-sibling::div/descendant::b[text()='Country']/../following-sibling::div
Also, make sure of a few things before using this XPath:
Launch the browser in full-screen mode.
Scroll using JS, and then use scrollIntoView or an ActionChains move.
Code :-
driver.maximize_window()
time.sleep(2)
driver.execute_script("window.scrollTo(0, 1000)")
time.sleep(2)
driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[text()='Company data']"))))
# now use the mentioned xpath.
company_data_country_name = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[text()='Company data']/../following-sibling::div/descendant::b[text()='Country']/../following-sibling::div")))
print(company_data_country_name.text)
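The question also asked about the Trustscore. The score's markup is not shown in the source, so this is only a hedged sketch: the XPath below is a guess and must be verified against the live page before use.

# HYPOTHETICAL locator: the 'trustscore' class name is an assumption;
# inspect the live page and adjust it before relying on this.
trust_ele = WebDriverWait(driver, 20).until(EC.visibility_of_element_located(
    (By.XPATH, "//div[contains(@class,'trustscore')]")))
trust.append(trust_ele.get_attribute('innerText'))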
I'm embarrassed because I can't scroll a modal in Instagram; instead it scrolls the page behind it :(
I already tried browser.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", modalDiv) and everything else I found, but it still doesn't work.
My code :
browser.get('https://www.instagram.com/%s' % account)
sleep(2)
clickToFollowing= browser.find_element_by_xpath("/html/body/div[1]/section/main/div/header/section/ul/li[3]/a/span")
clickToFollowing.click()
actionChain = webdriver.ActionChains(browser)
time.sleep(2)
modalDiv= browser.find_elements_by_xpath("//html/body/div[5]/div/div/div[2]/div/div/div[4]/div[2]/div[2]")
browser.execute_script("window.scrollBy(0, 660);")
You need to scroll multiple times in the follower list section to grab everything you need.
Code :
options = webdriver.ChromeOptions()
options.add_argument('start-maximized')
driver = webdriver.Chrome("C:\\Users\\etc\\Desktop\\Selenium+Python\\chromedriver.exe", options=options)
wait = WebDriverWait(driver, 30)
driver.get("https://www.instagram.com")
wait.until(EC.element_to_be_clickable((By.NAME,"username"))).send_keys('your username')
wait.until(EC.element_to_be_clickable((By.NAME,"password"))).send_keys('your password')
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR,"button[type='submit']"))).click()
wait.until(EC.element_to_be_clickable((By.XPATH,"//button[text()='Not Now']"))).click()
sleep(3)
wait.until(EC.element_to_be_clickable((By.XPATH,"//button[text()='Not Now']"))).click()
wait.until(EC.element_to_be_clickable((By.XPATH,"//*[name()='svg' and #aria-label='Activity Feed']/../../following-sibling::div/span"))).click()
wait.until(EC.element_to_be_clickable((By.XPATH, "//div[text()='Profile']"))).click()
sleep(3)
wait.until(EC.element_to_be_clickable((By.XPATH, "//button[#title='Change Profile Photo']/../../../following-sibling::section/descendant::li[3]/a"))).click()
sleep(3)
fBody = driver.find_element_by_xpath("//div[@class='isgrP']")
scroll = 0
while scroll < 5:  # scroll 5 times
    driver.execute_script('arguments[0].scrollTop = arguments[0].scrollTop + arguments[0].offsetHeight;', fBody)
    sleep(2)
    scroll += 1
fList = driver.find_elements_by_xpath("//div[@class='isgrP']//li")
print("fList len is {}".format(len(fList)))
print("ended")
I'm trying to scrape a company's job offers from LinkedIn. I need to scroll a section of the page (one with an inner scrollbar). I have been trying this:
1.
scroll_active = WebDriverWait(driver, 40).until(EC.presence_of_element_located((By.CSS_SELECTOR, "body > div.application-outlet > div.authentication-outlet > div.job-search-ext > div > div > section.jobs-search__left-rail > div > div > ul")))
scroll_active.location_once_scrolled_into_view
while driver.find_element_by_tag_name('div'):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    Divs = driver.find_element_by_tag_name('div').text
    if 'End of Results' in Divs:
        print('end')
        break
    else:
        continue
I need to extract the 'href' attributes.
If anyone is facing this, I hope this helps: you just have to choose carefully the element that you want to scroll.
my_xpath = WebDriverWait(driver, 40).until(EC.presence_of_element_located((By.XPATH, "/html/body/div[8]/div[3]/div[3]/div/div/section[1]/div/div")))
driver.execute_script('arguments[0].scrollTop = arguments[0].scrollHeight', my_xpath)
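To load the whole list rather than scrolling once, a hedged sketch (same locator as above) repeats the scroll until the container's scrollHeight stops changing, i.e. no more results are being appended:

import time

my_list = WebDriverWait(driver, 40).until(EC.presence_of_element_located(
    (By.XPATH, "/html/body/div[8]/div[3]/div[3]/div/div/section[1]/div/div")))
last_height = 0
while True:
    driver.execute_script('arguments[0].scrollTop = arguments[0].scrollHeight', my_list)
    time.sleep(2)  # wait for the next batch of results to render
    new_height = driver.execute_script('return arguments[0].scrollHeight', my_list)
    if new_height == last_height:  # nothing new was appended
        break
    last_height = new_height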
Why do you need to scroll here?
It seems like you can get all of the elements with this command:
elements = driver.find_elements(By.XPATH, "//a[@class='result-card__full-card-link']")
and the full code looks like:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get('https://www.linkedin.com/jobs/search/?f_C=1110%2C12800%2C5115950%2C3165553%2C603115%2C10916%2C8331%2C3297950%2C8238%2C5509188%2C3093%2C2625246%2C1112%2C947572%2C11018069%2C407323&geoId=92000000')
time.sleep(3)
def element_present():
    try:
        driver.find_element(By.XPATH, "//button[@class='infinite-scroller__show-more-button infinite-scroller__show-more-button--visible']")
    except Exception:
        return False
    return True
while not element_present():
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
elements = driver.find_elements(By.XPATH, "//a[@class='result-card__full-card-link']")
hrefs = [el.get_attribute('href') for el in elements]
print(hrefs)
print(len(hrefs))
driver.quit()
I might have missed something, but it seems to work well.