Python Selenium - Get Google search HREF

I have two examples of href values from my Google search for site:linkedin.com/in/ AND "Software Developer" AND "London":
<h3 class="LC20lb DKV0Md"><span>Roxana Andreea Popescu - Software Developer - Gumtree ...</span></h3><div class="TbwUpd NJjxre"><cite class="iUh30 Zu0yb qLRx3b tjvcx">uk.linkedin.com<span class="dyjrff qzEoUe"><span> › roxana-andreea-popescu</span></span></cite></div>
<h3 class="LC20lb DKV0Md"><span>Tunji Jabitta - London, Greater London, United Kingdom ...</span></h3><div class="TbwUpd NJjxre"><cite class="iUh30 Zu0yb qLRx3b tjvcx">uk.linkedin.com<span class="dyjrff qzEoUe"><span> › tunjijabitta</span></span></cite></div>
I am creating a LinkedIn scraper and am having trouble getting the href value (which differs for each result) so that I can loop through the results.
I tried
linkedin_urls = driver.find_elements_by_xpath('//div[@class="yuRUbf"]//a')
links = [linkedin_url.get_attribute('href') for linkedin_url in linkedin_urls]
for linkedin_url in linkedin_urls:
    driver.get(links)
    sleep(5)
    sel = Selector(text=driver.page_source)
But I get the error "invalid argument: 'url' must be a string".
Another alternative I tried was
linkedin_urls = driver.find_elements_by_xpath('//div[@class="yuRUbf"]//a[@href]')
for linkedin_url in linkedin_urls:
    url = linkedin_url.get_attribute("href")
    driver.get(url)
    sleep(5)
    sel = Selector(text=driver.page_source)
I managed to get the first link opened, but it threw an error at url = linkedin_url.get_attribute("href") when trying to get the next link.
Any help would be greatly appreciated; I have been stuck on this for quite a while.

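Your driver opens the link, but in doing so it navigates away from the search results page and discards it. The elements you found on that page are then no longer valid, which is probably why get_attribute fails on the second link (Selenium typically reports this as a stale element reference). Consider opening each link in a new tab or window, switching to that tab/window, and, once you are done, returning to the original page to continue.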
Suggested execution:
1. Create a function that opens a link (or element) in a new tab and switches to that tab:
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
# Define a function which opens your element in a new tab:
def open_in_new_tab(driver, element):
    """This is better than opening in a new link since it mimics 'human' behavior"""
    # What is the handle you're starting with
    base_handle = driver.current_window_handle
    # Keys.COMMAND is the macOS modifier; use Keys.CONTROL on Windows/Linux
    ActionChains(driver) \
        .move_to_element(element) \
        .key_down(Keys.COMMAND) \
        .click() \
        .key_up(Keys.COMMAND) \
        .perform()
    # There should be 2 tabs right now...
    if len(driver.window_handles) != 2:
        raise ValueError(f'Length of {driver.window_handles} != 2... {len(driver.window_handles)=};')
    # get the new handle
    for x in driver.window_handles:
        if x != base_handle:
            new_handle = x
    # Now switch to the new window
    driver.switch_to.window(new_handle)
2. Run the loop and switch back to the main tab after each link:
import time
# This returns a list of elements
linkedin_urls = driver.find_elements_by_xpath('//div[@class="yuRUbf"]//a[@href]')
# A bit redundant, but it's web scraping, so redundancy won't hurt you.
BASE_HANDLE = driver.current_window_handle  # All caps so you can follow it more easily...
for element in linkedin_urls:
    # switch to the new tab:
    open_in_new_tab(driver, element)
    # give the page a moment to load:
    time.sleep(0.5)
    # Do something on this page
    print(driver.current_url)
    # Once you're done, get back to the original tab
    # Go through all tabs (there should only be 2) and close each one unless
    # it's the BASE_HANDLE
    for x in driver.window_handles:
        if x != BASE_HANDLE:
            driver.switch_to.window(x)
            driver.close()
    # Now switch back to the original window
    assert BASE_HANDLE in driver.window_handles  # a quick sanity check
    driver.switch_to.window(BASE_HANDLE)  # this takes you back
# Finally, once your for-loop is complete, you can choose to continue with the driver or close + quit (like a human would)
driver.close()
driver.quit()
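A simpler alternative that avoids tabs altogether, sketched under assumptions: read every href into a plain list of strings while the results page is still loaded, then navigate to each string in turn. This sidesteps both errors from the question, because driver.get receives one string rather than a list, and no element is touched after the driver has left the page it was found on. The names collect_hrefs, visit_each and on_page are hypothetical helpers, not Selenium APIs.

```python
import time


def collect_hrefs(elements):
    """Return the unique, non-empty href strings of the given elements, in order."""
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        if href and href not in hrefs:
            hrefs.append(href)
    return hrefs


def visit_each(driver, urls, on_page, pause=0.0):
    """Navigate to each URL in turn and collect whatever on_page(driver) returns."""
    results = []
    for url in urls:
        driver.get(url)    # one string per call, never the whole list
        time.sleep(pause)  # crude wait; an explicit wait is usually more robust
        results.append(on_page(driver))
    return results
```

With a real driver this might be called as urls = collect_hrefs(linkedin_urls) right after the find_elements call, followed by visit_each(driver, urls, lambda d: d.page_source, pause=5).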

Related

Python/Selenium - How to webscrape this dropdown

My code runs fine and prints the title for every row except the rows with dropdowns.
For example, row 4 expands into a dropdown when clicked. I added a try block that should, in theory, open the dropdown and then pull the titles inside it.
But the click/scrape for the rows with dropdowns prints nothing.
Expected output: print all titles, including the ones inside dropdowns.
from selenium import webdriver
from bs4 import BeautifulSoup
import time
driver = webdriver.Chrome()
driver.get('https://cslide.ctimeetingtech.com/esmo2021/attendee/confcal/session/list')
time.sleep(4)
page_source = driver.page_source
soup = BeautifulSoup(page_source,'html.parser')
productlist=soup.find_all('div',class_='card item-container session')
for property in productlist:
    sessiontitle=property.find('h4',class_='session-title card-title').text
    print(sessiontitle)
    try:
        ifDropdown=driver.find_elements_by_class_name('item-expand-action expand')
        ifDropdown.click()
        time.sleep(4)
        newTitle=driver.find_element_by_class_name('card-title').text
        print(newTitle)
    except:
        newTitle='none'

There were a couple of issues. First, when you locate an element by class and it has more than one class, you need to join the class names with dots, not spaces, so that the driver knows it is dealing with several classes on the same element.
Second, find_elements returns a list, and a list has no .click(), so you get an error. Your bare except catches that error and treats it as if there were no link to click.
I rewrote it (without BeautifulSoup for now) so that it checks, with dots replacing the spaces, for a link to open within each session and then loops over the new titles that appear.
Here is what I have and tested. Note that this only gets the sessions and subsessions currently in view. You will need to add logic to scroll and get the rest.
# stuff to initialize driver is above here, I used firefox
# Open the website page
URL = "https://cslide.ctimeetingtech.com/esmo2021/attendee/confcal/session/list"
driver.get(URL)
time.sleep(4)#time for page to populate
product_list=driver.find_elements_by_css_selector('div.card.item-container.session')
#above line gets all top level sessions
for product in product_list:
    session_title=product.find_element_by_css_selector('h4.card-title').text
    print(session_title)
    dropdowns=product.find_elements_by_class_name('item-expand-action.expand')
    #above line finds dropdown within this session, if any
    if len(dropdowns)==0:#nothing more for this session
        continue#move to next session
    #still here, click on the dropdown, using execute because link can overlap chevron
    driver.execute_script("arguments[0].scrollIntoView(true); arguments[0].click();",
                          dropdowns[0])
    time.sleep(4)#wait for subsessions to appear
    session_titles=product.find_elements_by_css_selector('h4.card-title')
    session_index = 0#suppress reprinting title of master session
    for session_title in session_titles:
        if session_index > 0:
            print(" " + session_title.text)#indent for clarity
        session_index = session_index + 1
#still to do, deal with other sessions that only get paged into view when you scroll
#that is a different question
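The point about joining class names with dots can be sketched as a small helper that turns the value of a class attribute, as copied from the page source, into a compound CSS selector. classes_to_css is a hypothetical name used only for illustration; it is plain string handling and does not call Selenium.

```python
def classes_to_css(class_attr, tag=""):
    """Turn a space-separated class attribute into a compound CSS selector."""
    names = class_attr.split()
    if not names:
        raise ValueError("expected at least one class name")
    # Each class gets its own leading dot, with no spaces in between,
    # so the selector matches one element carrying all of the classes.
    return tag + "".join("." + name for name in names)


print(classes_to_css("item-expand-action expand"))
# prints: .item-expand-action.expand
print(classes_to_css("card item-container session", tag="div"))
# prints: div.card.item-container.session
```

The resulting string is meant to be passed to a CSS-selector lookup such as the find_elements_by_css_selector calls in the answer above.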

How to get the URL "about:blank" from empty tab using selenium?

I need to check whether a newly opened tab is empty and, if it is, switch to another one.
I tried the get_current_url() method, but it does not work.
def check_is_tab_empty(self, link):
    self.click(link)
    self.focus_active_tab()
    tab = self.get_current_url()
The line tab = self.get_current_url() does not work if the tab is empty, i.e. showing about:blank.

You have to switch to the new tab before you can read its URL. For example, the code below opens a blank page in a new tab, switches to it, and prints its URL:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://www.yahoo.com')
windows_before = driver.current_window_handle
driver.execute_script('''window.open('{}');'''.format("about:blank"))
windows_after = driver.window_handles
new_window = [x for x in windows_after if x != windows_before][0]
driver.switch_to.window(new_window)
print(driver.current_url)
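The two decisions in that snippet, which handle is the new one and whether the tab it points to is empty, can be pulled into small pure functions. This is only a sketch: find_new_handle and is_blank_tab are hypothetical names, and treating an empty string or None as blank is an assumption rather than documented Selenium behaviour.

```python
def find_new_handle(handles_before, handles_after):
    """Return the single handle that appears only in handles_after."""
    new_handles = [h for h in handles_after if h not in handles_before]
    if len(new_handles) != 1:
        raise RuntimeError(f"expected exactly one new tab, found {len(new_handles)}")
    return new_handles[0]


def is_blank_tab(url):
    """True when the URL looks like that of an empty tab."""
    # Assumption: None and "" are treated the same as about:blank.
    return url is None or url.strip() in ("", "about:blank")
```

Here handles_before would be a copy of driver.window_handles taken before the tab is opened, and after switching with driver.switch_to.window(find_new_handle(handles_before, driver.window_handles)) the check becomes is_blank_tab(driver.current_url).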

I decided to handle this case with a try ... except statement:
try:
    tab = self.get_current_url()
except TimeoutException:
    ...
    self.click(link)
    self.focus_active_tab()
    tab = self.get_current_url()
Now it works for me, because on the second attempt a non-empty link is opened.
If someone knows a better solution, please share.

Selenium in Python - open every link within a drop down menu

I'm new to Python. I've been searching for the past hour for how to do this, and this code almost works. I need to open every category in a collapsing (dropdown) menu and then open every link within the now-.active category in a new tab (Ctrl+T). The browser opens and all the categories expand, but none of the .active links are opened in new tabs. I would appreciate any help.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("pioneerdoctor.com/productpage.cfm")
cat = driver.find_elements_by_css_selector("a[href*='Product_home']")
for i in cat:
    i.click()
    child = driver.find_elements_by_css_selector("li.active > a[href*='ProductPage']")
    for m in child:
        m.send_keys(Keys.CONTROL + 't')
EDIT:
Here's the current workaround I got going by writing to a text file and using webbrowser. The only issue I'm seeing is that it's writing duplicates of the results multiple times. I'll be looking through the comments later to see if I can get it working a better way (which I'm sure there is).
from selenium import webdriver
import webbrowser
print("Opening Google Chrome..")
driver = webdriver.Chrome()
driver.get("http://pioneerdoctor.com/productpage.cfm")
driver.implicitly_wait(.5)
driver.maximize_window()
cat = driver.find_elements_by_css_selector("a[href*='Product_home']")
print("Writing URLS to file..")
for i in cat:
    i.click()
    child = driver.find_elements_by_css_selector("a[href*='ProductPage']")
    for i in child:
        child = i.get_attribute("href")
        file = open("Output.txt", "a")
        file.write(str(child) + '\n')
        file.close()
driver.quit()  # the call needs parentheses, otherwise the browser stays open
file = open("Output.txt", "r")
Loop = input("Loop Number, Enter 0 to quit: ")
Loop = int(Loop)
x = 0
if Loop == 0:
    print("Quitting..")
else:
    for z in file:
        if x == Loop:
            print("Done.\n")  # print before break, otherwise it is never reached
            break
        else:
            webbrowser.open_new_tab(z)
            x += 1

The links in those categories are not found because the CSS selector for them is incorrect. Remove the > in li.active > a[href*='ProductPage']. Why? p > q matches only the immediate children of p, whereas a space, as in "p q", matches every q anywhere inside p. The links you are interested in are NOT immediate children of the li; they are inside a ul which is inside the li.
The other problem is the way you open links in new tabs. Use this code instead:
String combo = Keys.chord(Keys.CONTROL, Keys.RETURN);
m.sendKeys(combo);
That is how I do it in Java. Python's Keys class has no chord method; the equivalent there is m.send_keys(Keys.CONTROL + Keys.RETURN). If I were you, I would open the links in another browser instance. In my experience, switching between tabs and windows is not well supported by Selenium itself, and bad things can happen.
Before you try any tabbing, make a simple example that opens a new tab and switches back to the previous tab. Go back and forth 3-4 times. Does it work smoothly? Good. Then do the same with 3-5 tabs and tell me how it went.
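The difference between immediate children and all descendants can be checked without a browser. The sketch below does not use CSS at all: it uses the XPath subset in Python's standard xml.etree.ElementTree as a stand-in, where "./a" plays the role of "li.active > a" and ".//a" plays the role of "li.active a". The markup is hypothetical, shaped like the menu described above, not copied from the real site.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the product links sit in a <ul> nested inside the active <li>.
MENU = """
<li class="active">
  <a href="Product_home.cfm?cat=1">Category</a>
  <ul>
    <li><a href="ProductPage.cfm?id=1">Product 1</a></li>
    <li><a href="ProductPage.cfm?id=2">Product 2</a></li>
  </ul>
</li>
""".strip()

active_li = ET.fromstring(MENU)


def product_links(anchors):
    """Keep only the hrefs that contain 'ProductPage'."""
    return [a.get("href") for a in anchors if "ProductPage" in a.get("href", "")]


# "./a" selects immediate children only, like CSS "li.active > a".
direct = product_links(active_li.findall("./a"))
# ".//a" selects every descendant, like CSS "li.active a".
nested = product_links(active_li.findall(".//a"))

print(direct)  # prints: []
print(nested)  # prints: ['ProductPage.cfm?id=1', 'ProductPage.cfm?id=2']
```

The only immediate child anchor is the category link itself, so the child-only query finds no product links, which matches the symptom in the question.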

Selenium Webdriver failed when use window_handles

I am trying to handle two tabs in Python Selenium WebDriver with Chrome as the browser.
Finding an element by link text works on both the first and the second tab as long as Chrome stays the selected (foreground) window.
When I switch control to the new tab using
driver.switch_to_window(driver.window_handles[1])
and minimise Chrome (i.e. select any process other than Chrome), finding the link text fails with an element-not-found exception, but only on the second tab.
On the first tab I still get the result.
def DriverCreation():
    try:
        Driver = WebBase.initWebScraping(URL) # Methods visible Driver.driver and Driver.loggerDriverWait = Driver.EC
        print "Driver Creation Successful"
        return Driver
    except:
        print "Driver Initialisation Failed"
        sys.exit(1)
if __name__ == '__main__':
    URL = 'https://www.example.com/'
    Driver = DriverCreation() # will Load first Tab with www.Example.com
    aboutlink = Driver.driver.find_element_by_link_text('about')
    aboutlink.send_keys(Keys.CONTROL + Keys.RETURN)
    Driver.driver.switch_to_window(Driver.driver.window_handles[1])
    contactLink = Driver.driver.find_element_by_link_text('contact')
    print contactLink.text() #** getting error if i change the focus from Google Chrome and works fine if i keep the window focus on Google Chrome**

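You can manage tabs using the following code, which opens the URL in a new tab from JavaScript instead of sending keystrokes to the browser window: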
driver.execute_script("window.open('"+url+"', '_blank');")
driver.switch_to_window(driver.window_handles[1])
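A slightly more defensive variant of the same idea, as a sketch: pass the URL as a script argument rather than concatenating it into the JavaScript string, and find the new tab by comparing the handles before and after instead of assuming it sits at index 1. open_url_in_new_tab is a hypothetical helper name. It uses driver.switch_to.window, the spelling used in the first answer on this page; switch_to_window is the older, deprecated form.

```python
def open_url_in_new_tab(driver, url):
    """Open url in a new tab, switch to it, and return the new tab's handle."""
    before = set(driver.window_handles)
    # Passing the URL as an argument avoids the quoting problems that
    # string concatenation has when the URL contains a quote character.
    driver.execute_script("window.open(arguments[0], '_blank');", url)
    new_handles = [h for h in driver.window_handles if h not in before]
    if len(new_handles) != 1:
        raise RuntimeError(f"expected one new tab, found {len(new_handles)}")
    driver.switch_to.window(new_handles[0])
    return new_handles[0]
```

Because this never sends keystrokes, it does not depend on which application has keyboard focus, though whether that resolves the minimised-window symptom in the question is not something this sketch can confirm.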

web element not detecting in selenium in a FOR LOOP

I'm trying to fetch some information from specific web elements. When I fetch the information without a for loop, the program works like a charm. But when I put the same code in a for loop, it no longer detects the web elements. Here's the code I have been trying:
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
import time
from lxml import html
import requests
import xlwt
browser = webdriver.Firefox() # Get local session of firefox
# 0 wait until the pages are loaded
browser.implicitly_wait(3) # 3 secs should be enough. if not, increase it
browser.get("http://ae.bizdirlib.com/taxonomy/term/1493") # Load page
links = browser.find_elements_by_css_selector("h2 > a")
def test():#test function
    elems = browser.find_elements_by_css_selector("div.content.clearfix > div > fieldset> div > ul > li > span")
    print elems
    for elem in elems:
        print elem.text
    elem1 = browser.find_elements_by_css_selector("div.content.clearfix>div>fieldset>div>ul>li>a")
    for elems21 in elem1:
        print elems21.text
    return 0
for link in links:
    link.send_keys(Keys.CONTROL + Keys.RETURN)
    link.send_keys(Keys.CONTROL + Keys.PAGE_UP)
    time.sleep(5)
    test() # Want to call test function
    link.send_keys(Keys.CONTROL + 'w')
The output I get when I print the object is an empty list, []. Can somebody help me improve it? I am new to Selenium.
In my previous question I asked about printing, but the real problem is that the elements are not being detected at all, so this question is a different one.

I couldn't open the page, but as I understand it you want to open the links sequentially and do something on each. With link.send_keys(Keys.CONTROL + 'w') you are closing the newly opened tab, so your links open in new tabs. In that case you must switch to the new window before you can reach the elements in it. You can list the windows with driver.window_handles, switch to the last one with driver.switch_to_window(driver.window_handles[-1]), and, after closing it, switch back to the first window with driver.switch_to_window(driver.window_handles[0]):
for link in links:
    link.send_keys(Keys.CONTROL + Keys.RETURN)
    # switch to new window
    driver.switch_to_window(driver.window_handles[-1])
    link.send_keys(Keys.CONTROL + Keys.PAGE_UP) # dont know why
    time.sleep(5)
    test() # Want to call test function
    link.send_keys(Keys.CONTROL + 'w')
    #switch back to the first window
    driver.switch_to_window(driver.window_handles[0])
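The open, switch, work, close, switch-back cycle can also be wrapped in a context manager so that the driver always returns to the original window, even if the work in the tab raises. This is a sketch under assumptions: new_tab is a hypothetical helper, it opens the tab from JavaScript rather than with Ctrl+Enter, and it expects the link targets to have been read into strings beforehand.

```python
from contextlib import contextmanager


@contextmanager
def new_tab(driver, url):
    """Open url in a new tab; on exit close that tab and return to the original."""
    original = driver.current_window_handle
    before = set(driver.window_handles)
    driver.execute_script("window.open(arguments[0], '_blank');", url)
    # The new tab is whichever handle was not there before the call.
    new_handle = next(h for h in driver.window_handles if h not in before)
    driver.switch_to.window(new_handle)
    try:
        yield driver
    finally:
        # Runs even if the body raises, so the loop is never stranded in a tab.
        driver.close()
        driver.switch_to.window(original)
```

In the question's loop this might look like hrefs = [link.get_attribute("href") for link in links], then for each href: with new_tab(browser, href): test().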
