Python Selenium - Source from href link

I would like to be able to get the page source of the link I got from an <a> href without making Selenium change pages.
I am getting the a element using
driver.find_element(By.XPATH, "//a[contains(@class, 'css-1xyedec e1pf1lj70')]")
Then I can get the link in the href using
elem.get_attribute('href')
But I cannot find a way to get the source page of the link using selenium without changing the page of the browser.
EDIT: Here is the website on which I am trying to do it. For each sale, the <a> is located in the div that contains the photo and the part with the title, price...

Check this answer: https://sqa.stackexchange.com/questions/17022/how-to-fill-captcha-using-test-automation
You cannot automate a captcha. You should ask the dev team for a workaround.
I would ask them to disable the CAPTCHA in the test environment. It makes no sense to have it there.
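If the linked page itself is reachable without the captcha (an assumption; the site's bot protection may block this too), a sketch of reading a link's page source without navigating the browser is to reuse the driver's cookies in a separate requests session:

import requests

elem = driver.find_element(By.XPATH, "//a[contains(@class, 'css-1xyedec e1pf1lj70')]")
link = elem.get_attribute('href')

# Copy the browser's cookies so the side request uses the same session.
session = requests.Session()
for cookie in driver.get_cookies():
    session.cookies.set(cookie["name"], cookie["value"])

page_source = session.get(link).text  # the browser never leaves the current page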

How to extract an embedded link from a webpage with no iframe and nothing showing on the Network tab?

The link inside the <a> tag is the clickable link; it triggers a prompt to download the PDF file whose actual source link is https://lms.nust.edu.pk/portal/pluginfile.php/1504453/mod_resource/content/0/APG-Mutual-Evaluation-Report-Pakistan-October%202019.pdf
I am using Selenium to find the links specified by XPath like this
bigger_tag = driver.find_elements(By.XPATH, "//div[@class='activityinstance']//a[@class='aalink'][contains(@href, 'https://lms.nust.edu.pk/portal/mod/resource/view.php?') or contains(@href, 'https://lms.nust.edu.pk/portal/mod/url/view.php')]")
How do I extract such links from the webpage?
Since the site I am trying to scrape is protected and requires login credentials, sharing the code would be fruitless here. I just want to know the standard procedure for a case where you can't find embedded links in the developer tools: no iframe, no server request visible in the Network tab, nothing.
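One common approach (a sketch that assumes the view.php links answer with an HTTP redirect to the real file, which is typical for Moodle-style portals) is to reuse the logged-in Selenium session's cookies in a plain requests session and follow each link's redirect without touching the browser:

import requests

# Assumption: each view.php URL redirects to the actual pluginfile URL.
session = requests.Session()
for cookie in driver.get_cookies():  # reuse the logged-in Selenium session
    session.cookies.set(cookie["name"], cookie["value"])

for a in bigger_tag:
    view_url = a.get_attribute("href")
    resp = session.get(view_url, allow_redirects=True, stream=True)
    print(view_url, "->", resp.url)  # resp.url is the final URL after redirects
    resp.close()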

SELENIUM (Python): How to retrieve the URL to which an element redirects me (opens a new tab) after clicking? The element has an <a> tag but no href

I am trying to scrape a website with product listings that, when clicked, redirect the user to a new tab with further information and contact-the-seller details. I am trying to retrieve that URL without actually having to click on each listing in the catalog and wait for the page to load, as this would take a lot of time.
I have searched the web inspector for the href, but the only link available is to the image source of each listing. However, I noticed that after clicking each element a GET request gets sent to this URL (https://api.wallapop.com/api/v3/items/v6g2v4y045ze?language=es). It contains pretty much all the information I need. I'm not sure if it's of any use, but it's the furthest I've gotten.
UPDATE: I tried the code that was suggested (with modifications to specifically find the href attributes of the clickable elements), but I get None back. I have been looking for an onclick attribute or something similar that might have what I'm looking for, but so far it looks like the solution will end up being clicking each element and extracting all the information from there.
elements123 = driver.find_elements(By.XPATH, '//a[contains(@class,"ItemCardList__item")]')
for e in elements123:
    print(e.get_attribute('href'))
I appreciate any insights, thank you in advance.
You need something like this:
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("https://google.com")
# Get all the elements available with tag name 'a'
elements = driver.find_elements(By.TAG_NAME, 'a')
for e in elements:
    print(e.get_attribute('href'))
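If the listing anchors really carry no href in the DOM, another option (a sketch assuming Chrome; the search URL is a placeholder) is to read the browser's performance log and pick out the api.wallapop.com item requests the page fires:

import json
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.set_capability("goog:loggingPrefs", {"performance": "ALL"})  # enable the CDP network log
driver = webdriver.Chrome(options=options)
driver.get("https://es.wallapop.com/app/search?keywords=bike")  # placeholder search page
time.sleep(5)  # give the page time to fire its API requests

for entry in driver.get_log("performance"):
    message = json.loads(entry["message"])["message"]
    if message.get("method") == "Network.requestWillBeSent":
        url = message["params"]["request"]["url"]
        if "api.wallapop.com/api/v3/items" in url:
            print(url)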

How to scrape specific information from a JavaScript webpage with Selenium in Python?

I can't scrape the 'Resolution' field from the webpage, which I believe is rendered by JavaScript.
Webpage address:
https://support.na.sage.com/selfservice/viewdocument.do?noCount=true&externalId=60390&sliceId=1&noCount=true&isLoadPublishedVer=&docType=kc&docTypeID=DT_Article&stateId=4183&cmd=displayKC&dialogID=197243&ViewedDocsListHelper=com.kanisa.apps.common.BaseViewedDocsListHelperImpl&openedFromSearchResults=true
I need to extract Description, Cause, and Resolution.
I tried various ways to get the element, including:
find_element_by_xpath
find_element_by_id
find_element_by_class_name.
Nothing gave the desired result.
Could you point me in the right direction?
https://support.na.sage.com/selfservice/viewContent.do?externalId=60390&sliceId=1
This is the correct URL from which you can crawl the HTML; use the Network tab of your browser's devtools to find it.
Example with Chrome
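A minimal sketch of loading that URL with Chrome via Selenium and dumping the visible text, so you can see where Description, Cause and Resolution sit before writing site-specific selectors (none are assumed here):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://support.na.sage.com/selfservice/viewContent.do"
           "?externalId=60390&sliceId=1")
# Dump the visible text; locate Description / Cause / Resolution in it,
# then narrow down with a more specific selector once you know the markup.
print(driver.find_element(By.TAG_NAME, "body").text)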

scrape Glassdoor for multiple pages using python lxml

I'm using the following script to scrape job listings from Glassdoor. The script only scrapes the first page. How might I extend it so that it scrapes from page 1 up to the last page?
https://www.scrapehero.com/how-to-scrape-job-listings-from-glassdoor-using-python-and-lxml/
I'd greatly appreciate any help
I'll provide a more general answer. When scraping, to get the next page, simply get the link on the current page that points to the next page.
In the case of Glassdoor, the page links all have the page class, and the next page is reached by clicking an li element with class next. Your XPath then becomes:
//li[@class="next"]
You can then access it with:
element = document.xpath("//li[@class='next']")
We are specifically looking for the link, so we can add an a element to our XPath:
//li[@class="next"]//a
And further specify that we just need the href attribute:
//li[@class="next"]//a/@href
And now you can access the link with
link = document.xpath('//li[@class="next"]//a/@href')
Tested and working on Glassdoor as of 2/9/18.
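Putting that together, a sketch of the paging loop (the start URL and headers are placeholders, and the listing extraction itself is omitted):

import requests
from lxml import html
from urllib.parse import urljoin

url = "https://www.glassdoor.com/Job/..."  # placeholder: your first results page
headers = {"User-Agent": "Mozilla/5.0"}

while url:
    response = requests.get(url, headers=headers)
    document = html.fromstring(response.content)
    # ... extract the job listings from `document` here ...
    next_href = document.xpath('//li[@class="next"]//a/@href')
    url = urljoin(url, next_href[0]) if next_href else None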

Get full link from page with Scrapy

I want to get torrent links from a page. In Chrome's source view I see the link is:
href="browse.php?search=Brooklyn+Nine-Nine&page=1"
But when I scrape this link with Scrapy I only get:
href="browse.php?page=1"
The "search=Brooklyn+Nine-Nine&" part is not in the link.
Into the page's torrent search form I enter "Brooklyn Nine-Nine", and it shows all the search results.
So my question is: is this Chrome's automatic link formatting, and how could I get the link with Scrapy the way Chrome shows it?
I think I could add the missing part myself, for example by replacing spaces with plus signs in the text used for the search (see the sketch below). Or maybe there is a more elegant solution...
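A quick way to build that missing query part (a sketch using the parameter names from the question):

from urllib.parse import urlencode

# urlencode replaces spaces with '+' by default, matching what Chrome shows.
params = {"search": "Brooklyn Nine-Nine", "page": 1}
print("browse.php?" + urlencode(params))
# browse.php?search=Brooklyn+Nine-Nine&page=1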
It's all okay... I made a mistake in my script. My search text was empty, so the links were also missing the additional text.
