Let's take the example of spotify because I'm listening to music on it right now.
I would like to get the value of the href attribute in the following markup.
<a data-testid="nowplaying-track-link" href="/album/3xIwVbGJuAcovYIhzbLO3J">Toosie Slide</a>
What I want is to get "/album/3xIwVbGJuAcovYIhzbLO3J" or if that's not possible, get "Toosie Slide" in order to store it in a variable to compare it with a constant.
The difficulty with Spotify (and many other sites) is that links with an href attribute appear many times on the page. So I'd like to get only the link that carries the data-testid "nowplaying-track-link".
There, I hope I was clear.
PS: I already know the commands like: driver.find_element_by_xpath, etc... but I can't use them in this case...
I'm not sure what you mean about not being able to use commands of that type, but this is how you would get the info you're seeking:
element = driver.find_element_by_css_selector('[data-testid="nowplaying-track-link"]')
href = element.get_attribute('href')
element_text = element.text
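If you want the full link, note that Selenium normally returns href already resolved to an absolute URL, so concatenating it onto driver.current_url would produce a malformed address. urljoin handles both the relative and the absolute case:
from urllib.parse import urljoin
link = urljoin(driver.current_url, href)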
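A quick offline check of how URL joining behaves, as a minimal sketch. The page URL below is an assumed, hypothetical value; in a live session it would come from driver.current_url and the path from element.get_attribute('href').

```python
from urllib.parse import urljoin

# Hypothetical page URL, standing in for driver.current_url.
page_url = "https://open.spotify.com/playlist/example"
# The relative path shown in the question's markup.
relative_href = "/album/3xIwVbGJuAcovYIhzbLO3J"

# Plain concatenation keeps the current page's path and yields a wrong address.
naive_link = page_url + relative_href

# urljoin resolves the path against the site root, as a browser would.
link = urljoin(page_url, relative_href)

# Joining a URL that is already absolute leaves it unchanged.
rejoined = urljoin(page_url, link)

print(naive_link)
print(link)
```

Because urljoin leaves absolute URLs untouched, the same call is safe whether the href arrives relative or already resolved.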
I am trying to print by ID using Selenium. As far as I can tell, "a" is the tag and "title" is the attribute. See HTML below.
When I run the following code:
print(driver.find_elements(By.TAG_NAME, "a")[0].get_attribute('title'))
I get the output:
Zero Tolerance
So I'm getting the first attribute correctly. When I increment the code above:
print(driver.find_elements(By.TAG_NAME, "a")[1].get_attribute('title'))
My expected output is:
Aaliyah Love
However, I'm just getting blank. No errors. What am I doing incorrectly? Pls don't suggest using xpath or css, I'm trying to learn Selenium tags.
HTML:
<a class=" Link ScenePlayer-ChannelName-Link styles_1lHAYbZZr4 Link ScenePlayer-ChannelName-Link styles_1lHAYbZZr4" href="/en/channel/ztfilms" title="Zero Tolerance" rel="">Zero Tolerance</a>
...
<a class=" Link ActorThumb-ActorImage-Link styles_3dXcTxVCON Link ActorThumb-ActorImage-Link styles_3dXcTxVCON" href="/[RETRACTED]/Aaliyah-Love/63565" title="Aaliyah Love"
Selenium locators are a toolbox and you're saying you only want to use a screwdriver (By.TAG_NAME) for all jobs. We aren't saying that you shouldn't use By.TAG_NAME, we're saying that you should use the right tool for the right job and sometimes (most times) By.TAG_NAME is not the right tool for the job. CSS selectors are WAY more powerful locators because they can search for not only tags but also classes, properties, etc.
It's hard to say for sure what's going on without access to the site/page. It could be that the entire page isn't loaded and you need to add a wait for the page to finish loading (maybe count links expected on the page?). It could be that your locator isn't specific enough and is catching other A tags that don't have a title attribute.
I would start by doing some debugging.
links = driver.find_elements(By.TAG_NAME, "a")
for link in links:
    print(link.get_attribute('title'))
What does this print?
If it prints some blank lines sprinkled throughout the actual titles, your locator is probably not specific enough. Try a CSS selector
links = driver.find_elements(By.CSS_SELECTOR, "a[title]")
for link in links:
    print(link.get_attribute('title'))
If instead it returns some titles and then nothing but blank lines, the page is probably not fully loaded. Try something like
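from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

count = 20 # the number of expected links on the page
link_locator = (By.TAG_NAME, "a")
# find_elements takes (by, value) as separate arguments, so unpack the tuple
WebDriverWait(driver, 10).until(lambda wd: len(wd.find_elements(*link_locator)) == count)
links = driver.find_elements(*link_locator)
for link in links:
    print(link.get_attribute('title'))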
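To illustrate the "locator isn't specific enough" case without a browser, here is a minimal offline sketch using only the standard library's html.parser. The markup is hypothetical: it assumes an untitled logo link sits between the two titled links, which is one plausible reason index [1] comes back blank. The AnchorTitles class and the href values are made up for this example.

```python
from html.parser import HTMLParser


class AnchorTitles(HTMLParser):
    """Collect the title of every <a> tag in document order ("" when missing)."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # A missing title is recorded as an empty string, similar to the
            # blank output described in the question.
            self.titles.append(dict(attrs).get("title") or "")


# Hypothetical page fragment: the middle link has no title attribute.
sample = (
    '<a href="/channel" title="Zero Tolerance">Zero Tolerance</a>'
    '<a href="/"><img src="logo.png"></a>'
    '<a href="/actor" title="Aaliyah Love">Aaliyah Love</a>'
)

parser = AnchorTitles()
parser.feed(sample)

all_titles = parser.titles                  # what a bare tag-name search sees
titled_only = [t for t in all_titles if t]  # roughly what a[title] would keep

print(all_titles)
print(titled_only)
```

With this markup the second entry of all_titles is empty, while the filtered list has the expected name at index 1.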
I am not the first and not the last one who got into this: cannot get all hrefs from instagram. Although it is common I cannot get all hrefs from a class and all solutions I tried so far desperately failed. So, would appreciate a hand or a punch into the right direction.
I am searching for a hashtag:
hashtags = '#hashtag'
search.send_keys(hashtags)
time.sleep(2)
search.send_keys(Keys.ENTER)
time.sleep(2)
search.send_keys(Keys.ENTER)
link_list=[]
links = driver.find_elements_by_class_name('Nnq7C weEfm')
for link in links:
    link_list.append(link.get_attribute('href'))
print(link_list)
There are several upper-level classes that select all the pics, but none of them gives me an href.
I can get an href from v1Nh3 kIKUG _bz0w, the class defining an individual pic on the search results page. Although there are 33 v1Nh3 kIKUG _bz0w elements on the page, I get only one href.
links=[x.get_attribute("href") for x in driver.find_elements_by_xpath("//div[@class='v1Nh3 kIKUG _bz0w']/a")]
Just use /a on the class and get the hrefs like so. I'd find a more suitable xpath since that class name looks dynamic though.
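The same idea can be tried offline. This is only a sketch: it uses the standard library's xml.etree.ElementTree (which supports a limited XPath subset) instead of Selenium, and the markup is a hypothetical, heavily simplified stand-in for the real page. XPath tests attributes with @, and the href lives on the a inside each tile, not on the surrounding row div.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the post paths are invented.
snippet = """
<article>
  <div class="Nnq7C weEfm">
    <div class="v1Nh3 kIKUG _bz0w"><a href="/p/AAA/">one</a></div>
    <div class="v1Nh3 kIKUG _bz0w"><a href="/p/BBB/">two</a></div>
    <div class="v1Nh3 kIKUG _bz0w"><a href="/p/CCC/">three</a></div>
  </div>
</article>
"""
root = ET.fromstring(snippet)

# The row container is a plain div, so it has no href of its own.
row = root.find(".//div[@class='Nnq7C weEfm']")
row_href = row.get("href")

# Step down to the <a> child of every tile to collect one href per picture.
hrefs = [a.get("href") for a in root.findall(".//div[@class='v1Nh3 kIKUG _bz0w']/a")]

print(row_href)
print(hrefs)
```

Note that [@class='...'] is an exact string match on the whole attribute, so it breaks as soon as the site adds or reorders a class name.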
Quick info: I'm using Mac OS, Python 3.
I have like 800 links that need to be clicked on a page (and many more pages to go so need automation).
They were hidden because you only see those links when you hover over.
I fixed that by injecting a CSS rule (just saying, in case it's the reason it's not working).
When I try to find the elements by XPath, it won't click the links afterwards, and it doesn't find all of them either: always just 4, even when more are displayed in view.
HTML:
Display
When I click Copy XPath in the inspector, it gives me:
//*[@id="tiles"]/li[3]/div[2]/ul/li[2]/a
But it doesn't work when I use it like this:
driver.find_elements_by_xpath('//*[@id="tiles"]/li[3]/div[2]/ul/li[2]/a')
So two questions:
How do I get them all?
How do I get it to click on each of them?
The pattern in the XPath is the same, with the /li[3] being the only number that changes, for this I created a for loop to create them all based on the count on page which I did successfully.
So if it can be done with the XPaths generated by myself that are corresponding to when I copy XPath in inspector then I only need question 2 answered.
PS.: this is HTML of parent of that first HTML:
<li onclick="openPopup(event, 'collect', {item_id: 165214})" class="collect" data-item-id="165214">Display</li>
This XPath,
//a[.="Display"]
will select all a links with anchor text equal to "Display".
Going by your question, the HTML you shared, and your code attempts, there is no need to get the <li> tags. Instead, collect the <a> tags in a list. So, to answer your first question (How do I get them all?), you can use the following line of code:
all_Display = driver.find_elements_by_xpath("//*[@id='tiles']//li/div[2]/ul/li[@class='collect']/a[@title='Display']")
Next, to click on each of them, loop over all the <a> tags as follows:
all_Display = driver.find_elements_by_xpath("//*[@id='tiles']//li/div[2]/ul/li[@class='collect']/a[@title='Display']")
for each_Display in all_Display:
    each_Display.click()
Using an XPath with elements by position is not ideal. Instead use a CSS selector to match the attributes for the targeted elements.
Something like:
all_Display = driver.find_elements_by_css_selector("#tiles li[onclick][data-item-id] a[title]")
You can then click them in a loop if none of them is loading a new page:
for element in all_Display:
    element.click()
I am trying to make a program to collect links and some values from a website. It works mostly well but I have come across a page in which it does not work.
With Firebug I can see that this is the HTML code of the elusive "link" (I can't find it when viewing the page's source, though):
<a class="visit" href="/tet?id=12&mv=13&san=221">
221
</a>
and this is the script:
<td><a href=\"/tet?id=12&mv=13&san=221\" class=\"visit\">221<\/a><\/td><\/tr>
I'm wondering how to get either the "link" ("/tet?id=12&mv=13&san=221") from the html code and the string "221" from either the script or the html using selenium, mechanize or requests (or some other library)
I have made an unsuccessful attempt at getting it with mechanize using the br.links() function, which collected a number of links from the site, just not the one I am after.
extra info: This might be important. to get to the page I have to click on a button with this code:
<a id="f33" class="button-flat small selected-no" onclick="qc.pA('visitform', 'f33', 'QClickEvent', '', 'f52'); if ($j('#f44').length == 0) { $j('f44').style.display='inline'; }; $j('#f38').hide();qc.recordControlModification('f38', 'DisplayStyle', 'hide'); document.getElementById('forumpanel').className = 'section-3'; return false;" href="#">
load2
</a>
after which a "new page" loads in a part of the window (but the url never changes)
I think you pasted the wrong script of yours ;)
I'm not sure what you need exactly - there are at least two different approaches.
Matching all hrefs using regex
Matching specific tags and using get_attribute(...)
For the first one, you have to get the whole HTML source of the page with something like webdriver.page_source and then apply something like the following regex (you will have to escape either the single or the double quotes!):
<a.+?href=['"](.*?)['"].*?/?>
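As a minimal sketch, the regex can be tried on the escaped script fragment quoted in the question. The unescaping step is an assumption about how that fragment was encoded (backslash-escaped quotes and slashes inside a JavaScript string); without it the pattern does not match, because a backslash sits between href= and the quote.

```python
import re

# The regex from above; a triple-quoted raw string avoids escaping either quote.
HREF_RE = re.compile(r"""<a.+?href=['"](.*?)['"].*?/?>""")

# The escaped fragment quoted in the question.
script = r'<td><a href=\"/tet?id=12&mv=13&san=221\" class=\"visit\">221<\/a><\/td><\/tr>'

# Undo the backslash escaping of quotes and slashes before matching.
html = script.replace('\\"', '"').replace("\\/", "/")

raw_matches = HREF_RE.findall(script)  # escaped text: the pattern finds nothing
hrefs = HREF_RE.findall(html)          # unescaped text: the href is captured

print(raw_matches)
print(hrefs)
```

A regex like this is fine for a quick extraction, but it is easy to fool (for example by tags that merely start with "a"), so a real HTML parser or the element-based approach is usually more robust.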
If you need the hrefs of all matching links, you could use something similar to webdriver.find_elements_by_css_selector('.visit') (take care to choose find_elements_... instead of find_element_...!) to obtain a list of webelements and iterate through them to get their attributes.
This could result in code like this:
hrefs = []
elements = webdriver.find_elements_by_css_selector('.visit')
for element in elements:
    hrefs.append(element.get_attribute('href'))
Or a one liner using list comprehension:
hrefs = [element.get_attribute('href')
         for element in webdriver.find_elements_by_css_selector('.visit')]
I am on the website
http://www.baseball-reference.com/players/event_hr.cgi?id=bondsba01&t=b
and trying to scrape the data from the tables. When I pull the xpath from one entry, say the pitcher
"Terry Mulholland," I retrieve this:
pitchers = site.xpath("/html/body/div[2]/div[2]/div[6]/table/tbody/tr/td[3]/table/tbody/tr[2]/td/a")
When I try to print pitcher.text for each pitcher in pitchers, I get [] rather than the text. Any idea why?
The problem is, last tbody doesn't exist in the original source. If you get that xpath via some browser, keep in mind that browsers can guess and add missing elements to make html valid.
Removing the last tbody resolves the problem.
In : import lxml.html as html
In : site = html.parse("http://www.baseball-reference.com/players/event_hr.cgi?id=bondsba01&t=b")
In : pitchers = site.xpath("/html/body/div[2]/div[2]/div[6]/table/tbody/tr/td[3]/table/tr[2]/td/a")
In : pitchers[0].text
Out: 'Terry Mulholland'
I should add that the XPath expression you are using is pretty fragile. One div added in some inconvenient place and you have a broken script. If possible, try to find better references, like an id or class that points to your expected location.
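Both points can be reproduced offline. This is a sketch under stated assumptions: it uses the standard library's xml.etree.ElementTree as a stand-in for lxml (it supports only a subset of XPath), and the table markup, the "stats" class, and the href are hypothetical. As in the raw source described above, the table has no tbody.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified source: note that there is no <tbody> element.
source = """
<div>
  <table class="stats">
    <tr><th>Pitcher</th></tr>
    <tr><td><a href="/players/example.shtml">Terry Mulholland</a></td></tr>
  </table>
</div>
"""
root = ET.fromstring(source)

# A browser-style path that includes tbody matches nothing in the raw source.
with_tbody = root.findall("./table/tbody/tr[2]/td/a")

# Dropping tbody makes the positional path match.
without_tbody = root.findall("./table/tr[2]/td/a")

# Anchoring on an attribute survives extra wrappers and reordered rows.
by_class = root.findall(".//table[@class='stats']//td/a")

print(with_tbody)
print(without_tbody[0].text)
print(by_class[0].text)
```

The attribute-based path keeps working if the table is later wrapped in another div, whereas both positional paths would need to be rewritten.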