I'm trying to pick up links of YouTube channels, which are located as below:
<a id="author-text" class="yt-simple-endpoint style-scope ytd-comment-
renderer" href="/channel/UCUSy-h1fPG1L6X7KOe70asA"> <span class="style-
scope ytd-comment-renderer">Jörgen Nilsson</span></a>
So in the example above I would want to pick up "/channel/UCUSy-h1fPG1L6X7KOe70asA". So far I have tried many options, but none work:
driver = webdriver.Chrome('C:/Users/me/Chrome Web Driver/chromedriver.exe')
api_url="https://www.youtube.com/watch?v=TQG7m1BFeRc"
driver.get(api_url)
time.sleep(2)
div = driver.find_element_by_class_name("yt-simple-endpoint style-scope ytd-comment-renderer")
but I get the following error:
InvalidSelectorException: Message: invalid selector: Compound class names not permitted
I also tried other approaches:
div = driver.find_elements_by_xpath("yt-simple-endpoint style-scope ytd-comment-renderer")
div = driver.find_element_by_class_name('yt-simple-endpoint style-scope ytd-comment-renderer')
div=driver.find_element_by_css_selector('.yt-simple-endpoint style-scope ytd-comment-renderer').get_attribute('href')
but no luck. If someone could please help, it would be much appreciated. Thank you.
Your selectors are invalid:
driver.find_element_by_class_name("yt-simple-endpoint style-scope ytd-comment-renderer")
You cannot pass more than one class name to the find_element_by_class_name method. You can try driver.find_element_by_class_name("ytd-comment-renderer")
driver.find_elements_by_xpath("yt-simple-endpoint style-scope ytd-comment-renderer")
This is not correct XPath syntax. You probably mean driver.find_elements_by_xpath("//*[@class='yt-simple-endpoint style-scope ytd-comment-renderer']")
driver.find_element_by_css_selector('.yt-simple-endpoint style-scope ytd-comment-renderer')
Each class name should start with a dot: driver.find_element_by_css_selector('.yt-simple-endpoint.style-scope.ytd-comment-renderer')
But the best way, IMHO, is to identify by ID:
driver.find_element_by_id("author-text")
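A minimal sketch of that approach, assuming the comment section has already been rendered (YouTube loads comments lazily, so you usually have to scroll before the author links exist in the DOM):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome('C:/Users/me/Chrome Web Driver/chromedriver.exe')
driver.get("https://www.youtube.com/watch?v=TQG7m1BFeRc")

# Scroll down so YouTube starts loading the comment section
# (you may need to scroll further or repeatedly on some pages).
driver.execute_script("window.scrollTo(0, 800);")

# Wait until at least one comment author link is present, then collect all hrefs.
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "a#author-text")))
links = [a.get_attribute("href")
         for a in driver.find_elements_by_css_selector("a#author-text")]
print(links)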
You can use BeautifulSoup in Python to get the links from anchor tags having specific class names, e.g. soup.find_all('a', attrs={'class': 'yt-simple-endpoint'}). You can read more here: find_all using CSS.
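A rough sketch of that approach, assuming you feed BeautifulSoup the page source rendered by Selenium (the comments are not present in the raw HTML a plain HTTP request would fetch):

from bs4 import BeautifulSoup

# driver.page_source is the DOM after Selenium has rendered (and scrolled) the page.
soup = BeautifulSoup(driver.page_source, "html.parser")

for a in soup.find_all('a', attrs={'class': 'yt-simple-endpoint'}):
    href = a.get('href')
    if href and href.startswith('/channel/'):
        print(href)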
Related
Element to be located
I am trying to locate a span element inside a webpage. I have tried by XPath, but it raises a timeout error. I want to locate the title span element inside a Facebook Marketplace product. url
Here is my code:
def title_detector():
    title = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, 'path'))).text
    list_data = title.split("ISBN", 1)
Try this XPath: //span[contains(text(),'isbn')]
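For instance, wrapped into the question's own function (a sketch only; whether 'isbn' appears in lowercase in the span text is an assumption, so adjust the casing to the actual page):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def title_detector():
    # Wait for the span whose text contains 'isbn', then split on "ISBN".
    title = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located(
            (By.XPATH, "//span[contains(text(),'isbn')]"))).text
    return title.split("ISBN", 1)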
You can't locate pseudo-elements with XPath, only with a CSS selector.
I see it's Facebook with its ugly class names...
I'm not sure this will work for you (those class names may be dynamic), but it worked for me this time.
Anyway, the css_locator for that span element is .dati1w0a.qt6c0cv9.hv4rvrfc.discj3wi .d2edcug0.hpfvmrgz.qv66sw1b.c1et5uql.lr9zc1uh.a8c37x1j.keod5gw0.nxhoafnm.aigsh9s9.qg6bub1s.fe6kdd0r.mau55g9w.c8b282yb.iv3no6db.o0t2es00.f530mmz5.hnhda86s.oo9gr5id
So, since we are trying to get its ::before pseudo-element, we can do it with the following JavaScript:
span_locator = ".dati1w0a.qt6c0cv9.hv4rvrfc.discj3wi .d2edcug0.hpfvmrgz.qv66sw1b.c1et5uql.lr9zc1uh.a8c37x1j.keod5gw0.nxhoafnm.aigsh9s9.qg6bub1s.fe6kdd0r.mau55g9w.c8b282yb.iv3no6db.o0t2es00.f530mmz5.hnhda86s.oo9gr5id"
script = "return window.getComputedStyle(document.querySelector('{}'),':before').getPropertyValue('content')".format(span_locator)
print(driver.execute_script(script).strip())
In case the CSS selector above doesn't work because the class names are dynamic, try to locate that span with some stable CSS locator; it is possible. You just have to try it several times until you see which class names are stable and which are not.
UPD:
You don't need to locate the pseudo-elements there; it will be enough to catch that span itself. So, something like this will do:
span_locator = ".dati1w0a.qt6c0cv9.hv4rvrfc.discj3wi .d2edcug0.hpfvmrgz.qv66sw1b.c1et5uql.lr9zc1uh.a8c37x1j.keod5gw0.nxhoafnm.aigsh9s9.qg6bub1s.fe6kdd0r.mau55g9w.c8b282yb.iv3no6db.o0t2es00.f530mmz5.hnhda86s.oo9gr5id"

def title_detector():
    title = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, span_locator))).text
    title = title.strip()
    list_data = title.split("ISBN", 1)
I want to check whether any HTML tags have a style attribute, like <a style=..>, <h1 style=...>, <div style=..>, etc.
I used the code below, but it could not be run:
driver = webdriver.Chrome(web_driver_address, options=op)
driver.get(url)
elems = driver.find_elements_by_xpath("[#style]")
How can I fix this?
Your XPath is missing the element tag name.
In your case it can be any tag name, but it still should be there as part of the syntax, so you should use * to mean "any" there.
Also, you are missing the //, which means the element can be anywhere on the page.
So the correct XPath expression will be something like this:
elems = driver.find_elements_by_xpath("//*[#style]")
Don't forget to add some wait/delay to let the page load all the elements before you get them.
XPath needs a tag to be valid. If you don't want a specific tag, use *:
find_elements_by_xpath("//*[#style]")
Or with a CSS selector:
find_elements_by_css_selector("[style]")
I'm trying to fetch the budget using Scrapy with a CSS selector. I can get it when I use XPath, but in the case of a CSS selector I'm lost. I can even get the content when I go for BeautifulSoup and use next_sibling.
I've tried with:
import requests
from scrapy import Selector
url = "https://www.imdb.com/title/tt0111161/"
res = requests.get(url)
sel = Selector(res)
# budget = sel.xpath("//h4[contains(.,'Budget:')]/following::text()").get()
# print(budget)
budget = sel.css("h4:contains('Budget:')::text").get()
print(budget)
Output I'm getting using css selector:
Budget:
Expected output:
$25,000,000
Relevant portion of html:
<div class="txt-block">
<h4 class="inline">Budget:</h4>$25,000,000
<span class="attribute">(estimated)</span>
</div>
How can I get the budget information using a CSS selector within Scrapy?
This selector .css("h4:contains('Budget:')::text") is selecting the h4 tag, and the text you want is in its parent, the div element.
You could use .css('div.txt-block::text'), but this would return several elements, as the page has several elements like that. CSS selectors don't have a parent pseudo-element; I guess you could use .css('div.txt-block:nth-child(12)::text'), but if you are going to scrape more pages, this will probably fail on other pages.
The best option would be to use XPath:
response.xpath('//h4[text() = "Budget:"]/parent::div/text()').getall()
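A quick sketch of that XPath in the question's setup, using Selector(text=res.text) rather than passing the response object directly (a sketch only; IMDb's markup changes over time, so the selector may need adjusting):

import requests
from scrapy import Selector

url = "https://www.imdb.com/title/tt0111161/"
res = requests.get(url)
sel = Selector(text=res.text)

# The figure is a bare text node inside the parent div of the <h4>Budget:</h4> element.
parts = sel.xpath('//h4[text() = "Budget:"]/parent::div/text()').getall()
budget = "".join(parts).strip()
print(budget)  # expected: $25,000,000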
I'm trying to automatically navigate web pages using Python, Selenium, and XPath.
I want to click a next page button in a page whose code is like this:
<a _ngcontent-pnd-c43="" automation-id="discover-market-next-button"
class="menu-item-button ng-star-inserted">
<span _ngcontent-pnd-c43="" class="nav-button-right sprite"></span>
</a>
I tried with the following code:
try:
    element = '//span[class="nav-button-right sprite"]'
    button_next = driver.find_element_by_xpath(element)
    webdriver.ActionChains(driver).move_to_element(button_next).click(button_next).perform()
    time.sleep(15)
    content = driver.page_source.encode('utf-8')
except NoSuchElementException:
    print("NoSuchElementException")
but I got "NoSuchElementException".
Could anyone help me?
The problem is that your predicate [class="nav-button-right sprite"] is not matching on the class attribute (or any attribute) at all, so nothing is found. So try changing your XPath expression from:
element='//span[class="nav-button-right sprite"]'
to:
element='//span[@_ngcontent-pnd-c43]'
and see if it works.
You are missing @ in your expression; it should be:
element='//span[@class="nav-button-right sprite"]'
I think the problem is in your XPath: you missed the @ symbol before class. Try this, hope it helps:
element='//span[@class="nav-button-right sprite"]'
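A short sketch of that corrected locator in the question's click flow, with an explicit wait instead of the fixed sleep (an assumption on my part; keep time.sleep if you prefer):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

try:
    # Note the @ before class: this matches the attribute, not a child element.
    button_next = WebDriverWait(driver, 15).until(
        EC.element_to_be_clickable(
            (By.XPATH, '//span[@class="nav-button-right sprite"]')))
    webdriver.ActionChains(driver).move_to_element(button_next).click(button_next).perform()
    content = driver.page_source.encode('utf-8')
except TimeoutException:
    print("Next button not found or not clickable")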
I want to get some links at the URL, but I get all links instead. How can I pull the links by specifying a selector?
For example, I'm using:
ids = browser.find_elements_by_xpath('//a[@href]')
for ii in ids:
    print(ii.get_attribute('href'))
Result: All Links
But I want just the links matching a specific selector, like this one:
<a class="classifiedTitle" title="MONSTER TULPAR T5V13.1+ 15,6 EKRAN+İ7+6GB GTX1060+16RAM+256GB SS" href="/ilan/ikinci-el-ve-sifir-alisveris-bilgisayar-dizustu-notebook-monster-tulpar-t5v13.1-plus-15%2C6-ekran-plusi7-plus6gb-gtx1060-plus16ram-plus256gb-ss-793070526/detay">
MONSTER TULPAR T5V13.1+ 15,6 EKRAN+İ7+6GB GTX1060+16RAM+256GB SS</a>
So how can I add some selectors?
Thanks & Regards
Try the following CSS selector to get the specific link:
print(browser.find_element_by_css_selector("a.classifiedTitle[title='MONSTER TULPAR T5V13.1+ 15,6 EKRAN+İ7+6GB GTX1060+16RAM+256GB SS'][href*='/ilan/ikinci-el-ve-sifir-alisveris-bilgisayar-dizustu-notebook-monster-tulpar-']").get_attribute("href"))
If you want just the item in your example:
href = browser.find_element_by_xpath("//a[@title='MONSTER TULPAR T5V13.1+ 15,6 EKRAN+İ7+6GB GTX1060+16RAM+256GB SS']").get_attribute('href')
There is obviously more than one way of identifying your element; this is simply an example using XPath.
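If you want all such listing links rather than one specific item, a small sketch along the same lines (assuming every listing title uses the classifiedTitle class, as in the snippet above):

# Collect the href of every anchor that carries the classifiedTitle class.
links = [a.get_attribute('href')
         for a in browser.find_elements_by_css_selector('a.classifiedTitle')]
for link in links:
    print(link)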