Is not a valid XPath expression - python

I am trying to download an image from a web page using an XPath expression.
Part of the code:
with open('stiker.png', 'wb') as file:
    file.write(driver.find_element(By.XPATH, '//div[@class "_3IfUe"]/img[crossorigin = "anonymous"]').screenshot_as_png)
Part of the page source I'm trying to download:
<div class="_3IfUe">
<img crossorigin="anonymous" src="blob:https://web.whatsapp.com/9a74a410-721b-4e8e-80f0-42d18288f480"
alt="" draggable="true" class="gndfcl4n p357zi0d ppled2lx ac2vgrno gfz4du6o r7fjleex g0rxnol2 ln8gz9je b9fczbqn i0jNr" style="visibility: visible;">
</div>
Error
SyntaxError: Failed to execute 'evaluate' on 'Document': The string '//div[@class "_3IfUe"]/img[@crossorigin = "anonymous"]' is not a valid XPath expression.

As @John Gordon pointed out in his comment, you are missing an = between @class and the value "_3IfUe" that you are trying to compare.
After fixing that, you need an @ before the crossorigin attribute name. Otherwise, XPath treats crossorigin as the name of a child element rather than an attribute.
It should be:
//div[@class = "_3IfUe"]/img[@crossorigin = "anonymous"]
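To see the difference the @ makes, here is a minimal sketch using Python's standard-library xml.etree.ElementTree as a stand-in for the browser's XPath engine. This is an assumption for illustration: ElementTree supports only a subset of XPath, and the markup below is a simplified, well-formed copy of the snippet from the question with a placeholder src value. With @crossorigin the predicate tests an attribute; without it, the predicate looks for a child element named crossorigin.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed copy of the page fragment (wrapped in <body> so
# the div is a descendant of the root and './/div' can reach it).
# The src value is a placeholder, not the real blob URL.
page = ET.fromstring(
    '<body><div class="_3IfUe">'
    '<img crossorigin="anonymous" src="blob:example" alt=""/>'
    '</div></body>'
)

# Attribute test: matches the <img> whose crossorigin attribute is "anonymous".
with_at = page.find(".//div[@class='_3IfUe']/img[@crossorigin='anonymous']")

# Without @, the predicate asks for a child *element* named crossorigin
# whose text is "anonymous"; the <img> has no such child.
without_at = page.find(".//div[@class='_3IfUe']/img[crossorigin='anonymous']")

print(with_at is not None)   # True
print(without_at is None)    # True
```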

Related

BeautifulSoup4: Fail to find 'a' tag with specific href value by find()

I am trying to crawl the real-time Bitcoin-HKD price from https://www.coinbase.com/pt-PT/price/ with Python 3.
The only way I found to locate it specifically in the HTML is through this a tag with href="/pt-PT/price/bitcoin":
<a href="/pt-PT/price/bitcoin" title="Visite a moeda Bitcoin" data-element-handle="asset-highlight-top-daily-volume" class="Link__A-eh4rrz-0 hfBqui AssetHighlight__StyledLink-sc-1srucyv-1 cbFcph" color="slate">
<h2 class="AssetHighlight__Title-sc-1srucyv-2 jmJxYl">Volume mais alto (24 h)</h2>
<div class="Flex-l69ttv-0 gaVUrq">
<img src="https://dynamic-assets.coinbase.com/e785e0181f1a23a30d9476038d9be91e9f6c63959b538eabbc51a1abc8898940383291eede695c3b8dfaa1829a9b57f5a2d0a16b0523580346c6b8fab67af14b/asset_icons/b57ac673f06a4b0338a596817eb0a50ce16e2059f327dc117744449a47915cb2.png" alt="Visite a moeda Bitcoin" aria-label="Visite a moeda Bitcoin" loading="lazy" class="AssetHighlight__AssetImage-sc-1srucyv-5 lcjcxh"/>
<div class="Flex-l69ttv-0 kvilOX">
<div class="Flex-l69ttv-0 gTbYCC">
<h3 class="AssetHighlight__SubTitle-sc-1srucyv-3 gdcBEE">Bitcoin</h3>
<p class="AssetHighlight__Price-sc-1srucyv-4 bUAWAG">460 728,81 HK$</p>
Here 460 728,81 HK$ is the value I want.
So I used the following code:
import bs4
import urllib.request as req
url = "https://www.coinbase.com/price/bitcoin/hkd"
request = req.Request(url, headers={
    "user-agent": "..."
})
with req.urlopen(request) as response:
    data = response.read().decode("utf-8")
root = bs4.BeautifulSoup(data, "html.parser")
secBitcoin = root.find('a', href="/pt-PT/price/bitcoin")
realtimeCurrency = secBitcoin.find('p')
print(realtimeCurrency.string)
However, secBitcoin is always None; nothing matches.
The find function works just fine when I search for a 'div' tag with a class parameter.
I have also tried format like
.find('a[href="/pt-PT/price/bitcoin"]')
But nothing works.
It's possible the page loads the currency values with JavaScript after the initial page load. You could try pressing Ctrl+S to save the full web page and parse that file instead of requesting the URL. If that also doesn't work, then I'm not sure where the problem is.
If it does work, you'll probably need something like Selenium to get what you need.
href is an attribute of an element, so I think you cannot find it that way. You can pass a filter function instead:
def is_a_and_href_matching(element):
    # Match only <a> tags whose href is exactly the Bitcoin price path
    is_a = element.name == 'a'
    if is_a and element.has_attr('href'):
        if element['href'] == "/pt-PT/price/bitcoin":
            return True
    return False
secBitcoins = root.find_all(is_a_and_href_matching)
for secBitcoin in secBitcoins:
    p = secBitcoin.find('p')
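As a cross-check that matching on href is possible when the anchor is actually present in the downloaded markup, here is a small sketch that uses only the standard library's html.parser instead of BeautifulSoup (a swapped-in technique, chosen so the example runs without third-party packages). PriceFinder and the shortened sample markup are hypothetical. If a lookup like this succeeds on a saved copy of the page but fails on the downloaded HTML, that would point to the content being added after the initial page load.

```python
from html.parser import HTMLParser


class PriceFinder(HTMLParser):
    """Collect the text of the first <p> inside <a href=target_href>."""

    def __init__(self, target_href):
        super().__init__()
        self.target_href = target_href
        self.in_link = False
        self.in_p = False
        self.price = None

    def handle_starttag(self, tag, attrs):
        if tag == "a" and dict(attrs).get("href") == self.target_href:
            self.in_link = True
        elif tag == "p" and self.in_link and self.price is None:
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False
        elif tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            # Text may arrive in several chunks, so accumulate it.
            self.price = (self.price or "") + data


# Shortened stand-in for the markup shown in the question.
sample = '''<a href="/pt-PT/price/bitcoin" title="Visite a moeda Bitcoin">
<h3>Bitcoin</h3>
<p class="AssetHighlight__Price-sc-1srucyv-4 bUAWAG">460 728,81 HK$</p>
</a>'''

finder = PriceFinder("/pt-PT/price/bitcoin")
finder.feed(sample)
print(finder.price)  # 460 728,81 HK$
```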

XPATH: How to target value after certain text or tag

I am new to using **XPath** and this may be a basic question. Kindly bear with me and help me in resolving the issue.
Link is here...
I would like to target the text that comes after the "Contact:" text. The "Contact:" is wrapped in a <b> tag. I want to target the name-value after the <b> tag.
I tried this XPath expression: name = response.xpath('//div[@style="line-height: 1.5;"]/b').get(), but it only returns the Contact: text. I am interested in the text that comes after "Contact: ".
<div class="vcard">
<h2><a class="fn org" target="_blank" title="https://patientcaremedical.com" href="https://patientcaremedical.com" onclick="trackClick(32589, 0)">Patient Care Medical</a><sup> <font color="red" size="2"><b>New!</b></font></sup></h2>
<div style="line-height: 1.5;">
<a title="info [at] patientcaremedical [.] com" href="/Patient_Care_Medical/rfq/sid32589.htm"><b><font color="red">Click Here To EMAIL INQUIRY</font></b></a>
<br><b>Contact: </b>Michael Blanchette - Marketing Director
<br>
Try:
//div//div[@style="line-height: 1.5;"]/normalize-space(text()[3])
or possibly:
//div//div[@style="line-height: 1.5;"]//text()[preceding-sibling::br][1]
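Both suggestions depend on where the text node sits relative to its siblings. As a rough illustration of that structure, the sketch below uses the standard-library xml.etree.ElementTree, where text that follows an element is exposed as that element's tail. The markup is a simplified, well-formed copy of the question's fragment (the br tags are self-closed and the link is trimmed), so treat it as a stand-in for the Scrapy selector rather than a drop-in replacement. In an XPath 1.0 engine, an expression such as //b[contains(., "Contact:")]/following-sibling::text()[1] expresses the same idea.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed version of the inner div from the question.
div = ET.fromstring(
    '<div style="line-height: 1.5;">'
    '<a href="/Patient_Care_Medical/rfq/sid32589.htm">'
    '<b>Click Here To EMAIL INQUIRY</b></a>'
    '<br/><b>Contact: </b>Michael Blanchette - Marketing Director'
    '<br/></div>'
)

contact_name = None
for b in div.findall("b"):  # direct <b> children only
    if (b.text or "").strip() == "Contact:":
        # .tail holds the text between </b> and the next tag
        contact_name = (b.tail or "").strip()
        break

print(contact_name)  # Michael Blanchette - Marketing Director
```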

XPath delivering blank text

I am trying to pull the text out of a tag that follows an element I'm starting with. The HTML looks like this, with multiple entries of the same structure:
<h5>
Title
</h5>
<div class="author">
<p>"Author A, Author B"</p>
</div>
<div id="abstract-more#####" class="collapse">
<p>
<strong>Abstract:</strong>
"Text here..."
</p>
<p>...</p>
So once I've isolated a given title element/node (stored as 'paper'), I want to store the author and abstract text. It works when I use this to get the author:
author = paper.find_element_by_xpath("./following::div[contains(@class, 'author')]/p").text
But it returns blank output for 'abstract' when I use this:
abstract = paper.find_element_by_xpath("./following::div[contains(@id, 'abstract-more')]/p").text
Why does it work fine for the author but not for the abstract? I've tried using .// instead of ./ and other slight tweaks but to no avail. I also don't know why it's not giving an error out and saying it can't find the abstract element and is instead just returning a blank...
Try this:
//div[contains(@id, 'abstract-more')]/p[1]
Please use starts-with in the XPath instead of contains.
XPath: .//div[starts-with(@id, 'abstract-more')]/p
abstract = paper.find_element_by_xpath(".//div[starts-with(@id, 'abstract-more')]/p").text
You can try this XPath:
//div[@class="author"]/following-sibling::div[contains(@id,'abstract-more')]/p[1]
In code:
abstract = paper.find_element_by_xpath("//div[@class='author']/following-sibling::div[contains(@id,'abstract-more')]/p[1]")
print(abstract.text)
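One possible explanation, which the thread does not confirm: the abstract div carries class="collapse", which often means the block is hidden until it is expanded, and Selenium's .text only returns text that is rendered as visible. Under that assumption, reading the textContent attribute is a common fallback. The helper below is a sketch; element_text and FakeElement are hypothetical names, and FakeElement only imitates the two WebElement members used here so that the example runs without a browser.

```python
def element_text(element):
    """Return the element's visible text, or its textContent if that is empty."""
    text = element.text
    if not text:
        # Hidden elements report "" for .text; textContent ignores visibility.
        text = element.get_attribute("textContent") or ""
    return text.strip()


class FakeElement:
    """Stand-in for a Selenium WebElement that is hidden on the page."""

    text = ""

    def get_attribute(self, name):
        return " Abstract: Text here... " if name == "textContent" else None


print(element_text(FakeElement()))  # Abstract: Text here...
```

With a real driver, the same helper could be applied to the element returned by paper.find_element_by_xpath(...) in place of reading .text directly.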

Find command in python catches only first line

Trying to grab the magnet link from the following code
rawdata = ''' <div class="iaconbox center floatright">
<a rel="12624681,0" class="icommentjs kaButton smallButton rightButton" href="https://kat.cr/zootopia-2016-1080p-hdrip-x264-ac3-jyk-t12624681.html#comment">209 <i class="ka ka-comment"></i></a> <a class="icon16" href="https://kat.cr/zootopia-2016-1080p-hdrip-x264-ac3-jyk-t12624681.html" title="Verified Torrent"><i class="ka ka16 ka-verify ka-green"></i></a> <div data-sc-replace="" data-sc-slot="_ae58c272c09a10c792c6b17d55c20208" class="none" data-sc-params="{ 'name': 'Zootopia%202016%201080p%20HDRip%20x264%20AC3-JYK', 'extension': 'mkv', 'magnet': 'magnet:?xt=urn:btih:CE8357DED670F06329F6028D2F2CEA6F514646E0&dn=zootopia+2016+1080p+hdrip+x264+ac3+jyk&tr=udp%3A%2F%2Ftracker.publicbt.com%2Fannounce&tr=udp%3A%2F%2Fglotorrents.pw%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce' }"></div>
<a data-nop="" title="Torrent magnet link" href="magnet:?xt=urn:btih:CE8357DED670F06329F6028D2F2CEA6F514646E0&dn=zootopia+2016+1080p+hdrip+x264+ac3+jyk&tr=udp%3A%2F%2Ftracker.publicbt.com%2Fannounce&tr=udp%3A%2F%2Fglotorrents.pw%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce" class="icon16 askFeedbackjs" data-id="CE8357DED670F06329F6028D2F2CEA6F514646E0"><i class="ka ka16 ka-magnet"></i></a>
<a data-download="" title="Download torrent file" href="https://kat.cr/torrents/zootopia-2016-1080p-hdrip-x264-ac3-jyk-t12624681/" class="icon16 askFeedbackjs"><i class="ka ka16 ka-arrow-down"></i></a>
</div> '''
Using this command
rawdata[rawdata.find("<")+1:rawdata.find(">")]
Gives me
div class="iaconbox center floatright"
But when I try to find Magnet link
rawdata[rawdata.find("href="magnet:?")+1:rawdata.find(""")]
It gives me
' '
What I actually want it to give me
magnet:?xt=urn:btih:CE8357DED670F06329F6028D2F2CEA6F514646E0&dn=zootopia+2016+1080p+hdrip+x264+ac3+jyk&tr=udp%3A%2F%2Ftracker.publicbt.com%2Fannounce&tr=udp%3A%2F%2Fglotorrents.pw%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
It's so easy with Shell, but it has to be done with Python itself.
Try rawdata[rawdata.find('href="magnet:?')+1:rawdata.find('"')]
It's better to use a regular expression.
import re
rawdata = '''your rawdata......'''
regex = re.compile('href="(.+)" class="icon16')
magnet_href = regex.search(rawdata).group(1)
First of all, as pointed out by HenryM, you need to use single quotes or escape the " to make the strings valid.
Second, find() always returns the index of the first match. So you will find the first " in the document and not the one ending the link. To fix this, pass a start index as the second argument to find() so that the search begins after the opening quote.
Additionally, find() gives you the index where the match starts, not where it ends, so you need to add the length of the part you want to skip (here href=") to the start index. The code would look something like this:
start = rawdata.find('href="magnet:?') + len('href="')  # first character of the link
end = rawdata.find('"', start)  # closing quote; find() takes the start index positionally
link = rawdata[start:end]
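The index arithmetic described above can be sketched as a small function. extract_between is a hypothetical helper name, and the sample string is a shortened stand-in for rawdata. It also guards against find() returning -1 when nothing matches, which would otherwise silently produce a wrong slice.

```python
def extract_between(text, marker, skip, closing='"'):
    """Return the text from marker (minus its first `skip` chars) up to `closing`."""
    pos = text.find(marker)
    if pos == -1:
        return None  # marker not present
    start = pos + skip
    end = text.find(closing, start)  # search only after the opening quote
    if end == -1:
        return None  # no closing delimiter
    return text[start:end]


# Shortened stand-in for rawdata with a non-magnet link first.
sample = (
    '<a href="https://example.invalid/page">x</a> '
    '<a href="magnet:?xt=urn:btih:ABC" class="icon16">m</a>'
)
link = extract_between(sample, 'href="magnet:?', skip=len('href="'))
print(link)  # magnet:?xt=urn:btih:ABC
```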
The input data is an HTML fragment. You should not be using regular expressions to parse it.
Use a parser instead. Here is a working sample using BeautifulSoup HTML parser:
from bs4 import BeautifulSoup
rawdata = ''' <div class="iaconbox center floatright">
<a rel="12624681,0" class="icommentjs kaButton smallButton rightButton" href="https://kat.cr/zootopia-2016-1080p-hdrip-x264-ac3-jyk-t12624681.html#comment">209 <i class="ka ka-comment"></i></a> <a class="icon16" href="https://kat.cr/zootopia-2016-1080p-hdrip-x264-ac3-jyk-t12624681.html" title="Verified Torrent"><i class="ka ka16 ka-verify ka-green"></i></a> <div data-sc-replace="" data-sc-slot="_ae58c272c09a10c792c6b17d55c20208" class="none" data-sc-params="{ 'name': 'Zootopia%202016%201080p%20HDRip%20x264%20AC3-JYK', 'extension': 'mkv', 'magnet': 'magnet:?xt=urn:btih:CE8357DED670F06329F6028D2F2CEA6F514646E0&dn=zootopia+2016+1080p+hdrip+x264+ac3+jyk&tr=udp%3A%2F%2Ftracker.publicbt.com%2Fannounce&tr=udp%3A%2F%2Fglotorrents.pw%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce' }"></div>
<a data-nop="" title="Torrent magnet link" href="magnet:?xt=urn:btih:CE8357DED670F06329F6028D2F2CEA6F514646E0&dn=zootopia+2016+1080p+hdrip+x264+ac3+jyk&tr=udp%3A%2F%2Ftracker.publicbt.com%2Fannounce&tr=udp%3A%2F%2Fglotorrents.pw%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce" class="icon16 askFeedbackjs" data-id="CE8357DED670F06329F6028D2F2CEA6F514646E0"><i class="ka ka16 ka-magnet"></i></a>
<a data-download="" title="Download torrent file" href="https://kat.cr/torrents/zootopia-2016-1080p-hdrip-x264-ac3-jyk-t12624681/" class="icon16 askFeedbackjs"><i class="ka ka16 ka-arrow-down"></i></a>
</div> '''
soup = BeautifulSoup(rawdata, "html.parser")
print(soup.find("a", title="Torrent magnet link")["href"])
Prints:
magnet:?xt=urn:btih:CE8357DED670F06329F6028D2F2CEA6F514646E0&dn=zootopia+2016+1080p+hdrip+x264+ac3+jyk&tr=udp%3A%2F%2Ftracker.publicbt.com%2Fannounce&tr=udp%3A%2F%2Fglotorrents.pw%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce

Checking for a field error using Selenium Webdriver

I've been trying to implement tests to check for field validation in forms. A check for specific field error messages was straightforward, but I've also tried a generic check to identify the parent element of a field for an error class. This however isn't working.
A field with an error has the following HTML:
<div class="field clearfix error ">
<div class="error">
<p>Please enter a value</p>
</div>
<label for="id_fromDate">
<input id="id_fromDate" type="text" value="" name="fromDate">
</div>
So to check for an error I've got the following function:
def assertValidationFail(self, field_id):
    # Checks for a div.error sibling element
    el = self.find(field_id)
    try:
        error_el = el.find_element_by_xpath('../div[@class="error"]')
    except NoSuchElementException:
        error_el = None
    self.assertIsNotNone(error_el)
So el is the input field, but then the xpath always fails. I believed that ../ went up a level in the same way that command line navigation does - is this not the case?
I misunderstood your question earlier. You may try the following logic: find the parent div, then check whether its class contains error, rather than looking for a div.error and catching NoSuchElementException.
.. moves up one level to the parent, so ../div selects the div children of that parent, i.e. the siblings of the field.
# Untested, only the logic
parent_div = el.find_element_by_xpath("..")  # the parent div
self.assertTrue("error" in parent_div.get_attribute("class"))
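To illustrate what .. and ../div select, here is a sketch with the standard library's xml.etree.ElementTree standing in for Selenium (an assumption for illustration; ElementTree resolves .. only inside a path that starts from an ancestor). The markup is a well-formed variant of the question's HTML in which the input is a direct sibling of div.error, something the original snippet does not make certain because its label tag is never closed.

```python
import xml.etree.ElementTree as ET

# Well-formed variant of the question's markup, wrapped in a <form> root.
form = ET.fromstring(
    '<form><div class="field clearfix error ">'
    '<div class="error"><p>Please enter a value</p></div>'
    '<label for="id_fromDate"></label>'
    '<input id="id_fromDate" type="text" value="" name="fromDate"/>'
    '</div></form>'
)

# ".." steps from the input to its parent div.
parent = form.find(".//input[@id='id_fromDate']/..")
# "../div[...]" then selects a div child of that parent, i.e. a sibling.
error_div = form.find(".//input[@id='id_fromDate']/../div[@class='error']")

print("error" in parent.get("class").split())  # True
print(error_div.find("p").text)                # Please enter a value
```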
When you're using a relative XPath (based on an existing element), it needs to start with ./ like this:
el.find_element_by_xpath('./../div[@class="error"]')
Only after the ./ can you start specifying XPath nodes etc.
