Using Multiple XPaths in Scrapy Selector - Python

I have to crawl data with Scrapy like this:
<div class="data"
data-name="{"id":"566565", "name":"data1"}"
data-property="{"length":"444", "height":"678"}"
>
data1
</div>
<div class="data"
data-name="{"id":"566566", "name":"data2"}"
data-property="{"length":"555", "height":"777"}"
>
data2
</div>
I need data-name and data-property attributes. My selector is:
selections = Selector(response).xpath('//div[@class="data"]/attribute::data-property').extract()
How can I include data-name attribute in selections?

The following XPath should return both the data-property and data-name attributes:
//div[@class='data']/attribute::*[name()='data-property' or name()='data-name']
XPath Demo : http://www.xpathtester.com/xpath/e720602b62461f3600989be73eb15aec
If you need to return the two attributes as a pair in a certain format for each parent div, then this can't be done with pure XPath 1.0. Some Python would be required, maybe using a list comprehension (not tested):
selections = [div.xpath('concat(@data-property, " ", @data-name)').extract()
              for div in Selector(response).xpath('//div[@class="data"]')]
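If you would rather keep the two attributes paired per div in a structured form, here is a sketch (not from the original answer; it assumes the attribute values are valid JSON, as in the example, and that response is available, e.g. inside a spider callback):
import json
from scrapy.selector import Selector

selections = []
for div in Selector(response).xpath('//div[@class="data"]'):
    # each attribute holds a JSON string, e.g. {"id": "566565", "name": "data1"}
    name = json.loads(div.xpath('@data-name').extract_first())
    prop = json.loads(div.xpath('@data-property').extract_first())
    selections.append({'name': name, 'property': prop})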

Related

Python Selenium get parent element

My html code looks like:
<li>
<div class="level1">
<div id="li_hw2" class="toggle open" </div>
<ul style="" mw="220">
<li>
<div class ="level2">
...
</li>
</ul>
I am currently on the element with the id = "li_hw2", which was found by
level_1_elem = self.driver.find_element(By.ID, "li_hw2")
Now I want to go from level_1_elem to the element with class = "level2". Is it possible to go to the parent li and then to level2? Maybe with XPath?
Hint: It is necessary to go via the parent li and not directly to the element level2 with
self.driver.find_element(By.Class_Name, "level2")
The best-suited locator for your use case is XPath, since you want to traverse upwards as well as downwards in the HTML DOM.
level_1_elem = self.driver.find_element(By.XPATH, "//div[@id='li_hw2']")
and then, using the level_1_elem web element, you can go directly to the following-sibling:
level_1_elem.find_element(By.XPATH, ".//following-sibling::ul/descendant::div[@class='level2']")
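If you do have to go via the parent li, as the hint requires, one option is to climb to the nearest ancestor li and search back down from there. A sketch (assuming the ul really sits inside that same outer li):
# climb to the nearest ancestor li, then search back down for the level2 div
level_2_elem = level_1_elem.find_element(By.XPATH, "./ancestor::li[1]//div[@class='level2']")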
Are you sure about the HTML? I think the ul should group all the li elements; if that's the case, then it's easy. If not, I really don't get that HTML.
//div[#class="level1"]/parent::li/parent::ul/li/div[#class="level2"]

Using BeautifulSoup to scrape specific element within a CSS class

I'm trying to use BeautifulSoup in Python to scrape the 3rd li element within a CSS class. That said, I'm pretty new to this and am not sure of the best way to go about it.
In the example below, I'm trying to scrape the 170 votes from this list (in the real-world example there are hundreds of these on the page I'm looking to scrape, but they're all nested under the same CSS class within the 3rd li element).
<ul class="example-ul-class">
<li class="example-li-class">EXAMPLE NAME</li>
<li><i class="example-li-class">12 hours ago</time></li>
<li><i class="example-li-class"> 170 votes</li>
<li><i class="example-li-class">3 min read</li>
</ul>
I tried something like the code below, but I'm getting the error shown after it:
subtext = soup.select('.example-ul-class > li[2]')
print(subtext)
Error:
in selector_iter
raise SelectorSyntaxError(msg, self.pattern, index)
soupsieve.util.SelectorSyntaxError: Malformed attribute selector at position 29
line 1:
.example-ul-class > li[2]
Again, the desired output would be to return just the string '170 votes'.
Appreciate the help!
Instead of a CSS selector, try selecting using normal BS methods:
print(soup.find('ul',class_='example-ul-class').find_all('li')[2].text.strip())
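If you would rather stick with a CSS selector, soupsieve understands :nth-of-type, so something along these lines should also work (a sketch against the markup above; CSS indices start at 1):
subtext = soup.select_one('ul.example-ul-class > li:nth-of-type(3)')
print(subtext.get_text(strip=True))  # '170 votes'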

How to get access to the second <p> in Selenium (Python)?

I'm new to Python and Selenium. I have this code:
<div class="Product_ProductInfo__23DMi">
<p style="font-weight: bold;">4.50</p>
<p>Bread</p>
<p>390 g</p>
</div>
I want to get access to the second <p> tag and get its value (I mean Bread).
For the first <p> tag, I used:
self.driver.find_element_by_xpath('//div[@class="Product_ProductInfo__23DMi"]/p')
But I don't know how to get to the other one.
Thanks.
You can do that using the find_elements_by_css_selector() function and then selecting the second element from the result.
a = self.driver.find_element_by_css_selector('div[class="Product_ProductInfo__23DMi"]')
second_p = a.find_elements_by_css_selector('p')[1]
You can also use :nth-of-type(<index>) (the index starts at 1), which is a CSS pseudo-class:
a = self.driver.find_element_by_css_selector('div[class="Product_ProductInfo__23DMi"] > p:nth-of-type(2)')
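Since the question already locates the first <p> with XPath, indexing the p element should work as well (a sketch; XPath positions start at 1):
second_p = self.driver.find_element_by_xpath('//div[@class="Product_ProductInfo__23DMi"]/p[2]')
print(second_p.text)  # Bread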

Selenium - get all children divs but not grandchildren

I'm trying to parse an HTML file. There are many nested divs in this HTML. I want to get all the child divs, but not the grandchildren, etc.
Here is a pattern:
<div class='main_div'>
<div class='child_1'>
<div class='grandchild_1'></div>
</div>
<div class='child_2'>
...
...
</div>
So the command I'm looking for would return 2 elements: the divs whose classes are 'child_1' and 'child_2'.
Is it possible?
I've tried to use main_div.find_elements_by_tag_name('div') but it returned all nested divs in the div.
Here is a way to find the direct div children of the div with class name "main_div":
driver.find_elements_by_xpath('//div[@class="main_div"]/div')
The key here is the single slash, which makes the search inside "main_div" non-recursive, finding only direct div children.
Or, with a CSS selector:
driver.find_elements_by_css_selector("div.main_div > div")
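If you already have the main_div element, as in the question, a relative XPath keeps the search anchored to that element (a sketch using the same driver):
main_div = driver.find_element_by_xpath('//div[@class="main_div"]')
# the leading "./" restricts the match to direct div children of main_div
children = main_div.find_elements_by_xpath('./div')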

How to get text between < and > using Python Selenium

Further to the question here.
<a id='1234' href ="http://www.google.com' class='alpha' > MY TEXT </a>
<caption>
<em> ABCD </em>
</caption>
I want to extract the text between < and >, i.e.
id='1234' href ="http://www.google.com' class='alpha'
How can I do this using Python and Selenium?
Use the
web_element.get_attribute(attribute_name)
method on the web_element object to get the value of any attribute present in the web element, in this case id, href, or class.
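A minimal sketch against the markup above (assuming the <a> element can be located by its id):
link = driver.find_element_by_id('1234')
print(link.get_attribute('id'))     # 1234
print(link.get_attribute('href'))   # http://www.google.com
print(link.get_attribute('class'))  # alpha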
