This is a link to the HTML I want to scrape:
https://pk.khaadi.com/unstitched/r20206-red-r20206-red-pk.html
<div class="swatch-attribute-options clearfix">
<div class="swatch-option color selected" option-type="1" option-
id="61" option-label="RED" option-tooltip-thumb="" option-tooltip-
value="#ee0000" "="" style="background: #ee0000 no-repeat center;
background-size: initial;">
</div>
<div class="swatch-option color selected" option-type="1" option-
id="73" option-label="YELLOW" option-tooltip-thumb="" option-tooltip-
value="#feed00" "="" style="background: #feed00 no-repeat center;
background-size: initial;">
</div>
</div>
Color = S_Driver.find_elements_by_xpath('//*[@id="product-options-wrapper"]/div/div/div[1]/div')
The XPath points to the outer div in which both color divs are present.
n_Color = []
for c in Color:
    n_Color.append(c.get_attribute('option-label'))
print(n_Color)
This is how I tried to extract the color through the 'option-label' attribute.
Change the XPath to:
//div[@class='swatch-option color']
This is based on the provided screenshot; hopefully there are no other matches for it on the page. If there are, change it to:
//div[@class='swatch-option color' and @option-type='1']
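For completeness, here is a minimal sketch of how that locator could feed the question's loop, using current Selenium syntax. The contains() variant is my own adjustment, since the pasted HTML shows the class as "swatch-option color selected" rather than exactly "swatch-option color"; adjust it if your page differs.
from selenium.webdriver.common.by import By

# S_Driver is the question's existing WebDriver instance, already on the product page
swatches = S_Driver.find_elements(
    By.XPATH, "//div[contains(@class, 'swatch-option') and contains(@class, 'color')]"
)
n_Color = [s.get_attribute('option-label') for s in swatches]
print(n_Color)  # expected something like ['RED', 'YELLOW'] for the snippet above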
I'd like to extract and use the red characters in the HTML below through Selenium, so please give me some advice on how to do it.
The characters change randomly on every attempt.
<td>
<input type="text" name="WKey" id="As_wkey" value="" maxlength="10" class="inputType" style="width: 300px;" title="password" />
<span id="myspam" style="padding: 2px;">
<span style="font-size: 12pt; font-weight: bold; color: red;">H</span>123
<span style="font-size: 12pt; font-weight: bold; color: red;">R</span>
<span style="font-size: 12pt; font-weight: bold; color: red;">8</span>6789
</span>
(type red word.)
</td>
Here is my code:
red_characters_element = driver.find_element(By.ID, 'myspam')
red_characters_elements = red_characters_element.find_elements(by=By.CSS_SELECTOR, value="span[style='font-size: 12pt; font-weight: bold; color: red;']")
print(red_characters_elements)
Result: []
Given that all the red-colored characters are inside <span> tags, you can retrieve them by tag name.
red_characters_element = driver.find_element(By.ID, 'myspam')
red_characters_elements = red_characters_element.find_elements(By.TAG_NAME, 'span')
for red_character in red_characters_elements:
    print(red_character.text)
Results:
H
R
8
If you need only the red letters, you can try running JavaScript from inside Selenium:
driver.execute_script('return document.querySelectorAll("[style*=red]")')
You get back a list of elements whose style attribute contains "red"; with a for loop you can read their text or anything else you need.
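As a concrete sketch of that idea, continuing with the driver from the question. I have scoped the selector to the #myspam container from the question's HTML, which is an assumption so that unrelated red-styled elements elsewhere on the page are not picked up.
# driver is the question's existing WebDriver instance;
# Selenium converts the returned NodeList into a Python list of WebElements
red_elements = driver.execute_script(
    'return document.querySelectorAll("#myspam [style*=red]")'
)
red_text = "".join(el.text for el in red_elements)
print(red_text)  # for the sample markup above this would print "HR8"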
I have some HTML which includes four pictures in a 2x2 row/column layout:
[Picture of the HTML output]
I want to convert this into a PDF, which I am trying to do with the following lines of code:
options = {"enable-local-file-access": None}
config = pdfkit.configuration(wkhtmltopdf=path_wkhtmltopdf)
pdfkit.from_file(path_IN, path_OUT, configuration=config, options=options)
This produces the following PDF output:
[Picture of the PDF file output]
Does anyone know how I can produce a PDF that looks the same as the HTML does?
Currently, I am generating my 2x2 structure with the following HTML:
<div class = "row_container">
<div class = "row">
<p class="caption">Figure 1: Real yield vs. Duration</p>
<img src="IL_fig1.png" width = "{width}" height = "{height}">
</div>
<div class = "row">
<p class="caption">Figure 3: Inflation</p>
<img src="IL_fig3.png" width = "{width}" height = "{height}">
</div>
</div>
<div class = "row_container">
<div class = "row">
<p class="caption">Figure 2: Break-even-inflation vs. Duration</p>
<!--<p>BEI, seasonal adjustment</p>-->
<img src="IL_fig2.png" width = "{width}" height = "{height}">
</div>
<div class = "row">
<p class="caption">Figure 4: Break-even-inflation, 10Y inflation-linked bonds</p>
<img src="IL_fig4.png" width = "{width}" height = "{height}">
</div>
</div>
Together with some CSS styling:
.row_container {
display: flex;
justify-content: space-between;
-webkit-box-pack: center;
overflow: hidden;
padding: 0px 3px 0px 3px;
}
.row {
-webkit-box-flex: 1;
-webkit-flex: 1;
/*flex: 1;*/
}
I have tried to scale the pictures, but this did not help either:
[Picture of the PDF output with scaled images]
I would like to have 2 pictures side by side in 2 rows as in the output from the HTML.
Appreciate the help.
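One thing that may be worth checking: wkhtmltopdf renders with an older WebKit engine whose flexbox support is limited, which can break flex-based grids that look fine in a modern browser. Below is a minimal, non-definitive sketch of two knobs to try from the pdfkit side, passing the stylesheet explicitly and turning off smart shrinking. The file names are placeholders for this example; adjust them to your own paths.
import pdfkit

path_wkhtmltopdf = r"/usr/local/bin/wkhtmltopdf"  # adjust to your install
path_IN = "report.html"   # placeholder input path
path_OUT = "report.pdf"   # placeholder output path

config = pdfkit.configuration(wkhtmltopdf=path_wkhtmltopdf)
options = {
    "enable-local-file-access": None,   # lets wkhtmltopdf load the local PNG files
    "disable-smart-shrinking": None,    # stops wkhtmltopdf from rescaling the layout
}

# from_file also accepts a css argument; passing the grid stylesheet explicitly
# makes sure wkhtmltopdf applies it even if the <link> in the HTML is not resolved.
pdfkit.from_file(path_IN, path_OUT, configuration=config, options=options, css="style.css")
Whether this fully reproduces the 2x2 layout depends on how much of the flex CSS the bundled WebKit understands, so treat it as a starting point rather than a guaranteed fix.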
I'm trying to write an automation script for tests. My issue is that I can't select a value from a drop-down menu. I've tried a lot of things but can't make it work. My goal is for the script to choose a different value from the menu every time. When I click the hidden menu, it creates a ul with about 100 li elements. There is no id, name, or class I can hook on to. I don't know how to reach an element in there and click it.
Things that I've tried...
elem = driver.find_element_by_xpath('/html/body/div[3]/div[3]')
all_li = elem.find_elements_by_tag_name("li")
gg = random.choice(all_li)
gg = driver.find_element_by_css_selector("ul > li:nth-child(15)").click()
HTML code: this is what gets generated when I hit the menu.
This is my code:
driver.find_element_by_xpath("/html/body/div[1]/div/main/div/div[2]/form/div[2]/div[1]/div/div/div").click()
Simple html:
<div class="MuiPaper-root MuiMenu-paper MuiPaper-elevation8 MuiPopover-paper MuiPaper-rounded" role="document" tabindex="-1" style="opacity: 1; transform: none; min-width: 491px; transition: opacity 381ms cubic-bezier(0.4, 0, 0.2, 1) 0ms, transform 254ms cubic-bezier(0.4, 0, 0.2, 1) 0ms; top: 80px; left: 16px; transform-origin: -1px 478.513px;">
<ul class="MuiList-root MuiMenu-list MuiList-padding" role="listbox" tabindex="-1" style="padding-right: 17px; width: calc(100% + 17px);">
<li class="MuiButtonBase-root MuiListItem-root MuiMenuItem-root MuiMenuItem-gutters MuiListItem-gutters MuiListItem-button" tabindex="-1" role="option" aria-disabled="false" variant="outlined" data-value="testOne">Test One<span class="MuiTouchRipple-root"></span></li>
<li class="MuiButtonBase-root MuiListItem-root MuiMenuItem-root MuiMenuItem-gutters MuiListItem-gutters MuiListItem-button" tabindex="-1" role="option" aria-disabled="false" variant="outlined" data-value="testTwo">Test Two<span class="MuiTouchRipple-root"></span></li>
I'm trying to extract the data that is in the next div's span based on the previous div-span text. Below is the HTML content:
<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:37px; top:161px; width:38px; height:13px;"><span style="font-family: b'Times-Bold'; font-size:13px">Name
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:85px; top:161px; width:58px; height:13px;"><span style="font-family: b'Helvetica'; font-size:13px">Ven
<br></span></div>
I'm trying to find the span using:
n_field = soup.find('span', text="Name")
And then trying to get the text from the next sibling using:
n_field.next_sibling()
However, due to the "\n" in the field, I'm unable to find the span and then extract the next_sibling text.
In short, I'm trying to form a dict in the following format:
{"Name": "Ven"}
Any help or idea on this is appreciated.
You could use re instead of bs4.
import re
html = """
<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:37px; top:161px; width:38px; height:13px;">
<span style="font-family: b'Times-Bold'; font-size:13px">Name
<br>
</span>
</div>
<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:85px; top:161px; width:58px; height:13px;">
<span style="font-family: b'Helvetica'; font-size:13px">Ven
<br>
</span>
"""
mo = re.search(r'(Name).*?<span.*?13px">(.*?)\n', html, re.DOTALL)
print(mo.groups())
# for consecutive cases use re.finditer or re.findall
html *= 5
mo = re.finditer(r'(Name).*?<span.*?13px">(.*?)\n', html, re.DOTALL)
for match in mo:
    print(match.groups())
for (key, value) in re.findall(r'(Name).*?<span.*?13px">(.*?)\n', html, re.DOTALL):
    print(key, value)
I had a go at this, and for some reason, even after removing the \n, I could not get the nextSibling() to work, so I tried a different tactic, as shown below:
from bs4 import BeautifulSoup
"""Lets get rid of the \n"""
html = """<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:37px; top:161px; width:38px; height:13px;"><span style="font-family: b'Times-Bold'; font-size:13px">Name<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:85px; top:161px; width:58px; height:13px;"><span style="font-family: b'Helvetica'; font-size:13px">Ven<br></span></div>""".replace("\n","")
soup = BeautifulSoup(html, "html.parser")
span_list = soup.find_all("span")
result = {span_list[0].text:span_list[1].text.replace(" ","")}
And that gives result as:
{'Name': 'Ven'}
I am trying to use BeautifulSoup 4 to extract text from specific tags in an HTML document. I have HTML that has a bunch of div tags like the following:
<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:42px; top:90px; width:195px; height:24px;">
<span style="font-family: FIPXQM+Arial-BoldMT; font-size:12px">
Futures Daily Market Report for Financial Gas
<br/>
21-Jul-2015
<br/>
</span>
</div>
<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:54px; top:135px; width:46px; height:10px;">
<span style="font-family: FIPXQM+Arial-BoldMT; font-size:10px">
COMMODITY
<br/>
</span>
</div>
I am trying to get the text from all span tags that are in any div tag that has a style of "left:54px".
I can get a single div if I use:
soup = BeautifulSoup(open(extracted_html_file))
print soup.find_all('div',attrs={"style":"position:absolute; border: textbox 1px solid; "
"writing-mode:lr-tb; left:42px; top:90px; "
"width:195px; height:24px;"})
It returns:
[<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:42px; top:90px; width:195px; height:24px;"><span style="font-family: FIPXQM+Arial-BoldMT; font-size:12px">Futures Daily Market Report for Financial Gas
<br/>21-Jul-2015
<br/></span></div>]
But that only gets me the one div that exactly matches that styling. I want all divs that match only the "left:54px" style.
To do this, I've tried a few different ways:
soup = BeautifulSoup(open(extracted_html_file))
print soup.find_all('div',style='left:54px')
print soup.find_all('div',attrs={"style":"left:54px"})
print soup.find_all('div',attrs={"left":"54px"})
But all these print statements return empty lists.
Any ideas?
You can pass in a regular expression instead of a string according to the documentation here: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#the-keyword-arguments
So I would try this:
import re
soup = BeautifulSoup(open(extracted_html_file))
soup.find_all('div', style=re.compile('left:54px'))
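Building on that, a short sketch of how the matching divs could then be reduced to just their span text. The file path is a placeholder standing in for the question's extracted_html_file, and the separator argument to get_text is my own choice so the date is not glued onto the report title.
import re
from bs4 import BeautifulSoup

extracted_html_file = "page.html"  # placeholder; use the question's file path
soup = BeautifulSoup(open(extracted_html_file), "html.parser")

for div in soup.find_all('div', style=re.compile('left:54px')):
    for span in div.find_all('span'):
        # get_text(" ", strip=True) joins the text pieces around the <br/> tags with a space
        print(span.get_text(" ", strip=True))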