BeautifulSoup - how to select separate values? [closed] - python

In my scraper I use the .select("div.class-name") method, but I have a problem: it returns the values joined together instead of separately.
Structure of my HTML:
<div class="class-name">
<div>Text1</div>
<div>Text2</div>
<div>Text3</div>
</div>
As a result, it gives me a list ["Text1Text2Text3"]. Is there any way to get the values separately, as they are in the HTML?

You mean like this?
from bs4 import BeautifulSoup
sample_html = '''<div class="class-name">
<div>Text1</div>
<div>Text2</div>
<div>Text3</div>
</div>'''
print(BeautifulSoup(sample_html, "lxml").select("div.class-name div"))
Output:
[<div>Text1</div>, <div>Text2</div>, <div>Text3</div>]
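If the goal is a list of plain strings rather than tags, one possible follow-up is to call get_text() on each match. This is only a sketch: it assumes the values are direct children of the outer div (hence the `>` child combinator) and uses the built-in "html.parser" instead of "lxml" so nothing extra has to be installed.

```python
from bs4 import BeautifulSoup

sample_html = """<div class="class-name">
<div>Text1</div>
<div>Text2</div>
<div>Text3</div>
</div>"""

soup = BeautifulSoup(sample_html, "html.parser")

# "div.class-name > div" matches only the direct child divs,
# so nested divs deeper in the tree would not be returned twice.
texts = [div.get_text(strip=True) for div in soup.select("div.class-name > div")]
print(texts)
```

With the sample markup above this prints ['Text1', 'Text2', 'Text3'].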

Related

How to add new value to a class with lxml [closed]

I'm not sure if it's even possible, to be honest. All I'm trying to do is dynamically edit the text.
HTML:
<aside class="banner">
Place <span class=red>open</span></a>
</aside>
Python:
reds = root.find_class("red")
for element in reds:
    ...  # not sure what goes here
I already have code that I can use to edit the text remotely.
I know you are asking about lxml, but a great alternative for HTML files is bs4.
With bs4/BeautifulSoup it looks like this:
for element in soup.find_all("span", {"class": "red"}):
    element.string = NEW_VALUE
with open("out.html", "w") as out_file:
    out_file.write(str(soup))
https://github.com/poleszcz/stack-misc/blob/main/69193852-bs4-edit-content/edit.py
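Since the question asks about lxml specifically, here is a rough sketch of the same edit with lxml.html. It assumes root was built with lxml.html (which is where find_class comes from), that the span has no child elements, and that NEW_VALUE is whatever replacement text you want. The markup is adapted from the question with the stray closing tag left out.

```python
import lxml.html

# Markup adapted from the question; the stray </a> is omitted.
html = '<aside class="banner">Place <span class="red">open</span></aside>'
NEW_VALUE = "closed"  # hypothetical replacement text

root = lxml.html.fromstring(html)
for element in root.find_class("red"):
    # .text holds the text before the element's first child;
    # for a span with no children that is its whole content.
    element.text = NEW_VALUE

result = lxml.html.tostring(root, encoding="unicode")
print(result)
```

If the span contained nested tags, setting .text alone would leave them in place, so this only covers the simple case shown in the question.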

I am unable to extract data between <!--<tbody> and </tr>--> [closed]

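siteData = notificationData(url = '')
soup = BeautifulSoup(siteData, 'html.parser')
# find_all() returns a list of tables, so loop over it before searching for rows
for table in soup.find_all('table'):
    for tr in table.find_all('tr'):
        print(tr)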
BeautifulSoup won't extract tags that are commented out. You could remove the comment markers from the markup before loading it into BeautifulSoup:
siteData = siteData.replace('<!--', '').replace('-->', '')
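An alternative that leaves the original markup untouched is to pick out the comment nodes and parse their contents separately. This is only a sketch, and the sample markup below is invented to stand in for the real page.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup standing in for the page in the question.
site_data = """<table>
<!--<tbody>
<tr><td>Row 1</td></tr>
<tr><td>Row 2</td></tr>
</tbody>-->
</table>"""

soup = BeautifulSoup(site_data, "html.parser")

rows = []
# Comments are kept in the tree as Comment strings, so they can be filtered out.
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    # Parse the commented-out markup on its own.
    hidden = BeautifulSoup(comment, "html.parser")
    for tr in hidden.find_all("tr"):
        rows.append(tr.get_text(strip=True))

print(rows)
```

Compared with the replace() approach, this will not be thrown off by ordinary comments that contain plain text rather than markup, at the cost of a second parse per comment.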

Grab specific text from string in Python [closed]

How can I grab the invite code from this string?
{awarded:1,inviteURL:https:\/\/www.example.com\/refer\/invite\/111A111A\/}
The expected output would be "111A111A".
Any help is appreciated
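I tried it in a simple way with slicing, which only works if the code is always 8 characters long and the string always ends the same way. You could give more details for further improvement.
s = r"{awarded:1,inviteURL:https:\/\/www.example.com\/refer\/invite\/111A111A\/}"
print(s[-11:-3])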
This will do it with a regex:
import re
def findInvite(s):
    return re.search(r"(?<=/invite\\/).*(?=\\/)", s).group()
assert findInvite(r"{awarded:1,inviteURL:https:\/\/www.example.com\/refer\/invite\/111A111A\/}") == "111A111A"
And if this isn't a string but a dict, then change the function to:
def findInvite(d):
    s = d["inviteURL"]
    return re.search(r"(?<=/invite\\/).*(?=\\/)", s).group()
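A variation on the regex answer, offered as a sketch: a capture group instead of lookarounds, with the backslash made optional so it also works on an unescaped URL, and a None return when nothing matches (calling .group() on a failed search raises AttributeError). The name find_invite is just an illustrative choice.

```python
import re

# "/invite", an optional backslash, "/", then the code:
# one or more characters that are neither a backslash nor a slash.
INVITE_RE = re.compile(r"/invite\\?/([^\\/]+)")

def find_invite(s):
    """Return the invite code found in s, or None if there is none."""
    match = INVITE_RE.search(s)
    return match.group(1) if match else None

escaped = r"{awarded:1,inviteURL:https:\/\/www.example.com\/refer\/invite\/111A111A\/}"
print(find_invite(escaped))
print(find_invite("https://www.example.com/refer/invite/111A111A/"))
print(find_invite("no invite here"))
```

The first two calls print 111A111A and the last prints None.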

How do you find patterns/combinations within a String in Python? [closed]

I'm looking to return all instances of the following in Python, but I'm not sure how. That is, how can I search a string and print the value every time the following format is found:
<a href="[what I'm trying to return is here]" class="faux-block-link__overlay-link"
You need an HTML parser, like BeautifulSoup. Sample:
>>> from bs4 import BeautifulSoup
>>>
>>> s = '''<a href="[what I'm trying to return is here]" class="faux-block-link__overlay-link">link</a>'''
>>> BeautifulSoup(s, "html.parser").a["href"]
"[what I'm trying to return is here]"
where .a is equivalent to .find("a"). Note that BeautifulSoup provides a convenient dictionary-like access to element attributes.
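Since the question asks for every occurrence, not just the first, a possible extension is find_all() filtered by the class from the question. The sample markup and URLs below are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment; the URLs are invented for this example.
html = '''
<a href="/news/first" class="faux-block-link__overlay-link">First</a>
<a href="/news/second" class="faux-block-link__overlay-link">Second</a>
<a href="/other" class="unrelated">Other</a>
'''

soup = BeautifulSoup(html, "html.parser")

# class_ filters on the CSS class; href=True skips anchors without an href.
links = soup.find_all("a", class_="faux-block-link__overlay-link", href=True)
hrefs = [a["href"] for a in links]
print(hrefs)
```

This prints ['/news/first', '/news/second'] and leaves out the link with a different class.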

grep with python to match string inside quotes in html files [closed]

I am a newbie with grep, but I'm familiar with Python. My problem is to find every string inside quotes, like "text", and replace it with <em>text</em>.
The source file is HTML.
Thanks
That'll do the trick:
import re
s = '"text" "some"'
# re.sub returns the new string directly (re.subn also returns the replacement count)
res = re.sub(r'"([^"]*)"', r'<em>\1</em>', s)
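One caveat with real HTML files: the pattern above also matches quoted attribute values such as class="note". A rough workaround, sketched here with only the standard library, is to split the markup on tags and substitute only in the text between them. It assumes no tag contains a literal ">" inside an attribute and that a quoted phrase does not span a tag; the function name is made up for this example.

```python
import re

TAG = re.compile(r"(<[^>]*>)")     # capturing group, so re.split keeps the tags
QUOTED = re.compile(r'"([^"]*)"')  # same pattern as in the answer above

def emphasize_quotes(html):
    """Wrap quoted text in <em>, leaving quotes inside tags alone."""
    parts = TAG.split(html)
    # After the split, even indexes hold text and odd indexes hold tags.
    for i in range(0, len(parts), 2):
        parts[i] = QUOTED.sub(r"<em>\1</em>", parts[i])
    return "".join(parts)

print(emphasize_quotes('<p class="note">He said "hi" and "bye"</p>'))
```

For anything more irregular than this, an HTML parser is the safer tool.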
