Unable to retrieve innerText/innerHTML in Python - python

Hovering over innerText shows the text data but not through Python
I am trying to retrieve innertext or innerHTML from the HTML from this website (see attached image). The HTML saved/printed from BeautifulSoup does not have the content seen in the attached image of the innerText.
import requests, re
from bs4 import BeautifulSoup
r=requests.get("https://jobs.ca.gov/CalHRPublic/Search/JobSearchResults.aspx#classid=441")
c=r.content
soup=BeautifulSoup(c,"html.parser")
print (soup.prettify())
When I inspect the page in Google Chrome , click on the div block and copy the HTML, the copied HTML from Chrome inspect has all the data I am looking for.
How do I get the same data in Python or do I have to use Selenium?
<div class="card-block" id="collapse1234" itemscope="" itemtype="http://schema.org/Organization" role="tablist" aria-multiselectable="true">
<div class="row" role="presentation">
<div class="col-md-10 " role="presentation">
<a id="cphMainContent_rptResults_hlViewJobPosting_0" class="lead visitedLink" href="/CalHrPublic/Jobs/JobPosting.aspx?JobControlId=70488">ACCOUNTING ADMINISTRATOR I (SPECIALIST)</a>
</div>
<div class="col-md-2 tar">
<div id="cphMainContent_rptResults_pnlFavoriteJob_0" class="aspNetDisabled" style="display: inline;">
<i id="cphMainContent_rptResults_iIsNotFavorite_0" class="fa fa-star-o" aria-hidden="true" style="cursor:default;color:grey;opacity:.6;" title="You must be logged in to save a job as a Favorite." onclick="">
Log in to save job
</i>
<i id="cphMainContent_rptResults_iIsFavorite_0" class="fa fa-star" title="This job is saved" style="color:#fdb81e;cursor:pointer;display:none;" aria-hidden="true" onclick="removeUserFavorite(70488, $(this) );"> Job saved</i>
</div>
</div>
</div>
<div class="row" role="presentation">
<div class="col-sm-12 col-md-9" role="presentation">
<div class="row">
<div class="col-xs-12 col-sm-6" role="presentation">
<div class="working-title details row">
<div class="col-xs-6 job-label">Working Title:</div>
<div class="col-xs-6 job-details">
<span title="Keyword Relevance: 0">N/A</span>
</div>
</div>
<div class="position-number details row">
<div class="col-xs-6 job-label">Job Control:</div>
<div class="col-xs-6 job-details">
70488
</div>
</div>
<div class="salary-range details row">
<div class="col-xs-6 job-label">Salary Range:</div>
<div class="col-xs-6 job-details">
$5053.00 - $6325.00
</div>
</div>
<div class="schedule details row">
<div class="col-xs-6 job-label">Work Type/Schedule:</div>
<div class="col-xs-6 job-details">
Permanent Fulltime
</div>
</div>
</div>
<div class="col-xs-12 col-sm-6" role="presentation">
<div class="department details row">
<div class="col-xs-6 job-label">Department:</div>
<div class="col-xs-6 job-details">
Board of Equalization
</div>
</div>
<div class="location details row">
<div class="col-xs-6 job-label">Location:</div>
<div class="col-xs-6 job-details">
Sacramento County
</div>
</div>
<div class="filing-date details row">
<div class="col-xs-6 job-label">Publish Date:</div>
<div class="col-xs-6 job-details">
<time datetime="2016-06-30">
6/29/2017</time>
</div>
</div>
</div>
</div>
</div>
<div class="col-sm-12 col-md-3 align-right" role="presentation">
<div class="filing-date details row">
<div class="col-xs-12">
<div class="job-label">Filing Deadline:</div>
<div class="job-details">
<time datetime="2016-06-30">
7/14/2017
</time>
</div>
</div>
<div class="col-xs-12">
<a id="cphMainContent_rptResults_hlViewPosting_0" class="btn btn-secondary btn-block" href="/CalHrPublic/Jobs/JobPosting.aspx?JobControlId=70488">
<span class="ca-gov-icon-search"></span>
<span>View Job Posting</span>
</a>
</div>
</div>
</div>
</div>
</div>

Related

python scrap chrome web-store comment

I am trying to scrape reviews from Chrome Web-Store and having a problem with how to distinct between a comment and the replies to the comment.
Below is an example for such HTML, where the user "John Smith" has a comment and a reply.
I am currently using pyppeteer to scrap the content.
I tried querySelectionAll for .ba-bc-Xb-K and .ba-bc-Xb and several other ways, but was not able to clearly make identification
<div class="ba-fb">
<div>
<div class="ba-bc-Xb">
<div class="ba-ji-A ba-ua-zl-Xb"><img src="//lh3.googleusercontent.com/a/default-user=s40-c-k" class="Lg-ee-A-O-xb" alt=""></div>
<div class="ba-bc-Xb-K">
<div class="ba-pa">
<span class="comment-thread-displayname" dir="auto">Lucy</span><span class="ba-Eb-Nf">Jun 26, 2022</span>
<div class="ba-Eb-N">
<div class="rsw-stars" aria-label="1 star">
<div class="rsw-starred" aria-hidden="true"></div>
<div class="rsw-unstarred" aria-hidden="true"></div>
<div class="rsw-unstarred" aria-hidden="true"></div>
<div class="rsw-unstarred" aria-hidden="true"></div>
<div class="rsw-unstarred" aria-hidden="true"></div>
</div>
</div>
<br>
<div class="ba-Eb-ba" dir="auto">We use this because its easy to hover and text over phone numbers of clients. IT VERY GLITCHY AND CRASHES OFTEN. 90% our of business runs on SMS texting, so I really wish I didn't use this for my company. If any has a better option please let me know!!!! I've been using this for a year and its getting better!!!!!!! AVOID IF YOU CAN!!!! Customer Service is ALSO TERRIBLE!</div>
</div>
<div class="ba-bc-Xb-cd">
<div class="bd-Ob Aa">
<div class="bd-Ob-L dd">Was this review helpful?</div>
<label class="voting-editor-button XzMRXd-sn"><input class="XzMRXd-sn-lc XzMRXd-lc" type="radio" name="vote_AIe9_BGDdh8EqnrWQb-fggox4SOmWi01kdMh4CdLCQD9oHM2uKG-GiDamTukgoJw7LwDNaVtssNY9zUfkPqZTbmL6bYR7YM8Tfa86zq-joAbx8qi5xjbhVjHguGAQoDUMi0YYV_pkFaVKt6ISOsZBGJKlLvhS3uCBg8VrwTO04skZFgbPvYGgPjeQgCKwOz4LyvBPf6dlvKz">Yes</label><label class="voting-editor-button XzMRXd-eb"><input class="XzMRXd-eb-lc XzMRXd-lc" type="radio" name="vote_AIe9_BGDdh8EqnrWQb-fggox4SOmWi01kdMh4CdLCQD9oHM2uKG-GiDamTukgoJw7LwDNaVtssNY9zUfkPqZTbmL6bYR7YM8Tfa86zq-joAbx8qi5xjbhVjHguGAQoDUMi0YYV_pkFaVKt6ISOsZBGJKlLvhS3uCBg8VrwTO04skZFgbPvYGgPjeQgCKwOz4LyvBPf6dlvKz">No</label>
</div>
<div class="ba-bc-zb-Pe">
<a tabindex="0" class="ba-bc-zb-y z-b-ob-y" role="button">Reply</a><a class="ba-bc-zb-y ba-Eb-xe-ba Pa" role="button" tabindex="0">Delete</a>
<div class="ba-bc-zb-y Da-ub"><a tabindex="0" class="Aa Da-ub-y" role="button">Mark as spam or abuse</a></div>
</div>
</div>
<div class="yb-ba-Eb-k">
<div class="Fg-b-ob-k Pa">
<textarea class="Fg-b-ob-Gc" rows="5" maxlength="4096" aria-label="Write your reply" placeholder="Write your reply"></textarea>
<div class="Od"></div>
<div class="Fg-b-ob-Jb-k"><input type="button" value="Cancel" class="g-c g-c-aSvl1d Aa Fg-b-ob-Fb-c"> <input type="button" value="Post" class="g-c g-c-wb Aa Fg-b-ob-qd-c"></div>
</div>
</div>
<div class="Od"></div>
<div class="Fg-b-ob-fb"></div>
<div class="Fg-b-mb-Fk Pa"><a role="button" tabindex="0" class="mb-Fk-c">Load more replies</a></div>
</div>
</div>
</div>
<div>
<div class="ba-bc-Xb">
<div class="ba-ji-A ba-ua-zl-Xb"><img src="//lh3.googleusercontent.com/a-/AFdZucpu4S27XT0-ymC2sQo4ML3v0EkQWHfQeW-YO5jyPg=s40-c-k" class="Lg-ee-A-O-xb" alt=""></div>
<div class="ba-bc-Xb-K">
<div class="ba-pa">
<span class="comment-thread-displayname" dir="auto">John Smith</span><span class="ba-Eb-Nf">May 24, 2022</span>
<div class="ba-Eb-N">
<div class="rsw-stars" aria-label="1 star">
<div class="rsw-starred" aria-hidden="true"></div>
<div class="rsw-unstarred" aria-hidden="true"></div>
<div class="rsw-unstarred" aria-hidden="true"></div>
<div class="rsw-unstarred" aria-hidden="true"></div>
<div class="rsw-unstarred" aria-hidden="true"></div>
</div>
</div>
<br>
<div class="ba-Eb-ba" dir="auto">Desktop app is interesting and the chrome browser buddy is even better. I wish I was not forced by my company to use the company.</div>
</div>
<div class="ba-bc-Xb-cd">
<div class="bd-Ob Aa">
<div class="bd-Ob-L dd">Was this review helpful?</div>
<label class="voting-editor-button XzMRXd-sn"><input class="XzMRXd-sn-lc XzMRXd-lc" type="radio" name="vote_AIe9_BE0070MjUM89cQCwjN0anL45obXJS3lggtKPsNh1lW8nApB3slGfCkLIRHtWYvTrteJ5Hsx15_Lq2GFBMLLbrKFghCR9XqAfnbN5yIZquqVmHLhEkzLpjGKotj-iX8wKux-rJoLU_8vz3wUKa76z0Ttw8QF2EKBKeT-vhT2WYDm8qPVpdpmgnYnObbYr_aDQlz4P5FD">Yes</label><label class="voting-editor-button XzMRXd-eb"><input class="XzMRXd-eb-lc XzMRXd-lc" type="radio" name="vote_AIe9_BE0070MjUM89cQCwjN0anL45obXJS3lggtKPsNh1lW8nApB3slGfCkLIRHtWYvTrteJ5Hsx15_Lq2GFBMLLbrKFghCR9XqAfnbN5yIZquqVmHLhEkzLpjGKotj-iX8wKux-rJoLU_8vz3wUKa76z0Ttw8QF2EKBKeT-vhT2WYDm8qPVpdpmgnYnObbYr_aDQlz4P5FD">No</label>
</div>
<div class="ba-bc-zb-Pe">
<a tabindex="0" class="ba-bc-zb-y z-b-ob-y" role="button">Reply</a><a class="ba-bc-zb-y ba-Eb-xe-ba Pa" role="button" tabindex="0">Delete</a>
<div class="ba-bc-zb-y Da-ub"><a tabindex="0" class="Aa Da-ub-y" role="button">Mark as spam or abuse</a></div>
</div>
</div>
<div class="yb-ba-Eb-k">
<div class="Fg-b-ob-k Pa">
<textarea class="Fg-b-ob-Gc" rows="5" maxlength="4096" aria-label="Write your reply" placeholder="Write your reply"></textarea>
<div class="Od"></div>
<div class="Fg-b-ob-Jb-k"><input type="button" value="Cancel" class="g-c g-c-aSvl1d Aa Fg-b-ob-Fb-c"> <input type="button" value="Post" class="g-c g-c-wb Aa Fg-b-ob-qd-c"></div>
</div>
</div>
<div class="Od"></div>
<div class="Fg-b-ob-fb">
<div>
<div class="ba-bc-Xb">
<div class="ba-ji-A ba-ua-zl-Xb"><img src="//lh3.googleusercontent.com/a-/AFdZucpu4S27XT0-ymC2sQo4ML3v0EkQWHfQeW-YO5jyPg=s40-c-k" class="Lg-ee-A-O-xb" alt=""></div>
<div class="ba-bc-Xb-K">
<div class="ba-pa">
<span class="comment-thread-displayname" dir="auto">John Smith</span><span class="ba-Eb-Nf">May 24, 2022</span><br>
<div class="ba-Eb-ba" dir="auto">I'm happy to chat with the engineering and UX team to tell you exactly how to fix it.</div>
</div>
<div class="ba-bc-Xb-cd">
<div class="bd-Ob Aa">
<div class="bd-Ob-L dd">Was this review helpful?</div>
<label class="voting-editor-button XzMRXd-sn"><input class="XzMRXd-sn-lc XzMRXd-lc" type="radio" name="vote_AIe9_BFmRFRFwJGRfUDOW8jG3rXnLzUlJu5dFPOnRhcZ3Qpf7k7js81NA_AsDgEfcDAZt0H9yZfs73z4D-hSlo1bxU2QLKaAXG2SMo-85mMfMl_-V6KnhrLHz2FEyGejziQP8UkVi-SsuqBw_lc0GmW9TtC5naBzAp94w9FygzBqeDyguPYXJMc">Yes</label><label class="voting-editor-button XzMRXd-eb"><input class="XzMRXd-eb-lc XzMRXd-lc" type="radio" name="vote_AIe9_BFmRFRFwJGRfUDOW8jG3rXnLzUlJu5dFPOnRhcZ3Qpf7k7js81NA_AsDgEfcDAZt0H9yZfs73z4D-hSlo1bxU2QLKaAXG2SMo-85mMfMl_-V6KnhrLHz2FEyGejziQP8UkVi-SsuqBw_lc0GmW9TtC5naBzAp94w9FygzBqeDyguPYXJMc">No</label>
</div>
<div class="ba-bc-zb-Pe">
<a class="ba-bc-zb-y ba-Eb-xe-ba Pa" role="button" tabindex="0">Delete</a>
<div class="ba-bc-zb-y Da-ub"><a tabindex="0" class="Aa Da-ub-y" role="button">Mark as spam or abuse</a></div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="Fg-b-mb-Fk Pa"><a role="button" tabindex="0" class="mb-Fk-c">Load more replies</a></div>
</div>
</div>
</div>
</div>
Generally, I'd avoid using these classnames because they change too fast.
I see that comment on this page can be only one level deep. The parent comment has always <textarea>, so replies don't have it. You can distinguish parent-reply with this:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, "html.parser") # <-- html_doc is your snippet from the question
out = {}
for div in soup.select("div:has(>.comment-thread-displayname)"):
# this is a reply:
if not div.parent.find("textarea"):
continue
replies = []
for s in div.find_next_siblings():
if (reply := s.find(class_="comment-thread-displayname")) :
name, date, text = reply.parent.get_text(
strip=True, separator="\n"
).split("\n", maxsplit=2)
replies.append((name, date, text))
name, date, text = div.get_text(strip=True, separator="\n").split(
"\n", maxsplit=2
)
out[(name, date, text)] = replies
print(out)
Prints dictionary where keys are 3-item tuples of (name, date, text) of parent comment and values are lists of 3-item tuples of replies:
{
(
"Lucy",
"Jun 26, 2022",
"We use this because its easy to hover and text over phone numbers of clients. IT VERY GLITCHY AND CRASHES OFTEN. 90% our of business runs on SMS texting, so I really wish I didn't use this for my company. If any has a better option please let me know!!!! I've been using this for a year and its getting better!!!!!!! AVOID IF YOU CAN!!!! Customer Service is ALSO TERRIBLE!",
): [],
(
"John Smith",
"May 24, 2022",
"Desktop app is interesting and the chrome browser buddy is even better. I wish I was not forced by my company to use the company.",
): [
(
"John Smith",
"May 24, 2022",
"I'm happy to chat with the engineering and UX team to tell you exactly how to fix it.",
)
],
}

Pyton, Selenium: I need to collect urls but there no a tags in element

Good day, guys. I have a task to collect Name and Email for person from this site:
https://www.espeakers.com/s/nsas/search?available_on=&awards&budget=0%2C10&bureau_id=304&distance=1000&fee=false&items_per_page=3701&language=en&location=&norecord=false&nt=0&page=0&presenter_type=&q=%5B%5D&require&review=false&sort=speakername&video=false&virtual=false
I use selenium and python to scrape it, but I have a problem with accessing an url for people. The sample structure of person card is:
<div class="col-xs-12 col-sm-6 col-md-4 col-lg-3">
<div class="speaker-tile" id="sid12026">
<div class="speaker-thumb" style='background-image: url("https://streamer.espeakers.com/assets/6/12026/159445.jpg"); background-size: contain;'>
<div class="row">
<div class="col-xs-8 text-left">
</div>
<div class="col-xs-4 text-right speaker-top-actions">
<i class="fa fa-ellipsis-h fa-fw">
</i>
</div>
</div>
</div>
<div class="speaker-details">
<div class="speaker-name">
Alex Aanderud
</div>
<div class="row" style="margin-top: 15px;">
<div class="col-xs-12 col-sm-12">
<div class="speaker-location">
<i class="fa fa-map-marker mp-tertiary-background">
</i>
AZ
<span>
,
</span>
US
</div>
</div>
<div class="col-sm-6 col-xs-12">
<div class="speaker-awards">
</div>
</div>
</div>
<div class="speaker-oneline text-left">
<p>
</p>
<div>
Certified Trainer of Advanced Integrative Psychology and Certified John Maxwell Speaker, Trainer, Coach, will transform your organization and improve your results.
</div>
</div>
<div class="speaker-assets">
<div class="row">
</div>
</div>
<div class="speaker-actions">
<div class="row">
<div class="text-center col-xs-12">
<div class="btn btn-flat mp-primary btn-block">
<span class="hidden-xs hidden-sm">
View Profile
</span>
<span class="visible-xs visible-sm">
Profile
</span>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
And the when you click on
<span class="hidden-xs hidden-sm">
View Profile
</span>
It moves you to page with person info where I can access it. How I can use selenium to do this, or there are others solutions that can help me.
Thanks!
If you notice, all the profile urls are of the form
https://www.espeakers.com/s/nsas/profile/id
where id is a 5 digits number such as 27397. So you just need to extract the id and concatenate it with the base url to obtain the profile url.
url = 'https://www.espeakers.com/s/nsas/profile/'
profile_urls = [url + el.get_attribute('id')[3:] for el in driver.find_elements(By.CSS_SELECTOR, '.speaker-tile')]
names = [el.text for el in driver.find_elements(By.CSS_SELECTOR, '.speaker-name')]
names is a list containing all the names, urls is a list containing the corresponding profile urls

Selenium: is there a way to check if text extends beyond the borders of an element?

I'm writing tests python+selenium, and here is the problem I've stumbled upon.
I need to check if the text in a box is displayed correctly and doesn't go beyond the margins.
What I means is:
Page source:
<div class="mt-5">
<div class="row flex-column flex-lg-row">
<div id="card-6e64c570-f558-47c8-d66a-08d9a43db45b" class="events-left-block w-100 col-lg-6 mb-4xl position-relative">
<a href="/Invite/Trainings/6e64c570-f558-47c8-d66a-08d9a43db45b?back=%2FCatalog" class="w-100">
<div class="card border-0 bg-white h-100 shadow rounded-lg p-0">
<div class="card-body p-4 h-100 d-flex flex-column">
<div class="card-title text-dark mb-4 row justify-content-between no-gutters flex-nowrap">
<h5 class="mb-0">Pneumonoultramicroscopicsilicovolcanoconiosis</h5>
<span class="text-white ml-4"></span>
</div>
<div class="card-text d-flex justify-content-between align-items-end flex-row flex-grow-1">
<ui class="list-group-flush">
<li class="d-flex align-items-center small list-group-item li-info text-dark">
<i class="fas fa-user-friends text-light-gray fa-small"></i>
1
</li>
<li class="d-flex align-items-center list-group-item small li-info text-dark">
<i class="ri-calendar-fill text-light-gray small"></i>
29.11.2021
</li>
</ui>
<img class="rounded-circle cover-image" src="/images/default-training.png" width="64" height="64" alt="img">
</div>
</div>
</div>
</a>
</div>
</div>
</div>
I was wondering if there is a way to check (by selenium, jquery, whatever) if the text fits the box?

beautifulsoup add <div> with class at end of html

I have an HTML file with multiple tags (there are multiple div inside the div as well). I want to add a new tag along with class to the end of the HTML at a specific position. I tried with append, insert, and insert_after/insert_before as well, however, it's not working as I expected.
My html input is:
<div id="page">
<div id="records">
<div class="record">
<div class="header">
<div class="title">
Something here to display
</div>
</div>
<div class="disclaimer">
<p>Here i want to print content</p>
</div>
</div>
<div class="record">
<div class="header">
<div class="title">
Something here to display again once
</div>
</div>
<div class="disclaimer">
<p>Here i want to print content again once</p>
</div>
</div>
<div class="record">
<div class="header">
<div class="title">
Something here to display second time
</div>
</div>
<div class="disclaimer">
<p>Here i want to print content second time</p>
</div>
</div>
</div>
</div>
i want to add new <div> tag with class="record" at the end, before the closing tag of <div id="records">.
output would look like this:
<div id="page">
<div id="records">
<div class="record">
<div class="header">
<div class="title">
Something here to display
</div>
</div>
<div class="disclaimer">
<p>Here i want to print content</p>
</div>
</div>
<div class="record">
<div class="header">
<div class="title">
Something here to display again once
</div>
</div>
<div class="disclaimer">
<p>Here i want to print content again once</p>
</div>
</div>
<div class="record">
<div class="header">
<div class="title">
Something here to display second time
</div>
</div>
<div class="disclaimer">
<p>Here i want to print content second time</p>
</div>
</div>
<div class="record">
<div class="header">
<div class="title">
Something here to display 3rd time
</div>
</div>
<div class="disclaimer">
<p>Here i want to print content 3rd time</p>
</div>
</div>
</div>
</div>
In my case, the number of <div class="record"> is not fixed, the number may vary always.
I would like to get a solution/suggestion for this problem using BeautifulSoup in python.
You can use insert_after after the last item in soup.find_all('div', class_='record'):
from bs4 import BeautifulSoup
html = '<div id="records"> <div class="record"> <div class="header"> <div class="title"> Something here to display </div> </div> <div class="disclaimer"> <p>Here i want to print content</p> </div> </div> <div class="record"> <div class="header"> <div class="title"> Something here to display again once </div> </div> <div class="disclaimer"> <p>Here i want to print content again once</p> </div> </div> <div class="record"> <div class="header"> <div class="title"> Something here to display second time </div> </div> <div class="disclaimer"> <p>Here i want to print content second time</p> </div> </div> </div>'
soup = BeautifulSoup(html, 'html.parser')
extra_html = '''
<div class="record">
<div class="header">
<div class="title">
Something here to display 3rd time
</div>
</div>
<div class="disclaimer">
<p>Here i want to print content 3rd time</p>
</div>
</div>'''
soup.find_all('div', class_='record')[-1].insert_after(BeautifulSoup(extra_html, 'html.parser')) # [-1] selects the last item
Output print(soup.prettify()):
<div id="records">
<div class="record">
<div class="header">
<div class="title">
Something here to display
</div>
</div>
<div class="disclaimer">
<p>
Here i want to print content
</p>
</div>
</div>
<div class="record">
<div class="header">
<div class="title">
Something here to display again once
</div>
</div>
<div class="disclaimer">
<p>
Here i want to print content again once
</p>
</div>
</div>
<div class="record">
<div class="header">
<div class="title">
Something here to display second time
</div>
</div>
<div class="disclaimer">
<p>
Here i want to print content second time
</p>
</div>
</div>
<div class="record">
<div class="header">
<div class="title">
Something here to display 3rd time
</div>
</div>
<div class="disclaimer">
<p>
Here i want to print content 3rd time
</p>
</div>
</div>
</div>
using .append(), it need to select the parent element or <div id="page">
newRecord = '''
<div class="record">
<div class="header">
<div class="title">
Something here to display 3rd time
</div>
</div>
<div class="disclaimer">
<p>Here i want to print content 3rd time</p>
</div>
</div>
'''
soup = BeautifulSoup(sourceHTML, 'html.parser')
page = soup.select_one('#page')
page.append(BeautifulSoup(newRecord, 'html.parser'))
print(soup.prettify())

Web scraping problem : Data does not show when printed

So i tried to scrape this website: https://top-1000-sekolah.ltmpt.ac.id/site/page?id=2001
if you inspect element, there's a div with id of tab-1, tab-2,tab-3, tab-4 . So I tried ti scrape each id but somehow only tab-1 data's were grabbed. so what did I do wrong??
pk = driver.find_element_by_xpath("(//div[#id='tab-1'])")
pbm = driver.find_element_by_id('tab-2')
pu = driver.find_element_by_id('tab-3')
ppu = driver.find_element_by_id('tab-4')
The output I expect from tab-2 is :
Kemampuan Kuantitatif
2
Urut Nasional
1
Urut Provinsi
Rerata
640,253
Nilai Tertinggi
721,15
Nilai Terendah
511,14
Standar Deviasi
44,1
and currently tab-2 output is blank( ' ' )
Try doing this:
pbm = driver.find_element_by_id('tab-2')
print(pbm.text)
If that doesn't work, I suspect it is because that div class with the id of tab-2 has many child elements. You will need to select those individual child elements directly to get the data you need. Use the XPATH method which you used up top.
<div class="row">
<div class="col-lg-12 details order-2 order-lg-1">
<h3 align="center">
Kemampuan Memahami Bacaan dan Menulis
</h3>
<hr>
<div class="row">
<div class="col-lg-6 col-md-6">
<div class="count-box">
<i class="icofont-award"></i>
<span data-toggle="counter-up">5</span>
<p>Urut Nasional</p>
</div>
</div>
<div class="col-lg-6 col-md-6">
<div class="count-box">
<i class="icofont-award"></i>
<span data-toggle="counter-up">1</span>
<p>Urut Provinsi</p>
</div>
</div>
</div>
<hr>
<div class="row">
<div class="col-sm-3">
<div class="card bg-light mb-3" style="max-width: 18rem;">
<div class="card-header" align="center">Rerata</div>
<div class="card-body">
<h3 class="card-title" align="center"><b>589,104</b></h3>
</div>
</div>
</div>
<div class="col-sm-3">
<div class="card bg-light mb-3" style="max-width: 18rem;">
<div class="card-header" align="center">Nilai Tertinggi</div>
<div class="card-body">
<h3 class="card-title" align="center"><b>709,61</b></h3>
</div>
</div>
</div>
<div class="col-sm-3">
<div class="card bg-light mb-3" style="max-width: 18rem;">
<div class="card-header" align="center">Nilai Terendah</div>
<div class="card-body">
<h3 class="card-title" align="center"><b>371,88</b></h3>
</div>
</div>
</div>
<div class="col-sm-3">
<div class="card bg-light mb-3" style="max-width: 18rem;">
<div class="card-header" align="center">Standar Deviasi</div>
<div class="card-body">
<h3 class="card-title" align="center"><b>65,96</b></h3>
</div>
</div>
</div>
</div>
</div>
</div>
For example to parse the name Kemampuan Kuantitatif,
name = driver.find_element_by_xpath('//*[#id="tab-2"]/div/div/h3')
print(name)

Categories

Resources