Web scraping problem: data does not show when printed - Python

So I tried to scrape this website: https://top-1000-sekolah.ltmpt.ac.id/site/page?id=2001
If you inspect the page, there are divs with the ids tab-1, tab-2, tab-3 and tab-4. I tried to scrape each id, but somehow only the data from tab-1 was grabbed. What did I do wrong?
from selenium.webdriver.common.by import By
pk = driver.find_element(By.XPATH, "//div[@id='tab-1']")
pbm = driver.find_element(By.ID, 'tab-2')
pu = driver.find_element(By.ID, 'tab-3')
ppu = driver.find_element(By.ID, 'tab-4')
The output I expect from tab-2 is:
Kemampuan Kuantitatif
2
Urut Nasional
1
Urut Provinsi
Rerata
640,253
Nilai Tertinggi
721,15
Nilai Terendah
511,14
Standar Deviasi
44,1
Currently, the tab-2 output is blank (an empty string).

Try doing this:
from selenium.webdriver.common.by import By
pbm = driver.find_element(By.ID, 'tab-2')
print(pbm.text)
If that doesn't work, I suspect it is because the div with the id tab-2 has many child elements. You will need to select those child elements directly to get the data you need, using the XPath approach you used above.
<div class="row">
<div class="col-lg-12 details order-2 order-lg-1">
<h3 align="center">
Kemampuan Memahami Bacaan dan Menulis
</h3>
<hr>
<div class="row">
<div class="col-lg-6 col-md-6">
<div class="count-box">
<i class="icofont-award"></i>
<span data-toggle="counter-up">5</span>
<p>Urut Nasional</p>
</div>
</div>
<div class="col-lg-6 col-md-6">
<div class="count-box">
<i class="icofont-award"></i>
<span data-toggle="counter-up">1</span>
<p>Urut Provinsi</p>
</div>
</div>
</div>
<hr>
<div class="row">
<div class="col-sm-3">
<div class="card bg-light mb-3" style="max-width: 18rem;">
<div class="card-header" align="center">Rerata</div>
<div class="card-body">
<h3 class="card-title" align="center"><b>589,104</b></h3>
</div>
</div>
</div>
<div class="col-sm-3">
<div class="card bg-light mb-3" style="max-width: 18rem;">
<div class="card-header" align="center">Nilai Tertinggi</div>
<div class="card-body">
<h3 class="card-title" align="center"><b>709,61</b></h3>
</div>
</div>
</div>
<div class="col-sm-3">
<div class="card bg-light mb-3" style="max-width: 18rem;">
<div class="card-header" align="center">Nilai Terendah</div>
<div class="card-body">
<h3 class="card-title" align="center"><b>371,88</b></h3>
</div>
</div>
</div>
<div class="col-sm-3">
<div class="card bg-light mb-3" style="max-width: 18rem;">
<div class="card-header" align="center">Standar Deviasi</div>
<div class="card-body">
<h3 class="card-title" align="center"><b>65,96</b></h3>
</div>
</div>
</div>
</div>
</div>
</div>
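For example, to get the heading text (e.g. Kemampuan Kuantitatif):
from selenium.webdriver.common.by import By
name = driver.find_element(By.XPATH, '//*[@id="tab-2"]/div/div/h3')
print(name.text)  # .text, otherwise this prints the WebElement object itself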
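If .text still comes back empty, one likely cause (an assumption, since the page's CSS isn't shown here) is that tab-2, tab-3 and tab-4 are hidden until their tab is clicked: Selenium's .text only returns text that is actually rendered. get_attribute('innerHTML') does not depend on visibility, so one option is to grab each tab's HTML and parse it offline. The sketch below uses only the standard library's html.parser; the class and function names are hypothetical, and it assumes the card markup shown above.

```python
from html.parser import HTMLParser


class TabStatsParser(HTMLParser):
    """Pull the heading, ranks and card stats out of one tab's HTML.

    Assumes the markup pattern from the page: ranks as
    <span data-toggle="counter-up">N</span><p>Label</p>, and stats as
    <div class="card-header">Label</div> ... <h3 class="card-title"><b>Value</b></h3>.
    """

    def __init__(self):
        super().__init__()
        self.title = None
        self.stats = {}        # label -> value, kept as the page shows it ("589,104")
        self._capture = None   # what kind of text the current tag holds
        self._pending = None   # first half of a pair, waiting for its partner

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "card-header" in classes:
            self._capture = "card-label"
        elif tag == "h3" and "card-title" in classes:
            self._capture = "card-value"
        elif tag == "h3":
            self._capture = "title"
        elif tag == "span" and attrs.get("data-toggle") == "counter-up":
            self._capture = "rank-value"
        elif tag == "p" and self._pending is not None:
            self._capture = "rank-label"

    def handle_endtag(self, tag):
        if tag in ("div", "h3", "span", "p"):
            self._capture = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._capture is None:
            return
        if self._capture == "title":
            self.title = text
        elif self._capture in ("card-label", "rank-value"):
            self._pending = text
        elif self._capture == "card-value":
            self.stats[self._pending] = text
            self._pending = None
        elif self._capture == "rank-label":
            self.stats[text] = self._pending
            self._pending = None


def parse_tab(html):
    """Return (heading, {label: value}) for one tab's inner HTML."""
    parser = TabStatsParser()
    parser.feed(html)
    parser.close()
    return parser.title, parser.stats


SAMPLE = """
<h3 align="center">Kemampuan Memahami Bacaan dan Menulis</h3>
<div class="count-box"><span data-toggle="counter-up">5</span><p>Urut Nasional</p></div>
<div class="card-header" align="center">Rerata</div>
<div class="card-body"><h3 class="card-title" align="center"><b>589,104</b></h3></div>
"""

title, stats = parse_tab(SAMPLE)
print(title)   # Kemampuan Memahami Bacaan dan Menulis
print(stats)   # {'Urut Nasional': '5', 'Rerata': '589,104'}
```

With Selenium, something like parse_tab(driver.find_element(By.ID, 'tab-2').get_attribute('innerHTML')) would feed it a live tab; this has not been checked against the real site.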


Python, Selenium: I need to collect URLs but there are no a tags in the element

Good day, everyone. I have a task to collect the name and email of each person from this site:
https://www.espeakers.com/s/nsas/search?available_on=&awards&budget=0%2C10&bureau_id=304&distance=1000&fee=false&items_per_page=3701&language=en&location=&norecord=false&nt=0&page=0&presenter_type=&q=%5B%5D&require&review=false&sort=speakername&video=false&virtual=false
I use Selenium and Python to scrape it, but I have a problem accessing each person's profile URL. The structure of a sample person card is:
<div class="col-xs-12 col-sm-6 col-md-4 col-lg-3">
<div class="speaker-tile" id="sid12026">
<div class="speaker-thumb" style='background-image: url("https://streamer.espeakers.com/assets/6/12026/159445.jpg"); background-size: contain;'>
<div class="row">
<div class="col-xs-8 text-left">
</div>
<div class="col-xs-4 text-right speaker-top-actions">
<i class="fa fa-ellipsis-h fa-fw">
</i>
</div>
</div>
</div>
<div class="speaker-details">
<div class="speaker-name">
Alex Aanderud
</div>
<div class="row" style="margin-top: 15px;">
<div class="col-xs-12 col-sm-12">
<div class="speaker-location">
<i class="fa fa-map-marker mp-tertiary-background">
</i>
AZ
<span>
,
</span>
US
</div>
</div>
<div class="col-sm-6 col-xs-12">
<div class="speaker-awards">
</div>
</div>
</div>
<div class="speaker-oneline text-left">
<p>
</p>
<div>
Certified Trainer of Advanced Integrative Psychology and Certified John Maxwell Speaker, Trainer, Coach, will transform your organization and improve your results.
</div>
</div>
<div class="speaker-assets">
<div class="row">
</div>
</div>
<div class="speaker-actions">
<div class="row">
<div class="text-center col-xs-12">
<div class="btn btn-flat mp-primary btn-block">
<span class="hidden-xs hidden-sm">
View Profile
</span>
<span class="visible-xs visible-sm">
Profile
</span>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
When you click on
<span class="hidden-xs hidden-sm">
View Profile
</span>
It takes you to a page with the person's info, where I can access it. How can I use Selenium to do this, or are there other solutions that could help?
Thanks!
If you look closely, all the profile URLs have the form
https://www.espeakers.com/s/nsas/profile/id
where id is a 5-digit number such as 27397. It is the same number that appears in each tile's id attribute after the sid prefix (sid12026 → 12026), so you just need to extract it and concatenate it with the base URL to get the profile URL.
from selenium.webdriver.common.by import By
url = 'https://www.espeakers.com/s/nsas/profile/'
# Each tile's id looks like 'sid12026'; [3:] drops the 'sid' prefix
profile_urls = [url + el.get_attribute('id')[3:] for el in driver.find_elements(By.CSS_SELECTOR, '.speaker-tile')]
names = [el.text for el in driver.find_elements(By.CSS_SELECTOR, '.speaker-name')]
names is a list containing all the speaker names, and profile_urls is a list containing the corresponding profile URLs.
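The [3:] slice silently assumes every tile id starts with exactly sid, and the two lists only line up if every tile has a name. A slightly more defensive sketch of the same idea, in plain Python with hypothetical helper names, checks the prefix and pairs each name with its URL, raising an error on a length mismatch instead of mis-pairing silently:

```python
BASE_URL = "https://www.espeakers.com/s/nsas/profile/"
TILE_PREFIX = "sid"  # tiles look like <div class="speaker-tile" id="sid12026">


def profile_url(tile_id):
    """Turn a speaker-tile id such as 'sid12026' into its profile URL."""
    if not tile_id.startswith(TILE_PREFIX):
        raise ValueError(f"unexpected tile id: {tile_id!r}")
    speaker_id = tile_id[len(TILE_PREFIX):]
    if not speaker_id.isdigit():
        raise ValueError(f"unexpected tile id: {tile_id!r}")
    return BASE_URL + speaker_id


def pair_names_with_urls(names, tile_ids):
    """Zip names with profile URLs; both lists must come from the same tiles in order."""
    if len(names) != len(tile_ids):
        raise ValueError("names and tile ids are out of step")
    return [(name.strip(), profile_url(tid)) for name, tid in zip(names, tile_ids)]


print(profile_url("sid12026"))
# https://www.espeakers.com/s/nsas/profile/12026
print(pair_names_with_urls(["Alex Aanderud"], ["sid12026"]))
# [('Alex Aanderud', 'https://www.espeakers.com/s/nsas/profile/12026')]
```

An alternative design that avoids pairing two lists at all is to loop over the tiles and read the name from inside each one, e.g. tile.find_element(By.CSS_SELECTOR, '.speaker-name').text next to profile_url(tile.get_attribute('id')).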

How can I have my template display a list in two different div tags without redundancy?

I have a list from my model, and I want my template to display the list elements in groups of 4, or in two halves of the list. For example, if I have 10 elements in my list, I want 5 on the right side and 5 on the left side. Please see the screenshots below.
This is how I want my page to look:
But this is what I get:
This is my HTML file.
<div class="section-title">
<h2>Skills</h2>
<p>hsjkhvdkdjhvjkdfnv kjdf, dfhvkhdnfvkjldf,xhvnkldsv.mckldfnv ,dfhxncjcshfxdjvhcnjsdnckndjvbc d,sxbc kjdjsxcbjdksbvc kjs,bhzscs,zhcnlksjhlnzcklsnzjcjsdzcjb ds
cxdbjvcsdbzcjks,gdcbkjds,zbcn jkcdxbv,m dfxvchj bdxnvbjhdujxdnkck jdfvknc dfkjhvxjdknfxzjxvkc.
</p>
</div>
{% for skill in skills_list%}
<div class="row skills-content">
<div class="col-lg-6" data-aos="fade-up">
<div class="progress">
<span class="skill">{{skill.skill_name}} <i class="val">{{skill.skill_value}}</i></span>
<div class="progress-bar-wrap">
<div class="progress-bar" role="progressbar" aria-valuenow={{skill.skill_value}} aria-valuemin="0" aria-valuemax="100"></div>
</div>
</div>
</div>
</div>
{% endfor %}
</div>
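views.py:
from django.views import generic
from .models import Skills
#### TEST
class TestView(generic.ListView):
    # ListView exposes the queryset to the template as skills_list
    model = Skills
    template_name = 'portfolio_app/test.html'
urls.py: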
from django.urls import path
from portfolio_app.models import *
from . import views
urlpatterns = [
path('',views.fact,name='index'),
#path('index/',views.SkillView.as_view,name='index'),
path('about/',views.about_me,name='about'),
path('service/',views.ServiceView.as_view(),name='service'),
path('resume/',views.ResumeView.as_view(),name='resume'),
path('contact/',views.ContactView.as_view(),name='contact'),
path('test/',views.TestView.as_view(),name='test'),
]
You can try to move <div class="row skills-content"> outside the for loop like this:
<div class="section-title">
<h2>Skills</h2>
<p>hsjkhvdkdjhvjkdfnv kjdf, dfhvkhdnfvkjldf,xhvnkldsv.mckldfnv ,dfhxncjcshfxdjvhcnjsdnckndjvbc
d,sxbc kjdjsxcbjdksbvc kjs,bhzscs,zhcnlksjhlnzcklsnzjcjsdzcjb ds
cxdbjvcsdbzcjks,gdcbkjds,zbcn jkcdxbv,m dfxvchj bdxnvbjhdujxdnkck jdfvknc
dfkjhvxjdknfxzjxvkc.
</p>
</div>
<div class="row skills-content">
{% for skill in skills_list%}
<div class="col-lg-6" data-aos="fade-up">
<div class="progress">
<span class="skill">{{skill.skill_name}} <i class="val">{{skill.skill_value}}</i></span>
<div class="progress-bar-wrap">
<div class="progress-bar" role="progressbar" aria-valuenow={{skill.skill_value}}
aria-valuemin="0" aria-valuemax="100"></div>
</div>
</div>
</div>
{% endfor %}
</div>
You should also remove the redundant last </div> for it to work correctly.
Use the slice filter like this:
<div class="row skills-content">
<div class="col-lg-6" data-aos="fade-up">
{% for skill in skills_list|slice:":5" %}
<div class="progress">
<span class="skill">{{skill.skill_name}} <i class="val">{{skill.skill_value}}</i></span>
<div class="progress-bar-wrap">
<div class="progress-bar" role="progressbar" aria-valuenow={{skill.skill_value}} aria-valuemin="0" aria-valuemax="100"></div>
</div>
</div>
{% endfor %}
</div>
<div class="col-lg-6" data-aos="fade-up">
{% for skill in skills_list|slice:"5:" %}
<div class="progress">
<span class="skill">{{skill.skill_name}} <i class="val">{{skill.skill_value}}</i></span>
<div class="progress-bar-wrap">
<div class="progress-bar" role="progressbar" aria-valuenow={{skill.skill_value}} aria-valuemin="0" aria-valuemax="100"></div>
</div>
</div>
{% endfor %}
</div>
</div>
To get rid of the duplicated markup, create a separate template for a single skill and include it in your current template using the include template tag, like this:
skills.html:
<div class="progress">
<span class="skill">{{skill.skill_name}} <i class="val">{{skill.skill_value}}</i></span>
<div class="progress-bar-wrap">
<div class="progress-bar" role="progressbar" aria-valuenow={{skill.skill_value}} aria-valuemin="0" aria-valuemax="100"></div>
</div>
</div>
Your current template:
<div class="row skills-content">
<div class="col-lg-6" data-aos="fade-up">
{% for skill in skills_list|slice:":5" %}
{% include 'skills.html' with skill=skill %}
{% endfor %}
</div>
</div>
Do the same for the second loop (with slice:"5:").
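slice:":5" hardcodes the split point, so the columns are only equal halves when there are exactly 10 skills. Django's built-in template filters don't include division, so one option is to compute the halves (or groups of 4) in the view and hand them to the template. A plain-Python sketch of the splitting, with hypothetical helper names:

```python
def split_in_half(items):
    """Return (left, right); the left column gets the extra item when the count is odd."""
    items = list(items)             # also works on a QuerySet
    middle = (len(items) + 1) // 2  # ceiling of len/2 without floats
    return items[:middle], items[middle:]


def chunked(items, size):
    """Yield consecutive groups of `size` items (the last group may be shorter)."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


left, right = split_in_half(range(10))
print(left, right)                  # [0, 1, 2, 3, 4] [5, 6, 7, 8, 9]
print(list(chunked(range(10), 4)))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

In TestView, a get_context_data override could call super().get_context_data(**kwargs) and then set context['left_skills'], context['right_skills'] = split_in_half(context['skills_list']); the template would loop over left_skills and right_skills (hypothetical names) instead of slicing.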

Selenium: is there a way to check if text extends beyond the borders of an element?

I'm writing tests in Python with Selenium, and here is a problem I've stumbled upon.
I need to check whether the text in a box is displayed correctly and doesn't extend beyond its margins.
What I mean is:
Page source:
<div class="mt-5">
<div class="row flex-column flex-lg-row">
<div id="card-6e64c570-f558-47c8-d66a-08d9a43db45b" class="events-left-block w-100 col-lg-6 mb-4xl position-relative">
<a href="/Invite/Trainings/6e64c570-f558-47c8-d66a-08d9a43db45b?back=%2FCatalog" class="w-100">
<div class="card border-0 bg-white h-100 shadow rounded-lg p-0">
<div class="card-body p-4 h-100 d-flex flex-column">
<div class="card-title text-dark mb-4 row justify-content-between no-gutters flex-nowrap">
<h5 class="mb-0">Pneumonoultramicroscopicsilicovolcanoconiosis</h5>
<span class="text-white ml-4"></span>
</div>
<div class="card-text d-flex justify-content-between align-items-end flex-row flex-grow-1">
<ui class="list-group-flush">
<li class="d-flex align-items-center small list-group-item li-info text-dark">
<i class="fas fa-user-friends text-light-gray fa-small"></i>
1
</li>
<li class="d-flex align-items-center list-group-item small li-info text-dark">
<i class="ri-calendar-fill text-light-gray small"></i>
29.11.2021
</li>
</ui>
<img class="rounded-circle cover-image" src="/images/default-training.png" width="64" height="64" alt="img">
</div>
</div>
</div>
</a>
</div>
</div>
</div>
Is there a way to check (with Selenium, jQuery, or anything else) whether the text fits inside the box?
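One common DOM technique is to compare an element's scrollWidth/scrollHeight (the size its content needs) with its clientWidth/clientHeight (the visible box) by running a snippet through Selenium's execute_script. A minimal sketch, with a hypothetical helper name; which element to pass is an assumption: given the flex-nowrap row, the h5 probably grows to fit the long word, so the .card-title row (or the whole .card) is the box to check, not the h5 itself. Verify that in the browser first.

```python
# JavaScript run in the page: scroll* includes content that spills past the
# element's box, client* is the visible box itself.
OVERFLOW_JS = """
const box = arguments[0];
return box.scrollWidth > box.clientWidth || box.scrollHeight > box.clientHeight;
"""


def text_overflows(driver, box):
    """Return True if content inside `box` extends past its borders.

    `driver` is a Selenium WebDriver (anything with execute_script works);
    `box` is the WebElement whose borders the text must stay within.
    """
    return bool(driver.execute_script(OVERFLOW_JS, box))


# Usage sketch (needs a live browser, so it is not run here):
#   from selenium.webdriver.common.by import By
#   title_row = driver.find_element(
#       By.CSS_SELECTOR, "#card-6e64c570-f558-47c8-d66a-08d9a43db45b .card-title")
#   assert not text_overflows(driver, title_row), "card title spills out of the card"
```

scrollWidth and clientWidth are rounded to whole pixels, so a sub-pixel overflow may not be reported.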

Django template not showing

I don't know why my code doesn't work or where the problem is. Please help: my products don't appear in the template, although I have saved products in the database.
{% for product in products %}
<div class="col-md-6 col-lg-3 ftco-animate">
<div class="product">
{% if product.ImgProduct %}
<a href="#" class="img-prod"><img class="img-fluid" src="{{ product.ImgProduct.url }}" alt="ColorlibTemplate">
<div class="overlay"></div>
</a>
{% endif %}
<div class="text py-3 pb-4 px-3 text-center">
<h3>{{ product.product_name }}</h3>
<div class="d-flex">
<div class="pricing">
<p class="price"><span>{{ product.product_price }}</span></p>
</div>
</div>
<div class="bottom-area d-flex px-3">
<div class="m-auto d-flex">
<a href="#" class="add-to-cart d-flex justify-content-center align-items-center text-center">
<span><i class="ion-ios-menu"></i></span>
</a>
<a href="#" class="buy-now d-flex justify-content-center align-items-center mx-1">
<span><i class="ion-ios-cart"></i></span>
</a>
<a href="#" class="heart d-flex justify-content-center align-items-center ">
<span><i class="ion-ios-heart"></i></span>
</a>
</div>
</div>
</div>
</div>
</div>
{% endfor %}

Unable to retrieve innerText/innerHTML in Python

Hovering over innerText in DevTools shows the text data, but I can't get it through Python.
I am trying to retrieve the innerText or innerHTML from this website (see attached image). The HTML saved/printed from BeautifulSoup does not contain the content shown in the attached image of the innerText.
import requests
from bs4 import BeautifulSoup
r = requests.get("https://jobs.ca.gov/CalHRPublic/Search/JobSearchResults.aspx#classid=441")
c = r.content
soup = BeautifulSoup(c, "html.parser")
print(soup.prettify())
When I inspect the page in Google Chrome, click on the div block and copy the HTML, the copied HTML has all the data I am looking for.
How do I get the same data in Python or do I have to use Selenium?
<div class="card-block" id="collapse1234" itemscope="" itemtype="http://schema.org/Organization" role="tablist" aria-multiselectable="true">
<div class="row" role="presentation">
<div class="col-md-10 " role="presentation">
<a id="cphMainContent_rptResults_hlViewJobPosting_0" class="lead visitedLink" href="/CalHrPublic/Jobs/JobPosting.aspx?JobControlId=70488">ACCOUNTING ADMINISTRATOR I (SPECIALIST)</a>
</div>
<div class="col-md-2 tar">
<div id="cphMainContent_rptResults_pnlFavoriteJob_0" class="aspNetDisabled" style="display: inline;">
<i id="cphMainContent_rptResults_iIsNotFavorite_0" class="fa fa-star-o" aria-hidden="true" style="cursor:default;color:grey;opacity:.6;" title="You must be logged in to save a job as a Favorite." onclick="">
Log in to save job
</i>
<i id="cphMainContent_rptResults_iIsFavorite_0" class="fa fa-star" title="This job is saved" style="color:#fdb81e;cursor:pointer;display:none;" aria-hidden="true" onclick="removeUserFavorite(70488, $(this) );"> Job saved</i>
</div>
</div>
</div>
<div class="row" role="presentation">
<div class="col-sm-12 col-md-9" role="presentation">
<div class="row">
<div class="col-xs-12 col-sm-6" role="presentation">
<div class="working-title details row">
<div class="col-xs-6 job-label">Working Title:</div>
<div class="col-xs-6 job-details">
<span title="Keyword Relevance: 0">N/A</span>
</div>
</div>
<div class="position-number details row">
<div class="col-xs-6 job-label">Job Control:</div>
<div class="col-xs-6 job-details">
70488
</div>
</div>
<div class="salary-range details row">
<div class="col-xs-6 job-label">Salary Range:</div>
<div class="col-xs-6 job-details">
$5053.00 - $6325.00
</div>
</div>
<div class="schedule details row">
<div class="col-xs-6 job-label">Work Type/Schedule:</div>
<div class="col-xs-6 job-details">
Permanent Fulltime
</div>
</div>
</div>
<div class="col-xs-12 col-sm-6" role="presentation">
<div class="department details row">
<div class="col-xs-6 job-label">Department:</div>
<div class="col-xs-6 job-details">
Board of Equalization
</div>
</div>
<div class="location details row">
<div class="col-xs-6 job-label">Location:</div>
<div class="col-xs-6 job-details">
Sacramento County
</div>
</div>
<div class="filing-date details row">
<div class="col-xs-6 job-label">Publish Date:</div>
<div class="col-xs-6 job-details">
<time datetime="2016-06-30">
6/29/2017</time>
</div>
</div>
</div>
</div>
</div>
<div class="col-sm-12 col-md-3 align-right" role="presentation">
<div class="filing-date details row">
<div class="col-xs-12">
<div class="job-label">Filing Deadline:</div>
<div class="job-details">
<time datetime="2016-06-30">
7/14/2017
</time>
</div>
</div>
<div class="col-xs-12">
<a id="cphMainContent_rptResults_hlViewPosting_0" class="btn btn-secondary btn-block" href="/CalHrPublic/Jobs/JobPosting.aspx?JobControlId=70488">
<span class="ca-gov-icon-search"></span>
<span>View Job Posting</span>
</a>
</div>
</div>
</div>
</div>
</div>
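One likely reason the HTML from requests lacks these job cards is the #classid=441 part of the URL. This is hedged: it is based on the URL alone, not on inspecting the site's scripts. Everything after # is a fragment, which stays on the client and is never sent in the HTTP request, so the server only sees the bare JobSearchResults.aspx path. Whatever turns classid=441 into results is most likely JavaScript running in the browser, which requests doesn't execute. A stdlib illustration of what actually reaches the server:

```python
from urllib.parse import urldefrag, urlsplit

url = "https://jobs.ca.gov/CalHRPublic/Search/JobSearchResults.aspx#classid=441"

# urldefrag separates the fragment (client-side only) from the part that is
# actually requested from the server.
sent_to_server, fragment = urldefrag(url)
print(sent_to_server)      # https://jobs.ca.gov/CalHRPublic/Search/JobSearchResults.aspx
print(fragment)            # classid=441
print(urlsplit(url).path)  # /CalHRPublic/Search/JobSearchResults.aspx
```

If that is the case, there are two options. You can use a browser-driven tool such as Selenium and read driver.page_source after the results have loaded. Alternatively, you can find the request the page's script makes (in the DevTools Network tab) and call that endpoint directly.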
