Website quits before I can parse anything - Python

Background:
I cannot parse all the li elements inside <ul class="cmn-list"> using Selenium.
Code:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

url = "https://www.eslcafe.com/jobs/international?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60"
chrome_options = webdriver.ChromeOptions()
preferences = {"safebrowsing.enabled": "false"}
chrome_options.add_experimental_option("prefs", preferences)
# chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
browser = webdriver.Chrome('C:/chromedriver.exe', options=chrome_options)
print(url)
browser.get(url)
delay = 20  # seconds
try:
    # Note: constructing a WebDriverWait without calling .until() does not actually wait
    WebDriverWait(browser, delay)
except Exception:
    pass
html_list = browser.find_element_by_class_name("cmn-list")
items = html_list.find_elements_by_tag_name("li")
for item in items:
    text = item.text
    print(text)
Question:
How can I parse the li rows inside <ul class="cmn-list"> on that page with Selenium?

The page contains several ul tags with the class cmn-list. browser.find_element_by_class_name('cmn-list') returns only the first matching element, which is not the list you want. To select the right ul, I recommend using an XPath. Here is the full code:
from selenium import webdriver
import time

def printDetails(items, sponsored):
    # Section header, then one block of details per job posting
    print('-' * 120)
    print("Sponsored" if sponsored else "Others")
    for item in items:
        link = item.find_element_by_xpath('.//a').get_attribute('href')
        title = item.find_element_by_xpath('.//a').text
        company = item.find_element_by_class_name('job-title').find_element_by_xpath('.//p').text
        # The post-time div holds the date and the time on two separate lines
        date_time = item.find_element_by_xpath('.//div[@class="job-post-time ng-binding"]').text.split("\n")
        datee = date_time[0]
        timee = date_time[1]
        print('-' * 120)
        print(f"Job Title = {title}")
        print(f"Link = {link}")
        print(f"Company = {company}")
        print(f"Date = {datee}")
        print(f"Time = {timee}")

url = "https://www.eslcafe.com/jobs/international?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60"
chrome_options = webdriver.ChromeOptions()
preferences = {"safebrowsing.enabled": "false"}
chrome_options.add_experimental_option("prefs", preferences)
# chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
browser = webdriver.Chrome('chromedriver.exe', options=chrome_options)
print(url)
browser.get(url)
# Give the page a few seconds to render the job lists
time.sleep(3)
# Select each list by its position: div[3] holds the sponsored postings, div[4] the others
sponsored = browser.find_element_by_xpath('//*[@id="mid-wrapper"]/div/section[2]/div/div[1]/div[3]/ul')
sponsored_items = sponsored.find_elements_by_class_name('ng-scope')
html_list = browser.find_element_by_xpath('//*[@id="mid-wrapper"]/div/section[2]/div/div[1]/div[4]/ul')
items = html_list.find_elements_by_class_name('ng-scope')
printDetails(sponsored_items, sponsored=True)
printDetails(items, sponsored=False)
browser.close()
Output:
https://www.eslcafe.com/jobs/international?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
------------------------------------------------------------------------------------------------------------------------
Sponsored
------------------------------------------------------------------------------------------------------------------------
Job Title = Native-speaking English Teacher | Taiwan (NT$620 - NT$660 per hour)
Link = https://www.eslcafe.com/postajob-detail/native-speaking-english-teacher-nst?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = HESS International Educational Group
Date = Apr. 20, 2020
Time = 07:39 pm PST
------------------------------------------------------------------------------------------------------------------------
Others
------------------------------------------------------------------------------------------------------------------------
Job Title = University Teaching in Japan! – Tokyo, Kanagawa, Chiba, Saitama, and Aichi
Link = https://www.eslcafe.com/postajob-detail/university-teaching-in-japan---tokyo-kanagawa-37?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Westgate Corporation
Date = Oct. 23, 2020
Time = 09:22 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Elementary/Secondary School Teaching in Japan! - Tokyo, Kanagawa, and Aichi
Link = https://www.eslcafe.com/postajob-detail/elementarysecondary-school-teaching-in-japan-8?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Westgate Corporation
Date = Oct. 23, 2020
Time = 09:22 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Seeking online English Tutor - Up to $26USD/h - Work from home! Choose your own hours!
Link = https://www.eslcafe.com/postajob-detail/seeking-online-english-tutor---up-to-26usdh--?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Magic Ears
Date = Oct. 23, 2020
Time = 09:20 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = English Language Lectutrer in Oman for SY 2020
Link = https://www.eslcafe.com/postajob-detail/english-language-lectutrer-in-oman-for-sy-202?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = TATI Oman
Date = Oct. 22, 2020
Time = 11:06 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = ⭐$2000/month, 3-5 hrs per day⭐, Teach English Online with GOGOKID!
Link = https://www.eslcafe.com/postajob-detail/2000month-3-5-hrs-per-day-teach-english-onlin?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = GOGOKID
Date = Oct. 22, 2020
Time = 11:06 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = [Bachelor's Required]Part-Time Online ESL Teacher - Work from home - Flexible Job!
Link = https://www.eslcafe.com/postajob-detail/bachelors-requiredpart-time-online-esl-teache?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Magic Ears
Date = Oct. 22, 2020
Time = 11:05 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = ★★★TRAVEL ABROAD & TEACH IN THAILAND with BFITS THAILAND (Term 2 November 2020)★★★
Link = https://www.eslcafe.com/postajob-detail/travel-abroad-teach-in-thailand-with-bfits-th-22?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = BFITS Thailand
Date = Oct. 22, 2020
Time = 11:05 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Full-time In-house Academic Editor Wanted in Taipei, Taiwan
Link = https://www.eslcafe.com/postajob-detail/full-time-in-house-academic-editor-wanted-in-6?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Wallace Academic Editing
Date = Oct. 21, 2020
Time = 01:44 pm PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Online English Tutor
Link = https://www.eslcafe.com/postajob-detail/online-english-tutor?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Panda ABC
Date = Oct. 21, 2020
Time = 01:41 pm PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Native Speaker Teacher - Changhua, Taiwan
Link = https://www.eslcafe.com/postajob-detail/native-speaker-teacher?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Leader Language Schools
Date = Oct. 21, 2020
Time = 01:24 pm PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Teachers Needed in Fiji - Pacific American School
Link = https://www.eslcafe.com/postajob-detail/fiji-pacific-american-school?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Pacific American School
Date = Oct. 21, 2020
Time = 10:44 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Online ESL tutor wanted! Teach Korean students online. (CNK English)
Link = https://www.eslcafe.com/postajob-detail/online-esl-tutor-wanted-teach-korean-students-1?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = CNK English
Date = Oct. 20, 2020
Time = 08:15 pm PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Primary Section (Class Teacher for grades 2-3) - Dushanbe, Tajikistan
Link = https://www.eslcafe.com/postajob-detail/primary-section-class-teacher-for-grades-2-3?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Dushanbe International School
Date = Oct. 20, 2020
Time = 08:42 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Native English Teacher needed for private classes - Kuala Lumpur, Malaysia)
Link = https://www.eslcafe.com/postajob-detail/english-teacher-needed-for-private-classes-ku?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = BLC
Date = Oct. 20, 2020
Time = 08:40 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = English Language Center Lecturer - Taiwan
Link = https://www.eslcafe.com/postajob-detail/english-language-center-lecturer?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Tunghai University
Date = Oct. 20, 2020
Time = 08:39 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = 【⭐GOGOKID offers candidate incentive again⭐】Teach English Online
Link = https://www.eslcafe.com/postajob-detail/extra-bonus-30-for-on-boardteach-english-onli?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = GOGOKID
Date = Oct. 19, 2020
Time = 08:46 pm PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Looking for Online ESL Teacher!!!
Link = https://www.eslcafe.com/postajob-detail/looking-for-online-esl-teacher-1?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = First Future
Date = Oct. 19, 2020
Time = 09:23 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Online English Teacher
Link = https://www.eslcafe.com/postajob-detail/online-english-teacher-7?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Whales English
Date = Oct. 19, 2020
Time = 09:23 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = 🇪🇺 🇵🇱 Teach English in Poland with English Wizards! 🇵🇱 🇪🇺
Link = https://www.eslcafe.com/postajob-detail/teach-english-in-poland-with-english-wizards?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = English Wizards
Date = Oct. 19, 2020
Time = 09:22 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = US Certified Science Teacher - Tirane, Albania
Link = https://www.eslcafe.com/postajob-detail/us-certified-science-teacher?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Albanian International School
Date = Oct. 19, 2020
Time = 09:22 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Fantastic teaching jobs around Taiwan, hiring single and couples ASAP
Link = https://www.eslcafe.com/postajob-detail/fantastic-teaching-jobs-around-taiwan-hiring?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = ESLJOBTAIWAN
Date = Oct. 19, 2020
Time = 09:21 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = 🛫 🌞 🌄 Become a Mentor for Language Learners on Ski Camps - Free Hotel Stays in Europe 🛫 🌞 🌄
Link = https://www.eslcafe.com/postajob-detail/become-a-mentor-for-language-learners-on-ski?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Angloville
Date = Oct. 19, 2020
Time = 09:19 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = 【⭐Extra Bonus-First come first served】Online English Tutor-Earn up to $25/hr
Link = https://www.eslcafe.com/postajob-detail/extra-bonus-first-come-first-servedonline-eng?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = GOGOKID
Date = Oct. 19, 2020
Time = 09:19 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = University Teaching in Japan! – Tokyo, Kanagawa, Chiba, Saitama, and Aichi
Link = https://www.eslcafe.com/postajob-detail/university-teaching-in-japan---tokyo-kanagawa-36?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Westgate Corporation
Date = Oct. 19, 2020
Time = 09:18 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Elementary/Secondary School Teaching in Japan! - Tokyo, Kanagawa, and Aichi
Link = https://www.eslcafe.com/postajob-detail/elementarysecondary-school-teaching-in-japan-7?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Westgate Corporation
Date = Oct. 19, 2020
Time = 09:18 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Online ESL Tutor - No minimum teaching requirements - $26/hr part-time job
Link = https://www.eslcafe.com/postajob-detail/online-esl-tutor---no-minimum-teaching-requir?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Magic Ears
Date = Oct. 19, 2020
Time = 09:17 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = 💜💛💙 SUNNY SPAIN, MARVELOUS MADRID & an EXCITING, LOVELY LIFE with the Canterbury English TEFL & Madrid Lifestyle (for TEFL holders) Programs&Guaranteed Teaching Job for all students WITH US (that's the key), which starts during the Course! 💜💛💛
Link = https://www.eslcafe.com/postajob-detail/128156128155128153-sunny-spain-marvelous-madr-30?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Canterbury English
Date = Oct. 18, 2020
Time = 09:36 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Math Teacher - Hargeisa, Somaliland
Link = https://www.eslcafe.com/postajob-detail/math-teacher-2?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Abaarso School of Science & Technology
Date = Oct. 17, 2020
Time = 10:56 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Math/English Teacher - Hargeisa, Somaliland
Link = https://www.eslcafe.com/postajob-detail/mathenglish-teacher?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Barwaaqo Univeristy
Date = Oct. 17, 2020
Time = 10:46 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = TAIWAN! Teach English at schools throughout the beautiful island of TAIWAN - $2,200 USD per month. Taipei, Tainan, Kaohsiung, Taichung, Keelung, PingDong. Summer 2020 graduates welcome.
Link = https://www.eslcafe.com/postajob-detail/taiwan-teach-english-at-schools-throughout-th-35?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Arun Language Training & Recruitment Ltd
Date = Oct. 17, 2020
Time = 10:18 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Online Technical Copywriter
Link = https://www.eslcafe.com/postajob-detail/technical-copywriter?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Lingvoexpert
Date = Oct. 16, 2020
Time = 10:21 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = ★★★LIVE ABROAD & TEACH IN THAILAND with BFITS THAILAND (Term 2 November 2020)★★★
Link = https://www.eslcafe.com/postajob-detail/live-abroad-teach-in-thailand-with-bfits-thai-3?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = BFITS Thailand
Date = Oct. 16, 2020
Time = 10:18 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = US $2500-5000/M + PU Letter+ Teach in China + International & Public School + Training Center + IB + AP + A-level + Social Science, Math, Physics, Chemistry + All Regular Subjects
Link = https://www.eslcafe.com/postajob-detail/apiba-levelmathsciencechemistryphysicscompute?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Can-Achieve Global Talent Inc.
Date = Jul. 21, 2020
Time = 07:13 pm PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Head of Primary and Head of Secondary required ASAP - Iraq- Erbil
Link = https://www.eslcafe.com/postajob-detail/head-of-primary-and-head-of-secondary-require?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = British International School/Iraq-Kurdistan- Erbil
Date = Oct. 15, 2020
Time = 09:35 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Full Time English Teacher in Ehime, JAPAN
Link = https://www.eslcafe.com/postajob-detail/full-time-english-teacher-in-ehime-japan?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Amic International Inc.
Date = Oct. 15, 2020
Time = 09:34 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = 【⭐Dave's Recommendation】Online English Tutor-Earn up to $25/hr
Link = https://www.eslcafe.com/postajob-detail/daves-recommendationonline-english-tutor-earn?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = GOGOKID
Date = Oct. 15, 2020
Time = 09:32 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Teaching English with Magic Ears! - Work from home - Uni students are also acceptable!
Link = https://www.eslcafe.com/postajob-detail/teaching-english-with-magic-ears---work-from?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Magic Ears
Date = Oct. 15, 2020
Time = 09:30 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Teach in Taiwan
Link = https://www.eslcafe.com/postajob-detail/teach-in-taiwan-1?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Neurolink English Academy
Date = Oct. 14, 2020
Time = 09:29 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Japan: Teaching English to children!
Link = https://www.eslcafe.com/postajob-detail/japan-teaching-english-to-children?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Tamaki TEFL Recruitment (TTR)
Date = Oct. 14, 2020
Time = 09:26 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = ESL Instructors Needed for Community Education Courses Baghdad, Iraq
Link = https://www.eslcafe.com/postajob-detail/esl-instructors-needed-for-community-educatio?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = English Language Academy
Date = Oct. 14, 2020
Time = 09:25 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Online English Teacher - Up to $26/hr - With no minimum teaching requirements!
Link = https://www.eslcafe.com/postajob-detail/online-english-teacher---up-to-26hr---with-no?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Magic Ears
Date = Oct. 14, 2020
Time = 09:23 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Worldwide ESL/EFL Projects for the U.S. Department of State in 2021/2022
Link = https://www.eslcafe.com/postajob-detail/worldwide-eslefl-projects-for-the-us-departme-13?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = U.S. Department of State English Language Programs
Date = Oct. 13, 2020
Time = 01:38 pm PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Biggest ESL School in Vietnam - NOW Hiring Teachers
Link = https://www.eslcafe.com/postajob-detail/biggest-esl-school-in-vietnam---now-hiring-te-7?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = APAX English
Date = Oct. 13, 2020
Time = 10:13 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = EXPERIENCED EFL TEACHER NEEDED AT NORTHSTAR COLLEGE, Hargeisa, Somaliland
Link = https://www.eslcafe.com/postajob-detail/experienced-efl-teacher-needed-at-northstar-c?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Northstar College
Date = Oct. 13, 2020
Time = 10:12 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = Full time English Teacher - Kanazawa, Japan
Link = https://www.eslcafe.com/postajob-detail/full-time-english-teacher-8?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Bartolo English
Date = Oct. 13, 2020
Time = 09:09 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = A good choice for ESL teachers! Teaching English online for Chinese kids - Earn up to $26/hr
Link = https://www.eslcafe.com/postajob-detail/a-good-choice-for-esl-teachers-teaching-engli?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Magic Ears
Date = Oct. 13, 2020
Time = 09:07 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = ⭐Attention⭐Online Teaching Position offers up to $25/hr
Link = https://www.eslcafe.com/postajob-detail/online-english-tutor-earn-up-to-25hr?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = GOGOKID
Date = Oct. 13, 2020
Time = 09:06 am PST
------------------------------------------------------------------------------------------------------------------------
Job Title = ⭐⭐⭐⭐ESL Teaching Positions Available in Taiwan NOW ⭐⭐⭐⭐
Link = https://www.eslcafe.com/postajob-detail/esl-teaching-positions-available-in-taiwan-no-6?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Teach Taiwan
Date = Oct. 12, 2020
Time = 11:38 pm PST
------------------------------------------------------------------------------------------------------------------------
...
Job Title = Work in Japan
Link = https://www.eslcafe.com/postajob-detail/work-in-japan?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60
Company = Omni International
Date = Oct. 08, 2020
Time = 10:21 am PST
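To illustrate the point made at the start of this answer, here is a small standard-library sketch using xml.etree.ElementTree on a made-up HTML fragment (the markup and the list contents are hypothetical, not copied from eslcafe.com). Like Selenium's find_element_* methods, ElementTree's find returns only the first match, while findall returns every match, so the list you actually want has to be picked out explicitly, here by position as the XPaths above do.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: two lists share the class "cmn-list"
PAGE = """<div id="mid-wrapper">
  <div><ul class="cmn-list"><li>Sponsored job</li></ul></div>
  <div><ul class="cmn-list"><li>Job A</li><li>Job B</li></ul></div>
</div>"""

root = ET.fromstring(PAGE)

# find() stops at the first matching element, much like find_element_by_class_name
first = root.find(".//ul[@class='cmn-list']")
first_items = [li.text for li in first.findall("li")]

# findall() returns every matching element in document order; pick the second list
all_lists = root.findall(".//ul[@class='cmn-list']")
wanted_items = [li.text for li in all_lists[1].findall("li")]

print(first_items)   # ['Sponsored job']
print(wanted_items)  # ['Job A', 'Job B']
```

In Selenium the equivalent of findall is find_elements_by_class_name('cmn-list'), which returns a list you can index instead of relying on the first hit.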

The answer above sleeps for a fixed 3 seconds, which assumes the page finishes loading within that time. If you would rather wait until a specific piece of text appears in the page, you can poll the page source:
import time
from selenium import webdriver

url = "https://www.eslcafe.com/jobs/international?koreasearch=&koreapageno=&koreapagesize=&chinasearch=&chinapageno=&chinapagesize=&internationalsearch=&internationalpageno=1&internationalpagesize=60"
chrome_options = webdriver.ChromeOptions()
preferences = {"safebrowsing.enabled": "false"}
chrome_options.add_experimental_option("prefs", preferences)
# chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
browser = webdriver.Chrome('C:/chromedriver.exe', options=chrome_options)
print(url)
browser.get(url)
displayTimer = 0
# Put some text here that is only on the page once it has loaded.
# The placeholder "li" will match almost any page (it occurs in tags such as <link>), so replace it.
wanted_Phrase = "li"
while wanted_Phrase not in browser.page_source:
    time.sleep(1)
    displayTimer += 1
    print("[{}] Seconds waited for page".format(displayTimer))
html_list = browser.find_element_by_class_name("cmn-list")
items = html_list.find_elements_by_tag_name("li")
for item in items:
    text = item.text
    print(text)
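The polling loop above never gives up if the phrase never appears. Below is a sketch of the same idea with a timeout, written in plain Python so it runs without a browser. The helper wait_for_text is hypothetical; it takes a callable that returns the current page source (with Selenium you could pass lambda: browser.page_source). Selenium's own WebDriverWait(browser, timeout).until(condition) already provides polling with a timeout, so in real code that is usually the better choice.

```python
import time

def wait_for_text(get_source, phrase, timeout=20.0, poll=1.0):
    """Poll get_source() until it contains phrase; return the seconds waited.

    Raises TimeoutError if phrase has not appeared within timeout seconds.
    """
    start = time.monotonic()
    while True:
        if phrase in get_source():
            return time.monotonic() - start
        elapsed = time.monotonic() - start
        if elapsed >= timeout:
            raise TimeoutError(f"{phrase!r} not found after {elapsed:.1f} s")
        time.sleep(poll)

# Usage with a fake page whose content "finishes loading" on the third poll
sources = iter([
    "<html></html>",
    "<html>loading</html>",
    "<html><ul class='cmn-list'></ul></html>",
])
waited = wait_for_text(lambda: next(sources), "cmn-list", timeout=5, poll=0.01)
print(f"found after {waited:.2f} s")
```

Raising an exception on timeout, instead of looping forever, lets the caller decide whether to retry, log, or close the browser.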

Related

I get the same output on every iteration of my for loop:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
import pandas as pd

s = Service(r"C:\selenium driver\chromedriver.exe")
driver = webdriver.Chrome(service=s)
companies_names = []
persons_names = []
phones_numbers = []
locations = []
opening_hours = []
descriptions = []
websites_links = []
all_profiles = []
driver.get("https://www.saveface.co.uk/search/")
driver.implicitly_wait(10)
blocks = driver.find_elements(By.XPATH, "//div[@class='result clientresult']")
for block in range(30):
    company_name = blocks[block].find_element(By.XPATH, "//h3[@class='resulttitle']").text.strip()
    companies_names.append(company_name)
    person_name = blocks[block].find_element(By.XPATH, "//p[@class='name_wrapper']").text.strip()
    persons_names.append(person_name)
    phone_number = blocks[block].find_element(By.XPATH, "//div[@class='searchContact phone']").text.strip()
    phones_numbers.append(phone_number)
    location = blocks[block].find_element(By.XPATH, "//li[@class='cls_loc']").text.strip()
    locations.append(location)
    opening_hour = blocks[block].find_element(By.XPATH, "//li[@class='opening-hours']").text.strip()
    opening_hours.append(opening_hour)
    profile = blocks[block].find_element(By.XPATH, "//a[@class='visitpage']").get_attribute("href")
    all_profiles.append(profile)
    print(company_name, person_name, phone_number, location, opening_hour, profile)
    if block == 29:
        two_page = driver.find_element(By.XPATH, "//a[@class='facetwp-page']")
        two_page.click()
        driver.implicitly_wait(10)
        blocks = driver.find_elements(By.XPATH, "//div[@class='result clientresult']")
for i in range(len(all_profiles)):
    driver.get(all_profiles[i])
    description = driver.find_element(By.XPATH, "//div[@class='desc-text-left']").text.strip()
    descriptions.append(description)
    website_link = driver.find_element(By.XPATH, "//a[@class='visitwebsite website']").get_attribute("href")
    websites_links.append(website_link)
    driver.implicitly_wait(10)
driver.close()
df = pd.DataFrame(
    {
        "company_name": companies_names,
        "person_name": persons_names,
        "phone_number": phones_numbers,
        "location": locations,
        "opening_hour": opening_hours,
        "description": descriptions,
        "website_link": websites_links,
        "profile_on_saveface": all_profiles
    }
)
df.to_csv('saveface.csv', index=False)
#print(df)
This is the result:
The Hartley Clinic Clinic Contact: Ailing Jeavons 01256 856289 , , Fleet, RG27 8NZ Monday 8:30 — 17:00 Tuesday 8:30 — 19:00 Wednesday 8:30— 17:00 Thursday 8:30 — 17:00 Friday 8:30 — 15:00 Saturday 9:00 — 17:00 Sunday Closed https://www.saveface.co.uk/clinic/the-hartley-clinic/
The Hartley Clinic Clinic Contact: Ailing Jeavons 01256 856289 , , Fleet, RG27 8NZ Monday 8:30 — 17:00 Tuesday 8:30 — 19:00 Wednesday 8:30— 17:00 Thursday 8:30 — 17:00 Friday 8:30 — 15:00 Saturday 9:00 — 17:00 Sunday Closed https://www.saveface.co.uk/clinic/the-hartley-clinic/
The Hartley Clinic Clinic Contact: Ailing Jeavons 01256 856289 , , Fleet, RG27 8NZ Monday 8:30 — 17:00 Tuesday 8:30 — 19:00 Wednesday 8:30— 17:00 Thursday 8:30 — 17:00 Friday 8:30 — 15:00 Saturday 9:00 — 17:00 Sunday Closed https://www.saveface.co.uk/clinic/the-hartley-clinic/
The Hartley Clinic Clinic Contact: Ailing Jeavons 01256 856289 , , Fleet, RG27 8NZ Monday 8:30 — 17:00 Tuesday 8:30 — 19:00 Wednesday 8:30— 17:00 Thursday 8:30 — 17:00 Friday 8:30 — 15:00 Saturday 9:00 — 17:00 Sunday Closed https://www.saveface.co.uk/clinic/the-hartley-clinic/
The Hartley Clinic Clinic Contact: Ailing Jeavons 01256 856289 , , Fleet, RG27 8NZ Monday 8:30 — 17:00 Tuesday 8:30 — 19:00 Wednesday 8:30— 17:00 Thursday 8:30 — 17:00 Friday 8:30 — 15:00 Saturday 9:00 — 17:00 Sunday Closed https://www.saveface.co.uk/clinic/the-hartley-clinic/
The Hartley Clinic Clinic Contact: Ailing Jeavons 01256 856289 , , Fleet, RG27 8NZ Monday 8:30 — 17:00 Tuesday 8:30 — 19:00 Wednesday 8:30— 17:00 Thursday 8:30 — 17:00 Friday 8:30 — 15:00 Saturday 9:00 — 17:00 Sunday Closed https://www.saveface.co.uk/clinic/the-hartley-clinic/
To restrict the search to the subtree rooted at the context node, your expression should start with .//, so you have to replace // with .// in each of the commands of the form
... = blocks[block].find_element(...)
An expression starting with // searches the whole document from its root, ignoring the context node blocks[block] altogether.
Moreover, notice that not all the blocks have a location. In that case
location = blocks[block].find_element(By.XPATH, ".//li[@class='cls_loc']")
will raise a NoSuchElementException. To avoid this, put the command in a try...except block.
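The scoping rule can be tried offline with Python's standard-library xml.etree.ElementTree, which understands a subset of XPath. This is a stand-in for Selenium, not Selenium itself: the markup below is a hypothetical, reduced version of the search results; ElementTree refuses a leading // on an element, so the document-wide search is written from the root instead; and a missing element comes back as None rather than raising NoSuchElementException.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup shaped like the search results: two result blocks,
# the second of which has no <li class="cls_loc"> element.
PAGE = """
<root>
  <div class="result clientresult">
    <h3 class="resulttitle">Clinic A</h3>
    <ul><li class="cls_loc">Fleet</li></ul>
  </div>
  <div class="result clientresult">
    <h3 class="resulttitle">Clinic B</h3>
  </div>
</root>
"""

root = ET.fromstring(PAGE)
blocks = root.findall(".//div[@class='result clientresult']")

# Searching from the root: every title in the document, regardless of block.
all_titles = [h3.text for h3 in root.findall(".//h3[@class='resulttitle']")]

rows = []
for block in blocks:
    # Searching from the block: only descendants of this particular block.
    title = block.find(".//h3[@class='resulttitle']").text
    loc = block.find(".//li[@class='cls_loc']")
    # ElementTree returns None for a missing element; Selenium raises
    # NoSuchElementException instead, which is why the answer uses try...except.
    rows.append((title, loc.text if loc is not None else "Not found on the site"))

print(all_titles)
print(rows)
```

Each block now yields its own title, and the block without a location gets the placeholder instead of borrowing another block's value.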
UPDATE
Scraping 400 blocks with Selenium takes about 1 minute on my computer; with BeautifulSoup it takes less than 1 second. The slow part is scraping the profiles, because each one requires downloading a new web page, but even that is much faster with BeautifulSoup.
So I wrote a script without Selenium, using only requests and BeautifulSoup (you can install it by running pip install beautifulsoup4 in the terminal):
import requests
from bs4 import BeautifulSoup
url = 'https://www.saveface.co.uk/search/'
soup = BeautifulSoup(requests.get(url).text, "html.parser")
css_selector = {
    'company name' : ".title",
    'person name' : ".name_wrapper",
    'phone number' : ".phone",
    'location' : ".cls_loc",
    'opening hours': ".opening-hours",
    'profile link' : ".visitpage",
}
data = {key:[] for key in list(css_selector)+['description','website link']}
# the total number of pages is read from text embedded in the page source
number_of_pages = int(str(soup).split('total_pages":')[1].split('}')[0])
for page in range(2,number_of_pages+2):
    blocks = soup.select('.clientresult')
    for idx,block in enumerate(blocks):
        print(f'blocks {idx+1}/{len(blocks)}',end='\r')
        for key in list(css_selector):
            try:
                if 'link' in key:
                    data[key] += [ block.select_one(css_selector[key])['href'] ]
                else:
                    data[key] += [ block.select_one(css_selector[key]).text.strip().replace('\r\n',', ') ]
            # select_one returns None when nothing matches: .text then raises
            # AttributeError, while ['href'] raises TypeError
            except (AttributeError, TypeError):
                data[key] += ['*missing value*']
    if page <= number_of_pages:
        print('\nloading page', page)
        url_page = f'{url}?fwp_paged={page}'
        soup = BeautifulSoup(requests.get(url_page).text, "html.parser")
print('\nno more pages to load, moving to scrape profile links...')
for idx,profile_url in enumerate(data['profile link']):
    print(f"profile link {idx+1}/{len(data['profile link'])} ",end='\r')
    soup_profile = BeautifulSoup(requests.get(profile_url).text, "html.parser")
    try:
        data['description'] += [soup_profile.select_one('.clinicContent > .description').text.strip()]
    except AttributeError:
        data['description'] += ['*missing value*']
    try:
        data['website link'] += [soup_profile.select_one('.visitwebsite')['href']]
    except TypeError:
        data['website link'] += ['*missing value*']
Output (it took about 8 minutes to complete the execution)
blocks 400/400
loading page 2
blocks 109/109
no more pages to load, moving to scrape profile links...
profile link 509/509
Then you can easily create the dataframe by running pd.DataFrame(data)
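In the BeautifulSoup script, an absent field makes select_one return None, and the two ways of reading a match fail with different exceptions: .text raises AttributeError, while ['href'] raises TypeError. The sketch below shows both on a hypothetical block that has no link, plus a hypothetical field() helper that checks for None explicitly instead of relying on either exception. Keeping one value (real or placeholder) per block is also what keeps every list in data the same length, which pd.DataFrame(data) needs.

```python
from bs4 import BeautifulSoup

# Hypothetical block: it has a title but no ".visitpage" link.
HTML = '<div class="clientresult"><h3 class="title">Clinic B</h3></div>'
block = BeautifulSoup(HTML, "html.parser").select_one(".clientresult")

missing = block.select_one(".visitpage")  # None, because nothing matched

# Reading text and reading an attribute fail with different exceptions.
errors = []
for read in (lambda tag: tag.text, lambda tag: tag["href"]):
    try:
        read(missing)
    except (AttributeError, TypeError) as exc:
        errors.append(type(exc).__name__)
print(errors)


def field(block, selector, attr=None, default="*missing value*"):
    """Hypothetical helper: stripped text (or an attribute) of the first match, else default."""
    tag = block.select_one(selector)
    if tag is None:
        return default
    return tag.get(attr, default) if attr else tag.text.strip()


print(field(block, ".title"), field(block, ".visitpage", attr="href"))
```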
This is the new code, but it returns the same output on every page. Why?
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
import pandas as pd
s=Service(r"C:\selenium driver\chromedriver.exe")
driver = webdriver.Chrome(service=s)
companies_names = []
persons_names = []
phones_numbers = []
locations = []
opening_hours = []
descriptions = []
websites_links = []
all_profiles = []
driver.get("https://www.saveface.co.uk/search/")
driver.implicitly_wait(10)
pages = driver.find_elements(By.XPATH, ".//a[@class='facetwp-page']")
for page in range(len(pages)+1):
    blocks = driver.find_elements(By.XPATH, ".//div[@class='result clientresult']")
    for block in range(10):
        try:
            company_name = blocks[block].find_element(By.XPATH, ".//h3[@class='resulttitle']").text.strip()
            companies_names.append(company_name)
        except:
            companies_names.append("Not found on the site")
        try:
            person_name = blocks[block].find_element(By.XPATH, ".//p[@class='name_wrapper']").text.strip()
            persons_names.append(person_name)
        except:
            persons_names.append("Not found on the site")
        try:
            phone_number = blocks[block].find_element(By.XPATH, ".//div[@class='searchContact phone']").text.strip()
            phones_numbers.append(phone_number)
        except:
            phones_numbers.append("Not found on the site")
        try:
            location = blocks[block].find_element(By.XPATH, ".//li[@class='cls_loc']").text.strip()
            locations.append(location)
        except:
            locations.append("Not found on the site")
        try:
            opening_hour = blocks[block].find_element(By.XPATH, ".//li[@class='opening-hours']").text.strip()
            opening_hours.append(opening_hour)
        except:
            opening_hours.append("Not found on the site")
        try:
            profile = blocks[block].find_element(By.XPATH, ".//a[@class='visitpage']").get_attribute("href")
            all_profiles.append(profile)
        except:
            all_profiles.append("Not found on the site")
    two_page = driver.find_element(By.XPATH, ".//a[@class='facetwp-page']")
    two_page.click()
for i in range(len(all_profiles)):
    try:
        driver.get(all_profiles[i])
        driver.implicitly_wait(10)
        try:
            description = driver.find_element(By.XPATH, ".//div[@class='desc-text-left']").text.strip()
            descriptions.append(description)
        except:
            descriptions.append("Not found on the site")
        try:
            website_link = driver.find_element(By.XPATH, ".//a[@class='visitwebsite website']").get_attribute("href")
            websites_links.append(website_link)
        except:
            websites_links.append("Not found on the site")
    except:
        descriptions.append("Not found on the site")
        websites_links.append("Not found on the site")
    driver.implicitly_wait(10)
driver.close()
df = pd.DataFrame(
    {
        "company_name": companies_names,
        "person_name": persons_names,
        "phone_number": phones_numbers,
        "location": locations,
        "opening_hour": opening_hours,
        "description": descriptions,
        "website_link": websites_links,
        "profile_on_saveface": all_profiles
    }
)
df.to_csv('saveface.csv',index=False)
print(df)

How to transfer bs4.element.ResultSet to date/string?

I want to extract the date and summary of an article on a website. Here is my code:
from bs4 import BeautifulSoup
from selenium import webdriver
full_url = 'https://www.wsj.com/articles/readers-favorite-summer-recipes-11599238648?mod=searchresults&page=1&pos=20'
url0 = full_url
browser0 = webdriver.Chrome('C:/Users/liuzh/Downloads/chromedriver_win32/chromedriver')
browser0.get(url0)
html0 = browser0.page_source
page_soup = BeautifulSoup(html0, 'html5lib')
date = page_soup.find_all("time", class_="timestamp article__timestamp flexbox__flex--1")
sub_head = page_soup.find_all("h2", class_="sub-head")
print(date)
print(sub_head)
I got the following result. How can I obtain just the plain text (e.g. Sept. 4, 2020 12:57 pm ET; This Labor Day weekend, we’re...)?
[<time class="timestamp article__timestamp flexbox__flex--1">
Sept. 4, 2020 12:57 pm ET
</time>]
[<h2 class="sub-head" itemprop="description">This Labor Day weekend, we’re savoring the last of summer with a collection of seasonal recipes shared by Wall Street Journal readers. Each one comes with a story about what this food means to a family and why they return to it each year.</h2>]
Thanks.
Try something like:
for d in date:
    print(d.text.strip())
Given your sample html, output should be:
Sept. 4, 2020 12:57 pm ET
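find_all always returns a ResultSet, which behaves like a list of Tag objects, so the text has to be taken from each element in it; select_one returns a single Tag (or None). A minimal sketch on a shortened copy of the two tags from the question (the summary text is truncated here):

```python
from bs4 import BeautifulSoup

# The two tags from the question's output, reduced to a small document.
HTML = """
<time class="timestamp article__timestamp flexbox__flex--1">
Sept. 4, 2020 12:57 pm ET
</time>
<h2 class="sub-head" itemprop="description">This Labor Day weekend, we're savoring the last of summer.</h2>
"""
page_soup = BeautifulSoup(HTML, "html.parser")

# find_all returns a ResultSet (a list of Tags), so take the text of each tag.
# class_="timestamp" matches any tag whose class list contains "timestamp".
dates = [t.get_text(strip=True) for t in page_soup.find_all("time", class_="timestamp")]

# When only one match is expected, select_one returns a single Tag (or None).
sub_head = page_soup.select_one("h2.sub-head")
summary = sub_head.get_text(strip=True) if sub_head else None

print(dates)
print(summary)
```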

How to scrape 2nd <div> tag of same class without unique distinctive mark

Using Python 3, I am trying to read the content of the second div with this class:
<div class="eds-event-card-content__sub eds-text-bm eds-text-color--ui-600 eds-l-mar-top-1 eds-event-card-content__sub--cropped">Starts at RM15.75</div>
from the following HTML:
<div class="eds-event-card-content__sub-content">
<div class="eds-event-card-content__sub eds-text-bm eds-text-color--ui-600 eds-l-mar-top-1
eds-event-card-content__sub--cropped">
<div class="card-text--truncated__one">Found8 KL Sentral • Kuala Lumpur, Kuala
Lumpur</div>
</div>
<div class="eds-event-card-content__sub eds-text-bm eds-text-color--ui-600 eds-l-mar-top-1
eds-event-card-content__sub--cropped">Starts at RM15.75</div></div>
My python code:
url = 'https://www.eventbrite.com/d/malaysia--kuala-lumpur--85675181/all-events/?page=2'
response = get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')
# Select all the 20 event containers from a single page
event_containers = html_soup.find_all('div', class_='search-event-card-square-image')
# Getting price of ticket
price = container.find_all('div', class_= "eds-event-card-content__sub eds-text-bm eds-text-color--ui-600 eds-l-mar-top-1 eds-event-card-content__sub--cropped").text
print("price: ", price[1])
However, my code does not work. It gives me this output:
IndexError: list index out of range
but I wanted:
Starts at RM15.75
Can anyone help me with this? Thank you.
I can't see any prices in the HTML source code, so I guess they are generated by JavaScript.
In this case you need to use Selenium.
Code:
# import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
import time
from webdriver_manager.chrome import ChromeDriverManager
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
driver.set_window_size(1024, 600)
driver.maximize_window()
url = 'https://www.eventbrite.com/d/malaysia--kuala-lumpur--85675181/all-events/?page=2'
# response = requests.get(url)
driver.get(url)
time.sleep(4)  # give the page's JavaScript time to render the event cards
html_soup = BeautifulSoup(driver.page_source, 'html.parser')
# Select all the 20 event containers from a single page
event_containers = html_soup.find('ul', class_='search-main-content__events-list')
for event in event_containers.find_all('li'):
    event_time = event.find('div', class_= "eds-text-color--primary-brand eds-l-pad-bot-1 eds-text-weight--heavy eds-text-bs").text
    event_name = event.find('div', class_= "eds-event-card__formatted-name--is-clamped eds-event-card__formatted-name--is-clamped-three eds-text-weight--heavy").text
    event_price_place = event.find('div', class_ = "eds-event-card-content__sub-content")
    event_pp = event_price_place.find_all('div')
    event_place = event_pp[0].text
    try:
        event_price = event_pp[2].text
    except IndexError:
        event_price = None
    print(f"{event_name}\n{event_time}\n{event_place}\n{event_price}\n\n")
Result:
KL International Flea Market 2020 / Bazaar Antarabangsa Kuala Lumpur
Mon, Oct 5, 10:00 AM
VIVA Shopping Mall • Kuala Lumpur, Wilayah Persekutuan Kuala Lumpur
Free
FGTSD Physical Church Service
Sun, Jul 19, 9:30 AM + 105 more events
Full Gospel Tabernacle Sri Damansara • Kuala Lumpur
Free
EFE 2020 - 16th Export Furniture Exhibition Malaysia
Thu, Aug 27, 9:00 AM
Kuala Lumpur Convention Centre • Kuala Lumpur, Kuala Lumpur
Free
International Beauty Expo (IBE) 2020
Sat, Sep 12, 11:00 AM
Malaysia International Trade and Exhibition Centre • Kuala Lumpur, Wilayah Persekutuan Kuala Lumpur
Free
Learn How To Earn USD3500 In 4 Week Using Your SmartPhone
Today at 8:00 PM + 2 more events
KL Online Event • Kuala Lumpur, Bangkok
None
Turn Customers into Raving Fans of Your Brand via Equity Crowdfunding
Thu, Aug 27, 4:00 PM
Found8 KL Sentral • Kuala Lumpur, Kuala Lumpur
Starts at RM15.75
...
Edit:
I have added an option for running it headless.
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
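The answer takes event_pp[2] rather than event_pp[1] because find_all('div') is recursive: it also returns the div nested inside the place div. A sketch on the question's snippet (class lists shortened, and assuming the markup has already been rendered, e.g. taken from driver.page_source) showing recursive=False, which keeps only the direct children of the container:

```python
from bs4 import BeautifulSoup

# The snippet from the question, with class lists shortened to the classes that matter.
HTML = """
<div class="eds-event-card-content__sub-content">
  <div class="eds-event-card-content__sub eds-text-bm">
    <div class="card-text--truncated__one">Found8 KL Sentral • Kuala Lumpur, Kuala Lumpur</div>
  </div>
  <div class="eds-event-card-content__sub eds-text-bm">Starts at RM15.75</div>
</div>
"""
soup = BeautifulSoup(HTML, "html.parser")
container = soup.select_one("div.eds-event-card-content__sub-content")

# Recursive search: outer place div, the nested place div, then the price div.
all_divs = [d.get_text(strip=True) for d in container.find_all("div")]

# recursive=False (like the CSS child combinator ">") keeps only direct children.
children = container.find_all("div", recursive=False)
place, price = (d.get_text(strip=True) for d in children)

print(len(all_divs), place, price)
```

With only the direct children, the price is simply the second element, which matches what the question was originally asking for.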

In python how to fix the code when it is executing fine (exit code 0) but with no results (nothing printing)?

I am trying to scrape a page of The New York Times. My code runs without errors (exit code 0) but produces no results.
import time
import requests
from bs4 import BeautifulSoup
url = 'https://www.nytimes.com/search?endDate=20190331&query=cybersecurity&sort=newest&startDate=20180401={}'
pages = [0]
for page in pages:
    res = requests.get(url.format(page))
    soup = BeautifulSoup(res.text,"lxml")
    for item in soup.select("#search-results li > a"):
        resp = requests.get(item.get("href"))
        sauce = BeautifulSoup(resp.text, "lxml")
        date = sauce.select(".css-1vkm6nb ehdk2mb0 h1")
        date = date.text
        print(date)
        time.sleep(3)
With this code, I am hoping to get the publication date of each article.
Nice attempt, you're pretty close. The problem is the selectors:
#search-results asks for an id that doesn't exist. The element is an <ol data-testid="search-results">, so we'll need other means to grab this anchor tag.
.css-1vkm6nb ehdk2mb0 h1 doesn't make much sense. It asks for an h1 element inside an ehdk2mb0 element, which is in turn inside an element with the class css-1vkm6nb. What's actually on the page is an <h1 class="css-1vkm6nb ehdk2mb0"> element. Select it with h1.css-1vkm6nb.ehdk2mb0.
Having said that, this is not the time data you're after; it's the title. We can get the time element (<time>) with a simple sauce.find("time").
Full example:
import requests
from bs4 import BeautifulSoup
base = "https://www.nytimes.com"
url = "https://www.nytimes.com/search?endDate=20190331&query=cybersecurity&sort=newest&startDate=20180401={}"
pages = [0]
for page in pages:
    res = requests.get(url.format(page))
    soup = BeautifulSoup(res.text,"lxml")
    for link in soup.select(".css-138we14 a"):
        resp = requests.get(base + link.get("href"))
        sauce = BeautifulSoup(resp.text, "lxml")
        title = sauce.select_one("h1.css-1j5ig2m.e1h9rw200")
        time = sauce.find("time")
        print(time.text, title.text.encode("utf-8"))
Output:
March 30, 2019 b'Bezos\xe2\x80\x99 Security Consultant Accuses Saudis of Hacking the Amazon C.E.O.\xe2\x80\x99s Phone'
March 29, 2019 b'In Ukraine, Russia Tests a New Facebook Tactic in Election Tampering'
March 28, 2019 b'Huawei Shrugs Off U.S. Clampdown With a $100 Billion Year'
March 28, 2019 b'N.S.A. Contractor Arrested in Biggest Breach of U.S. Secrets Pleads Guilty'
March 28, 2019 b'Grindr Is Owned by a Chinese Firm, and the U.S. Is Trying to Force It to Sell'
March 28, 2019 b'DealBook Briefing: Saudi Arabia Wanted Cash. Aramco Just Obliged.'
March 28, 2019 b'Huawei Security \xe2\x80\x98Defects\xe2\x80\x99 Are Found by British Authorities'
March 25, 2019 b'As Special Counsel, Mueller Kept Such a Low Profile He Seemed Almost Invisible'
March 21, 2019 b'Quotation of the Day: In New Age of Digital Warfare, Spies for Any Nation\xe2\x80\x99s Budget'
March 21, 2019 b'Coast Guard\xe2\x80\x99s Top Officer Pledges \xe2\x80\x98Dedicated Campaign\xe2\x80\x99 to Improve Diversity'
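The difference between the question's selector and the corrected one can be seen on a tiny hypothetical page: spaces in a CSS selector mean "descendant of", while classes chained with dots must all be on the same element.

```python
from bs4 import BeautifulSoup

# Hypothetical article markup: a single <h1> carrying both classes.
HTML = '<article><h1 class="css-1vkm6nb ehdk2mb0">Headline</h1><time>March 30, 2019</time></article>'
sauce = BeautifulSoup(HTML, "html.parser")

# Descendant selector: an <h1> inside an <ehdk2mb0> tag inside a .css-1vkm6nb element.
descendant = sauce.select(".css-1vkm6nb ehdk2mb0 h1")

# Compound selector (no spaces): one <h1> that has both classes.
compound = sauce.select_one("h1.css-1vkm6nb.ehdk2mb0")

print(descendant)                     # []
print(compound.get_text())            # Headline
print(sauce.find("time").get_text())  # March 30, 2019
```

Note that select always returns a list, so calling .text on its result, as the question does with date, would fail even if something matched.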

Page number request doesn't change for site when scraping with Python and Beautiful Soup

I am trying to extract some information from a specific page using Python and Beautiful Soup. I figured out that requests.get(url) doesn't change the page even when I request different pages.
In the line below I check which page I am scraping. It always returns page number 1: even when I try page_number=2, it starts from the first page and scrapes just the first page.
activepage = soup.find('ul', id= 'pagination').li.string
print("Page Number: " + activepage)
I tested my code on other pages and it works fine, but on this specific page I can't loop through different pages. Can anyone tell me what exactly the problem with this page is and what the solution is?
import requests
import sys
from bs4 import BeautifulSoup
def trade_spider(max_pages):
    page_number = 1
    while page_number <= max_pages:
        url = "http://munich.eventful.com/events/categories/festivals_parades#!page_number=" + str(page_number) + "&category=festivals_parades"
        source_code = requests.get(url)
        # just get the code, no headers or anything
        plain_text = source_code.text
        # BeautifulSoup objects can be searched easily
        soup = BeautifulSoup(plain_text)
        category = soup.find('li', id = 'breadcrumb-label').string
        activepage = soup.find('ul', id= 'pagination').li.string
        print("Page Number: " + activepage)
        for mylist in soup.findAll('li', {'class': 'clearfix'}):
            link = mylist.find('a', {'data-ga-label': 'Event Title'})
            if (link is not None):
                href = link.get('href')
                title = link.string # just the text, not the HTML
                location = mylist.find("div", {"class": "event-meta"}).strong.string
                date = mylist.find("div", {"class": "event-meta"}).span.string
                print(title, category, href, date, location)
        page_number += 1
trade_spider(8)
This should get what you want:
import requests
from bs4 import BeautifulSoup
def trade_spider(max_pages):
    for page_number in range(1, max_pages + 1):
        url = "http://munich.eventful.com/events/categories/festivals_parades?page_number={}".format(page_number)
        print(url)
        # just get the code, no headers or anything
        plain_text = requests.get(url).content
        # BeautifulSoup objects can be searched easily
        soup = BeautifulSoup(plain_text, "html.parser")
        category = soup.find('li', id='breadcrumb-label').string
        for mylist in soup.findAll('li', {'class': 'clearfix'}):
            link = mylist.find('a', {'data-ga-label': 'Event Title'})
            if link is not None:
                href = link.get('href')
                title = link.string  # just the text, not the HTML
                location = mylist.find("div", {"class": "event-meta"}).strong.string
                date = mylist.find("div", {"class": "event-meta"}).span.string
                print(title, category, href, date, location)
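The problem is the base URL. In the original URL everything after # is a fragment, which is handled client-side by the browser and is never sent to the server, so every request returned the first page. Passing page_number as a query-string parameter instead works:
http://munich.eventful.com/events/categories/festivals_parades?page_number={}
It is also simpler to just loop over range(1, max_pages + 1).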
Output:
trade_spider(3)
http://munich.eventful.com/events/categories/festivals_parades?page_number=1
Rockavaria 2015 - Freitag Festivals http://munich.eventful.com/events/rockavaria-2015-freitag-/E0-001-077521284-1 Kleine Olympiahalle May 29
Rockavaria 2015 - Reiseangebote Festivals http://munich.eventful.com/events/rockavaria-2015-reiseangebote-/E0-001-077133997-3 Kleine Olympiahalle May 29 - May 31
Rockavaria 2015 - Samstag Festivals http://munich.eventful.com/events/rockavaria-2015-samstag-/E0-001-082047581-8 Kleine Olympiahalle May 30
Ermittelt wird im Cafe`Mellow München... Festivals http://munich.eventful.com/events/ermittelt-wird-im-cafemellow-mnchen-tatort-im-ma-/E0-001-083732894-3 Cafe Bar Mellow München May 24
Ermittelt wird im Cafe`Mellow München... Festivals http://munich.eventful.com/events/ermittelt-wird-im-cafemellow-mnchen-tatort-im-ma-/E0-001-083934156-8 Cafe Bar Mellow München May 31
Ermittelt wird im Cafe`Mellow München... Festivals http://munich.eventful.com/events/ermittelt-wird-im-cafemellow-mnchen-tatort-im-ma-/E0-001-084050435-8 Cafe Bar Mellow München May 25
ORGANIC DANCE MUSIC FESTIVAL Festivals http://munich.eventful.com/events/organic-dance-music-festival-/E0-001-076992422-8 Zenith Jun 13
Organic Dance Music Festival with Tal... Festivals http://munich.eventful.com/events/organic-dance-music-festival-tale-us-dixon-david-/E0-001-082231599-0 Zenith Jun 13
13. aDevantgarde-Festival »humus« Wur... Festivals http://munich.eventful.com/events/13-adevantgardefestival-humus-wurzel3-doppelkonz-/E0-001-082515660-8 Gasteig Jun 14
http://munich.eventful.com/events/categories/festivals_parades?page_number=2
Festive Concerts: Palace Schleißheim Festivals http://munich.eventful.com/events/festive-concerts-palace-schleiheim-/E0-001-082934543-1 Schloss Schleissheim Jul 5
Festive Concerts: Palace Schleißheim Festivals http://munich.eventful.com/events/festive-concerts-palace-schleiheim-/E0-001-082934542-2 Schloss Schleissheim Jun 19
Festive Concerts: Palace Schleißheim Festivals http://munich.eventful.com/events/festive-concerts-palace-schleiheim-/E0-001-082957294-7 Schloss Schleissheim Aug 2
Festive Concerts: Palace Schleißheim Festivals http://munich.eventful.com/events/festive-concerts-palace-schleiheim-/E0-001-082957292-9 Schloss Schleissheim Jun 14
Tollwood Festival: Patti Smith Festivals http://munich.eventful.com/events/tollwood-festival-patti-smith-/E0-001-080907608-3 Tollwood Jul 13
VLAD IN TEARS - FREE AND EASY FESTIVAL Festivals http://munich.eventful.com/events/vlad-tears-free-and-easy-festival-/E0-001-083500016-8 Backstage Jul 25
COMBICHRIST - FREE AND EASY FESTIVAL Festivals http://munich.eventful.com/events/combichrist-free-and-easy-festival-/E0-001-083499106-7 Backstage Jul 25
13. aDevantgarde-Festival »humus« Spi... Festivals http://munich.eventful.com/events/13-adevantgardefestival-humus-spinnen-/E0-001-082515659-2 Gasteig Jun 15
Barber & Cook - EOS Festival Festivals http://munich.eventful.com/events/barber-cook-eos-festival-/E0-001-081892354-3 Taufkirchen (Vils) Jun 4
http://munich.eventful.com/events/categories/festivals_parades?page_number=3
Yaz Aski Indoor Festival - Mabel Mati... Festivals http://eventful.com/events/yaz-aski-indoor-festival-mabel-matiz-ilyas-yalcintas-karga-/E0-001-083819488-4 Muffathalle Jun 6
Keep It Low Festival 2015 MÜNCHEN Festivals http://munich.eventful.com/events/keep-low-festival-2015-mnchen-/E0-001-083819658-1 Feierwerk Oct 16
10. Oberhachinger Classic Jazz Festiv... Festivals http://munich.eventful.com/events/10-oberhachinger-classic-jazz-festival-3-tag-/E0-001-080921159-8 Bürgersaal Beim Forstner Jun 13
10. Oberhachinger Classic Jazz Festiv... Festivals http://munich.eventful.com/events/10-oberhachinger-classic-jazz-festival-2-tag-/E0-001-080921158-9 Bürgersaal Beim Forstner Jun 12
Kool & Kabul - EOS Festival Festivals http://munich.eventful.com/events/kool-kabul-eos-festival-/E0-001-083947944-5 Crazy Town Jun 4
10. Oberhachinger Classic Jazz Festiv... Festivals http://munich.eventful.com/events/10-oberhachinger-classic-jazz-festival-1-tag-e-/E0-001-080921157-0 Bürgersaal Beim Forstner Jun 11
Mellow Monks - Psychedelic Happiness ... Festivals http://munich.eventful.com/events/mellow-monks-psychedelic-happiness-festival-201-/E0-001-078583630-3 Munich, Bayern, Germany Jun 12
Organic Dance Music Festival MÜNCHEN ... Festivals http://munich.eventful.com/events/organic-dance-music-festival-mnchen-freimann-/E0-001-081520443-8 Zenith & Kesselhaus + dazwischenliegende Freifl... Jun 13
aDevantgarde Festival: Young Lions Re... Festivals http://munich.eventful.com/events/adevantgarde-festival-young-lions-reloaded-/E0-001-083294075-9 Club Milla Jun 13
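Python's urllib.parse shows why the original URL always returned the first page: with # appearing before any ?, the page number ends up in the fragment and the query string is empty. The page_url helper below is a hypothetical way to build the query-string URLs used in the answer.

```python
from urllib.parse import urlsplit, urlencode

old = "http://munich.eventful.com/events/categories/festivals_parades#!page_number=2&category=festivals_parades"
parts = urlsplit(old)

# Only scheme, host, path and query go into the HTTP request;
# the fragment stays on the client side.
print(repr(parts.query))     # ''
print(repr(parts.fragment))  # '!page_number=2&category=festivals_parades'


def page_url(page_number):
    """Hypothetical helper: the query-string form of the listing URL."""
    base = "http://munich.eventful.com/events/categories/festivals_parades"
    return base + "?" + urlencode({"page_number": page_number})


for n in range(1, 4):
    print(page_url(n))
```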
